Searching Drupal
Damien McKenna
Mc-Kenna.com & Bluespark Labs
Twitter: DamienMcKenna
Searching Drupal
Taken for granted
Assumption that "it'll just work"
But first.. a story
Hired by Bonnier for 3-month Drupal 5 project
Migrate from proprietary Java CMS
Short development cycle
Made compromises
Made assumptions
A story, continued
Grand assumption..
"Search will work good enough"
"Tweak later"
A story, continued
Put another way...
"Search will work"
A story, continued
Launched site
Seemed OK, could find results
Complaints of search missing content
Total 57,000 nodes - articles, images, etc
Only 7,000 nodes indexed
A story, continued
D5 search engine indexing flawed
Indexing tracks last timestamp, last nid, last comment timestamp...
If data converted, strong chance of missing some
Out of 57,000 nodes..
Only indexed about 7,000!
A story, continued
Crazy solution...
Use Drupal 6's engine!
Track each node individually
Recommended for all D5 sites!
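A minimal sketch of why a "last position" watermark can skip converted content while per-node tracking does not. This is a simplified model for illustration only, not Drupal's actual indexing code; the `Node` class, function names, and sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    nid: int
    changed: int  # last-modified timestamp


def watermark_pending(nodes, last_changed, last_nid):
    """Watermark approach: only nodes past the last indexed position."""
    return [n for n in nodes
            if (n.changed, n.nid) > (last_changed, last_nid)]


def per_node_pending(nodes, indexed_at):
    """Per-node approach: anything never indexed, or changed since it was."""
    return [n for n in nodes
            if n.nid not in indexed_at or n.changed > indexed_at[n.nid]]


# Converted content often keeps its original, older timestamps.
nodes = [Node(1, 100), Node(2, 5000), Node(3, 120)]

# Suppose the indexer has already processed node 2 (changed=5000).
missed = watermark_pending(nodes, last_changed=5000, last_nid=2)
found = per_node_pending(nodes, indexed_at={2: 5000})

print([n.nid for n in missed])  # []
print([n.nid for n in found])   # [1, 3]
```

In this model, nodes 1 and 3 sit behind the watermark forever, while per-node tracking still picks them up.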
Moral of the story
Don't assume it'll work!
Take an hour, a few small tweaks go a long way
Enough already
On with the show..
Two parts of search
Internal
External
Two parts of search: Internal
Search when already on the site
Note: Only node content
Two parts of search: External
Search from outside
Google, Bing, yadda
Two parts of search: Good news!
Search Basics
Logical content hierarchy
Body structure - h1, h2, h3, etc
Lots of fiddly bits
Drupal SEO book
Internal Search Basics
search.module
Title field most important
Each node element given different weight
Only node fields considered
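To illustrate what "different weight" per node element means, here is a rough sketch of weighted term scoring. The weights and field names are made up for the example and are not the values search.module uses.

```python
# Hypothetical weights, chosen only to illustrate the idea.
FIELD_WEIGHTS = {"title": 5.0, "body": 1.0}


def score(node, term):
    """Sum weighted occurrences of a term across a node's fields."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = node.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


a = {"title": "K2 Apache Recon ski", "body": "All-mountain model."}
b = {"title": "K2 Apache Recon", "body": "A great ski for all-mountain use."}

print(score(a, "ski"), score(b, "ski"))  # 5.0 1.0
```

With these assumed weights, the same word counts five times as much in the title as in the body.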
External Search Basics
Page title super important
Considers everything on page:
Views
Blocks
etc
Lots of tricks
Internal Search - Tip
Put most important words in node Title
SkiNet.com products
Search for "k2 skis" - no results
Title field has ski model name
Word "ski" nowhere to be found
Should be: "[make] [model] ski"
e.g. "K2 Apache Recon ski"
Internal Search - Accuracy
"ski" vs "skis" vs "skiing"
Porter Stemmer module
Breaks search terms down to root form
e.g. "skis" becomes "ski"
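A rough illustration of reducing words to a root form. This is crude suffix stripping, not the Porter algorithm that the module implements; the function name and suffix list are assumptions for the example.

```python
def simple_stem(word):
    """Crude suffix stripping; NOT the full Porter algorithm."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print([simple_stem(w) for w in ("ski", "skis", "skiing")])
# ['ski', 'ski', 'ski']
```

Stemming both the indexed text and the query the same way is what lets "skis" match a title containing "ski".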
Internal Search - Configuration
admin/settings/search
Number to index at a time
Minimum word length
Content weighting
Internal Search - Improving Config
Search Config module
Control Advanced Search fields
Hide vocabulary filters
Hide content types
Disable indexing content types
Works pretty well
Internal Search - Step Forward
Faceted Classification
Each content type field selectable
e.g. product color, book publication date, etc
Becoming de facto standard..
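A small sketch of the idea behind facets: count the values of a field across the current results, then narrow the results when a value is picked. The sample data and function names are hypothetical and not tied to any Drupal module.

```python
from collections import Counter

# Hypothetical search results with a selectable "color" field.
results = [
    {"title": "K2 Apache Recon ski", "color": "red"},
    {"title": "Powder ski", "color": "blue"},
    {"title": "Carving ski", "color": "red"},
]


def facet_counts(items, field):
    """How many results carry each value of the given field."""
    return Counter(item[field] for item in items if field in item)


def apply_facet(items, field, value):
    """Narrow the results to those matching the chosen facet value."""
    return [item for item in items if item.get(field) == value]


print(facet_counts(results, "color"))
print([r["title"] for r in apply_facet(results, "color", "red")])
```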
Internal Search - Target.com
Internal Search - Problems
Limited control on search
Won't work:
apple AND orange
apple OR orange
apple AND (orange OR banana)
No facets
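For reference, the three queries above map onto plain set operations over an inverted index. A minimal sketch with a made-up four-document corpus:

```python
from collections import defaultdict

# Hypothetical corpus: document id -> text.
docs = {
    1: "apple orange",
    2: "apple banana",
    3: "orange banana",
    4: "apple",
}

# Inverted index: word -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)


def term(word):
    """Ids of documents containing the word (empty set if none)."""
    return index.get(word, set())


and_result = term("apple") & term("orange")
or_result = term("apple") | term("orange")
nested = term("apple") & (term("orange") | term("banana"))

print(sorted(and_result), sorted(or_result), sorted(nested))
# [1] [1, 2, 3, 4] [1, 2]
```

AND is an intersection, OR is a union, and the parentheses simply control which operation runs first.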
Internal Search - Solution 1
Faceted Search module
Meh
Internal Search - Problems
Faceted Search module very database intensive
Very slow
Solution:
Separate search to external system
Lots of options...
Google CSE
Sphinx
Internal Search - Solution 2
LuceneAPI module
Tremendous power
Simple to install
All PHP, no crazy extras
Best option for most sites
Internal Search - LuceneAPI
Sorting
Facets
CCK fields
Content type
Taxonomy
"More like this"
@cpliakas is awesome!
Internal Search - Two bugs to note
"Hide core search"
/search error
core search block error
LuceneAPI Installation
Download modules
luceneapi-lib-6.x-2.0.tar.gz
Inside luceneapi directory
Look for 'lib' directory
LuceneAPI Configuration
Replace search box
Minimum word length
Words to ignore
Error logging
Advanced: file permissions
LuceneAPI Content Settings
admin/settings/luceneapi_node
Results per page
Default: AND vs OR
Tab name
Hide core search (!!!)
Exclude content types
Node access
Language support
LuceneAPI Content Settings 2
Performance tab
"Optimize" button
Optimize after cron runs
Caching
Caching threshold
Cache max size
Number to index
Memory limits
LuceneAPI Content Settings 3
Content bias!!!
Change importance level for
Body, title, author, terms, comment text, HTML tags, sticky, promoted, content type
Good stuff!!
No craziness
LuceneAPI Index
LuceneAPI Content Settings 4
Facets..
Vocabularies
Author
Content type
Display order
Tweaks
LuceneAPI Content Settings 5
More Like This..
Number of items
Word length
Fields to work from
Exclude content types
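A toy sketch of how "Number of items" and "Word length" settings could shape a "More Like This" list. It ranks by word overlap (Jaccard similarity), which is a stand-in chosen for the example and not necessarily how LuceneAPI computes similarity; all names and sample text are hypothetical.

```python
def terms(text, min_word_length=4):
    """Lowercased words at least min_word_length characters long."""
    return {w for w in text.lower().split() if len(w) >= min_word_length}


def more_like_this(current, others, limit=2, min_word_length=4):
    """Titles of the most similar items, best match first."""
    base = terms(current, min_word_length)
    scored = []
    for title, text in others.items():
        candidate = terms(text, min_word_length)
        union = base | candidate
        similarity = len(base & candidate) / len(union) if union else 0.0
        if similarity > 0:
            scored.append((similarity, title))
    # Highest similarity first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


others = {
    "Ski wax guide": "choosing wax for powder skiing conditions",
    "Boot fitting": "fitting boots for comfort",
    "Powder tips": "skiing deep powder conditions safely",
}

print(more_like_this("powder skiing conditions this winter", others))
# ['Ski wax guide', 'Powder tips']
```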
LuceneAPI Content Settings 6
Did You Mean..
admin/settings/luceneapi_dym
Performance settings
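The general idea behind a "Did You Mean" suggestion can be sketched with Python's standard `difflib`: find the closest known word to what was typed. This is an illustration only, not the luceneapi_dym implementation; the vocabulary and function name are assumptions.

```python
import difflib


def did_you_mean(query, vocabulary):
    """Suggest the closest indexed word, or None if nothing better exists."""
    query = query.lower()
    matches = difflib.get_close_matches(query, vocabulary, n=1, cutoff=0.6)
    if matches and matches[0] != query:
        return matches[0]
    return None


# Hypothetical list of words already present in the index.
vocabulary = ["ski", "snowboard", "binding", "helmet"]

print(did_you_mean("snowbord", vocabulary))  # snowboard
print(did_you_mean("helmet", vocabulary))    # None
```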
LuceneAPI r@wks!
LuceneAPI module
Use it
It's awesome
@cpliakas is awesome!
Give him a hand
Internal Search - Solution 3
Apache Solr module
Dries uses it!
Acquia uses it!
Drupal.org uses it!
Internal Search - ApacheSolr
Lucene in Java
Separate to another server
Keep Java developers employed ;-)
Lots of what LuceneAPI has
Requires more infrastructure
Only for VPS / own server(s)
External Search
Google, Yahoo, etc
SEO is king
Standard practices
SEO Checklist
External Search - Yay Drupal
Friendly URLs by default
PathAuto module - automate URL gen
MetaTags/Nodewords module - keywords, desc
XMLSiteMap module - notify engines
SiteMap module - automated site map
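To show what "automate URL gen" amounts to, here is a rough sketch that turns a title into a URL alias. It is not PathAuto's actual algorithm or configuration; the function names and the `products/` prefix are hypothetical.

```python
import re


def slugify(title):
    """Lowercase the title and collapse non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def alias_for(prefix, title):
    """Build a friendly path such as 'products/some-title'."""
    return f"{prefix}/{slugify(title)}"


print(alias_for("products", "K2 Apache Recon ski"))
# products/k2-apache-recon-ski
```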
External Search - Analytics
Google Analytics module
Omniture module
Quantcast
Summary
Drupal 6 engine
Title is king
Just use Porter Stemmer module
Just use LuceneAPI module!
Large sites: Apache Solr module
SEO Checklist
Drupal SEO book
Drupal - yay!
TTFN