24 August, 2011 • SilverStripe Wellington Meetup • Hamish Friedlander
SilverStripe and Full Text Search
Giving the people what they want
Wednesday, 24 August 2011
What we’re covering
• What does search give you
• Three ways to get it
• Built in db backed search
• Sphinx module
• Full text search module
Big topic, not much time
What we’re not
• Search result visualization
• Search refinement
• Boost, result pre-calculation, faceting, spell checking, real-time results
• Integrating search with IA
• Measuring search usefulness
• 3rd party modules
But that doesn’t mean they’re not important
Why add search?
What are you trying to do?
• Most people use navigation by preference
• Stats depend on site, but average 70-95% navigation
• Search is primarily used to locate stuff that’s not obvious
how to navigate to
• Deeply nested pages
• Cross-cutting information not provided as a taxonomic structure
• Re-discovering remembered items
• If search doesn’t give immediate results, users fall back to
navigation again
Be aware of the goals of your users
Getting you there quicker
• Interesting is relative
• Ideally return the page the user is after
• But failing that, at least return a page the user is interested in
• Speed is perception
• Raw speed is rarely noticed (except when it is)
• Ability to understand results is as important as accuracy of results
• A second click is OK, as long as there’s a likely payoff: “did you mean” is fine,
disambiguation is OK, paging is useless
To be used, search has to give interesting pages faster than navigation
Technology & Tools
Database internal full text search
• Most databases come with some full text search built in
• Generally work by adding new indexes to a table column
• Can easily combine full text queries with other filters
• But databases aren’t really designed for it
• Poor query language - no booleans
• Poor language processing
• Limited feature set - no field boost, spell checking, search suggestions,
faceting, result fragments, ....
• Sometimes technically costly (MyISAM)
It’s just another index
External full text indexers
• Given a schema, and a set of documents, builds an index
• Schema gives both text processing and result relevancy rules
• Different engines either retrieve documents themselves or have documents
sent to them
• Indexes might be write-once (rebuild entire index to add changes)
• Gives a language to query those indexes
• Generally query language is engine-specific
Solr, Sphinx, Elastic Search
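As a rough illustration of the build-then-query cycle described above, here is a toy inverted index in Python. It is a sketch only: the `analyze`, `build_index` and `search` names and the sample documents are made up for this example, and real engines such as Solr or Sphinx add stemming, relevancy ranking and much more.

```python
import re
from collections import defaultdict


def analyze(text):
    """Text processing rule: lowercase, then split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical documents, keyed by id.
documents = {
    1: "Installing the Sphinx module",
    2: "Configuring Solr for full text search",
    3: "Full text search with the Sphinx module",
}
index = build_index(documents)
```

Note that adding a document here means calling `build_index` again, which mirrors the write-once behaviour mentioned above.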
External engines + SilverStripe
• Building schemas is hard, time consuming, annoying when
model changes
• Can build schemas directly off models
• Effectively free - all the necessary information is already present
• Flexible search - can change form structure without index changes
• Inefficient - includes information you won’t search against
• Or can build schemas off query design
• Needs more thought around design of query up front
• More efficient, leads to some more powerful abilities
A tale of two abstractions
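The difference between the two approaches can be sketched in a few lines of Python. Everything here is hypothetical: `PAGE_MODEL`, the planned queries and both helper functions are invented for illustration and are not taken from any SilverStripe module.

```python
# Hypothetical model description: field name -> field type.
PAGE_MODEL = {
    "Title": "Varchar",
    "Content": "HTMLText",
    "MenuTitle": "Varchar",
    "Sort": "Int",
    "ShowInMenus": "Boolean",
}

# Hypothetical queries designed up front: which fields each one touches.
PLANNED_QUERIES = [
    {"fulltext": ["Title", "Content"], "filter": ["ShowInMenus"]},
    {"fulltext": ["Title"], "filter": []},
]


def schema_from_model(model):
    """Index every field on the model, whether or not it is ever searched."""
    return sorted(model)


def schema_from_queries(model, queries):
    """Index only the fields that at least one planned query uses."""
    used = set()
    for query in queries:
        used.update(query["fulltext"])
        used.update(query["filter"])
    unknown = used - set(model)
    if unknown:
        raise ValueError(f"queries reference unknown fields: {sorted(unknown)}")
    return sorted(used)


model_schema = schema_from_model(PAGE_MODEL)
query_schema = schema_from_queries(PAGE_MODEL, PLANNED_QUERIES)
```

The model-derived schema costs nothing to produce but indexes `MenuTitle` and `Sort`, which no query uses; the query-derived schema is smaller but has to be revisited whenever a new query is designed.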
SilverStripe Integration
Built-in search
✓ No external dependencies, separate indexes, schema files or setup
- Can only search SiteTree and File objects, and only specific
fields
- Quality of results is heavily database dependent
Your database-dependent, barely acceptable default
Sphinx module
✓ Very little configuration gives great results on moderate
sized sites
✓ Can search any DataObject, but...
- Combining search over multiple DataObjects doesn’t really
work
- Limited real-time update support
- No exact match string mode makes filtering tricky
Easy, quality full text search
Fulltext search module
✓ Schemas generated from query structure
• More flexible and efficient than generating from model structure
• Closer to how external engines work natively
✓ Eventually multiple search backend support
• Currently: Solr
• In future: Sphinx, Elastic Search, Zend_Lucene
• Not intended to allow code-less swapping of backends
- Currently needs Solr, which is a Java app
• Loves memory, hates empty disk space
Powerful (eventually) search engine independent toolkit
Full text search module example
Define an index
Schema gets generated from this index
Define a form
Standard SilverStripe stuff
Build a query & apply to an index
Filters and excludes can be built & nested
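To give a feel for what building and nesting filters and excludes can look like, here is a small Python sketch. It is not the module's PHP API: `Query`, `field_is`, `any_of`, `all_of`, `run_query` and the sample documents are all hypothetical, and the text match is a naive substring check rather than a real full text lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    text: str = ""
    filters: list = field(default_factory=list)   # every filter must match
    excludes: list = field(default_factory=list)  # no exclude may match


def field_is(name, value):
    """Condition: the document's field equals the given value."""
    return lambda doc: doc.get(name) == value


def any_of(*conditions):
    """Nested condition: at least one inner condition matches."""
    return lambda doc: any(cond(doc) for cond in conditions)


def all_of(*conditions):
    """Nested condition: every inner condition matches."""
    return lambda doc: all(cond(doc) for cond in conditions)


def run_query(query, docs):
    """Return the IDs of documents matching the text, filters and excludes."""
    words = query.text.lower().split()
    hits = []
    for doc in docs:
        body = doc.get("Content", "").lower()
        if not all(word in body for word in words):  # naive substring match
            continue
        if not all(f(doc) for f in query.filters):
            continue
        if any(e(doc) for e in query.excludes):
            continue
        hits.append(doc["ID"])
    return hits


# Hypothetical documents standing in for indexed records.
docs = [
    {"ID": 1, "ClassName": "Page", "Section": "docs", "Content": "Sphinx search setup"},
    {"ID": 2, "ClassName": "BlogEntry", "Section": "blog", "Content": "Solr search setup"},
    {"ID": 3, "ClassName": "Page", "Section": "archive", "Content": "Solr search tuning"},
    {"ID": 4, "ClassName": "Page", "Section": "docs", "Content": "Search tips", "Hidden": True},
]

query = Query(
    text="search",
    filters=[any_of(
        field_is("Section", "docs"),
        all_of(field_is("ClassName", "BlogEntry"), field_is("Section", "blog")),
    )],
    excludes=[field_is("Hidden", True)],
)
results = run_query(query, docs)
```

Here the filter nests an "all of" inside an "any of", and the exclude drops the hidden page even though it passes the filter.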
Final thoughts
Search without searching
• Looks like navigation, acts like search
• Instant taxonomies
• Deal with inconsistent data
• Encourages exploration
Search engines as fuzzy matchers
Links
• https://github.com/silverstripe/silverstripe-sphinx
• https://github.com/silverstripe-labs/silverstripe-fulltextsearch
• http://sphinxsearch.com/
• http://lucene.apache.org/solr/
• http://www.elasticsearch.org/
• https://github.com/nyeholt/silverstripe-solr
• http://code.google.com/p/lucene-silverstripe-plugin/
Modules I’ve covered + some other stuff
Thank you!
Hamish Friedlander
Twitter: @hafriedlander
Email: [email protected]