Sitecore and Solr
Ian Mariano and Steven Zhao
NorthPoint Digital
Agenda
• Who We Are
• Solr Overview
• Sitecore and Solr
  • Setup
  • Nuances
  • Use Cases and Demos
• Scaling Solr
• Q & A
Who We Are
NorthPoint
• Based in NYC with offices in BOS and PHL
• Agile solutions for Financial and Digital markets
• Digital
  • Delivering scalable content solutions
  • Focus on business outcomes and solution platforms
  • Technology Agnostic
  • Open Source, Java/Mobile, .NET, Big Data practices
We Lead with Experience
NorthPoint
Some of Our Clients
NorthPoint
Ian Mariano
Project Manager - Digital
20+ years operating in the intersection of technology and man
Colleague of instigators, storytellers and purveyors of fine design
@ianmariano - [email protected]
Steven Zhao
Senior Consultant - Digital
Sitecore/.NET developer by day and rookie dad by night
Sometimes it is the other way around
@stevenzhaonps - [email protected]
Solr Overview
• Open source search based on Apache Lucene
• Extensible schema
• Expanded query language
• Faceted search and filtering
• Extensible caching
• Highly scalable and available
• Index external data sources
• Expanded update formats
• Rich document processing
• Multiple search collections
http://lucene.apache.org/solr/
Sitecore and Solr
Sitecore Solr Setup
• Set up Solr
  • Download and configure
    • Local / Jetty for development
    • Use Tomcat / Glassfish / other servlet container for production
  • Use dedicated servers (instances)
  • Think about security
  • Create an initial itembuckets collection for Sitecore native
Sitecore Solr Setup
Solr Setup Demo
Sitecore Solr Setup
• Set up Sitecore to use Solr
  • Download the appropriate Solr support package from SDN
  • Generate and install your schema.xml into the itembuckets collection
  • Configure the Solr endpoint
  • If needed, choose your IoC container
  • Re-index
Sitecore Solr Configuration Demo
Sitecore Solr Nuances
• Item buckets and content editor search
• What about missing field(s) from query results?
• What are dynamic fields?
• What are computed fields?
Sitecore Solr Use Cases
• General search
  • Faceting
  • Autocomplete
  • Boosting
• Search external sites
• File crawling
• Big data / external data
Sitecore/Solr General Search
• Searching can be across any specific field
• If no fields are provided, then the default field is used (configurable)
• General queries
• Pagination
• Sorting
• Filtering
• Return only specific fields
Sitecore/Solr General Search
Querying
q=stent
    SolrQueryResults<Content> r = solr.Query(new SolrQuery("stent"));
q=title:stent AND summary:aorta
    SolrQueryResults<Content> r = solr.Query(
        new SolrQueryByField("title", "stent") && new SolrQueryByField("summary", "aorta"));
q=title:stent OR title:aorta
    SolrQueryResults<Content> r = solr.Query(
        new SolrQueryByField("title", "stent") || new SolrQueryByField("title", "aorta"));
Sitecore/Solr General Search
Pagination
start=x&rows=y (zero-based)
    new QueryOptions { Start = x, Rows = y }
Sorting
sort=field1 asc, field2 asc, …
    queryOptions.AddOrder(new SortOrder("field", Order.ASC));
Sitecore/Solr General Search
Filtering
fq=type:news&fq=category:us
    queryOptions.FilterQueries = new ISolrQuery[] {
        new SolrQueryByField("type", "news"),
        new SolrQueryByField("category", "us")
    };
fq=+type:news +category:us
    queryOptions.FilterQueries = new ISolrQuery[] {
        new SolrQueryByField("type", "news") && new SolrQueryByField("category", "us")
    };
Sitecore/Solr General Search
Returning Specific Fields
fl=id title summary
    new QueryOptions { Fields = new[] { "id", "title", "summary" } }
fl=* score
    new QueryOptions { Fields = new[] { "*", "score" } }
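To see how the query, filter, sort, pagination and field-list parameters from the preceding slides combine on the wire, here is a minimal sketch that assembles a raw /select URL using only the .NET base class library. The class name, the host and port (Solr's default Jetty port), and the choice of sort field are assumptions made for illustration; SolrNet builds an equivalent request for you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SolrSelectUrl
{
    // Builds a raw /select URL from ordered key/value pairs.
    // Repeated keys (for example several fq parameters) are allowed.
    public static string Build(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return baseUrl.TrimEnd('/') + "/select?" + query;
    }

    // Hypothetical example: third page (10 rows per page) of US news with "stent" in the title.
    public static string Example()
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", "title:stent"),
            new KeyValuePair<string, string>("fq", "type:news"),
            new KeyValuePair<string, string>("fq", "category:us"),
            new KeyValuePair<string, string>("sort", "title asc"),
            new KeyValuePair<string, string>("start", "20"), // zero-based offset
            new KeyValuePair<string, string>("rows", "10"),
            new KeyValuePair<string, string>("fl", "id,title,summary"),
        };
        // Assumed local development endpoint and collection name.
        return Build("http://localhost:8983/solr/itembuckets", parameters);
    }
}
```

Each fq is sent as its own parameter, which lets Solr cache the two filters independently.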
Solr Faceted Search
• Directed search like Amazon / Zappos
• Have a faceting strategy in line with your content strategy
• Facet by
  • Field value
  • Field range
  • Subqueries
Solr Faceted Search
• Field value (content containing Lincoln faceting on categories and author counts)
  • q=lincoln
  • facet=true
  • facet.field=category
  • facet.field=author
  • facet.mincount=1
ISolrOperations<President> solr = ServiceLocator.Current.GetInstance<ISolrOperations<President>>();
SolrQueryResults<President> results = solr.Query(
    new SolrQuery("lincoln"),
    new QueryOptions {
        Facet = new FacetParameters {
            Queries = new[] {
                new SolrFacetFieldQuery("category"),
                new SolrFacetFieldQuery("author")
            },
            MinCount = 1
        }
    });
Solr Faceted Search
• Field range (matching products faceting on price ranging from 0 to 1000)
  • q=headphones
  • facet=true
  • facet.range=price
  • facet.range.start=0
  • facet.range.end=1000
ISolrOperations<Product> solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
SolrQueryResults<Product> results = solr.Query(
    new SolrQuery("headphones"),
    new QueryOptions {
        Facet = new FacetParameters {
            Queries = new[] {
                new SolrFacetQuery(new SolrQueryByRange<decimal>("price", 0m, 1000m))
            },
            MinCount = 1
        }
    });
Solr Faceted Search
• Subqueries (matching products faceting on specific price ranges)
  • q=headphones
  • facet=true
  • facet.query=price:[0 TO 99]
  • facet.query=price:[100 TO 199]
  • facet.query=price:[200 TO *]
ISolrOperations<Product> solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
SolrQueryResults<Product> results = solr.Query(
    new SolrQuery("headphones"),
    new QueryOptions {
        Facet = new FacetParameters {
            Queries = new[] {
                new SolrFacetQuery(new SolrQueryByRange<decimal>("price", 0m, 99m)),
                new SolrFacetQuery(new SolrQueryByRange<decimal>("price", 100m, 199m)),
                new SolrFacetQuery(new SolrQueryByRange<string>("price", "200", "*"))
            },
            MinCount = 1
        }
    });
Solr Autocomplete
• Via faceting or limited fields
• Define a field type for autocompletion
  • Choose a tokenizer (whitespace)
  • Filter to lowercase to normalize queries
  • Filter using EdgeNGramFilterFactory to match word beginnings
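To make the effect of that analysis chain concrete, the sketch below simulates it in plain C#: split on whitespace, lowercase, then emit edge n-grams. It is an illustration of what ends up in the index, not Solr's implementation, and the class name and gram sizes (1 to 15) are assumptions rather than values from the slides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AutocompleteAnalysis
{
    // Simulates the index-time chain: whitespace tokenizer -> lowercase -> edge n-grams.
    // minGram and maxGram are illustrative values, not taken from the slides.
    public static List<string> IndexTerms(string text, int minGram = 1, int maxGram = 15)
    {
        var terms = new List<string>();
        var tokens = text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            var lower = token.ToLowerInvariant();
            var longest = Math.Min(maxGram, lower.Length);
            // Emit every prefix of the token from minGram up to the longest allowed length.
            for (var length = minGram; length <= longest; length++)
            {
                terms.Add(lower.Substring(0, length));
            }
        }
        return terms;
    }

    // A typed prefix matches when, after lowercasing, it equals one of the indexed grams.
    public static bool Matches(string indexedText, string typed)
    {
        return IndexTerms(indexedText).Contains(typed.ToLowerInvariant());
    }
}
```

For "Home Audio" the indexed terms are the prefixes of "home" and "audio", so typing "AUD" matches while "ome" does not. In a real schema the edge n-gram filter is normally applied at index time only, so the user's input is lowercased but not expanded.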
Solr Autocomplete
• Example: autocomplete list for a category facet
  • q=*:*
  • rows=0
  • facet=true
  • facet.field=category
  • facet.mincount=1
  • facet.limit=5
  • facet.prefix=home
ISolrOperations<Product> solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
SolrQueryResults<Product> results = solr.Query(
    SolrQuery.All,
    new QueryOptions {
        Facet = new FacetParameters {
            Queries = new[] {
                new SolrFacetFieldQuery("category")
            },
            MinCount = 1,
            Limit = 5,
            Prefix = "home"
        },
        Rows = 0
    });
Solr Autocomplete
• Example: autocomplete list for a category field
  • q=category:*headph*
  • rows=0
ISolrOperations<Product> solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
SolrQueryResults<Product> results = solr.Query(
    // Quoted = false keeps SolrNet from escaping the wildcard characters
    new SolrQueryByField("category", "*" + searchTerm + "*") { Quoted = false });
• For the UX
  • jQuery UI or alternate autocomplete facility
  • AJAX call to web service that executes the query (direct or proxied?)
Solr Search Boosting
• Why would you?
  • Content / marketing strategy
  • Tailored user search results
• Why not?
  • Skewed results
• Boost Using
  • Queries: q, dismax, or edismax
  • Boost functions in queries
  • Or static boosting (solrconfig.xml)
    • Document elevation
    • Static boosting request handler
• Simple "q" boosting
  • Boosts are added to total scoring
  • q=features:video^10+text:video^2
SolrQueryResults<Product> results = solr.Query(
    new SolrQuery("features:video").Boost(10) + new SolrQuery("text:video").Boost(2));
Solr Search Boosting
• dismax (disjunction max)
  • Search is executed across multiple fields with different relevance weights
  • The highest-scoring field match determines the document's score, which gives more control over ranking
  • q=video
  • defType=dismax
  • qf=features^10 text^2
SolrQueryResults<Product> results = solr.Query(
    new SolrQuery("video"),
    new QueryOptions {
        ExtraParams = new Dictionary<string, string> {
            { "defType", "dismax" },
            { "qf", "features^10 text^2" }
        }
    });
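The "disjunction max" combination can be worked through with a little arithmetic. The sketch below assumes the per-field scores have already been computed (boosts included) and combines them the way a disjunction-max query does: the best field wins, and the optional tie parameter lets the other fields contribute a fraction of their scores. The class and method names are hypothetical and the input numbers are made up.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DisMaxScore
{
    // Combines per-field scores: max + tie * (sum of the remaining scores).
    // With tie = 0 only the best-matching field counts.
    public static double Combine(IEnumerable<double> fieldScores, double tie = 0.0)
    {
        var scores = fieldScores.ToList();
        if (scores.Count == 0)
        {
            return 0.0;
        }
        var max = scores.Max();
        return max + tie * (scores.Sum() - max);
    }
}
```

For example, if "video" scores 3.0 in features and 1.0 in text, the combined score is 3.0 with tie = 0 and 3.5 with tie = 0.5.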
Solr Search Boosting
• edismax (extended disjunction max): adds features such as the full Lucene query parser syntax, AND/OR, NOT, …
  • q=video OR streaming
  • defType=edismax
  • qf=features^20 text^2
  • bq=category:portable^5
SolrQueryResults<Product> results = solr.Query(
    new SolrQuery("video OR streaming"),
    new QueryOptions {
        ExtraParams = new Dictionary<string, string> {
            { "defType", "edismax" },
            { "qf", "features^20 text^2" },
            { "bq", "category:portable^5" }
        }
    });
Sitecore Solr Use Cases
• General search
  • Faceting
  • Autocomplete
  • Boosting
• Search external sites
• File crawling
• Big data / external data
Search External Sites
• Search of other web properties you own
• Search partner web properties
• Shared Solr publish (other CMSs publish indexes)
• Crawling external sites (like a search engine)
  • Custom scheduled Sitecore crawler
  • FileDataSource / HttpDataSource (DataImportHandler)
  • Nutch
Solr File Crawling
• Got a File Repository?
• Powered by Apache Tika
• Push files via POST to Solr
  • Enable Solr extracting request handler (solrconfig.xml)
• File List Entity Processor (DataImportHandler)
  • REST API /solr/collection/dataimport?command=…
• Use Nutch
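As a rough sketch of the "push files via POST" option, the code below posts a file to Solr's extracting request handler, which is conventionally mapped to /update/extract in solrconfig.xml. The class name, the use of the file name as the document id, and the generic content type are assumptions; a real crawler would pick its own id scheme and add error handling.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class SolrFilePush
{
    // Builds the extract URL; literal.id supplies the unique key for the extracted document.
    public static string BuildExtractUrl(string collectionUrl, string documentId, bool commit)
    {
        var url = collectionUrl.TrimEnd('/') + "/update/extract?literal.id="
            + Uri.EscapeDataString(documentId);
        return commit ? url + "&commit=true" : url;
    }

    // Streams the file to Solr, where Tika extracts the text. Returns true on an HTTP 2xx response.
    public static async Task<bool> PushAsync(HttpClient client, string collectionUrl, string filePath)
    {
        using (var stream = File.OpenRead(filePath))
        using (var content = new StreamContent(stream))
        {
            // Assumption: send a generic type and let Tika detect the actual format.
            content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            var url = BuildExtractUrl(collectionUrl, Path.GetFileName(filePath), commit: true);
            using (var response = await client.PostAsync(url, content))
            {
                return response.IsSuccessStatusCode;
            }
        }
    }
}
```

Committing on every file is convenient for a demo but expensive at volume; batching commits is the usual choice for a large repository.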
Big Data / External Data
• Expose searchable data to other applications
• Push to Solr
  • REST API
• Pull from Solr
  • DataImportHandler
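A DataImportHandler run is triggered with a plain HTTP request, so a scheduled job only needs to build the right URL. The sketch below assumes the handler is registered at /dataimport in solrconfig.xml and accepts only the standard command names; the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DataImportUrl
{
    // Standard DataImportHandler commands.
    private static readonly HashSet<string> Commands = new HashSet<string>
    {
        "full-import", "delta-import", "status", "reload-config", "abort"
    };

    // Builds the URL that triggers the given command; rejects unknown commands early.
    public static string Build(string collectionUrl, string command)
    {
        if (!Commands.Contains(command))
        {
            throw new ArgumentException("Unknown DataImportHandler command: " + command, "command");
        }
        return collectionUrl.TrimEnd('/') + "/dataimport?command=" + command;
    }
}
```

A typical flow is to request full-import and then poll the status command until the import reports that it is idle again.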
Big Data / External Data
DataImportHandler Demo
Scaling Solr
Scaling Solr
• Debug Your Queries
  • debug=true
  • debug=timing
  • debug=query
  • debug=results
• Cache Configuration / When Not To Cache
  • Some filters aren’t good cache candidates (full dates with seconds, spatial)
  • fq={!cache=false}date:…
  • fq={!cache=false}location:…
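The two tuning switches above are ordinary request parameters, so they are easy to add while profiling. This sketch builds a query string with a debug mode and wraps selected filters in the {!cache=false} local parameter. The class name and the example date range are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SolrTuningParams
{
    // Prefixes a filter with the local parameter that tells Solr not to cache it.
    public static string Uncached(string filter)
    {
        return "{!cache=false}" + filter;
    }

    // Builds "q=...&debug=...&fq=..." with URL-encoded values.
    // debug is one of: true, timing, query, results.
    public static string BuildQueryString(string q, string debug, params string[] filters)
    {
        var pairs = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", q),
            new KeyValuePair<string, string>("debug", debug),
        };
        pairs.AddRange(filters.Select(f => new KeyValuePair<string, string>("fq", f)));
        return string.Join("&", pairs.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));
    }
}
```

A filter such as a one-hour window ending at NOW changes on every request, so caching it only evicts more useful entries; that is the case Uncached is meant for.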
General Tuning
Scaling Solr
• High Availability
  • Replication
• Core vs. Collection
  • Core is instance
  • Collection spans Cores
• Sharding
  • Automatic vs. Custom
  • Better to split as they grow
SolrCloud
Scaling Solr
• Distributed Queries
  • Can limit to specific shards
    • shards=host1:port,host2:port
  • Limitations
    • Grouping component’s group.truncate & group.func are not supported
    • Unique key must be unique across shards
    • Elevation not supported
    • MoreLikeThis not supported
    • Changed documents may yield false positive matches
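Two of the points above lend themselves to a small sketch: building the shards parameter that limits a distributed query, and checking the unique-key constraint before documents are indexed. The class name and the host and core names used with it are placeholders; the duplicate check works on id lists you already hold and does not talk to Solr.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistributedQuery
{
    // Joins shard addresses into the comma-separated shards parameter.
    public static string ShardsParam(IEnumerable<string> shardAddresses)
    {
        return "shards=" + string.Join(",", shardAddresses);
    }

    // Returns ids that occur in more than one shard, sorted for stable output.
    public static List<string> DuplicateKeys(IEnumerable<IEnumerable<string>> idsPerShard)
    {
        return idsPerShard
            .SelectMany(ids => ids.Distinct()) // count each id once per shard
            .GroupBy(id => id)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToList();
    }
}
```

An id that shows up in the duplicate list would be returned inconsistently by a distributed query, which is why the key has to be unique across all shards and not just within each one.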
SolrCloud
Q & A
Thank You!
NorthPoint
Ian Mariano Steven Zhao
http://www.northps.com @northps
[email protected] @ianmariano
[email protected] @stevenzhaonps
Links
SDN: http://sdn.sitecore.net
Nutch: http://nutch.apache.org
Solr: http://lucene.apache.org/solr http://wiki.apache.org/solr/
Solr 4 Cookbook: http://www.ebooks-it.net/ebook/apache-solr-4-cookbook
NYC Open Data: https://nycopendata.socrata.com
US Open Data: http://www.data.gov