Post on 15-Jan-2015
Solr @ eBay Kleinanzeigen
Olaf Zschiedrich, eBay Classifieds Group ozschiedrich@ebay-kleinanzeigen.de, 5/25/2011
Who am I?
! Olaf Zschiedrich
! eBay Classifieds Group
! Head of Technology @ eBay Kleinanzeigen
! Areas of expertise/interest:
• High-traffic web applications
• Agile development
• Java/JEE
• Search technologies
Agenda
! About eBay Classifieds Group / eBay Kleinanzeigen
! Metrics & Traffic Numbers
! Why Solr?
! Solr Features in Action
! Data Indexing
! Solr in Production
! Best Practices
! Problems
! Outlook
! Questions
About eBay Classifieds Group
online classifieds company in the world
About eBay Kleinanzeigen
! Typical classifieds ad platform (horizontal, local trading)
! Launched in 2009 after 4 months of development
! Small agile team (using Scrum)
• 12-15 people total
• 5-7 developers
! Leverages open source (Spring, Solr, MySQL, ActiveMQ)
! Applications:
• Public website
• Customer support tool
• API (REST, supporting JSON and XML)
• iPhone app (~250,000 installations)
• Facebook app
Metrics & Traffic Numbers
! Site metrics:
• ~3.2 M active ads
• 16–24 M page views (PVs) per day
• Peak hour = 1.8 M PVs (~500 PVs per second)
! Solr request metrics:
• ~60 M requests per day
• Peak hour = ~1,500 requests per second
! Average response time:
• 20 ms for search and 3 ms for auto-suggest
The site is growing rapidly!
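The figures above can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation; the class and method names are made up for illustration, and it assumes the "peak hours" page-view figure means 1.8 M page views within a single hour. Under that assumption, the peak-hour figure works out to 500 page views per second, and 60 M Solr requests per day average out to roughly 694 requests per second, so the quoted peak of ~1,500 requests per second is a bit more than twice the daily average.

```java
// Back-of-the-envelope check of the traffic figures quoted on the slide.
// TrafficMath is a hypothetical helper, not part of the original talk.
class TrafficMath {
    static final long SECONDS_PER_HOUR = 3_600;
    static final long SECONDS_PER_DAY = 86_400;

    /** Average rate per second for a total spread evenly over a period. */
    static double perSecond(long total, long periodSeconds) {
        return (double) total / periodSeconds;
    }

    public static void main(String[] args) {
        // 1.8 M page views in the peak hour (assumed to be one hour).
        double peakPageViewsPerSecond = perSecond(1_800_000, SECONDS_PER_HOUR);
        // 60 M Solr requests spread over a whole day.
        double averageSolrRequestsPerSecond = perSecond(60_000_000, SECONDS_PER_DAY);
        // Quoted Solr peak (1,500 requests/s) relative to the daily average.
        double peakToAverage = 1_500 / averageSolrRequestsPerSecond;

        System.out.printf("Peak page views per second: %.0f%n", peakPageViewsPerSecond);
        System.out.printf("Average Solr requests per second: %.0f%n", averageSolrRequestsPerSecond);
        System.out.printf("Solr peak-to-average ratio: %.2f%n", peakToAverage);
    }
}
```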
Why Solr?
! Open source
! Good documentation / big community
! Java-based (the language we know and use)
! Widely used (especially Lucene)
! Based on Lucene (the de facto standard for full-text search in Java)
! Feature-rich (including enterprise features)
! Extensible (e.g. easy to implement your own tokenizers)
! Easy to integrate (HTTP, SolrJ client)
! Easy to set up (Java web application)
Solr was the most promising option we looked at. Due to very aggressive timelines, no time-consuming research was possible.
Solr Features in Action
! Faceting
! Language-specific stemming
! More Like This
! Auto-suggest based on the TermsComponent
! Spellchecking
! Synonyms
! Stopwords
! Dynamic fields
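To make two of these features concrete, here is a minimal sketch of how request URLs for a faceted search and a TermsComponent-based auto-suggest lookup could be assembled over plain HTTP. It uses only standard Solr request parameters (q, fq, facet, facet.field, rows, terms.fl, terms.prefix, terms.limit). The base URL, the field names (category, color, title) and the helper class itself are assumptions for illustration, and the /terms path assumes a request handler with the TermsComponent is registered under that name in solrconfig.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that only builds request URLs; it does not talk to Solr.
class SolrRequestUrls {
    /** URL-encodes alternating name/value pairs and joins them with '&'. */
    static String queryString(String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name/value pairs");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            if (i > 0) sb.append('&');
            sb.append(encode(nameValuePairs[i])).append('=').append(encode(nameValuePairs[i + 1]));
        }
        return sb.toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Keyword search restricted by a filter query, with facet counts for one field. */
    static String facetedSearchUrl(String solrBase, String keywords, String filterQuery, String facetField) {
        return solrBase + "/select?" + queryString(
                "q", keywords,
                "fq", filterQuery,
                "facet", "true",
                "facet.field", facetField,
                "rows", "10");
    }

    /** Auto-suggest lookup: indexed terms of one field that start with the typed prefix. */
    static String suggestUrl(String solrBase, String field, String prefix) {
        return solrBase + "/terms?" + queryString(
                "terms.fl", field,
                "terms.prefix", prefix,
                "terms.limit", "10");
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed local test instance
        System.out.println(facetedSearchUrl(base, "BMW 323", "category:cars", "color"));
        System.out.println(suggestUrl(base, "title", "bm"));
    }
}
```

Putting the category restriction into fq rather than q lets Solr cache the filter independently of the keyword query, which is the reason filter queries are usually preferred for such restrictions.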
Data Indexing
! Uses the Delta Import Handler (Solr's DataImportHandler running delta imports)
! Delta import runs every 10 minutes
! Full import is done only when a schema change requires a full index rebuild
! Index is optimized once a day
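The 10-minute delta import could be triggered in several ways (cron, a scheduler inside an application, and so on); the talk does not say which one was used. The following is a minimal sketch of one option using a ScheduledExecutorService. The command, clean, commit and optimize parameters are standard DataImportHandler request parameters; the /dataimport handler path, the host name and the class itself are assumptions for illustration. Passing optimize=false keeps the frequent delta imports cheap, in line with optimizing the index only once a day.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler sketch; the real trigger mechanism is not described in the talk.
class DeltaImportScheduler {
    /** Builds the DataImportHandler URL that starts a delta import on the master. */
    static String deltaImportUrl(String masterBaseUrl) {
        return masterBaseUrl + "/dataimport?command=delta-import&clean=false&commit=true&optimize=false";
    }

    /** Runs the trigger immediately and then again a fixed delay after each run finishes. */
    static ScheduledExecutorService schedule(Runnable trigger, long delayMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(trigger, 0, delayMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        String url = deltaImportUrl("http://solr-master.example:8983/solr"); // assumed host
        // A real trigger would issue an HTTP GET against the URL; here it is only printed.
        ScheduledExecutorService scheduler = schedule(() -> System.out.println("GET " + url), 10);
        Thread.sleep(200);       // give the first run a moment to start
        scheduler.shutdownNow(); // stop the demo instead of running forever
    }
}
```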
Solr in Production
[Diagram: the Solr Master imports data from a MySQL Slave via JDBC (Delta Import Handler); the Solr Slaves copy the index from the master (Replication Handler) and are accessed over HTTP / REST API.]
! 2 datacenters
! 1 master + 6 slaves per datacenter
Slaves show very low resource consumption. We could go down to 4 slaves per datacenter and still have 50% overcapacity.
! Master is used only for indexing
! Load balancer in front of slaves
! Varnish in front of slaves (for dedicated use cases)
! Working closely with the SITE-OPS team
! DEV-OPS are part of the development process
Solr 3.1 in Production
! Solr 3.1 in production since mid-May
! Not plug and play. Needs a migration path because:
• Index format has changed
• Javabin format has changed
! Two major problems:
• Bug in spellchecker (SOLR-2462): leads to infinite GC loops
• Bug in replication handler (SOLR-2469): leads to growing disk usage, as old index files are not removed when "replicateAfter=startup" is used
Best Practices
! Use Solr cores right from the beginning
Allows you to run multiple indexes on one box in dev and distribute indexes to multiple boxes in production
! Use filter queries
! Use caching (FieldCache, QueryCache, web proxy cache, e.g. Varnish or Squid)
! Tune the JVM properly
! Build a search layer that hides the usage of Solr
    SearchCommand cmd = new SearchCommand();
    cmd.setKeywords("BMW 323");
    ...
    SearchResult result = searchService.searchActiveAds(cmd);
    List<Ad> ads = result.getAds();
! Create a QueryBuilder to ease query building
    SolrQueryBuilder sqb = new SolrQueryBuilder();
    sqb = sqb.freetext("freetext", "BMW").and().in("color", "RED", "BLACK");
    sqb = sqb.and().not().eq("fuel_type", "GAS").and().lt("price", "10000");
    ...
    String query = sqb.build();
(Just an example. Normally, filter queries should be used for a query like this!)
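The slide shows only how such a query builder is used, not how it is written. The following is a hypothetical re-creation, not the original class: it keeps the method names from the slide (freetext, in, eq, lt, and, not, build) and assumes they map onto Solr's standard query syntax, with lt rendered as an exclusive open-ended range. The escaping rule (backslash before query-syntax characters and whitespace) is also an assumption, and whether the range comparison behaves numerically depends on the field type in the schema.

```java
// Hypothetical sketch of a fluent query builder in the style shown on the slide.
class SolrQueryBuilder {
    // Characters with special meaning in the Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";
    private final StringBuilder query = new StringBuilder();

    /** Free-text match; in this sketch it is rendered like an ordinary field match. */
    SolrQueryBuilder freetext(String field, String text) {
        return eq(field, text);
    }

    SolrQueryBuilder eq(String field, String value) {
        query.append(field).append(':').append(escape(value));
        return this;
    }

    /** Matches any of the given values: field:(A OR B). */
    SolrQueryBuilder in(String field, String... values) {
        query.append(field).append(":(");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) query.append(" OR ");
            query.append(escape(values[i]));
        }
        query.append(')');
        return this;
    }

    /** Less-than as an exclusive, open-ended range: field:{* TO upper}. */
    SolrQueryBuilder lt(String field, String upperBound) {
        query.append(field).append(":{* TO ").append(escape(upperBound)).append('}');
        return this;
    }

    SolrQueryBuilder and() { query.append(" AND "); return this; }

    SolrQueryBuilder not() { query.append("NOT "); return this; }

    String build() { return query.toString(); }

    /** Puts a backslash before query-syntax characters and whitespace. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = new SolrQueryBuilder()
                .freetext("freetext", "BMW").and().in("color", "RED", "BLACK")
                .and().not().eq("fuel_type", "GAS").and().lt("price", "10000")
                .build();
        System.out.println(query);
    }
}
```

Because every method returns the builder itself, calls can be chained as on the slide. The restrictions on color, fuel_type and price would normally go into separate filter queries rather than the main query, as the note on the slide says.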
Problems
! Distance search including sorting
• Not supported in previous Solr versions
• LocalSolr: not working with Solr 1.4 final, GC issues, performance issues
• Solution: got rid of sort by distance; implemented our own distance search based on bounding boxes and simple range queries
• Solved in 3.1
! Real-time updates
! Deep paging of large result sets (SOLR-1726)
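The bounding-box workaround mentioned above can be sketched as follows. This is not the original implementation, only a minimal illustration of the idea: convert a search radius around a point into latitude and longitude ranges and express them as two range filter queries. The field names lat and lon, the class name and the spherical-earth approximation are assumptions; the sketch ignores the poles and the antimeridian, and the box also matches points in its corners that lie slightly outside the circle.

```java
import java.util.Locale;

// Hypothetical sketch of a bounding-box distance filter built from plain range queries.
class BoundingBoxFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // mean earth radius, spherical approximation

    /** Returns {minLat, maxLat, minLon, maxLon} in degrees for a box enclosing the search circle. */
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double latDelta = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // Meridians converge towards the poles, so the longitude span widens with latitude.
        double lonDelta = Math.toDegrees(radiusKm / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(lat))));
        return new double[] { lat - latDelta, lat + latDelta, lon - lonDelta, lon + lonDelta };
    }

    /** Builds one range filter query per coordinate field. */
    static String[] filterQueries(String latField, String lonField,
                                  double lat, double lon, double radiusKm) {
        double[] box = boundingBox(lat, lon, radiusKm);
        return new String[] {
            String.format(Locale.ROOT, "%s:[%.5f TO %.5f]", latField, box[0], box[1]),
            String.format(Locale.ROOT, "%s:[%.5f TO %.5f]", lonField, box[2], box[3])
        };
    }

    public static void main(String[] args) {
        // Ads within roughly 10 km of central Berlin (coordinates chosen for illustration).
        for (String fq : filterQueries("lat", "lon", 52.52, 13.405, 10.0)) {
            System.out.println("fq=" + fq);
        }
    }
}
```

Each returned string would be sent as its own fq parameter, so the two ranges can be cached and reused independently of the keyword query. Results are filtered by the box but not ordered by distance, which matches the decision to drop sorting by distance.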
Outlook / Future Plans
! Migrate further applications to Solr
Most batch jobs and the customer support tool search against the database, which is getting slower as the data grows.
! Evaluate new features of Solr 3.1
• Spatial/distance search
• New auto-suggest component
• Extended dismax query parser
Questions?
Contact
! Olaf Zschiedrich
• ozschiedrich@ebay-kleinanzeigen.de
• ozschiedrich@ebay.com
• www.ebay-kleinanzeigen.de