Solr and ElasticSearch demo and speaker feb 2014
Distributed Database Architecture
Search and Indexing
Nick Kabra
Presentation Agenda
• Team Introduction
• Basics and History
• Use Cases & Current Usage
• Highlights
• From the web
• Migration
• Appendix
DISCLAIMER: This is a knowledge-sharing session and not a recommendation for any specific technology / product
Team Introduction
• Name:
• Designation:
• Experience with Search and Indexing:
• How long have you been working with Solr or ElasticSearch:
Basics
• Lucene: a search engine library packaged as a set of JAR files.
• Solr and ES are both used for indexing and searching, and both are built on top of the Lucene API.
• Solr and ES take the Lucene API and build features on top of it. The API is accessed through a web server.
• Think of each as a smaller version of Google, which has indexed and ranked the web pages: a search platform for web sites or for an organization.
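Because both products expose search through a web server, a query is just an HTTP request. The sketch below only builds the request URLs; it does not contact a server. The host names, the `profiles` collection/index name and the `skills:java` query are hypothetical. Ports 8983 (Solr) and 9200 (ES) are the usual defaults, but a given installation may differ.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, collection: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select request handler, asking for JSON output."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/solr/{collection}/select?{params}"


def es_search_url(base: str, index: str, query: str, size: int = 10) -> str:
    """Build a URL for the ElasticSearch URI search endpoint (_search)."""
    params = urlencode({"q": query, "size": size})
    return f"{base}/{index}/_search?{params}"


if __name__ == "__main__":
    # Hypothetical local installations on the default ports.
    print(solr_select_url("http://localhost:8983", "profiles", "skills:java"))
    print(es_search_url("http://localhost:9200", "profiles", "skills:java"))
```

Both URLs carry the same Lucene-style query string; what differs is the path layout each server puts around it.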
History
• Differences in design and architecture.
• Solr was created at CNET in 2004 and became an Apache project in 2006.
• ES was first released in 2010 and brought additional features.
Key Players: Solr and ElasticSearch
Solr
• Latest version: Solr 4.6.1, released on Jan 28, 2014
• Collection: main logical structure for Solr
• Architecture
  • Distributed
  • Fault tolerant, with automatic replicas
  • Coordination: Apache Solr + ZooKeeper ensemble, so a quorum is needed
  • Leader per shard
  • Automatic leader election

ElasticSearch (ES)
• Latest version: ElasticSearch 1.0.0, released on Feb 12, 2014
• Index: main logical structure for ES
• Architecture
  • Distributed
  • Fault tolerant, with automatic replicas
  • Coordination: only ElasticSearch nodes + zen discovery; split brain is a risk
  • Single leader
  • Automatic leader election
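The quorum and split-brain points come down to simple majority arithmetic. The sketch below is illustrative only: the function names are made up for this example. A ZooKeeper ensemble needs a majority of its members to stay available, and in ES 1.x the same majority value is what is commonly set as `discovery.zen.minimum_master_nodes` to reduce the split-brain risk.

```python
def quorum(eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if eligible_nodes < 1:
        raise ValueError("need at least one node")
    return eligible_nodes // 2 + 1


def tolerated_failures(eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority can still be formed."""
    return eligible_nodes - quorum(eligible_nodes)


if __name__ == "__main__":
    for nodes in (2, 3, 4, 5):
        print(nodes, "nodes: quorum", quorum(nodes),
              "tolerates", tolerated_failures(nodes), "failure(s)")
```

Note that 2 nodes tolerate no failures and 4 nodes tolerate only as many as 3, which is why odd-sized groups of 3 or 5 are the usual choice.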
Resume recommendations - Use Case 1
Challenge
• Company ABC helps other firms hire skilled developers and project managers.
• Empower customers to find the right job candidate from a database of 8 million profiles.
• Need fast and predictable performance.
• Include geo-spatial search.
Opportunity
• Use ES as the search engine, with realtime indexing and nested querying.
Success
• Customers hire using company ABC.
• ABC stores the searches made by customers.
• Identify candidates, skills and compensation structure to enhance the customer search experience with better matches.
• Make recommendations to customers on salaries, future market needs, etc.
• Eliminate duplicate profiles with realtime indexing and percolation.
• Provides an enhanced customer experience and faster responses.
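Percolation turns search around: queries are stored first, and each incoming document is checked against them. The toy below shows that idea applied to duplicate profiles. It is a conceptual sketch in plain Python, not the ES percolator API; `ToyPercolator`, `same_person_query` and the name/email matching rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Profile = Dict[str, str]


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(value.lower().split())


class ToyPercolator:
    """Stores queries up front and matches each new document against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Callable[[Profile], bool]] = {}

    def register(self, query_id: str, predicate: Callable[[Profile], bool]) -> None:
        self._queries[query_id] = predicate

    def percolate(self, doc: Profile) -> List[str]:
        """Return the ids of all stored queries that the document matches."""
        return sorted(qid for qid, pred in self._queries.items() if pred(doc))


def same_person_query(known: Profile) -> Callable[[Profile], bool]:
    """Hypothetical duplicate rule: same normalized name and email."""
    key = (normalize(known["name"]), normalize(known["email"]))

    def predicate(doc: Profile) -> bool:
        candidate = (normalize(doc.get("name", "")), normalize(doc.get("email", "")))
        return candidate == key

    return predicate


if __name__ == "__main__":
    percolator = ToyPercolator()
    percolator.register(
        "profile-42",
        same_person_query({"name": "Ada Lovelace", "email": "ada@example.com"}),
    )
    incoming = {"name": "  ada   LOVELACE", "email": "ADA@example.com"}
    print(percolator.percolate(incoming))  # ids of existing profiles it duplicates
```

A new profile that matches a stored query can be flagged as a likely duplicate at indexing time instead of being discovered later by a batch job.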
Integration - Use Case 2
THE FULL CIRCLE
• Logstash: takes logs, then scrubs, parses and enriches the data
• ElasticSearch: search and analyze in realtime
• Kibana: visualization engine for dynamic dashboards created in real-time or on-the-fly
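To make the "scrub, parse and enrich" step concrete, here is a minimal Python sketch of what that stage does to one access-log line before the result is indexed. It is not Logstash itself: the regular expression, the field names and the `is_error` enrichment are assumptions for illustration. The `logstash-YYYY.MM.dd` daily index name follows Logstash's usual default naming.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# Simplified pattern for a common-log-format line (an assumption, not a full grammar).
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one log line into a structured document, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    when = datetime.strptime(fields["ts"], "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)
    status = int(fields["status"])
    return {
        "@timestamp": when.isoformat(),
        "client": fields["client"],
        "method": fields["method"],
        "path": fields["path"],
        "status": status,
        "bytes": 0 if fields["size"] == "-" else int(fields["size"]),
        "is_error": status >= 500,  # enrichment: a derived field, hypothetical
    }


def daily_index(doc: Dict[str, object], prefix: str = "logstash") -> str:
    """Name of the daily index the document would be written to."""
    day = datetime.fromisoformat(str(doc["@timestamp"]))
    return f"{prefix}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    sample = '127.0.0.1 - - [10/Feb/2014:13:55:36 +0000] "GET /search?q=solr HTTP/1.1" 200 2326'
    document = parse_line(sample)
    print(document)
    print(daily_index(document))
```

Once documents have this shape, ES can search and aggregate them, and Kibana can chart fields such as `status` over `@timestamp`.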
Chatagent for 460 million documents – Use Case 3
Challenge
6,000 customers from around the world use LiveChat daily to communicate with their customers, from one-person businesses to international organizations like LG, Apple and Adobe. LiveChat customers conduct 3.6 million queries and 220 million “get” operations per day on 460 million documents. LiveChat keeps these documents updated with 70 million indexing operations every day.
Solution
• Reduce query time from 2 seconds to 100 ms
• Streamline updating from hours to seconds
• Guarantee maximum uptime
• Scale to meet the needs of 6,000 customers
• Store and search on 460 million documents
• Process 3.6 million queries per day
Advantage
• Scalability, indexing, and full text search that lets users search through chat archives
• Faceting makes it possible to pull various statistics for LiveChat clients.
• ES acts as the single datastore, and data updates are available immediately. Each document is now updated in LiveChat on an average of 20 to 30 times every 20 to 60 seconds.
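The daily volumes quoted for LiveChat are easier to reason about as per-second rates. The short calculation below uses only the figures from this slide and assumes the load is spread evenly over 24 hours, which real traffic is not, so peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted on the slide.
DAILY_OPERATIONS = {
    "queries": 3_600_000,
    "gets": 220_000_000,
    "indexing": 70_000_000,
}


def per_second(per_day: int) -> float:
    """Average rate per second for a given daily volume."""
    return per_day / SECONDS_PER_DAY


AVERAGE_RATES = {name: per_second(count) for name, count in DAILY_OPERATIONS.items()}

if __name__ == "__main__":
    for name, rate in AVERAGE_RATES.items():
        print(f"{name}: about {rate:,.1f} per second on average")
```

On average that is roughly 42 queries, 2,546 "get" operations and 810 indexing operations every second.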
Current Uses
• Use Case 1
• Use Case 2
• Use Case 3
• Use Case 4
• Use Case X
Highlights
• Schema and config: solrconfig.xml, es.yml; change the number of shards and replicas live
• Scaling: nodes auto-balanced / SOLR-3755 or shard splitting / add a document
• Nesting (address, users & rights, boolean, parent-children)
• Index = different types of documents and analyzers
• Node discovery and fault discovery; ZooKeeper
• Multiple documents per schema, and parent-child
• Percolator
• Aggregations + facets in ES / facets in Solr
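Faceting, which both products offer, amounts to counting how many matching documents fall under each value of a field. A minimal in-memory sketch of a terms facet follows; the `terms_facet` function and the sample chat documents are hypothetical, and real engines compute these counts from the index rather than by scanning documents.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def terms_facet(docs: Iterable[Dict[str, str]], field: str, size: int = 10) -> List[Tuple[str, int]]:
    """Count documents per distinct value of `field`, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)


if __name__ == "__main__":
    chats = [
        {"country": "US", "agent": "ann"},
        {"country": "PL", "agent": "bob"},
        {"country": "US", "agent": "bob"},
        {"country": "US", "agent": "ann"},
        {"country": "PL", "agent": "ann"},
        {"country": "DE", "agent": "bob"},
    ]
    print(terms_facet(chats, "country"))
```

The same counts drive the "drill-down" links on a search results page: each facet value becomes a filter the user can click.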
Highlights (contd. 2)
• Auto-load balancer and auto-sharding
• Marvel metrics on 03/13/2014
• Split-brain problem in ES
• Structured query DSL and query control
• Real-time indexing / near real-time indexing
• Query routing, and Solr 5816 to be introduced
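Query routing rests on one rule: a routing value is hashed and taken modulo the number of primary shards, so the same value always lands on the same shard. The sketch below illustrates that rule only. It uses CRC32 as a stand-in hash, which is an assumption made for this example; ES uses its own hash function, so the shard numbers here will not match a real cluster.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Pick a shard as hash(routing) % number_of_primary_shards (CRC32 stand-in)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    # Hypothetical routing values: one per customer account.
    for customer in ("customer-1", "customer-2", "customer-3"):
        print(customer, "-> shard", shard_for(customer, 5))
```

Two consequences follow. A query that supplies the routing value only has to visit one shard, and the number of primary shards cannot change without moving documents, because the modulo result would change.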
ElasticSearch / Solr funnel
Solr
• UIMA
• Text analysis debugger, spell check
• Decision tree faceting / drilldown
• Cloudera, MapR and DataStax support Solr
ElasticSearch
• Filters for queries across nested documents
• Query handling analyzer and language, term suggester, autocomplete
• Realtime GET with query routing
• Hortonworks and Couchbase support ElasticSearch
FROM THE WEB
Web CPA
This is only an FYI: we found some customers moving from Solr to ElasticSearch, but could not find any article mentioning clients that moved from ES to Solr.
Caveat: no prejudice intended, but it would be good to hear what customers say.
Let us also check these sites:
• http://www.ymc.ch/en/why-we-chose-solr-4-0-instead-of-elasticsearch
• http://www.mgt-commerce.com/magento-elasticsearch.html
• Foursquare: http://engineering.foursquare.com/2012/08/09/foursquare-now-uses-elastic-search-and-on-a-related-note-slashem-also-works-with-elastic-search/
• Jetwick: http://karussell.wordpress.com/2011/02/07/why-jetwick-moved-from-solr-to-elasticsearch/
• Netricos: http://www.netricos.com/blog/posts/how-we-are-using-elastic-search
• Stumbleupon: http://www.elasticsearch.org/case-study/stumbleupon/
• UK govt. site: https://gds.blog.gov.uk/2012/08/03/from-solr-to-elasticsearch/
• Wikimedia: http://thenextweb.com/insider/2014/01/06/wikimedia-will-replace-search-elasticsearch-beta-users-february-users-march-april/#!xDKnd
2 Parts of a whole – The Math
• Solr: performs very well on small indexes that don’t change very often.
• ElasticSearch: scalability, auto-sharding, GUI admin, schemaless operation, real-time indexing, nested queries, routing, and the way indexing and queries are handled (faster query execution and better indexing) give ES a distinct advantage.
Migration
Step 1: Use the river plugin to migrate from the existing Solr to ES.
Step 2: The river pulls the content from the existing Solr cluster and indexes it in ES.
Step 3: When you decide to switch to Elasticsearch permanently, switch your indexing so that content is indexed directly from your sources into Elasticsearch. Keeping Solr in the middle is not a recommended setup.
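The pull-and-reindex step can be pictured as a reshaping of data: documents come out of a Solr JSON response and go into the newline-delimited body that the ES bulk API accepts. The sketch below shows only that reshaping and makes no network calls. It is not the river plugin; the function names, the `profiles` index and `profile` type are hypothetical, and dropping Solr's internal `_version_` field is an assumption about what should not be carried over.

```python
import json
from typing import Any, Dict, Iterable, List


def solr_docs(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Extract the documents from a Solr JSON select response."""
    return response.get("response", {}).get("docs", [])


def to_bulk_body(docs: Iterable[Dict[str, Any]], index: str, doc_type: str,
                 id_field: str = "id") -> str:
    """Build an ES bulk body: one action line, then one source line, per document."""
    lines: List[str] = []
    for doc in docs:
        # Drop Solr's internal version field (assumed not wanted in ES).
        source = {key: value for key, value in doc.items() if key != "_version_"}
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action, sort_keys=True))
        lines.append(json.dumps(source, sort_keys=True))
    # The bulk body must end with a newline when it is non-empty.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    solr_response = {
        "response": {
            "numFound": 2,
            "start": 0,
            "docs": [
                {"id": "1", "title": "Java developer", "_version_": 123},
                {"id": "2", "title": "Project manager"},
            ],
        }
    }
    print(to_bulk_body(solr_docs(solr_response), "profiles", "profile"), end="")
```

A real migration would also page through the Solr results and map the Solr schema to an ES mapping, which this sketch leaves out.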
Conclusion
• If we have a small site and need search features without the distributed bells and whistles, both Solr and ElasticSearch are efficient.
• If we are planning a large installation that requires distributed search with nesting, scalability, sharding and real-time indexing, ElasticSearch can do a better job.
Where do we go from here?
• Both products are trying to catch up with each other’s capabilities.
• The best way to define this is: some possible next steps...
• Questions to ask
Thank you!
201-925-0488
Architecture – Global Head