Elasticsearch from the trenches
about me
- solution architect at slalom
- enjoy building search apps
- 7+ years Lucene
- 2+ years Hibernate Search
- ~2 years Elasticsearch
agenda
- the ask
- initial approach
- problems
- next steps
- lessons learned
- improvements
- questions
the ask
- search 6 billion docs in under 1.5 sec
- index 2 million new docs / day
- export billions of docs to CSV files
- index and search docs in real time
- use search throughout the application
- free text search
- faceted navigation
- suggestions
- dashboards
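Exporting billions of docs is usually done with the scan & scroll API rather than paged searches, since deep pagination gets prohibitively expensive. A sketch in the ES 1.x style used elsewhere in this deck (the index name, batch size, and scroll window are illustrative):

```shell
# open a scan cursor: no scoring, cursor kept alive for 1 minute per batch
curl -XGET 'http://localhost:9200/indexname-2015/_search?search_type=scan&scroll=1m' -d '{
  "size": 1000,
  "query": { "match_all": {} }
}'

# repeat with the _scroll_id returned by each response until no hits remain;
# each batch of _source docs is then appended to the CSV file
curl -XGET 'http://localhost:9200/_search/scroll?scroll=1m' -d 'SCROLL_ID_FROM_PREVIOUS_RESPONSE'
```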
free text search
faceted navigation
drill down
suggestions
dashboards
hardware
- used "large" servers
- servers had lots of CPUs & RAM
- non-RAIDed spinning disks
- 5 dedicated nodes
- all nodes store data
- all nodes are master
- all nodes sort & aggregate
cluster
initial approach
shards
- used the default shard count
- 5 primary + 1 replica
- unlimited primary shards / node
indices
- data was chronological
- used the time-based index strategy
- weekly indices for transaction logs
- daily indices for audit logs
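With the time-based strategy, each period gets its own physical index, so an old period can be dropped or archived wholesale instead of deleting documents. A sketch of the naming (index names are illustrative, not from the talk):

```shell
# one index per week of transaction logs
curl -XPUT 'http://localhost:9200/translog-2015.32'

# one index per day of audit logs
curl -XPUT 'http://localhost:9200/auditlog-2015.08.07'
```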
initial approach
memory
- dedicated 31 GB to the jvm heap
- used remaining memory for file system cache
- turned off linux process swapping
- maxed out linux file descriptors
- used G1 Garbage Collector
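Those choices map roughly onto the following OS and ES 1.x settings; a sketch using the values named on the slide (file locations and the descriptor limit vary by install):

```shell
# /etc/security/limits.conf -- max out linux file descriptors for the ES user
#   elasticsearch - nofile 65535

# environment -- 31 GB heap and the G1 collector
export ES_HEAP_SIZE=31g
export ES_JAVA_OPTS="-XX:+UseG1GC"

# elasticsearch.yml -- lock the heap in RAM so linux never swaps it out
#   bootstrap.mlockall: true
```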
initial approach
index mappings
- indexed all fields
- stored big documents with 60+ fields
- nested documents
- parent-child relationships
searches
- searched all indices
- used query_string searches
- searched all fields
- sorted & aggregated on any field
- range queries
- parent-child queries
GET /index-*/_search
{
  "query": {
    "query_string": {
      "query": "+(eggplant | potato)",
      "default_field": "_all",
      "default_operator": "and"
    }
  }
}
initial approach
problems
OutOfMemoryError
- field data exceeded jvm heap
- shard count was in the thousands
- garbage collector could not free memory
CircuitBreakerException
- field data exceeded jvm heap
- search results exceeded jvm heap
- slow searches (latency increased from seconds to minutes)
- nodes became unresponsive
- frequent GC pauses
early signs
cluster down
- index corruption
- data loss
- nodes failed to restart
next steps
shard capacity
- understand data & searches
- size based on actual usage
field data
- monitor
- identify the producers
- reduce usage
search
- identify bottlenecks
- optimize
cluster
- find failure points
- make topology changes
- make hardware changes
identify and fix problems...
shard capacity
- 1 shard can handle a lot of data
- actually it held ~5x more data
- didn't need 5 shards per index
- didn't need weekly/daily indices
learned...
- shard is the unit of scale
- how much data can a single shard hold?
- find the single shard breaking point
1. loaded a single shard with data
2. ran typical searches
3. recorded search response time
4. repeated until response time became unacceptable
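The four steps above can be sketched with curl's built-in timer; `load_more_data`, the index name, and the query file are hypothetical stand-ins for the real bulk-load step and a typical search:

```shell
# grow the shard in steps and time a typical search after each load;
# stop when the reported time crosses the acceptable latency
for batch in 1 2 3 4 5 6 7 8; do
  load_more_data "$batch"   # hypothetical bulk-indexing step
  curl -s -o /dev/null -w "batch $batch: %{time_total}s\n" \
    -XGET 'http://localhost:9200/capacity-test/_search' \
    -d @typical-query.json
done
```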
field data
- which fields and indices are using a lot of field data?
- use the stats API to find out
- fields used for sorting & aggregation
- high cardinality fields
- id-cache for parent-child relationships
- field data is loaded the first time a field is accessed
- field data is maintained per-index
- field data is not GC'd
culprits...
# Node Stats
curl -XGET 'http://localhost:9200/_nodes/stats/indices/fielddata?human'

# Indices Stats
curl -XGET 'http://localhost:9200/_stats/fielddata/?human'
search
searching all indices is slow, CPU intensive, and causes field data to be loaded for every index

# Searches all indices
/indexname-*/_search

# Searches specific indices
/indexname-2015/_search

query_string is flexible but allows inefficient searches like leading wildcard searches, and it searches the _all field by default

{
  "query_string": {
    "default_field": "_all",
    "allow_leading_wildcard": true,
    "query": "this AND that OR thus"
  }
}
what are the bottlenecks and resource killers?
cluster
- field data used up 70-90% of the heap memory
- not much heap left for node & shard management
- stop-the-world Garbage Collector (GC) pauses made the cluster unresponsive
- nodes dropped out of the cluster
- the G1 GC had longer pauses than the CMS GC
- sorting, aggregations, and the id-cache for parent-child relationships used up a lot of heap memory
- managing too many shards used a lot of heap memory
why is the cluster crashing?
lessons learned...
- number of shards / node should not exceed the number of CPU cores
- figure out the single shard capacity
- monitor field data usage
- field data usage is permanent and does not get garbage collected
- too much field data usage will bring down the cluster
- search specific indices by target date range
- tune and test all search API searches
- split cluster into data, client and master nodes
- use the default ES JVM settings and garbage collector
hardware
- used "large" servers
- servers had lots of CPUs & RAM
- non-RAIDed spinning disks
- put master and client nodes on the same servers
- 5 → 8 dedicated nodes
- all nodes are master → dedicated master nodes
- all nodes store data → dedicated data nodes
- all nodes sort & aggregate → dedicated client nodes
cluster
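In ES 1.x the dedicated roles come down to two flags per node in elasticsearch.yml; a sketch of the three node types (not the speaker's actual config):

```
# dedicated master node: manages the cluster, holds no data
node.master: true
node.data: false

# dedicated data node: holds shards, never eligible as master
node.master: false
node.data: true

# dedicated client node: routes searches, does the sorting & aggregating
node.master: false
node.data: false
```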
improvements
shards
- default shard count didn't work
- 5 → 1 primary + 1 replica
- unlimited primary shards / node → # of primary shards less than # of CPU cores
indices
- data was chronological
- used the time-based index strategy
- weekly → monthly indices for transaction logs
- daily → monthly indices for audit logs
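Together, the shard and index changes look like this at index-creation time (index name illustrative, ES 1.x syntax):

```shell
# monthly transaction-log index with 1 primary + 1 replica shard
curl -XPUT 'http://localhost:9200/translog-201501' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
```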
improvements
memory
- dedicated 31 GB to the jvm heap
- used remaining memory for file system cache
- turned off linux process swapping
- maxed out linux file descriptors
- new G1 GC → stable CMS GC
improvements
index mappings
- indexed all → 40 fields
- stored big documents with 60+ fields
- nested documents
- parent-child relationships
- used field aliases to define alternate fields used in sorting and aggregation
- used doc_values on sortable & aggregation fields
- changed boolean data type to string
# don't index fields that are never searched
"field": {
  "index": "no"
}

# uses field data
"fieldA": {
  "type": "boolean"
}

# uses doc_values (no field data)
"fieldA": {
  "type": "string",
  "index": "analyzed",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed",
      "fielddata": { "format": "doc_values" }
    }
  }
}
improvements
searches
- search all indices → target specific indices
- query_string → simple_query_string
- search on all → some fields
- sorting & aggregations on all → low cardinality fields
- range queries → filters
- parent-child → nested queries
- added query timeouts
GET /index-201501/_search
{
  "query": {
    "simple_query_string": {
      "query": "+(eggplant | potato)",
      "fields": ["field1", "field2"],
      "default_operator": "and"
    }
  }
}
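Combining the other improvements on this slide: a range condition becomes a cacheable filter inside an ES 1.x filtered query, and a timeout returns partial results instead of letting a slow search pile up on the cluster (field names and dates are illustrative):

```
GET /index-201501/_search
{
  "timeout": "2s",
  "query": {
    "filtered": {
      "query": {
        "simple_query_string": {
          "query": "+(eggplant | potato)",
          "fields": ["field1", "field2"]
        }
      },
      "filter": {
        "range": { "timestamp": { "gte": "2015-01-01", "lt": "2015-02-01" } }
      }
    }
  }
}
```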
improvements
Questions?
Thank You