Elasticsearch Scalability and Performance - Berlin

codecentric AG

Patrick Peschlow

The Do's and Don’ts of Elasticsearch Scalability and Performance

Think hard about your mapping

− Which fields to analyze? How to analyze them?
− Need term frequencies, positions, offsets? Field norms?
− Which fields to not analyze or not index/enable?
− _all
− _source vs stored fields
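
The mapping questions above can be made concrete with a small sketch. It assumes Elasticsearch 1.x-era mapping syntax (current when this talk was given; later versions use different field types), and the type name `article` and all field names are hypothetical.

```python
import json

# Hypothetical mapping in Elasticsearch 1.x-era syntax (an assumption).
# Type and field names are made up for illustration.
mapping = {
    "article": {
        "_all": {"enabled": False},    # skip the catch-all field if it is never queried
        "_source": {"enabled": True},  # keep the original JSON for retrieval and reindexing
        "properties": {
            # Full-text field: analyzed with a language analyzer.
            "title": {"type": "string", "analyzer": "english"},
            # Exact-match field: indexed as a single term, not analyzed.
            "status": {"type": "string", "index": "not_analyzed"},
            # Searchable, but without norms, term frequencies, or positions.
            "tags": {
                "type": "string",
                "norms": {"enabled": False},
                "index_options": "docs",
            },
            # Kept in _source only: neither parsed nor indexed.
            "raw_payload": {"type": "object", "enabled": False},
        },
    }
}


def unanalyzed_fields(type_mapping):
    """Return the names of fields that are indexed without analysis."""
    props = type_mapping["properties"]
    return sorted(
        name for name, spec in props.items() if spec.get("index") == "not_analyzed"
    )


print(json.dumps(mapping, indent=2))
print(unanalyzed_fields(mapping["article"]))
```

Each setting trades index size and indexing cost against query features, so the right choice depends on which queries actually need the field.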

− Dynamic mapping/templates
  − Excessive number of fields?
− Index-time vs. query-time solutions
− Multi field, copy to, transform script
− Relations: parent-child/nested
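
As a rough illustration of the index-time options listed above, the following sketch builds a mapping with a multi field, `copy_to`, a nested object, and strict dynamic mapping. It again assumes Elasticsearch 1.x-era syntax; the type `product` and every field name are hypothetical.

```python
import json

# Hypothetical mapping in Elasticsearch 1.x-era syntax (an assumption).
mapping = {
    "product": {
        # Reject unknown fields instead of silently growing the mapping.
        "dynamic": "strict",
        "properties": {
            "name": {
                "type": "string",
                "copy_to": "search_text",
                # Multi field: the same value, also indexed as one exact term.
                "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
            },
            "description": {"type": "string", "copy_to": "search_text"},
            # Index-time combination of several fields into one search field.
            "search_text": {"type": "string"},
            # Nested objects keep each review's fields together when querying.
            "reviews": {
                "type": "nested",
                "properties": {
                    "author": {"type": "string", "index": "not_analyzed"},
                    "stars": {"type": "integer"},
                },
            },
        },
    }
}


def fields_copied_to(type_mapping, target):
    """Return the names of fields whose values are copied into `target`."""
    props = type_mapping["properties"]
    return sorted(n for n, spec in props.items() if spec.get("copy_to") == target)


print(json.dumps(mapping, indent=2))
print(fields_copied_to(mapping["product"], "search_text"))
```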

Design for scale

[Diagram: a search spanning Shard 1, Shard 2, ..., Shard M]

[Diagram: date-based partitions 2014-11-25, ..., 2015-05-30, 2015-05-31, 2015-06-01; search for "last 3 days"]
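
One way to read the "last 3 days" search, assuming one index per day, is that the client only names the indices that can contain hits. The sketch below computes those names; the `logs-` prefix and the helper names are hypothetical.

```python
from datetime import date, timedelta


def indices_for_last_days(today, days, prefix="logs-"):
    """Return the daily index names covering the last `days` days, newest first."""
    return [f"{prefix}{(today - timedelta(days=i)).isoformat()}" for i in range(days)]


def search_path(indices):
    """Build a multi-index search path from a list of index names."""
    return "/" + ",".join(indices) + "/_search"


recent = indices_for_last_days(date(2015, 6, 1), 3)
print(recent)
print(search_path(recent))
```

Older indices such as the one for 2014-11-25 are never touched by this search.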

[Diagram: documents of User 1 to User 8 distributed over Shard 1, Shard 2, ..., Shard M; routing for user 1; search by user 1]
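
The idea behind routing can be sketched as "hash the routing value, take it modulo the number of primary shards". The code below is an illustration only: `zlib.crc32` is a stand-in hash chosen for this sketch, not the function Elasticsearch uses, and the helper names are hypothetical.

```python
import zlib


def shard_for(routing_value, number_of_shards):
    """Pick a shard as hash(routing value) modulo the number of primary shards.

    zlib.crc32 is a stand-in for illustration, not Elasticsearch's hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def shards_to_search(number_of_shards, routing_value=None):
    """Without routing every shard is queried; with routing only one is."""
    if routing_value is None:
        return list(range(number_of_shards))
    return [shard_for(routing_value, number_of_shards)]


print(shards_to_search(5))            # all five shards
print(shards_to_search(5, "user-1"))  # a single shard
```

The same value always maps to the same shard, which is why the number of primary shards cannot change without reindexing.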

− Can documents/access be partitioned in a natural way?
− Need to find documents by ID (update/delete/get)?
− Know the relevant features
  − Routing, aliases, multi-index search
− Indices don’t come for free
− Measure the impact of distributed search
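
Routing and aliases can be combined, for example as a per-user alias that carries both a filter and a routing value. The sketch below only builds a request body for the aliases API; the index name `users`, the field `user_id`, and the alias naming scheme are hypothetical.

```python
import json


def user_alias_action(index, user_id):
    """Build an 'add' action for an alias that filters and routes by one user."""
    return {
        "add": {
            "index": index,
            "alias": f"user_{user_id}",
            # Only this user's documents are visible through the alias.
            "filter": {"term": {"user_id": user_id}},
            # Requests through the alias go to this user's shard only.
            "routing": str(user_id),
        }
    }


body = {"actions": [user_alias_action("users", 1)]}
print(json.dumps(body, indent=2))
```

The application can then treat `user_1` like a dedicated index without paying the cost of one index per user.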

Don’t create more shards than you need

− More shards
  − Enable larger indices
  − Scale operations on individual documents
− But shards don’t come for free
− Measure how many shards you need
  − When unsure, overallocate a little
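
"Measure, then overallocate a little" can be turned into simple arithmetic. The sketch below assumes that a load test has produced a per-shard document capacity; the 20% headroom default and the example numbers are made up for illustration, not recommendations from the talk.

```python
import math


def estimate_primary_shards(expected_docs, measured_docs_per_shard, headroom=0.2):
    """Estimate primary shards from a measured single-shard capacity.

    `headroom` is the extra fraction added on top of the expected volume.
    """
    if measured_docs_per_shard <= 0:
        raise ValueError("measured_docs_per_shard must be positive")
    needed = expected_docs * (1 + headroom) / measured_docs_per_shard
    return max(1, math.ceil(needed))


# Hypothetical numbers: 120M documents expected, 25M per shard measured.
print(estimate_primary_shards(120_000_000, 25_000_000))
```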

Don’t treat all nodes as equal

− Cluster nodes
  − Master nodes, data nodes, client/aggregator nodes
− Client applications
  − HTTP?
  − Transport protocol?
  − Join the cluster as a client node?
  − In Java: HTTP client vs TransportClient vs NodeClient
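
The three node roles above correspond to two boolean settings in Elasticsearch 1.x-era configuration, `node.master` and `node.data`. The sketch below renders them as `elasticsearch.yml` lines; it assumes that version's setting names, and the role labels and helper are hypothetical.

```python
# Node settings per role, assuming Elasticsearch 1.x-era setting names.
NODE_ROLES = {
    "master": {"node.master": True, "node.data": False},   # coordinates the cluster only
    "data": {"node.master": False, "node.data": True},     # holds shards only
    "client": {"node.master": False, "node.data": False},  # routes and aggregates only
}


def to_yaml_lines(role):
    """Render the settings for a role as elasticsearch.yml lines."""
    settings = NODE_ROLES[role]
    return [f"{key}: {str(value).lower()}" for key, value in sorted(settings.items())]


for role in NODE_ROLES:
    print(role, to_yaml_lines(role))
```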

Don’t run wasteful queries

− Only request as many hits as you need
− Avoid deep pagination
− Use scan+scroll to iterate without sorting
− Only query indices/shards that may contain hits
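
Why deep pagination is wasteful follows from how a sorted, distributed search works: every shard has to produce its top `from + size` entries, and the coordinating node merges all of them to return just `size` hits. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def pagination_cost(number_of_shards, from_, size):
    """Count the sorted entries produced for one page of a distributed search."""
    per_shard = from_ + size  # each shard must rank this many entries
    return {
        "per_shard": per_shard,
        "coordinator": number_of_shards * per_shard,  # entries merged centrally
        "returned": size,
    }


print(pagination_cost(5, 0, 10))      # first page
print(pagination_cost(5, 10000, 10))  # deep page
```

For page 1001 on five shards, 50050 entries are merged to return 10 hits.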

Engineer queries

− Measure performance
  − Set up production-like cluster and data
− Use filters
− Check and tune filter caching
− Reduce work for heavyweight filters
  − Order them, consider accelerators
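
Ordering filters might look like the following sketch, which puts cheap filters in front of a heavyweight script filter so the script runs on fewer documents. It assumes the Elasticsearch 1.x-era query DSL (`filtered` query and `and` filter, both dropped in later versions); field names and the script are hypothetical.

```python
import json


def filtered_query(text, cheap_filters, heavy_filters):
    """Build a 1.x-style filtered query with cheap filters listed first."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                # Cheap filters come first and narrow the candidate set
                # before the heavyweight filters are evaluated.
                "filter": {"and": list(cheap_filters) + list(heavy_filters)},
            }
        }
    }


cheap = [{"term": {"status": "active"}}, {"range": {"price": {"lte": 100}}}]
heavy = [{"script": {"script": "doc['price'].value * 1.19 < 100"}}]
body = filtered_query("elasticsearch", cheap, heavy)
print(json.dumps(body, indent=2))
```

Whether a given order actually helps should be measured on production-like data.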

Care about field data

− Used for sorting, aggregation, parent-child, scripts, …
− High memory consumption or OutOfMemoryError
  − Cache limit, circuit breakers avoid the worst
− Evaluate field data requirements in advance
− Use "doc values" to store expensive field data on disk
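
Enabling doc values is a per-field mapping decision. The sketch below assumes Elasticsearch 1.x-era syntax, where doc values had to be switched on explicitly and were not available for analyzed strings; the type `event` and its fields are hypothetical.

```python
import json


def with_doc_values(field_spec):
    """Return a copy of a field mapping with on-disk doc values enabled."""
    spec = dict(field_spec)
    spec["doc_values"] = True
    return spec


# Fields used for sorting and aggregations (hypothetical names).
mapping = {
    "event": {
        "properties": {
            "country": with_doc_values({"type": "string", "index": "not_analyzed"}),
            "timestamp": with_doc_values({"type": "date"}),
            # Only searched, never sorted or aggregated: no doc values needed.
            "message": {"type": "string"},
        }
    }
}
print(json.dumps(mapping, indent=2))
```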

Be prepared for reindexing

− Reasons for reindexing
  − Mapping changes
  − Index/shard reaches its capacity
  − Reduce number of indices/shards

− Reindexing procedure depends on many factors
  − Data source?
  − Zero downtime?
  − Update API usage?
  − Possible deletes?
  − Designated component (queue) for indexing?

− Use existing tooling
− Do it yourself? Use scan+scroll and bulk indexing
− Follow best practices
  − Use aliases
  − Disable refresh
  − Decrease number of replicas
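
The three best practices above translate into a few request bodies. The sketch below only builds them (it does not talk to a cluster); the index names `items_v1` and `items_v2`, the alias `items`, and the restored values of one replica and a one-second refresh are hypothetical.

```python
import json


def bulk_load_settings():
    """Index settings for the new index while it is being filled."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def normal_settings(replicas=1, refresh="1s"):
    """Index settings to restore once reindexing has finished."""
    return {"index": {"refresh_interval": refresh, "number_of_replicas": replicas}}


def alias_swap(alias, old_index, new_index):
    """Aliases request body that moves the alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


print(json.dumps(bulk_load_settings()))
print(json.dumps(normal_settings()))
print(json.dumps(alias_swap("items", "items_v1", "items_v2"), indent=2))
```

Because clients only ever use the alias, the switch to the new index needs no client change.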

Don’t use the defaults

− Cluster settings
  − cluster name, discovery, minimum_master_nodes
  − recovery
− Number of shards and replicas
− Refresh interval
− Thread pool and cache configuration
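
For `minimum_master_nodes`, the usual rule is a strict majority (quorum) of the master-eligible nodes, which is what guards against split brain. A minimal sketch of that calculation, with a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Return the quorum (strict majority) of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for nodes in (1, 2, 3, 5):
    print(nodes, "master-eligible nodes ->", minimum_master_nodes(nodes))
```

With two master-eligible nodes the quorum is also two, so losing either node stops master election; three is the smallest setup that tolerates one failure.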

Monitor

− Cluster health, split brains
− Thread pools and caches
− Garbage collection (the actual JVM output)
− Slow log
− Hot threads
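
A basic health check could inspect the response of the cluster health API (`_cluster/health`). The sketch below works on an already parsed response; the field names `status`, `unassigned_shards`, and `number_of_nodes` come from that API, while the sample values and the helper are made up for illustration.

```python
def health_warnings(health, expected_nodes):
    """Return human-readable warnings for a parsed cluster health response."""
    warnings = []
    if health.get("status") != "green":
        warnings.append(f"cluster status is {health.get('status')}")
    if health.get("unassigned_shards", 0) > 0:
        warnings.append(f"{health['unassigned_shards']} unassigned shards")
    if health.get("number_of_nodes", 0) < expected_nodes:
        # Fewer nodes than expected can mean node loss or a split brain.
        warnings.append("fewer nodes than expected")
    return warnings


# Made-up sample response, not real cluster output.
sample = {"status": "yellow", "number_of_nodes": 2, "unassigned_shards": 5}
print(health_warnings(sample, expected_nodes=3))
```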

Follow the production recommendations

− A good start is to actually read them
− Just to mention a few
  − The more memory, the better
  − Isolate as much as possible
  − SSDs and local storage recommended

Don’t test in production

− Use a test environment
− Test the cluster
  − Single node restarts, rolling upgrades, node loss
  − Full cluster restarts
− Test behavior under expected load
  − Queries
  − Indexing

Read the guide

− Elasticsearch: The Definitive Guide
  − https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
− Elasticsearch Reference
  − https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Questions?

Dr. rer. nat. Patrick Peschlow

codecentric AG
Merscheider Straße 1
42699 Solingen
tel +49 (0) 212.23 36 28 54
fax +49 (0) 212.23 36 28 79
[email protected]
www.codecentric.de