Elasticsearch in production, New York Meetup at Twitter, October 2014

Elasticsearch easily lets you develop amazing things, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: a system with high demands on network stability, a huge appetite for memory, and a system that assumes all users are trustworthy. This talk will cover some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters.

Transcript of "Elasticsearch in production", New York Meetup at Twitter, October 2014

Elasticsearch in production

Konrad Beiske konrad@found.no

@beiske

Who?

Senior software engineer at Found AS. Working with Elasticsearch for 2 years.

Herding hundreds of Elasticsearch clusters

Agenda

• Anti-patterns

• Memory / Resource Usage

• Distributed problems

• Security

• Client concerns

• Changing a cluster

found.no/foundation

Snapshot / Restore

Circuit breakers

Document values

Aggregations

Distributed percolation

Suggesters

Anti-Patterns

Arbitrary Keys

• “Schema Free”

• One field per value

• Ever-growing cluster state

acls:
  1234: READ
  42: WRITE
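To keep the mapping and cluster state bounded, the usual alternative is a fixed set of field names, with the arbitrary values stored as values rather than keys. A minimal sketch (field names are illustrative, not from the talk):

  {
    "acls": [
      { "id": "1234", "permission": "READ" },
      { "id": "42", "permission": "WRITE" }
    ]
  }

Every new ACL entry is just another value in two existing fields instead of a brand-new field in the mapping.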

Heavy Updating

• Update = Delete + Reindex

• Be careful with counters
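For illustration, a hedged sketch of why frequent counter updates are expensive (index, type and field names are placeholders). Even a partial update like this one makes Elasticsearch fetch the document, apply the change, delete the old version and reindex the whole document:

  curl -XPOST 'localhost:9200/pages/page/1/_update' -d '{
    "doc": { "view_count": 42 }
  }'

Doing this once per page view quickly turns into a reindexing storm; batching updates or keeping hot counters elsewhere is often the safer pattern.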

Slow queries

• WHERE foo ILIKE ‘%bar%’

• {"query_string": {"query": "foo:*bar*"}}
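If substring matching is genuinely needed, one common workaround (not covered in the talk, just a hedged sketch) is to pay the cost at index time with an nGram analyzer, so searches become ordinary term lookups instead of leading-wildcard scans:

  {
    "settings": {
      "analysis": {
        "filter": {
          "substring": { "type": "nGram", "min_gram": 3, "max_gram": 8 }
        },
        "analyzer": {
          "substring_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "substring"]
          }
        }
      }
    }
  }

A field indexed with this analyzer can then be queried with a plain match query.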

Arbitrary searches

query:
  filtered:
    filter:
      term:
        user_id: 42
    query: [user's query here]

Time Bomb

Memory

• Field caches

• Filter caches

• Page caches

• Aggregations

• Index building

Page Cache

• Keeping index pages in memory

• Can't have too much of it

• Outgrowing it: gradual slowdown

Heap Space

• Memory used by Elasticsearch process

• Field / Filter caches

• Aggregations
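Not from the talk itself, but a minimal sketch of how the heap is usually bounded (the value is a placeholder): give the process a fixed heap, commonly around half of physical RAM, and leave the rest to the page cache.

  # shell, before starting Elasticsearch 1.x
  export ES_HEAP_SIZE=8g
  bin/elasticsearch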

Time Bomb

OutOfMemoryError

Woah there

I ate all the memories

Your cluster may or may not work any more

OutOfMemory

• Growing too big

• Selecting too big a timespan in Kibana

• Document ingestion peak

Preventing OOMs

• Have enough memory :-)

• Understand your search’s memory profile

• Bulk / circuit breaker settings (see the sketch after this list)

• Monitoring

• Document values
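A hedged sketch of the kind of settings meant above; exact names vary between 1.x releases (the field data breaker was renamed in 1.4), so check the reference for your version:

  # elasticsearch.yml, values are placeholders
  indices.fielddata.cache.size: 40%       # cap the field data cache
  indices.fielddata.breaker.limit: 60%    # abort requests that would load too much field data
  threadpool.bulk.queue_size: 100         # bound queued bulk requests instead of buffering them on the heap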

Marvel ( /_stats )
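The same numbers can be pulled straight from the stats APIs; a hedged example (index name is a placeholder):

  # per-node heap, field data and filter cache usage
  curl 'localhost:9200/_nodes/stats/jvm,indices?pretty'

  # field data usage per field for one index
  curl 'localhost:9200/my_index/_stats/fielddata?fields=*&pretty'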

Document Values

"my_field": { "type": "string", "fielddata": { "format": "doc_values" } }

Sizing

• Test, don’t guess

• Start big, scale down

• Index, search, monitor

Glitch Meltdown

• Tie-breaker can be a cheap master-node

• Applies to data centers / availability zones too

Data-only nodes

Master-only nodes
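A hedged sketch of the node-role and quorum settings this refers to (elasticsearch.yml, 1.x; numbers assume three master-eligible nodes):

  # dedicated master-eligible node, e.g. a cheap tie-breaker
  node.master: true
  node.data: false

  # data-only node
  node.master: false
  node.data: true

  # require a majority of master-eligible nodes before electing a master
  discovery.zen.minimum_master_nodes: 2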

Jepsen

• Kyle Kingsbury’s series on distributed systems

• Distributed systems are hard

• aphyr.com

Security

• “Not my job!” – Elasticsearch

• That’s fine!

Dynamic Scripts


• Scoring

• Aggregations

• Updating

Dynamic Scripts

Runtime.getRuntime().exec(…)

Security


• Disable dynamic scripts (settings sketch after this list)

• Mind index patterns

• Even then, don’t accept arbitrary requests
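A hedged sketch of the corresponding 1.x settings (names differ in later versions):

  # elasticsearch.yml
  script.disable_dynamic: true              # reject scripts sent in request bodies
  action.destructive_requires_name: true    # refuse wildcard and _all index deletions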

Client Concerns

• Connection pools

• Idempotent requests (see the sketch below)

• Have sane syncing/indexing strategies
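A hedged example of what an idempotent indexing request can look like (index, type and ID are placeholders): address documents by their natural ID, so replaying the same request after a timeout leaves the cluster in the same state instead of creating duplicates.

  # safe to retry: same ID, same document, same end state
  curl -XPUT 'localhost:9200/orders/order/order-1234' -d '{
    "status": "paid",
    "amount": 42
  }'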


Cluster changes

• Make new nodes join existing cluster

• No rolling restarts

• Easy rollback if things go bad

v1.0.0 → v1.0.1 (slide sequence showing v1.0.1 nodes joining the existing v1.0.0 cluster)

Cluster changes

• Test first

• Mind recover_*-settings
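A hedged sketch of the recovery settings referred to above (elasticsearch.yml, 1.x; numbers assume a hypothetical five-node cluster):

  gateway.recover_after_nodes: 3    # don't start shard recovery until three nodes have joined
  gateway.expected_nodes: 5         # ...or immediately once all five are present
  gateway.recover_after_time: 5m    # otherwise wait this long first

This keeps the cluster from shuffling shards around while it is still only half assembled after a restart.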

Multi-Cluster Workflows

• Snapshot/Restore (sketch below)

• Operations across clusters

• Swap clusters!

• Works well with good syncing strategy
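A hedged sketch of the snapshot/restore calls involved (repository name, path and snapshot name are placeholders); both clusters need access to the same repository:

  # register a shared filesystem repository
  curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
    "type": "fs",
    "settings": { "location": "/mnt/backups/my_backup" }
  }'

  # take a snapshot on the source cluster...
  curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

  # ...and restore it on the target cluster
  curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore'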

Misc

• Same JVM version on all nodes

• ulimits

• Unicast discovery and an explicit cluster name

• SSD? Use the noop scheduler
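A hedged sketch of the corresponding knobs (host list, limit and device name are placeholders):

  # shell: raise the open-file limit before starting Elasticsearch
  ulimit -n 65535

  # elasticsearch.yml: explicit cluster name, unicast instead of multicast discovery
  cluster.name: my-production-cluster
  discovery.zen.ping.multicast.enabled: false
  discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

  # Linux: prefer the noop I/O scheduler on SSDs
  echo noop > /sys/block/sda/queue/scheduler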

@foundsays

Learn More!

found.no/foundation

Follow @beiske