Scaling Elasticsearch on Kubernetes

By Ryan Staatz

Fast multi-cloud logging

What is Elasticsearch (ES) and why would I use it?

● Elasticsearch is a distributed full-text search engine that is queryable via a JSON API

● The ELK stack, a popular open-source logging backend, uses Elasticsearch (the ‘E’)

● Unstructured data (e.g. long strings) can be searched relatively easily

● Native distributed clustering support makes adding Elasticsearch nodes easy

● You’ve been watching the Elasticsearch hype train and want to hop aboard

In brief:


What is Kubernetes (k8s) and why would I run ES on it?

● Kubernetes is an open-source container orchestration platform originally developed by Google

● Native cloud provider integrations make scaling hardware easy with k8s

● Scheduling & distributing application workloads onto hardware resources is automatic

● Configuration as code & static docker images enforce consistent pod behaviors

● You’ve been watching the Kubernetes hype ship and want to hop aboard

Also in brief:



So with Kubernetes, this should be easy, right?

● Choose the appropriate Elasticsearch version and select the correct settings (there are hundreds of settings)

● Learn the expansive query language for Elasticsearch and integrate it into your workflows

● Set up a Kubernetes environment with access to appropriately sized hardware

● Configure the Elasticsearch k8s workload to request the appropriate resources, including disks

● Ensure the correct index templates and cluster settings are applied after launching your ES cluster

● Create k8s services such that Elasticsearch pods can find each other

● Troubleshoot all remaining issues as they arise and continue to manage and scale the cluster

All you gotta do is...

Sounds great, let’s get started!



Getting started

● ES version 5.5 (later versions are probably ok too)

● Hardware resources (k8s nodes) with at least 64 GB of RAM and 16 vCPUs (depends on your volume)

● Kubernetes cluster v1.11+ (for pre-emption)

● Statefulset yaml configurations (we need identity and disks)

● Kubernetes services to help cluster pods discover each other

● Basic, but important cluster settings & a good starter index template

● Deploy an ES cluster management GUI (cerebro) to help with troubleshooting

Maybe let’s just start with some sane defaults



A tale of too many yamls

● Two ConfigMaps:

○ The elasticsearch configuration file

○ A start script used to configure ulimits, permissions, and JVM heap size

● Three ES role types (statefulsets)

○ Master - handles lightweight cluster-wide actions (does not require disk)

○ Hot - handles incoming writes to active indices (higher cpu to disk ratio)

○ Cold - stores and queries older indices (lower cpu to disk ratio)

There’s going to be a lot of these, but configuration as code is good!
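As a rough sketch of how the three roles could map onto ES 5.x node settings: `node.master` and `node.data` are standard settings, while the `node.attr.box_type` attribute for telling hot from cold nodes is an assumption on my part (the slides don't say how the two data tiers are distinguished).

```python
# Hypothetical mapping from the three roles above to ES 5.x node settings.
# node.attr.box_type is an assumed attribute name, not taken from the talk.
ROLES = {
    "master": {"node.master": True, "node.data": False},
    "hot": {"node.master": False, "node.data": True, "node.attr.box_type": "hot"},
    "cold": {"node.master": False, "node.data": True, "node.attr.box_type": "cold"},
}


def needs_disk(role):
    """Only data-holding roles (hot, cold) need a persistent disk."""
    return ROLES[role]["node.data"]


def to_yaml_lines(role):
    """Render one role's settings as elasticsearch.yml-style lines."""
    lines = []
    for key, value in sorted(ROLES[role].items()):
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return lines


if __name__ == "__main__":
    for role in ROLES:
        print(role, "needs disk:", needs_disk(role))
        print("\n".join(to_yaml_lines(role)))
```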



Important ES configuration notes

● Use the alpine flavor of ES to reduce image size: elasticsearch:5.5.2-alpine

● Configure volumeClaimTemplates to dynamically provision disks

● Ensure the correct security context settings are specified in each statefulset

● Use k8s pre-emption to ensure your ES pods get scheduling priority

● Create a startup script to set the correct configuration prior to starting the JVM

Pro tip: this slide contains several pro tips
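To make those tips a little more concrete, here is a minimal sketch that builds a StatefulSet manifest as a Python dict (`kubectl apply -f` accepts JSON as well as YAML). Only the image tag comes from the slide. The `es-<role>` names, the `es-high-priority` PriorityClass, the `IPC_LOCK` capability, the data mount path, and the "half of memory, capped under 32 GB" heap rule are all assumptions for illustration.

```python
import json


def heap_mb(memory_limit_mb):
    """Assumed rule of thumb: half the container memory, capped at 31 GB."""
    return min(memory_limit_mb // 2, 31 * 1024)


def statefulset(role, replicas, memory_limit_mb, disk_gb):
    """Build a StatefulSet manifest; disk_gb=0 means no disk (e.g. masters)."""
    heap = heap_mb(memory_limit_mb)
    name = f"es-{role}"  # hypothetical naming scheme
    container = {
        "name": "elasticsearch",
        "image": "elasticsearch:5.5.2-alpine",
        "env": [{"name": "ES_JAVA_OPTS", "value": f"-Xms{heap}m -Xmx{heap}m"}],
        "resources": {"limits": {"memory": f"{memory_limit_mb}Mi"}},
        # Assumption: IPC_LOCK is what lets the JVM lock its memory.
        "securityContext": {"capabilities": {"add": ["IPC_LOCK"]}},
    }
    spec = {
        "serviceName": name,
        "replicas": replicas,
        "selector": {"matchLabels": {"app": name}},
        "template": {
            "metadata": {"labels": {"app": name}},
            "spec": {
                "priorityClassName": "es-high-priority",  # hypothetical class
                "containers": [container],
            },
        },
    }
    if disk_gb > 0:
        # Each pod gets its own dynamically provisioned disk.
        container["volumeMounts"] = [
            {"name": "data", "mountPath": "/usr/share/elasticsearch/data"}
        ]
        spec["volumeClaimTemplates"] = [{
            "metadata": {"name": "data"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": f"{disk_gb}Gi"}},
            },
        }]
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    print(json.dumps(statefulset("hot", 3, 32768, 500), indent=2))
```

The referenced PriorityClass would have to exist in the cluster already; this sketch does not create it.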



Service discovery

● ES hot and cold share a single load-balanced ClusterIP service endpoint for insertions

● ES masters have 2 services

○ 1 load-balanced ClusterIP service for transport (9300) and HTTP API requests (9200)

○ 1 headless service (clusterIP: None) used for ES unicast discovery

● 2 important settings for clusterIP: None

○ Ensure DNS records are published immediately, before pods report ready

○ Leave sessionAffinity off so lookups return accurate addresses

Leverage Kubernetes’ native services
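A minimal sketch of what the discovery service might look like, again as a Python dict dumped to JSON. I'm reading "DNS publishable immediately" as the Service field `publishNotReadyAddresses: true`, which is an interpretation rather than something the slide states. The service name and label selector are hypothetical.

```python
import json


def headless_discovery_service(name, selector):
    """Build a headless Service that ES masters can use for unicast discovery."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",               # headless: DNS returns pod IPs
            "publishNotReadyAddresses": True,  # assumed meaning of "publishable immediately"
            "sessionAffinity": "None",
            "selector": selector,
            "ports": [{"name": "transport", "port": 9300}],
        },
    }


if __name__ == "__main__":
    # Name and labels are placeholders; match them to your own StatefulSet.
    svc = headless_discovery_service("es-master-discovery", {"app": "es-master"})
    print(json.dumps(svc, indent=2))
```

Publishing not-ready addresses matters here because a master that cannot find its peers may never become ready in the first place.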



ES startup settings

● Ensure bootstrap.memory_lock is on (keeps the JVM heap from being swapped out)

● Set discovery.zen.minimum_master_nodes to a quorum of your masters: (number of masters / 2) + 1

● The clusterIP: None service from the last slide is referenced by unicast settings

● Set the correct ES role

● Specify the number of cores (the processors setting), since ES may otherwise size its thread pools from the host's core count

Just the ones we use
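Here is one way a start script could assemble these settings, sketched in Python. The setting names are the ES 5.x ones as I understand them, and the `-Ekey=value` form is how ES 5.x takes settings on the command line. The discovery hostname and core count are placeholders.

```python
def minimum_master_nodes(master_count):
    """Quorum of master-eligible nodes: a strict majority."""
    return master_count // 2 + 1


def startup_settings(master_count, discovery_host, is_master, is_data, cores):
    """Collect the startup settings from this slide into one dict."""
    return {
        "bootstrap.memory_lock": True,
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_count),
        # Points at the headless (clusterIP: None) service.
        "discovery.zen.ping.unicast.hosts": discovery_host,
        "node.master": is_master,
        "node.data": is_data,
        "processors": cores,
    }


def as_cli_flags(settings):
    """Render settings as -Ekey=value arguments for bin/elasticsearch."""
    flags = []
    for key, value in settings.items():
        if isinstance(value, bool):
            value = str(value).lower()
        flags.append(f"-E{key}={value}")
    return flags


if __name__ == "__main__":
    # Placeholder values for a 3-master cluster and a 16-core data pod.
    settings = startup_settings(3, "es-master-discovery", False, True, 16)
    print(" ".join(as_cli_flags(settings)))
```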



Configuring an index template

● Configure index.routing.allocation.total_shards_per_node based on your expected load

○ Optimizing shards can increase performance and reduce cluster state overhead

● Set a refresh_interval that works for you

○ Longer refresh intervals give better indexing throughput, but new documents take longer to become searchable

○ We typically use 15-30 seconds

● Change index.translog.durability to async (translog writes are fsynced in the background, so a crash can lose the most recent writes)

○ We regret not discovering this setting sooner, as it gave us 5-10x better performance

● Note: index templates MUST be applied AFTER the ES masters are already running

Index templates can have a huge impact on your cluster performance
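A hedged sketch of a starter template using the three settings above, plus the PUT request that would apply it. The template name, index pattern, shard limit, and URL are placeholders. The `template` key is the ES 5.x way to give the index pattern (6.0 and later use `index_patterns` instead).

```python
import json
import urllib.request


def build_template(pattern, shards_per_node, refresh_interval="30s"):
    """Index template body with the settings discussed on this slide."""
    return {
        "template": pattern,  # ES 5.x key; newer versions use "index_patterns"
        "settings": {
            "index.routing.allocation.total_shards_per_node": shards_per_node,
            "index.refresh_interval": refresh_interval,
            "index.translog.durability": "async",
        },
    }


def put_template_request(base_url, name, body):
    """Build (but do not send) the PUT /_template/<name> request."""
    return urllib.request.Request(
        url=f"{base_url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder endpoint; only send this once the masters are up.
    body = build_template("logs-*", shards_per_node=2)
    req = put_template_request("http://es-master:9200", "logs", body)
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
    # urllib.request.urlopen(req) would actually apply the template.
```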



Manage ES the GUI way: Cerebro

● Cerebro connects to your ES service endpoint(s)

● Contains an ES node (pod) list and their health stats

● View indices and shards across the available data nodes

● Modify index settings, templates, and data

● Move shards around (important)

● Cerebro doesn’t have a GUI for some APIs

Previously kopf if you’re using ES v2.X or lower



Manage ES the API way

● We use Insomnia (a REST API GUI to share API calls)

● Curl works too!

● API calls we commonly use:

○ /_cluster/health

○ /_cat/pending_tasks?v

○ /_flush?force & /_cluster/reroute?retry_failed=true

A bit more work to start on, but automation is much easier
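For the automation angle, here is a small sketch that wraps the calls listed above using only the Python standard library. The paths are the ones on the slide. The HTTP methods (GET for the read-only calls, POST for flush and reroute) are my understanding of those APIs, and the base URL is a placeholder.

```python
import urllib.request

# Short names are hypothetical; paths are the ones listed on the slide.
COMMON_CALLS = {
    "health": ("GET", "/_cluster/health"),
    "pending_tasks": ("GET", "/_cat/pending_tasks?v"),
    "flush": ("POST", "/_flush?force"),
    "retry_failed": ("POST", "/_cluster/reroute?retry_failed=true"),
}


def build_request(base_url, name):
    """Build the HTTP request for one of the named calls."""
    method, path = COMMON_CALLS[name]
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


def call(base_url, name, timeout=10):
    """Send the request and return the response body as text."""
    req = build_request(base_url, name)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder URL; point it at your ES service endpoint.
    for name in COMMON_CALLS:
        req = build_request("http://localhost:9200", name)
        print(req.get_method(), req.full_url)
```

Calling `call("http://localhost:9200", "health")` would return the cluster health JSON as a string, assuming the endpoint is reachable.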



Wrap up

● ES requires some coaxing to properly run inside a container

○ Use the correct security context, ulimit, and vm settings

● There are native concepts in Kubernetes that can make running ES easier

○ Service discovery, volumeClaimTemplates, pre-emption, and more

○ ...or you could just use an operator! (your mileage may vary)

● Index templates have a big impact on how well your ES cluster runs

● GUIs (cerebro) and ES APIs are extremely useful for tuning performance

That was a lot of info, but here’s what to walk away with:

