When every - Cassandra meetup Amsterdam 20.07... · 2016-08-01 · files.meetup.com/7139612/...
Why this talk
We were challenged with an interesting requirement...
What makes a distributed system?
A bunch of stuff that magically works together
How to start?
Investigate the current setup (if any)
Understand your use case
Understand your data
Set a base configuration
Define the goal
Investigate the current setup
● What type of deployment are you working with?
● What is the available hardware?
○ CPU cores and threads
○ Memory amount and type
○ Storage size and type
○ Network interfaces amount and type
○ Limitations
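A quick way to gather the hardware facts above on a Linux node; this is just a convenience sketch using common tools (coreutils, util-linux, iproute2), so adjust for what is installed.

```shell
#!/bin/sh
# Inventory the hardware points listed above.
cpus=$(nproc)                                        # CPU cores/threads visible to the OS
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)  # total memory in kB
echo "cpus=${cpus} mem_kb=${mem_kb}"
lsblk -d -o NAME,SIZE,ROTA 2>/dev/null               # storage; ROTA=0 usually means SSD
ip -o link show 2>/dev/null | awk -F': ' '{print $2}' # network interfaces
```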
Hardware configuration
8-16 cores
32GB RAM
Commit log SSD
Data drive SSD
1GbE
Placement groups
Availability zones
Enhanced networking
OS - Swap, storage, cpu
Swap is bad
● remove swap from fstab
● disable swap: swapoff -a
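The two steps above, sketched so they can be rehearsed on a scratch copy of fstab first; point FSTAB at /etc/fstab (as root) to apply for real. The sample entries are made up.

```shell
#!/bin/sh
# Rehearsal of the swap removal on a scratch copy of fstab.
FSTAB=${FSTAB:-$(mktemp)}   # scratch file by default; /etc/fstab for real
grep -q . "$FSTAB" || printf '%s\n' \
    'UUID=root-fs /    ext4 defaults 0 1' \
    'UUID=swap-fs none swap sw       0 0' > "$FSTAB"   # sample entries

# 1. comment out swap entries so swap stays off across reboots
sed -i.bak '/[[:space:]]swap[[:space:]]/ s/^[^#]/#&/' "$FSTAB"

# 2. turn off active swap immediately (root only; skipped otherwise)
if [ "$(id -u)" -eq 0 ]; then swapoff -a; fi

cat "$FSTAB"
```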
Optimize block layer
echo 1 > /sys/block/XXX/queue/nomerges
echo 8 > /sys/block/XXX/queue/read_ahead_kb
echo deadline > /sys/block/XXX/queue/scheduler
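The echo commands above are lost on reboot; a udev rule is one common way to persist them. The rule below is a sketch: the file name, the sd[a-z] device match, and the scratch default path are assumptions, so adjust for your device naming (nvme*, xvd*, ...).

```shell
#!/bin/sh
# Persist the block-layer settings via a udev rule.
RULES=${RULES:-$(mktemp)}   # for real: /etc/udev/rules.d/60-cassandra-disks.rules (as root)
cat > "$RULES" <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", \
  ATTR{queue/nomerges}="1", \
  ATTR{queue/read_ahead_kb}="8", \
  ATTR{queue/scheduler}="deadline"
EOF
cat "$RULES"
```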
Disable cpu scaling
for sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*
do
    echo performance > $sysfs_cpu/cpufreq/scaling_governor
done
sysctl.d - network
net.ipv4.tcp_rmem = 4096 87380 16777216    # min/default/max read buffer in bytes
net.ipv4.tcp_wmem = 4096 65536 16777216    # min/default/max write buffer in bytes
net.ipv4.tcp_ecn = 0                       # disable explicit congestion notification
net.ipv4.tcp_window_scaling = 1            # enable window scaling (higher throughput)
net.ipv4.ip_local_port_range = 10000 65535 # allowed local port range
net.ipv4.tcp_tw_recycle = 1                # enable fast time-wait recycle

net.core.rmem_max = 16777216               # max socket receive buffer in bytes
net.core.wmem_max = 16777216               # max socket send buffer in bytes
net.core.somaxconn = 4096                  # max queued incoming connections
net.core.netdev_max_backlog = 16384        # incoming packet backlog
sysctl.d - vm and fs
vm.swappiness = 1                    # kernel swap aggressiveness (1 = minimum)
vm.max_map_count = 1073741824        # max memory map areas a process can have
vm.dirty_background_bytes = 10485760 # dirty memory threshold (background kernel flush)
vm.dirty_bytes = 1073741824          # dirty memory threshold (blocking process flush)
fs.file-max = 1073741824             # max number of open files
vm.min_free_kbytes = 1048576         # min number of VM free kilobytes
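One way to apply the sysctl settings above persistently is a drop-in file under /etc/sysctl.d. The sketch below uses a scratch path by default and only a few of the settings; for real use, write the full set to something like /etc/sysctl.d/99-cassandra.conf (the file name is an assumption) and reload as root.

```shell
#!/bin/sh
# Write a sysctl.d drop-in with (a subset of) the settings above.
CONF=${CONF:-$(mktemp)}   # for real: /etc/sysctl.d/99-cassandra.conf (as root)
cat > "$CONF" <<'EOF'
vm.swappiness = 1
vm.max_map_count = 1073741824
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF
# Reload all sysctl.d files (root only; skipped otherwise).
if [ "$(id -u)" -eq 0 ]; then sysctl --system; fi
cat "$CONF"
```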
JVM - G1GC
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16" # Set to number of full cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"     # Set to number of full cores
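Since the two thread counts above should track the core count, one option is to derive them in cassandra-env.sh style rather than hard-coding 16. Using `nproc` here is an assumption: it reports logical CPUs, so use the physical core count if hyper-threading is enabled.

```shell
#!/bin/sh
# Derive GC thread counts from the machine's core count.
CORES=$(nproc)
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=$CORES"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=$CORES"
echo "$JVM_OPTS"
```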
JVM - HotSpot
MAX_HEAP_SIZE="8G" # Good starting point
HEAP_NEWSIZE="2G"  # Good starting point

JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

# Tunable settings
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096"

# Instagram settings
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
Cassandra yaml
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
memtable_allocation_type: heap_buffers
memtable_flush_writers: 8
memtable_cleanup_threshold: 0.15
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

trickle_fsync: true
trickle_fsync_interval_in_kb: 1024
internode_compression: dc
Data model
Data model impacts performance a lot
Optimize so that you read from one partition
Make sure your data can be distributed
SSTable compression depending on the use case
Compaction strategy
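To make the single-partition point concrete, here is a hypothetical schema sketch: one partition per sensor per day, so a day's readings are a single-partition read and partitions stay bounded in size. The table, column names, and the compaction choice are all illustrative assumptions, not from the talk; the shell wrapper just writes the CQL to a file for cqlsh.

```shell
#!/bin/sh
# Write an illustrative time-series schema (hypothetical names).
SCHEMA=${SCHEMA:-$(mktemp)}
cat > "$SCHEMA" <<'EOF'
CREATE TABLE readings (
    sensor_id uuid,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)   -- composite partition key: reads hit one partition
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {'class': 'DateTieredCompactionStrategy'};
EOF
# Apply with: cqlsh -f "$SCHEMA"   (requires a running cluster)
cat "$SCHEMA"
```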
Ok, what now?
After setting the base configuration, it's time for testing and observing
Test setup
Make sure you have repeatable tests
Fixed rate tests
Variable rate tests
Production-like tests
Cassandra Stress
Various loadgen tools (gatling, wrk, loader, ...)
Coordinated omission
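As an example of a fixed-rate run with Cassandra Stress as mentioned above, a command line might look like the one below. The node address and numbers are placeholders, and the `-rate fixed=` option should be checked against your cassandra-stress version (`cassandra-stress help -rate`); the command is only assembled here, not executed, since it needs a live cluster.

```shell
#!/bin/sh
# Assemble (not run) a fixed-rate cassandra-stress command.
STRESS_CMD="cassandra-stress write n=1000000 -rate threads=200 fixed=5000/s -node 10.0.0.1"
echo "$STRESS_CMD"
```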
Tuning methodology
Metrics and reporting stack
OS metrics (SmartCat)
Metrics reporter config (AddThis)
Cassandra diagnostics (SmartCat)
Filebeat
Riemann
InfluxDB
Grafana
Elasticsearch
Logstash
Kibana
Grafana
Kibana
Slow queries
Track query execution times above some threshold
Gain insight into long-running queries
Relate that to what's going on on the node
Compare app and cluster slow queries
https://github.com/smartcat-labs/cassandra-diagnostics
Slow queries - cluster
Slow queries - cluster vs app
Ops center
Pros:
Great when starting out
Everything you need in a nice GUI
Cluster metrics

Cons:
Metrics stored in the same cluster
Issues with some of the services (repair, slow query, ...)
Additional agents on the nodes
AWS
AWS deployment
Choose your instance based on calculations
Cost limits come second
Use placement groups and availability zones
Don't overdo it just because you can ($$$)
Go for EBS volumes (gp2)
You don't need ephemeral storage (mostly)
EBS volumes
Pros:
A 3.4TB+ volume has 10,000 IOPS
Average latency is ~0.38ms
Durable across reboots
AWS snapshots
Can be attached/detached
Easy to recreate

Cons:
Rare latency spikes (a degrading factor)
EBS volume problems
End result
Did we meet our goal?
Can we go any further?
Torture testing
Failure scenarios
Latency and delay inducers
Automate everything
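A common latency/delay inducer for the failure testing above is tc netem. The device name and delay values below are placeholders, and applying the qdisc needs root on a test node, so the commands are only assembled here rather than executed.

```shell
#!/bin/sh
# Assemble (not run) tc netem commands to inject network latency.
DEV=${DEV:-eth0}
ADD="tc qdisc add dev $DEV root netem delay 100ms 20ms"  # 100ms +/-20ms jitter
DEL="tc qdisc del dev $DEV root netem"                   # revert
echo "$ADD"
echo "$DEL"
```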
Q&A