When every - Cassandra meetup Amsterdam 20.07... · 2016-08-01 · files.meetup.com/7139612/...
Why this talk
We were challenged with an interesting requirement...
What makes a distributed system?
A bunch of stuff that magically works together
How to start?
Investigate the current setup (if any)
Understand your use case
Understand your data
Set a base configuration
Define the goal
Investigate the current setup
● What type of deployment are you working with?
● What is the available hardware?
○ CPU cores and threads
○ Memory amount and type
○ Storage size and type
○ Network interfaces amount and type
○ Limitations
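A quick way to gather the hardware facts above on a Linux node; this is just a convenience sketch using common tools (coreutils, util-linux, iproute2), so adjust for what is installed.

```shell
#!/bin/sh
# Inventory the hardware points listed above.
cpus=$(nproc)                                        # CPU cores/threads visible to the OS
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)  # total memory in kB
echo "cpus=${cpus} mem_kb=${mem_kb}"
lsblk -d -o NAME,SIZE,ROTA 2>/dev/null               # storage; ROTA=0 usually means SSD
ip -o link show 2>/dev/null | awk -F': ' '{print $2}' # network interfaces
```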
Hardware configuration
8-16 cores
32GB RAM
Commit log SSD
Data drive SSD
1GbE
Placement groups
Availability zones
Enhanced networking
OS - Swap, storage, cpu
Swap is bad
● remove swap from fstab
● disable swap: swapoff -a
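The two steps above, sketched so they can be rehearsed on a scratch copy of fstab first; point FSTAB at /etc/fstab (as root) to apply for real. The sample entries are made up.

```shell
#!/bin/sh
# Rehearsal of the swap removal on a scratch copy of fstab.
FSTAB=${FSTAB:-$(mktemp)}   # scratch file by default; /etc/fstab for real
grep -q . "$FSTAB" || printf '%s\n' \
    'UUID=root-fs /    ext4 defaults 0 1' \
    'UUID=swap-fs none swap sw       0 0' > "$FSTAB"   # sample entries

# 1. comment out swap entries so swap stays off across reboots
sed -i.bak '/[[:space:]]swap[[:space:]]/ s/^[^#]/#&/' "$FSTAB"

# 2. turn off active swap immediately (root only; skipped otherwise)
if [ "$(id -u)" -eq 0 ]; then swapoff -a; fi

cat "$FSTAB"
```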
Optimize block layer
echo 1 > /sys/block/XXX/queue/nomerges
echo 8 > /sys/block/XXX/queue/read_ahead_kb
echo deadline > /sys/block/XXX/queue/scheduler
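The echo commands above are lost on reboot; a udev rule is one common way to persist them. The rule below is a sketch: the file name, the sd[a-z] device match, and the scratch default path are assumptions, so adjust for your device naming (nvme*, xvd*, ...).

```shell
#!/bin/sh
# Persist the block-layer settings via a udev rule.
RULES=${RULES:-$(mktemp)}   # for real: /etc/udev/rules.d/60-cassandra-disks.rules (as root)
cat > "$RULES" <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", \
  ATTR{queue/nomerges}="1", \
  ATTR{queue/read_ahead_kb}="8", \
  ATTR{queue/scheduler}="deadline"
EOF
cat "$RULES"
```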
Disable cpu scaling
for sysfs_cpu in /sys/devices/system/cpu/cpu[0-9]*
do
    echo performance > $sysfs_cpu/cpufreq/scaling_governor
done
sysctl.d - network
net.ipv4.tcp_rmem = 4096 87380 16777216    # min/default/max read buffer in bytes
net.ipv4.tcp_wmem = 4096 65536 16777216    # min/default/max write buffer in bytes
net.ipv4.tcp_ecn = 0                       # disable explicit congestion notification
net.ipv4.tcp_window_scaling = 1            # enable window scaling (higher throughput)
net.ipv4.ip_local_port_range = 10000 65535 # allowed local port range
net.ipv4.tcp_tw_recycle = 1                # enable fast time-wait recycle

net.core.rmem_max = 16777216               # max socket receive buffer in bytes
net.core.wmem_max = 16777216               # max socket send buffer in bytes
net.core.somaxconn = 4096                  # max queued incoming connections
net.core.netdev_max_backlog = 16384        # incoming packet backlog
sysctl.d - vm and fs
vm.swappiness = 1                    # kernel swap aggressiveness (1 = minimum)
vm.max_map_count = 1073741824        # max memory map areas a process can have
vm.dirty_background_bytes = 10485760 # dirty memory threshold (background kernel flush)
vm.dirty_bytes = 1073741824          # dirty memory threshold (blocking process flush)
fs.file-max = 1073741824             # max number of open files
vm.min_free_kbytes = 1048576         # min number of VM free kilobytes
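One way to apply the sysctl settings above persistently is a drop-in file under /etc/sysctl.d. The sketch below uses a scratch path by default and only a few of the settings; for real use, write the full set to something like /etc/sysctl.d/99-cassandra.conf (the file name is an assumption) and reload as root.

```shell
#!/bin/sh
# Write a sysctl.d drop-in with (a subset of) the settings above.
CONF=${CONF:-$(mktemp)}   # for real: /etc/sysctl.d/99-cassandra.conf (as root)
cat > "$CONF" <<'EOF'
vm.swappiness = 1
vm.max_map_count = 1073741824
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF
# Reload all sysctl.d files (root only; skipped otherwise).
if [ "$(id -u)" -eq 0 ]; then sysctl --system; fi
cat "$CONF"
```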
JVM - G1GC
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"

JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=16" # Set to number of full cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=16"     # Set to number of full cores
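Since the two thread counts above should track the core count, one option is to derive them in cassandra-env.sh style rather than hard-coding 16. Using `nproc` here is an assumption: it reports logical CPUs, so use the physical core count if hyper-threading is enabled.

```shell
#!/bin/sh
# Derive GC thread counts from the machine's core count.
CORES=$(nproc)
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=$CORES"
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=$CORES"
echo "$JVM_OPTS"
```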
JVM - HotSpot
MAX_HEAP_SIZE="8G" # Good starting point
HEAP_NEWSIZE="2G"  # Good starting point

JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

# Tunable settings
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=16"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=4096"

# Instagram settings
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
Cassandra yaml
concurrent_reads: 128
concurrent_writes: 128
concurrent_counter_writes: 128
memtable_allocation_type: heap_buffers
memtable_flush_writers: 8
memtable_cleanup_threshold: 0.15
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

trickle_fsync: true
trickle_fsync_interval_in_kb: 1024
internode_compression: dc
Data model
Data model impacts performance a lot
Optimize so that you read from one partition
Make sure your data can be distributed
SSTable compression depending on the use case
Compaction strategy
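To make the single-partition point concrete, here is a hypothetical schema sketch: one partition per sensor per day, so a day's readings are a single-partition read and partitions stay bounded in size. The table, column names, and the compaction choice are all illustrative assumptions, not from the talk; the shell wrapper just writes the CQL to a file for cqlsh.

```shell
#!/bin/sh
# Write an illustrative time-series schema (hypothetical names).
SCHEMA=${SCHEMA:-$(mktemp)}
cat > "$SCHEMA" <<'EOF'
CREATE TABLE readings (
    sensor_id uuid,
    day       date,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)   -- composite partition key: reads hit one partition
) WITH CLUSTERING ORDER BY (ts DESC)
  AND compaction = {'class': 'DateTieredCompactionStrategy'};
EOF
# Apply with: cqlsh -f "$SCHEMA"   (requires a running cluster)
cat "$SCHEMA"
```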
Ok, what now?
After setting the base configuration, it's time for testing and observing
Test setup
Make sure you have repeatable tests
Fixed rate tests
Variable rate tests
Production-like tests
Cassandra Stress
Various loadgen tools (gatling, wrk, loader, ...)
Coordinated omission
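As an example of a fixed-rate run with Cassandra Stress as mentioned above, a command line might look like the one below. The node address and numbers are placeholders, and the `-rate fixed=` option should be checked against your cassandra-stress version (`cassandra-stress help -rate`); the command is only assembled here, not executed, since it needs a live cluster.

```shell
#!/bin/sh
# Assemble (not run) a fixed-rate cassandra-stress command.
STRESS_CMD="cassandra-stress write n=1000000 -rate threads=200 fixed=5000/s -node 10.0.0.1"
echo "$STRESS_CMD"
```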
Tuning methodology
Metrics and reporting stack
OS metrics (SmartCat)
Metrics reporter config (AddThis)
Cassandra diagnostics (SmartCat)
Filebeat
Riemann
InfluxDB
Grafana
Elasticsearch
Logstash
Kibana
Grafana
Kibana
Slow queries
Track query execution times above some threshold
Gain insight into long-running queries
Relate that to what's going on on the node
Compare app and cluster slow queries
https://github.com/smartcat-labs/cassandra-diagnostics
Slow queries - cluster
Slow queries - cluster vs app
Ops center
Pros:
Great when starting out
Everything you need in a nice GUI
Cluster metrics

Cons:
Metrics stored in the same cluster
Issues with some of the services (repair, slow query, ...)
Additional agents on the nodes
AWS
AWS deployment
Choose your instance based on calculations
Cost limits come second
Use placement groups and availability zones
Don't overdo it just because you can ($$$)
Go for EBS volumes (gp2)
You don't need ephemeral storage (mostly)
EBS volumes
Pros:
A 3.4TB+ volume has 10,000 IOPS
Average latency is ~0.38ms
Durable across reboots
AWS snapshots
Can be attached/detached
Easy to recreate

Cons:
Rare latency spikes (a degrading factor)
EBS volume problems
End result
Did we meet our goal?
Can we go any further?
Torture testing
Failure scenarios
Latency and delay inducers
Automate everything
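A common latency/delay inducer for the failure testing above is tc netem. The device name and delay values below are placeholders, and applying the qdisc needs root on a test node, so the commands are only assembled here rather than executed.

```shell
#!/bin/sh
# Assemble (not run) tc netem commands to inject network latency.
DEV=${DEV:-eth0}
ADD="tc qdisc add dev $DEV root netem delay 100ms 20ms"  # 100ms +/-20ms jitter
DEL="tc qdisc del dev $DEV root netem"                   # revert
echo "$ADD"
echo "$DEL"
```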
Q&A