Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass


Description: A look at high-level Operations problem solving with Apache Cassandra.

Transcript of Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Page 1

CASSANDRA COMMUNITY WEBINARS AUGUST 2013

IN CASE OF EMERGENCY, BREAK GLASS

Aaron Morton (@aaronmorton)

Co-Founder & Principal Consultant, www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2

About The Last Pickle

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer, DataStax MVP, Hector Maintainer, 6+ years combined Cassandra experience.

Based in New Zealand & Austin, TX.

Page 3

Platform
Tools
Problems
Maintenance

Page 4

The Platform

Page 5

The Platform & Clients

Page 6

The Platform & Running Clients

Page 7

The Platform & Reality

Consistency
Availability
Partition Tolerance

Page 8

The Platform & Consistency

Strong Consistency (R + W > N)

Eventual Consistency (R + W <= N)
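The R + W comparison can be checked mechanically. A minimal sketch, assuming RF=3 (so N=3 replicas per row) with QUORUM reads and writes (2 of 3 each); the variable names are illustrative:

```shell
# Strong vs eventual consistency check: QUORUM read + QUORUM write at RF=3.
N=3   # replicas holding each row
R=2   # replicas that must answer a read (QUORUM)
W=2   # replicas that must ack a write (QUORUM)
if [ $((R + W)) -gt "$N" ]; then
  echo "strong: R + W > N"
else
  echo "eventual: R + W <= N"
fi
```

With these values the read and write quorums must overlap in at least one replica, so the read sees the latest write.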

Page 9

What Price Consistency?

In a multi-DC cluster, QUORUM and EACH_QUORUM involve cross-DC latency.

Page 10

The Platform & Availability

Availability is maintained while each Token Range has at least Consistency Level nodes UP.

Page 11

Best Case Failure with N=9 and RF 3, 100% Availability

(Diagram: nine nodes grouped as Replica 1, Replica 2, Replica 3 for Range A around the ring.)

Page 12

Worst Case Failure with N=9 and RF 3, 78% Availability

(Diagram: Ranges A and B on the ring; adjacent failures reduce availability.)

Page 13

The Platform & Partition Tolerance

A failed node does not create a partition.

Page 14

The Platform & Partition Tolerance

Page 15

The Platform & Partition Tolerance

Partitions occur when the network fails.

Page 16

The Platform & Partition Tolerance

Page 17

The Storage Engine

Optimised for Writes.

Page 18

Write Path

Append to the Write Ahead Log. (fsync every 10s by default; other options available.)

Page 19

Write Path

Merge new Columns into Memtable.

(Lock free, always in memory.)

Page 20

Write Path... Later

Asynchronously flush the Memtable to a new SSTable on disk. (May be tens or hundreds of MB in size.)

Page 21

SSTable Files

*-Data.db
*-Index.db
*-Filter.db

(And others)

Page 22

Row Fragmentation

SSTable 1  foo: dishwasher (ts 10): tomato | purple (ts 10): cromulent
SSTable 2  foo: frink (ts 20): flayven | monkey (ts 10): embiggins
SSTable 3  (no fragment of foo)
SSTable 4  foo: dishwasher (ts 15): tomacco
SSTable 5  (no fragment of foo)

Page 23

Read Path

Read columns from each SSTable, then merge results.

(Roughly speaking.)
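The merge can be sketched with the row fragments from the Row Fragmentation example: order each column's versions by timestamp and keep the newest. This `sort | awk` pipeline is an illustration of the rule, not Cassandra's actual code path:

```shell
# Column versions for row "foo" across SSTables: column, timestamp, value.
# Highest timestamp wins, so dishwasher resolves to "tomacco" (ts 15).
printf '%s\n' \
  'dishwasher 10 tomato' \
  'purple 10 cromulent' \
  'frink 20 flayven' \
  'monkey 10 embiggins' \
  'dishwasher 15 tomacco' |
  sort -k1,1 -k2,2nr |
  awk '!seen[$1]++'
```

The sort groups versions by column name with the newest timestamp first; the awk filter keeps only the first (newest) line per column.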

Page 24

Read Path

Use the Bloom Filter to determine if a row key does not exist in a SSTable. (In memory.)

Page 25

Read Path

Search for prior key in *-Index.db sample.

(In memory)

Page 26

Read Path

Scan *-Index.db from the prior key to find the search key and its *-Data.db offset. (On disk.)

Page 27

Read Path

Read *-Data.db from the offset: all columns or specific pages.

Page 28

Read purple, monkey, dishwasher

(Diagram: each SSTable keeps a Bloom Filter and Index Sample in memory, with *-Index.db and *-Data.db files on disk; row foo has fragments in SSTables 1, 2, and 4.)

Page 29

Read With Key Cache

(Diagram: the same read path, with a Key Cache per SSTable sitting in front of each Index Sample in memory.)

Page 30

Read with Row Cache

(Diagram: a Row Cache in memory answers hits before any Bloom Filter, Key Cache, Index Sample, or SSTable is consulted.)

Page 31

Performant Reads

Design queries to read from a small number of SSTables.

Page 32

Performant Reads

Read a small number of named columns or a slice of columns.

Page 33

Performant Reads

Design the data model to support current application requirements.

Page 34

Platform
Tools
Problems
Maintenance

Page 35

Logging

Configure via log4j-server.properties and the StorageServiceMBean.

Page 36

DEBUG Logging For One Class

log4j.logger.org.apache.cassandra.thrift.CassandraServer=DEBUG

Page 37

Reading Logs

INFO [OptionalTasks:1] 2013-04-20 14:03:50,787 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='KS1', ColumnFamily='CF1') (estimated 403858136 bytes)

INFO [OptionalTasks:1] 2013-04-20 14:03:50,787 ColumnFamilyStore.java (line 634) Enqueuing flush of Memtable-CF1@1333396270(145839277/403858136 serialized/live bytes, 1742365 ops)

INFO [FlushWriter:42] 2013-04-20 14:03:50,788 Memtable.java (line 266) Writing Memtable-CF1@1333396270(145839277/403858136 serialized/live bytes, 1742365 ops)

Page 38

GC Logs (cassandra-env.sh)

# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
# JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
# JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
# JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"

Page 39

ParNew GC Starting

{Heap before GC invocations=224115 (full 111):
 par new generation total 873856K, used 717289K ...)
  eden space 699136K, 100% used ...)
  from space 174720K, 10% used ...)
  to space 174720K, 0% used ...)

Page 40

Tenuring Distribution

240217.053: [ParNew
Desired survivor size 89456640 bytes, new threshold 4 (max 4)
- age 1: 22575936 bytes, 22575936 total
- age 2: 350616 bytes, 22926552 total
- age 3: 4380888 bytes, 27307440 total
- age 4: 1155104 bytes, 28462544 total

Page 41

ParNew GC Finishing

Heap after GC invocations=224116 (full 111):
 par new generation total 873856K, used 31291K ...)
  eden space 699136K, 0% used ...)
  from space 174720K, 17% used ...)
  to space 174720K, 0% used ...)

Page 42

nodetool info

Token            : 0
Gossip active    : true
Load             : 130.64 GB
Generation No    : 1369334297
Uptime (seconds) : 29438
Heap Memory (MB) : 3744.27 / 8025.38
Data Center      : east
Rack             : rack1
Exceptions       : 0
Key Cache        : size 104857584 (bytes), capacity 104857584 (bytes), 25364985 hits, 34874180 requests, 0.734 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0...

Page 43

nodetool ring

Note: Ownership information does not include topology, please specify a keyspace.
Address      DC    Rack   Status  State   Load       Owns    Token
10.1.64.11   east  rack1  Up      Normal  130.64 GB  12.50%  0
10.1.65.8    west  rack1  Up      Normal  88.79 GB   0.00%   1
10.1.64.78   east  rack1  Up      Normal  52.66 GB   12.50%  212...216
10.1.65.181  west  rack1  Up      Normal  65.99 GB   0.00%   212...217
10.1.66.8    east  rack1  Up      Normal  64.38 GB   12.50%  425...432
10.1.65.178  west  rack1  Up      Normal  77.94 GB   0.00%   425...433
10.1.64.201  east  rack1  Up      Normal  56.42 GB   12.50%  638...648
10.1.65.59   west  rack1  Up      Normal  74.5 GB    0.00%   638...649
10.1.64.235  east  rack1  Up      Normal  79.68 GB   12.50%  850...864
10.1.65.16   west  rack1  Up      Normal  62.05 GB   0.00%   850...865
10.1.66.227  east  rack1  Up      Normal  106.73 GB  12.50%  106...080
10.1.65.226  west  rack1  Up      Normal  79.26 GB   0.00%   106...081
10.1.66.247  east  rack1  Up      Normal  66.68 GB   12.50%  127...295
10.1.65.19   west  rack1  Up      Normal  102.45 GB  0.00%   127...297
10.1.66.141  east  rack1  Up      Normal  53.72 GB   12.50%  148...512
10.1.65.253  west  rack1  Up      Normal  54.25 GB   0.00%   148...513

Page 44

nodetool ring KS1

Address      DC    Rack   Status  State   Load       Effective-Ownership  Token
10.1.64.11   east  rack1  Up      Normal  130.72 GB  12.50%               0
10.1.65.8    west  rack1  Up      Normal  88.81 GB   12.50%               1
10.1.64.78   east  rack1  Up      Normal  52.68 GB   12.50%               212...216
10.1.65.181  west  rack1  Up      Normal  66.01 GB   12.50%               212...217
10.1.66.8    east  rack1  Up      Normal  64.4 GB    12.50%               425...432
10.1.65.178  west  rack1  Up      Normal  77.96 GB   12.50%               425...433
10.1.64.201  east  rack1  Up      Normal  56.44 GB   12.50%               638...648
10.1.65.59   west  rack1  Up      Normal  74.57 GB   12.50%               638...649
10.1.64.235  east  rack1  Up      Normal  79.72 GB   12.50%               850...864
10.1.65.16   west  rack1  Up      Normal  62.12 GB   12.50%               850...865
10.1.66.227  east  rack1  Up      Normal  106.72 GB  12.50%               106...080
10.1.65.226  west  rack1  Up      Normal  79.28 GB   12.50%               106...081
10.1.66.247  east  rack1  Up      Normal  66.73 GB   12.50%               127...295
10.1.65.19   west  rack1  Up      Normal  102.47 GB  12.50%               127...297
10.1.66.141  east  rack1  Up      Normal  53.75 GB   12.50%               148...512
10.1.65.253  west  rack1  Up      Normal  54.24 GB   12.50%               148...513

Page 45

$ nodetool status
Datacenter: ams01 (Replication Factor 3)
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID                               Rack
UN  10.70.48.23   38.38 GB  256     19.0%  7c5fdfad-63c6-4f37-bb9f-a66271aa3423  RAC1
UN  10.70.6.78    58.13 GB  256     18.3%  94e7f48f-d902-4d4a-9b87-81ccd6aa9e65  RAC1
UN  10.70.47.126  53.89 GB  256     19.4%  f36f1f8c-1956-4850-8040-b58273277d83  RAC1
Datacenter: wdc01 (Replication Factor 3)
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID                               Rack
UN  10.24.116.66  65.81 GB  256     22.1%  f9dba004-8c3d-4670-94a0-d301a9b775a8  RAC1
UN  10.55.104.90  63.31 GB  256     21.2%  4746f1bd-85e1-4071-ae5e-9c5baac79469  RAC1
UN  10.55.104.27  62.71 GB  256     21.2%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  RAC1

Page 46

nodetool cfstats

Keyspace: KS1
  Column Family: CF1
  SSTable count: 11
  Space used (live): 32769179336
  Space used (total): 32769179336
  Number of Keys (estimate): 73728
  Memtable Columns Count: 1069137
  Memtable Data Size: 216442624
  Memtable Switch Count: 3
  Read Count: 95
  Read Latency: NaN ms.
  Write Count: 1039417
  Write Latency: 0.068 ms.
  Bloom Filter False Positives: 345
  Bloom Filter False Ratio: 0.00000
  Bloom Filter Space Used: 230096
  Compacted row minimum size: 150
  Compacted row maximum size: 322381140
  Compacted row mean size: 2072156

Page 47

$ nodetool cfhistograms KS1 CF1
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1       67264     0              0             0         1331591
2       19512     0              0             0         4241686
3       35529     0              0             0         474784
...
10      10299     1150           0             0         21768
12      5475      3569           0             0         3993135
14      1986      9098           0             0         1434778
17      258       30916          0             0         366895
20      0         52980          0             0         186524
24      0         104463        0              0         25439063
...
179     0         93             1823          1597      1284167
215     0         84             3880          1231655   1147150
258     0         170            5164          209282    956487

Page 48

$ nodetool proxyhistograms
Offset  Read Latency  Write Latency  Range Latency
60      0             15             0
72      0             51             0
86      0             241            0
103     2             2003           0
124     9             5798           0
149     67            7348           0
179     222           6453           0
215     184           6071           0
258     134           5436           0
310     104           4936           0
372     89            4997           0
446     39            6383           0
535     76797         7518           0
642     9364748       96065          0
770     16406421      152663         0
924     7429538       97612          0
1109    6781835       176829        0

Page 49

JMX via JConsole

Page 50

JMX via MX4J

Page 51

JMX via JMXTERM

$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.db:type=StorageService
#bean is set to org.apache.cassandra.db:type=StorageService
$>info
#mbean = org.apache.cassandra.db:type=StorageService
#class name = org.apache.cassandra.service.StorageService
# attributes
  %0 - AllDataFileLocations ([Ljava.lang.String;, r)
  %1 - CommitLogLocation (java.lang.String, r)
  %2 - CompactionThroughputMbPerSec (int, rw)
...
# operations
  %1 - void bulkLoad(java.lang.String p1)
  %2 - void clearSnapshot(java.lang.String p1,[Ljava.lang.String; p2)
  %3 - void decommission()

Page 52

JVM Heap Dump via JMAP

jmap -dump:format=b,file=heap.bin <pid>
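A sketch that locates the Cassandra JVM and builds the full dump command; the `pgrep` pattern and the dump path are assumptions to adjust for your install:

```shell
# Locate the Cassandra JVM and build the heap-dump command.
# "CassandraDaemon" and /tmp are assumptions; adjust for your environment.
pid=$(pgrep -f CassandraDaemon | head -n1)
cmd="jmap -dump:format=b,file=/tmp/cassandra-heap.bin ${pid}"
echo "$cmd"   # run this as the same user as the Cassandra process
```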

Page 53

JVM Heap Dump with YourKit

Page 54

Platform
Tools
Problems
Maintenance

Page 55

Corrupt SSTable

(Very rare.)

Page 56

Compaction Error

ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(138024912283272996716128964353306009224, 61386330356130622d616666362d376330612d666531662d373738616630636265396535) >= current key DecoratedKey(127065377405949402743383718901402082101, 64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing into *-tmp-hf-7372-Data.db
 at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
 at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160)
 at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)

Page 57

Cause

Change in Key Validator or bug in older versions.

Page 58

Fix

nodetool scrub

Page 59

Dropped Messages

Page 60

Logs

MessagingService.java (line 658) 173 READ messages dropped in last 5000ms
StatusLogger.java (line 57) Pool Name             Active  Pending
StatusLogger.java (line 72) ReadStage             32      284
StatusLogger.java (line 72) RequestResponseStage  1       254
StatusLogger.java (line 72) ReadRepairStage       0       0

Page 61

nodetool tpstats

Message type      Dropped
RANGE_SLICE       0
READ_REPAIR       0
BINARY            0
READ              721
MUTATION          1262
REQUEST_RESPONSE  196
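The Dropped counters can be totalled with a little awk. A sketch over the sample output shown here; in practice pipe the live `nodetool tpstats` output instead of a saved string:

```shell
# Sum the Dropped column from the message-type section of `nodetool tpstats`.
tpstats='Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
BINARY 0
READ 721
MUTATION 1262
REQUEST_RESPONSE 196'
echo "$tpstats" | awk 'NR > 1 { total += $NF } END { print "total dropped:", total }'
# → total dropped: 2179
```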

Page 62

Causes

Excessive GC.
Overloaded IO.
Overloaded Node.
Wide Reads / Large Batches.

Page 63

High Read Latency

Page 64

nodetool info

Token            : 113427455640312814857969558651062452225
Gossip active    : true
Thrift active    : true
Load             : 291.13 GB
Generation No    : 1368569510
Uptime (seconds) : 1022629
Heap Memory (MB) : 5213.01 / 8025.38
Data Center      : 1
Rack             : 20
Exceptions       : 0
Key Cache        : size 104857584 (bytes), capacity 104857584 (bytes), 13436862 hits, 16012159 requests, 0.907 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Page 65

nodetool cfstats

  Column Family: page_views
  SSTable count: 17
  Space used (live): 289942843592
  Space used (total): 289942843592
  Number of Keys (estimate): 1071416832
  Memtable Columns Count: 2041888
  Memtable Data Size: 539015124
  Memtable Switch Count: 83
  Read Count: 267059
  Read Latency: NaN ms.
  Write Count: 10516969
  Write Latency: 0.054 ms.
  Pending Tasks: 0
  Bloom Filter False Positives: 128586
  Bloom Filter False Ratio: 0.00000
  Bloom Filter Space Used: 802906184
  Compacted row minimum size: 447
  Compacted row maximum size: 3973
  Compacted row mean size: 867

Page 66

nodetool cfhistograms KS1 CF1
Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1       178437    0              0             0         0
2       20042     0              0             0         0
3       15275     0              0             0         0
4       11632     0              0             0         0
5       4771      0              0             0         0
6       4942      0              0             0         0
7       5540      0              0             0         0
8       4967      0              0             0         0
10      10682     0              0             0         284155
12      8355      0              0             0         15372508
14      1961      0              0             0         137959096
17      322       3              0             0         625733930
20      61        253            0             0         252953547
24      53        15114          0             0         39109718
29      18        255730         0             0         0
35      1         1532619        0             0         0
...

Page 67

nodetool cfhistograms KS1 CF1
Offset  SSTables  Write Latency  Read Latency  Row Size   Column Count
446     0         120            233           0          0
535     0         155            261           21361      0
642     0         127            284           19082720   0
770     0         88             218           498648801  0
924     0         86             2699          504702186  0
1109    0         22             3157          48714564   0
1331    0         18             2818          241091     0
1597    0         15             2155          2165       0
1916    0         19             2098          7          0
2299    0         10             1140          56         0
2759    0         10             1281          0          0
3311    0         6              1064          0          0
3973    0         4              676           3          0

...

Page 68

jmx-term

$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies
#bean is set to org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies
$>get BloomFilterFalseRatio
#mbean = org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies:
BloomFilterFalseRatio = 0.5693801541828607;

Page 69

Back to cfstats

Column Family: page_views
  Read Count: 270075
  Bloom Filter False Positives: 131294
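The observed false-positive rate follows directly from these two counters; a quick check with the numbers from the cfstats output:

```shell
# Bloom filter false positives divided by reads: 131294 of 270075.
awk 'BEGIN { printf "observed FP ratio: %.3f\n", 131294 / 270075 }'
# → observed FP ratio: 0.486
```

Roughly half of all reads were hitting SSTables that did not contain the row.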

Page 70

Cause

bloom_filter_fp_chance had been set to 0.1 to reduce memory requirements when storing 1+ billion rows per node.

Page 71

Fix

Changed read queries to select by column name to limit SSTables per query.

Long term, migrate to Cassandra v1.2 for off-heap Bloom Filters.

Page 72

GC Problems

Page 73

WARN

WARN [ScheduledTasks:1] 2013-03-29 18:40:48,158 GCInspector.java (line 145) Heap is 0.9355130159566108 full. You may need to reduce memtable and/or cache sizes.

INFO [ScheduledTasks:1] 2013-03-26 16:36:06,383 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 207 ms for 1 collections, 10105891032 used; max is 13591642112

INFO [ScheduledTasks:1] 2013-03-28 22:18:17,113 GCInspector.java (line 122) GC for ParNew: 256 ms for 1 collections, 6504905688 used; max is 13591642112

Page 74

Serious GC Problems

INFO [ScheduledTasks:1] 2013-04-30 23:21:11,959 GCInspector.java (line 122) GC for ParNew: 1115 ms for 1 collections, 9355247296 used; max is 12801015808

Page 75

Flapping Node

INFO [GossipTasks:1] 2013-03-28 17:42:07,944 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead.
INFO [GossipStage:1] 2013-03-28 17:42:54,740 Gossiper.java (line 816) InetAddress /10.1.20.144 is now UP
INFO [GossipTasks:1] 2013-03-28 17:46:00,585 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead.
INFO [GossipStage:1] 2013-03-28 17:46:13,855 Gossiper.java (line 816) InetAddress /10.1.20.144 is now UP
INFO [GossipStage:1] 2013-03-28 17:48:48,966 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead.

Page 76

“GC Problems are the result of workload and configuration.”

Aaron Morton, Just Now.

Page 77

Workload Correlation?

Look for wide rows, large writes, wide reads, unbounded multi-row reads or writes.

Page 78

Compaction Correlation?

Slow down Compaction to improve stability:

concurrent_compactors: 2
compaction_throughput_mb_per_sec: 8
in_memory_compaction_limit_in_mb: 32

(Monitor and reverse when resolved.)

Page 79

GC Logging Insights

Slow down the rate of tenuring and enable full GC logging:

HEAP_NEWSIZE="1200M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"

Page 80

GC’ing Objects in ParNew

{Heap before GC invocations=7937 (full 205):
 par new generation total 1024000K, used 830755K ...)
  eden space 819200K, 100% used ...)
  from space 204800K, 5% used ...)
  to space 204800K, 0% used ...)
Desired survivor size 104857600 bytes, new threshold 4 (max 4)
- age 1: 8090240 bytes, 8090240 total
- age 2: 565016 bytes, 8655256 total
- age 3: 330152 bytes, 8985408 total
- age 4: 657840 bytes, 9643248 total

Page 81

GC’ing Objects in ParNew

{Heap before GC invocations=7938 (full 205):
 par new generation total 1024000K, used 835015K ...)
  eden space 819200K, 100% used ...)
  from space 204800K, 7% used ...)
  to space 204800K, 0% used ...)
Desired survivor size 104857600 bytes, new threshold 4 (max 4)
- age 1: 1315072 bytes, 1315072 total
- age 2: 541072 bytes, 1856144 total
- age 3: 499432 bytes, 2355576 total
- age 4: 316808 bytes, 2672384 total

Page 82

Cause

Nodes had wide rows, 1.3+ billion rows, and 3+ GB of Bloom Filters. (Using the older bloom_filter_fp_chance of 0.000744.)

Page 83

Fix

Increased FP chance to 0.1 on one CF and 0.01 on others.

(One CF reduced from 770MB to 170MB of Bloom Filters.)

Page 84

Fix

Increased index_interval from 128 to 512. (Increased key_cache_size_in_mb to 200.)

Page 85

Fix

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1000M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

Page 86

Platform
Tools
Problems
Maintenance

Page 87

Maintenance

Expand to Multi DC

Page 88

Expand to Multi DC

Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild

Page 89

DC Aware Snitch?

SimpleSnitch puts all nodes in rack1 and datacenter1.

Page 90

More Snitches?

PropertyFileSnitch
RackInferringSnitch

Page 91

Gossip Based Snitch?

Ec2Snitch
Ec2MultiRegionSnitch
GossipingPropertyFileSnitch*

Page 92

Changing the Snitch

Do Not change the DC or Rack for an existing node.

(Cassandra will not be able to find your data.)

Page 93

Moving to the GossipingPropertyFileSnitch

Update cassandra-topology.properties on existing nodes with the existing DC/Rack settings for all existing nodes.

Set the default to the new DC.

Page 94

Moving to the GossipingPropertyFileSnitch

Update cassandra-rackdc.properties on existing nodes with the existing DC/Rack for the node.

Page 95

Moving to the GossipingPropertyFileSnitch

Use a rolling restart to upgrade existing nodes to GossipingPropertyFileSnitch

Page 96

Expand to Multi DC

Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild

Page 97

Got NTS ?

Must use NetworkTopologyStrategy for Multi DC deployments.

Page 98

SimpleStrategy

Order Token Ranges.
Start with the range that contains the Row Key.
Count to RF.

Page 99

SimpleStrategy

"foo"

www.thelastpickle.com

Page 100: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

NetworkTopologyStrategy

Order Token Ranges in the DC. Start with the range that contains the Row Key. Add the first unselected Token Range from each Rack. Repeat until RF selected.
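The rack-aware walk can be sketched with a toy ring model (illustrative only, not Cassandra's actual code; node names, tokens, and racks are made up):

```python
from bisect import bisect_left

def nts_replicas(ring, rack_of, row_token, rf):
    """NetworkTopologyStrategy placement sketch for one DC.

    Walk the DC's ring clockwise from the range that contains the row
    key, taking the first node seen in each distinct rack. If the walk
    ends with fewer than rf replicas (fewer racks than rf), fall back
    to the skipped nodes in ring order.
    """
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, row_token) % len(ring)
    clockwise = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]

    chosen, seen_racks, skipped = [], set(), []
    for node in clockwise:
        if len(chosen) == rf:
            break
        if rack_of[node] not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack_of[node])
        else:
            skipped.append(node)
    chosen += skipped[: rf - len(chosen)]
    return chosen

# Six nodes across three racks: starting at C's range, one node per rack.
ring = [(0, "A"), (100, "B"), (200, "C"), (300, "D"), (400, "E"), (500, "F")]
racks = {"A": "r1", "B": "r2", "C": "r3", "D": "r1", "E": "r2", "F": "r3"}
print(nts_replicas(ring, racks, 150, 3))  # ['C', 'D', 'E']
```

Note that with a single rack the fallback path makes the result identical to SimpleStrategy, which is the point of the "1 Rack" slide below.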


Page 101: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

NetworkTopologyStrategy

"foo"

Rack 1

Rack 2
Rack 3


Page 102: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

NetworkTopologyStrategy & 1 Rack

"foo"

Rack 1


Page 103: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Changing the Replication Strategy

Be careful if the existing configuration has multiple Racks.

(Cassandra may not be able to find your data.)


Page 104: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Changing the Replication Strategy

Update Keyspace configuration to use NetworkTopologyStrategy with datacenter1:3 and new_dc:0.
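In CQL this might look like the following (the keyspace name is a placeholder; clusters still on cassandra-cli need the equivalent update there instead):

```sql
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'datacenter1': 3,
  'new_dc': 0
};
```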


Page 105: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Preparing The Client

Disable auto node discovery or use DC-aware methods.

Use LOCAL_QUORUM or EACH_QUORUM.
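For example, in cqlsh the session consistency level can be set explicitly (driver APIs have equivalent per-request settings):

```sql
CONSISTENCY LOCAL_QUORUM;
```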


Page 106: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Expand to Multi DC

Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild


Page 107: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Configuring New Nodes

Add auto_bootstrap: false to cassandra.yaml.

Use GossipingPropertyFileSnitch.

Three Seeds from each DC.

(Use cluster_name as a safety.)
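A sketch of the relevant cassandra.yaml settings on a new node (the cluster name and seed IPs are placeholders):

```yaml
cluster_name: 'MyCluster'     # must match the existing cluster exactly
auto_bootstrap: false         # join the ring without streaming data yet
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # three seeds from each DC
          - seeds: "10.0.1.1,10.0.1.2,10.0.1.3,10.0.2.1,10.0.2.2,10.0.2.3"
```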


Page 108: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Configuring New Nodes

Update cassandra-rackdc.properties on new nodes with the new DC/Rack for the node.

(Ignore cassandra-topology.properties.)
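For example, on a new node (the rack name is illustrative):

```properties
# cassandra-rackdc.properties
dc=new_dc
rack=rack1
```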


Page 109: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Start The New Nodes

New Nodes are in the Ring in the new DC, without data or traffic.


Page 110: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Expand to Multi DC

Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild


Page 111: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Change the Replication Factor

Update Keyspace configuration to use NetworkTopologyStrategy with datacenter1:3 and new_dc:3.
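Again in CQL (the keyspace name is a placeholder):

```sql
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'datacenter1': 3,
  'new_dc': 3
};
```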


Page 112: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Change the Replication Factor

New DC nodes will start receiving writes from old DC coordinators.


Page 113: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Expand to Multi DC

Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild


Page 114: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Y U No Bootstrap?

DC 1 DC 2


Page 115: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

nodetool rebuild DC1

DC 1 DC 2
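Run on each node in the new DC, naming the source DC to stream from (the DC name must match what the snitch reports; rebuilding a few nodes at a time limits load on the source DC):

```shell
nodetool rebuild DC1
```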


Page 116: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Rebuild Complete

New Nodes now performing Strong Consistency reads.

(If EACH_QUORUM is used for writes.)


Page 117: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Summary

Relax.

Understand the Platform and the Tools.

Always maintain Availability.


Page 118: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Thanks.


Page 119: Cassandra Community Webinar August 29th 2013 - In Case Of Emergency, Break Glass

Aaron Morton
@aaronmorton

Co-Founder & Principal Consultant
www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License