State of Cassandra, August 2010

Post on 01-Nov-2014

Keynote from the 2010 Cassandra Summit

Professional Cassandra support and services

Tuesday, August 10, 2010

Cassandra: Present & Future

Jonathan Ellis

@spyced

Cassandra 0.6 & 0.7

Jonathan Ellis

@spyced

Quiet change of policy

• 0.5.1 was bug fixes only

• Too early to be strict about bugfix-only policy in stable branch, especially w/ 0.7 being longer/more break-y

• Maybe after 1.0?

[Chart: mails sent per month, January to July, y-axis from 0 to 1500. Months are labeled with releases: Jan (0.5), Feb (0.5.1), Mar, Apr (0.6, 0.6.1), May (0.6.2), Jun (0.6.3), Jul (0.6.4)]

Lots of bug fixes

• 85 issues marked Resolved/Fixed in 0.6 branch after 0.6 released

Runtime configuration

• concurrent reads, writes (0.6.2)

• making it easier to bandage your foot after you shoot it

• PhiConvictThreshold (0.6.2)
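PhiConvictThreshold is the cutoff used by the phi accrual failure detector: a node is convicted once its suspicion level phi exceeds the threshold. The sketch below shows the idea only. It assumes heartbeat inter-arrival times follow an exponential distribution, and the function names are hypothetical, not Cassandra's.

```python
import math

def phi(time_since_last_heartbeat, mean_interval):
    """Suspicion level under an ASSUMED exponential inter-arrival model.

    P(next heartbeat arrives later than t) = exp(-t / mean), and
    phi = -log10(P), which simplifies to t / (mean * ln 10).
    The simplified form avoids underflow for very long silences.
    """
    return time_since_last_heartbeat / (mean_interval * math.log(10))

def is_convicted(time_since_last_heartbeat, mean_interval, phi_convict_threshold):
    """True when the node has been silent long enough to be marked down."""
    return phi(time_since_last_heartbeat, mean_interval) > phi_convict_threshold
```

Raising the threshold makes the detector slower to convict, which tolerates longer pauses at the cost of noticing real failures later.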

Performance

• JVM GC defaults (0.6.2)

• Faster commitlog (0.6.2)

• Faster range slice, Hadoop jobs (0.6.1, 2)

• Better parallelization of multiget (0.6.4)

• UTF8Type, UUIDType optimizations (0.6.5)

Bulletproofing

• HH disable (0.6.2)

• compaction priority (0.6.3)

• HH hourly scan (0.6.3)

• JMX metrics for row-level bloom filters (0.6.3)

• Flow control (0.6.4, 5)

• HH paging (0.6.5)

• Dynamic snitch (0.6.5)

Hinted Handoff

• 0.6.0: send hints to natural replicas

• 0.6.0: fix row-level concurrency bottleneck

• 0.6.2: option to disable entirely

• 0.6.3: remove hourly scan

• 0.6.4: lower priority

• 0.6.5: paging of large hinted rows

• 0.7.0: large rows
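As a reminder of what these changes operate on, here is a minimal sketch of hinted handoff: a write meant for a dead replica is kept as a hint and replayed, one page at a time, when the replica comes back. The class, its method names, and the page size are hypothetical, and the sketch does not model where Cassandra actually stores hints.

```python
from collections import defaultdict

class HintStore:
    """Toy model: hints are kept per target node until it is reachable."""

    def __init__(self):
        self.hints = defaultdict(list)  # target node -> pending mutations

    def write(self, replicas, live_nodes, mutation, deliver):
        """Deliver to live replicas; store a hint for each dead one."""
        for node in replicas:
            if node in live_nodes:
                deliver(node, mutation)
            else:
                self.hints[node].append(mutation)

    def replay(self, node, deliver, page_size=2):
        """Replay hints for a recovered node in pages; return the page count."""
        pending = self.hints.pop(node, [])
        pages = 0
        for start in range(0, len(pending), page_size):
            for mutation in pending[start:start + page_size]:
                deliver(node, mutation)
            pages += 1
        return pages
```

Paging bounds how many hints are handled at once, which is the concern behind the 0.6.5 item above.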

Why keep HH around?

https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/

Compaction priority

-XX:+UseThreadPriorities \
-XX:ThreadPriorityPolicy=42 \
-Dcassandra.compaction.priority=1 \

Extended to HH in 0.6.4

http://www.javamex.com/tutorials/threads/priority_what.shtml

JMX for bloom filters

• o.a.c.db:ColumnFamilyStores

• getBloomFilterFalsePositives

• [not in nodetool yet]
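To make the metric concrete: a false positive is a read where the bloom filter says "maybe present" but the key turns out not to be in the table, so the lookup was wasted. The sketch below counts those cases. It is a toy filter with hypothetical names and sizes, not Cassandra's implementation or its JMX interface.

```python
import hashlib

class BloomFilter:
    """Small bloom filter backed by an integer bitset."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class TrackedTable:
    """Key set guarded by a filter, with a false-positive counter."""

    def __init__(self, keys):
        self.keys = set(keys)
        self.filter = BloomFilter()
        for key in self.keys:
            self.filter.add(key)
        self.false_positives = 0

    def read(self, key):
        if not self.filter.might_contain(key):
            return False  # filter rules the key out; no lookup needed
        if key in self.keys:
            return True
        self.false_positives += 1  # filter said "maybe", lookup found nothing
        return False
```

A counter that climbs quickly relative to total reads suggests the filter is too small for the number of keys it covers.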

Flow control in 0.5

• Why backpressure doesn’t fit Cassandra

Flow Control in 0.6.4

• Replica nodes drop hopeless requests on the floor

• Coordinator node is unaffected

• TimedOutException signals client to back off

• Requires enough memory to buffer RPCTimeout’s worth of requests

• (In the short term, you’re still screwed)
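The behavior described above can be sketched as a replica-side queue that discards any request that has already waited longer than the RPC timeout, since the coordinator will have given up on it. The class and function names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from collections import deque

class ReplicaQueue:
    """Toy replica queue that drops requests older than the RPC timeout."""

    def __init__(self, rpc_timeout):
        self.rpc_timeout = rpc_timeout
        self.pending = deque()  # entries are (enqueued_at, request)
        self.dropped = 0

    def enqueue(self, request, now):
        self.pending.append((now, request))

    def process_next(self, now):
        """Return the next request still worth doing, or None if none is left."""
        while self.pending:
            enqueued_at, request = self.pending.popleft()
            if now - enqueued_at > self.rpc_timeout:
                self.dropped += 1  # hopeless: the client has already timed out
                continue
            return request
        return None

def buffer_bytes_needed(requests_per_second, rpc_timeout_seconds, bytes_per_request):
    """Rough memory needed to hold one RPC timeout's worth of requests."""
    return requests_per_second * rpc_timeout_seconds * bytes_per_request
```

For example, 1,000 requests per second of 1 KiB each with a 10 second timeout needs about 10 MB of buffer, which is the memory requirement the slide refers to.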

Flow Control, 0.6.4

[Diagram: boxes for IncomingTcpConnection, Message Deserializer, Mutation, and Read; labels: Uncapped, Capped at 4096]

[Diagram: boxes for IncomingTcpConnection, Message Deserializer, Mutation, Read, and Gossip]

Flow Control, 0.6.5

[Diagram: boxes for IncomingTcpConnection, Mutation, Read, and Gossip; label: Uncapped]

Dynamic snitch

• sortByProximity
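A dynamic snitch reorders replicas by how they have actually been performing rather than by static topology alone. The sketch below is one simple way to do that, assuming the score is the mean of recent read latencies. The class name, the scoring rule, and the window size are assumptions for illustration.

```python
from collections import defaultdict, deque

class LatencySnitch:
    """Toy snitch that prefers endpoints with lower recent latency."""

    def __init__(self, window=100):
        # Keep only the most recent `window` samples per endpoint.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, endpoint, latency_ms):
        self.samples[endpoint].append(latency_ms)

    def score(self, endpoint):
        """Mean recent latency; endpoints with no samples score 0.0."""
        recent = self.samples.get(endpoint)
        if not recent:
            return 0.0
        return sum(recent) / len(recent)

    def sort_by_proximity(self, endpoints):
        """Return endpoints ordered from best (lowest score) to worst."""
        return sorted(endpoints, key=self.score)
```

Scoring unmeasured endpoints at 0.0 sends them some traffic so they get measured; a real implementation may choose differently.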

Open problems

• Linux/mmap/swap unholy trio (0.6.5)

• Memory fragmentation (0.6.5?)

• Compaction effect on caches (0.7.1?)

mmap and swap

• The problem

• Mitigations

• mmap_index_only

• swappiness=0

• turn off swap

• mlockall at startup (Xms=Xmx)

GC Fragmentation

• Culprit of infamous CASSANDRA-1014?

• Mitigation: tune with much larger new generation / tenuring threshold?

Compaction and caches

• Compaction wrecks the OS fs cache

• Wrecks Cassandra key cache, too

• (but not row cache)

0.7

New in 0.7

• live schema changes

• large rows

• secondary indexes

• efficient Streaming

• DatacenterStrategy

Large rows

• 0.6: smaller of {2GB, memory limit}

• 0.7: in_memory_compaction_limit_in_mb

Secondary indexes

Streaming in 0.6

[Diagram: ring of nodes A, F, L, T, W; range label (A-L]]

[Diagram: ring of nodes A, F, L, T, W; range labels (A-F], (F-L], (A-F]]
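The range labels on the ring slides, such as (A-L], (A-F], and (F-L], are half-open intervals: each node owns the tokens after its predecessor's token up to and including its own. The sketch below shows how ownership changes when F joins between A and L. Single letters stand in for tokens as on the slides, and the function name is hypothetical.

```python
import bisect

def owner(tokens, key_token):
    """Return the node token that owns key_token.

    `tokens` must be sorted. A node with token t owns the range
    (previous token, t], wrapping around at the end of the ring.
    """
    idx = bisect.bisect_left(tokens, key_token)
    return tokens[idx % len(tokens)]

# Before F joins, L owns (A-L]; afterwards the range splits into
# (A-F], owned by F, and (F-L], still owned by L.
before = ["A", "L", "T", "W"]
after = ["A", "F", "L", "T", "W"]
```

Only keys in (A-F] change owner, so that is the only data that has to be streamed to the new node.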

[Diagram: ring of nodes A, F, L, T, W; labels: Data, Index, Filter]

Streaming in 0.7

[Diagram: ring of nodes A, F, L, T, W; labels: Index, Filter]

DatacenterStrategy

• RackAwareStrategy is tuned for 3 replicas and 2 data centers

• DS allows configuring replicas per data center, per Keyspace
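Per-data-center replica counts can be pictured as walking the ring from the key's position and taking nodes until each data center has its quota. The sketch below is an illustration under that assumption only. It ignores racks and any other rules the real strategy applies, and all names are hypothetical.

```python
def place_replicas(ring, start_index, replicas_per_dc):
    """Pick replicas by walking the ring clockwise from start_index.

    ring: list of (node, datacenter) tuples in token order.
    replicas_per_dc: mapping of datacenter name to replica count.
    """
    remaining = dict(replicas_per_dc)
    chosen = []
    for step in range(len(ring)):
        node, dc = ring[(start_index + step) % len(ring)]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen

# Hypothetical per-keyspace settings: each keyspace has its own counts.
keyspace_replicas = {
    "Keyspace1": {"dc1": 2, "dc2": 1},
    "Keyspace2": {"dc1": 1, "dc2": 1},
}
```

Keeping the counts in a per-keyspace mapping is what lets one keyspace be replicated more heavily in a data center than another.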

Minor features in 0.7

• read_repair_chance

• per-keyspace request scheduling

• Hadoop OutputFormat

• Per-CF settings that used to be global (gc_grace_seconds, memtable thresholds)
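Of the features above, read_repair_chance is the easiest to picture: it is a probability between 0 and 1 that a given read also triggers read repair. A minimal sketch of that decision follows; the function name and the injectable random source are hypothetical.

```python
import random

def should_read_repair(read_repair_chance, rng=random):
    """Decide whether this read should also trigger read repair.

    rng.random() is in [0.0, 1.0), so a chance of 0.0 never repairs
    and a chance of 1.0 always does.
    """
    return rng.random() < read_repair_chance
```

Lowering the value trades some consistency repair work for less read load on the replicas.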

0.7 API changes

• String keys become byte[]

• Thrift keyspace argument moved to set_keyspace

• i64 timestamp becomes Clock

• SlicePredicate for _count methods

0.7 performance

• Reads roughly 100% faster, thanks largely to removing String creation

• Row-cached reads up to 8x faster after optimizations by tjake and jbellis

• Optimizations for reads of large rows

• 0.7.1? ~20% improvement everywhere from Thrift optimizations

Thrift

• OOMs on malformed packets

• Python Unicode string issues

• PHP support is buggy and maintainerless

After 0.7.0

• IndexOperator.GT

• Triggers / plugins

• Avro?

• On-disk data format improvements (compression, hierarchical data?)

• Auth

Questions
