C* Summit 2013: Dude, Where's My Tweet? Taming the Twitter Firehose by Andrew Noonan

Transcript
Page 1

#Cassandra2013

Dude, Where’s My Tweet? Taming the Twitter Firehose

Andrew Noonan Software Engineer at Gnip

@noonanisms

Page 2

Gnip

Cassandra

Rainbows

Unicorns

???  

Page 3

Page 4

Social Data

Page 5

90% of Fortune 500

120 Billion Activities Delivered Per Month

Page 6

Lots-O-Data

Redundancy & Reliability

Availability

Page 7

Page 8

High Write Throughput ✔

Scalable ✔

Highly Available ✔

Persistent ✔

Page 9

Right?

Page 10

Not Exactly…

Page 11

No Maintenance? Bad Idea

Begin Maintenance -> 2X Data Growth

Scalable, Right?

Bootstrap Failures Due To Cluster Load

Page 12

Reconsider (Life) Choices?

Page 13

Size Tiered Compaction vs Leveled Compaction

How Much Data To Store Per Node

Your Write Pattern Matters Too
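
To make the size-tiered side of that comparison concrete, here is a minimal sketch of how size-tiered compaction groups SSTables of similar size into buckets and picks a bucket once it holds enough tables. It is a simplification for illustration, not Cassandra's actual code: the function names are hypothetical, the sizes are invented, and the bucket_low, bucket_high and min_threshold values (0.5, 1.5, 4) are assumed to mirror the commonly documented defaults.

```python
def size_tiered_buckets(sstable_sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar' size (simplified sketch).

    A table joins the first bucket whose current average size puts it within
    [avg * bucket_low, avg * bucket_high]; otherwise it starts a new bucket.
    """
    buckets = []
    for size in sorted(sstable_sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket was close enough in size.
            buckets.append([size])
    return buckets


def compaction_candidates(sstable_sizes, min_threshold=4):
    """Return the buckets that hold at least min_threshold SSTables."""
    return [b for b in size_tiered_buckets(sstable_sizes) if len(b) >= min_threshold]


# Hypothetical SSTable sizes in MB: four small flushes plus two older, larger tables.
sizes_mb = [10, 11, 9, 12, 100, 400]
print(size_tiered_buckets(sizes_mb))
print(compaction_candidates(sizes_mb))
```

In this sketch only the four small tables are eligible to be merged; the larger tables wait until peers of similar size accumulate, which is one reason the write pattern and the amount of data per node affect how much compaction work and temporary disk space a node needs.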

Page 14

compaction_throughput_mb_per_sec

16-32X write rate?

Lots-o-options – explore them
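
The "16-32X write rate" rule of thumb above is simple arithmetic, sketched below. The slide itself poses the multiplier as a question, so treat the result as a starting range to test rather than a recommendation. The function name and the 2 MB/s write rate are hypothetical.

```python
def compaction_throughput_range(write_mb_per_sec, low_factor=16, high_factor=32):
    """Return a (low, high) range in MB/s to try for compaction_throughput_mb_per_sec.

    Applies the 16-32X multiplier from the slide to a measured write rate.
    """
    if write_mb_per_sec < 0:
        raise ValueError("write rate must be non-negative")
    return (write_mb_per_sec * low_factor, write_mb_per_sec * high_factor)


# Hypothetical node sustaining 2 MB/s of incoming writes.
low, high = compaction_throughput_range(2.0)
print(f"compaction_throughput_mb_per_sec: try {low:.0f} to {high:.0f}")
```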

Page 15

Lookup by Tweet ID

Read Rate < Write Rate

Dynamic ColumnFamilies

Page 16

For realz this time!?

Page 17

Page 18

Bloom Filter False Positive Rate

Index Intervals

Only Change Schema On One Node! (For Now)
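
As a rough illustration of why the Bloom filter false positive rate is worth tuning, the sketch below uses the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2. Cassandra's own implementation may size its filters differently, and the function names and the one-billion-key figure are hypothetical, so the numbers are only indicative of the memory trade-off.

```python
import math


def bloom_bits_per_key(fp_rate):
    """Bits per key for an optimally sized Bloom filter with the given false positive rate."""
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be strictly between 0 and 1")
    return -math.log(fp_rate) / (math.log(2) ** 2)


def bloom_size_mb(num_keys, fp_rate):
    """Approximate filter size in MiB for num_keys keys (textbook estimate)."""
    return num_keys * bloom_bits_per_key(fp_rate) / 8 / (1024 * 1024)


# Hypothetical node holding one billion row keys.
for p in (0.01, 0.1):
    print(f"fp_rate={p}: {bloom_bits_per_key(p):.2f} bits/key, "
          f"{bloom_size_mb(1_000_000_000, p):.0f} MiB")
```

Under this formula, relaxing the false positive rate from 0.01 to 0.1 halves the filter's memory, at the cost of more unnecessary SSTable reads. That trade can be acceptable when the read rate is well below the write rate.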

Page 19

You Won’t Always Fit The Mold and That’s Okay

Explore Your Options No Matter What

Understand The Consequences Of Your Choices

Staging Environment Identical To Production

Page 20

www.gnip.com

@noonanisms