The Big Data Revolution is an Evolution
![Page 1: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/1.jpg)
Eric Lubow
@elubow
The Big Data Revolution is an Evolution
![Page 2: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/2.jpg)
Big Data Revolution is an Evolution
Eric Lubow @elubow #NYCassandra2013
Overview
• Evolution
• SimpleReach
• Data Stores / Languages
• Architecture Implementation
![Page 3: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/3.jpg)
We're in the midst of an evolution, not a revolution.
![Page 4: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/4.jpg)
The 2 Truths
![Page 5: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/5.jpg)
Even with the right tools, 80% of the work of building a big data system is acquiring and refining
The Real Truth
![Page 6: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/6.jpg)
30m plays/day + 4m user ratings + 75k movies metadata + 24.4m users metadata =
David Fincher + Kevin Spacey + British House of Cards
Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development
![Page 7: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/7.jpg)
BRING IT TOGETHER
![Page 8: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/8.jpg)
evolution
revolution
Insufficient Capabilities
Scale/Need Changes
Development & Integration
New Products
![Page 9: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/9.jpg)
![Page 10: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/10.jpg)
![Page 11: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/11.jpg)
SimpleReach
• Millions of URLs per day
• Over 1 billion pageviews per month
• 250m events per day (~3k events/second)
• Auto-scale 90-130 machines depending on traffic
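As a quick sanity check on the figures above, 250 million events per day does work out to roughly 3k per second. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and not from the talk, and the calculation assumes traffic is spread evenly over the day, which real traffic is not.

```python
# Back-of-the-envelope check of the "250m events per day (~3k events/second)" figure.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_second(events_per_day: int) -> float:
    """Average event rate, assuming an even spread over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


average_rate = events_per_second(250_000_000)
print(f"{average_rate:,.0f} events/second on average")
```

Peak load will be higher than this average, which is presumably why the machine count auto-scales with traffic.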
![Page 12: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/12.jpg)
HUMBLE BEGINNINGS
![Page 13: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/13.jpg)
Scale
![Page 14: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/14.jpg)
AND THEN...
C*
![Page 15: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/15.jpg)
Cassandra C*
• Large data volume ingestion at high velocity
• Really fast writes to many locations (eventual consistency)
• Query by column groups within rows (slicing)
• TTLs for small group aggregation
• Wrote Helenus, a Node.js driver for Cassandra
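To make the slicing bullet concrete, here is a minimal in-memory sketch of a wide row whose columns are kept sorted by name, so a contiguous range of columns can be read in one pass. This is a pure-Python illustration of the idea only; it is not Cassandra's API or the Helenus driver, and the `WideRow` class and the `hour-NN` column names are hypothetical.

```python
from bisect import bisect_left, bisect_right, insort


class WideRow:
    """Toy model of one wide row: column names are kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def put(self, name, value):
        # Only insert the name once; overwriting a column keeps its position.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in column order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


# Hypothetical row: pageview counts for one URL, one column per hour.
row = WideRow()
for column, pageviews in [("hour-13", 88), ("hour-10", 120),
                          ("hour-12", 143), ("hour-11", 95)]:
    row.put(column, pageviews)

late_morning = row.slice("hour-11", "hour-12")
print(late_morning)
```

Because the columns are already ordered, a slice is a range lookup rather than a scan of the whole row, which is what makes this access pattern attractive for time-bucketed data.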
![Page 16: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/16.jpg)
MongoDB
• Fast atomic increments (Node.js is native JSON)
• Sharding
• Solid ORM for Rails (Mongoid)
• B-Tree Indexes
• Document based via JSON
• TTLs for ephemeral data
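TTLs appear on both the Cassandra and MongoDB slides. The sketch below shows the general idea with a small in-memory store that treats an entry as gone once its time-to-live has passed. It is an illustration only: real databases expire data on the server side, and the `TTLStore` class, the key name, and the injectable clock are all hypothetical.

```python
import time


class TTLStore:
    """Toy key-value store where each entry carries an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._items = {}     # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazily drop the entry the first time it is read after expiring.
            del self._items[key]
            return default
        return value


# Demonstration with a fake clock (seconds), so no real waiting is needed.
now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.set("session:abc", {"views": 3}, ttl_seconds=60)

fresh = store.get("session:abc")    # read before the TTL elapses
now[0] = 61.0
expired = store.get("session:abc")  # read after the TTL elapses
print(fresh, expired)
```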
![Page 17: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/17.jpg)
Redis
• Supports hundreds of thousands of transactions per second
• Great caching engine
• Supports useful variable types like sets, sorted sets, and lists
• Everything is guaranteed to be Memory Mapped
![Page 18: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/18.jpg)
Infobright
• Works with the standard MySQL driver
• Column store for ad-hoc analytics queries in SQL
• Heavy compression of data (avg 12:1)
![Page 19: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/19.jpg)
The c0dez
• Polyglottany doesn’t only apply to data stores
• Each language has its own benefit to each stack layer
• Each language has its own individual benefits
• Each language has its own development benefits
![Page 20: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/20.jpg)
![Page 21: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/21.jpg)
Cons
• Redis - Can only utilize a single core. SerDe price.
• Infobright - DELETE/UPDATEs are VERY expensive
• Cassandra - No B-tree indexes or probabilistic counters
• Mongo - Indexes must fit in memory. Forced Replica ping times
• Python - Whitespace. Community
• Ruby - Not high performance enough for our standards
![Page 22: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/22.jpg)
Evolution Takes Work
• Service Oriented Architecture (Internal API)
• Data accuracy checks: visual and programmatic
• Built a framework for testing out engines (Storage, Queueing, etc.)
• Access to many toolsets (for all languages, DBs, Engines)
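One way to read the bullets above is that an internal API hides which storage engine sits behind it, so engines can be swapped and checked against the same tests. The sketch below illustrates that idea under stated assumptions: the `StorageEngine` interface, `InMemoryEngine`, `InternalAPI`, and `check_round_trip` are hypothetical names for illustration, not the framework described in the talk.

```python
from typing import Dict, Optional, Protocol


class StorageEngine(Protocol):
    """Minimal contract an engine must satisfy to sit behind the internal API."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class InMemoryEngine:
    """Stand-in engine; a real one would wrap a database client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class InternalAPI:
    """Callers use this service and never talk to an engine directly."""

    def __init__(self, engine: StorageEngine) -> None:
        self._engine = engine

    def record_title(self, url: str, title: str) -> None:
        self._engine.put(f"title:{url}", title)

    def title_for(self, url: str) -> Optional[str]:
        return self._engine.get(f"title:{url}")


def check_round_trip(engine: StorageEngine) -> bool:
    """Programmatic accuracy check: a written value must read back unchanged."""
    engine.put("probe", "42")
    return engine.get("probe") == "42"


api = InternalAPI(InMemoryEngine())
api.record_title("/story/1", "Hello")
engine_ok = check_round_trip(InMemoryEngine())
print(api.title_for("/story/1"), engine_ok)
```

Trying out a new engine then means writing one adapter class and running the same checks against it, without touching any caller of the internal API.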
![Page 23: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/23.jpg)
Service
Internal API
Solr
Real-time
C*
C*
![Page 24: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/24.jpg)
Path of a Packet
[Diagram labels: Internet, EP, Internal API, Solr, C*, Mongo, Redis, IB, API, Fire Hose, SC, Consumers, Queue]
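The diagram's arrows did not survive extraction, so the exact flow is not recoverable. The sketch below assumes one common arrangement of the labelled parts: an endpoint puts incoming events on a queue, and a pool of consumers drains the queue and writes aggregates to a data store. The `run_pipeline` function, the event shape, and the in-memory counter standing in for a data store are all assumptions made for illustration.

```python
import queue
import threading
from collections import Counter


def run_pipeline(raw_events, worker_count=2):
    """Push events through a queue to consumer threads that count hits per URL."""
    events = queue.Queue()
    counts = Counter()       # stands in for a data store
    lock = threading.Lock()  # protects the shared counter

    def consume():
        while True:
            event = events.get()
            if event is None:  # sentinel: no more work for this consumer
                break
            with lock:
                counts[event["url"]] += 1

    workers = [threading.Thread(target=consume) for _ in range(worker_count)]
    for worker in workers:
        worker.start()

    # The "endpoint" side: enqueue each incoming event.
    for event in raw_events:
        events.put(event)

    # One sentinel per consumer so every thread shuts down cleanly.
    for _ in workers:
        events.put(None)
    for worker in workers:
        worker.join()
    return dict(counts)


sample_events = [{"url": "/a"}, {"url": "/b"}, {"url": "/a"}]
totals = run_pipeline(sample_events)
print(totals)
```

The queue decouples ingestion from storage, so a slow write does not block the endpoint, and consumers can be added or removed as traffic changes.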
![Page 25: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/25.jpg)
Architecture Distribution
US-EAST-1a
MONGO-SHARD-0001-B
MONGO-SHARD-0000-A
CASSANDRA-0001
CASSANDRA-0010
REDIS-0001A
INFOBRIGHT-0001
iAPI-0001
US-EAST-1b
MONGO-SHARD-0002-B
MONGO-SHARD-0001-A
CASSANDRA-0002
CASSANDRA-0011
REDIS-0001B
iAPI-0002
US-EAST-1e
MONGO-SHARD-0002-A
MONGO-SHARD-0000-B
CASSANDRA-0003
CASSANDRA-0012
INFOBRIGHT-0002
iAPI-0003
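The placement above can be checked mechanically. The sketch below copies the host names from the slide and verifies two properties: every service runs in at least two availability zones, and the A and B members of each Mongo shard sit in different zones. The helper names, and the assumption that the host-name prefix identifies the service, are illustrative.

```python
from collections import defaultdict

# Host placement as listed on the slide.
PLACEMENT = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "INFOBRIGHT-0001", "iAPI-0001"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "INFOBRIGHT-0002", "iAPI-0003"],
}


def zones_by_group(placement, group_of):
    """Map each group (service or shard) to the set of zones it appears in."""
    groups = defaultdict(set)
    for zone, hosts in placement.items():
        for host in hosts:
            group = group_of(host)
            if group is not None:
                groups[group].add(zone)
    return dict(groups)


def service_of(host):
    # Assumption: the text before the first dash names the service.
    return host.split("-")[0]


def mongo_shard_of(host):
    # "MONGO-SHARD-0001-B" -> "0001"; other hosts are ignored.
    parts = host.split("-")
    return parts[2] if parts[:2] == ["MONGO", "SHARD"] else None


services = zones_by_group(PLACEMENT, service_of)
shards = zones_by_group(PLACEMENT, mongo_shard_of)

single_zone_services = sorted(s for s, zones in services.items() if len(zones) < 2)
colocated_shards = sorted(s for s, zones in shards.items() if len(zones) < 2)
print(single_zone_services, colocated_shards)
```

Both lists come back empty for this layout, meaning the loss of any single zone leaves at least one node of every service, and one member of every Mongo shard, still running.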
![Page 26: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/26.jpg)
The Schrute of the Problem
![Page 27: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/27.jpg)
Evolving Amazon Tools
• Full Featured API
• Simple Queue Service
• Data Pipelining
• OpsWorks
• CloudFormation
• Redshift Analytics
• CloudSearch
• Elastic Beanstalk
• Elastic MapReduce
• Simple Workflow Service
• S3 / Glacier
![Page 28: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/28.jpg)
DevOps Wizardry
• Extensive use of AWS
• Monitor: Nagios, StatsD, and Graphite
• Manage: Chef, OpsWorks, csshX
• Deployments
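For the monitoring bullet, the sketch below shows how an application might emit StatsD metrics, which a StatsD daemon can then forward to Graphite. StatsD accepts plain-text lines of the form `name:value|type` over UDP, by default on port 8125. The metric names, the helper functions, and the localhost target are assumptions for illustration, not details from the talk.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter line, e.g. b'events.ingested:250|c'."""
    return f"{name}:{value}|c".encode("ascii")


def format_timing(name: str, milliseconds: int) -> bytes:
    """Build a StatsD timer line, e.g. b'api.latency:12|ms'."""
    return f"{name}:{milliseconds}|ms".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; assumes a StatsD daemon listens on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names; the packets are only built here, not sent.
counter_packet = format_counter("events.ingested", 250)
timing_packet = format_timing("api.latency", 12)
print(counter_packet, timing_packet)
```

Because the transport is UDP, a missing or slow metrics daemon does not slow the application down, at the cost of occasionally dropped samples.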
![Page 29: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/29.jpg)
Summary
• Solutions Require Evolution
• Build, Use, and Integrate Tools
• Abstraction
• Distribution
• Monitoring & Automation
![Page 30: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/30.jpg)
A revolution only lasts fifteen years, a period which coincides with the
Evolution Takes Time
![Page 31: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/31.jpg)
We’re (Ask us about Food Coma Fridays)
![Page 32: The Big Data Revolution is an Evolution](https://reader034.fdocuments.us/reader034/viewer/2022042713/548277825806b51a058b46c9/html5/thumbnails/32.jpg)
Questions are guaranteed in life. Answers aren’t.
Eric Lubow
@elubow
Thank you.