HPTS 2011: The NoSQL Ecosystem

The NoSQL Ecosystem Adam Marcus MIT CSAIL [email protected] / @marcua

My talk for HPTS 2011, summarizing the NoSQL ecosystem and what systems builders can learn from NoSQL's success.

Transcript of HPTS 2011: The NoSQL Ecosystem

Page 1: HPTS 2011: The NoSQL Ecosystem

The NoSQL Ecosystem

Adam Marcus, MIT CSAIL

[email protected] / @marcua

Page 2: HPTS 2011: The NoSQL Ecosystem

About Me

● Social Computing + Database Systems
● Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications [1]

[1] http://www.aosabook.org/en/nosql.html

Pages 3-9: HPTS 2011: The NoSQL Ecosystem

History, Compressed

[Timeline figure, built up over several slides, running from the late 1990s to today. Labels in order of appearance: Buy RAM, Wrap RDBMSs, NoSQL, Not Only SQL, Commercialize.]

Page 12: HPTS 2011: The NoSQL Ecosystem

The List, So You Don't Yell at Me

Cassandra
HBase
Voldemort
Riak
Redis
MongoDB
HyperTable
Neo4j
HyperGraphDB
DEX
InfoGrid
VertexDB
Sones
CouchDB
AllegroGraph
BerkeleyDB
Oracle NoSQL
MemcacheDB
Tokyo Cabinet
FlockDB

Marcus' Law of Databases: The number of persistence options doubles every 1.5 years

Corollary: I'm sorry I missed your persistence option

Page 13: HPTS 2011: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 15: HPTS 2011: The NoSQL Ecosystem

Conventional Wisdom on NoSQL

● Key-based data model
● Sloppy schema
● Single-key transactions
● In-app joins
● Eventually consistent
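A note on "in-app joins": when the store only offers lookups by key, the application fetches each side itself and stitches the records together. A minimal sketch, assuming plain Python dicts as stand-ins for two key-value collections (the names `users`, `posts`, and `posts_with_authors` are hypothetical and not from the talk):

```python
# Two hypothetical key-value collections, modeled as dicts keyed by ID.
users = {
    "u1": {"name": "Ada"},
    "u2": {"name": "Grace"},
}
posts = {
    "p1": {"author": "u1", "title": "Hello"},
    "p2": {"author": "u2", "title": "World"},
    "p3": {"author": "u9", "title": "Orphan"},  # author record is missing
}


def posts_with_authors(post_ids, posts, users):
    """Join posts to their authors in application code, one key lookup per side."""
    joined = []
    for post_id in post_ids:
        post = posts.get(post_id)
        if post is None:
            continue  # unknown post ID: skip it
        author = users.get(post["author"], {})
        joined.append({"title": post["title"], "author_name": author.get("name")})
    return joined


result = posts_with_authors(["p1", "p3", "missing"], posts, users)
```

The join logic, including what to do about a dangling reference such as `u9`, lives in the application rather than in the data store.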

Page 16: HPTS 2011: The NoSQL Ecosystem

Exceptions are More Interesting

● Data Model
● Query Model
● Transactions
● Consistency

Page 18: HPTS 2011: The NoSQL Ecosystem

Data Model

● Usually key-based...
  ● Column-families
  ● Documents
  ● Data structures
● ...but not always
  ● Graph stores

Page 20: HPTS 2011: The NoSQL Ecosystem

Query Model

● Redis: data structure-specific operations
● CouchDB, Riak: MapReduce
● Cassandra, MongoDB: SQL-like languages, no joins or transactions
● Third-party
  ● High-level: PigLatin, HiveQL
  ● Library: Cascading, Crunch
  ● Streaming: Flume, Kafka, S4, Scribe
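For readers unfamiliar with the MapReduce query style mentioned above, here is a minimal in-memory sketch of the map/group/reduce contract. It is not the CouchDB or Riak API; the function names and the sample `checkins` documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def emit_city(doc):
    """Map step: emit one (city, 1) pair per document."""
    yield doc["city"], 1


def count(key, values):
    """Reduce step: sum the values emitted for a key."""
    return sum(values)


# Hypothetical documents, not data from the talk.
checkins = [
    {"user": "u1", "city": "Boston"},
    {"user": "u2", "city": "Boston"},
    {"user": "u1", "city": "Monterey"},
]
checkins_per_city = map_reduce(checkins, emit_city, count)
```

In a real store the map and reduce functions run next to the data on each node; this sketch only shows the shape of the query.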

Page 21: HPTS 2011: The NoSQL Ecosystem

Transactions

● Full ACID for single key
● Redis: multi-key single-node transactions

Page 24: HPTS 2011: The NoSQL Ecosystem

Consistency

● Strong: Appears that all replicas see all writes
● Eventual: Replicas may have different, divergent versions
● Dynamo: strong or eventual (quorum size)
● PNUTS: Timeline consistency
● ...many consistency models!

See: http://www.allthingsdistributed.com/2007/12/eventually_consistent.html
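The "strong or eventual (quorum size)" point can be made concrete with the usual Dynamo-style rule: with N replicas, a read quorum of R and a write quorum of W are guaranteed to share at least one replica when R + W > N. The sketch below, with hypothetical function names, states the rule and checks it by brute force over small replica sets.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum must intersect every write quorum (R + W > N)."""
    return r + w > n


def stale_read_possible(n: int, r: int, w: int) -> bool:
    """Brute force: is there a write set and a read set with no replica in common?"""
    replicas = range(n)
    return any(
        set(write_set).isdisjoint(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# With 3 replicas: quorum 2 for reads and writes always overlaps, quorum 1 does not.
overlap_with_quorum_2 = quorums_overlap(3, 2, 2)
overlap_with_quorum_1 = quorums_overlap(3, 1, 1)
```

When the quorums do not overlap, a read can land entirely on replicas that have not yet seen the latest write, which is the eventual-consistency case.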

Page 25: HPTS 2011: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 26: HPTS 2011: The NoSQL Ecosystem

NoSQL Use-cases

● Cassandra● HBase● MongoDB

Page 27: HPTS 2011: The NoSQL Ecosystem

Cassandra

● BigTable data model: key → column family
● Dynamo sharding model: consistent hashing
● Eventual or strong consistency
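As a rough illustration of consistent hashing, the sketch below places nodes on a hash ring and assigns each key to the first node position at or after the key's hash. It is a generic sketch, not Cassandra's partitioner: the `HashRing` class, the use of MD5, and the 64 virtual nodes per server are all assumptions chosen for the example.

```python
import bisect
import hashlib


def _position(label: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node owns several points ("virtual nodes") to even out the load.
        self._ring = sorted(
            (_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point after the key's hash, wrapping around."""
        idx = bisect.bisect_right(self._positions, _position(key)) % len(self._ring)
        return self._ring[idx][1]


ring3 = HashRing(["a", "b", "c"])
ring4 = HashRing(["a", "b", "c", "d"])
keys = [f"user{i}" for i in range(1000)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
```

The property worth noticing is that adding node `d` only moves keys onto `d`; keys never shuffle between the existing nodes.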

Page 29: HPTS 2011: The NoSQL Ecosystem

Cassandra at Netflix

● Transitioned from Oracle
● Store customer profiles, customer:movie watch log, and detailed usage logging
● Note: no multi-record locking (e.g., bank transfer)
● In-datacenter: 3 replicas, per-app consistency
  ● read/write quorum = 1, ~1ms latency
  ● read/write quorum = 2, ~3ms latency

“Replicating Datacenter Oracle with Global Apache Cassandra on AWS” by Adrian Cockcroft http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra

Page 31: HPTS 2011: The NoSQL Ecosystem

Cassandra at Netflix (cont'd)

● Benefit: async inter-datacenter replication
● Benefit: no downtime for schema changes
● Benefit: hooks for live backups

Page 33: HPTS 2011: The NoSQL Ecosystem

HBase

● Data model: key → column family
● Sharding model: range partitioning
● Strong consistency
● Applications
  ● Logging events/crawls, storing analytics
  ● Twitter: replicate data from MySQL, Hadoop analytics
  ● Facebook Messages
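Range partitioning, in contrast to hashing, keeps keys in sorted order and cuts the key space at split points, so a scan over a key range touches only adjacent partitions. A minimal sketch, assuming three hypothetical split keys (this is not HBase's region lookup code, and the names below are made up for illustration):

```python
import bisect

# Hypothetical split points: they cut the sorted key space into 4 regions.
# Region 0: keys < "g", region 1: "g" <= key < "n", and so on.
SPLIT_KEYS = ["g", "n", "t"]


def region_for(key, split_keys=SPLIT_KEYS):
    """Return the index of the region whose key range contains the key."""
    return bisect.bisect_right(split_keys, key)


def regions_for_scan(start_key, end_key, split_keys=SPLIT_KEYS):
    """Return the contiguous run of regions a scan over [start_key, end_key] must visit."""
    first = region_for(start_key, split_keys)
    last = region_for(end_key, split_keys)
    return list(range(first, last + 1))


scan_regions = regions_for_scan("h", "p")
```

The tradeoff is that sorted placement can concentrate load on one region when writes arrive in key order, which hashing avoids at the cost of efficient range scans.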

Page 35: HPTS 2011: The NoSQL Ecosystem

HBase for Facebook Messages

● Cassandra/Dynamo eventual consistency was difficult to program against
● Benefit: simple consistency model
● Benefit: flexible data model
● Benefit: simple sharding, load balancing, replication

Page 36: HPTS 2011: The NoSQL Ecosystem

MongoDB

● Document-based data model
● Range-based partitioning
● Consistency depends on how you use it

Page 38: HPTS 2011: The NoSQL Ecosystem

MongoDB: Two use-cases

● Archiving at Craigslist
  ● 2.2B historical posts, semi-structured
  ● Relatively large blobs: avg 2 KB, max > 4 MB
● Checkins at Foursquare
  ● Geospatial indexing
  ● Small location-based updates, sharded on user

Page 39: HPTS 2011: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 40: HPTS 2011: The NoSQL Ecosystem

Takeaways

● Developer accessibility
● Ecosystem of reuse
● Soft spot

Page 41: HPTS 2011: The NoSQL Ecosystem

Developer Accessibility

Page 43: HPTS 2011: The NoSQL Ecosystem

Developer Accessibility

devops

@DEVOPS_BORAT

self-taught dev

Page 44: HPTS 2011: The NoSQL Ecosystem

A Different Five-Minute Rule

What does a first-time user of your system experience in their first five minutes? (idea credit: Justin Sheehy, Basho)

Page 45: HPTS 2011: The NoSQL Ecosystem

Redis

http://simonwillison.net/2009/Oct/22/redis/

Page 46: HPTS 2011: The NoSQL Ecosystem

PostgreSQL

http://www.postgresql.org/docs/9.1/interactive/sql-createtable.html

Page 47: HPTS 2011: The NoSQL Ecosystem

Cassandra

http://www.datastax.com/docs/0.7/getting_started/using_cli

Page 49: HPTS 2011: The NoSQL Ecosystem

Accessibility is Forever

Beyond five minutes:
● changing schemas
● scaling up
● modifying topology

An accessible data store will ease first-time users in, and support experienced users as they expand

Page 50: HPTS 2011: The NoSQL Ecosystem

Ecosystem of Reuse

Page 55: HPTS 2011: The NoSQL Ecosystem

Spectrum of Reusability

[Spectrum figure, running from "Monolithic System" at one end to "System Kernel" at the other]

● MongoDB, Redis: Use as provided
● Cassandra, Voldemort: Pick between components
● ZooKeeper, LevelDB: Reusable components
● riak_core: Build your own system

Page 56: HPTS 2011: The NoSQL Ecosystem

And finally, a soft spot

Page 57: HPTS 2011: The NoSQL Ecosystem

polyglot persistence = right tool for right task

Page 60: HPTS 2011: The NoSQL Ecosystem

Polyglot persistence: Where's the data?

“HBase at Mendeley,” Dan Harvey http://www.slideshare.net/danharvey/hbase-at-mendeley

Aid developers in reasoning about data consistency across multiple storage engines

Page 61: HPTS 2011: The NoSQL Ecosystem

interesting properties

real-world usage

takeaways

questions

Page 68: HPTS 2011: The NoSQL Ecosystem

Thank You!

Adam Marcus, [email protected] / @marcua

● Systems: Polyglot persistence + data consistency?
● Operations: Availability/consistency/latency tradeoffs in datacenters?
● Accessibility: RDBMS five-minute usability?
● Accessibility: Scale-up wizard for storage systems?
● Comparisons: Design or implementation?
● Future: Next-generation NoSQL stores?

Page 69: HPTS 2011: The NoSQL Ecosystem

Photo Credits

http://www.flickr.com/photos/dhannah/4577657955/

http://www.flickr.com/photos/27316226@N02/3000888100/sizes/m/in/photostream/

http://www.texample.net/tikz/examples/valentine-heart/

http://voltdb.com/sites/default/files/VCE_button.png