The NoSQL Ecosystem - O'Reilly Media

assets.en.oreilly.com/1/event/45/The NoSQL Ecosystem Presentation.pdf

7-21-10

The NoSQL Ecosystem

Jonathan Ellis @spyced

Wednesday, July 21, 2010

Executive summary

✤ NoSQL is about using the right tool for the job

My bias

✤ Started working on Cassandra in 2009 after looking at the alternatives

✤ Co-founded Riptano in April 2010

NoSQL at OSCON

✤ Introduction to MongoDB

✤ Scaling Sourceforge with MongoDB

✤ Hadoop, Pig, and Twitter*

✤ (Plus the Neo4j and Cassandra tutorials on Monday and Tuesday)

Why NoSQL? 1

✤ Relational databases don’t scale

(“The eBay Architecture,” Randy Shoup and Dan Pritchett)

Why NoSQL? 2

✤ The relational model maps poorly to some problems

✤ Sub-category: almost all NoSQL databases are schema-free or schema-optional to some degree

Why NoSQL? 3

✤ Relational databases are slow

Myth 1

✤ “NoSQL is for people who don’t understand {SQL, denormalization, query tuning, ...}”

✤ Similarly: “Only users of [database X] are turning to NoSQL databases, because X sucks.”

eBay: NoSQL pioneer

✤ “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.”

✤ “BASE: An Acid Alternative,” Dan Pritchett, eBay

Scale forces tradeoffs

Myth 2

✤ “NoSQL is nothing new because we had key/value databases like bdb years ago.”

Myth 3

✤ “Only huge sites like Facebook and Twitter need to care about scalability.”

The downside to NoSQL-as-identifier

Evaluating NoSQL databases

✤ Data model / query language

✤ Scalability / availability

✤ Persistence

Data model

✤ Document

✤ CouchDB, MongoDB, Riak

✤ ColumnFamily

✤ Cassandra, HBase

✤ Graph

✤ Neo4j, AllegroGraph, Objectivity InfiniteGraph

✤ Collections

✤ Redis

✤ Key/value

✤ bdb, bitcask, Memcached, Tokyo Cabinet

Document queries

✤ CouchDB

✤ js map/reduce creates [materialized] views that may be queried

✤ MongoDB

✤ b-tree indexes allow querying documents by field

✤ Riak

✤ link-walking or [runtime] js map/reduce
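To make the contrast between the first two approaches concrete, here is a minimal Python sketch. It is not the CouchDB or MongoDB API: `build_view`, `query_view`, `build_field_index`, and the sample documents are all hypothetical names chosen for illustration. A map function is run over every document ahead of time to produce a sorted view that can be queried by key, while a field index maps each value of one field to the matching document ids (a plain dict stands in for the b-tree here).

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample documents, keyed by document id.
docs = {
    "d1": {"type": "post", "author": "alice", "tags": ["nosql", "couchdb"]},
    "d2": {"type": "post", "author": "bob", "tags": ["nosql"]},
    "d3": {"type": "comment", "author": "alice", "tags": []},
}

def map_tags(doc_id, doc):
    """Map function: emit one (key, value) row per tag on a post."""
    if doc["type"] == "post":
        for tag in doc["tags"]:
            yield tag, doc_id

def build_view(docs, map_fn):
    """Materialize the view: run map_fn over every doc, keep rows sorted by key."""
    rows = [row for doc_id, doc in docs.items() for row in map_fn(doc_id, doc)]
    return sorted(rows)

def query_view(view, key):
    """Return the values of all rows with this key, via binary search."""
    keys = [k for k, _ in view]
    return [v for _, v in view[bisect_left(keys, key):bisect_right(keys, key)]]

def build_field_index(docs, field):
    """Secondary index on one field: value -> sorted list of doc ids."""
    index = {}
    for doc_id, doc in docs.items():
        index.setdefault(doc[field], []).append(doc_id)
    return {value: sorted(ids) for value, ids in index.items()}

tag_view = build_view(docs, map_tags)
author_index = build_field_index(docs, "author")
```

In this sketch the view is computed once and then only read, which is the sense of "materialized" above; a map/reduce job run at query time would instead call the map function on each request.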

ColumnFamily queries

SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)

[Diagram: the followers, tweets, and timeline column families]
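One common way to serve a join-like query such as this without a join is to precompute the result at write time. The Python sketch below illustrates that denormalization idea only; it is not Cassandra's API, and the function names (`follow`, `post_tweet`, `read_timeline`) and the dict-based "column families" are assumptions made for the example. Each tweet is written once, then its id is appended to a timeline row for every user who should see it, so a read touches a single row.

```python
from collections import defaultdict

# Three "column families", each modeled as row key -> columns (hypothetical layout).
followers = defaultdict(set)     # user_id -> ids of the users following them
tweets = {}                      # tweet_id -> {"user_id": ..., "body": ...}
timeline = defaultdict(list)     # user_id -> tweet ids, oldest first

def follow(follower_id, followed_id):
    """Record that follower_id follows followed_id."""
    followers[followed_id].add(follower_id)

def post_tweet(tweet_id, user_id, body):
    """Write the tweet once, then fan it out to every follower's timeline row."""
    tweets[tweet_id] = {"user_id": user_id, "body": body}
    for follower_id in followers[user_id]:
        timeline[follower_id].append(tweet_id)

def read_timeline(user_id, limit=20):
    """Single-row read: no join, just the newest `limit` tweets for one key."""
    newest_first = reversed(timeline[user_id][-limit:])
    return [tweets[tweet_id]["body"] for tweet_id in newest_first]

follow("bob", "alice")
follow("carol", "alice")
post_tweet(1, "alice", "hello")
post_tweet(2, "alice", "world")
```

The tradeoff in this sketch is more work per write (one append per follower) in exchange for a read that is a lookup on one key.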

Persistence

✤ Classic B-tree

✤ bdb, TC, MongoDB

✤ Append-only B-tree

✤ CouchDB

✤ On-disk linked lists

✤ Neo4j

✤ Pluggable

✤ Riak, Voldemort

✤ SSTable

✤ Cassandra, HBase

✤ Memory-only

✤ Memcached, VoltDB

✤ Memory w/checkpoint

✤ Membase, Redis

Durable

✤ bdb

✤ Cassandra

✤ CouchDB

✤ Neo4j

✤ Riak*, Voldemort*

pathExists(a, b, 4)

1 000        2 000 ms
1 000        2 ms
1 000 000    2 ms
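The slide gives only the call `pathExists(a, b, 4)`. Assuming the third argument is a maximum path length in hops, a query of this shape can be sketched in Python as a depth-limited breadth-first search. The adjacency-list graph and the name `path_exists` are hypothetical, and this is not how any particular graph database implements traversal.

```python
from collections import deque

def path_exists(graph, start, goal, max_depth):
    """Return True if goal is reachable from start in at most max_depth hops."""
    if start == goal:
        return True
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return False

# Hypothetical friendship graph: a -> c -> d -> e -> b is exactly four hops.
friends = {"a": ["c"], "c": ["d"], "d": ["e"], "e": ["b"], "b": []}
```

Because the search only touches nodes within `max_depth` hops of the start, its cost is tied to the size of that neighborhood and not to the total number of nodes in the graph.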

[Diagram labels: Writer, Commitlog, Memtable, Reader]

(“The Log-Structured Merge-Tree”; “Bigtable: A Distributed Storage System for Structured Data”)
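The write path named in this diagram can be sketched roughly as follows. This is a toy Python model under stated assumptions, not Cassandra's or HBase's implementation: the class `TinyLSM`, its `memtable_limit` parameter, and the use of in-memory lists in place of on-disk files are all illustrative choices, and compaction of SSTables is left out entirely. A write is appended to the commit log, then applied to the in-memory memtable; a full memtable is flushed as an immutable sorted run; a read checks the memtable first and then the runs from newest to oldest.

```python
from bisect import bisect_left

class TinyLSM:
    """Toy log-structured store: commit log + memtable + immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.commitlog = []      # append-only record of writes not yet flushed
        self.memtable = {}       # in-memory view of the most recent writes
        self.sstables = []       # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commitlog.append((key, value))   # 1. sequential append for durability
        self.memtable[key] = value            # 2. update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable as a sorted run and start a fresh log."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commitlog = []   # flushed writes no longer need to be replayed

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest run wins
            i = bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

In this sketch a writer never modifies existing sorted data in place, which is the property that distinguishes this layout from a classic update-in-place B-tree.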

Scalability

✤ Master-driven vs distributed replicas

Lock manager

CAP

✤ Consistency

✤ Availability

✤ Partition tolerance

[Diagram: nodes A, L, T, W, F, P, Y, U and Key K]

Multi-DC with distributed replicas
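The diagram itself did not survive extraction, so the following is only a guess at the idea it illustrates. Assuming the letters are node tokens arranged on a ring and Key K is placed by walking clockwise from its position, replica selection can be sketched in Python as below. The function `replicas_for` and its `replication_factor` parameter are hypothetical, and datacenter-aware placement is not modeled.

```python
from bisect import bisect_left

def replicas_for(key_token, node_tokens, replication_factor=3):
    """Walk clockwise from the key's position and pick the next N nodes."""
    ring = sorted(node_tokens)
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect_left(ring, key_token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]

# Node labels taken from the diagram; their order on the ring is assumed
# to be alphabetical for this illustration.
nodes = ["A", "L", "T", "W", "F", "P", "Y", "U"]
```

With no master in this scheme, any node can compute the same replica list from the key and the list of tokens.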

CA

✤ Scalaris

✤ VoltDB

Conclusion

✤ “If you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL data store”
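The pattern this quote describes is often called cache-aside. A minimal Python sketch, with plain dicts standing in for memcached and the database and with the class name `CachedUserStore` invented for the example, shows where the maintenance burden comes from: the application itself has to keep two stores consistent on every write.

```python
class CachedUserStore:
    """Cache-aside: application code keeps a cache and a database in sync by hand."""

    def __init__(self):
        self.database = {}   # stand-in for the relational database
        self.cache = {}      # stand-in for memcached
        self.db_reads = 0    # counts how often a read falls through to the database

    def get(self, user_id):
        if user_id in self.cache:            # cache hit
            return self.cache[user_id]
        self.db_reads += 1                   # cache miss: read from the database
        row = self.database.get(user_id)
        if row is not None:
            self.cache[user_id] = row        # populate the cache for next time
        return row

    def put(self, user_id, row):
        self.database[user_id] = row
        # Every write path must remember this invalidation; forgetting it
        # anywhere leaves readers with stale data.
        self.cache.pop(user_id, None)
```

Each code path that writes to the database needs the same invalidation step, which is the ad hoc, hand-maintained data store the quote warns about.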
