20110515 cassandra linuxfb


Cassandra and NoSQL Database

Wang Xu

[email protected]

May, 2011


Outline

1. Cassandra in Greek Mythology
2. Brief History of the Cassandra Project
3. NoSQL and Big Data
4. Eventual Consistency
5. Bigtable and Dynamo
6. Some Highlighted Details
7. Notes on the Book and the Translation


Cassandra in Greek Mythology

She Could Foresee the Future.

- Daughter of King Priam and Queen Hecuba of Troy
- Apollo gave her the ability to see the future.
- But, cursed by Apollo, no one would believe her.

Some related names:

- Delphi, Oracle
- Hector


Brief History of the Cassandra Project

The important players in the Cassandra community.

- Facebook created Cassandra for its Inbox Search.
- Facebook donated Cassandra to the Apache Software Foundation.
- Rackspace became a leading contributor in the community.
- Twitter sparked a wave of Cassandra discussion, but . . .
- Digg also actively participates in development.

And the releases.

- 0.7 introduces runtime schema modification.
- 0.8 (beta) introduces a query language named CQL (Cassandra Query Language).


No-SQL or Not-Only-SQL

NoSQL has been blooming over the past decade.

- Columnar: HBase in the Hadoop community follows the design of Google Bigtable.
- Document-based: MongoDB is used by Foursquare and other popular sites.
- Key-value: Redis is dramatically fast.
- Graph: Neo4j and other graph databases suit social networks and the Semantic Web.

Quote: “. . . the term ‘Big Data’ to highlight the fact that this family of nonrelational databases is not defined by what they’re not (implementations of SQL), but rather by what they do (handle huge data loads).”


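Eventual Consistency and Brewer’s CAP Theorem

Brewer’s CAP Theorem.

- A distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance at the same time.
- Dynamo and Cassandra favor availability and partition tolerance; their replicas converge over time (eventual consistency).

Figure: Databases in the CAP continuum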


Bigtable: Column-Family-Based Data Model

Bigtable and the column-family-based data model (in Cassandra).

- Google’s columnar DB, built upon the Google File System (GFS)
- Both HBase and Cassandra follow Bigtable’s data model.
- A keyspace is like a database; a column family is like a table.
- Sparse table: every column is a name/value pair rather than a single value.
- Columns are sorted, so a range of columns can be queried.
- Inserting and updating a column for a row key are the same operation.
- Cassandra also has “super columns”, whose values are themselves sets of columns.
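The data-model points above can be illustrated with a toy in-memory structure. This is a minimal sketch under stated assumptions: `ColumnFamily`, `insert`, `get`, and `slice` are hypothetical names, not Cassandra's client API, and column timestamps and super columns are left out. Each row stores only its own columns, sorted by name, so a range query is two binary searches and inserting an existing column simply overwrites it.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ColumnFamily:
    """Toy column family: row key -> columns kept sorted by column name.

    Hypothetical illustration of the data-model points; it is not
    Cassandra's storage engine or client API.
    """

    def __init__(self) -> None:
        # Sparse: each row stores only the columns it actually has,
        # as parallel lists of sorted names and their values.
        self._rows: Dict[str, Tuple[List[str], List[str]]] = {}

    def insert(self, row_key: str, name: str, value: str) -> None:
        """Insert and update are one operation: write the column either way."""
        names, values = self._rows.setdefault(row_key, ([], []))
        i = bisect.bisect_left(names, name)
        if i < len(names) and names[i] == name:
            values[i] = value  # column already exists: overwrite it
        else:
            names.insert(i, name)  # new column: insert at its sorted position
            values.insert(i, value)

    def get(self, row_key: str, name: str) -> Optional[str]:
        """Return one column's value, or None if the row lacks that column."""
        names, values = self._rows.get(row_key, ([], []))
        i = bisect.bisect_left(names, name)
        return values[i] if i < len(names) and names[i] == name else None

    def slice(self, row_key: str, start: str, finish: str) -> List[Tuple[str, str]]:
        """Return the columns whose names fall in [start, finish], in sorted order."""
        names, values = self._rows.get(row_key, ([], []))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, finish)
        return list(zip(names[lo:hi], values[lo:hi]))


cf = ColumnFamily()
cf.insert("alice", "email", "alice@example.com")
cf.insert("alice", "city", "Beijing")
cf.insert("alice", "age", "30")
cf.insert("alice", "city", "Shanghai")  # the same call updates the existing column
cf.insert("bob", "email", "bob@example.com")  # bob has only one column
print(cf.slice("alice", "age", "city"))  # [('age', '30'), ('city', 'Shanghai')]
print(cf.get("bob", "city"))  # None
```

Keeping column names sorted per row is what makes the range query cheap; a row that never received a column simply has no entry for it, which is the "sparse table" property.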


Dynamo: DHT-Based Decentralized Storage

It’s a DHT Ring.

- Dynamo was designed for Amazon’s shopping cart.
- Cassandra is based on Dynamo’s decentralized architecture.
- Dynamo is fully decentralized, i.e., a structured P2P system.
- Routing information is maintained by gossip.
- Read repair: stale replicas are repaired while reading.
- Vector clocks (Dynamo) vs. timestamps (Cassandra)
- Anti-entropy with Merkle trees
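A rough sketch of how a key finds its replicas on the ring, assuming one token per node and MD5 hashing of keys. `TokenRing`, `token`, and `replicas` are hypothetical names; this is not Cassandra's partitioner or replication-strategy code, and gossip, read repair, and Merkle-tree anti-entropy are not modeled.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Place a key (or node name) on the ring by hashing it; MD5 is an assumption."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical ring in which each node owns exactly one token."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int) -> List[str]:
        """Return the rf nodes that store `key`.

        The first replica is the node with the first token at or after the
        key's token (wrapping around the ring); the others are the next
        nodes clockwise.
        """
        if not 0 < rf <= len(self._ring):
            raise ValueError("rf must be between 1 and the number of nodes")
        start = bisect.bisect_left(self._tokens, token(key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


ring = TokenRing(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
print(ring.replicas("user:42", rf=3))
```

Because a node only owns the range between its predecessor's token and its own, adding a node takes over part of a single existing range, so most keys keep their replicas. No central directory is needed: any node holding the token list can compute the same answer.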


Memtable, Commit-log, and SSTable

Append writes vs. random writes.

- Writes go to the Memtable (in memory) and the commit log (on disk, append-only).
- The Memtable is flushed to an immutable SSTable.
- SSTables are compacted periodically, or on demand via nodetool.
- The commit log is read only during recovery (replayed at startup after a crash).
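The write path above can be sketched as a hypothetical single-node model. `ToyStore` and its methods are illustrative names, not Cassandra classes: writes append to a commit-log file and update an in-memory memtable, a full memtable is flushed as a sorted, immutable SSTable, reads merge cells by newest timestamp, and `compact` merges SSTables. Truncating the whole log on flush is a simplification of discarding commit-log segments whose data has already been flushed.

```python
import os
import tempfile
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int]  # (value, timestamp)


class ToyStore:
    """Hypothetical write path: commit log + memtable + SSTables.

    Keys and values must not contain tabs or newlines (log format assumption).
    """

    def __init__(self, commit_log_path: str, memtable_limit: int = 3) -> None:
        self.commit_log_path = commit_log_path
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, Cell] = {}
        # Each SSTable is an immutable list of (key, cell) sorted by key.
        self.sstables: List[List[Tuple[str, Cell]]] = []

    def _apply(self, key: str, value: str, ts: int) -> None:
        current = self.memtable.get(key)
        if current is None or ts >= current[1]:  # keep the newest timestamp
            self.memtable[key] = (value, ts)

    def write(self, key: str, value: str, ts: int) -> None:
        # 1. Sequential append to the on-disk commit log (durability).
        with open(self.commit_log_path, "a", encoding="utf-8") as log:
            log.write(f"{key}\t{value}\t{ts}\n")
        # 2. Update the in-memory memtable; no random disk write happens here.
        self._apply(key, value, ts)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a new sorted SSTable, then drop the
        # log entries it covered (simplification: truncate the whole log).
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
        open(self.commit_log_path, "w", encoding="utf-8").close()

    def replay(self) -> None:
        # The commit log is read back only to rebuild the memtable after a restart.
        if not os.path.exists(self.commit_log_path):
            return
        with open(self.commit_log_path, encoding="utf-8") as log:
            for line in log:
                key, value, ts = line.rstrip("\n").split("\t")
                self._apply(key, value, int(ts))

    def read(self, key: str) -> Optional[str]:
        # Merge the memtable and every SSTable; the newest timestamp wins.
        # Linear scans for brevity; real SSTables have indexes.
        cells = [self.memtable[key]] if key in self.memtable else []
        for table in self.sstables:
            cells.extend(cell for k, cell in table if k == key)
        return max(cells, key=lambda c: c[1])[0] if cells else None

    def compact(self) -> None:
        # Merge all SSTables into one, keeping only the newest cell per key.
        merged: Dict[str, Cell] = {}
        for table in self.sstables:
            for key, cell in table:
                if key not in merged or cell[1] >= merged[key][1]:
                    merged[key] = cell
        self.sstables = [sorted(merged.items())] if merged else []


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
store = ToyStore(log_path, memtable_limit=2)
store.write("a", "1", ts=1)
store.write("b", "2", ts=2)  # memtable full: flushed to the first SSTable
store.write("a", "3", ts=3)  # only in the memtable and the commit log so far
recovered = ToyStore(log_path, memtable_limit=2)
recovered.replay()  # simulate a restart: memtable rebuilt from the log
print(recovered.memtable)  # {'a': ('3', 3)}
store.write("c", "4", ts=4)  # second flush -> two SSTables
store.compact()
print(len(store.sstables), store.read("a"))  # 1 3
```

Every disk write in this model is either an append (commit log) or a whole sorted file written once (flush, compaction), which is the "append write vs. random write" point of the slide.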

Bloom Filter.

- Disk access is expensive.
- A Bloom filter speeds up lookups by ruling out SSTables that cannot contain the requested key.
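A minimal Bloom filter sketch, assuming one filter per SSTable that is checked before touching disk. SHA-256 with double hashing stands in for whatever hash functions Cassandra actually uses, and `BloomFilter` / `might_contain` are hypothetical names. A negative answer is always correct; a positive answer may be a false positive, so the disk lookup still has to happen in that case.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size bit array probed at num_hashes positions per key."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent": the SSTable can be skipped.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


keys = [f"row{i}" for i in range(100)]
bf = BloomFilter(num_bits=1024, num_hashes=7)
for k in keys:
    bf.add(k)
print(all(bf.might_contain(k) for k in keys))  # True: no false negatives
false_positives = sum(bf.might_contain(f"other{i}") for i in range(1000))
print(false_positives)  # small, but not guaranteed to be zero
```

The trade-off is memory for disk seeks: a few bits per key in RAM lets a read skip most SSTables that do not hold the key.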


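Trade-off between Availability and Consistency

Different consistency levels.

- CL.ZERO: writes only; return without waiting for any replica.
- CL.ANY: writes only; succeed once the write is stored anywhere, even as a hint (Hinted Hand-off).
- CL.ONE: one replica must respond.
- CL.QUORUM: a majority of replicas (RF / 2 + 1, rounded down) must respond.
- CL.ALL: every replica must respond.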
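The usual reasoning behind picking levels is that a read is guaranteed to overlap the latest write when R + W > N, where N is the replication factor and R, W are the numbers of replicas read and written. The helper names below are hypothetical; the sketch treats CL.ZERO and CL.ANY as guaranteeing no replica, since a hinted hand-off may be stored on a node that is not a replica.

```python
def replicas_guaranteed(level: str, rf: int) -> int:
    """How many replicas are guaranteed to take part at a given level.

    Level names follow the slide. ANY can be satisfied by a hint on a
    non-replica node, so it guarantees no replica at all.
    """
    if level in ("ZERO", "ANY"):
        return 0
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of rf replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """R + W > N: every read replica set overlaps every write replica set."""
    return replicas_guaranteed(read_level, rf) + replicas_guaranteed(write_level, rf) > rf


for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL"), ("ONE", "ANY")]:
    print(f"read={r:<6} write={w:<6} rf=3 ->", read_sees_latest_write(r, w, 3))
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) always overlap, while ONE/ONE (1 + 1) does not; lower levels stay available with fewer live replicas but may return stale data.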


SEDA (Staged Event-Driven Architecture) for Performance

Thread pools and I/O.

- Operations are separated into stages.
- Each stage is dedicated to a particular resource, such as CPU or I/O.
- Stages are driven by executors (thread pools).
- Stages can be observed through JMX.
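A small Python analogy of staged execution: each `Stage` (a hypothetical class, not Cassandra's) owns its own `ThreadPoolExecutor`, and an event moves to the next stage through a completion callback instead of a thread blocking while it waits. The `completed` counter only stands in for the kind of per-stage statistics the slide says are visible over JMX; Cassandra's real stages are Java executors, and `parse` / `fake_disk_read` are made-up placeholders for CPU and I/O work.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Tuple


class Stage:
    """A SEDA-style stage: a dedicated thread pool plus a completed-task counter."""

    def __init__(self, name: str, threads: int) -> None:
        self.name = name
        self.executor = ThreadPoolExecutor(max_workers=threads, thread_name_prefix=name)
        self.completed = 0
        self._lock = threading.Lock()

    def submit(self, fn: Callable[[Any], Any], arg: Any) -> Future:
        def run() -> Any:
            try:
                return fn(arg)
            finally:
                with self._lock:
                    self.completed += 1  # per-stage statistic a monitor could read
        return self.executor.submit(run)

    def shutdown(self) -> None:
        self.executor.shutdown(wait=True)


def send(steps: List[Tuple[Stage, Callable[[Any], Any]]], event: Any, sink: queue.Queue) -> None:
    """Push one event through the stages; no thread blocks waiting on another stage."""

    def step(i: int, value: Any) -> None:
        if i == len(steps):
            sink.put(value)  # finished: deliver to the sink queue
            return
        stage, fn = steps[i]

        def on_done(fut: Future) -> None:
            exc = fut.exception()
            if exc is None:
                step(i + 1, fut.result())  # hand the output to the next stage
            else:
                sink.put(exc)  # report failures instead of dropping them

        stage.submit(fn, value).add_done_callback(on_done)

    step(0, event)


def parse(request: str) -> str:
    return request.strip().lower()  # CPU-bound work


def fake_disk_read(key: str) -> str:
    return f"value-of-{key}"  # stands in for a blocking disk read


cpu_stage = Stage("parse", threads=2)
io_stage = Stage("read", threads=4)
sink: queue.Queue = queue.Queue()
for i in range(10):
    send([(cpu_stage, parse), (io_stage, fake_disk_read)], f" Key{i} ", sink)
results = sorted(sink.get(timeout=5) for _ in range(10))
print(results[0])  # value-of-key0
print(cpu_stage.completed, io_stage.completed)  # 10 10
cpu_stage.shutdown()
io_stage.shutdown()
```

Sizing each pool for its resource (few threads for CPU work, more for blocking I/O) is the main design lever; a slow disk then backs up only the I/O stage's queue instead of every thread in the process.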


Notes on the Book and the Translation

What about the book?

- The only book focused on Cassandra
- Gives you the big picture of NoSQL and Cassandra
- Not excellent, but still useful
- Some repeated content and code . . .

Is it a fun job?

- It took me about 3 months.
- More than half of it was finished in the last month.
- Now I feel fine, and I do not want to translate another one soon.


Q & A
