NoSQL Databases

Agenda
● History
● Relational databases
● Horizontal vs vertical scaling
● CAP theorem
● Document databases
● Key value databases
● Graph databases
● Column family databases

History
● Non SQL (not a traditional tabular database)
● Facebook, Google, Amazon, etc. (big data and real-time applications)
● Horizontal scaling is a problem in relational databases
● Not only SQL (SQL-like queries)

Relational Databases :)
● MySQL, Oracle, SQL Server, Postgres, etc.
● The carpenter's hammer
● Easy & popular
● Avoid data duplication, but at the cost of complex queries
● Atomicity (transactions)

Relational Databases :(

● Defined schema, so optional attributes become NULLs
● Joins are needed to aggregate related data
● Large data VOLUME and a high rate of READs (scalability)

Scaling

source: https://commons.wikimedia.org/wiki/File:They_started_our_car_by_pushing_it_backwards_up_the_hill!_(3854246685).jpg

Scaling

source: http://slashnode.com/the-12-factor-php-app-part-2/

Horizontal (Sharding)

Horizontal (Master-Slave Replication)

CAP Theorem

● Consistency (all nodes see the same data at the same time)
● Availability (every request receives a response, whether success or failure)
● Partition tolerance (the system continues to operate even when the network between nodes is partitioned)

Pick only “TWO”

source: http://www.abramsimon.com/

CAP Proof

Eventually Consistent
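
As a rough illustration of what "eventually consistent" means, here is a minimal sketch. The class `EventuallyConsistentStore` and its `sync()` method are hypothetical names invented for this example, not any database's API. Writes land on one primary copy and reach the replicas only when `sync()` runs, so a replica read can be stale in between.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit one primary and reach replicas only on sync()."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet shipped to the replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, index, key):
        # May return stale data (or None) until sync() has run.
        return self.replicas[index].get(key)

    def sync(self):
        # Ship all pending writes to every replica, in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()
```

Reading a replica right after `write("x", 1)` yields nothing; after `sync()` every replica agrees with the primary.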

SQL Vs NoSQL

Relational Databases | NoSQL Databases
Vertical scaling, limited horizontal scaling | Horizontal scaling
Consistent | Consistent or eventually consistent
Scalable reads | Scalable reads/writes
Transactions on multiple tables | Difficult to support transactions
No partition tolerance | Partition tolerance
Schema/tables | Schemaless
Flexible queries (joins) | Limited queries

1) Document Databases
● Simple & popular
● Close to relational database
● MongoDB was a rising star in 2009
● Seven Databases in Seven Weeks

JSON Document Vs Row

● Document Vs Row
● Collection Vs Table
● Nesting instead of joins
● Query inside sub-documents
● Duplicate data to avoid joins
● Schemaless
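
The contrast between rows and documents can be sketched with plain Python dictionaries. This is only an illustration, not MongoDB's API; the names `users`, `addresses`, `user_docs`, `cities_via_join` and `find_by_city` are made up for the example.

```python
# Relational shape: two "tables" linked by a foreign key (user_id).
users = [{"id": 1, "name": "Alice"}]
addresses = [
    {"user_id": 1, "city": "Cairo"},
    {"user_id": 1, "city": "Berlin"},
]


def cities_via_join(user_id):
    """Emulate a join: scan the address table for matching foreign keys."""
    return [a["city"] for a in addresses if a["user_id"] == user_id]


# Document shape: the addresses are nested inside the user document.
user_docs = [
    {"_id": 1, "name": "Alice", "addresses": [{"city": "Cairo"}, {"city": "Berlin"}]},
    {"_id": 2, "name": "Bob"},  # schemaless: this document has no "addresses" field
]


def find_by_city(collection, city):
    """Query inside the sub-documents, tolerating documents without the field."""
    return [
        doc["name"]
        for doc in collection
        if any(addr["city"] == city for addr in doc.get("addresses", []))
    ]
```

The document form answers "who lives in Berlin?" without touching a second collection, at the cost of repeating address data wherever it is needed.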

MongoDB CP
● Consistency via master-slave replication (with elections)
● CouchDB is AP

MongoDB Conclusion
● Simple
● Scalable
● Embedded document
● CP
● No joins
● May need to duplicate data
● Writes should go through master node
● Built-in Geo-spatial support

2) Key-Value Databases
● Light & compact
● Hash table (values: text, blob, JSON, image, etc.)
● Reads are fast, writes are faster

Key-Value Databases

● Redis Hash

Redis Complex Data Types
● List
● Blocking List
● Publish-Subscribe
● Set
● Expiry Caching
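
To make the list and expiry ideas concrete without needing a running server, here is a toy in-memory store in pure Python. It is deliberately NOT the Redis API: `TinyKV`, `push`, `pop_left` and the injectable `clock` parameter are hypothetical names used only for this sketch.

```python
import time


class TinyKV:
    """Toy in-memory key-value store; an illustration, not the Redis API."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> value (str, dict, list, set, ...)
        self._expires = {}   # key -> absolute expiry time
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        self._data[key] = value
        if ttl is None:
            self._expires.pop(key, None)
        else:
            self._expires[key] = self._clock() + ttl

    def get(self, key, default=None):
        expiry = self._expires.get(key)
        if expiry is not None and self._clock() >= expiry:
            # Lazy expiry: drop the key the first time it is read after its TTL.
            self._data.pop(key, None)
            self._expires.pop(key, None)
        return self._data.get(key, default)

    def push(self, key, item):
        """Append to a list value, creating the list on first use."""
        self._data.setdefault(key, []).append(item)

    def pop_left(self, key):
        """Pop the oldest item, giving queue-like (FIFO) behaviour."""
        items = self._data.get(key) or []
        return items.pop(0) if items else None
```

A cache entry written with `ttl=10` disappears on the first read after ten clock units, and `push` / `pop_left` together behave like a simple work queue.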

Redis in Memory

● Data lives in memory; by default, writes are not persisted instantly

● Persists periodically by taking snapshots

Redis CP
● Sharding (A, B, C)
● Replication: A => A1, B => B1, C => C1
● If master B fails, B1 is promoted to master
● Redis is NOT strongly consistent (if both A and A1 fail)
● Riak is AP
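
The sharding and promotion steps above can be sketched as follows. This is a simplified model under stated assumptions: `Shard`, `Cluster` and `fail_master` are hypothetical names, replication is done synchronously for brevity, and plain modulo hashing stands in for whatever key-to-shard scheme a real cluster uses.

```python
import zlib


class Shard:
    def __init__(self, name):
        self.name = name
        self.master = {}    # primary copy (A, B or C)
        self.replica = {}   # copy held by the replica (A1, B1 or C1)

    def write(self, key, value):
        self.master[key] = value
        if self.replica is not None:
            # Simplification: replicate synchronously on every write.
            self.replica[key] = value

    def read(self, key):
        return self.master.get(key)

    def fail_master(self):
        """Simulate a master failure by promoting the replica to master."""
        if self.replica is None:
            raise RuntimeError(f"shard {self.name}: master and replica both lost")
        self.master = self.replica
        self.replica = None  # the new master has no replica until one is added


class Cluster:
    def __init__(self, names=("A", "B", "C")):
        self.shards = [Shard(n) for n in names]

    def shard_for(self, key):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def set(self, key, value):
        self.shard_for(key).write(key, value)

    def get(self, key):
        return self.shard_for(key).read(key)
```

After `fail_master()` the data is still readable from the promoted replica; a second failure on the same shard raises, which mirrors the "both A and A1 fail" case.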

Redis Conclusion
● Light & compact
● Key-value
● Complex data types
● Fast in memory
● Dataset should be less than RAM size
● Transforming data, caching, messaging
● CP but not strongly consistent
● Flexible persistence levels
● Rarely used alone

3) Graph Databases

● Directed graph

● Node has properties

● Relation has properties
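
A directed property graph of this kind can be modelled in a few lines. The sketch below is a generic illustration, not any graph database's API; `Node`, `Edge`, `Graph` and the `FOLLOWS` relation are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    kind: str
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}  # node id -> Node
        self.out = {}    # node id -> list of outgoing edges (directed)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, props)
        self.out.setdefault(node_id, [])

    def add_edge(self, src, dst, kind, **props):
        # Edges are stored only on the source node, so direction matters.
        self.out[src].append(Edge(src, dst, kind, props))

    def neighbours(self, node_id, kind):
        """Follow outgoing edges of one relation type."""
        return [e.dst for e in self.out[node_id] if e.kind == kind]
```

Both nodes and relations carry their own property dictionaries, for example `age` on a person and `since` on a `FOLLOWS` edge.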

Graph Databases (AP)

● Tens of billions of nodes and edges
● No sharding; the whole graph is replicated
● High availability over consistency
● Elects a gold master, but writes can go to slaves directly
● Community edition is free but the full version is NOT

4) Column-Family Databases

Row-oriented database:

● Many columns
● Disk seek operations
● Low compression rate

Column-Family Databases

● In an RDBMS, writes are heavy, so each row is stored together as a unit

● In column stores, reads are heavy, so each column's values are stored together
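
The difference between the two layouts can be shown with a small in-memory example. It is only a sketch of the storage idea, with made-up sample data and helper names (`to_columns`, `average_age_rows`, `average_age_columns`), and it assumes every row has the same set of fields.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Carol", "age": 35},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one column are stored together.
columns = to_columns(rows)


def average_age_rows(records):
    # Touches every full record even though only "age" is needed.
    return sum(r["age"] for r in records) / len(records)


def average_age_columns(cols):
    # Reads a single contiguous column and ignores the rest.
    ages = cols["age"]
    return sum(ages) / len(ages)
```

Appending a new record is one operation in the row layout but one append per column in the column layout, which is the write/read trade-off described above. Values of the same type sitting next to each other are also what makes column data compress well.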

HBase
● Database for HDFS (RDBMS vs files)
● Widely used with Hadoop
● Scalability! At least five nodes in production
● Facebook messaging system infrastructure (2010)

HBase Column Family
● Key-value pairs (map of maps)
● Column families should be defined, but the columns are schema-less

HBase Versioning
● Cells are versioned by timestamp
● It becomes a map of maps of maps (row keys ascending, columns ascending, timestamps descending)
● Garbage collector for expired data
● Everything is binary
● Compression rate
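
The "map of maps of maps" shape can be sketched in pure Python. This is a toy model rather than the HBase client API: `VersionedTable`, `max_versions` and the `cf:a` column name are hypothetical, and trimming old versions inside `put` merely stands in for garbage collection of expired data.

```python
class VersionedTable:
    """Toy map of maps of maps: row key -> column -> timestamp -> value."""

    def __init__(self, max_versions=3):
        self.rows = {}
        self.max_versions = max_versions  # hypothetical retention limit

    def put(self, row, column, timestamp, value):
        versions = self.rows.setdefault(row, {}).setdefault(column, {})
        versions[timestamp] = value
        # Drop the oldest versions beyond the limit (stand-in for garbage collection).
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest at or before `timestamp`."""
        versions = self.rows.get(row, {}).get(column, {})
        for ts in sorted(versions, reverse=True):
            if timestamp is None or ts <= timestamp:
                return versions[ts]
        return None

    def scan(self):
        """Yield cells sorted by row asc, column asc, timestamp desc."""
        for row in sorted(self.rows):
            for column in sorted(self.rows[row]):
                for ts in sorted(self.rows[row][column], reverse=True):
                    yield row, column, ts, self.rows[row][column][ts]
```

Values are passed as `bytes` in the usage below to echo the "everything is binary" point: `t.put("r1", "cf:a", 1, b"v1")`.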

FB Messaging Index Table
● The row keys are user IDs
● Column qualifiers are words that appear in that user’s messages
● Timestamps are message IDs of messages that contain that word
● Value is the offset of the word in the message
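
The index layout described above maps naturally onto nested dictionaries. The sketch below is an assumption-laden illustration, not Facebook's implementation: `index_message` and `search` are invented names, words are split on whitespace, and "offset" is taken to mean the word's position in the message.

```python
def index_message(index, user_id, message_id, text):
    """Add one message so that index[user_id][word][message_id] = offset."""
    for offset, word in enumerate(text.lower().split()):
        cell = index.setdefault(user_id, {}).setdefault(word, {})
        # Keep the first offset if the word repeats within the message.
        cell.setdefault(message_id, offset)


def search(index, user_id, word):
    """Return (message_id, offset) pairs, highest message ID first."""
    versions = index.get(user_id, {}).get(word.lower(), {})
    return sorted(versions.items(), reverse=True)
```

Because message IDs play the role of timestamps, a search for one word in one user's mailbox is a single row-and-column lookup that returns the matching messages newest first.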

HBase Vs Cassandra
● HBase runs on Hadoop; Cassandra is standalone
● HBase community is more active
● HBase is CP; Cassandra is AP
● Cassandra is more suitable for highly concurrent writes

The right tool for the right job