NoSQL Databases
Agenda
● History
● Relational databases
● Horizontal vs vertical scaling
● CAP theorem
● Document databases
● Key-value databases
● Graph databases
● Column-family databases
History
● Non SQL (not a traditional tabular database)
● Facebook, Google, Amazon, etc. (big data and real-time applications)
● Horizontal scaling is a problem in relational databases
● Not only SQL (SQL-like queries)
Relational Databases :)
● MySQL, Oracle, SQL Server, Postgres, etc.
● The carpenter's hammer
● Easy & popular
● Avoid data duplication, at the cost of complex queries
● Atomicity (transactions)
Relational Databases :(
● Defined schema, optional attributes (NULLs)
● Use joins to aggregate related data
● Large data VOLUME and high rate of READ (scalability)
Scaling
source: https://commons.wikimedia.org/wiki/File:They_started_our_car_by_pushing_it_backwards_up_the_hill!_(3854246685).jpg
SQL Vs NoSQL
Relational Databases | NoSQL Databases
Mostly vertical scaling, limited horizontal scaling | Horizontal scaling
Consistent | Consistent or eventually consistent
Scalable reads | Scalable reads/writes
Transactions on multiple tables | Difficult to support transactions
No partition tolerance | Partition tolerance
Schema/tables | Schemaless
Flexible queries (joins) | Limited queries
1) Document Databases
● Simple & popular
● Close to relational databases
● MongoDB was a rising star in 2009
JSON Document Vs Row
● Document vs row
● Collection vs table
● Nesting, no joins
● Query inside sub-documents
● Duplicate data to avoid joins
● Schemaless
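The document-vs-row contrast above can be sketched in plain Python. This is only an illustration, not MongoDB's API: the `users`, `addresses` and `user_docs` data, the field names, and the `find` helper are all hypothetical.

```python
# Relational style: two flat tables linked by a key, joined at query time.
users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
addresses = [
    {"user_id": 1, "city": "Cairo"},
    {"user_id": 2, "city": "Berlin"},
]

def users_in_city_joined(city):
    """Answer the query by joining users to addresses."""
    ids = {a["user_id"] for a in addresses if a["city"] == city}
    return [u["name"] for u in users if u["id"] in ids]

# Document style: the address is nested inside each user document.
user_docs = [
    {"name": "Ann", "address": {"city": "Cairo"}},
    # Schemaless: this document carries an extra field the other one lacks.
    {"name": "Bob", "address": {"city": "Berlin"}, "nickname": "bobby"},
]

def find(collection, dotted_path, value):
    """Match documents on a dotted sub-document path such as 'address.city'."""
    results = []
    for doc in collection:
        node = doc
        for key in dotted_path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        if node == value:
            results.append(doc)
    return results
```

Here `find(user_docs, "address.city", "Berlin")` reaches into the nested document directly, while `users_in_city_joined("Berlin")` has to combine two tables to get the same answer.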
MongoDB Conclusion
● Simple
● Scalable
● Embedded document
● CP
● No joins
● May need to duplicate data
● Writes should go through master node
● Built-in geospatial support
2) Key-Value Databases
● Light & compact
● Hash table (values: text, blob, JSON, image, etc.)
● Reads are fast, writes are faster
Redis in Memory
● In memory, so no instant persistence by default
● Persists periodically by taking snapshots
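Snapshot-based persistence can be sketched with a toy store. This is not how Redis is implemented; the `SnapshotStore` class, the `snapshot_every` parameter, and the JSON file format are assumptions made for illustration, and the snapshot is triggered by a write counter only to keep the example deterministic.

```python
import json
import os
import tempfile

class SnapshotStore:
    """Toy in-memory key-value store that persists by periodic snapshots."""

    def __init__(self, path, snapshot_every=3):
        self.path = path
        self.snapshot_every = snapshot_every  # hypothetical knob: writes between snapshots
        self.data = {}
        self.dirty = 0  # writes since the last snapshot
        if os.path.exists(path):
            # On startup, reload whatever the last snapshot captured.
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value
        self.dirty += 1
        if self.dirty >= self.snapshot_every:
            self.snapshot()

    def get(self, key):
        return self.data.get(key)

    def snapshot(self):
        # Write to a temp file, then rename, so a crash mid-write
        # never replaces the previous snapshot with a partial one.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)
        self.dirty = 0

path = os.path.join(tempfile.mkdtemp(), "dump.json")
store = SnapshotStore(path, snapshot_every=2)
store.set("a", "1")
store.set("b", "2")              # second write triggers a snapshot
store.set("c", "3")              # still only in memory
restarted = SnapshotStore(path)  # simulate a crash and restart
```

After the simulated restart, `a` and `b` are back but `c` is gone, which is the trade-off of not persisting every write instantly.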
Redis CP
● Sharding (A, B, C)
● Replication: A => A1, B => B1, C => C1
● If master B fails, B1 is promoted to master
● Redis is NOT strongly consistent (if both A and A1 fail)
● Riak is AP
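The sharding, replication and promotion steps above can be sketched as a toy model. The `Shard` and `Cluster` classes are hypothetical, and the CRC32-modulo placement is a simplification (Redis Cluster itself maps keys to hash slots); replication is modelled as asynchronous to show where acknowledged writes can be lost.

```python
import zlib

class Shard:
    """One master plus one replica, e.g. A => A1."""

    def __init__(self, name):
        self.name = name
        self.master = {}    # data on the current master (None if the shard is down)
        self.replica = {}   # data on the replica (None once promoted or lost)
        self.pending = []   # writes acknowledged but not yet replicated

    def set(self, key, value):
        if self.master is None:
            raise RuntimeError(f"shard {self.name} is down")
        self.master[key] = value
        self.pending.append((key, value))  # replication happens later

    def get(self, key):
        if self.master is None:
            raise RuntimeError(f"shard {self.name} is down")
        return self.master.get(key)

    def replicate(self):
        if self.replica is not None:
            self.replica.update(self.pending)
        self.pending.clear()

    def fail_master(self):
        # Promote the replica; writes that were still pending are lost.
        self.master, self.replica = self.replica, None
        self.pending.clear()

class Cluster:
    def __init__(self, names=("A", "B", "C")):
        self.shards = [Shard(n) for n in names]

    def shard_for(self, key):
        # Stable hash, so the same key always maps to the same shard.
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

cluster = Cluster()
shard = cluster.shard_for("user:1")
shard.set("user:1", "Ann")
shard.replicate()
shard.set("user:2", "Bob")  # written to this shard directly, not yet replicated
shard.fail_master()         # the replica is promoted
```

After the promotion, `user:1` is still readable but `user:2` is not. A second `fail_master()` leaves the shard with no copy at all, which corresponds to the case where both A and A1 fail.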
Redis Conclusion
● Light & compact
● Key-value
● Complex data types
● Fast in memory
● Dataset should be less than RAM size
● Transforming data, caching, messaging
● CP but not strongly consistent
● Flexible persistence levels
● Rarely used alone
3) Graph Databases (AP)
● Tens of billions of nodes and edges
● No sharding; the whole graph is replicated
● High availability over consistency
● Elects a gold master, but writes can go to slaves directly
● Community edition is free but the full version is NOT
4) Column-Family Databases
Row-oriented database:
● Many columns
● Disk seek operations
● Low compression rate
Column-Family Databases
● An RDBMS expects heavy writes, so it stores each row together in bulk
● A column store expects heavy reads, so it stores each column together
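The two layouts can be compared with a small in-memory sketch. The sample records and function names are hypothetical, and Python lists stand in for what would be disk blocks in a real engine.

```python
rows = [
    {"id": 1, "name": "Ann", "age": 31},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Cid", "age": 40},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

columns = to_columns(rows)

def insert_row(rows, record):
    rows.append(record)  # one append: the whole record lands together

def insert_columnar(columns, record):
    for key, value in record.items():
        columns[key].append(value)  # one append per column

def average_age_rows(rows):
    # Touches every whole row, even though only one field is needed.
    return sum(r["age"] for r in rows) / len(rows)

def average_age_columns(columns):
    ages = columns["age"]  # touches only the column it needs
    return sum(ages) / len(ages)
```

Writes are cheaper in the row layout (one append per record), while an aggregate over one attribute is cheaper in the column layout. Values within a column also share a type, which is one reason column stores tend to compress better.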
HBase
● Database for HDFS (RDBMS vs files)
● Widely used with Hadoop
● Scalability! At least five nodes in production
● Facebook messaging system infrastructure 2010
HBase Column Family
● Key-value pairs (map of maps)
● Column families should be defined, but the columns are schemaless
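The nested-map idea, with fixed column families and free-form columns, can be sketched as below. This is not the HBase client API; the `Table` class and the family and qualifier names are hypothetical.

```python
class Table:
    """Toy table: row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        self.families = set(families)  # fixed when the table is created
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are not declared anywhere: any name is accepted.
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier):
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier)

table = Table(families=["info", "stats"])
table.put("user1", "info", "name", "Ann")
table.put("user1", "info", "favourite_color", "green")  # new column, no schema change
```

A `put` into an undeclared family such as `"misc"` raises `KeyError`, while new qualifiers inside a declared family need no prior definition.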
HBase Versioning
● Versioning
● It became a map of map of map (asc, asc, desc)
● Garbage collector for expired data
● Everything is binary
● Compression rate
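The three-level map and its ordering can be sketched as follows, reading "(asc, asc, desc)" as row keys ascending, columns ascending, and timestamps descending. The `VersionedTable` class is hypothetical, and the `max_versions` limit is an assumed stand-in for whatever expiry rule the real garbage collector applies.

```python
class VersionedTable:
    """Toy table: row key -> column -> timestamp -> value (keys and values are bytes)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical retention setting
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, n=1):
        """Return up to n (timestamp, value) pairs, newest first."""
        versions = self.cells.get(row, {}).get(column, {})
        newest_first = sorted(versions, reverse=True)
        return [(ts, versions[ts]) for ts in newest_first[:n]]

    def scan(self):
        """Yield cells in (row asc, column asc, timestamp desc) order."""
        for row in sorted(self.cells):
            for column in sorted(self.cells[row]):
                for ts in sorted(self.cells[row][column], reverse=True):
                    yield row, column, ts, self.cells[row][column][ts]

    def collect_garbage(self):
        """Drop all but the newest max_versions versions of every cell."""
        for columns in self.cells.values():
            for versions in columns.values():
                for ts in sorted(versions, reverse=True)[self.max_versions:]:
                    del versions[ts]

t = VersionedTable(max_versions=2)
t.put(b"r1", b"c1", 1, b"a")
t.put(b"r1", b"c1", 2, b"b")
t.put(b"r1", b"c1", 3, b"c")
t.put(b"r0", b"c1", 5, b"z")
```

Because timestamps sort newest first, a plain `get` returns the latest version without scanning the older ones.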
FB Messaging Index Table
● The row keys are user IDs
● Column qualifiers are words that appear in that user’s messages
● Timestamps are message IDs of messages that contain that word
● Value is the offset of the word in the message
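The index layout described above can be sketched with nested dictionaries. This only mirrors the shape of the table (user ID, then word, then message ID, then offset); the function names, the tokenisation rule, and the choice to store character offsets of the first occurrence are assumptions, not details of Facebook's system.

```python
import re

def index_message(index, user_id, message_id, text):
    """Add every word of a message to the user's row of the index."""
    for match in re.finditer(r"\w+", text.lower()):
        word, offset = match.group(), match.start()
        cell = index.setdefault(user_id, {}).setdefault(word, {})
        cell.setdefault(message_id, offset)  # keep the first occurrence only

def search(index, user_id, word):
    """Return (message_id, offset) pairs, newest message ID first."""
    versions = index.get(user_id, {}).get(word.lower(), {})
    return sorted(versions.items(), reverse=True)

index = {}
index_message(index, "u1", 100, "Lunch at noon")
index_message(index, "u1", 101, "No lunch today, lunch tomorrow")
```

A search for one word in one user's mail is then a single row lookup followed by a single column lookup, with the matching message IDs already attached as versions.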
HBase Vs Cassandra
● HBase on Hadoop, Cassandra is standalone
● HBase community is more active
● HBase is CP, Cassandra is AP
● Cassandra is more suitable for highly concurrent writes