Web Scale with NoSQL

Post on 01-Nov-2014

description

Introduction to NoSQL, RDBMS Scalability, Why NoSQL, Categories of NoSQL, SQL vs NoSQL

Transcript of Web Scale with NoSQL

Web Scale with NoSQL

Sergejus Barinovas (@sergejusb)

http://sergejus.blogas.lt

Who Am I?

Architect at

Running NoSQL servers in production

Blogger (http://sergejus.blogas.lt, @sergejusb)

Community member (http://dotnetgroup.lt)

Contact me via sergejus.barinovas@gmail.com

Powered by RDBMS

Used everywhere… even where it shouldn’t

Used for 30+ years!

Back to 1980’s…

Data boom

in numbers

600 000 000 users

30 000 servers

20+ TB raw data per day

>20 PB stored data

You really think they use RDBMS?

RDBMS Scaling Example

Simple usage

Customers: reads / writes → master

Scale reads

Customers: writes → master; reads → slave, slave

Scale writes

Customers [A-M]: reads / writes [A-M] → master

Customers [N-Z]: reads / writes [N-Z] → master

Scale reads / writes

Customers [A-M]: writes [A-M] → master; reads [A-M] → slave, slave

Customers [N-Z]: writes [N-Z] → master; reads [N-Z] → slave, slave
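
The routing implied by the last diagram can be sketched in a few lines. This is only an illustration of the idea, not a production router: the node names (master-1, slave-1a, ...) and the helper names are hypothetical, and customers are assumed to be split by the first letter of their name.

```python
import itertools


class Shard:
    """One master plus its read replicas (node names are hypothetical)."""

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the replicas so reads are spread evenly.
        self._slaves = itertools.cycle(slaves)

    def node_for_write(self):
        return self.master

    def node_for_read(self):
        return next(self._slaves)


# Assumed layout: two shards split by the first letter of the customer name.
SHARDS = {
    "A-M": Shard("master-1", ["slave-1a", "slave-1b"]),
    "N-Z": Shard("master-2", ["slave-2a", "slave-2b"]),
}


def shard_for(customer_name):
    """Pick the shard from the first letter of the customer name."""
    if not customer_name:
        raise ValueError("customer name must not be empty")
    first = customer_name[0].upper()
    if not "A" <= first <= "Z":
        raise ValueError(f"cannot route customer {customer_name!r}")
    return SHARDS["A-M"] if first <= "M" else SHARDS["N-Z"]


def route(customer_name, operation):
    """Return the node that should serve a 'read' or a 'write'."""
    shard = shard_for(customer_name)
    if operation == "write":
        return shard.node_for_write()
    if operation == "read":
        return shard.node_for_read()
    raise ValueError(f"unknown operation {operation!r}")
```

Every query now depends on application-level routing like this, and any query that spans both letter ranges has to be sent to both shards and merged by hand.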

Pray your system won’t fail

Enter the NoSQL

Why NoSQL

Limited SQL scalability: sharding and vertical partitioning

Limited SQL availability: master / slave configuration

Limited SQL speed of read operations: multiple read replicas

SQL limitations for huge amounts of data: key / value / type columns

NoSQL history

2009, Eric Evans, no:sql(east)

NoSQL – open source distributed databases, not relational SQL databases

NoSQL – not only SQL

NoSQL → Big Data

NoSQL characteristics (1/2)

Scalability: the ability to horizontally scale simple-operation throughput over many servers

BASE: a “weaker” concurrency model than the ACID transactions in most SQL systems

NoSQL characteristics (2/2)

Distributed: efficient use of distributed indexes and RAM for data storage

Schema-less: the ability to dynamically define new attributes or data schema

CAP theorem

2000, Eric Brewer

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

Consistency

Availability

Partition tolerance

NoSQL Databases

NoSQL categories

Key / value store

Document database

Graph database

Columnar database

Key / value store

<key, value> or Tuple<key, v1, ..., vn>

Simple operations: Get, Put, Delete

Key (Byte[]) → Value (Byte[])

Key / value store

Key Value

“current_date” 2023-04-08

“sergejusb” Binary Object

“sergejusb” JSON Object
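
The Get / Put / Delete interface can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: the class name KeyValueStore and its methods are hypothetical and do not belong to any of the products listed in this talk, and the JSON fields are made up for the example.

```python
import json


class KeyValueStore:
    """In-memory stand-in for a key / value store; keys and values are bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put(b"current_date", b"2023-04-08")

# The store never looks inside a value, so the caller serialises the JSON.
profile = {"twitter": "@sergejusb", "blog": "http://sergejus.blogas.lt"}
store.put(b"sergejusb", json.dumps(profile).encode("utf-8"))
```

Because values are opaque bytes, anything beyond lookup by key (filtering, sorting, joining) has to happen in the application.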

Key / value stores

Redis: (+) messaging, (-) no shards

Voldemort

Membase: (+) memcached interface

Riak

Document database

Document == complex object: XML, YAML, JSON / BSON

Support for secondary indexes

Schema can be defined at runtime

Optional support for simple querying using Map / Reduce
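
A toy collection shows what "schema defined at runtime" and "secondary indexes" mean in practice. It is a minimal sketch, not the API of MongoDB or CouchDB: DocumentCollection, its methods, and the sample documents are hypothetical, and indexed field values are assumed to be hashable.

```python
from collections import defaultdict


class DocumentCollection:
    """Toy document collection: documents are dicts with arbitrary fields."""

    def __init__(self):
        self._docs = {}     # document id -> document
        self._indexes = {}  # field name -> {field value -> set of document ids}

    def create_index(self, field):
        """Build a secondary index over the documents stored so far."""
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def insert(self, doc_id, doc):
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} already exists")
        self._docs[doc_id] = doc
        # Keep every existing index up to date.
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        """Use the index when there is one, otherwise scan every document."""
        if field in self._indexes:
            ids = self._indexes[field].get(value, set())
            return [self._docs[i] for i in sorted(ids)]
        return [d for _, d in sorted(self._docs.items()) if d.get(field) == value]


posts = DocumentCollection()
posts.insert(1, {"author": "sergejusb", "title": "Web Scale with NoSQL"})
# A new attribute ("tags") appears without any schema change.
posts.insert(2, {"author": "tdagys", "title": "Hello", "tags": ["nosql"]})
posts.create_index("author")
posts.insert(3, {"author": "sergejusb", "title": "CAP theorem"})
```

Here find("author", ...) goes through the index, while find("title", ...) falls back to a full scan.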

Document databases

MongoDB: (+) shards

CouchDB: (+) master / master replication

Graph database

Graph == network

Basic constructs: Node, Edge, Properties

Example graph:

sergejus -[authors]→ sergejus.blogas.lt

tdagys -[reads]→ sergejus.blogas.lt

sergejus -[knows]→ tdagys

tdagys -[knows]→ sergejus
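
The three constructs (node, edge, properties) fit in a very small data structure. The sketch below is illustrative only: the Graph class is hypothetical, the edge directions are an assumption read off the diagram, and the property values (kind, since) are invented for the example.

```python
class Graph:
    """Toy property graph: nodes and labelled, directed edges with properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties dict
        self.edges = []  # (source id, label, target id, properties dict)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of an edge must be existing nodes")
        self.edges.append((source, label, target, properties))

    def neighbours(self, source, label):
        """Follow every outgoing edge with the given label."""
        return [t for s, l, t, _ in self.edges if s == source and l == label]


g = Graph()
g.add_node("sergejus", kind="person")
g.add_node("tdagys", kind="person")
g.add_node("sergejus.blogas.lt", kind="blog")

g.add_edge("sergejus", "authors", "sergejus.blogas.lt")
g.add_edge("tdagys", "reads", "sergejus.blogas.lt")
g.add_edge("sergejus", "knows", "tdagys", since=2009)  # "since" is made up
g.add_edge("tdagys", "knows", "sergejus", since=2009)
```

A question such as "which blogs are read by people sergejus knows" becomes two hops: neighbours("sergejus", "knows"), then neighbours(person, "reads") for each result.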

Graph databases

Neo4j: (-) paid version required for scaling

FlockDB: (+) fast, (-) limited functionality

Columnar database

For HUGE amounts of data

Columns are added at runtime

Great scalability: horizontal and vertical

Columnar database

Unusual data model

Key Space → Database

Column Family → Table

Columns and Super Columns

Super Column → array of Columns

Column → Tuple<Key, Value, Timestamp, TTL>
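
The Column tuple and the nesting key space → column family → row can be modelled with plain dictionaries. This is a rough sketch, not the storage engine of any product: the helper names are hypothetical, super columns are left out, and using the timestamp to let the newest write win is an assumption about how such a store resolves conflicting writes.

```python
from typing import NamedTuple, Optional


class Column(NamedTuple):
    key: str
    value: bytes
    timestamp: float             # when the column was written
    ttl: Optional[float] = None  # seconds to live; None means "never expires"

    def is_expired(self, now: float) -> bool:
        return self.ttl is not None and now >= self.timestamp + self.ttl


# key space -> column family -> row key -> {column key -> Column}
keyspace = {"Blog": {"Users": {}}}


def put_column(family, row_key, column):
    """Write a column; the row and the column are created at runtime."""
    row = family.setdefault(row_key, {})
    current = row.get(column.key)
    # Assumption: the write with the newest timestamp wins.
    if current is None or column.timestamp >= current.timestamp:
        row[column.key] = column


def get_row(family, row_key, now):
    """Return the live (not expired) columns of a row as {key: value}."""
    row = family.get(row_key, {})
    return {k: c.value for k, c in row.items() if not c.is_expired(now)}


users = keyspace["Blog"]["Users"]
put_column(users, "sergejusb", Column("email", b"sergejus.barinovas@gmail.com", 100.0))
put_column(users, "sergejusb", Column("session", b"abc", 100.0, ttl=60.0))
# An older write arriving late does not overwrite the newer value.
put_column(users, "sergejusb", Column("email", b"old@example.com", 50.0))
```

Rows in the same column family do not need to share the same set of columns, which is what "columns are added at runtime" amounts to.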

Columnar database

Cassandra: (+) easily scalable

HBase: (+) consistent, (+) part of Hadoop

Hypertable

NoSQL is Cool! But…

NoSQL limitations

ORDER BY? Natural key order

GROUP BY? Map / Reduce*

JOIN? Multiple Map / Reduce*

SELECT *? Multi-machine Map / Reduce*

*if possible
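
To make the GROUP BY row concrete, here is a single-process sketch of the Map / Reduce pattern doing the work of SELECT customer, SUM(total) ... GROUP BY customer. The function names and the sample orders are made up for the example, and a real store would run the map and reduce steps across many machines rather than in one loop.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical sample data.
orders = [
    {"customer": "Nolan", "total": 15},
    {"customer": "Adams", "total": 30},
    {"customer": "Adams", "total": 20},
]


def emit_total(order):
    # Map step: emit one (group key, value) pair per order.
    yield order["customer"], order["total"]


def sum_totals(customer, totals):
    # Reduce step: collapse all values of one group into a single result.
    return sum(totals)


totals_by_customer = map_reduce(orders, emit_total, sum_totals)

# ORDER BY is only available on the key: sort the results by customer name.
ordered = sorted(totals_by_customer.items())
```

A JOIN would need a second pass of the same kind over the other data set, which is why the slide lists it as multiple Map / Reduce jobs.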

NoSQL Limitations

Maturity

Tooling

Specificity

SQL vs. NoSQL

Choose the right tool for the task

You can use BOTH

Thank you!

Sergejus Barinovas (@sergejusb)

sergejus.barinovas@gmail.com

http://sergejus.blogas.lt