NoSQL for Architects - Migrating from RDBMS to a Schema-less world
1
NoSQL for Architects: Migrating From RDBMS to a Schema-less World
Dipti Borkar, Senior Product Manager
2
NoSQL Webinar Series
• NoSQL for Architects: Migrating from RDBMS to a schema-less world
• NoSQL for Developers: Migrating from RDBMS to a schema-less world
• NoSQL for DBAs: Migrating from RDBMS to a schema-less world
3
INTRODUCTION TO DOCUMENT DATABASES
4
NoSQL catalog
• Key-Value: memcached, membase
• Data Structure: redis
• Document: mongoDB, couchbase, couchDB
• Column: cassandra
• Graph: Neo4j
Diagram rows: Cache (memory only), Database (memory/disk)
5
Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high-availability
{
  "UUID": "21f7f8de-8051-5b89-86",
  "Time": "2011-04-01T13:01:02.42",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "[email protected]",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": ["SERVER", "US-West", "API"]
  }
}
6
CRITICAL DIFFERENCES BETWEEN NOSQL AND RDBMS
7
Changes in interactive software – NoSQL driver
8
COMPARING DATA MODELS
9
http://www.geneontology.org/images/diag-godb-er.jpg
10
Relational vs Document data model
Relational data model: highly-structured table organization with rigidly-defined data formats and record structure.
R1C1 R1C2 R1C3 R1C4
R2C1 R2C2 R2C3 R2C4
R3C1 R3C2 R3C3 R3C4
R4C1 R4C2 R4C3 R4C4
Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
{ "UUID": "21f7f8de-8051-5b89-86", "Time": "2011-04-01T13:01:02.42", "Server": "A2223E", "Calling Server": "A2213W", "Type": "E100", "Initiating User": "[email protected]",
  "Details": { "IP": "10.1.1.22", "API": "InsertDVDQueueItem", "Trace": "cleansed", "Tags": ["SERVER", "US-West", "API"] } }
11
Example: Error Logging Use case
Table 1: Error Log
KEY  ERR  TIME  DC
1    ERR  TIME  FK(DC2)
2    ERR  TIME  FK(DC2)
3    ERR  TIME  FK(DC2)
4    ERR  TIME  FK(DC3)
Table 2: Data Centers
KEY  LOC  NUM
1    DEN  303-223-2332
2    NYC  212-223-2332
3    SFO  415-223-2332
13
Example: Error Logging Use case
Diagram: row 2 of Table 1: Error Log (FK(DC2)) and row 2 of Table 2: Data Centers (NYC, 212-223-2332) are pulled out of the tables and shown as one document:
{ "ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
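The move from two joined tables to self-contained documents can be sketched in Python. This is only an illustration: the data-center values come from Table 2, the error text and timestamp are borrowed from the sample document, and the helper name `to_document` is an assumption.

```python
# Table 2: Data Centers, keyed by primary key (values from the slide).
data_centers = {
    1: {"LOC": "DEN", "NUM": "303-223-2332"},
    2: {"LOC": "NYC", "NUM": "212-223-2332"},
    3: {"LOC": "SFO", "NUM": "415-223-2332"},
}

# Table 1: Error Log. ERR and TIME are illustrative placeholder values;
# DC holds the foreign key into data_centers.
error_log = [
    {"KEY": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": 2},
    {"KEY": 2, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": 2},
    {"KEY": 3, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": 2},
    {"KEY": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": 3},
]


def to_document(row, centers):
    """Resolve the foreign key once and embed the result in the document."""
    dc = centers[row["DC"]]
    return {
        "ID": row["KEY"],
        "ERR": row["ERR"],
        "TIME": row["TIME"],
        "DC": dc["LOC"],
        "NUM": dc["NUM"],
    }


documents = [to_document(row, data_centers) for row in error_log]
```

Each resulting document can be read without a join, at the cost of repeating the data-center fields in every document that refers to them.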
14
Document design with flexible schema
{ "ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
{ "ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }
SCHEMA CHANGE
{ "ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "COMPONENT": "DMS", "SEV": "LEVEL1", "DC": "NYC", "NUM": "212-223-2332" }
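A flexible schema moves the handling of missing fields into the application. The following is a minimal sketch, assuming documents are handled as Python dictionaries; the helper `severity` and its default value are hypothetical.

```python
# Two documents from the slide: one written before the schema change,
# one after it (with the extra COMPONENT and SEV fields).
docs = [
    {"ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",
     "DC": "NYC", "NUM": "212-223-2332"},
    {"ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",
     "COMPONENT": "DMS", "SEV": "LEVEL1", "DC": "NYC", "NUM": "212-223-2332"},
]


def severity(doc, default="UNKNOWN"):
    """Read an optional field; older documents simply lack it."""
    return doc.get("SEV", default)


summary = {doc["ID"]: severity(doc) for doc in docs}
```

Old and new documents live side by side without a migration step; the reader decides what a missing field means.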
15
Document modeling
When considering how to model data for a given application
• Think of a logical container for the data
• Think of how data groups together
Q
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?
16
Document Design Options
• One document that contains all related data
– Data is de-normalized
– Better performance and scale
– Eliminate client-side joins
• Separate documents for different object types with cross references
– Data duplication is reduced
– Objects may not be co-located
– Transactions supported only on a document boundary
– Most document databases do not support joins
17
Document ID / Key selection
• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID based document lookup is extremely fast
• Usually an ID can only appear once in a bucket
Options
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human readable)
• Matching prefixes (for multiple related objects)
Q
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?
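The key options listed above can be sketched as small Python helpers. All function names and the exact key layouts are assumptions made for illustration; only the `author_title` and `prefix_parent` shapes mirror IDs that appear in the blog example of this deck.

```python
import uuid
from datetime import datetime


def uuid_key() -> str:
    """Opaque, globally unique key."""
    return str(uuid.uuid4())


def date_key(kind: str, when: datetime) -> str:
    """Date-based key; the 'kind::timestamp' layout is an assumed convention."""
    return f"{kind}::{when.strftime('%Y%m%dT%H%M%S')}"


def readable_key(author: str, title: str) -> str:
    """Hand-crafted, human-readable key."""
    return f"{author}_{title.replace(' ', '_')}"


def related_key(prefix: str, parent_key: str) -> str:
    """Key for a related object, derived from the parent's key."""
    return f"{prefix}_{parent_key}"


post_id = readable_key("dborkar", "Hello World")
comment_id = related_key("comment1", post_id)
```

Deriving a related object's key from its parent lets the application compute the key instead of querying for it.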
18
Example: Entities for a Blog
• User profile: the main pointer into the user data
– Blog entries
– Badge settings, like a Twitter badge
• Blog posts: contains the blogs themselves
• Blog comments
– Comments from other users
19
Blog Document – Option 1 – Single document
{
  "_id": "dborkar_Hello_World",
  "author": "dborkar",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    {"format": "markdown", "body": "Awesome post!"},
    {"format": "markdown", "body": "Like it."}
  ]
}
20
Blog Document – Option 2 – Split into multiple docs
BLOG DOC
{
  "_id": "dborkar_Hello_World",
  "author": "dborkar",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    "comment1_jchris_Hello_world"
  ]
}
COMMENT
{
  "_id": "comment1_dborkar_Hello_World",
  "format": "markdown",
  "body": "Awesome post!"
}
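With split documents, the application assembles the post and its comments itself. The following is a minimal sketch of that client-side join, using a plain dictionary as a stand-in for a bucket; the function name `load_post_with_comments` and the sample IDs are assumptions, and no real client library API is shown.

```python
# A dictionary keyed by document ID stands in for the bucket.
bucket = {
    "dborkar_Hello_World": {
        "_id": "dborkar_Hello_World",
        "type": "post",
        "title": "Hello World",
        "comments": ["comment1_dborkar_Hello_World"],
    },
    "comment1_dborkar_Hello_World": {
        "_id": "comment1_dborkar_Hello_World",
        "format": "markdown",
        "body": "Awesome post!",
    },
}


def load_post_with_comments(store, post_id):
    """Fetch the post, then fetch each referenced comment by its ID."""
    post = dict(store[post_id])  # shallow copy so the stored doc is untouched
    post["comments"] = [
        store[comment_id]
        for comment_id in post.get("comments", [])
        if comment_id in store  # skip dangling references
    ]
    return post


full_post = load_post_with_comments(bucket, "dborkar_Hello_World")
```

Every lookup is by ID, which is the fast path in a document store, but the extra round trips and the handling of dangling references become the application's job.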
21
Threaded Comments
• You can imagine how to take this to a threaded list
Diagram labels: Blog, List, First comment, List, Reply to comment, More Comments
Advantages
• Only fetch the data when you need it
– For example, rendering part of a web page
• Spread the data and load across the entire cluster
22
COMPARING SCALING MODEL
23
Modern interactive software architecture
Application Scales Out: just add more commodity web servers
Database Scales Up: get a bigger, more complex server
Note – Relational database technology is great for what it is great for, but it is not great for this.
24
NoSQL database matches application logic tier architecture
Data layer now scales with linear cost and constant performance.
Application Scales Out: just add more commodity web servers
Database Scales Out: just add more commodity data servers
Scaling out flattens the cost and performance curves.
NoSQL Database Servers
25
EVALUATING NOSQL
26
The Process – From Evaluation to Go Live
1. Analyze your requirements
2. Find solutions / products that match key requirements
3. Execute a proof of concept / performance evaluation
4. Begin development of application
5. Deploy in staging and then production
No different from evaluating a relational database
New requirements → New solutions
27
Analyze your requirements (step 1)
Common application requirements
• Rapid application development
– Changing market needs
– Changing data needs
• Scalability
– Unknown user demand
– Constantly growing throughput
• Consistent Performance
– Low response time for better user experience
– High throughput to handle viral growth
• Reliability
– Always online
28
Find solutions that match key requirements (step 2)
NoSQL
• Linear Scalability
• Schema flexibility
• High Performance
RDBMS
• Multi-document transactions
• Database Rollback
• Complex security needs
• Complex joins
• Extreme compression needs
RDBMS or NoSQL
• Both / depends on the data
29
Proof of concept / Performance evaluation (step 3)
Prototype a workload
• Look for consistent performance…
– Low response times / latency
  • For better user experience
– High throughput
  • To handle viral growth
  • For resource efficiency
• … across
– Read heavy / Write heavy / Mixed workloads
– Clusters of growing sizes
• … and watch for
– Contention / heavy locking
– Linear scalability
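A prototype workload can start very small. The following Python sketch measures per-operation latency for a configurable read/write mix; a plain dictionary stands in for the database, and `run_workload`, `summarize`, the key naming, and the 99th-percentile cut are all assumptions. In a real evaluation the two dictionary operations would be replaced by calls to the client library under test.

```python
import statistics
import time


def run_workload(store, n_ops, read_ratio):
    """Issue n_ops operations against store and return per-op latencies in seconds.

    read_ratio is the approximate fraction of reads (0.9 means read heavy).
    """
    latencies = []
    for i in range(n_ops):
        key = f"doc{i % 100}"
        start = time.perf_counter()
        if (i % 10) < read_ratio * 10:
            store.get(key)          # read path
        else:
            store[key] = {"n": i}   # write path
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report mean, 99th percentile, and worst-case latency."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "ops": len(ordered),
        "mean_s": statistics.mean(ordered),
        "p99_s": ordered[p99_index],
        "max_s": ordered[-1],
    }


report = summarize(run_workload({}, n_ops=1000, read_ratio=0.9))
```

Running the same harness with read-heavy, write-heavy, and mixed ratios, and against clusters of growing size, shows whether the tail latency stays flat as load grows.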
30
Other considerations (step 3)
Accessing data
– No standards exist yet
– Typically via SDKs or over HTTP
– Check if the programming language of your choice is supported
Consistency
– Consistent only at the document level
– Most document stores currently don’t support multi-document transactions
– Analyze your application needs
Availability
– Each node stores active and replica data (Couchbase)
– Each node is either a master or slave (MongoDB)
31
Other considerations (step 3)
Operations
– Monitoring the system
– Backup and restore the system
– Upgrades and maintenance
– Support
Ease of Scaling
– Ease of adding and reducing capacity
– Single node type
– App availability on topology changes
Indexing and Querying
– Secondary indexes (Map functions)
– Aggregates Grouping (Reduce functions)
– Basic querying
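The map and reduce roles named above can be illustrated with a small Python analogy. Document databases define these functions in their own way (for instance as JavaScript view functions), so the names `map_by_dc`, `build_index`, and `reduce_count` and the sample documents are assumptions, not any product's API.

```python
from collections import defaultdict

docs = [
    {"ID": 1, "ERR": "Out of Memory", "DC": "NYC"},
    {"ID": 4, "ERR": "Out of Memory", "DC": "NYC"},
    {"ID": 5, "ERR": "Out of Memory", "DC": "SFO", "SEV": "LEVEL1"},
]


def map_by_dc(doc):
    """Map function: emit (index key, value) pairs for one document."""
    if "DC" in doc:  # documents without the field are simply not indexed
        yield doc["DC"], doc["ID"]


def build_index(documents, map_fn):
    """Secondary index: every emitted key points at the values emitted for it."""
    index = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            index[key].append(value)
    return dict(index)


def reduce_count(values):
    """Reduce function: aggregate the values grouped under one key."""
    return len(values)


index = build_index(docs, map_by_dc)
counts = {key: reduce_count(values) for key, values in index.items()}
```

The map function decides which documents appear in the index and under which key; the reduce function turns each group into an aggregate such as a count.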
32
Begin development (step 4)
Data Modeling and Document Design
33
Deploying to staging and production (step 5)
• Monitoring the system
– RESTful interfaces / Easy integration with monitoring tools
• High-availability
– Replication
– Failover and Auto-failover
• Always Online – even for maintenance tasks
– Database upgrades
– Software (OS) and Hardware upgrades
– Backup and restore
– Index building
– Compaction
34
So are you being impacted by these?
Q: Schema Rigidity problems
• Do you store serialized objects in the database?
• Do you have lots of sparse tables with very few columns being used by most rows?
• Do you find that your application developers require schema changes frequently due to constantly changing data?
• Are you using your database as a key-value store?
Q: Scalability problems
• Do you periodically need to upgrade systems to more powerful servers and scale up?
• Are you reaching the read / write throughput limit of a single database server?
• Is your server’s read / write latency not meeting your SLA?
• Is your user base growing at a frightening pace?
35
WHERE IS NOSQL A GOOD FIT?
36
Performance driven use cases
• Low latency
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with very high mutation rate per document (temporal locality)
• Working set with heavy writes
37
Data driven use cases
• Support for unlimited data growth
• Data with non-homogenous structure
• Need to quickly and often change data structure
• 3rd party or user defined structure
• Variable length documents
• Sparse data records
• Hierarchical data
38
BRIEF OVERVIEW COUCHBASE SERVER
39
Couchbase Server
Simple. Fast. Elastic. NoSQL.
Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
40
Representative user list
41
Couchbase architecture
Diagram components:
• Data Manager (Database Operations): built-in memcached, Membase EP Engine, storage interface, CouchDB
• Cluster Manager (Cluster Management, Erlang/OTP): Heartbeat, Process monitor, Global singleton supervisor, Configuration manager, Rebalance orchestrator, Node health monitor, vBucket state and replication manager, http REST management API/Web UI
• Scope labels in the diagram: "on each node", "one per cluster"
42
Couchbase deployment
Data Flow
Cluster Management
Web Application
Couchbase Client Library
43
Clustering With Couchbase
• SET request arrives at KEY’s master server
• SET acknowledgement returned to application
Diagram labels: Master server for KEY, Replica Server 1 for KEY, Replica Server 2 for KEY, Listener-Sender, RAM, Couchbase storage engine, Disk; steps numbered 1 to 4
44
Basic Operation
• Docs distributed evenly across servers in the cluster
• Each server stores both active & replica docs
– Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
– App never needs to know
• App reads, writes, updates docs
• Multiple App Servers can access same document at same time
Diagram: APP SERVER 1 and APP SERVER 2, each with the COUCHBASE CLIENT LIBRARY and a CLUSTER MAP, send Read/Write/Update requests to a COUCHBASE SERVER CLUSTER of SERVER 1, SERVER 2, and SERVER 3; each server holds Active Docs and Replica Docs (Doc 1 to Doc 9). User Configured Replica Count = 1.
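The role of the cluster map can be sketched in Python: the client hashes the document ID to a partition and looks up which server holds the active copy. This is a generic hash-and-modulo illustration, not Couchbase's actual mapping function; the partition count, server names, and function names are assumptions.

```python
import zlib

NUM_PARTITIONS = 8  # assumed; kept small so the map is easy to read

# Cluster map: partition -> server holding the active copy and the replica.
# Replica count is 1, and a replica never shares a server with its active copy.
cluster_map = {
    p: {"active": f"server{p % 3 + 1}", "replica": f"server{(p + 1) % 3 + 1}"}
    for p in range(NUM_PARTITIONS)
}


def partition_for(key: str) -> int:
    """Hash the document ID to a partition; the same key always maps the same way."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def server_for(key: str, cmap) -> str:
    """Return the server that currently holds the active copy for this key."""
    return cmap[partition_for(key)]["active"]


target = server_for("dborkar_Hello_World", cluster_map)
```

Because the lookup happens inside the client library, application code only passes the document ID and never needs to know which server answers.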
45
Add Nodes
• Two servers added to cluster
– One-click operation
• Docs automatically rebalanced across cluster
– Even distribution of docs
– Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger # of servers
Diagram: APP SERVER 1 and APP SERVER 2, each with the COUCHBASE CLIENT LIBRARY and a CLUSTER MAP, send Read/Write/Update requests to a COUCHBASE SERVER CLUSTER that now includes SERVER 4 and SERVER 5 alongside SERVER 1, SERVER 2, and SERVER 3; each server holds Active Docs and Replica Docs. User Configured Replica Count = 1.
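The two rebalance goals, even distribution and minimum movement, can be illustrated with a small Python sketch. It is not Couchbase's rebalance algorithm: the function `rebalance`, the partition-to-server dictionary, and the server names are assumptions, replicas are ignored, and servers are only ever added.

```python
def rebalance(assignment, servers):
    """Return a new partition -> server map that is even across servers.

    Only partitions in excess of a server's target are moved, so existing
    placements are kept wherever possible.
    """
    new_assignment = dict(assignment)
    load = {server: [] for server in servers}
    for partition, server in sorted(new_assignment.items()):
        load[server].append(partition)

    base, extra = divmod(len(new_assignment), len(servers))
    # The most loaded servers keep the one-partition remainder, which avoids moves.
    by_load = sorted(servers, key=lambda s: -len(load[s]))
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}

    # Partitions beyond a server's target are the only ones allowed to move.
    surplus = [p for s in servers for p in load[s][target[s]:]]
    for server in servers:
        while len(load[server]) < target[server]:
            partition = surplus.pop()
            load[server].append(partition)
            new_assignment[partition] = server
    return new_assignment


before = {p: f"server{p % 3 + 1}" for p in range(12)}  # 4 partitions per server
after = rebalance(before, ["server1", "server2", "server3", "server4", "server5"])
moved = [p for p in before if before[p] != after[p]]
```

In this example only the partitions needed to fill the two new servers change owner; everything else stays where it was, and the updated map is what clients would receive as the new cluster map.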
46
Fail Over Node
• App servers happily accessing docs on Server 3
• Server fails
• App server requests to server 3 fail
• Cluster detects server has failed
– Promotes replicas of docs to active
– Updates cluster map
• App server requests for docs now go to appropriate server
• Typically rebalance would follow
Diagram: APP SERVER 1 and APP SERVER 2, each with the COUCHBASE CLIENT LIBRARY and a CLUSTER MAP, connected to a COUCHBASE SERVER CLUSTER of SERVER 1 to SERVER 5; each server holds Active Docs and Replica Docs. User Configured Replica Count = 1.
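The failover steps above (promote replicas, update the cluster map) can be sketched in Python. This is a simplified model with a replica count of 1; the map layout and the function name `fail_over` are assumptions rather than any product's interface.

```python
def fail_over(cluster_map, failed):
    """Return an updated cluster map after the server named `failed` goes down.

    Partitions whose active copy was on the failed server get their replica
    promoted; partitions that only lost a replica keep their active copy.
    """
    updated = {}
    for partition, entry in cluster_map.items():
        active, replica = entry["active"], entry["replica"]
        if active == failed:
            # Promote the replica; no replica exists until a rebalance recreates one.
            active, replica = replica, None
        elif replica == failed:
            replica = None
        updated[partition] = {"active": active, "replica": replica}
    return updated


cluster_map = {
    0: {"active": "server3", "replica": "server1"},
    1: {"active": "server1", "replica": "server3"},
    2: {"active": "server2", "replica": "server1"},
}
new_map = fail_over(cluster_map, "server3")
```

After the update, requests for partition 0 go to server1. Partitions 0 and 1 are left without a replica, which is why a rebalance typically follows.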
48
Reading and Writing
Reading Data
• Application Server: "Give me document A"
• Server: "Here is document A"
Writing Data
• Application Server: "Please store document A"
• Server: "OK, I stored document A"
Diagram: in both cases the server holds document A in RAM and on DISK.
49
Flow of data when writing
Writing Data
• Applications writing to Couchbase
• Couchbase transmitting replicas
• Couchbase writing to disk
Diagram labels: Application Server (three), Server, network, Replication queue, Disk write queue
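The write flow with its two queues can be modeled in a few lines of Python. This is a toy sketch, not Couchbase's implementation: the `Node` class and its method names are hypothetical, and acknowledging the write before the queues are drained is an assumption read from the diagrams rather than something the slides state.

```python
from collections import deque


class Node:
    """Toy model of one server with RAM, disk, and the two queues from the slide."""

    def __init__(self):
        self.ram = {}
        self.disk = {}
        self.replication_queue = deque()
        self.disk_write_queue = deque()

    def set(self, key, doc):
        """Store in RAM, queue the follow-up work, and acknowledge right away."""
        self.ram[key] = doc
        self.replication_queue.append(key)
        self.disk_write_queue.append(key)
        return "OK"  # assumption: ack is returned before persistence completes

    def replicate_to(self, replica):
        """Drain the replication queue to a replica node."""
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.ram[key] = self.ram[key]
            replica.disk_write_queue.append(key)

    def flush_to_disk(self):
        """Drain the disk write queue."""
        while self.disk_write_queue:
            key = self.disk_write_queue.popleft()
            self.disk[key] = self.ram[key]


master, replica = Node(), Node()
ack = master.set("A", {"title": "Hello World"})
master.replicate_to(replica)
master.flush_to_disk()
replica.flush_to_disk()
```

Decoupling the acknowledgement from replication and disk writes keeps write latency low, with the trade-off that a document exists only in the master's RAM until the queues drain.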