Amazon Dynamo DB - presentation
-
Upload
chetan-nagarkar -
Category
Documents
-
view
221 -
download
0
Transcript of Amazon Dynamo DB - presentation
-
8/12/2019 Amazon Dynamo DB - presentation
1/30
Amazon Dynamo D- a term project presentation
CSc 244 | CSU Sacramento
Spring 2014Instructor : Dr. Ying Jin
8thMay, 2014
-
8/12/2019 Amazon Dynamo DB - presentation
2/30
Agenda
NoSQL background
CAP theorem AWS offerings
Dynamo DB
Dynamo DB Architecture
How to use
Pricing
Advantages
Limitations
References
Questions
-
8/12/2019 Amazon Dynamo DB - presentation
3/30
NoSQL background
Traditional databases have a concrete schema.
-
8/12/2019 Amazon Dynamo DB - presentation
4/30
NoSQL systems are designed to be schema-le
Entry 1
{name:emp1} Entry 2
{name:emp2,e_id:1,e_addr:Cupertino} Entry 3
{name:emp3,e_id:3} Entry 4{name:emp4,e_id:6, dob:03-Sep-1964}
-
8/12/2019 Amazon Dynamo DB - presentation
5/30
Types of NoSQL systems
1. Key-Value Pair :
Every item is a set of key-value pair
Voldemort(LinkedIn), DynamoDB(Amazon), Redis(VMWare), etc.
2. Document Oriented : Each key is pairedwith a complex data structure called document. Doc
contain multiple key-value pairs
CouchDB, MongoDB, etc.
3. Column Family Stores : Key value is mapped to set of columns optimized for query over laCassandra(Facebook), HyperTable, Hbase, BigTable(Google) etc.
4. Graph Databases : Containsinformation about network, social connections
Neo4j, FlockDB(Twitter).
-
8/12/2019 Amazon Dynamo DB - presentation
6/30
ACID properties in traditional RDB
Atomicity Requires each transaction to be all or nothing Consistency
Brings databases from one valid state to another
Isolation
Takes care of concurrent execution and avoid overwritting Durability
Commited transactions will remain so in the event of power loss, crashe
-
8/12/2019 Amazon Dynamo DB - presentation
7/30
CAP theorem
It states : A distributed system possibly cannot guarantee all of the fosimultaneously
Consistency: All nodes see the same data at the same time
Availability: The system is always on, i.e., every request receives a responsserves the request or not.
Partition Tolerance: System continues to operate in case of system failure o
-
8/12/2019 Amazon Dynamo DB - presentation
8/30
-
8/12/2019 Amazon Dynamo DB - presentation
9/30
Amazon Web Services(AWS) offeri Launched in 2006, AWS is used by thousands of companies tod
based computing services.
Includes an array of services: Remote Computing(EC2), IdentityDatabase services(SimpleDB, RDS, ElastiCache, DynamoDB, e 8 Regions and Availability zones :
Code Name
ap-northeast-1 Asia Pacific (Tokyo) Re
ap-southeast-1 Asia Pacific (Singapore
ap-southeast-2 Asia Pacific (Sydney) R
eu-west-1 EU (Ireland) Region
sa-east-1 South America (Sao Pa
us-east-1 US East (Northern Virg
us-west-1 US West (Northern Ca
us-west-2 US West (Oregon) Reg
-
8/12/2019 Amazon Dynamo DB - presentation
10/30
Dynamo DB
Dynamo DB is a managed and eventually consistent NoSQL databasupport highly available data access. A flexible data model with key/attribute pairs. No schema required. Optimized for availability to maximize :
Data consistency Durability Performance
ALL THIS WITHOUT THE OPERATIONAL BURDEN
-
8/12/2019 Amazon Dynamo DB - presentation
11/30
What is Fully Managed?
Never worry about:
Hardware provisioning Cross-availability zone replication
Hardware and Software updates Monitoring and handling of hardware failures
Replicas automatically generated.
-
8/12/2019 Amazon Dynamo DB - presentation
12/30
DynamoDB Architecture
True distributed architecture
Data is spread across hundreds of servers called storage nodes Hundreds of servers form a cluster in the form of a ring
Client application can connect using one of the two approaches Routing using a load balancer
Client-library that reflects Dynamos partitioning scheme and can determine thestorage host to connect
DynamoDB is designed to be always writable storage solution Allows multiple versions of data on multiple storage nodes
Conflict resolution happens while reads and NOT during writes Syntacticconflict resolution
Symantec conflict resolution
f d
-
8/12/2019 Amazon Dynamo DB - presentation
13/30
Put(key,context object) Determines where replicas of the object should be placed based on key,& write replicas to disk
Context Info is stored along with object so as to verify validity of context object
If at least W-1 nodes respond, write is successful
Get(key)- Operation locates all object replicas associated with key in storage system
Returns a single or list of object with conflict version along with context
Client reconcile divergent versions and supersede current version with based on context
Node handling read and write operation is called coordinator
For reads and writes dynamo uses consistency protocol
Consistency protocol has 3 variables N Number of replicas of data to be read or written
W Number of nodes that must participate in successful write operation
R Number of machines contacted in read operation
R+W > N Latency determined by slowest of R or W replicas. Thus R&W less than N
DynamoDB System Interface get() and p
-
8/12/2019 Amazon Dynamo DB - presentation
14/30
DynamoDB Architecture - Partitioning
Data is partitioned over multiple hosts called storage nodes (ring)
Uses consistent hashing to dynamically partition data across storage hosts
Two problems associated with Consistent hashing Hashing of storage hosts can cause imbalance of data and load
Consistent hashing treats every storage host as same capacity
A [3,4]
B [1]
C[2]
1
2
3
4
3
4
A[4]
1
D[2,3]
Initial Situation Situation after C left and D join
-
8/12/2019 Amazon Dynamo DB - presentation
15/30
DynamoDB Architecture - Partitionin
Solution Virtual hosts mapped to physical hosts (tokens)
Number of virtual hosts for a physical host depends on capacity of physical hos
Virtual nodes are function-mapped to physical nodes
AB
C
D
H
I
J
C
B PhysicalHost
Virtual Host
-
8/12/2019 Amazon Dynamo DB - presentation
16/30
DynamoDB Architecture Scaling Gossip protocol used for node membership
Every second, storage node randomly contact a peer to bilaterally reconcile persistedmembership history
Doing so, membership changes are spread and eventually consistent membership view isformed
When a new members joins, adjacent nodes adjust their object and replica ownership
When a member leaves adjacent nodes adjust their object and replica and distributes keyowned by leaving member
A
B
C
D
E
F
G
H
Replication factor = 3
-
8/12/2019 Amazon Dynamo DB - presentation
17/30
DynamoDB Architecture Data Version
Eventual Consistency data propagates asynchronously
Its possible to have multiple version on different storage nodes
If latest data is not available on storage node, client will update the old versio Conflict resolution is done using vector clock
Vector clock is a metadata information added to data when using get() or pu
Vector clock effectively a list of (node, counter) pairs
One vector clock is associated with every version of every object
One can determine two version has causal ordering or parallel branches by evector clocks
Client specify vector clock information while reading or writing data in the focontext
Dynamo tries to resolve conflict among multiple version using syntactic reco
If syntactic reconciliation does not work, client has to resolve conflict using sreconciliation
-
8/12/2019 Amazon Dynamo DB - presentation
18/30
-
8/12/2019 Amazon Dynamo DB - presentation
19/30
Strongly consistent vs Eventually Consisten
Multiple copies of same item to ensure durability
Takes time for update to propagate multiple copies Eventually consistent
After write happens, immediate read may not give latest value
Strongly consistent
Requires additional read capacity unit Gets most up-to-date version of item value
Eventually consistent read consumes half the read capacityunit as strongly consistent read.
-
8/12/2019 Amazon Dynamo DB - presentation
20/30
How to use
AWS provides SDKs to interact with DynamoDB for the followingplatforms/languages:
Android iOS Java
.Net
In Java, there are two means to interact with your DynamoDB table AWS Management Console AWS Toolkit for Eclipse
Python
PHP
Node.js
Ruby
-
8/12/2019 Amazon Dynamo DB - presentation
21/30
Management Console
-
8/12/2019 Amazon Dynamo DB - presentation
22/30
Console create table
-
8/12/2019 Amazon Dynamo DB - presentation
23/30
Inside Amazon DynamoDB
Primary key (mandatory for every table) Hash or Hash + Range
Data model in the form of tables
Data stored in the form of items (name value attributes) Secondary Indexes for improved performance
Local secondary index Global secondary index
Scalar data type (number, string etc) or multi-valued data type (sets)
key=value key=value key=value key=value Table
Item (64KB ma
Attributes
Create Instance in Java
-
8/12/2019 Amazon Dynamo DB - presentation
24/30
Create Instance in Java
-
8/12/2019 Amazon Dynamo DB - presentation
25/30
Using AWS SDK: AWS Toolkit for Ec
-
8/12/2019 Amazon Dynamo DB - presentation
26/30
Pricing
In the free tier, AWS offers 100 MB of free data storage per month. Read capacity: 1 read = 4 KB data Write capacity: 1 write = 1 KB data
With eventually consistant reads, you get twice reads/sec In free tier you get 5 writes/sec & 10 reads/sec Or 432,000 writes and 864,000 writes per day or 40 million data opera
month.
-
8/12/2019 Amazon Dynamo DB - presentation
27/30
Advantages
Easy Administration:No h/w or s/w provisioning, setup or con
Flexible:Secondary Indexes: query on whichever attrib Fast, predictable performance: SSDs, single digit latency Built-in fault tolerance Stong consistency and atomic counters Automatic Data Replication:Data stored across 3 availability zones
Security:AWS Identity Access Management(IAM) CloudWatch Alarms: Alarms are raised if you are near to crossing provisioned t
storage.
Provisioned throughput/Cost effective:Pay how you use
-
8/12/2019 Amazon Dynamo DB - presentation
28/30
Limitations
64KB limit on item size (row size) 1 MB limit on fetching data Pay more if you want strongly consistent data Size is multiple of 4KB (provisioning throughput wastage) Cannot join tables
Indexes should be created during table creation only No triggers or server side scripts Limited comparison capability (no not_null, contains etc)
-
8/12/2019 Amazon Dynamo DB - presentation
29/30
References:
http://aws.amazon.com/dynamodb/ http://www.allthingsdistributed.com/files/amazon-dynamo-sos
http://www.allthingsdistributed.com/2012/01/amazon-dynamo http://docs.aws.amazon.com/amazondynamodb/latest/developoduction.html
http://aws.amazon.com/dynamodb/http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://aws.amazon.com/dynamodb/http://aws.amazon.com/dynamodb/ -
8/12/2019 Amazon Dynamo DB - presentation
30/30
Questions ?
Thank You !