Amazon Dynamo DB - presentation

download Amazon Dynamo DB - presentation

of 30

Transcript of Amazon Dynamo DB - presentation

  • 8/12/2019 Amazon Dynamo DB - presentation

    1/30

    Amazon Dynamo D- a term project presentation

    CSc 244 | CSU Sacramento

    Spring 2014Instructor : Dr. Ying Jin

    8thMay, 2014

  • 8/12/2019 Amazon Dynamo DB - presentation

    2/30

    Agenda

    NoSQL background

    CAP theorem AWS offerings

    Dynamo DB

    Dynamo DB Architecture

    How to use

    Pricing

    Advantages

    Limitations

    References

    Questions

  • 8/12/2019 Amazon Dynamo DB - presentation

    3/30

    NoSQL background

    Traditional databases have a concrete schema.

  • 8/12/2019 Amazon Dynamo DB - presentation

    4/30

    NoSQL systems are designed to be schema-le

    Entry 1

    {name:emp1} Entry 2

    {name:emp2,e_id:1,e_addr:Cupertino} Entry 3

    {name:emp3,e_id:3} Entry 4{name:emp4,e_id:6, dob:03-Sep-1964}

  • 8/12/2019 Amazon Dynamo DB - presentation

    5/30

    Types of NoSQL systems

    1. Key-Value Pair :

    Every item is a set of key-value pair

    Voldemort(LinkedIn), DynamoDB(Amazon), Redis(VMWare), etc.

    2. Document Oriented : Each key is pairedwith a complex data structure called document. Doc

    contain multiple key-value pairs

    CouchDB, MongoDB, etc.

    3. Column Family Stores : Key value is mapped to set of columns optimized for query over laCassandra(Facebook), HyperTable, Hbase, BigTable(Google) etc.

    4. Graph Databases : Containsinformation about network, social connections

    Neo4j, FlockDB(Twitter).

  • 8/12/2019 Amazon Dynamo DB - presentation

    6/30

    ACID properties in traditional RDB

    Atomicity Requires each transaction to be all or nothing Consistency

    Brings databases from one valid state to another

    Isolation

    Takes care of concurrent execution and avoid overwritting Durability

    Commited transactions will remain so in the event of power loss, crashe

  • 8/12/2019 Amazon Dynamo DB - presentation

    7/30

    CAP theorem

    It states : A distributed system possibly cannot guarantee all of the fosimultaneously

    Consistency: All nodes see the same data at the same time

    Availability: The system is always on, i.e., every request receives a responsserves the request or not.

    Partition Tolerance: System continues to operate in case of system failure o

  • 8/12/2019 Amazon Dynamo DB - presentation

    8/30

  • 8/12/2019 Amazon Dynamo DB - presentation

    9/30

    Amazon Web Services(AWS) offeri Launched in 2006, AWS is used by thousands of companies tod

    based computing services.

    Includes an array of services: Remote Computing(EC2), IdentityDatabase services(SimpleDB, RDS, ElastiCache, DynamoDB, e 8 Regions and Availability zones :

    Code Name

    ap-northeast-1 Asia Pacific (Tokyo) Re

    ap-southeast-1 Asia Pacific (Singapore

    ap-southeast-2 Asia Pacific (Sydney) R

    eu-west-1 EU (Ireland) Region

    sa-east-1 South America (Sao Pa

    us-east-1 US East (Northern Virg

    us-west-1 US West (Northern Ca

    us-west-2 US West (Oregon) Reg

  • 8/12/2019 Amazon Dynamo DB - presentation

    10/30

    Dynamo DB

    Dynamo DB is a managed and eventually consistent NoSQL databasupport highly available data access. A flexible data model with key/attribute pairs. No schema required. Optimized for availability to maximize :

    Data consistency Durability Performance

    ALL THIS WITHOUT THE OPERATIONAL BURDEN

  • 8/12/2019 Amazon Dynamo DB - presentation

    11/30

    What is Fully Managed?

    Never worry about:

    Hardware provisioning Cross-availability zone replication

    Hardware and Software updates Monitoring and handling of hardware failures

    Replicas automatically generated.

  • 8/12/2019 Amazon Dynamo DB - presentation

    12/30

    DynamoDB Architecture

    True distributed architecture

    Data is spread across hundreds of servers called storage nodes Hundreds of servers form a cluster in the form of a ring

    Client application can connect using one of the two approaches Routing using a load balancer

    Client-library that reflects Dynamos partitioning scheme and can determine thestorage host to connect

    DynamoDB is designed to be always writable storage solution Allows multiple versions of data on multiple storage nodes

    Conflict resolution happens while reads and NOT during writes Syntacticconflict resolution

    Symantec conflict resolution

    f d

  • 8/12/2019 Amazon Dynamo DB - presentation

    13/30

    Put(key,context object) Determines where replicas of the object should be placed based on key,& write replicas to disk

    Context Info is stored along with object so as to verify validity of context object

    If at least W-1 nodes respond, write is successful

    Get(key)- Operation locates all object replicas associated with key in storage system

    Returns a single or list of object with conflict version along with context

    Client reconcile divergent versions and supersede current version with based on context

    Node handling read and write operation is called coordinator

    For reads and writes dynamo uses consistency protocol

    Consistency protocol has 3 variables N Number of replicas of data to be read or written

    W Number of nodes that must participate in successful write operation

    R Number of machines contacted in read operation

    R+W > N Latency determined by slowest of R or W replicas. Thus R&W less than N

    DynamoDB System Interface get() and p

  • 8/12/2019 Amazon Dynamo DB - presentation

    14/30

    DynamoDB Architecture - Partitioning

    Data is partitioned over multiple hosts called storage nodes (ring)

    Uses consistent hashing to dynamically partition data across storage hosts

    Two problems associated with Consistent hashing Hashing of storage hosts can cause imbalance of data and load

    Consistent hashing treats every storage host as same capacity

    A [3,4]

    B [1]

    C[2]

    1

    2

    3

    4

    3

    4

    A[4]

    1

    D[2,3]

    Initial Situation Situation after C left and D join

  • 8/12/2019 Amazon Dynamo DB - presentation

    15/30

    DynamoDB Architecture - Partitionin

    Solution Virtual hosts mapped to physical hosts (tokens)

    Number of virtual hosts for a physical host depends on capacity of physical hos

    Virtual nodes are function-mapped to physical nodes

    AB

    C

    D

    H

    I

    J

    C

    B PhysicalHost

    Virtual Host

  • 8/12/2019 Amazon Dynamo DB - presentation

    16/30

    DynamoDB Architecture Scaling Gossip protocol used for node membership

    Every second, storage node randomly contact a peer to bilaterally reconcile persistedmembership history

    Doing so, membership changes are spread and eventually consistent membership view isformed

    When a new members joins, adjacent nodes adjust their object and replica ownership

    When a member leaves adjacent nodes adjust their object and replica and distributes keyowned by leaving member

    A

    B

    C

    D

    E

    F

    G

    H

    Replication factor = 3

  • 8/12/2019 Amazon Dynamo DB - presentation

    17/30

    DynamoDB Architecture Data Version

    Eventual Consistency data propagates asynchronously

    Its possible to have multiple version on different storage nodes

    If latest data is not available on storage node, client will update the old versio Conflict resolution is done using vector clock

    Vector clock is a metadata information added to data when using get() or pu

    Vector clock effectively a list of (node, counter) pairs

    One vector clock is associated with every version of every object

    One can determine two version has causal ordering or parallel branches by evector clocks

    Client specify vector clock information while reading or writing data in the focontext

    Dynamo tries to resolve conflict among multiple version using syntactic reco

    If syntactic reconciliation does not work, client has to resolve conflict using sreconciliation

  • 8/12/2019 Amazon Dynamo DB - presentation

    18/30

  • 8/12/2019 Amazon Dynamo DB - presentation

    19/30

    Strongly consistent vs Eventually Consisten

    Multiple copies of same item to ensure durability

    Takes time for update to propagate multiple copies Eventually consistent

    After write happens, immediate read may not give latest value

    Strongly consistent

    Requires additional read capacity unit Gets most up-to-date version of item value

    Eventually consistent read consumes half the read capacityunit as strongly consistent read.

  • 8/12/2019 Amazon Dynamo DB - presentation

    20/30

    How to use

    AWS provides SDKs to interact with DynamoDB for the followingplatforms/languages:

    Android iOS Java

    .Net

    In Java, there are two means to interact with your DynamoDB table AWS Management Console AWS Toolkit for Eclipse

    Python

    PHP

    Node.js

    Ruby

  • 8/12/2019 Amazon Dynamo DB - presentation

    21/30

    Management Console

  • 8/12/2019 Amazon Dynamo DB - presentation

    22/30

    Console create table

  • 8/12/2019 Amazon Dynamo DB - presentation

    23/30

    Inside Amazon DynamoDB

    Primary key (mandatory for every table) Hash or Hash + Range

    Data model in the form of tables

    Data stored in the form of items (name value attributes) Secondary Indexes for improved performance

    Local secondary index Global secondary index

    Scalar data type (number, string etc) or multi-valued data type (sets)

    key=value key=value key=value key=value Table

    Item (64KB ma

    Attributes

    Create Instance in Java

  • 8/12/2019 Amazon Dynamo DB - presentation

    24/30

    Create Instance in Java

  • 8/12/2019 Amazon Dynamo DB - presentation

    25/30

    Using AWS SDK: AWS Toolkit for Ec

  • 8/12/2019 Amazon Dynamo DB - presentation

    26/30

    Pricing

    In the free tier, AWS offers 100 MB of free data storage per month. Read capacity: 1 read = 4 KB data Write capacity: 1 write = 1 KB data

    With eventually consistant reads, you get twice reads/sec In free tier you get 5 writes/sec & 10 reads/sec Or 432,000 writes and 864,000 writes per day or 40 million data opera

    month.

  • 8/12/2019 Amazon Dynamo DB - presentation

    27/30

    Advantages

    Easy Administration:No h/w or s/w provisioning, setup or con

    Flexible:Secondary Indexes: query on whichever attrib Fast, predictable performance: SSDs, single digit latency Built-in fault tolerance Stong consistency and atomic counters Automatic Data Replication:Data stored across 3 availability zones

    Security:AWS Identity Access Management(IAM) CloudWatch Alarms: Alarms are raised if you are near to crossing provisioned t

    storage.

    Provisioned throughput/Cost effective:Pay how you use

  • 8/12/2019 Amazon Dynamo DB - presentation

    28/30

    Limitations

    64KB limit on item size (row size) 1 MB limit on fetching data Pay more if you want strongly consistent data Size is multiple of 4KB (provisioning throughput wastage) Cannot join tables

    Indexes should be created during table creation only No triggers or server side scripts Limited comparison capability (no not_null, contains etc)

  • 8/12/2019 Amazon Dynamo DB - presentation

    29/30

    References:

    http://aws.amazon.com/dynamodb/ http://www.allthingsdistributed.com/files/amazon-dynamo-sos

    http://www.allthingsdistributed.com/2012/01/amazon-dynamo http://docs.aws.amazon.com/amazondynamodb/latest/developoduction.html

    http://aws.amazon.com/dynamodb/http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/2012/01/amazon-dynamodb.htmlhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdfhttp://aws.amazon.com/dynamodb/http://aws.amazon.com/dynamodb/
  • 8/12/2019 Amazon Dynamo DB - presentation

    30/30

    Questions ?

    Thank You !