© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
August 11, 2016
Getting Started with
Amazon DynamoDB
Padma Malligarjunan, Sr. Technical Account Manager, AWS Enterprise Support
Yekesa Kosuru, V.P Engineering, DataXu
Agenda
• Brief history of data processing
• Relational (SQL) vs. non-relational (NoSQL)
• Fully managed features of DynamoDB
• Demo–serverless applications
• DataXu Use of DynamoDB
Data volume since 2010
• 90% of stored data generated in the last 2 years
• 1 terabyte of data in 2010 equals 6.5 petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time
Relational vs. non-relational databases
[Diagram: traditional SQL scales up (one DB, primary and secondary); NoSQL scales out (many DB nodes)]
Why NoSQL?
SQL | NoSQL
Optimized for storage | Optimized for compute
Normalized/relational | Denormalized/hierarchical
Ad-hoc queries | Instantiated views
Scale vertically | Scale horizontally
Good for OLAP | Built for OLTP at scale
DynamoDB benefits
• Fully managed
• Fast, consistent performance
• Highly scalable
• Flexible
• Event-driven programming
• Fine-grained access control
High availability and durability
WRITES
• Replicated continuously to 3 Availability Zones
• Persisted to disk (custom SSD)
READS
• Strongly or eventually consistent
• No latency trade-off
Designed to support 99.99% availability
Built for high durability
Major League Baseball fields big data, excitement with Amazon DynamoDB
MLBAM (MLB Advanced Media) is a full service solutions provider, operating a powerful content delivery platform.
“For the first time, we can measure things we’ve never been able to measure before.”
Joe Inzerillo, Executive Vice President and CTO, MLBAM
• MLBAM can scale to support many games on a single day.
• DynamoDB powers queries and supports the fast data retrieval required.
• MLBAM distributes 25,000 live events annually and 10 million streams daily.
Redfin is revolutionizing home buying and selling with Amazon DynamoDB
Redfin is a full-service real estate company with local agents and online tools to help people buy & sell homes.
“We have billions of records on DynamoDB being refreshed daily or hourly or even by seconds.”
Yong Huang, Director, Big Data Analytics, Redfin
• Redfin provides property and agent details and ratings through its websites and apps.
• With DynamoDB, latency for “similar” properties improved from 2 seconds to just 12 milliseconds.
• Redfin stores and processes five billion items in DynamoDB.
Expedia’s real-time analytics application uses Amazon DynamoDB
Expedia is a leader in the $1 trillion travel industry, with an extensive portfolio that includes some of the world’s most trusted travel brands.
“With DynamoDB we were up and running in less than a day, and there is no need for a team to maintain it.”
Kuldeep Chowhan, Principal Engineer, Expedia
• Expedia’s real-time analytics application collects data for its “test & learn” experiments on Expedia sites.
• The analytics application processes ~200 million messages daily.
• Ease of setup, monitoring, and scaling were key factors in choosing DynamoDB.
Nexon scales mobile gaming with Amazon DynamoDB
Nexon is a leading South Korean video game developer and a pioneer in the world of interactive entertainment.
“By using AWS, we decreased our initial investment costs, and only pay for what we use.”
Chunghoon Ryu, Department Manager, Nexon
• Nexon used DynamoDB as its primary game database for a new blockbuster mobile game, HIT.
• HIT became the #1 mobile game in Korea within the first day of launch and has > 2M registered users.
• HIT relies on DynamoDB for steady latency of less than 10 ms, giving 170,000 concurrent players a fantastic mobile gaming experience.
Scaling high-velocity use cases with DynamoDB
Ad Tech | Gaming | Mobile | IoT | Web
DynamoDB table structure
Table > Items > Attributes
Partition key
• Mandatory
• Key-value access pattern
• Determines data distribution
Sort key
• Optional
• Model 1:N relationships
• Enables rich query capabilities
Query capabilities
• All items for a key
• ==, <, >, >=, <=
• “begins with”
• “between”
• “contains”
• “in”
• Sorted results
• Counts
• Top/bottom N values
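The key model on this slide can be sketched with a small in-memory stand-in. This is an illustration only, not the DynamoDB API: the `Table` class, its `put`, `query` and `begins_with` methods, and the sample order data are all hypothetical names invented for this sketch. It shows how a partition key selects one item collection and how a sort key allows range, prefix, and top-N style access within it.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class Table:
    """Toy in-memory model of a table with a partition key and a sort key."""

    def __init__(self):
        # partition key -> list of (sort_key, item), kept sorted by sort key
        self._partitions = defaultdict(list)

    def put(self, partition_key, sort_key, item):
        rows = self._partitions[partition_key]
        keys = [key for key, _ in rows]
        pos = bisect_left(keys, sort_key)
        if pos < len(rows) and rows[pos][0] == sort_key:
            rows[pos] = (sort_key, item)  # same primary key: overwrite
        else:
            rows.insert(pos, (sort_key, item))

    def query(self, partition_key, low=None, high=None, limit=None, descending=False):
        """Items for one partition key with low <= sort key <= high, in sort order."""
        rows = self._partitions.get(partition_key, [])
        keys = [key for key, _ in rows]
        start = 0 if low is None else bisect_left(keys, low)
        stop = len(rows) if high is None else bisect_right(keys, high)
        selected = [item for _, item in rows[start:stop]]
        if descending:
            selected.reverse()
        return selected if limit is None else selected[:limit]

    def begins_with(self, partition_key, prefix):
        """Items for one partition key whose sort key starts with prefix."""
        rows = self._partitions.get(partition_key, [])
        return [item for key, item in rows if key.startswith(prefix)]


# Hypothetical data: one user's orders, sorted by date string.
orders = Table()
orders.put("user#1", "2016-08-01", {"total": 30})
orders.put("user#1", "2016-08-05", {"total": 12})
orders.put("user#1", "2016-08-11", {"total": 99})
orders.put("user#2", "2016-08-02", {"total": 5})

between = orders.query("user#1", low="2016-08-02", high="2016-08-11")
latest = orders.query("user#1", limit=1, descending=True)
early_august = orders.begins_with("user#1", "2016-08-0")
```

Every call names exactly one partition key, which mirrors the slide's point that the partition key is mandatory and drives the key-value access pattern, while the sort key is what makes the richer queries possible.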
Local secondary index (LSI)
• Alternate sort key attribute
• Index is local to a partition key
Table: A1 (partition), A2 (sort), A3, A4, A5
LSIs:
• KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
• INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
• ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
10 GB maximum per partition key; LSIs limit the number of range keys!
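The three projection types on this slide can be illustrated with a short sketch. The `project` function and its parameters are hypothetical helpers written for this example, not part of any AWS SDK; only the projection type names (KEYS_ONLY, INCLUDE, ALL) and the attribute names A1 to A5 come from the slide.

```python
def project(item, table_keys, index_sort_key, projection="KEYS_ONLY", include=()):
    """Return the attributes an index entry would carry for one table item."""
    if projection == "ALL":
        return dict(item)
    # The table's own key attributes and the index sort key are always present.
    names = list(table_keys) + [index_sort_key]
    if projection == "INCLUDE":
        names += list(include)
    elif projection != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection}")
    return {name: item[name] for name in names if name in item}


# The table from the slide: A1 is the partition key, A2 the sort key.
item = {"A1": "p", "A2": 1, "A3": "x", "A4": "y", "A5": "z"}

keys_only = project(item, ("A1", "A2"), "A3")
include_a3 = project(item, ("A1", "A2"), "A4", "INCLUDE", include=("A3",))
everything = project(item, ("A1", "A2"), "A5", "ALL")
```

A narrower projection keeps index entries smaller, at the price of going back to the table for attributes the index does not carry.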
Global secondary index (GSI)
• Alternate partition and/or sort key
• Index is across all partition keys
Table: A1 (partition), A2, A3, A4, A5
GSIs:
• INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
• ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
• KEYS_ONLY: A2 (partition), A1 (item key)
Online indexing
Read capacity units (RCUs) and write capacity units (WCUs) are provisioned separately for GSIs
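To make "provisioned separately" concrete, the sketch below builds a plain dictionary shaped like the parameters of the DynamoDB CreateTable request, as accepted by boto3's `create_table`. Nothing is sent to AWS. The table name, index name, and capacity numbers are hypothetical, and the `gsi` helper is written only for this example.

```python
def gsi(name, partition, sort, projection, rcu, wcu, include=None):
    """Build one GlobalSecondaryIndexes entry with its own throughput."""
    key_schema = [{"AttributeName": partition, "KeyType": "HASH"}]
    if sort is not None:
        key_schema.append({"AttributeName": sort, "KeyType": "RANGE"})
    projection_spec = {"ProjectionType": projection}
    if include:
        projection_spec["NonKeyAttributes"] = list(include)
    return {
        "IndexName": name,
        "KeySchema": key_schema,
        "Projection": projection_spec,
        # Capacity for the index, independent of the table's capacity.
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


# Hypothetical definition mirroring the slide: table keyed on A1,
# one GSI keyed on A5 (partition) and A4 (sort) that projects A3.
table_definition = {
    "TableName": "Example",
    # Only attributes used as a table or index key are declared here.
    "AttributeDefinitions": [
        {"AttributeName": "A1", "AttributeType": "S"},
        {"AttributeName": "A4", "AttributeType": "S"},
        {"AttributeName": "A5", "AttributeType": "S"},
    ],
    "KeySchema": [{"AttributeName": "A1", "KeyType": "HASH"}],
    "ProvisionedThroughput": {"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
    "GlobalSecondaryIndexes": [
        gsi("A5-A4-index", "A5", "A4", "INCLUDE", rcu=20, wcu=50, include=["A3"]),
    ],
}
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `client.create_table(**table_definition)`. In this sketch the index is given the same write capacity as the table, since every table write that touches the indexed attributes also has to be written to the index.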
How do GSI updates work?
[Diagram: Client, Table (primary table partitions), Global secondary index; step 2: asynchronous update (in progress)]
If GSIs don’t have enough write capacity, table writes are throttled!
LSI or GSI?
LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario, use GSI!
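The rules of thumb above fit in a few lines. `choose_index` is a hypothetical helper written for this sketch; it encodes only what the slide states.

```python
LSI_ITEM_COLLECTION_LIMIT_GB = 10  # "10 GB maximum per partition key"


def choose_index(item_collection_gb, needs_strongly_consistent_reads):
    """Pick an index type using the slide's rules of thumb."""
    if item_collection_gb > LSI_ITEM_COLLECTION_LIMIT_GB:
        # An LSI cannot hold this item collection, so a GSI is the only option.
        return "GSI"
    if not needs_strongly_consistent_reads:
        return "GSI"
    return "LSI"


small_and_strict = choose_index(2, needs_strongly_consistent_reads=True)
small_and_relaxed = choose_index(2, needs_strongly_consistent_reads=False)
large = choose_index(40, needs_strongly_consistent_reads=True)
```

In the last case the size limit wins: the sketch returns "GSI" even though strongly consistent reads were requested, so the application would have to accept eventually consistent index reads.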
Advanced topics in DynamoDB
• Data modeling
• Understanding partitions
• # of partitions depends on table throughput and size
• DynamoDB scaling
• Design patterns and best practices
To learn more, please attend:
Deep Dive on Amazon DynamoDB 4:45 p.m.– 5:45 p.m.
Rick Houlihan, Principal Solutions Architect
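As a rough illustration of "# of partitions depends on table throughput and size", the sketch below uses a rule of thumb that AWS material published around the time of this talk. The constants (3,000 RCUs, 1,000 WCUs, and 10 GB per partition) are an assumption here: they do not appear on these slides and may not reflect how DynamoDB allocates partitions today. `estimate_partitions` is a hypothetical helper.

```python
import math

# Assumed per-partition limits (not from these slides).
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10


def estimate_partitions(rcu, wcu, size_gb):
    """Estimate partition count as the larger of the throughput and size needs."""
    by_throughput = math.ceil(rcu / RCU_PER_PARTITION + wcu / WCU_PER_PARTITION)
    by_size = math.ceil(size_gb / GB_PER_PARTITION)
    return max(by_throughput, by_size, 1)


throughput_bound = estimate_partitions(rcu=5000, wcu=500, size_gb=8)
size_bound = estimate_partitions(rcu=1000, wcu=1000, size_gb=50)
```

Under these assumptions the first table is bound by throughput and the second by size, which is why both dimensions matter when reasoning about how provisioned capacity is spread across partitions.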
Architecture of a simple serverless web application
[Diagram: users on the internet, JavaScript served from an Amazon S3 bucket, API Gateway, Lambda, DynamoDB, AWS Identity & Access Management (IAM)]
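The Lambda piece of this architecture might look roughly like the sketch below. It is not the demo's code. It assumes the API Gateway Lambda proxy event and response shapes (`httpMethod`, `pathParameters`, `body`, `statusCode`), and it takes the table as a parameter so it can run without AWS. `FakeTable` is a hypothetical in-memory stand-in that mimics the `put_item(Item=...)` and `get_item(Key=...)` calls of a boto3 table resource, and the `id` key is invented for the example.

```python
import json


class FakeTable:
    """In-memory stand-in exposing the two table calls the handler uses."""

    def __init__(self):
        self._items = {}

    def put_item(self, Item):
        self._items[Item["id"]] = Item
        return {}

    def get_item(self, Key):
        item = self._items.get(Key["id"])
        return {"Item": item} if item is not None else {}


def make_handler(table):
    """Bind a table object to a Lambda-style handler(event, context)."""

    def handler(event, context):
        method = event["httpMethod"]
        if method == "POST":
            note = json.loads(event["body"])
            table.put_item(Item=note)
            return {"statusCode": 201, "body": json.dumps(note)}
        if method == "GET":
            note_id = event["pathParameters"]["id"]
            found = table.get_item(Key={"id": note_id}).get("Item")
            if found is None:
                return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
            return {"statusCode": 200, "body": json.dumps(found)}
        return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}

    return handler


handler = make_handler(FakeTable())
created = handler({"httpMethod": "POST", "body": json.dumps({"id": "n1", "text": "hello"})}, None)
fetched = handler({"httpMethod": "GET", "pathParameters": {"id": "n1"}}, None)
missing = handler({"httpMethod": "GET", "pathParameters": {"id": "nope"}}, None)
```

In a deployed function the fake would be replaced by a real table object, and the IAM role attached to the function would grant access to that table only.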
DynamoDB pricing & Free Tier
• Free Tier
  25 GB of storage
  25 reads per second
  25 writes per second
• Pricing for additional usage in US East (N. Virginia)
  $0.25 per GB per month
  Write throughput: $0.0065 per hour for every 10 units of write capacity
  Read throughput: $0.0065 per hour for every 50 units of read capacity
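The 2016 prices on this slide can be turned into a rough monthly estimate. The sketch rests on several assumptions: the Free Tier's 25 reads and 25 writes per second are treated as 25 read and 25 write capacity units, charges are taken as linear in capacity rather than rounded up to whole blocks of 10 or 50 units, and a month is taken as 720 hours. `monthly_cost` is a hypothetical helper, and the prices are the slide's, not current ones.

```python
# Figures from the slide (US East, August 2016).
FREE_STORAGE_GB = 25
FREE_RCU = 25
FREE_WCU = 25
STORAGE_PER_GB_MONTH = 0.25
WRITE_PRICE_PER_10_WCU_HOUR = 0.0065
READ_PRICE_PER_50_RCU_HOUR = 0.0065


def monthly_cost(storage_gb, rcu, wcu, hours=720):
    """Estimate one month's charges in USD after subtracting the Free Tier."""
    billable_gb = max(0, storage_gb - FREE_STORAGE_GB)
    billable_rcu = max(0, rcu - FREE_RCU)
    billable_wcu = max(0, wcu - FREE_WCU)
    storage = billable_gb * STORAGE_PER_GB_MONTH
    reads = billable_rcu / 50 * READ_PRICE_PER_50_RCU_HOUR * hours
    writes = billable_wcu / 10 * WRITE_PRICE_PER_10_WCU_HOUR * hours
    return {"storage": storage, "reads": reads, "writes": writes,
            "total": storage + reads + writes}


estimate = monthly_cost(storage_gb=125, rcu=525, wcu=125)
within_free_tier = monthly_cost(storage_gb=10, rcu=25, wcu=25)
```

At these prices a write capacity unit costs five times as much as a read capacity unit, so write-heavy tables are dominated by the write throughput charge.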
Resources
Padma Malligarjunan | [email protected]
Getting Started Guide: https://aws.amazon.com/dynamodb/
Deep Dive on Amazon DynamoDB 4:45 p.m.– 5:45 p.m.
Yekesa Kosuru, V.P. Engineering
Aug 11th 2016
DataXu Use of DynamoDB
DataXu
• Who
  • A petabyte-scale digital marketing platform
  • Spun out of MIT Labs
  • One of the fastest growing companies in the Inc. 5000
• What
  • Helps the world’s most valuable brands understand and engage with their customers
Attribution Use Case
• Attribution: identifying the set of user actions that contributed to an outcome and attributing value to prior impressions
• Production workload hosted in AWS
• Leverages multiple AWS services
• Processes billions of events
• DynamoDB is used as a key-value store
Technology Stack
[Diagram: Meta, S3, EMR job, CloudWatch, DynamoDB, Data Pipeline, Kinesis, EC2 instances, 3rd party, CloudFront]
Event Chains
[Timeline diagram: E, A, I, E, I, A along a time axis]
• Users generate millions of different types of events per second while using the internet (e.g. impressions, company interactions, clicks, video events, online purchases)
• All of these events arrive into the platform via different mechanisms (e.g. log files and streaming data), and need to be correlated, based on time and by user, to recreate a user’s interaction on the internet.
Legend: E = Event, I = Impression, A = Attribution
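Correlating events "based on time and by user" can be sketched as a group-and-sort step followed by a simple attribution pass. This is an illustration, not DataXu's method: the event fields (`user`, `ts`, `type`, `id`), the `purchase` outcome type, and the equal-credit rule are all assumptions made for the example.

```python
from collections import defaultdict


def build_event_chains(events):
    """Group events by user and order each user's events by timestamp."""
    chains = defaultdict(list)
    for event in events:
        chains[event["user"]].append(event)
    for chain in chains.values():
        chain.sort(key=lambda e: e["ts"])
    return dict(chains)


def attribute(chain, outcome_type="purchase"):
    """Split credit for each outcome equally among the impressions before it."""
    credits = []
    seen_impressions = []
    for event in chain:
        if event["type"] == "impression":
            seen_impressions.append(event["id"])
        elif event["type"] == outcome_type and seen_impressions:
            share = 1.0 / len(seen_impressions)
            credits.append({"outcome": event["id"],
                            "credit": {imp: share for imp in seen_impressions}})
    return credits


# Events may arrive out of order and interleaved across users.
events = [
    {"user": "u1", "ts": 30, "type": "purchase", "id": "p1"},
    {"user": "u2", "ts": 5, "type": "impression", "id": "i3"},
    {"user": "u1", "ts": 10, "type": "impression", "id": "i1"},
    {"user": "u1", "ts": 20, "type": "impression", "id": "i2"},
]
chains = build_event_chains(events)
u1_credit = attribute(chains["u1"])
u2_credit = attribute(chains["u2"])
```

Grouping by user and ordering by time is the same shape as a table keyed on user (partition key) and timestamp (sort key), where one query returns a user's chain already in order.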
DynamoDB Scaling
Provisioned throughput: avg. read capacity 60,000; avg. write capacity 160,000
Storage: 25 TB
Average record size: 1-4 KB
Partition key: User
Sort key: Timestamp
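Record size matters for these numbers because capacity units are consumed per item size. The sketch below uses DynamoDB's documented unit sizes: one write capacity unit covers one write per second of an item up to 1 KB, and one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The helper names are hypothetical, and the final figure is arithmetic on the slide's numbers, not a rate reported by DataXu.

```python
import math


def write_units(item_kb):
    """WCUs consumed by one write of an item of the given size."""
    return max(1, math.ceil(item_kb / 1))


def read_units(item_kb, strongly_consistent=True):
    """RCUs consumed by one read of an item of the given size."""
    units = max(1, math.ceil(item_kb / 4))
    return units if strongly_consistent else units / 2


def max_writes_per_second(provisioned_wcu, item_kb):
    """Upper bound on writes per second at a given provisioned write capacity."""
    return provisioned_wcu // write_units(item_kb)


small_write = write_units(1)
large_write = write_units(3.5)
strong_read = read_units(3.5)
eventual_read = read_units(3.5, strongly_consistent=False)
ceiling_at_4kb = max_writes_per_second(160000, 4)
```

A 4 KB record costs four times as many write units as a 1 KB record, which is why reducing item size or the number of writes has a direct effect on provisioned capacity.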
Fully Managed Benefits of DynamoDB:
• Consistent read/write performance up to provisioned throughput
• Flexible elasticity, never need to overprovision
• Low operational overhead
• Fast & predictable reads with low latencies (milliseconds)
DataXu Optimizations
• Combined reads and writes with LZO compression
• Combined multiple rows that share the same hash key into a single row (3x fewer puts)
• Table rotation to match attribution windows
• Drop an entire table when it is no longer necessary
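Two of these optimizations can be sketched in a few lines. This is not DataXu's implementation. `zlib` from the standard library stands in for LZO, which the slide names but which is not in the Python standard library. The item layout, the daily rotation granularity, the `events_YYYY_MM_DD` naming scheme, and all function names are assumptions made for the example.

```python
import json
import zlib
from collections import defaultdict
from datetime import date, timedelta


def pack_by_user(events):
    """Combine events sharing a partition key into one compressed item each."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["user"]].append(event)
    items = []
    for user, group in grouped.items():
        payload = json.dumps(group, sort_keys=True).encode("utf-8")
        items.append({"user": user, "events": zlib.compress(payload)})
    return items


def unpack(item):
    """Recover the original events from a packed item."""
    return json.loads(zlib.decompress(item["events"]).decode("utf-8"))


def table_name(day, prefix="events"):
    """Name of the table holding one day's data (hypothetical scheme)."""
    return f"{prefix}_{day:%Y_%m_%d}"


def tables_to_drop(existing, today, window_days, prefix="events"):
    """Tables that fall outside the attribution window and can be deleted whole."""
    keep = {table_name(today - timedelta(days=n), prefix) for n in range(window_days)}
    return sorted(name for name in existing
                  if name.startswith(prefix + "_") and name not in keep)


events = [
    {"user": "u1", "ts": 1, "type": "impression"},
    {"user": "u1", "ts": 2, "type": "click"},
    {"user": "u1", "ts": 3, "type": "purchase"},
    {"user": "u2", "ts": 1, "type": "impression"},
]
items = pack_by_user(events)  # 4 events become 2 puts
existing = ["events_2016_08_09", "events_2016_08_10", "events_2016_08_11"]
expired = tables_to_drop(existing, today=date(2016, 8, 11), window_days=2)
```

Dropping a whole table removes expired data in one operation instead of one delete per item, and packing several events into one item trades a little CPU for fewer, smaller writes.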