AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series
-
Upload
amazon-web-services -
Category
Technology
-
view
692 -
download
0
Transcript of SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Series
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Rick Houlihan, Principal TPM – DBS NoSQL
26 July 2016
From SQL to NoSQLBest Practices for Migrating from RDBMS to DynamoDB
Agenda
• Evolution of Data Processing• Why NoSQL• Key DynamoDB concepts• SQL to NoSQL Data Modeling
Timeline of Database TechnologyDa
ta P
ress
ure
Data Volume Since 2010
Dat
a Vo
lum
eHistorical Current
90% of stored data generated in last 2 years
1 Terabyte of data in 2010 equals 6.5 Petabytes today
Linear correlation between data pressure and technical innovation
SQL is not built for this
Technology Adoption and the Hype Curve
Why NoSQL?
Optimized for storage Optimized for compute
Normalized/relational Denormalized/hierarchical
Ad hoc queries Instantiated views
Scale vertically Scale horizontally
Good for OLAP Built for OLTP at scale
SQL NoSQL
Amazon DynamoDB
Document or Key-Value Scales to Any WorkloadFully Managed NoSQL
Access Control Event Driven ProgrammingFast and Consistent
TableTable
Items
Attributes
PartitionKey
Sort Key
MandatoryKey-value access patternDetermines data distribution
OptionalModel 1:N relationshipsEnables rich query capabilities
All items for key==, <, >, >=, <=“begins with”“between”“contains”“in”sorted resultscountstop/bottom N values
00 55 A954 FFAA00 FF
Partition Keys
Id = 1Name = Jim
Hash (1) = 7B
Id = 2Name = AndyDept = Eng
Hash (2) = 48
Id = 3Name = KimDept = Ops
Hash (3) = CD
Key Space
Partition Key uniquely identifies an itemPartition Key is used for building an unordered hash indexAllows table to be partitioned for scale
Partition 3
Partition:Sort Key uses two attributes together to uniquely identify an ItemWithin unordered hash index, data is arranged by the sort keyNo limit on the number of items (∞) per partition keyExcept if you have local secondary indexes
Partition:Sort Key
00:0 FF:∞
Hash (2) = 48
Customer# = 2Order# = 10Item = Pen
Customer# = 2Order# = 11Item = Shoes
Customer# = 1Order# = 10Item = Toy
Customer# = 1Order# = 11Item = Boots
Hash (1) = 7B
Customer# = 3Order# = 10Item = Book
Customer# = 3Order# = 11Item = Paper
Hash (3) = CD
55 A9:∞54:∞ AAPartition 1 Partition 2
Partitions are three-way replicated
Id = 2Name = AndyDept = Engg
Id = 3Name = KimDept = Ops
Id = 1Name = Jim
Id = 2Name = AndyDept = Engg
Id = 3Name = KimDept = Ops
Id = 1Name = Jim
Id = 2Name = AndyDept = Engg
Id = 3Name = KimDept = Ops
Id = 1Name = Jim
Replica 1
Replica 2
Replica 3
Partition 1 Partition 2 Partition N
Indexes
Local secondary index (LSI)
Alternate sort key attributeIndex is local to a partition key
A1(partition)
A3(sort)
A2(item key)
A1(partition)
A2(sort) A3 A4 A5
LSIs A1(partition)
A4(sort)
A2(item key)
A3(projected)
Table
KEYS_ONLY
INCLUDE A3
A1(partition)
A5(sort)
A2(item key)
A3(projected)
A4(projected) ALL
10 GB max per partition key, i.e. LSIs limit the # of range keys!
Global secondary index (GSI)
Alternate partition and/or sort keyIndex is across all partition keysUse composite sort keys for compound indexes
A1(partition) A2 A3 A4 A5
A5(partition)
A4(sort)
A1(item key)
A3(projected) INCLUDE A3
A4(partition)
A5(sort)
A1(item key)
A2(projected)
A3(projected) ALL
A2(partition)
A1(itemkey) KEYS_ONLY
GSIs
Table RCUs/WCUs provisioned separately for GSIs
Online indexing
Migrating to NoSQL
Planning DataAnalysis
DataModeling Testing Migration
Data Analysis Phase
Analysis of both the RDBMS schema as well as application access patterns is key to understanding the performance of workloads on a NoSQL database platform.
Data Analysis Phase
RDBMS Source Data AnalysisKey data attributes• Read/Write Velocity• Data Partitioning• Key Cardinality
Access Pattern of the ApplicationExamples:• Write/Read Workloads• Relational Structures• Aggregation Dimensions
Data Modeling
1:1 relationships or key-values
Use a table or GSI with an alternate partition keyUse GetItem or BatchGetItem API
Example: Given an SSN or license number, get attributes
Users TablePartition key AttributesSSN = 123-45-6789 Email = [email protected], License = TDL25478134SSN = 987-65-4321 Email = [email protected], License = TDL78309234
Users-License-GSIPartition key AttributesLicense = TDL78309234 Email = [email protected], SSN = 987-65-4321License = TDL25478134 Email = [email protected], SSN = 123-45-6789
1:N relationships or parent-children
Use a table or GSI with partition and sort keyUse Query API
Example:Given a device, find all readings between epoch X, Y
Device-measurementsPartition Key Sort key AttributesDeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
N:M relationships
Use a table and GSI with partition and sort key elements switchedUse Query API
Example: Given a user, find all games. Or given a game, find all users.
User-Games-TablePartition Key Sort keyUserId = bob GameId = Game1UserId = fred GameId = Game2UserId = bob GameId = Game3
Game-Users-GSIPartition Key Sort keyGameId = Game1 UserId = bobGameId = Game2 UserId = fredGameId = Game3 UserId = bob
Hierarchical DataTiered relational data structures
It’s all about aggregations…
Document Management Process ControlSocial Network
Data TreesIT Monitoring
How OLTP Apps Use Data
Mostly Hierarchical Structures
Entity Driven Workflows Data Spread Across Tables Requires Complex Queries Primary driver for ACID
Hierarchical Structures as Item Collections…Use composite sort key to define a HierarchyHighly selective result sets with sort queriesIndex anything, scales to any size
Primary Key
AttributesProductID type
Items
1 bookIDtitle author genre publisher datePublished ISBN
Ringworld Larry Niven Science Fiction Ballantine Oct-70 0-345-02046-4
2
albumIDtitle artist genre label studio relesed producer
Dark Side of the Moon Pink Floyd Progressive Rock Harvest Abbey Road 3/1/73 Pink Floyd
albumID:trackIDtitle length music vocals
Speak to Me 1:30 Mason Instrumental
albumID:trackIDtitle length music vocals
Breathe 2:43 Waters, Gilmour, Wright Gilmour
albumID:trackIDtitle length music vocals
On the Run 3:30 Gilmour, Waters Instrumental
3
movieIDtitle genre writer producer
Idiocracy Scifi Comedy Mike Judge 20th Century Fox
movieID:actorIDname character image
Luke Wilson Joe Bowers img2.jpg
movieID:actorIDname character image
Maya Rudolph Rita img3.jpg
movieID:actorIDname character image
Dax Shepard Frito Pendejo img1.jpg
… or as Documents (JSON)
JSON data types (M, L, BOOL, NULL)Index root attributesFully Atomic Updates
Primary Key
AttributesProductID
Items
1id title author genre publisher datePublished ISBN
bookID Ringworld Larry Niven Science Fiction Ballantine Oct-70 0-345-02046-
4
2
id title artist genre Attributes
albumID Dark Side of the Moon Pink Floyd Progressive Rock
{ label:"Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink Floyd", tracks: [{title: "Speak to Me", length: "1:30", music: "Mason",
vocals: "Instrumental"},{title: ”Breathe", length: ”2:43", music: ”Waters, Gilmour, Wright", vocals: ”Gilmour"},{title: ”On the Run", length: “3:30",
music: ”Gilmour, Waters", vocals: "Instrumental"}]}
3
id title genre writer Attributes
movieID Idiocracy Scifi Comedy Mike Judge
{ producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg"},{ name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg"},{ name:
"Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]
Partition 11000 WCUs
Partition K1000 WCUs
Partition M1000 WCUs
Partition N1000 WCUs
Votes Table
Candidate A Candidate B
Scaling bottlenecks
50,000/sec
70,000/sec
Voters
Provision 200,000 WCUs
Write sharding
Candidate A_2
Candidate B_1
Candidate B_2
Candidate B_3
Candidate B_5
Candidate B_4
Candidate B_7
Candidate B_6
Candidate A_1
Candidate A_3
Candidate A_4Candidate A_7 Candidate B_8
Candidate A_6 Candidate A_8
Candidate A_5
Voter
Votes Table
Write sharding
Candidate A_2
Candidate B_1
Candidate B_2
Candidate B_3
Candidate B_5
Candidate B_4
Candidate B_7
Candidate B_6
Candidate A_1
Candidate A_3
Candidate A_4Candidate A_7 Candidate B_8
UpdateItem: “CandidateA_” + rand(0, 10)ADD 1 to Votes
Candidate A_6 Candidate A_8
Candidate A_5
Voter
Votes Table
Votes Table
Shard aggregation
Candidate A_2
Candidate B_1
Candidate B_2
Candidate B_3
Candidate B_5
Candidate B_4
Candidate B_7
Candidate B_6
Candidate A_1
Candidate A_3
Candidate A_4
Candidate A_5
Candidate A_6 Candidate A_8
Candidate A_7 Candidate B_8
Periodic Process
Candidate ATotal: 2.5M
1. Sum2. Store Voter
Write Sharded IndexingUse for GSI’s with high volume aggregationsCommon when low cardinality attributes must be indexedScales to any size workload
Primary Key
AttributesProductID type
Items
1 bookIDtitle author genre publisher datePublished ISBN
Ringworld Larry Niven Science Fiction:1 Ballantine:50 Oct-70 0-345-02046-4
2
albumIDtitle artist genre label studio relesed producer
Dark Side of the Moon Pink Floyd Progressive Rock:99 Harvest Abbey Road 3/1/73 Pink Floyd:2
albumID:trackIDtitle length music vocals
Speak to Me 1:30 Mason Instrumental
albumID:trackIDtitle length music vocals
Breathe 2:43 Waters, Gilmour, Wright Gilmour
albumID:trackIDtitle length music vocals
On the Run 3:30 Gilmour, Waters Instrumental
3
movieIDtitle genre writer producer
Idiocracy Scifi Comedy:42 Mike Judge 20th Century Fox:17
movieID:actorIDname character image
Luke Wilson Joe Bowers img2.jpg
movieID:actorIDname character image
Maya Rudolph Rita img3.jpg
movieID:actorIDname character image
Dax Shepard Frito Pendejo img1.jpg
Conclusion
Keys to Success
• Select a workload that is a good fit for NoSQL• Understand the source data and access patterns• Test thoroughly and often• Plan on an iterative migration process
Thank you!