初探AWS 平台上的 NoSQL 雲端資料庫服務

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

蔣宗恩, Technical Account Manager, AWS Enterprise Support

2017/06

Getting Started with NoSQL Cloud

Database Service on AWS

Agenda

1. What is NoSQL?

2. Relational (SQL) vs. non-relational?

3. What is DynamoDB?

4. DynamoDB Tables & Indexes

5. Scaling

6. Integration Capabilities

7. Demo

What is NoSQL?

Data volume since 2010

• 90% of stored data generated in

last 2 years

• 1 terabyte of data in 2010 equals

6.5 petabytes today

• Linear correlation between data

pressure and technical innovation

• No reason these trends will not

continue over time

Timeline of database technologyD

ata

Pre

ssu

re

What is NoSQL?

NoSQL is a term to describe data stores that trade full ACID

compliance for high availability and scale.

A

C

I

D

tomicity

onsistency

solation

urability

Single row/single item only

Eventual consistency

Dirty Read

Data replication on commodity storage

Why NoSQL?

• Dirty Reads?

• Eventual Consistency?

• Single row transactions only?

• Why would anybody trade ACID compliance for this?

Relational (SQL) vs. non-relational?

Relational vs. non-relational databases

Traditional SQL NoSQL

DB

Primary Secondary

Scale up

DB

DB

DBDB

DB DB

Scale out

Scale Up vs Scale Out

The CAP Theorem

Network partitions will happen in

distributed systems:

DB

DBDB

DB DB

Consistency

Availability

Partition Tolerance

C A

P

CA

APCP

SQL vs. NoSQL schema design NoSQL design optimizes for

compute instead of storage

Why NoSQL?

Optimized for storage Optimized for compute

Normalized/relational Denormalized/hierarchical

Ad-hoc queries Instantiated views

Scale vertically Scale horizontally

Good for OLAP Built for OLTP at scale

SQL NoSQL

What is DynamoDB?

RDBMSDynamoDB

Amazon’s Path to DynamoDB

Amazon DynamoDB

DynamoDB is a fully managed, NoSQL document and key value data store

Predictable Performance

Highly Available

Massively Scalable

Fully Managed

Low Cost

Consistently low latency at scale

PREDICTABLE

PERFORMANCE!

WRITES

Replicated continuously to 3

Availability Zones

Persisted to disk (custom SSD)

READS

Strongly or eventually consistent

No latency trade-off

Designed to

support 99.99%

of availability

Built for high

durability

High availability and durability

High availability and durability

DynamoDB automatically partition data

• Partition key spreads data (and workload) across partitions

• Automatically partitions as data grows and throughput needs

increase

High-scale

APP

Large number of unique hash keys

+

Uniform distribution of workload

across hash keys

Partition 1..N

Fully managed service = automated operations

DB hosted on premises DynamoDB

DynamoDB Tables & Indexes

DynamoDB table structureTable

Items

Attributes

Partitionkey

Sortkey

Mandatory

Key-value access pattern

Determines data distribution Optional

Model 1:N relationships

Enables rich query capabilities

All items for key==, <, >, >=, <=“begins with”“between”“contains”“in”sorted resultscountstop/bottom N values

00 55 A954 FFAA00 FF

Partition Keys

Id = 1

Name = Jim

Hash (1) = 7B

Id = 2

Name = Andy

Dept = Eng

Hash (2) = 48

Id = 3

Name = Kim

Dept = Ops

Hash (3) = CD

Key Space

Partition Key uniquely identifies an item

Partition Key is used for building an unordered hash index

Allows table to be partitioned for scale

Partition 3

Partition:Sort Key uses two attributes together to uniquely identify an Item

Within unordered hash index, data is arranged by the sort key

No limit on the number of items (∞) per partition key

Except if you have local secondary indexes

Partition:Sort Key

00:0 FF:∞

Hash (2) = 48

Customer# = 2

Order# = 10

Item = Pen

Customer# = 2

Order# = 11

Item = Shoes

Customer# = 1

Order# = 10

Item = Toy

Customer# = 1

Order# = 11

Item = Boots

Hash (1) = 7B

Customer# = 3

Order# = 10

Item = Book

Customer# = 3

Order# = 11

Item = Paper

Hash (3) = CD

55 A9:∞54:∞ AAPartition 1 Partition 2

Partitions are three-way replicated

Id = 2

Name = Andy

Dept = Engg

Id = 3

Name = Kim

Dept = Ops

Id = 1

Name = Jim

Id = 2

Name = Andy

Dept = Engg

Id = 3

Name = Kim

Dept = Ops

Id = 1

Name = Jim

Id = 2

Name = Andy

Dept = Engg

Id = 3

Name = Kim

Dept = Ops

Id = 1

Name = Jim

Replica 1

Replica 2

Replica 3

Partition 1 Partition 2 Partition N

Local secondary index (LSI)

Alternate sort key attribute

Index is local to a partition key

A1

(partition)

A3

(sort)

A2

(item key)

A1

(partition)

A2

(sort)A3 A4 A5

LSIsA1

(partition)

A4

(sort)

A2

(item key)

A3

(projected)

Table

KEYS_ONLY

INCLUDE A3

A1

(partition)

A5

(sort)

A2

(item key)

A3

(projected)

A4

(projected)ALL

10 GB max per partition key, i.e.

LSIs limit the # of range keys!

Global secondary index (GSI)

Alternate partition and/or sort key

Index is across all partition keys

Use composite sort keys for compound indexes

A1

(partition)A2 A3 A4 A5

A5

(partition)

A4

(sort)

A1

(item key)

A3

(projected)INCLUDE A3

A4

(partition)

A5

(sort)

A1

(item key)

A2

(projected)

A3

(projected)

ALL

A2

(partition)

A1

(itemkey)KEYS_ONLY

GSIs

TableRCUs/WCUs provisioned

separately for GSIs

Online indexing

How do GSI updates work?

Table

Primary

tablePrimary

tablePrimary

tablePrimary

table

Global

Secondary

Index

Client

2. Asynchronous

update (in progress)

If GSIs don’t have enough write capacity, table writes will be throttled!

LSI or GSI?

LSI can be modeled as a GSI

If data size in an item collection > 10 GB, use GSI

If eventual consistency is okay for your scenario,

use GSI!

Scaling

Scaling

Throughput

Provision any amount of throughput to a table

Size

Add any number of items to a table

- Max item size is 400 KB

- LSIs limit the number of range keys due to 10 GB limit

Scaling is achieved through partitioning

Throughput

Provisioned at the table level

Write capacity units (WCUs) are measured in 1 KB per second

Read capacity units (RCUs) are measured in 4 KB per second

- RCUs measure strictly consistent reads

- Eventually consistent reads cost 1/2 of consistent reads

Read and write throughput limits are independent

WCURCU

Partitioning Math

In the future, these details might change…

Number of Partitions

By Capacity (Total RCU / 3000) + (Total WCU / 1000)

By Size Total Size / 10 GB

Total Partitions CEILING(MAX (Capacity, Size))

Partitioning Example

Table size = 8 GB, RCUs = 5000, WCUs = 500

RCUs per partition = 5000/3 = 1666.67

WCUs per partition = 500/3 = 166.67

Data/partition = 10/3 = 3.33 GB

RCUs and WCUs are uniformly

spread across partitions

Number of Partitions

By Capacity (5000 / 3000) + (500 / 1000) = 2.17

By Size 8 / 10 = 0.8

Total Partitions CEILING(MAX (2.17, 0.8)) = 3

What causes throttling?

If sustained throughput goes beyond provisioned throughput per partition

Non-uniform workloads

Hot keys/hot partitions

Very large bursts

Mixing hot data with cold data

Use a table per time period

From the example before:

Table created with 5000 RCUs, 500 WCUs

RCUs per partition = 1666.67

WCUs per partition = 166.67

If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB

may throttle requests

- Solution: Increase provisioned throughput

To learn more, please attend:

Deep Dive on Amazon DynamoDB 3:55 p.m.– 4:35 p.m.

Integration Capabilities

DynamoDB Streams

Stream of table update

Asynchronous

Exactly once

Strictly ordered

24-hr lifetime per item


DynamoDB Triggers

Implement as AWS lambda

function

Your code scale automatically

Java, Node.js and Python

IAM

Fine-grained access control

via AWS IAM

Table-,Item, and attribute- level

access control


ElasticSearch integration

Full-text queries

Add search to mobile app

Monitor IoT sensor status

code

App telemetry pattern

discovery using regular

expressions

Reference Architecture

Architecture of a simple serverless web

application

AWS Identity &

Access

ManagementDynamoDBAPI

Gateway

JavaScript

users

Amazon

S3 Bucket

internet

Lambda

bit.ly/NoSQLDesignPatterns

初探AWS 平台上的 NoSQL 雲端資料庫服務

Technology

Transcript of 初探AWS 平台上的 NoSQL 雲端資料庫服務