Transcript of Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud
AWS database services best practices
Amazon Data Services Japan, Rasmus Ekman
Traditional architecture
Client → Application → Relational database
Problems with this approach
Client → Application → Relational database
• It doesn’t scale
• Management is hard
• High cost
• Low performance
• Migration is difficult
Why do we get these problems?
When all you have is a hammer, everything looks like a nail.
Client → Application → Relational database
Rethinking the architecture
Client → Application → Data layer:
• Search
• NoSQL
• SQL
• DWH
• Cache
• Hadoop
• Blob store
• ETL
AWS service and use case mapping
• Blob store → Amazon S3
• Hadoop → Amazon EMR
• NoSQL → DynamoDB
• SQL → Amazon RDS
• Cache → ElastiCache
• DWH → Amazon Redshift
• ETL → AWS Data Pipeline
• Search → Amazon CloudSearch
Sample references
Social gaming
Architecture: Mobile client → Elastic Load Balancer → Auto Scaling application tier → DynamoDB; log files → Amazon S3 → Amazon Elastic MapReduce
Social gaming generates a large volume of transactions, all of which require high performance and extreme scalability.
① Player data is stored in Amazon DynamoDB, which can scale both in data volume and performance.
② Long-term usage log files are sent in parallel to S3 for unlimited, cheap storage.
③ Big data analytics are done in EMR, which integrates easily with both DynamoDB and S3.
E-commerce site
Architecture: End users → Auto Scaling application tier → ElastiCache, RDS (Master/Slave), Amazon DynamoDB, Amazon CloudSearch
Requirements: high availability, search performance, and the flexibility to rapidly change data structures to fit new business requirements.
① For high-performance, low-latency responses, check the ElastiCache cache first.
② Order and customer information are stored in a traditional but fault-tolerant RDS.
③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
How do I know which service to pick?
The “data temperature” method
What is “data temperature”?
(Illustration: http://www.amazon.co.jp/dp/B0016V9FCQ)
Data temperature

               Hot         Warm       Cold
Volume         MB ~ GB     GB ~ TB    PB
Item size      B ~ KB      KB ~ MB    KB ~ TB
Latency        ms          ms ~ s     min ~ hr
Durability     Low-high    High       Very high
Request rate   Very high   High       Low
Cost/GB        $$ ~ $      $ ~ ¢¢     ¢
The temperature of the data will vary depending on its format and use.
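Read as a decision aid, the table suggests a rough classifier. A minimal sketch in Python; the thresholds below are illustrative assumptions drawn from the table, not AWS guidance:

```python
def data_temperature(latency_ms: float, requests_per_sec: float) -> str:
    """Classify data as hot/warm/cold from target latency and request rate.

    Thresholds are illustrative assumptions based on the table above,
    not official AWS guidance.
    """
    if latency_ms <= 10 and requests_per_sec >= 10_000:
        return "hot"    # e.g. ElastiCache / DynamoDB territory
    if latency_ms <= 1_000:
        return "warm"   # e.g. RDS / DynamoDB
    return "cold"       # e.g. S3, Redshift, EMR batch jobs


print(data_temperature(1, 100_000))    # session state
print(data_temperature(50, 500))       # order lookups
print(data_temperature(3_600_000, 1))  # archived logs
```

In practice you would weigh all six rows of the table, not just two, but the idea is the same: let the workload's measured characteristics pick the service tier.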
The AWS service heat map
[Figure: Amazon ElastiCache, Amazon CloudSearch, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift, and Amazon EMR plotted from hot to cold; moving toward cold, data volume and latency rise while request rate and cost/GB fall]
How do I know which service to pick?
The cost estimation method
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”
Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month
300                       2048                  1483                    777,600,000
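The derived columns follow from the request rate and object size alone; a quick sanity check in Python (assuming a 30-day month and 1 GB = 2^30 bytes):

```python
writes_per_sec = 300
object_size_bytes = 2048
seconds_per_month = 60 * 60 * 24 * 30  # assuming a 30-day month

# Every write creates one object, so objects/month = rate * seconds.
objects_per_month = writes_per_sec * seconds_per_month
total_bytes = objects_per_month * object_size_bytes
total_gb = total_bytes / 2**30  # 1 GB = 2^30 bytes

print(f"{objects_per_month:,} objects")  # 777,600,000 objects
print(f"{total_gb:.0f} GB/month")        # 1483 GB/month
```

Both figures match the table, so the rest of the comparison is just a matter of feeding them into each service's pricing.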
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• Time for …
※ Source: http://calculator.s3.amazonaws.com/index.html?lng=ja_JP
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

Request rate   Object size   Total size   Objects
300            2048          1483         777,600,000

DynamoDB monthly cost: $669.56
Amazon S3 monthly cost: $4,325.33
DynamoDB is cheaper for this workload.
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

             Request rate   Object size   Total size   Objects
Scenario 1   300            2048          1483         777,600,000
Scenario 2   300            32,768        23,730       777,600,000

Scenario 1: DynamoDB wins
Scenario 2: Amazon S3 wins
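The same arithmetic explains the flip: the request count is identical in both scenarios, but 32 KB objects carry 16x the data. S3 charges per request regardless of object size, so its request cost is amortized over far more data, while DynamoDB's provisioned write throughput grows with item size (at the time, one write capacity unit covered writes up to 1 KB). A sketch of the volume side, assuming a 30-day month:

```python
def monthly_volume(writes_per_sec: int, object_size_bytes: int):
    """Return (objects per month, GB per month), assuming a 30-day month."""
    objects = writes_per_sec * 60 * 60 * 24 * 30
    return objects, objects * object_size_bytes / 2**30

# Same request rate, two object sizes: only the data volume changes.
for size in (2048, 32_768):
    objects, gb = monthly_volume(300, size)
    print(f"{size:>6} B objects: {objects:,} objects, {gb:,.0f} GB/month")
```

The lesson of the slide: always run the numbers for your own object size and rate, because the cheaper service changes as the workload shape changes.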
Summary
• The era of relational-database-only, on-premises architecture is over.
• Performance, reliability, and scalability can all be improved by the cloud, but choosing the right architecture is a must.
• There are several ways of choosing the right service for the job:
  – Use the “data temperature” and use case
  – Use the reverse cost estimate method
  – Ask AWS sales
When in doubt, contact us
https://aws.amazon.com/jp/contact-us/
APPENDIX
AWS database services: introduction and best practices

Amazon RDS
A fully managed relational database service
• Create and scale with a few clicks
• Automated backups every 5 minutes for DR
• Manual snapshot feature
• Automated security patching
• 4 supported engines
• Monitoring and automatic recovery
[Diagram: Master in Availability Zone A, Slave in Availability Zone B, with data sync, automatic failover, and automated backups]
Amazon RDS
A fully managed relational database service

When to use
• Transactions
• Complex queries
• Medium to high query/write rate: up to 30K IOPS (15K reads + 15K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability

When not to use
• Massive read/write rates (example: 150K write requests per second)
• Data size or throughput demands sharding (example: 10s or 100s of terabytes)
• Simple Get/Put and queries that a NoSQL can handle
• Complex analytics
DynamoDB
Fully managed NoSQL service
• Easy administration and high availability
  – No SPOF
  – Data is replicated into 3 availability zones
  – Storage scales, and data is automatically partitioned
• No limit on storage
  – Only pay for the storage you use
  – No need to add nodes or disks as storage grows
[Diagram: Client connecting to DynamoDB within a Region]
DynamoDB
Fully managed NoSQL service

When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent, low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries

When not to use
• Need multi-item/row or cross-table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data
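"Autosharding" means DynamoDB spreads items across partitions by hashing each item's partition key, so throughput scales with the number of partitions. A toy illustration of the idea (the hash function and partition count here are illustrative assumptions, not DynamoDB internals):

```python
import hashlib

NUM_PARTITIONS = 3  # stand-in for DynamoDB's internal partition count

def partition_for(key: str) -> int:
    """Route a partition key to a partition by hashing it."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# A well-chosen partition key spreads load evenly across partitions.
players = [f"player-{i}" for i in range(1000)]
counts = [0] * NUM_PARTITIONS
for p in players:
    counts[partition_for(p)] += 1
print(counts)  # three counts summing to 1000, roughly even
```

This also explains the "when not to use" column: because items live on different partitions, cross-item transactions and joins have no cheap implementation in this model.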
Amazon Redshift
Fully managed data warehouse service
• DWH as a service: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service
• Scalable: 160 GB to petabytes
• Fast: Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
• Low cost: no initial cost, no license fees; only pay for what you use.
[Diagram: BI tools connect via JDBC/ODBC to a leader node, the SQL endpoint that parallelizes queries and assembles results; compute nodes sit behind it on a 10GigE mesh, with S3, DynamoDB, and EMR integration, and more nodes can be added to scale]
Amazon Redshift
Fully managed data warehouse service

When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column-based
• Very high durability

When not to use
• OLTP workloads
  – 1000s of concurrent users
  – Large number of singleton updates
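The "compression" and "column based" bullets go together: a column stored contiguously tends to contain long runs of repeated values, which simple encodings exploit. An illustrative run-length encoding sketch in Python (not Redshift's actual encoders):

```python
def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# A low-cardinality column (e.g. "region"), stored contiguously as in a
# columnar layout, collapses to a handful of runs.
region_column = ["us-east"] * 5000 + ["eu-west"] * 3000 + ["ap-ne"] * 2000
encoded = run_length_encode(region_column)
print(len(region_column), "values ->", len(encoded), "runs")
# 10000 values -> 3 runs
```

In a row-oriented layout the same values are interleaved with other columns, so runs like this never form; that is one reason columnar stores compress analytics data so well.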
Amazon S3
Low-cost, highly reliable object storage service

Infrastructure side
• Never lose data: 99.999999999% durability
• Data automatically replicated across datacenters (File A, File B, File C each stored in Datacenters A, B, and C)
• Choose from over 9 regions globally

User side
• Just put data; no need to worry about scalability, infrastructure, volume expansion, etc.
• Only pay for what you use (example: 1 GB/month for about 3 yen)
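Eleven nines can be made concrete with a little arithmetic. With an annual object-loss probability of 10^-11, storing ten million objects gives an expected loss of one object roughly every ten thousand years (the illustration AWS itself uses for this figure):

```python
annual_loss_probability = 1e-11  # 99.999999999% annual durability
objects_stored = 10_000_000

# Expected losses per year, and the mean time between lost objects.
expected_losses_per_year = objects_stored * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year

print(f"~{years_per_expected_loss:,.0f} years per lost object")
# ~10,000 years per lost object
```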
Amazon S3
Low-cost, highly reliable object storage service

When to use
• Store large objects
• Key-value store: Get/Put/List
• Unlimited storage
• Versioning
• Very high durability (99.999999999%)
• Very high throughput (via parallel clients)
• Storing persistent data
  – Backups
  – Source/target for EMR
  – Blob store with metadata in SQL or NoSQL

When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions