Transcript of Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud
AWS database services best practices
Amazon Data Services Japan, Rasmus Ekman
Traditional architecture
Client → Application → Relational database
Problems with this approach
Client → Application → Relational database
• It doesn’t scale
• Management is hard
• High cost
• Low performance
• Migration is difficult
Why do we get these problems?
When all you have is a hammer, everything looks like a nail.
Client → Application → Relational database
Rethinking the architecture
Client → Application → Data layer:
• Search
• NoSQL
• SQL
• DWH
• Cache
• Hadoop
• Blob store
• ETL
AWS service and use case mapping
• Blob store → Amazon S3
• Hadoop → Amazon EMR
• NoSQL → DynamoDB
• SQL → Amazon RDS
• Cache → ElastiCache
• DWH → Amazon Redshift
• ETL → AWS Data Pipeline
• Search → Amazon CloudSearch
Sample references
Social gaming
Architecture: Mobile client → Elastic Load Balancer → Auto Scaling application tier → DynamoDB; log files → Amazon S3 → Amazon Elastic MapReduce
Social gaming generates a large volume of transactions, all of which require high performance and extreme scalability.
① Player data is stored in Amazon DynamoDB, which can scale both in data volume and performance.
② Long-term usage log files are sent in parallel to S3 for unlimited, cheap storage.
③ Big data analytics are done in EMR, which integrates easily with both DynamoDB and S3.
E-commerce site
Architecture: End users → Auto Scaling application tier → ElastiCache, RDS (Master/Slave), Amazon DynamoDB, Amazon CloudSearch
Requirements: high availability, search performance, and the flexibility to rapidly change data structures to fit new business requirements.
① For high-performance, low-latency responses, check the ElastiCache cache first.
② Order and customer information are stored in a traditional but fault-tolerant RDS.
③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
How do I know which service to pick?
The “data temperature” method
What is “data temperature”?
(Illustration: http://www.amazon.co.jp/dp/B0016V9FCQ)
Data temperature

               Hot         Warm       Cold
Volume         MB ~ GB     GB ~ TB    PB
Item size      B ~ KB      KB ~ MB    KB ~ TB
Latency        ms          ms ~ s     min ~ hr
Durability     Low-high    High       Very high
Request rate   Very high   High       Low
Cost/GB        $$ ~ $      $ ~ ¢¢     ¢
The temperature of the data will vary depending on its format and use.
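Read as a decision aid, the table suggests a rough classifier. A minimal sketch in Python; the thresholds below are illustrative assumptions drawn from the table, not AWS guidance:

```python
def data_temperature(latency_ms: float, requests_per_sec: float) -> str:
    """Classify data as hot/warm/cold from target latency and request rate.

    Thresholds are illustrative assumptions based on the table above,
    not official AWS guidance.
    """
    if latency_ms <= 10 and requests_per_sec >= 10_000:
        return "hot"    # e.g. ElastiCache / DynamoDB territory
    if latency_ms <= 1_000:
        return "warm"   # e.g. RDS / DynamoDB
    return "cold"       # e.g. S3, Redshift, EMR batch jobs


print(data_temperature(1, 100_000))    # session state
print(data_temperature(50, 500))       # order lookups
print(data_temperature(3_600_000, 1))  # archived logs
```

In practice you would weigh all six rows of the table, not just two, but the idea is the same: let the workload's measured characteristics pick the service tier.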
The AWS service heat map
[Figure: Amazon ElastiCache, Amazon CloudSearch, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift, and Amazon EMR plotted from hot to cold; moving toward cold, data volume and latency rise while request rate and cost/GB fall]
How do I know which service to pick?
The cost estimation method
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• “I’m currently scoping out a project that will greatly increase my team’s use of Amazon S3. Hoping you could answer some questions. The current iteration of the design calls for many small files, perhaps up to a billion during peak. The total size would be on the order of 1.5 TB per month…”
Request rate (writes/s)   Object size (bytes)   Total size (GB/month)   Objects per month
300                       2048                  1483                    777,600,000
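The derived columns follow from the request rate and object size alone; a quick sanity check in Python (assuming a 30-day month and 1 GB = 2^30 bytes):

```python
writes_per_sec = 300
object_size_bytes = 2048
seconds_per_month = 60 * 60 * 24 * 30  # assuming a 30-day month

# Every write creates one object, so objects/month = rate * seconds.
objects_per_month = writes_per_sec * seconds_per_month
total_bytes = objects_per_month * object_size_bytes
total_gb = total_bytes / 2**30  # 1 GB = 2^30 bytes

print(f"{objects_per_month:,} objects")  # 777,600,000 objects
print(f"{total_gb:.0f} GB/month")        # 1483 GB/month
```

Both figures match the table, so the rest of the comparison is just a matter of feeding them into each service's pricing.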
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• Time for …
※ Source: http://calculator.s3.amazonaws.com/index.html?lng=ja_JP
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

Request rate   Object size   Total size   Objects
300            2048          1483         777,600,000

DynamoDB monthly cost: $669.56
Amazon S3 monthly cost: $4,325.33
DynamoDB is cheaper for this workload.
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?

             Request rate   Object size   Total size   Objects
Scenario 1   300            2048          1483         777,600,000
Scenario 2   300            32,768        23,730       777,600,000

Scenario 1: DynamoDB wins
Scenario 2: Amazon S3 wins
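The same arithmetic explains the flip: the request count is identical in both scenarios, but 32 KB objects carry 16x the data. S3 charges per request regardless of object size, so its request cost is amortized over far more data, while DynamoDB's provisioned write throughput grows with item size (at the time, one write capacity unit covered writes up to 1 KB). A sketch of the volume side, assuming a 30-day month:

```python
def monthly_volume(writes_per_sec: int, object_size_bytes: int):
    """Return (objects per month, GB per month), assuming a 30-day month."""
    objects = writes_per_sec * 60 * 60 * 24 * 30
    return objects, objects * object_size_bytes / 2**30

# Same request rate, two object sizes: only the data volume changes.
for size in (2048, 32_768):
    objects, gb = monthly_volume(300, size)
    print(f"{size:>6} B objects: {objects:,} objects, {gb:,.0f} GB/month")
```

The lesson of the slide: always run the numbers for your own object size and rate, because the cheaper service changes as the workload shape changes.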
Summary
• The era of relational-database-only, on-premises architecture is over.
• Performance, reliability, and scalability can all be improved by the cloud, but choosing the right architecture is a must.
• There are several ways of choosing the right service for the job:
  – Use the “data temperature” and use case
  – Use the reverse cost estimate method
  – Ask AWS sales
When in doubt, contact us
https://aws.amazon.com/jp/contact-us/
APPENDIX
AWS database services: introduction and best practices

Amazon RDS
A fully managed relational database service
• Create and scale with a few clicks
• Automated backups every 5 minutes for DR
• Manual snapshot feature
• Automated security patching
• 4 supported engines
• Monitoring and automatic recovery
[Diagram: Master in Availability Zone A, Slave in Availability Zone B, with data sync, automatic failover, and automated backups]
Amazon RDS
A fully managed relational database service

When to use
• Transactions
• Complex queries
• Medium to high query/write rate: up to 30K IOPS (15K reads + 15K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability

When not to use
• Massive read/write rates (example: 150K write requests per second)
• Data size or throughput demands sharding (example: 10s or 100s of terabytes)
• Simple Get/Put and queries that a NoSQL can handle
• Complex analytics
DynamoDB
Fully managed NoSQL service
• Easy administration and high availability
  – No SPOF
  – Data is replicated into 3 availability zones
  – Storage scales, and data is automatically partitioned
• No limit on storage
  – Only pay for the storage you use
  – No need to add nodes or disks as storage grows
[Diagram: Client connecting to DynamoDB within a Region]
DynamoDB
Fully managed NoSQL service

When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent, low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries

When not to use
• Need multi-item/row or cross-table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data
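"Autosharding" means DynamoDB spreads items across partitions by hashing each item's partition key, so throughput scales with the number of partitions. A toy illustration of the idea (the hash function and partition count here are illustrative assumptions, not DynamoDB internals):

```python
import hashlib

NUM_PARTITIONS = 3  # stand-in for DynamoDB's internal partition count

def partition_for(key: str) -> int:
    """Route a partition key to a partition by hashing it."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# A well-chosen partition key spreads load evenly across partitions.
players = [f"player-{i}" for i in range(1000)]
counts = [0] * NUM_PARTITIONS
for p in players:
    counts[partition_for(p)] += 1
print(counts)  # three counts summing to 1000, roughly even
```

This also explains the "when not to use" column: because items live on different partitions, cross-item transactions and joins have no cheap implementation in this model.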
Amazon Redshift
Fully managed data warehouse service
• DWH as a service: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service
• Scalable: 160 GB to petabytes
• Fast: Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
• Low cost: no initial cost, no license fees; only pay for what you use.
[Diagram: BI tools connect via JDBC/ODBC to a leader node, the SQL endpoint that parallelizes queries and assembles results; compute nodes sit behind it on a 10GigE mesh, with S3, DynamoDB, and EMR integration, and more nodes can be added to scale]
Amazon Redshift
Fully managed data warehouse service

When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column-based
• Very high durability

When not to use
• OLTP workloads
  – 1000s of concurrent users
  – Large number of singleton updates
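The "compression" and "column based" bullets go together: a column stored contiguously tends to contain long runs of repeated values, which simple encodings exploit. An illustrative run-length encoding sketch in Python (not Redshift's actual encoders):

```python
def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# A low-cardinality column (e.g. "region"), stored contiguously as in a
# columnar layout, collapses to a handful of runs.
region_column = ["us-east"] * 5000 + ["eu-west"] * 3000 + ["ap-ne"] * 2000
encoded = run_length_encode(region_column)
print(len(region_column), "values ->", len(encoded), "runs")
# 10000 values -> 3 runs
```

In a row-oriented layout the same values are interleaved with other columns, so runs like this never form; that is one reason columnar stores compress analytics data so well.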
Amazon S3
Low-cost, highly reliable object storage service

Infrastructure side
• Never lose data: 99.999999999% durability
• Data automatically replicated across datacenters (File A, File B, File C each stored in Datacenters A, B, and C)
• Choose from over 9 regions globally

User side
• Just put data; no need to worry about scalability, infrastructure, volume expansion, etc.
• Only pay for what you use (example: 1 GB/month for about 3 yen)
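Eleven nines can be made concrete with a little arithmetic. With an annual object-loss probability of 10^-11, storing ten million objects gives an expected loss of one object roughly every ten thousand years (the illustration AWS itself uses for this figure):

```python
annual_loss_probability = 1e-11  # 99.999999999% annual durability
objects_stored = 10_000_000

# Expected losses per year, and the mean time between lost objects.
expected_losses_per_year = objects_stored * annual_loss_probability
years_per_expected_loss = 1 / expected_losses_per_year

print(f"~{years_per_expected_loss:,.0f} years per lost object")
# ~10,000 years per lost object
```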
Amazon S3
Low-cost, highly reliable object storage service

When to use
• Store large objects
• Key-value store: Get/Put/List
• Unlimited storage
• Versioning
• Very high durability (99.999999999%)
• Very high throughput (via parallel clients)
• Storing persistent data
  – Backups
  – Source/target for EMR
  – Blob store with metadata in SQL or NoSQL

When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions