Workshop part2 – Big Data

Post on 27-Nov-2014

Webit 2014 AWS workshop

THE MORE DATA YOU COLLECT, THE MORE VALUE YOU CAN DERIVE FROM IT

THE COST OF DATA GENERATION IS FALLING

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Lower cost, higher throughput

Highly constrained

+ ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ ONLY PAY FOR WHAT YOU USE
+ AVAILABLE ON-DEMAND

= REMOVE CONSTRAINTS

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

AWS Import/Export, AWS Direct Connect

Inbound data transfer is free
Multipart upload to S3
Physical media
AWS Direct Connect
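
Multipart upload splits a large object into parts that can be sent in parallel and retried individually. The following is a minimal sketch of how part boundaries might be planned, assuming S3's documented limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload. The function name `plan_parts` and the 8 MiB default are illustrative choices, not something taken from the workshop, and no AWS call is made here.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(object_size, part_size=8 * 1024 * 1024):
    """Return a list of (part_number, start_byte, end_byte_exclusive) tuples.

    Hypothetical helper: it only plans byte ranges, it does not upload.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    parts = []
    start = 0
    number = 1  # S3 part numbers start at 1
    while start < object_size:
        end = min(start + part_size, object_size)
        parts.append((number, start, end))
        start = end
        number += 1
    return parts
```

Each planned range could then be handed to a separate worker, which is where the throughput gain over a single-stream upload would come from.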

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2

AMAZON S3 SIMPLE STORAGE SERVICE

CASE STUDY: SPOTIFY ADDS 20,000 TRACKS/DAY TO ITS CATALOGUE

AMAZON DYNAMODB

HIGH-PERFORMANCE, FULLY MANAGED NoSQL DATABASE SERVICE

DURABLE & AVAILABLE: CONSISTENT, DISK-ONLY WRITES (SSD)

LOW LATENCY: AVERAGE READS < 5 MS, WRITES < 10 MS

NO ADMINISTRATION

CASE STUDY: SHAZAM SUPPORTED 500,000 WRITES/SEC DURING SUPER BOWL

AMAZON REDSHIFT

FULLY MANAGED, PETABYTE-SCALE DATA WAREHOUSE ON AWS

30 MINUTES DOWN TO 12 SECONDS

Extra Large Node (HS1.XL)

Single Node (2 TB)

Cluster 2-32 Nodes (4 TB – 64 TB)

AMAZON REDSHIFT LETS YOU START SMALL AND GROW BIG

Eight Extra Large Node (HS1.8XL)

Cluster 2-100 Nodes (32 TB – 1.6 PB)
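
The cluster sizes quoted above follow from simple per-node arithmetic: 2 TB per HS1.XL node, and 16 TB per HS1.8XL node (32 TB for the minimum two-node cluster). A small sketch of that calculation is shown here. The helper names are hypothetical, and the sizing rule (prefer XL nodes, ignore compression and headroom) is an assumption made for illustration rather than AWS sizing guidance.

```python
import math

# Per-node storage in TB, derived from the figures on the slides.
NODE_STORAGE_TB = {"hs1.xl": 2, "hs1.8xl": 16}
# (minimum nodes, maximum nodes) per node type, as quoted on the slides.
NODE_LIMITS = {"hs1.xl": (1, 32), "hs1.8xl": (2, 100)}


def cluster_capacity_tb(node_type, nodes):
    """Raw storage of a cluster in TB."""
    low, high = NODE_LIMITS[node_type]
    if not low <= nodes <= high:
        raise ValueError(f"{node_type} supports {low}-{high} nodes")
    return NODE_STORAGE_TB[node_type] * nodes


def smallest_cluster(data_tb):
    """Pick a node type and count that holds data_tb (illustrative rule only)."""
    for node_type in ("hs1.xl", "hs1.8xl"):
        low, high = NODE_LIMITS[node_type]
        nodes = max(low, math.ceil(data_tb / NODE_STORAGE_TB[node_type]))
        if nodes <= high:
            return node_type, nodes
    raise ValueError("data does not fit in a single cluster")
```

For example, `cluster_capacity_tb("hs1.8xl", 100)` gives 1600 TB, which matches the 1.6 PB upper bound on the slide.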

JDBC/ODBC

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon EC2, Amazon Elastic MapReduce

AMAZON EC2 ELASTIC COMPUTE CLOUD

3 HOURS FOR $4,828.85/HR

Instead of $20+ million in infrastructure

GPU INSTANCES

G2: 1x NVIDIA Kepler GK104, 8 vCPU (Intel Xeon E5-2670), $0.65/h

CG1: 2x NVIDIA Fermi M2050, 16 vCPU (Intel Xeon X5570), $2.10/h

ON A SINGLE INSTANCE

COMPUTE TIME: 4h
COST: 4h x $2.1 = $8.4

ON MULTIPLE INSTANCES

COMPUTE TIME: 1h
COST: 1h x 4 x $2.1 = $8.4
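
The comparison above can be written out as a short calculation. This sketch assumes the job parallelizes perfectly, so that the total instance-hours stay constant however many instances share the work; the function name `job_cost` is illustrative only.

```python
def job_cost(total_instance_hours, instances, hourly_rate):
    """Return (wall_clock_hours, total_cost) for a perfectly parallel job.

    Assumption: work divides evenly, with no coordination overhead.
    """
    if instances < 1:
        raise ValueError("need at least one instance")
    wall_clock_hours = total_instance_hours / instances
    total_cost = wall_clock_hours * instances * hourly_rate
    return wall_clock_hours, total_cost


# The two cases from the slide, at the $2.10/h rate.
single = job_cost(4, instances=1, hourly_rate=2.10)
parallel = job_cost(4, instances=4, hourly_rate=2.10)
print(single, parallel)
```

Under that assumption the cost is identical in both cases and only the time to result changes, which is the point the slide makes.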

AMAZON ELASTIC MAPREDUCE

HADOOP AS A SERVICE

CASE STUDY: "WITH AMAZON EMR WE CAN ANALYZE 100% OF THE DATA, NOT JUST A SAMPLE" - Sanjeevan Bala, Head of Data Planning & Analytics, Channel 4
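
Hadoop, which Amazon Elastic MapReduce runs as a managed service, is built around the map, shuffle, and reduce pattern. The sketch below shows that pattern in a single Python process using a word count. It is a toy illustration with hypothetical function names, not Hadoop or EMR code; a real cluster would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each record is mapped independently and each key is reduced independently, the same job can process the full data set rather than a sample simply by adding nodes.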

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

PUBLIC DATA SETS
http://aws.amazon.com/publicdatasets

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

BATCH PROCESSING

GENERATE ➔ STREAM PROCESSING ➔ SHARE

AMAZON KINESIS

REAL-TIME DATA STREAM PROCESSING

Hourly server logs: how your systems went wrong an hour ago

Weekly/monthly bill: what you spent this past billing cycle

Daily customer report from your website: tells you what deal or ad to try next time

Daily fraud reports: tells you if there was fraud yesterday

Daily business reports: tell you how customers used AWS services yesterday

Real-time metrics: what just went wrong now

Real-time spending alerts/caps: guaranteeing you can’t overspend

Real-time analysis: what to offer the current customer now

Real-time detection: blocks fraudulent use now

Fast ETL into Amazon Redshift: how are customers using services now
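
The contrast between the batch and real-time items above can be sketched with the spending example. The code below is a minimal in-memory illustration under stated assumptions: events are `(timestamp, amount)` tuples, the function names are hypothetical, and a plain Python iterable stands in for a stream. It does not call Amazon Kinesis.

```python
def daily_report(events):
    """Batch view: total spend, known only after all events are collected."""
    return sum(amount for _, amount in events)


def spending_alerts(events, cap):
    """Streaming view: yield (timestamp, running_total) when spend first passes cap."""
    running = 0.0
    alerted = False
    for timestamp, amount in events:
        running += amount
        if running > cap and not alerted:
            alerted = True
            yield timestamp, running


# Hypothetical sample data: three charges over the course of a day.
events = [(1, 40.0), (2, 50.0), (3, 30.0)]
print(daily_report(events))
print(list(spending_alerts(events, cap=80.0)))
```

The batch function can only report the overspend after the fact, while the streaming function raises the alert at the second event, as soon as the cap is crossed.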

GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE

GENERATE: AWS Import/Export, AWS Direct Connect

STORE: Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2

ANALYZE: Amazon EC2, Amazon Elastic MapReduce

SHARE: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

GENERATE ➔ STREAM PROCESSING ➔ SHARE

STREAM PROCESSING: Amazon Kinesis, Stream Processing on Amazon EC2

SHARE: Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2

FROM DATA TO ACTIONABLE INFORMATION