Big Data Analytics with Amazon Web Services

Big Data Analytics with Amazon Web Services Dr. Matt Wood An Online Seminar. Tuesday 16th October.

Transcript of Big Data Analytics with Amazon Web Services

Page 1: Big Data Analytics with Amazon Web Services

Big Data Analytics with Amazon Web Services

Dr. Matt Wood

An Online Seminar. Tuesday 16th October.

Page 2: Big Data Analytics with Amazon Web Services

Hello, and thank you.

Page 3: Big Data Analytics with Amazon Web Services

Big Data Analytics

An introduction

Page 4: Big Data Analytics with Amazon Web Services

Big Data Analytics

An introduction

The story of analytics on AWS

Page 5: Big Data Analytics with Amazon Web Services

Big Data Analytics

An introduction

The story of analytics on AWS

AWS Marketplace

Page 6: Big Data Analytics with Amazon Web Services

Big Data Analytics

An introduction

The story of analytics on AWS

AWS Marketplace

Success story: Brightcove

Page 7: Big Data Analytics with Amazon Web Services

INTRODUCING BIG DATA

1

Page 8: Big Data Analytics with Amazon Web Services

Data for competitive advantage.

Page 9: Big Data Analytics with Amazon Web Services

Customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.

Using data

Page 10: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 11: Big Data Analytics with Amazon Web Services

Cost of data generation is falling.

Page 12: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

lower cost, increased throughput

Page 13: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

HIGHLY CONSTRAINED

Page 14: Big Data Analytics with Amazon Web Services

Very high barrier to turning data into information.

Page 15: Big Data Analytics with Amazon Web Services

Move from a data generation challenge to an analytics challenge.

Page 16: Big Data Analytics with Amazon Web Services

Enter the Cloud.

Page 17: Big Data Analytics with Amazon Web Services

Remove the constraints.

Page 18: Big Data Analytics with Amazon Web Services

Enable data-driven innovation.

Page 19: Big Data Analytics with Amazon Web Services

Move to a distributed data approach.

Page 20: Big Data Analytics with Amazon Web Services

Maturation of two things.

Page 21: Big Data Analytics with Amazon Web Services

Maturation of two things.

Software for distributed storage and analysis

Page 22: Big Data Analytics with Amazon Web Services

Maturation of two things.

Software for distributed storage and analysis

Infrastructure for distributed storage and analysis

Page 23: Big Data Analytics with Amazon Web Services

Frameworks for data-intensive workloads.

Software

Distributed by design.

Page 24: Big Data Analytics with Amazon Web Services

Platform for data-intensive workloads.

Infrastructure

Distributed by design.

Page 25: Big Data Analytics with Amazon Web Services

Support the data timeline.

Page 26: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

HIGHLY CONSTRAINED

Page 27: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 28: Big Data Analytics with Amazon Web Services

Lower the barrier to entry.

Page 29: Big Data Analytics with Amazon Web Services

Accelerate time to market and increase agility.

Page 30: Big Data Analytics with Amazon Web Services

Enable new business opportunities.

Page 31: Big Data Analytics with Amazon Web Services

Washington Post

Pinterest

NASA

Page 32: Big Data Analytics with Amazon Web Services

“AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly”

Michael Miller, Pfizer

Page 33: Big Data Analytics with Amazon Web Services

THE STORY OF ANALYTICS

2

Page 34: Big Data Analytics with Amazon Web Services

EC2

Utility computing. 6 years young.

Page 35: Big Data Analytics with Amazon Web Services

Embarrassingly parallel problems.

Scale out systems

Queue based distribution.

Small, medium and high scale.
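The queue-based pattern on this slide can be sketched in a few lines. This is a minimal, single-machine illustration only: the names `run_workers` and `square` are hypothetical, and threads and an in-memory queue stand in for what would be separate EC2 instances pulling from a shared queue service.

```python
import queue
import threading


def run_workers(tasks, work, num_workers=4):
    """Distribute independent tasks to workers through a shared queue."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            value = work(task)
            with lock:
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def square(n):
    # Stand-in for an embarrassingly parallel unit of work.
    return n * n


squares = run_workers(range(10), square, num_workers=3)
```

Because no task depends on another, scaling out in this sketch only means raising `num_workers`; the queue decides which worker takes which task.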

Page 39: Big Data Analytics with Amazon Web Services

EC2

Utility computing. 6 years young.

Cost optimization.

Page 40: Big Data Analytics with Amazon Web Services

Achieving economies of scale

100%

Time

Page 41: Big Data Analytics with Amazon Web Services

Reserved capacity

Achieving economies of scale

100%

Time

Page 42: Big Data Analytics with Amazon Web Services

Reserved capacity

Achieving economies of scale

100%

Time

On-demand

Page 43: Big Data Analytics with Amazon Web Services

Reserved capacity

Achieving economies of scale

100%

Time

On-demand

UNUSED CAPACITY

Page 44: Big Data Analytics with Amazon Web Services

Bid on unused EC2 capacity.

Spot Instances

Very large discount.

Perfect for batch runs.

Balance cost and scale.
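The bidding idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not a description of the EC2 API: the function name `simulate_spot_run` and all prices are hypothetical, the job is assumed to be a batch run that can pause and resume, and each hour run is assumed to be charged at the market price rather than at the bid.

```python
def simulate_spot_run(hourly_spot_prices, bid, hours_needed):
    """Run a batch job only in hours where the spot price is at or below the bid."""
    hours_run = 0
    cost = 0.0
    for price in hourly_spot_prices:
        if hours_run == hours_needed:
            break
        if price <= bid:
            hours_run += 1
            cost += price  # assumption: pay the market price, not the bid
    return hours_run, cost


# Hypothetical hourly spot prices in dollars; the third hour spikes above the bid.
prices = [0.25, 0.5, 1.5, 0.25, 0.5, 0.25]
spot_result = simulate_spot_run(prices, bid=0.5, hours_needed=4)

# Hypothetical on-demand rate of $1.00 per hour for the same four hours.
on_demand_cost = 4 * 1.0
```

In this made-up price series the job skips the hour where the price spikes, finishes one hour later, and costs $1.50 instead of $4.00, which is the trade between cost and completion time that makes spot capacity a fit for batch runs.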

Page 45: Big Data Analytics with Amazon Web Services

<$1000 per hour

Page 46: Big Data Analytics with Amazon Web Services

Pattern for distributed computing.

Map/reduce

Software frameworks such as Hadoop.

Write two functions. Scale up.

Page 47: Big Data Analytics with Amazon Web Services

Pattern for distributed computing.

Map/reduce

Software frameworks such as Hadoop.

Write two functions. Scale up.

Complex cluster configuration and management.
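"Write two functions" can be made concrete with the usual word-count example. The sketch below is plain Python, not Hadoop: `map_words`, `reduce_counts` and `run_map_reduce` are hypothetical names, and the small driver only imitates the grouping step that a framework such as Hadoop performs across a cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_map_reduce(records, mapper, reducer):
    """Tiny in-process stand-in for the framework's shuffle and scheduling."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


lines = ["big data on AWS", "Big data analytics", "analytics on AWS"]
word_counts = run_map_reduce(lines, map_words, reduce_counts)
```

Only `map_words` and `reduce_counts` are specific to the problem. Everything in `run_map_reduce` is what the framework and the cluster provide, which is where the configuration and management burden comes from.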

Page 48: Big Data Analytics with Amazon Web Services

Managed Hadoop clusters.

Amazon Elastic MapReduce

Easy to provision and monitor.

Write two functions. Scale up.

Optimized for S3 access.

Page 49: Big Data Analytics with Amazon Web Services

Input data

S3

Page 50: Big Data Analytics with Amazon Web Services

Elastic MapReduce

Code

Input data

S3

Page 51: Big Data Analytics with Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Page 52: Big Data Analytics with Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Elastic cluster

Page 53: Big Data Analytics with Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Elastic cluster

HDFS

Page 54: Big Data Analytics with Amazon Web Services

Elastic MapReduce

Code

Name node

Input data

S3

Elastic cluster

HDFS

Queries + BI

Via JDBC, Pig, Hive

Page 55: Big Data Analytics with Amazon Web Services

Elastic MapReduce

Code

Name node

Output

S3 + SimpleDB

Input data

S3

Elastic cluster

HDFS

Queries + BI

Via JDBC, Pig, Hive

Page 56: Big Data Analytics with Amazon Web Services

Output

S3 + SimpleDB

Input data

S3
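The flow in these diagrams (code and input data in S3, an elastic cluster, output back to S3) might be expressed today as a single job-flow request. The sketch below only builds a dictionary; it is shaped like the keyword arguments of boto3's EMR `run_job_flow` call, which postdates this talk. The function name, bucket layout, script names, release label, instance types and role names are all assumptions for illustration.

```python
def build_streaming_job_flow(name, code_bucket, input_uri, output_uri, node_count=3):
    """Build keyword arguments shaped like a boto3 EMR run_job_flow request."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": node_count,
            # Shut the cluster down once the step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "streaming-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "hadoop-streaming",
                        # Code: hypothetical mapper and reducer scripts kept in S3.
                        "-files",
                        f"s3://{code_bucket}/mapper.py,s3://{code_bucket}/reducer.py",
                        "-mapper", "mapper.py",
                        "-reducer", "reducer.py",
                        # Input data and output both live in S3.
                        "-input", input_uri,
                        "-output", output_uri,
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_streaming_job_flow(
    "wordcount",
    code_bucket="example-code-bucket",
    input_uri="s3://example-data-bucket/input/",
    output_uri="s3://example-data-bucket/output/",
)
```

With credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**request)`. Nothing here contacts AWS, so the sketch is safe to run as written.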

Page 67: Big Data Analytics with Amazon Web Services

Performance

Page 68: Big Data Analytics with Amazon Web Services

Performance

Compute performance

Page 69: Big Data Analytics with Amazon Web Services

Intel Xeon E5-2670

Cluster Compute

10 GigE non-blocking network

Placement groupings

60.5 GB

Page 70: Big Data Analytics with Amazon Web Services

Intel Xeon E5-2670

Cluster Compute

10 GigE non-blocking network

Placement groupings

60.5 GB

+ GPU enabled instances

Page 71: Big Data Analytics with Amazon Web Services

Performance

Compute performance

Page 72: Big Data Analytics with Amazon Web Services

Performance

Compute performance

IO performance

Page 73: Big Data Analytics with Amazon Web Services

NoSQL

Unstructured data storage.

Page 74: Big Data Analytics with Amazon Web Services

Predictable, consistent performance

DynamoDB

Unlimited storage

No schema for unstructured data

Single digit millisecond latencies

Backed by solid state drives

Page 75: Big Data Analytics with Amazon Web Services

...and SSDs for all.

New Hi1 storage instances.

Page 76: Big Data Analytics with Amazon Web Services

2 x 1 TB SSDs

hi1.4xlarge

10 GigE network

HVM: 90k IOPS read, 9k to 75k write

PV: 120k IOPS read, 10k to 85k write
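To put the IOPS figures in more familiar terms, they can be converted to throughput once a block size is fixed. The slide does not state one, so the 4 KiB default below is purely an assumption, and `iops_to_mib_per_s` is a hypothetical helper name.

```python
def iops_to_mib_per_s(iops, block_bytes=4096):
    """Convert an IOPS figure to MiB/s for an assumed I/O block size."""
    return iops * block_bytes / 2**20


# Read figures from the slide, at the assumed 4 KiB block size.
pv_read = iops_to_mib_per_s(120_000)
hvm_read = iops_to_mib_per_s(90_000)
```

Under that assumption the PV read figure works out to 468.75 MiB/s and the HVM read figure to about 351.6 MiB/s. A different block size scales both numbers proportionally.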

Page 77: Big Data Analytics with Amazon Web Services

Netflix

“The hi1.4xlarge configuration is about half the system cost for the same throughput.”

http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

Page 78: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 79: Big Data Analytics with Amazon Web Services

Performance + ease of use

Page 80: Big Data Analytics with Amazon Web Services

AWS MARKETPLACE

3

Page 81: Big Data Analytics with Amazon Web Services

Extend platform with partners

Page 82: Big Data Analytics with Amazon Web Services

Innovate on behalf of customers

Page 83: Big Data Analytics with Amazon Web Services

Remove undifferentiated heavy lifting

Page 84: Big Data Analytics with Amazon Web Services

AWS Marketplace

aws.amazon.com/marketplace

Page 87: Big Data Analytics with Amazon Web Services

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 88: Big Data Analytics with Amazon Web Services

Generation

Analytics & computation

Collaboration & sharing

Collection & storage

Page 89: Big Data Analytics with Amazon Web Services

Acunu Reflex: Apache Cassandra NoSQL database

Collection & storage

MongoDB: With and without EBS RAID storage

Couchbase: Community and Enterprise editions

ScaleArc: MySQL load balancing

Page 95: Big Data Analytics with Amazon Web Services

Generation

Analytics & computation

Collaboration & sharing

Collection & storage

Page 96: Big Data Analytics with Amazon Web Services

Generation

Analytics & computation

Collaboration & sharing

Analytics & computation

Collection & storage

Page 97: Big Data Analytics with Amazon Web Services

KarmaSphere Analytics for Amazon Elastic MapReduce

MapR M5: Hadoop distribution

Metamarkets: Event-based data processing

Analytics & computation

Page 98: Big Data Analytics with Amazon Web Services

StackIQ Rocks+: HPC clusters with MPI, Grid Engine

Univa Grid Engine: One-click cluster deployment

Quantivo: Data association analytics

Analytics & computation

Page 99: Big Data Analytics with Amazon Web Services

Generation

Analytics & computation

Collaboration & sharing

Analytics & computation

Collection & storage

Page 100: Big Data Analytics with Amazon Web Services

Generation

Analytics & computation

Collection & storage

Collaboration & sharing

Page 101: Big Data Analytics with Amazon Web Services

Aspera Faspex: 20 Mbps data transfer

Collaboration & sharing

Page 102: Big Data Analytics with Amazon Web Services

SUCCESS STORY

4