Big data on AWS

Big Data on AWS Johann Romefort


What Big Data is and why AWS is a good fit for Big Data problems

Transcript of Big data on AWS

Page 1: Big data on AWS

Big Data on AWS

Johann Romefort

Page 2: Big data on AWS

Agenda

• What is Big Data?

• What is AWS?

• Presenting the tools: How Big Data and AWS fit together

Page 3: Big data on AWS

What is Big Data?

• It sits at the intersection of the three V's of data:

• Velocity (Batch / Real time / Streaming)

• Volume (Terabytes/Petabytes)

• Variety (structured / semi-structured / unstructured)

Page 4: Big data on AWS

Why is everybody talking about it?

• Cost of generation of data has gone down

• By 2015, 3B people will be online, pushing data volume created to 8 zettabytes

• More data = More insights = Better decisions

• Ease and cost of processing is falling thanks to cloud platforms

Page 5: Big data on AWS

Data flow and constraints

Generate

Ingest / Store

Process

Visualize / Share

The 3 V's introduce heterogeneity, which makes each of these steps hard to achieve
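As a rough illustration of the four stages above, the sketch below chains them as plain Python functions. Every name here (`generate`, `ingest`, `process`, `share`) and the sample records are hypothetical and not part of the talk; the mixed record shapes only stand in for the Variety that makes each stage harder.

```python
import json


def generate():
    """Generate: emit raw records in mixed shapes (hypothetical sample data)."""
    return [
        '{"sensor": "a", "temp": 21.5}',   # semi-structured JSON
        "b,22.0",                          # structured CSV
        "free-form note with no reading",  # unstructured text
    ]


def ingest(raw_records):
    """Ingest / Store: normalize what can be parsed, keep the rest as raw text."""
    stored = []
    for raw in raw_records:
        # First attempt: JSON with "sensor" and "temp" fields.
        try:
            record = json.loads(raw)
            stored.append({"sensor": record["sensor"], "temp": float(record["temp"])})
            continue
        except (ValueError, KeyError, TypeError):
            pass
        # Second attempt: two-column CSV "sensor,temp".
        parts = raw.split(",")
        if len(parts) == 2:
            try:
                stored.append({"sensor": parts[0], "temp": float(parts[1])})
                continue
            except ValueError:
                pass
        # Fallback: store the record untouched for later processing.
        stored.append({"raw": raw})
    return stored


def process(stored):
    """Process: summarize the records that carry a temperature reading."""
    temps = [r["temp"] for r in stored if "temp" in r]
    return {
        "parsed": len(temps),
        "unparsed": len(stored) - len(temps),
        "mean_temp": sum(temps) / len(temps) if temps else None,
    }


def share(summary):
    """Visualize / Share: render the summary as a one-line report."""
    return (f"{summary['parsed']} readings, {summary['unparsed']} unparsed, "
            f"mean temperature {summary['mean_temp']}")


summary = process(ingest(generate()))
report = share(summary)
```

Most of the code sits in `ingest`, which is the point of the slide: heterogeneous input pushes complexity into every stage that touches it.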

Page 6: Big data on AWS

What is AWS?

• AWS is a cloud computing platform

• On-demand delivery of IT resources

• Pay-as-you-go pricing model

Page 7: Big data on AWS

Cloud Computing

Storage + Compute + Networking

Adapts dynamically to ever-changing needs, closely tracking the requirements of the user's infrastructure and applications

Page 8: Big data on AWS

How does AWS help with Big Data?

• Removes constraints on the ingestion, storage, and processing layers and adapts closely to demand.

• Provides a collection of integrated tools to adapt to the 3 V’s of Big Data

• Virtually unlimited storage capacity and processing power suit changing data storage and analysis requirements.

Page 9: Big data on AWS

Computing Solutions for Big Data on AWS

Kinesis

EC2 EMR

Redshift

Page 10: Big data on AWS

Computing Solutions for Big Data on AWS

EC2

All-purpose computing instances

Dynamic provisioning and resizing

Lets you scale your infrastructure at low cost

Use Case: Well suited for running custom or proprietary applications (e.g. SAP HANA, Tableau…)

Page 11: Big data on AWS

Computing Solutions for Big Data on AWS

EMR

‘Hadoop in the cloud’

Adapts to the complexity of the analysis and the volume of data to process

Use Case: Offline processing of very large volumes of data, possibly unstructured (the Variety dimension)

Page 12: Big data on AWS

Computing Solutions for Big Data on AWS

Kinesis

Stream Processing

Real-time data

Scale to adapt to the flow of inbound data

Use Case: Complex event processing, click streams, sensor data, computation over time windows
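To make "computation over time windows" concrete, here is a minimal sketch of a tumbling (fixed, non-overlapping) window count over a click stream. It is plain Python and does not use the Kinesis API; the function name `tumbling_window_counts`, the event shape, and the sample clicks are all assumptions made for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) in fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key) tuples; this shape
    is a hypothetical stand-in for records read from a stream.
    """
    counts = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of the window it falls into.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical click stream: (seconds since start, page visited).
clicks = [(0, "/home"), (12, "/home"), (59, "/cart"), (61, "/home"), (119, "/cart")]
per_minute = tumbling_window_counts(clicks, 60)
```

With 60-second windows, the first three clicks land in the window starting at 0 and the last two in the window starting at 60. A real stream consumer would emit each window's counts once the window closes rather than holding everything in memory.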

Page 13: Big data on AWS

Computing Solutions for Big Data on AWS

Redshift

Data Warehouse in the cloud

Scales to Petabytes

Supports SQL Querying

Start small for just $0.25/h

Use Case: BI analysis; using legacy ODBC/JDBC software to analyze or visualize data
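A quick back-of-the-envelope calculation shows what the quoted entry price means under pay-as-you-go billing. Only the $0.25/h rate comes from the slide; the single node, the 30-day month, and the 8-hours-a-day, 22-days-a-month schedule are assumptions for illustration, and actual pricing may differ.

```python
HOURLY_RATE_USD = 0.25  # entry price quoted on the slide


def running_cost(hours, nodes=1, hourly_rate=HOURLY_RATE_USD):
    """Cost in USD of running `nodes` nodes for `hours` hours (hypothetical helper)."""
    return hours * nodes * hourly_rate


# Assumption: one node left running around the clock for a 30-day month.
always_on = running_cost(24 * 30)

# Assumption: the same node running only 8 hours a day, 22 working days.
office_hours = running_cost(8 * 22)
```

Under these assumptions the always-on node costs $180 for the month and the office-hours schedule costs $44, which is the practical effect of paying only for the hours used.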

Page 14: Big data on AWS

Storage Solution for Big Data on AWS

DynamoDB Redshift

S3 Glacier

Page 15: Big data on AWS

Storage Solution for Big Data on AWS

DynamoDB

NoSQL database

Consistent low-latency access

Column-based flexible data model

Use Case: Offline processing of very large volumes of data, possibly unstructured (the Variety dimension)

Page 16: Big data on AWS

Storage Solution for Big Data on AWS

S3

Use Case: Backups and Disaster recovery, Media storage, Storage for data analysis

Versatile storage system

Low-cost

Fast data retrieval

Page 17: Big data on AWS

Storage Solution for Big Data on AWS

Glacier

Use Case: Storing raw logs of data. Storing media archives. Magnetic tape replacement

Archive storage of cold data

Extremely low-cost

Optimized for infrequently accessed data

Page 18: Big data on AWS

What makes AWS different when it comes to big data?

Page 19: Big data on AWS

Given the 3 V’s, a collection of tools is usually needed for data processing and storage.

Integrated Environment for Big Data

AWS Big Data solutions already come integrated with each other

AWS Big Data solutions also integrate with the whole AWS ecosystem (Security, Identity Management, Logging, Backups, Management Console…)

Page 20: Big data on AWS

Example of products interacting with each other.

Page 21: Big data on AWS

Tightly integrated, rich environment of tools

+

On-demand scaling that tracks processing requirements

=

Extremely cost-effective and easy-to-deploy solution for big data needs

Page 22: Big data on AWS

Use Case: Real-time IoT Analytics

Gathering data in real time from sensors deployed in a factory and sending it for immediate processing

• Error Detection: Real-time detection of hardware problems

• Optimization and Energy management
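Real-time error detection of this kind usually comes down to evaluating a rule over a sliding window of recent readings. The sketch below shows one possible shape for such a rule in plain Python. The class name `SlidingWindowRule`, the mean-above-threshold condition, and the sample readings are assumptions for illustration, not the rules used in the project described in the talk.

```python
from collections import deque


class SlidingWindowRule:
    """Fire when the mean of the readings seen in the last `window_seconds`
    exceeds `threshold`. Assumes readings arrive in timestamp order."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.readings = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record a reading and return True if the rule fires."""
        self.readings.append((timestamp, value))
        # Evict readings that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.readings and self.readings[0][0] <= cutoff:
            self.readings.popleft()
        mean = sum(v for _, v in self.readings) / len(self.readings)
        return mean > self.threshold


# Hypothetical temperature readings: (seconds, degrees).
rule = SlidingWindowRule(window_seconds=10, threshold=80.0)
sensor_readings = [(0, 70), (5, 75), (9, 90), (12, 95), (16, 99)]
alerts = [rule.add(t, v) for t, v in sensor_readings]
```

The rule stays quiet for the first three readings and fires from the fourth onward, once the older, cooler readings have left the window. Averaging over a window rather than reacting to single values keeps one noisy reading from raising an alert.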

Page 23: Big data on AWS

First Version of the infrastructure

[Architecture diagram; labels recovered from the slide:]

• On customer site: aggregate sensor data

• Node.js stream processor

• Evaluate rules over time window

• In-house Hadoop cluster

• MongoDB

• Feed algorithm

• Write raw data for further processing

• Backup

Page 24: Big data on AWS

Second Version of the infrastructure

[Architecture diagram; labels recovered from the slide:]

• On customer site: aggregate sensor data

• Kinesis

• Evaluate rules over time window

• Redshift for BI analysis

• Write raw data for archiving

• Glacier

Page 25: Big data on AWS

Thank you

[email protected]

Follow me on @romefort