Large Scale Data Analytics on AWS

Post on 09-Dec-2016

Ian Meyers, David Elliott, Denis Batalov

Solution Architects, EMEA

Agenda

2:00pm – 3:00pm - AWS & Analytics Services Overview

3:00pm – 4:30pm - Machine Learning with AWS Demonstration

4:30pm – 5:00pm - Break

5:00pm – 6:00pm - Data Analytics Platform Demonstration

WHY BUILD LARGE SCALE ANALYTICS APPLICATIONS ON AWS?

It’s never been easier and less expensive to collect, store, analyse & share data

We are constantly producing more data

From all types of industries

From a diverse range of sources

Discovery Development Delivery

Risk Marketing Reporting Trade

Sales

Broad Analytics Use In The AWS Cloud

CLOUD COMPUTING?

A broad and deep platform that helps customers build sophisticated, scalable applications

What is Cloud Computing?

Cloud Computing

On demand

Pay as you go

Uniform

Available

Utility

Cloud Computing

Infrastructure

Cloud Computing

Compute

Database

Load Balancing

Networking

Storage

Analytics

Messaging

Email

Monitoring

Content Distribution

Security

DNS

Cloud Computing

Availability Zones

Global Infrastructure

US-WEST (Oregon)

EU-WEST (Ireland)

ASIA PAC (Tokyo)

US-WEST (N. California)

SOUTH AMERICA (Sao Paulo)

US-EAST (Virginia)

AWS GovCloud (US)

ASIA PAC (Sydney)

ASIA PAC (Singapore)

ASIA PAC (Beijing)

EU-CENTRAL (Frankfurt)

Availability Zones

Global Infrastructure

Accessible via API endpoints

Global Infrastructure

aws ec2 run-instances \
  --image-id ami-a813fadf \
  --count 3 \
  --placement AvailabilityZone=eu-west-1a \
  --instance-type m3.medium

aws ec2 run-instances \
  --image-id ami-a813fadf \
  --count 5 \
  --placement AvailabilityZone=eu-west-1c \
  --instance-type m3.large

Global Infrastructure

[Chart: capacity over time, comparing traditional IT capacity with your actual capacity needs]

Elastic Capacity (or lack of in this case)

Elasticity

On and Off Fast Growth

Variable peaks Predictable peaks

Waste

Customer Dissatisfaction

Elastic Capacity

Elasticity

From One Instance

Elasticity

To Thousands

Elasticity

And Back Again

Elasticity

Compute: EC2, Lambda, EC2 Container Service, Elastic Beanstalk
Storage & Content Delivery: S3, CloudFront, EFS, Glacier, Storage Gateway
Database: RDS, DynamoDB, ElastiCache, Redshift
Networking: VPC, Direct Connect, Route 53
Analytics: EMR, Data Pipeline, Kinesis, Machine Learning
Developer Tools: CodeCommit, CodeDeploy, CodePipeline
Management Tools: CloudWatch, CloudFormation, CloudTrail, Config, OpsWorks, Service Catalog
Security & Identity: Identity & Access Management, Directory Service, Trusted Advisor
Application Services: API Gateway, AppStream, CloudSearch, Elastic Transcoder, SES, SQS, SWF
Mobile Services: Cognito, Device Farm, Mobile Analytics, SNS
Enterprise Applications: WorkSpaces, WorkDocs, WorkMail

Broad Range Of Services

https://aws.amazon.com/compliance/

Broadest Certification & Accreditations

DATA INGESTION & STORAGE

Makes it easy to establish a dedicated network connection from your premises to AWS

Establish private connectivity between AWS & your datacenter, office, or colocation environment

Reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience

The dedicated connection can be partitioned into multiple virtual interfaces using 802.1q VLANs

aws.amazon.com/directconnect

AWS Direct Connect

Data Ingestion & Storage

Amazon S3

Secure, durable, highly-scalable object storage

Accessible via a simple web services interface

Store & retrieve any amount of data

Use alone or together with other AWS services

Different Tiers: Standard, Infrequent Access, Reduced Redundancy, Glacier

Data Ingestion & Storage

Elastic Block Store: High performance block storage device

1GB to 1TB in size

Mount as drives to instances with snapshot/cloning functionalities

Simple Storage Service: Highly scalable object storage for the internet

1 byte to 5TB in size

99.999999999% durability

Availability: 99.99%
Durability: 99.999999999%
Is a web store, not a file system
No single points of failure; eventually consistent
Paradigm: Object store
Performance: Very fast
Redundancy: Across Availability Zones
Security: Public key / private key
Pricing: $0.03/GB/month
Typical use case: Write once, read many
Limits: 100 buckets, unlimited storage, 5TB objects

Amazon S3 Multipart Upload

Large file (size < 5TB)

Split file into parts, send parts to S3, S3 rejoins the parts

Large object (size < 5TB)
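The split, send and rejoin steps can be sketched locally. This is only an illustration of the idea and does not call S3: the helper names split_into_parts and rejoin_parts are hypothetical, and the part-size and part-count constants reflect the documented S3 limits rather than anything stated on the slide.

```python
from __future__ import annotations

# Documented S3 multipart limits, shown for context. The minimum part size is
# not enforced below so that the demo can use tiny parts.
MIN_PART_SIZE = 5 * 1024 * 1024   # every part except the last must be >= 5 MiB
MAX_PARTS = 10_000                # at most 10,000 parts per upload


def split_into_parts(data: bytes, part_size: int) -> dict[int, bytes]:
    """Split data into numbered parts; part numbers start at 1."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    parts = {}
    for number, offset in enumerate(range(0, len(data), part_size), start=1):
        parts[number] = data[offset:offset + part_size]
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


def rejoin_parts(parts: dict[int, bytes]) -> bytes:
    """Reassemble parts by part number, whatever order they arrived in."""
    return b"".join(parts[number] for number in sorted(parts))


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    parts = split_into_parts(payload, part_size=100_000)
    # Simulate parts arriving out of order.
    arrived = dict(reversed(list(parts.items())))
    assert rejoin_parts(arrived) == payload
    print(len(parts), "parts rejoined correctly")
```

Because each part carries its own number, parts can be sent in parallel and retried individually, which is the main reason to split a large file in the first place.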

Data Ingestion & Storage

Simple Storage Service: Highly scalable object storage

Glacier: Long term object archive

Data Ingestion & Storage

Lifecycle Management

Persistent block level storage volumes

For use with Amazon EC2 instances

Automatically replicated within Availability Zones

Offer consistent and low-latency performance

[Diagram: an EC2 instance with an attached EBS volume; EBS snapshots are stored on S3]

aws.amazon.com/ebs

Data Ingestion & Storage

Amazon Elastic Block Store

AWS Import/Export

Move large amounts of data into and out of the AWS cloud using portable storage devices

Transfer your data directly onto and off of storage devices using Amazon’s high-speed internal network

For significant data sets, AWS Import/Export is often faster than Internet transfer and more cost effective than upgrading your connectivity

Supports upload & download from S3 & upload to Amazon EBS snapshots & Amazon Glacier Vaults

aws.amazon.com/importexport/

Data Ingestion & Storage

An on-premises software appliance connecting with cloud-based storage

Supports industry-standard storage protocols that work with your existing applications and workflows

Provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in Amazon S3 or Amazon Glacier

aws.amazon.com/storagegateway/

AWS Storage Gateway

Data Ingestion & Storage

A fully managed, cloud-based service for real-time data processing over large, distributed data streams

Continuously capture and store terabytes of data per hour from hundreds of thousands of sources

Emit data to other streams and other AWS services such as Amazon S3, Amazon Redshift, Amazon Elastic MapReduce (Amazon EMR) and Amazon DynamoDB

Elastically Add and Remove Shards for Performance

Use the Kinesis Client Library to Process Data

aws.amazon.com/kinesis

Amazon Kinesis
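To make the role of shards concrete, here is a minimal sketch of how a partition key is routed to a shard. Kinesis hashes the partition key with MD5 into a 128-bit hash key, and each shard owns a range of that key space. The even split of ranges and the function names shard_ranges and shard_for_key are assumptions made for illustration.

```python
from __future__ import annotations

import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Divide the hash key space evenly into inclusive (start, end) ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    size = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the space is fully covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise AssertionError("ranges must cover the whole hash key space")


if __name__ == "__main__":
    for count in (2, 4):
        ranges = shard_ranges(count)
        print(count, "shards:", shard_for_key("sensor-0042", ranges))
```

Adding shards narrows each range, so the same set of partition keys spreads over more shards and more aggregate throughput.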

Data Ingestion & Storage

Millions of sources producing 100s of TB per hour

Front End: Authentication, Authorization

Durable, consistent replicas across three AWS Availability Zones within an Amazon Web Services Region

Inexpensive: $0.0165 per million PUT Payload Units (in EU Ireland)

Aggregate and archive to S3

Real-time dashboards and alarms

Machine learning algorithms

Aggregate analysis in Hadoop or a data warehouse

Ordered stream of events supporting multiple readers

Data Ingestion & Storage

AWS Kinesis Architecture
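As a rough worked example of the PUT Payload Unit price quoted above: each record is billed in 25KB chunks, so a small record costs one unit and a larger one costs several. The sketch below assumes 25KB means 25 x 1024 bytes and covers only the PUT charge (shard-hour charges are excluded); the function names are hypothetical.

```python
import math

PUT_PAYLOAD_UNIT_BYTES = 25 * 1024   # assumption: 25KB interpreted as 25 * 1024 bytes
PRICE_PER_MILLION_UNITS = 0.0165     # USD, EU (Ireland), as quoted on the slide


def payload_units(record_size_bytes: int) -> int:
    """Number of PUT Payload Units consumed by one record (at least one)."""
    return max(1, math.ceil(record_size_bytes / PUT_PAYLOAD_UNIT_BYTES))


def monthly_put_cost(records_per_second: float, record_size_bytes: int,
                     days: int = 30) -> float:
    """PUT charge in USD for a steady record rate over the given number of days."""
    units = payload_units(record_size_bytes) * records_per_second * 86_400 * days
    return units / 1_000_000 * PRICE_PER_MILLION_UNITS


if __name__ == "__main__":
    cost = monthly_put_cost(records_per_second=1000, record_size_bytes=1024)
    print(f"1,000 x 1KB records per second for 30 days: ${cost:.2f} in PUT charges")
```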

As a startup, using AWS has allowed us to scale nicely and use resources without spending a lot of capital.

Brian Langel, CTO, Dash

• Needed to scale IT resources to create an app that would offer real-time information to drivers

• Developed and deployed the Dash application on the AWS Cloud

• Streams more than 1 TB of real-time data per day using Amazon Kinesis and processes billions of entries using Amazon DynamoDB

• Scaled up to support large traffic spikes–several thousand updates per second–in app usage

• Reduced operating costs by $200,000 per year

Using AWS, Dash Streams More Than 1 TB of Real-Time Data Per Day

Find out more here: aws.amazon.com/solutions/case-studies/dash/

Data Ingestion Ecosystem

Log Analysis

Compute Storage

AWS Global Infrastructure

Database

App Services

Deployment & Administration

Networking

Analytics

CloudWatch Logging

Automated Log Ingestion from Amazon Linux Agents

Create Log Streams, Groups of Logs, and Log Event Types

Analyze Log Data using Search Patterns

Alarms on Application Log Events

Integration with RSysLog

STRUCTURED DATA MANAGEMENT

Database

Relational Database Service: Managed Oracle, MySQL, SQL Server & Aurora

DynamoDB: Managed NoSQL Database

ElastiCache: Managed In Memory Caching

Amazon Redshift: Massively Parallel Petabyte Scale Data Warehouse

Database

Relational Database Service: Database-as-a-Service

No need to install or manage database instances

Scalable and fault tolerant configurations

Integration with Data Pipeline

Database

DynamoDB: Provisioned throughput NoSQL database; single-digit millisecond latency at any scale

Fast, predictable, configurable performance

Fully distributed, fault tolerant HA architecture

Supports document, key-value and graph data models

Integration with EMR & Hive

Writes

• Writes are acknowledged (committed) once they exist in at least two physical data centers

• Writes are persisted to SSD

Reads

• Tunable for Application Requirements

• No reduction in durability or consistency in order to achieve throughput

Dynamo Consistency

Eventually Consistent Read: stale values possible; highest throughput

Strongly Consistent Read: no stale values; lower potential throughput
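The throughput trade-off between the two read modes shows up directly in provisioned read capacity. One read capacity unit covers one strongly consistent read per second, or two eventually consistent reads per second, for an item of up to 4KB. A minimal sketch of that arithmetic follows; the function name is hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit covers items up to 4KB


def read_capacity_units(reads_per_second: int, item_size_bytes: int,
                        strongly_consistent: bool) -> int:
    """Read capacity units needed to sustain the given read rate."""
    units_per_read = math.ceil(item_size_bytes / READ_UNIT_BYTES)
    if strongly_consistent:
        return reads_per_second * units_per_read
    # Eventually consistent reads cost half as much, rounded up.
    return math.ceil(reads_per_second * units_per_read / 2)


if __name__ == "__main__":
    for strong in (True, False):
        mode = "strongly" if strong else "eventually"
        units = read_capacity_units(100, 3000, strongly_consistent=strong)
        print(f"100 reads/s of 3,000-byte items, {mode} consistent: {units} units")
```

For the same provisioned capacity, an application that can tolerate stale values gets twice the read rate.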

Database

ElastiCache: In Memory Caching

Memcached or Redis

Automatic Node Failover / Replacement

Multi-AZ Standby

Database

Redshift: Managed Massively Parallel Petabyte Scale Data Warehouse

Streaming Backup/Restore to S3

Load data from S3, DynamoDB and EMR

Extensive Security Features

Scale from 160 GB -> 2 PB Online

Amazon Redshift parallelizes and distributes everything

Query

Load

Backup

Restore

Resize

[Diagram: common BI tools connect over JDBC/ODBC to a Leader Node, which coordinates Compute Nodes linked by a 10GigE mesh]

Redshift lets you start small and grow big

Small Nodes (dc1.large & ds2.xlarge)

3 spindles, 15-30GiB RAM, 2 or 4 virtual cores, 10GigE

Single Node (160GB SSD or 2TB Magnetic)

Cluster 2-32 Nodes (320GB SSD to 64TB Magnetic)

Large Nodes (dc1.8xlarge & ds2.8xlarge)

24 spindles, 120-244GiB RAM, 2.56TB SSD or 16TB Magnetic, 16 or 32 virtual cores, 10GigE

Cluster 2-100 Nodes (5TB SSD to 1.6PB Magnetic)
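The cluster capacities quoted above follow from multiplying per-node storage by node count. The short sketch below reproduces that arithmetic from the slide's figures, assuming decimal units (1TB = 1,000GB); the dictionary keys and function name are hypothetical labels.

```python
# Per-node storage in GB, taken from the slide.
NODE_STORAGE_GB = {
    "small SSD": 160,
    "small magnetic": 2_000,
    "large SSD": 2_560,
    "large magnetic": 16_000,
}


def cluster_storage_gb(node_kind: str, node_count: int) -> int:
    """Total raw storage of a cluster built from identical nodes."""
    return NODE_STORAGE_GB[node_kind] * node_count


if __name__ == "__main__":
    print(cluster_storage_gb("small SSD", 2), "GB")                      # smallest small-node cluster
    print(cluster_storage_gb("small magnetic", 32) / 1_000, "TB")        # largest small-node cluster
    print(cluster_storage_gb("large SSD", 2) / 1_000, "TB")              # smallest large-node cluster
    print(cluster_storage_gb("large magnetic", 100) / 1_000_000, "PB")   # largest large-node cluster
```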

[Figure: a grid of 8XL nodes alongside a column of L nodes, illustrating growth from small to large clusters]

COMPLEX ANALYTICS

Elastic MapReduce: Managed, elastic Hadoop (1.x & 2.x) cluster

Integrates with S3, DynamoDB and Redshift

Install Storm, Spark & Shark, Hive, Pig, Impala & End User Tools Automatically

Support for Spot Instances

Integrated HBase NoSQL Database

Analytics

Elastic MapReduce

Analytics

Analytics languages/engines

Data management

Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon DynamoDB

Amazon RDS

EMR

Data Sources

AWS Data Pipeline

Ecosystem

S&P Capital IQ Uses AWS for Big Data Processing

Provides data to 4200+ top global investment firms

Launched Hadoop faster, Learned Hadoop faster

S3 Hadoop Cluster

http://aws.amazon.com/solutions/case-studies/sp-capital-iq

Event Processing

AWS Lambda: Fully Managed Event Processor

Node.js, Integrated AWS SDK & ImageMagick

Natively Compile & Install any Node.js modules

Specify Runtime RAM & Timeout

Automatically Scaled to support Event Volume

Events from S3, DynamoDB, Kinesis & Lambda

Integrated CloudWatch Logging

Analytics of the Internet of Things

Input Datanode: This could be an S3 bucket, RDS table, EMR Hive table, etc.

Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.

Output Datanode: This supports all the same data sources as the input datanode, but they don’t have to be the same type.

Analytics Orchestration

Data Pipeline: Automatically Provision EC2 & EMR Resources

Manage Dependencies & Scheduling

Automatically Retry and Notify of Success & Failure

Output: S3 file
Path: s3://trend-data/#{year-month-day}.csv

Activity: EMR Transform
Hive Query: user-metrics.hql
Frequency: Daily

Input: RDS Table
Table: User-Demographics
SQL Precondition: “Select last_update from table” > #{YY-MM-DD}

Input: DynamoDB Table
Table: User-Event-Data-#{year-month}

Success Notification: metrics@example.com
Failure Notification: emr-admin@example.com
Delay Notification: emr-admin@example.com

Sample Use Case
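The date placeholders in this use case are what let one pipeline definition address a different table and output file on each daily run. The sketch below expands them for a given run date. It treats the #{...} tokens as the slide's shorthand and is not the Data Pipeline expression evaluator; reading #{YY-MM-DD} as a two-digit year is an assumption, and expand is a hypothetical helper.

```python
from datetime import date


def expand(template: str, run_date: date) -> str:
    """Replace the slide's date placeholders with values for run_date."""
    replacements = {
        "#{year-month-day}": run_date.strftime("%Y-%m-%d"),
        "#{year-month}": run_date.strftime("%Y-%m"),
        "#{YY-MM-DD}": run_date.strftime("%y-%m-%d"),  # assumption: two-digit year
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    run_date = date(2015, 10, 28)
    print(expand("s3://trend-data/#{year-month-day}.csv", run_date))
    print(expand("User-Event-Data-#{year-month}", run_date))
```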

Train and optimize models on GBs of data

Batch process predictions

Real-time prediction API in one-click

No servers to provision or manage

Amazon Machine Learning

END USER REPORTING

End User Reporting

Redshift

S3

EMR

Dynamo DB

End User Reporting – Customer Issues

Realizing the “Virtual Desktop Dream”

BYOD is increasingly popular

Workforces are increasingly diverse

Tablet adoption significant

Keeping all these desktops secure

End User Reporting - Workspaces

WorkSpaces

Fully Managed

Support Multiple Devices

Keep Data Secure and Available

Choose Software & Hardware

Pay as You Go

Corporate Directory Integration

No data stored on end-user device

Only Pixels delivered to users (PCoIP)

User volume backed by Amazon S3

INTEGRATED ANALYTICS

Integrated Analytics

Integrated Analytics

TBs of logs sent daily

Logs stored in Amazon S3

Amazon EMR clusters

Hive Metastore on Amazon EMR

Interactive query

Integrated Analytics

Batch Processing

GBs of logs pushed to Amazon S3 hourly

Daily Amazon EMR cluster using Hive to process data

Input and output stored in Amazon S3

Load subset into Amazon Redshift

Integrated Analytics

Streaming Data Processing

Clickstream logs streamed to Kinesis

Logs stored in Amazon Kinesis

Amazon Kinesis Client Library

AWS Lambda

Amazon EMR

Amazon EC2

Integrated Analytics

Real Time Predictions

Your application

Amazon DynamoDB + Trigger event with Lambda + Query for predictions with the Amazon Machine Learning real-time API

Integrated Analytics

Batch Predictions

Structured data in Amazon Redshift

Query for predictions with Amazon ML batch API

Predictions in Amazon S3

Load predictions into Amazon Redshift -or- Read prediction results directly from S3

Your application

aws.amazon.com/architecture/

Self-Paced Labs: Try products, gain new skills, and get hands-on practice working with AWS technologies

aws.amazon.com/training/self-paced-labs

Training: Build technical expertise to design and operate scalable, efficient applications on AWS

aws.amazon.com/training

Certification: Validate your proven skills and expertise with the AWS platform

aws.amazon.com/certification

AWS Training & Certification

Large Scale Data Analytics with Amazon Web Services

Ian Meyers, Principal Solution Architect

October 28th, 2015

A customer has built a new Oil Pipeline, the North Sea Anglian System (the Flying Scotsman), which ships Crude Oil from the North Sea to London.

Built on Next Generation Sensor Technology, this Pipeline emits operational metrics from every Sensor using Internet of Things technology.

With every measurement, each sensor can track the ambient temperature, corrosivity, Pressure and Flow Rate, as well as the physical orientation of the segment of Pipeline being monitored.

Provide an Operational Analytics Pipeline which allows for real time monitoring of the Pipeline, as well as historical analysis of all data.
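A sensor measurement of this kind might be carried as a small JSON document, keyed by sensor so that each sensor's readings stay in order within a stream. The sketch below is one possible shape and is not taken from the demo: every field name, unit and sample value is hypothetical. Only PartitionKey and Data correspond to real parameters of the Kinesis PutRecord call.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SensorReading:
    """One measurement from a pipeline segment (all field names are illustrative)."""
    sensor_id: str
    timestamp: str          # ISO 8601, UTC
    temperature_c: float    # ambient temperature
    corrosivity: float
    pressure_kpa: float
    flow_rate_lps: float
    orientation_deg: list   # [pitch, roll, yaw] of the monitored segment


def to_record(reading: SensorReading) -> dict:
    """Serialise a reading as JSON bytes, partitioned by sensor."""
    data = json.dumps(asdict(reading), sort_keys=True).encode("utf-8")
    return {"PartitionKey": reading.sensor_id, "Data": data}


if __name__ == "__main__":
    reading = SensorReading(
        sensor_id="segment-0042",
        timestamp="2015-10-28T14:00:00Z",
        temperature_c=7.5,
        corrosivity=0.02,
        pressure_kpa=5100.0,
        flow_rate_lps=830.0,
        orientation_deg=[0.0, 1.5, 182.0],
    )
    print(to_record(reading))
```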

Getting the Data In

Amazon EC2

Amazon Kinesis

MQTT

HTTPS

Application Services

Amazon Kinesis: Managed Service for Real Time Big Data Processing

Create Streams to Produce & Consume Data

Elastically Add and Remove Shards for Performance

Use the Kinesis Client Library to Process Data

Integration with S3, Redshift and DynamoDB

[Diagram: data sources write to the AWS Endpoint of Amazon Kinesis, where a stream of shards (Shard 1, Shard 2, ... Shard N) is replicated across three Availability Zones. The stream is read by App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis] and App.4 [Machine Learning], with output to S3, DynamoDB and Redshift]

Native Code Module to perform efficient writes to Multiple Kinesis Streams

C++/Boost

Asynchronous Execution

Configurable Aggregation of Events

Introducing the Kinesis Producer Library

[Diagram: My Application hands records to the KPL Daemon, which asynchronously issues PutRecord(s) calls to multiple Kinesis Streams]

KPL Aggregation

[Diagram: the same flow with aggregation. Small events (100k, 20k, 500k, 200k, 40k, 20k, 40k) are packed between a Protobuf header and footer into aggregate records of at most 1MB (the maximum event size) before PutRecord(s) is called]
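The packing step can be sketched as a greedy pass over event sizes: keep adding events to the current aggregate until the next one would exceed the 1MB limit, then start a new aggregate. This is a simplification for illustration, not the KPL's implementation. It ignores the Protobuf header and footer overhead and any per-shard grouping, and the function name is hypothetical.

```python
MAX_EVENT_BYTES = 1024 * 1024  # 1MB maximum event size, from the slide


def aggregate(event_sizes, limit=MAX_EVENT_BYTES):
    """Greedily pack events, in arrival order, into aggregates of at most `limit`."""
    batches, current, current_size = [], [], 0
    for size in event_sizes:
        if size > limit:
            raise ValueError("event exceeds the maximum event size")
        if current and current_size + size > limit:
            # The next event would not fit: close this aggregate and start another.
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Event sizes from the slide, in KB, against a 1,024KB limit.
    sizes_kb = [100, 20, 500, 200, 40, 20, 40]
    batches = aggregate(sizes_kb, limit=1024)
    print(len(batches), "aggregate(s) for", sum(sizes_kb), "KB of events")
```

With the slide's sizes, seven separate puts collapse into a single aggregate, which is where the efficiency gain comes from.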

KCL Libraries available for Java, Ruby, Node, Go, and a Multi-Lang Implementation with Native Python support

All State Management in Dynamo DB

Kinesis Client Library

DynamoDB

AWS Analytics Demo

Long Term Durability

Amazon EC2

Amazon Kinesis

MQTT

HTTPS

Amazon EC2

Amazon S3

Amazon Kinesis

Amazon EC2

MQTT

HTTPS

Kinesis Connectors

• S3

Batch Write Files for Archive into S3

Extensible file naming

• Redshift

Once Written to S3, load to Redshift

Manifest support

User defined transformers

• DynamoDB

BatchPut append to table

User defined transformers

• Spark

Spark Streaming RDDs

• Storm

Use Kinesis as a Spout

• ElasticSearch

Automatically index stream contents

Storm

S3

DynamoDB

Redshift

Kinesis

ElasticSearch

Connectors Architecture

Elastic Block Store: High performance block storage device

1GB to 1TB in size

Mount as drives to instances with snapshot/cloning functionalities

Simple Storage Service: Highly scalable object storage for the internet

1 byte to 5TB in size

99.999999999% durability

Availability: 99.99%
Durability: 99.999999999%
Is a web store, not a file system
No single points of failure; eventually consistent
Paradigm: Object store
Performance: Very fast
Redundancy: Across Availability Zones
Security: Public key / private key
Pricing: $0.095/GB/month
Typical use case: Write once, read many
Limits: 100 buckets, unlimited storage, 5TB objects

Amazon S3 provides near linear scalability

S3 Streaming Performance

100 VMs; 9.6GB/s; $26/hr

350 VMs; 28.7GB/s; $90/hr

34 secs per terabyte

[Chart: GB/second against reader connections]

S3 Performance & Scalability
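The two measurements quoted for S3 streaming performance can be checked with a little arithmetic: throughput per VM, seconds to read a terabyte, and the compute cost of doing so. The sketch below assumes 1TB = 1,000GB and uses only the slide's figures; the function names are hypothetical.

```python
# (VMs, aggregate GB/s, USD per hour), as quoted on the slide.
MEASUREMENTS = [(100, 9.6, 26.0), (350, 28.7, 90.0)]


def per_vm_throughput(vms: int, gb_per_second: float) -> float:
    """Average throughput per VM in GB/s."""
    return gb_per_second / vms


def seconds_per_terabyte(gb_per_second: float, gb_per_tb: int = 1000) -> float:
    """Time to stream one terabyte at the given aggregate rate."""
    return gb_per_tb / gb_per_second


def cost_per_terabyte(gb_per_second: float, usd_per_hour: float) -> float:
    """Compute cost in USD of streaming one terabyte."""
    return usd_per_hour * seconds_per_terabyte(gb_per_second) / 3600


if __name__ == "__main__":
    for vms, rate, price in MEASUREMENTS:
        print(f"{vms} VMs: {per_vm_throughput(vms, rate) * 1000:.0f} MB/s per VM, "
              f"{seconds_per_terabyte(rate):.1f} s per TB, "
              f"${cost_per_terabyte(rate, price):.2f} per TB")
```

Per-VM throughput at 350 VMs is roughly 85% of the figure at 100 VMs, which is the sense in which scaling is near linear rather than perfectly linear.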

AWS Analytics Demo

Real Time Analytics

Amazon EC2

Amazon S3

Amazon Kinesis

Amazon EC2

MQTT

HTTPS

Amazon EC2

Elastic Beanstalk

DynamoDB

Amazon S3

Amazon Kinesis CloudWatch

Amazon EC2

MQTT

HTTPS json

Deployment & Admin

Elastic Beanstalk: 1 click deployment from Eclipse, Visual Studio and Git

Rapid deployment of applications

All AWS resources automatically created

Feature: Details
Platform support: Containers for Java, .NET, Ruby and PHP
Resource creation: Creates load balancer, instances, autoscaling and monitoring automatically
Monitoring & Logs: Integrated with CloudWatch and consolidates server logs
Versioning: Manage versions of applications and easily roll back deployments
Notifications: Receive alerts on key events
Full resource access: Access all underlying AWS resources as necessary

Kinesis Aggregators

Kinesis Aggregators provide a powerful and simple mechanism for creating Real Time Aggregates of data as it traverses Kinesis

Simple Configuration

Create a configuration file defining the Aggregations required

Run the application using Elastic Beanstalk

Data is persisted automatically to DynamoDB

DynamoDB provisioning is fully managed

Data can be graphed using CloudWatch

Utilities to integrate Real Time Aggregates with Elastic MapReduce Hive or Amazon Redshift
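To illustrate what a real-time aggregate looks like, the sketch below groups events by label and one-minute bucket and keeps a running count and sum for each. It is a local, in-memory illustration of the idea and is not the Kinesis Aggregators code or its configuration format. The event tuple layout and function name are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_minute(events):
    """Count and sum values per (label, minute).

    `events` is an iterable of (iso_timestamp, label, value) tuples.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for timestamp, label, value in events:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        bucket = totals[(label, minute.isoformat())]
        bucket["count"] += 1
        bucket["sum"] += value
    return dict(totals)


if __name__ == "__main__":
    events = [
        ("2015-10-28T14:00:05", "segment-1", 10.0),
        ("2015-10-28T14:00:55", "segment-1", 20.0),
        ("2015-10-28T14:01:01", "segment-1", 5.0),
    ]
    for key, value in sorted(aggregate_by_minute(events).items()):
        print(key, value)
```

In a deployed system each (label, minute) bucket would map naturally to one item in a key-value table, updated as events arrive.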

Σ

Database

DynamoDB: Provisioned throughput NoSQL database

Fast, predictable, configurable performance

Fully distributed, fault tolerant HA architecture

Integration with EMR & Hive

CloudWatch Integration

Σ

AWS Analytics Demo

Massively Parallel Transformations

Amazon EC2

Elastic Beanstalk

DynamoDB

Amazon S3

Amazon Kinesis CloudWatch

Amazon EC2

MQTT

HTTPS json

Amazon EC2

Elastic Beanstalk

DynamoDB

Amazon S3

Amazon Kinesis

Amazon EMR

CloudWatch

Amazon EC2

MQTT

HTTPS json

AWS Analytics Demo

Accessible for Analysts & Dashboards

Amazon EC2

Elastic Beanstalk

DynamoDB

Amazon S3

Amazon Kinesis

Amazon EMR

CloudWatch

Amazon EC2

MQTT

HTTPS json

Amazon EC2

Elastic Beanstalk

DynamoDB

Amazon S3

Amazon Kinesis

Amazon Redshift

Amazon EMR

CloudWatch

Amazon EC2

MQTT

HTTPS json

AWS Lambda

S3 Events

AWS Lambda

SQS Queues

SNS Topics

Amazon S3 Bucket

RRS Object Lost

Object Deleted

Object Delete Marker Created

Object Created (Put)

Object Created (Post)

Object Created (Copy)

Object Created (Multi-Part)
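A function subscribed to these bucket events receives a batch of records, each naming the event type, the bucket and the object key. The sketch below picks out the object-created events. The deck lists Node.js as the Lambda runtime, so Python is used here only for illustration. The handler name and return value are assumptions, while the Records, eventName and s3 fields follow the S3 event notification format.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Return (bucket, key) for every object-created record in the event."""
    created = []
    for record in event.get("Records", []):
        # eventName looks like "ObjectCreated:Put" or "ObjectRemoved:Delete".
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        created.append((bucket, key))
    return created


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"eventName": "ObjectCreated:Put",
             "s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "logs/2015-10-28/part+1.csv"}}},
            {"eventName": "ObjectRemoved:Delete",
             "s3": {"bucket": {"name": "example-bucket"},
                    "object": {"key": "logs/old.csv"}}},
        ]
    }
    print(handler(sample_event, None))
```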
