Introduction to Amazon Redshift

Post on 11-May-2015

An introduction to Amazon Redshift.

Transcript of Introduction to Amazon Redshift

Introducing Amazon Redshift

David Pearson, Business Development Manager

http://aws.amazon.com/resources/databaseservices/webinars

What is AWS?

Compute Storage

AWS Global Infrastructure

Database

Application Services

Deployment & Administration

Networking

Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store

Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server

Amazon ElastiCache: In-Memory Caching Service

Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service

AWS Database Services

Scalable High Performance Application Storage in the Cloud

Why Data Warehousing?

No upfront costs, pay as you go

Really fast performance at a really low price

Open and flexible with support for popular tools

Easy to provision and scale up massively

Amazon Redshift

A fast, fully managed, petabyte-scale data warehouse service

Objectives: design and build a petabyte-scale data warehouse service

Amazon Redshift

A Whole Lot Simpler

A Lot Cheaper

A Lot Faster

Redshift Dramatically Reduces I/O

• Direct-attached storage

• Large data block sizes

• Columnar storage

• Data compression

• Zone maps

Id   Age  State
123  20   CA
345  25   WA
678  40   FL

[Diagram: the table above laid out as row storage and as column storage]
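To make the I/O argument concrete, here is a minimal sketch in Python using the three-row sample table from the slide. It is a toy in-memory model, not how Redshift stores data internally: the function names, the "values read" counter, and the block size are all assumptions made for illustration. It shows why a query that touches one column reads less from a column layout, and how a zone map (the minimum and maximum value per block) lets a scan skip blocks that cannot match.

```python
# Sample table from the slide: (Id, Age, State).
rows = [
    (123, 20, "CA"),
    (345, 25, "WA"),
    (678, 40, "FL"),
]

# Row storage: the values of one row sit next to each other.
row_layout = [value for row in rows for value in row]

# Column storage: the values of one column sit next to each other.
columns = {
    "Id": [r[0] for r in rows],
    "Age": [r[1] for r in rows],
    "State": [r[2] for r in rows],
}
column_layout = columns["Id"] + columns["Age"] + columns["State"]


def average_age_row_store(table_rows):
    """Scan whole rows: every value is read although only Age is needed."""
    values_read = 0
    total = 0
    for row in table_rows:
        values_read += len(row)
        total += row[1]
    return total / len(table_rows), values_read


def average_age_column_store(table_columns):
    """Scan only the Age column."""
    ages = table_columns["Age"]
    return sum(ages) / len(ages), len(ages)


def build_zone_map(values, block_size):
    """Record (min, max) for each block of a column (hypothetical helper)."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, low, high):
    """Indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [i for i, (lo, hi) in enumerate(zone_map) if hi >= low and lo <= high]


if __name__ == "__main__":
    print(average_age_row_store(rows))        # average plus values read
    print(average_age_column_store(columns))  # same average, fewer values read
    age_zones = build_zone_map(columns["Age"], block_size=2)
    print(blocks_to_scan(age_zones, low=30, high=50))
```

With a block size of 2, a filter such as Age between 30 and 50 only needs the second block, because the first block's maximum is below 30. Real block sizes are far larger, which is the point of the "large data block sizes" bullet.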

Redshift Runs on Optimized Hardware

HS1.8XL: 128GB RAM, 16 Cores, 24 Spindles, 16TB Storage, 2GB/sec scan rate

HS1.XL: 16GB RAM, 2 Cores, 3 Spindles, 2TB Storage

• Optimized for I/O intensive workloads

• High disk density

• Runs in HPC - fast network

• HS1.8XL available on Amazon EC2

Start Small

1 x XL = 2TB

Grow Big

100 x 8XL = 1.6PB
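The two capacity figures follow directly from the per-node storage on the hardware slide. A small sketch of that arithmetic, assuming decimal units (1 PB = 1000 TB); the dictionary and function names are made up for this example:

```python
# Per-node storage from the hardware slide, in TB.
NODE_STORAGE_TB = {"XL": 2, "8XL": 16}

TB_PER_PB = 1000  # assumption: decimal units, as the slide's 1.6PB implies


def cluster_capacity_tb(node_type, node_count):
    """Raw cluster storage in TB for a given node type and count."""
    return NODE_STORAGE_TB[node_type] * node_count


if __name__ == "__main__":
    small = cluster_capacity_tb("XL", 1)
    big = cluster_capacity_tb("8XL", 100)
    print(f"Start small: {small} TB")
    print(f"Grow big: {big} TB = {big / TB_PER_PB} PB")
```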

Redshift Parallelizes and Distributes Everything

Load, Query, Resize, Backup, Restore

[Architecture diagram: SQL clients / BI tools in the client VPC connect to a leader node; the leader node and three 16TB compute nodes are linked by 10 GigE (HPC); ingestion, backup, and restore run against Amazon S3]

[Chart: data volume, showing the gap between data generated and data available for analysis]

Sources:

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Redshift is Priced to Analyze All Your Data

$0.85 per hour for on-demand (2TB)

$999 per TB per year (3-yr reservation)
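The two prices are quoted in different units, so it helps to put the on-demand figure on the same per-TB-per-year basis. The sketch below does that under one stated assumption: the on-demand node runs every hour of a 365-day year. The prices are the ones on the slide and may not match current pricing; the constant and function names are illustrative only.

```python
from decimal import Decimal

HOURS_PER_YEAR = 24 * 365  # assumption: the node runs all year (non-leap year)

ON_DEMAND_PER_HOUR = Decimal("0.85")   # slide: on-demand price for a 2TB node
ON_DEMAND_NODE_TB = 2
RESERVED_PER_TB_YEAR = Decimal("999")  # slide: 3-year reservation


def on_demand_per_tb_year(price_per_hour, node_tb, hours=HOURS_PER_YEAR):
    """Convert an hourly node price into a per-TB-per-year price."""
    return price_per_hour * hours / node_tb


if __name__ == "__main__":
    on_demand = on_demand_per_tb_year(ON_DEMAND_PER_HOUR, ON_DEMAND_NODE_TB)
    print(f"On-demand: ${on_demand:.2f} per TB per year")
    print(f"Reserved:  ${RESERVED_PER_TB_YEAR:.2f} per TB per year")
```

Under that assumption, running the 2TB node on demand for a full year works out to $7,446, or $3,723 per TB, against $999 per TB on the 3-year reservation.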

Working with Redshift

differentiated effort increases the uniqueness of an application

Redshift Simplifies Provisioning

• Create a cluster in minutes

• Automatically patch your OS and data warehouse software

• Scale up to 1.6PB with a few clicks and no downtime

Integrate Redshift with remote data centers

[Diagram: SQL clients / BI tools, Amazon S3, and two clusters: one with a leader node and four 2TB compute nodes, one with a leader node and two 2TB compute nodes]

1. Cluster placed in read-only mode

2. New cluster provisioned

3. Data copied across (MPP)

4. DNS switched to new cluster (read-write)

5. Source cluster is de-provisioned
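The five steps above can be walked through with a toy in-memory model. This is only a sketch of the sequence as listed on the slide, not the service's API: the `Cluster` class, the `resize` function, and the `dns` dictionary are hypothetical names, and the "copy" is a plain list copy standing in for the parallel node-to-node transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster; not a real AWS object."""
    name: str
    node_count: int
    read_only: bool = False
    provisioned: bool = True
    rows: list = field(default_factory=list)


def resize(source, new_node_count, dns):
    """Follow the five steps from the slide and return the new cluster."""
    # 1. Cluster placed in read-only mode: queries continue, writes stop.
    source.read_only = True

    # 2. New cluster provisioned with the requested node count.
    target = Cluster(
        name=f"{source.name}-resized",
        node_count=new_node_count,
        read_only=True,
    )

    # 3. Data copied across (a list copy stands in for the MPP transfer).
    target.rows = list(source.rows)

    # 4. DNS switched to the new cluster, which becomes read-write.
    dns["endpoint"] = target.name
    target.read_only = False

    # 5. Source cluster is de-provisioned.
    source.provisioned = False
    return target


if __name__ == "__main__":
    dns = {"endpoint": "warehouse"}
    old = Cluster(name="warehouse", node_count=2, rows=[1, 2, 3])
    new = resize(old, new_node_count=4, dns=dns)
    print(new.node_count, new.rows, dns["endpoint"], old.provisioned)
```

Because clients reach the cluster through the DNS name rather than a fixed address, the switch in step 4 is what lets them carry on without reconfiguration.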

Integrates With Existing BI Tools

Amazon Redshift

JDBC/ODBC

Amazon Redshift

Live Demonstration

Jeremy Winters

Lead Architect and Database Warehouse Designer

Getting Started

Reporting Warehouse

• Accelerated operational reporting

• Support for short-time use cases

• Data compression, index redundancy

[Diagram: RDBMS (OLTP, ERP) and Redshift (Reporting and BI)]

On-Premises Integration

Data Integration Partners*

[Diagram: RDBMS (OLTP, ERP) and Redshift (Reporting and BI)]

* as of 3/14/2013

Live Archive for (Structured) Big Data

• Direct integration with copy command

• High velocity data ages into Redshift

• Low cost, high scale option for new apps

[Diagram: DynamoDB (OLTP, Web Apps) and Redshift (Reporting and BI)]
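As an illustration of the "direct integration with copy command" bullet, the sketch below builds a Redshift `COPY` statement that loads from a DynamoDB table. The `dynamodb://` source prefix and the `READRATIO` option are taken from the Redshift COPY documentation as best understood here, and `IAM_ROLE` is a later authorization option than the `CREDENTIALS` string in use when this talk was given, so check the current reference before relying on any of it. The table names, the role ARN, and the helper function are hypothetical.

```python
import re

# Conservative pattern for an unquoted SQL identifier (assumption for this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_dynamodb_copy(target_table, dynamodb_table, iam_role_arn, read_ratio=50):
    """Return a COPY statement that loads a Redshift table from DynamoDB.

    read_ratio is the percentage of the DynamoDB table's provisioned
    read throughput that the load is allowed to consume.
    """
    if not _IDENTIFIER.match(target_table):
        raise ValueError(f"unsafe target table name: {target_table!r}")
    if "'" in dynamodb_table or "'" in iam_role_arn:
        raise ValueError("quotes are not allowed in the source name or role ARN")
    if read_ratio <= 0:
        raise ValueError("read_ratio must be a positive percentage")
    return (
        f"COPY {target_table} "
        f"FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"READRATIO {read_ratio};"
    )


if __name__ == "__main__":
    # All three arguments are placeholder values, not real resources.
    print(build_dynamodb_copy(
        "clickstream_archive",
        "Clickstream",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ))
```

The statement would be run through any SQL client connected to the cluster; this helper only assembles the text.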

Cloud ETL for Big Data

• Maintain online SQL access to historical logs

• Transformation and enrichment with EMR

• Longer history ensures better insight

[Diagram: S3, Elastic MapReduce, Redshift, Reporting and BI]

Redshift

Fast

Low Cost

• Less than $1 / hour to get started

• Less than $1K / TB to run Redshift for a year

Easy To Get Started

Please visit: http://aws.amazon.com/redshift/

“up to 50 times faster than our current OLAP solution”

“exponential gains in performance”

Questions?

http://aws.amazon.com/resources/databaseservices/webinars