Introduction to Amazon Redshift
Introducing Amazon Redshift
David Pearson, Business Development Manager
http://aws.amazon.com/resources/databaseservices/webinars
What is AWS?
Compute, Storage, Database, Networking, Application Services, and Deployment & Administration, built on the AWS Global Infrastructure
AWS Database Services
Scalable High-Performance Application Storage in the Cloud
• Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
• Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
• Amazon ElastiCache: In-Memory Caching Service
• Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
Why Data Warehousing?
• No upfront costs; pay as you go
• Very fast performance at a very low price
• Open and flexible, with support for popular tools
• Easy to provision and to scale up massively
Amazon Redshift
Objective: design and build a fast, fully managed, petabyte-scale data warehouse service
Amazon Redshift
A Whole Lot Simpler
A Lot Cheaper
A Lot Faster
Redshift Dramatically Reduces I/O
• Direct-attached storage
• Large data block sizes
• Columnar storage
• Data compression
• Zone maps
Example table:
Id   Age  State
123  20   CA
345  25   WA
678  40   FL
Row storage: 123, 20, CA | 345, 25, WA | 678, 40, FL
Column storage: 123, 345, 678 | 20, 25, 40 | CA, WA, FL
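Two of the I/O reductions listed above can be illustrated with a toy model. The sketch below reuses the three-row example table to count how many values a single-column scan touches in each layout, then adds per-block min/max "zone maps" so a range filter can skip blocks. Block size, the data, and all function names are assumptions for illustration; this is not Redshift's actual on-disk format.

```python
# Toy model only: Redshift's real block format, block size and encodings differ.
ROWS = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]
COLUMNS = ("Id", "Age", "State")

# Row storage: all values of a record are stored together.
row_store = [value for row in ROWS for value in row]

# Column storage: all values of a column are stored together.
column_store = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def values_read_for_column(layout: str, column: str) -> int:
    """Count the values a scan touches to read a single column."""
    if layout == "row":
        return len(row_store)  # every record is read in full
    return len(column_store[column])  # only the requested column is read


def build_zone_maps(values, block_size):
    """Split a column into fixed-size blocks and record (min, max) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(block), max(block), block) for block in blocks]


def scan_greater_than(zone_maps, threshold):
    """Return (matching values, blocks read), skipping blocks ruled out by their max."""
    matches, blocks_read = [], 0
    for _block_min, block_max, block in zone_maps:
        if block_max <= threshold:
            continue  # no value in this block can satisfy "> threshold"
        blocks_read += 1
        matches.extend(v for v in block if v > threshold)
    return matches, blocks_read


print(values_read_for_column("row", "Age"))     # 9
print(values_read_for_column("column", "Age"))  # 3
zone_maps = build_zone_maps(list(range(100)), block_size=10)
print(scan_greater_than(zone_maps, 89)[1])       # 1 (of 10 blocks)
```

The skip works so well here because the column is sorted; on unsorted data the per-block min/max ranges overlap and fewer blocks can be ruled out.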
Redshift Runs on Optimized Hardware
HS1.8XL: 128 GB RAM, 16 cores, 24 spindles, 16 TB storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 cores, 3 spindles, 2 TB storage
• Optimized for I/O-intensive workloads
• High disk density
• Runs in HPC: fast network
• HS1.8XL available on Amazon EC2
Start small: 1 × XL node = 2 TB
Grow big: 100 × 8XL nodes = 1.6 PB
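The start-small / grow-big figures are simple multiplication of per-node storage by node count. The sketch below encodes the two node sizes from the hardware slide and also works the other direction (how many nodes a given data volume needs). It counts raw storage only and, as an assumption, ignores compression and other overheads; the function names are hypothetical.

```python
import math

# Per-node storage from the slides, in TB (decimal units: 1 PB = 1,000 TB).
NODE_STORAGE_TB = {"XL": 2, "8XL": 16}


def cluster_storage_tb(node_type: str, node_count: int) -> int:
    """Total raw storage of a cluster of identical nodes."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return NODE_STORAGE_TB[node_type] * node_count


def nodes_needed(node_type: str, data_tb: float) -> int:
    """Smallest number of nodes whose combined raw storage holds data_tb."""
    return max(1, math.ceil(data_tb / NODE_STORAGE_TB[node_type]))


print(cluster_storage_tb("XL", 1))            # 2 (TB)
print(cluster_storage_tb("8XL", 100) / 1000)  # 1.6 (PB)
print(nodes_needed("XL", 5))                  # 3
```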
Redshift Parallelizes and Distributes Everything
Load, query, resize, backup, restore
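Architecture: SQL clients / BI tools in the client VPC connect to the leader node, which coordinates the compute nodes (16 TB each) over a 10 GigE (HPC) network. Ingestion, backup, and restore run in parallel between the compute nodes and Amazon S3.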
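The leader/compute split described above follows a scatter-gather pattern: rows are spread across compute nodes, each node aggregates its own share, and the leader combines the partial results. The sketch below is a single-process illustration under that assumption; threads stand in for separate machines, round-robin placement is just one possible distribution choice, and every name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, num_nodes):
    """Assign rows to compute nodes round-robin."""
    node_rows = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        node_rows[i % num_nodes].append(row)
    return node_rows


def compute_node_sum(rows, column):
    """Work done locally on one compute node: a partial aggregate."""
    return sum(row[column] for row in rows)


def leader_node_sum(node_rows, column):
    """Leader node: run the step on every compute node, then combine the partials."""
    with ThreadPoolExecutor(max_workers=len(node_rows)) as pool:
        partials = list(pool.map(lambda rows: compute_node_sum(rows, column), node_rows))
    return sum(partials)


sales = [{"amount": n} for n in range(1, 101)]
nodes = distribute(sales, num_nodes=3)
print([len(r) for r in nodes])           # [34, 33, 33]
print(leader_node_sum(nodes, "amount"))  # 5050
```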
Chart: data volume over time, showing the gap between data generated and data available for analysis. Sources: Gartner, "User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011"; IDC, "Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares".
Redshift is Priced to Analyze All Your Data
• $0.85 per hour on demand (2 TB node)
• $999 per TB per year (3-year reservation)
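To compare the two price points on the same footing, the hourly on-demand rate can be converted to a cost per TB per year. The sketch assumes the node runs around the clock for a 365-day year and treats the slide's $999 figure as an effective annual rate; both prices are the 2013 figures from the slide, not current pricing.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def on_demand_cost_per_tb_year(hourly_rate: float, node_storage_tb: float) -> float:
    """Annual cost per TB for a node left running around the clock."""
    return hourly_rate * HOURS_PER_YEAR / node_storage_tb


on_demand = on_demand_cost_per_tb_year(0.85, 2)  # slide: $0.85/hour for a 2 TB node
reserved = 999.0                                 # slide: $999 per TB per year, 3-year term
print(round(on_demand))                          # 3723
print(round(1 - reserved / on_demand, 2))        # 0.73
```

Under these assumptions the 3-year reservation costs roughly 73% less per TB-year than running the same node on demand all year.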
Working with Redshift
Differentiated effort increases the uniqueness of an application.
Redshift Simplifies Provisioning
• Create a cluster in minutes
• Automatically patch your OS and data warehouse software
• Scale up to 1.6 PB with a few clicks and no downtime (the cluster stays available for reads during a resize)
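Cluster creation can also be scripted. The sketch below only assembles keyword arguments for the `create_cluster` call of boto3's Redshift client, so it runs without AWS access; the actual call is left commented out because it needs credentials and incurs charges. The cluster name, user name, and password are hypothetical, and `dc2.large` is an assumption about a currently offered node type (the HS1 types shown in this deck date from 2013).

```python
def build_create_cluster_request(cluster_id, node_type, number_of_nodes,
                                 master_username, master_password):
    """Keyword arguments for boto3's Redshift create_cluster call."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    request = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
    }
    if number_of_nodes == 1:
        request["ClusterType"] = "single-node"
    else:
        # NumberOfNodes is required when ClusterType is "multi-node".
        request["ClusterType"] = "multi-node"
        request["NumberOfNodes"] = number_of_nodes
    return request


request = build_create_cluster_request(
    cluster_id="example-cluster",        # hypothetical name
    node_type="dc2.large",               # assumption: a current node type
    number_of_nodes=2,
    master_username="awsuser",           # hypothetical
    master_password="Example-Passw0rd",  # hypothetical
)
# To provision for real (requires boto3 and AWS credentials):
# import boto3
# boto3.client("redshift").create_cluster(**request)
print(request["ClusterType"], request["NumberOfNodes"])  # multi-node 2
```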
Integrate Redshift with remote data centers
Resizing a cluster (diagram: clusters of 2 TB compute nodes, each with a leader node, alongside SQL clients / BI tools and Amazon S3)
1. Cluster placed in read-only mode
2. New cluster provisioned
3. Data copied across in parallel (MPP)
4. DNS switched to new cluster (read-write)
5. Source cluster de-provisioned
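The five resize steps can be modeled as a small state transition. In this sketch an `Endpoint` object stands in for the DNS name clients connect to, so switching it to the new cluster is what keeps clients pointed at the right place. The classes and field names are hypothetical, and the copy in step 3 is a plain list copy rather than Redshift's parallel node-to-node transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    nodes: int
    data: list = field(default_factory=list)
    read_only: bool = False
    provisioned: bool = True


@dataclass
class Endpoint:
    """Stand-in for the DNS name SQL clients and BI tools connect to."""
    cluster: Cluster


def resize(endpoint: Endpoint, new_node_count: int) -> list:
    """Walk through the five resize steps; returns a log of what happened."""
    log = []
    source = endpoint.cluster
    source.read_only = True                 # 1. source placed in read-only mode
    log.append("source read-only")
    target = Cluster(nodes=new_node_count)  # 2. new cluster provisioned
    log.append(f"provisioned {new_node_count}-node cluster")
    target.data = list(source.data)         # 3. data copied (in parallel, MPP, per the slide)
    log.append("data copied")
    endpoint.cluster = target               # 4. DNS switched; new cluster is read-write
    log.append("endpoint switched")
    source.provisioned = False              # 5. source de-provisioned
    log.append("source de-provisioned")
    return log


endpoint = Endpoint(Cluster(nodes=2, data=["row1", "row2"]))
old = endpoint.cluster
steps = resize(endpoint, new_node_count=4)
print(len(steps), endpoint.cluster.nodes, endpoint.cluster.read_only)  # 5 4 False
print(old.read_only, old.provisioned)                                  # True False
```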
Integrates With Existing BI Tools
BI tools connect to Amazon Redshift through standard JDBC/ODBC drivers.
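A JDBC-based BI tool typically needs a connection URL. The sketch below builds one in the `jdbc:redshift://host:port/database` form used by the Amazon Redshift JDBC driver, defaulting to Redshift's standard port 5439. The cluster host name is hypothetical, and `dev` is assumed as the database name (the default when none is specified at cluster creation). ODBC connection strings are driver-specific and are not shown.

```python
def redshift_jdbc_url(host: str, database: str, port: int = 5439) -> str:
    """Build a JDBC URL for the Amazon Redshift JDBC driver."""
    if not host or not database:
        raise ValueError("host and database are required")
    return f"jdbc:redshift://{host}:{port}/{database}"


url = redshift_jdbc_url("examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", "dev")
print(url)
# jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev
```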
Live Demonstration
Jeremy Winters, Lead Architect and Database Warehouse Designer
Getting Started
Reporting Warehouse
• Accelerated operational reporting
• Support for short-time use cases
• Data compression, index redundancy
Data flow: OLTP and ERP systems → RDBMS → Redshift → Reporting and BI
Data Integration Partners*
On-Premises Integration
Data flow: OLTP and ERP systems → RDBMS → Redshift → Reporting and BI
* as of 3/14/2013
Live Archive for (Structured) Big Data
• Direct integration with the COPY command
• High-velocity data ages into Redshift
• Low-cost, high-scale option for new apps
Data flow: OLTP web apps → DynamoDB → Redshift → Reporting and BI
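The COPY integration above is what moves aging DynamoDB data into Redshift. The sketch below generates a COPY statement using the `dynamodb://` source form, `IAM_ROLE` authorization, and `READRATIO` (the percentage of the DynamoDB table's provisioned read throughput the load may use, required for DynamoDB sources). The table names and role ARN are hypothetical, and the quote check is a minimal guard, not full SQL escaping.

```python
def dynamodb_copy_sql(target_table: str, dynamodb_table: str,
                      iam_role_arn: str, read_ratio: int) -> str:
    """Build a COPY statement that loads a DynamoDB table into Redshift."""
    for value in (target_table, dynamodb_table, iam_role_arn):
        if "'" in value or ";" in value:
            raise ValueError(f"unexpected character in {value!r}")
    if read_ratio <= 0:
        raise ValueError("read_ratio must be a positive percentage")
    # READRATIO: percent of the DynamoDB table's provisioned read capacity to use.
    return (
        f"COPY {target_table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' READRATIO {read_ratio};"
    )


print(dynamodb_copy_sql("events_archive", "Events",
                        "arn:aws:iam::123456789012:role/ExampleRedshiftRole", 50))
```

A modest `READRATIO` leaves read capacity for the web apps still serving live traffic from the same DynamoDB table.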
Cloud ETL for Big Data
• Maintain online SQL access to historical logs
• Transformation and enrichment with EMR
• Longer history ensures better insight
Data flow: S3 → Elastic MapReduce → Redshift → Reporting and BI
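The transformation-and-enrichment step in this pipeline can be pictured as a per-line function. The sketch below parses a hypothetical space-separated log format, adds a region looked up from a hypothetical IP-prefix table, and emits pipe-delimited rows. It runs in a single process; on EMR the same logic would be spread across the cluster. Rows like these, written back to S3, could then be loaded with COPY using `DELIMITER '|'`.

```python
# Hypothetical enrichment table: IP prefix -> region label.
REGION_BY_PREFIX = {"203.0.113.": "region-a", "198.51.100.": "region-b"}


def transform_log_line(line: str) -> str:
    """Parse 'timestamp ip method path status' and append a region, pipe-delimited."""
    timestamp, ip, method, path, status = line.split()
    region = next((r for prefix, r in REGION_BY_PREFIX.items() if ip.startswith(prefix)),
                  "unknown")
    return "|".join([timestamp, ip, method, path, status, region])


def transform_logs(lines):
    """Transform every non-blank line."""
    return [transform_log_line(line) for line in lines if line.strip()]


raw = [
    "2013-03-14T10:00:00Z 203.0.113.5 GET /index.html 200",
    "",
    "2013-03-14T10:00:01Z 192.0.2.9 GET /missing 404",
]
for row in transform_logs(raw):
    print(row)
# 2013-03-14T10:00:00Z|203.0.113.5|GET|/index.html|200|region-a
# 2013-03-14T10:00:01Z|192.0.2.9|GET|/missing|404|unknown
```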
Redshift
• Fast
• Low cost: less than $1/hour to get started; less than $1K/TB to run Redshift for a year
• Easy to get started
Please visit: http://aws.amazon.com/redshift/
“up to 50 times faster than our current OLAP solution” “exponential gains in performance”
Questions?
http://aws.amazon.com/resources/databaseservices/webinars