AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

44
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Vidhya Srinivasan, General Manager, AWS November 30 th , 2016 What’s New with Amazon Redshift BDA304

Transcript of AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Page 1: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Vidhya Srinivasan, General Manager, AWS

November 30th, 2016

What’s New with Amazon Redshift

“It’s our biggest driver of growth in our biggest markets, and is a feature of the

company” …on Data Mining in Redshift– Chris Lambert, Lyft CTO

“The doors were blown wide open to create custom dashboards for anyone to

instantly go in and see and assess what is going in our ad delivery landscape,

something we have never been able to do until now.”– Bryan Blair, Vevo’s VP of Ad Operations

“Analytical queries are 10 times faster in Amazon Redshift than they

were with our previous data warehouse.”– Yuki Moritani, NTT Docomo Innovation Manager

“We have several petabytes of data and use a massive Redshift

cluster. Our data science team can get to the data faster and then

analyze that data to find new ways to reduce costs, market

products, and enable new business.”– Yuki Moritani, NTT Docomo Innovation Manager

“We saw a 2x performance improvement on a wide variety of

workloads. The more complex the queries, the higher the

performance improvement..”- Naeem Ali, Director of Software Development, Data

Science at Cablevision (Optimum)

“Over the last few years, we’ve tried all kinds of databases in search of more

speed, including $15k of custom hardware. Of everything we’ve tried,

Amazon Redshift won out each time.”– Periscope Data, Analyst’s Guide to Redshift

“We took Amazon Redshift for a test run the moment it was

released. It’s fast. It’s easy. Did I mention it’s ridiculously fast?

We’re using it to provide our analysts an alternative to Hadoop.”– Justin Yan, Data Scientist at Yelp

“The move to Redshift also significantly improved dashboard query

performance… Redshift performed ~200% faster than the

traditional SQL Server we had been using in the past.”

- Dean Donovan, Product Development at DiamondStream

“…[Redshift] performance has blown away everyone here (we

generally see 50-100x speedup over Hive)”

- Jie Li Data Infrastructure at Pinterest

“450,000 online queries 98 percent faster than previous traditional data

center, while reducing infrastructure costs by 80 percent.”

- John O’Donovan, CTO, Financial Times

“We needed to load six months' worth of data, about 10 TB of data, for a

campaign. That type of load would have taken about 20 days with our previous

solution. By using Amazon Redshift, it only took six hours to load the data.”

- Zhong Hong, VP of Infrastructure, Vivaki (Publicis Groupe)

“We regularly process multibillion row datasets and we do that in a

matter of hours. We are heading to up to 10 times more data volumes in

the next couple of years, easily.”

- Bob Harris, CTO, Channel 4

“On our previous big data warehouse system, it took around 45

minutes to run a query against a year of data, but that number went

down to just 25 seconds using Amazon Redshift”

- Kishore Raja Director of Strategic Programs and R&D, Boingo Wireless

“Most competing data warehousing solutions would have cost us up

to $1 million a year. By contrast, Amazon Redshift costs us just

$100,000 all-in, representing a total cost savings of around 90%”

- Joel Cumming, Head of Data, Kik Interactive

“Annual costs of Redshift are equivalent to just the annual

maintenance of some of the cheaper on-premises options for

data warehouses..”

- Kevin Diamond, CTO, HauteLook (Nordstrom)

“Our data volume keeps growing, and we can support that

growth because Amazon Redshift scales so well.. We wouldn’t

have that capability using the supporting on-premises hardware in

our previous solution.”

- Ajit Zadgaonkar, Director of Ops. and Infrastructure, Edmunds

“With Amazon Redshift and Tableau, anyone in the company can set up

any queries they like - from how users are reacting to a feature, to growth by

demographic or geography, to the impact sales efforts had in different areas”

- Jon Hoffman, Head of Engineering, Foursquare

BDA304

Page 2: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

AWS Big Data Portfolio

Collect Store Analyze

Amazon Kinesis Firehose

AWS Direct Connect

Amazon Snowball

Amazon Kinesis Analytics

Amazon KinesisStreams

Amazon S3 Amazon Glacier

Amazon CloudSearch

Amazon RDS, Amazon Aurora

Amazon Dynamo DB

Amazon Elasticsearch

Service

Amazon EMR Amazon EC2

Amazon

Redshift

Amazon Machine

Learning

Amazon

QuickSight

AWS Data PipelineAWS Database Migration Service

Page 3: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Legacy architectural models lead to dark data

0

200

400

600

800

1000

1200

Enterprise Data Data in Warehouse

Very Expensive

complicated

Slow at scale

Page 4: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift

shiftFast, simple, petabyte-scale data warehousing for $1,000/TB/Year

Page 5: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift is fast

“Did I mention that it’s ridiculously fast? We’re using

it to provide our analysts with an alternative to Hadoop”

“After investigating Redshift, Snowflake, and

BigQuery, we found that Redshift offers top-of-the-

line performance at best-in-market price points”

“…[Redshift] performance has blown away everyone

here. We generally see 50-100X speedup over Hive”

“We regularly process multibillion row datasets

and we do that in a matter of hours. We are heading

to up to 10 times more data volumes in the next couple

of years, easily”

“We saw a 2X performance improvement on a wide

variety of workloads. The more complex the queries,

the higher the performance improvement”

“On our previous big data warehouse system, it took

around 45 minutes to run a query against a year of

data, but that number went down to just 25 seconds

using Amazon Redshift”

Page 6: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift is easy to use

“With Amazon Redshift and Tableau, anyone in the

company can set up any queries they like - from how

users are reacting to a feature, to growth by demographic or

geography, to the impact sales efforts had in different areas”

“The doors were blown wide open to create custom

dashboards for anyone to instantly go in and see and

assess what is going in our ad delivery landscape,

something we have never been able to do until now.”

Provides an easy-to-use mechanism for querying data with

quick and uniform response times that analysts can use to

run research projects and perform in-depth analysis…We don’t

have to pre-allocate resources and can easily scale up to meet

demand and then scale down for efficiency”

Page 7: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift is inexpensive

“450,000 online queries 98 percent faster than previous

traditional data center, while reducing infrastructure costs by

80 percent.”

“Annual costs of Redshift are equivalent to just the annual

maintenance of some of the cheaper on-premises options

for data warehouses..”

“Most competing data warehousing solutions would have cost

us up to $1 million a year. By contrast, Amazon Redshift costs

us just $100,000 all-in, representing a total cost savings of

around 90%”

Page 8: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Selected Amazon Redshift customers

Page 9: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Traditional Data Warehousing

Business

Reporting

Complex pipelines and

queriesSecure and

Compliant

Simple Migration – Point & Click using AWS Database Migration Service

Secure & Compliant – End-to-End Encryption. SOC 1/2/3, PCI-DSS, HIPAA and FedRAMP compliant

Large Ecosystem – Variety of cloud and on-premises BI and ETL tools

Japanese Mobile Phone

Provider

World’s Largest Children’s

Book Publisher

Powering 100 marketplaces in

50 countries

Bulk Loads and

Updates

Page 10: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Log Analysis

Log & Machine

IOT Data

Clickstream

Events Data

Time-Series

Data

Inexpensive – Analyze large volumes of data cost-effectively

Fast – Massively Parallel Processing (MPP) and columnar architecture for fast queries and parallel loads

Near real-time – Micro-batch loading and Amazon Kinesis Firehose for near-real time analytics

Interactive data analysis and

recommendation engine

Ride analytics for pricing and

product development

Ad prediction and on-demand

analytics

Page 11: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Business Applications

Multi-Tenant BI

Applications

Back-end

services

Analytics as a

Service

Fully Managed – Provisioning, backups, upgrades, security, compression all come built-in so you can focus on

your business applications

Simple Chargeback – Pay as you go, add clusters as needed. A few big common clusters, several data marts

Service Oriented Architecture – Integrated with other AWS services. Easy to plug into your pipeline

Infosys Information

Platform (IIP)

Analytics-as-a-

Service

Product and

Consumer Analytics

Page 12: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Thousands of companies run mission-critical

workloads on Amazon Redshift

Page 13: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Redshift is used for mission-critical workloads

Financial and

management reporting

Payments to suppliers

and billing workflows

Web/Mobile clickstream

and event analysis

Recommendation and

predictive analytics

Page 14: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift is available everywhere AWS is

Dublin

Frankfurt

London

Seoul

Sydney

Tokyo

Singapore

Beijing

Mumbai

Sao Paulo

US East - Virginia

US West - Oregon

US West – Northern California

GovCloud

Columbus Ohio

Montreal

Currently Available

Coming soon

Page 15: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Redshift is fast and has gotten faster...

5X Query throughput improvement this year

Memory allocation (launched)

Improved commit and I/O logic (launched)

Queue hopping (launched)

Query monitoring rules (coming soon)

Power start (coming soon)

Short query bias (coming soon)

10X Vacuuming performance improvement

Ensures data is sorted for efficient and fast I/O

Reclaims space from deleted rows

Enhanced vacuum performance leads to better system throughput

Fast

Efficient

Page 16: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift Cluster

The life of a query

BI tools

SQL clients

Analytics tools

Client

Leader node

Compute node

Compute node

Compute node

Queue 1

Queue 2

1

4

32

Page 17: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift Workload Management

Waiting

Coming soon: Short query bias

BI tools

SQL clients

Analytics tools

Client

Running

Queue 1

Queue 2

4 Slots

2 Slots

Short queries go to

the head of the

queue

1

1

Page 18: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift Cluster

Coming soon: Power start

BI tools

SQL clients

Analytics tools

Client

Leader node

Compute node

Compute node

Compute node

All queries receive a

power start. Shorter

queries benefit the

most

2

2

2

2

Page 19: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift Workload Management

Waiting

Queue hopping

BI tools

SQL clients

Analytics tools

Client

Running

Queue 1

Queue 2

4 Slots

2 Slots

Release slot if a query

exceeds timeout

3

3

Page 20: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Coming soon: Query monitoring rules

• Allows automatic handling of runaway (poorly written) queries

• Metrics with operators and values (e.g. query_cpu_time > 1000) create a predicate.

• Multiple predicates can be AND-ed together to create a rule.

• Multiple rules can be defined for a queue in WLM. These rules are OR-ed together.

If { rule } then [action]

{ rule : metric operator value } eg: rows_scanned > 100000

• Metric : cpu_time, query_blocks_read, rows scanned, query

execution time, cpu & io skew per slice, join_row_count, etc.

• Operator : <, >, ==

• Value : integer

[action] : hop, log, abort

Page 21: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Coming soon: Query monitoring rules

Monitor and control

cluster resources

consumed by a query

Get notified, abort and

reprioritize long-

running / bad queries

Pre-defined templates

for common use

cases

Page 22: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift is easy to use

Provisioning in

minutes

Automatic Patching SQL - Data loading

Backups built-in Security is built-in Compression is built-in

Page 23: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

And is getting easier…

Automatic and incremental background VACUUM (coming soon)

Reclaims space and sorts when Redshift clusters are idle

VACUUM is initiated when performance can be enhanced

Improves ETL and query performance

Automatic data compression for CTAS

CREATE TABLE AS (CTAS) command creates a new table

The new table leverages compression automatically

Automatic compression for new tables (coming soon)

All newly created tables will leverage default encoding

Provides higher compression rates

010101010101

Page 24: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

And is getting easier…

Enhanced localization

Support for Timestamp with Time zone : New TIMESTAMPTZ data type to input complete timestamp values that

include the date, the time of day, and a time zone. Eg: 30 Nov 07:37:16 2016 PST

Support for Multi-byte (UTF-8) characters for tables, columns, and other database object names

Resource management

Connection limits: You can now set a limit on the number of database connections a user is permitted to have open

concurrently

Query Monitoring rules (coming soon)

Table level restore

Restore tables that you might have dropped accidentally

Reconcile data from an older table that you might have updated or deleted unintentionally

Page 25: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Coming soon: Ingestion improvements

Faster migrations

Schema Conversion Tool

Already supports schema conversion from Oracle, Teradata, Netezza and Greenplum

Coming soon: Support for Vertica and SQL Server

AWS Database Migration Service

Supports data migration with CDC from Oracle, SQL Server, PostgreSQL, MySQL and Aurora

Will add more DW engines as sources

Data Loading

Enhanced JSON & AVRO ingestion performance

Page 26: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Durability and availability

Automated Backups Cross-region backups Cluster-level mirroring

Streaming restore Monitoring Ticketing

Page 27: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Compute node

Durability

1. Multiple copies of a block within the cluster

2. Continuous and incremental backups to Amazon

S3 (automatic)

3. Continuous and incremental backups across

regions (checkbox enable)

4. Streaming restore – database is available for

query once metadata restored. Data streams

down in background or when queried. Enables

cross-AZ restores. Amazon S3

Region 1

Region 2

Compute node Compute node

Amazon S3

Page 28: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

1. Disk failures

2. Availability Zone failures

3. Region level failures

4. Node failures

Fault tolerance

Compute node

Amazon S3

Region 1

Region 2

Compute node Compute node

Amazon S3

Page 29: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Node fault tolerance

Leader nodeClient

Data-path monitoring agents

Compute node

Compute node

Compute node

Node level monitoring

can detect SW/HW

issues and take action

Page 30: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Node fault tolerance

Leader nodeClient

Data-path monitoring agents

Compute node

Compute node

Compute node

Failure is detected at one

of the compute nodes

Page 31: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Node fault tolerance

Leader nodeClient

Data-path monitoring agents

Compute node

Compute node

Compute node

Redshift parks the

connections

Next, the node is

replaced

Page 32: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Node fault tolerance

Leader nodeClient

Data-path monitoring agents

Compute node

Compute node

Compute node

Queries are re-submitted

(coming soon)

Page 33: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Node fault tolerance

Leader nodeClient

Data-path monitoring agents

Compute node

Compute node

Compute node

Cluster-level monitoring agents

Additional monitoring

layer for the leader

node and network

Page 34: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift is secure

End-to-End

data encryption

Alerts and notifications Amazon VPC

AWS KMS and HSM Audit logging Certifications & Compliance

Page 35: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Leader node

Compute node

10 GigE

(HPC)

Ingestion

Backup

Restore

Customer VPC

Internal VPC

BI tools SQL clientsAnalytics tools

Compute node Compute node

JDBC/ODBC

Load encrypted from S3

SSL to secure data in transit

ECDHE perfect forward secrecy

Amazon VPC for network isolation

Encryption to secure data at rest

All blocks on disks and in S3 encrypted

Block key, cluster key, master key (AES-256)

On-premises HSM & AWS CloudHSM support

Audit logging and AWS CloudTrail integration

SOC 1/2/3, PCI-DSS, FedRAMP, BAA

Amazon Redshift is secure

Amazon S3 Amazon EMR Amazon Dynamo DB SSH

Page 36: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

We have added even more security features...

IAM support for data LOAD/UNLOAD

Use IAM roles to perform LOAD/UNLOAD operations

Cluster is given IAM credentials

Cluster can be granted access to specific S3 buckets

Simplify credentials management

More secure

Enhanced VPC Routing

All query traffic only flows through customer VPC

Strict data traffic management between Amazon Redshift and other data sourcesSecure

Simple

Page 37: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

BI tools SQL clientsAnalytics tools

Client AWS

Redshift

ADFS

Corporate

Directory IAM

User groups Individual user

Coming soon: IAM Authentication for DB users

Identity providers

IdP registers with

IAM

1

1

Page 38: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

BI tools SQL clientsAnalytics tools

Client AWS

Redshift

ADFS

Corporate

Directory IAM

User groups Individual user

Single Sign-On

Coming soon: IAM Authentication for DB users

Identity providers

End-users can

leverage SSO

2

2

Page 39: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

BI tools SQL clientsAnalytics tools

Client AWS

Redshift

ADFS

Corporate

Active Directory IAM

Amazon Redshift

ODBC/JDBC

User groups Individual user

Single Sign-On

Coming soon: IAM Authentication for DB users

Identity providers

New Redshift

ODBC/JDBC drivers.

Grab the ticket

(userid) and get a

SAML assertion

3

3

Page 40: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

BI tools SQL clientsAnalytics tools

Client AWS

Redshift

ADFS

Corporate

Active Directory IAM

Amazon Redshift

ODBC/JDBC

User groups Individual user

Coming soon: IAM Authentication for DB users

Identity providers

Send SAML

assertion to IAM.

Obtain IAM role

credentials

4

4

Page 41: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

BI tools SQL clientsAnalytics tools

Client AWS

Redshift

ADFS

Corporate

Active Directory IAM

Amazon Redshift

ODBC/JDBC

User groups Individual user

Coming soon: IAM Authentication for DB users

Identity providers

Use IAM role to

obtain temporary

cluster credentials.

Use temporary

credentials to log

into the database

5

5

Page 42: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Amazon Redshift

shiftContact us at [email protected]

Available now

Queue hopping

10X VACUUM performance improvement

Node fault tolerance

Enhanced VPC routing

IAM support for Copy/Unload

Auto compression for CTAS

TimestampTZ datatype

Coming soon

Query Monitoring rules

Automatic and incremental background VACUUM

Short query bias

Power start

IAM Authentication for DB users

Auto compression for new tables

Enhanced JSON & AVRO ingestion performance

Page 43: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Thank you!

Page 44: AWS re:Invent 2016: What’s New with Amazon Redshift (BDA304)

Remember to complete

your evaluations!