Amazon Redshift & Amazon DynamoDB


© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Amazon Redshift & Amazon DynamoDB

Michael Hanisch, Amazon Web Services
Erez Hadas-Sonnenschein, clipkit GmbH
Witali Stohler, clipkit GmbH

2014-05-15

Amazon Redshift & Amazon DynamoDB

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year


A fully managed data warehouse service
•  Massively parallel relational data warehouse
•  Takes care of cluster management and distribution of your data
•  Columnar data store with variable compression
•  Optimized for complex queries across many large tables
•  Use standard SQL & standard BI tools

Amazon DynamoDB

A fully managed fast key-value store
•  Fast, predictable performance
•  Simple and fast to deploy
•  Easy to scale as you go, up to millions of IOPS
•  Pay only for what you use: read/write IOPS + storage
•  Data is automatically replicated across data centers

Amazon DynamoDB

Amazon DynamoDB vs. Amazon Redshift

Amazon DynamoDB:
•  Fast insert & update
•  Limited query capability (single table only)
•  NoSQL database

Amazon Redshift:
•  Fast queries
•  Flexible queries (JOINs, aggregation functions, …)
•  SQL

Queries in Amazon DynamoDB

Queries in Amazon DynamoDB
•  Query or BatchQuery APIs retrieve items
•  Scan & filter to comb through a whole table
•  You have to join tables in your own code!

Amazon DynamoDB
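Since Query and Scan operate on a single table, the "join in your own code" bullet above can be sketched as a small client-side hash join. A minimal sketch, assuming hypothetical item shapes (`videos`, `publishers`); in practice the two lists would come from Query/Scan responses via an AWS SDK.

```python
# Minimal sketch of a client-side "join" between two DynamoDB tables.
# The item shapes below are hypothetical stand-ins for real query results.

def hash_join(left, right, key):
    """Join two lists of item dicts on a shared attribute (like a SQL inner join)."""
    index = {}
    for item in right:
        index.setdefault(item[key], []).append(item)
    joined = []
    for item in left:
        for match in index.get(item[key], []):
            joined.append({**match, **item})  # left side wins on name clashes
    return joined

videos = [
    {"video_id": "v1", "publisher_id": "p1", "title": "Intro"},
    {"video_id": "v2", "publisher_id": "p2", "title": "Review"},
]
publishers = [
    {"publisher_id": "p1", "name": "Acme News"},
    {"publisher_id": "p2", "name": "Sports Hub"},
]

rows = hash_join(videos, publishers, "publisher_id")
```

This is exactly the work a relational engine would do for you, which is why the later slides move the analytics into Redshift instead.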

Queries in Amazon DynamoDB (2)
•  Apache Hive on Amazon EMR can access data in DynamoDB
•  Run HiveQL queries for bulk processing
•  Can integrate data in HDFS, Amazon S3, …

Amazon DynamoDB HiveQL queries on Amazon EMR

Queries in Amazon DynamoDB (3)
•  Import data into Amazon Redshift
•  Use SQL queries, BI tools, etc.
•  Powerful analytics and aggregation functions

Amazon Redshift Amazon DynamoDB

Importing Data into Amazon Redshift

TMTOWTDI (There's More Than One Way To Do It)

Query & Insert

Amazon Redshift

Amazon DynamoDB

#1 Query / BatchQuery

#2 Retrieve Items

#3 INSERT INTO … (…)

Query & Insert

The Good
•  Full control over queries
•  Decide which items you want to move to Redshift
•  Process data on the way

The Bad
•  Slow
•  Inefficient on the Redshift side of things
•  Does not scale well
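The Query & Insert pattern above can be sketched as follows: items retrieved from DynamoDB are turned into multi-row INSERT statements for Redshift. The table and column names are hypothetical, and actually executing the SQL (over a PostgreSQL driver) is omitted; the point is the batching, since single-row INSERTs are the main reason this approach is slow on the Redshift side.

```python
# Sketch: turn DynamoDB items into batched INSERT statements for Redshift.
# Table name "metrics" and its columns are hypothetical examples.

def _literal(value):
    """Render a Python value as a SQL literal (NULL, number, or quoted string)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def build_inserts(table, columns, items, batch_size=2):
    """Group items into multi-row INSERTs instead of one statement per item."""
    statements = []
    for i in range(0, len(items), batch_size):
        rows = []
        for item in items[i:i + batch_size]:
            vals = ", ".join(_literal(item.get(c)) for c in columns)
            rows.append(f"({vals})")
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(rows)};"
        )
    return statements

items = [
    {"video_id": "v1", "plays": 10},
    {"video_id": "v2", "plays": 25},
    {"video_id": "v3", "plays": 7},
]
stmts = build_inserts("metrics", ["video_id", "plays"], items)
```

Even batched, every row still travels through the leader node, which is why COPY (next slides) remains far more efficient.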

The COPY Command

Amazon Redshift

Amazon DynamoDB

#1 COPY FROM …

#2 Parallel Scans (politely ask for the whole table)

#3 Return Items

The COPY Command
•  COPY a single table at a time
•  From one Amazon DynamoDB table into one Amazon Redshift table
•  Fast – executed in parallel on all data nodes in the Amazon Redshift cluster
•  Can be limited to use a certain percentage of provisioned throughput on the DynamoDB table

The COPY Command

COPY <table_name> (col1, col2, …)
FROM 'dynamodb://<table_name2>'
CREDENTIALS 'aws_access_key_id=…;aws_secret_access_key=…'
READRATIO 10  -- use 10% of available read capacity
COMPROWS 0    -- how many rows to read to determine compression
[…other options…]

The COPY Command
•  Attributes are mapped to columns by name
•  Case of column names is ignored
•  Attributes that do not map are ignored
•  Missing attributes are stored as NULL or empty values
•  Only works for STRING and NUMBER attributes
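The mapping rules above can be simulated in a few lines: match names case-insensitively, drop attributes with no matching column, and fill missing attributes with NULL (None here). The column names are hypothetical; this is a sketch of the rules, not Redshift's implementation.

```python
# Sketch of COPY's attribute-to-column mapping rules, simulated in Python.

def map_item(item, columns):
    """Map a DynamoDB item onto Redshift columns by case-insensitive name."""
    lowered = {k.lower(): v for k, v in item.items()}
    # Unmapped attributes are dropped; missing ones become None (NULL).
    return {col: lowered.get(col.lower()) for col in columns}

item = {"VideoId": "v1", "Plays": 10, "extra_attr": "ignored"}
row = map_item(item, ["videoid", "plays", "country"])
```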

The COPY Command

The Good
•  Easy to use
•  Fast
•  Efficient use of resources
•  Scales linearly with cluster size
•  Only uses a certain percentage of read throughput

The Bad
•  Whole tables only
•  No processing in between
•  Can only copy from DynamoDB in the same region
•  Only works with STRING and NUMBER types

Query & Insert at Scale

Amazon Redshift

Amazon DynamoDB

#1 Query / BatchQuery

#2 Retrieve Items

#3 INSERT INTO … (…) in parallel

Amazon EMR

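The parallel retrieval behind this slide can be sketched with a segment fan-out: each worker reads one segment of the table concurrently, mirroring DynamoDB's Scan with its Segment/TotalSegments parameters. `fetch_segment` is a hypothetical stand-in that partitions a local list; real code would issue the Scan API call from each worker (e.g. from EMR tasks).

```python
# Sketch of parallel segmented scans, as used by "Query & Insert at Scale".

from concurrent.futures import ThreadPoolExecutor

ITEMS = [{"id": i} for i in range(10)]  # stand-in for a DynamoDB table

def fetch_segment(segment, total_segments):
    """Hypothetical scan of one segment (real code: Scan with
    Segment=segment, TotalSegments=total_segments)."""
    return [item for item in ITEMS if item["id"] % total_segments == segment]

def parallel_scan(total_segments=4):
    """Fan out one worker per segment and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(fetch_segment, s, total_segments)
                   for s in range(total_segments)]
        merged = []
        for f in futures:
            merged.extend(f.result())
    return merged

scanned = parallel_scan()
```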

Query & Import using Amazon EMR

Amazon Redshift

Amazon DynamoDB

#1 Query / BatchQuery

#2 Retrieve Items

in parallel

Amazon S3

#3 Export to file(s) on S3

#4 COPY … FROM s3://

#5 Retrieve files

Amazon EMR

Query & Import using Amazon EMR

Amazon Redshift

Amazon DynamoDB

#1 Query / BatchQuery

#2 Retrieve Items

in parallel

#3 COPY … FROM emr://

#4 Retrieve files from HDFS

Query & Import using Amazon EMR

The Good
•  Decide which items you want to move to Redshift
•  Full control over queries
•  Process data on the way
•  Scales well
•  Integrates with other data sources easily

The Bad
•  Additional complexity
•  Additional cost (for EMR)
•  Slower than direct COPY from Amazon DynamoDB
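The export step of the S3 route can be sketched as serializing the retrieved items into a delimited file that Redshift's COPY … FROM 's3://…' can load. The upload itself and the COPY statement (with a matching DELIMITER option) are left out; the column order is a hypothetical choice that must match the target table.

```python
# Sketch of step #3 of the EMR/S3 route: items -> pipe-delimited staging file.

def to_copy_file(items, columns, delimiter="|"):
    """Serialize items into delimited lines; missing values become empty fields."""
    lines = []
    for item in items:
        fields = ["" if item.get(c) is None else str(item[c]) for c in columns]
        lines.append(delimiter.join(fields))
    return "\n".join(lines) + "\n"

items = [
    {"video_id": "v1", "plays": 10, "country": "DE"},
    {"video_id": "v2", "plays": 25, "country": None},
]
payload = to_copy_file(items, ["video_id", "plays", "country"])
```

Because the processing happens before the file is written, this route supports the filtering and transformation that the direct COPY path cannot.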

Please welcome: Erez Hadas-Sonnenschein, Sr. Product Manager; Witali Stohler, Data Warehouse & BI Specialist

clipkit GmbH

Video Syndication – The Possibilities

News Sports Cars/motor Business/finances Music Gaming Cinema Cooking/food Lifestyle/fashion Traveling Computer/mobile Fitness/wellness Knowledge/hobby Entertainment

Content – Partner Overview

clipkit Player – Analytics (Metrics)

Full Screen

Category

Playlist Pos.

Play / Pause

Progress Pos. Mute / Unmute Volume

clipkit Player – Analytics (Metrics) Location (Country, City) Language Browser Operating System Video Id Publisher URL Etc…

First Implementation (Expensive and Slow)

•  Designed in the early days
•  Not built for this amount of data
•  Slow copy process from S3 to the DB (old PHP application architecture)
•  Fixed EC2 pricing (expensive to support peak hours)
•  PostgreSQL scalability limitations
•  Sometimes the copy process was so slow that the delay was ~3 days

Analytics / Metrics (Requests Graph)

•  ~ 6,000,000 New Entries per day •  ~ 1,000 Requests per second (Peak Hours) •  ~ 25 Requests per second (Off-peak Hours)

4000% Requests Growth during the day.

Analytics / Metrics (Numbers)

Second Implementation (Expensive and Slow)

•  Inserting only into one (big) table
•  The COPY command only works for whole tables
•  The minimum delay was one day
•  Our solution was to increase the provisioned throughput, which was expensive

NO REAL-TIME DATA

Third Implementation (Cheap and Fast)

Third Implementation – DynamoDB

•  Java SDK: AmazonDynamoDBAsyncClient ("fire and go")
•  Easy to create and delete tables
•  Write latency ~5 ms
•  Throughput auto-scales with Dynamic DynamoDB
•  One table per day
•  Continuous iteration and copy to Redshift
•  We pay only for what we use
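The one-table-per-day rotation can be sketched as a naming scheme: derive today's table name for writes, and the name of an older table that can be dropped once its data has been copied to Redshift. The "metrics_" prefix and the two-day retention window are hypothetical; the deck only states that a day's table is deleted after being copied.

```python
# Sketch of a daily DynamoDB table rotation scheme (names are hypothetical).

from datetime import date, timedelta

def table_for(day):
    """Name of the table receiving writes on the given day."""
    return f"metrics_{day:%Y_%m_%d}"

def table_to_drop(today, retention_days=2):
    """A table is safe to delete once it is older than the copy window."""
    return table_for(today - timedelta(days=retention_days))

today = date(2014, 5, 15)
current = table_for(today)       # table receiving today's writes
obsolete = table_to_drop(today)  # already copied to Redshift, delete it
```

Dropping a whole table is a single cheap operation, which avoids both per-item deletes and paying for storage of data that already lives in Redshift.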

Third Implementation – Redshift

•  Standard PostgreSQL JDBC
•  Fully managed by Amazon
•  Automated backups and fast restores
•  ~7,000 inserted items per second
•  Queries over > 1 billion entries in less than 2 seconds
•  Real-time data availability (maximum 1 minute delay)

Third Implementation – Conclusions

•  Java web application
–  Auto-scales (off-peak: 1 small instance)

•  DynamoDB
–  One table per day (deleted after being copied)
–  Auto-scales
–  ~5 ms PutItem latency

•  Redshift
–  Inserts ~7,000 items per second
–  Fully managed

Thank You!