2 one spot redshift bigdatacamp 1.02

Post on 13-Jul-2015

Copyright © 2013 OneSpot, Proprietary & Confidential 1

Amazon Redshift: How we managed 300 billion rows with no DBA

Matt Cohen

Founder & President

matt@onespot.com

December 10th, 2013

What is OneSpot?

• OneSpot is a content advertising platform that distributes content as ads that people want to click on.

– Fortune 2000 clients

– Real-time ad exchange bidding

– Adaptive machine learning

– Seed funded until $5.3M Series A last month

• Big data, big analysis


What is Redshift?

1. When light from a receding object appears shifted to the red end of the spectrum

– A consequence of the expanding universe.

2. A cheap, fast, Petabyte-scale, managed SQL data warehouse service from Amazon Web Services

– A consequence of the expanding cloud ecosystem


Why Redshift?

• Cheap

• Fast

• Petabyte-scale

• Managed Service

• SQL

• Data Warehouse

• From AWS


SQL Data Warehouse

• Based on the commercial ParAccel database

– Which is based on Postgres

• Standards-based tools and knowledge

• Built for data warehousing

– Column-oriented

– Cluster architecture

– Read optimized

– No relational integrity

– Almost no SQL extensions



SQL Data Warehouse

• Column-oriented

• 11 different compression techniques
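To make the column-oriented idea concrete, here is a minimal sketch in Python. It is not how Redshift is implemented: the sample rows and both helper functions are hypothetical, and run-length encoding is used only as one familiar example of a columnar compression technique (the slide does not name the 11 techniques). The point is that an aggregate reads a single column, and that repeated values sitting next to each other in a column compress well.

```python
from itertools import groupby

# Hypothetical sample rows: (day, country, clicks).
rows = [
    ("2013-12-01", "US", 3),
    ("2013-12-01", "US", 5),
    ("2013-12-01", "CA", 2),
    ("2013-12-02", "US", 7),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


days, countries, clicks = to_columns(rows)

# An aggregate touches only the one column it needs.
total_clicks = sum(clicks)

# Repeated neighbouring values shrink to a few pairs.
encoded_days = run_length_encode(days)

print(total_clicks)
print(encoded_days)
```

In a row store the same sum would have to read every field of every row, which is why column orientation suits read-heavy analytic queries.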


SQL Data Warehouse

• Cluster architecture


SQL Data Warehouse

• Read optimized

– Large block size (1MB)

– Data replication

• 2x live, 1x S3

• No relational integrity

– No indexes: sort and distribution keys

• Almost no SQL extensions
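The role of sort and distribution keys in place of indexes can be sketched as a toy simulation. Everything here is an assumption made for illustration: the slice count, the CRC32 hash, the block size, and all function names are hypothetical and do not describe Redshift internals. The sketch shows two ideas: a distribution key decides which slice of the cluster stores a row, and a sort key lets a range scan skip blocks whose minimum and maximum values fall outside the range.

```python
import zlib

NUM_SLICES = 4  # hypothetical cluster size


def slice_for(dist_key_value, num_slices=NUM_SLICES):
    """Pick a slice by hashing the distribution key (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(dist_key_value).encode("utf-8")) % num_slices


def distribute(rows, dist_key):
    """Assign every row to a slice based on its distribution key."""
    slices = {i: [] for i in range(NUM_SLICES)}
    for row in rows:
        slices[slice_for(row[dist_key])].append(row)
    return slices


def build_blocks(rows, sort_key, block_size):
    """Sort rows, cut them into fixed-size blocks, and record min/max per block."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        blocks.append({
            "min": chunk[0][sort_key],
            "max": chunk[-1][sort_key],
            "rows": chunk,
        })
    return blocks


def scan_range(blocks, sort_key, low, high):
    """Read only the blocks whose [min, max] overlaps [low, high]."""
    touched = [b for b in blocks if b["max"] >= low and b["min"] <= high]
    hits = [r for b in touched for r in b["rows"] if low <= r[sort_key] <= high]
    return hits, len(touched)


rows = [{"user_id": i % 10, "ts": i} for i in range(100)]
slices = distribute(rows, "user_id")
blocks = build_blocks(rows, "ts", block_size=10)
hits, blocks_read = scan_range(blocks, "ts", 25, 34)

print(f"{len(hits)} rows found by reading {blocks_read} of {len(blocks)} blocks")
```

Because rows with the same `user_id` always land on the same slice, a join or aggregation on that key needs no data movement between slices in this model.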


Fast = Cheap

• Starts with 1 XL node

– 85¢ an hour ($620/month) on demand

– 50¢ an hour ($365/month) with a 1-year reservation

• Benchmarks say:

– Scales linearly

– 5-10x faster than Hadoop/Hive
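The monthly figures on this slide follow from the hourly rates. The sketch below reproduces the arithmetic, assuming 730 hours per month (8,760 hours per year divided by 12); that constant and the function name are assumptions, and the rates are the 2013 prices quoted above, not current pricing.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def monthly_cost(hourly_rate_usd, nodes=1):
    """Approximate monthly cost of running `nodes` nodes around the clock."""
    return hourly_rate_usd * HOURS_PER_MONTH * nodes


on_demand = monthly_cost(0.85)  # on-demand rate from the slide
reserved = monthly_cost(0.50)   # 1-year reserved rate from the slide
savings_pct = (1 - reserved / on_demand) * 100

print(f"on demand: ${on_demand:,.2f}/month")
print(f"reserved:  ${reserved:,.2f}/month")
print(f"reserved saves about {savings_pct:.0f}%")
```

The results land at roughly $620 and $365 per month, matching the slide.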


Petabyte scale

• Up to

– 32 XL nodes (64 Terabytes)

– 100 8XL nodes (1.6 Petabytes)
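The capacity limits above imply a fixed amount of storage per node: 64 TB across 32 XL nodes is 2 TB each, and 1.6 PB across 100 8XL nodes is 16 TB each. A small sizing sketch under that assumption follows; the per-node constants are derived from the slide's totals, the function names are hypothetical, and compression and replication overhead are ignored.

```python
import math

XL_TB_PER_NODE = 2         # implied by 64 TB / 32 nodes
EIGHT_XL_TB_PER_NODE = 16  # implied by 1.6 PB / 100 nodes


def cluster_capacity_tb(nodes, tb_per_node):
    """Total raw capacity of a cluster in terabytes."""
    return nodes * tb_per_node


def nodes_needed(data_tb, tb_per_node):
    """Smallest node count whose capacity covers `data_tb`."""
    return math.ceil(data_tb / tb_per_node)


max_xl_tb = cluster_capacity_tb(32, XL_TB_PER_NODE)
max_8xl_tb = cluster_capacity_tb(100, EIGHT_XL_TB_PER_NODE)

print(f"XL maximum:  {max_xl_tb} TB")
print(f"8XL maximum: {max_8xl_tb / 1000} PB")
print(f"10 TB needs {nodes_needed(10, XL_TB_PER_NODE)} XL nodes")
```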


Managed Service from AWS

• Managed Service

– Incredibly easy

– Nice UI

– Most SQL tools

• From AWS

– Free data transfer

– Easy load from S3

– Use AWS Data Pipeline
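Loading from S3 is done with Redshift's `COPY` command. The sketch below only builds the SQL text and does not connect to a cluster. The table name, bucket path, and role ARN are hypothetical placeholders, and the `IAM_ROLE` clause reflects current Redshift syntax rather than anything shown in the talk.

```python
def build_copy_statement(table, s3_path, iam_role_arn, delimiter="|", gzip=True):
    """Build a Redshift COPY statement that loads delimited files from S3."""
    for value in (table, s3_path, iam_role_arn, delimiter):
        if "'" in value:
            raise ValueError("single quotes are not allowed in COPY arguments")
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must start with s3://")

    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return " ".join(parts) + ";"


# All three arguments below are made-up placeholders.
statement = build_copy_statement(
    "impressions",
    "s3://example-bucket/impressions/2013-12-10/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

Pointing `COPY` at a prefix rather than a single file lets the cluster load the matching files in parallel, which is a large part of why S3 loads are described as easy.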


The TL;DR

• Pros

– Standard SQL

– Super easy

– Very fast

– Affordable

– Integrates with AWS

– No DBA

– No Sysadmin

• Cons

– Standard SQL

– Almost no SQL extensions

– Best with Star Schema

• Big joins can be slow

– No MapReduce

– Fixed columns

– Consistency

– 1.6 Petabyte limit

