Copyright © 2013 OneSpot, Proprietary & Confidential 1
Amazon Redshift: How we managed 300 billion rows with no DBA
Matt Cohen
Founder & [email protected]
December 10th, 2013
What is OneSpot?
• OneSpot is a content advertising platform that distributes content as ads that people want to click on.
– Fortune 2000 clients
– Real-time ad exchange bidding
– Adaptive machine learning
– Seed funded until $5.3M Series A last month
• Big data, big analysis
What is Redshift?
1. When light from a receding object appears
shifted to the red end of the spectrum
– A consequence of the expanding universe.
2. A cheap, fast, Petabyte-scale, managed
SQL data warehouse service from Amazon
Web Services
– A consequence of the expanding cloud ecosystem
Why Redshift?
• Cheap
• Fast
• Petabyte-scale
• Managed Service
• SQL
• Data Warehouse
• From AWS
SQL Data Warehouse
• Based on the commercial ParAccel database
– Which is based on Postgres
• Standards-based tools and knowledge
• Built for data warehousing
– Column-oriented
– Cluster architecture
– Read optimized
– No relational integrity
– Almost no SQL extensions
SQL Data Warehouse
• Column-oriented
• 11 different compression techniques
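A minimal sketch of why column-oriented storage compresses well, assuming run-length encoding as the illustrative technique (it is one of several encodings Redshift offers; the slide does not list the 11). The sample rows, `to_columns`, and `run_length_encode` are hypothetical names made up for this example, not Redshift internals.

```python
from itertools import groupby

# Hypothetical row-oriented records: (country, device, clicks)
rows = [
    ("US", "mobile", 3),
    ("US", "mobile", 5),
    ("US", "desktop", 2),
    ("CA", "desktop", 7),
    ("CA", "desktop", 1),
]


def to_columns(records):
    """Transpose row tuples into one list per column."""
    return [list(column) for column in zip(*records)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


country, device, clicks = to_columns(rows)

# Values of the same column sit next to each other, so repeats collapse.
print(run_length_encode(country))  # [('US', 3), ('CA', 2)]
print(run_length_encode(device))   # [('mobile', 2), ('desktop', 3)]
```

Stored row by row, the same repeated values would be interleaved with other fields, which is why a per-column layout tends to give the compressor more to work with.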
SQL Data Warehouse
• Cluster architecture
SQL Data Warehouse
• Read optimized
– Large block size (1MB)
– Data replication
• 2 live copies in the cluster, 1 backup copy in S3
• No relational integrity
– No indexes: sort and distribution keys
• Almost no SQL extensions
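A rough sketch of how sort and distribution keys can stand in for indexes. It assumes two mechanisms that the slides do not spell out: rows are assigned to nodes by hashing the distribution key, and each block of sorted data keeps its minimum and maximum so blocks outside a query's range can be skipped. The function names, the CRC32 hash, and the tiny block size are illustrative stand-ins, not Redshift's actual implementation.

```python
import zlib


def distribute(rows, key_index, node_count):
    """Assign each row to a node by hashing its distribution key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        key_bytes = str(row[key_index]).encode("utf-8")
        # CRC32 is a stand-in hash; equal keys always land on the same node.
        nodes[zlib.crc32(key_bytes) % node_count].append(row)
    return nodes


def build_blocks(sorted_values, block_size):
    """Split sorted values into blocks, recording each block's min and max."""
    blocks = []
    for start in range(0, len(sorted_values), block_size):
        chunk = sorted_values[start:start + block_size]
        blocks.append((chunk[0], chunk[-1], chunk))
    return blocks


def blocks_to_scan(blocks, low, high):
    """Keep only blocks whose [min, max] range overlaps [low, high]."""
    return [block for block in blocks if block[1] >= low and block[0] <= high]


# Hypothetical sort key: day numbers 1..100, ten values per block.
day_blocks = build_blocks(list(range(1, 101)), block_size=10)
hits = blocks_to_scan(day_blocks, low=25, high=34)
print(len(hits), "of", len(day_blocks), "blocks scanned")  # 2 of 10 blocks scanned
```

A range filter on the sort key touches only the overlapping blocks, and rows that share a distribution key are co-located, so a join on that key needs no cross-node shuffling in this simplified model.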
Fast = Cheap
• Starts with 1 XL node
– 85¢ an hour ($620/month) on demand
– 50¢ an hour ($365/month) with a 1-year reservation
• Benchmarks say:
– Scales linearly
– 5-10x faster than Hadoop/Hive
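The monthly figures on this slide follow from the hourly rates if a month is taken as 730 hours (8,760 hours per year divided by 12), which is an assumption here rather than something the slide states. A small sketch of that arithmetic, using the slide's 2013 prices and a hypothetical `monthly_cost` helper:

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


def monthly_cost(hourly_rate_usd, node_count=1):
    """Approximate monthly cost of running node_count nodes around the clock."""
    return hourly_rate_usd * HOURS_PER_MONTH * node_count


on_demand = monthly_cost(0.85)  # slide: 85 cents an hour, on demand
reserved = monthly_cost(0.50)   # slide: 50 cents an hour, 1-year reserved
savings = 1 - reserved / on_demand

print(f"on demand: ${on_demand:.2f}/month")
print(f"reserved:  ${reserved:.2f}/month")
print(f"reserved saves about {savings:.0%}")
```

Passing `node_count` gives a first estimate for a larger cluster, on the assumption that every node is billed at the same hourly rate.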
Petabyte scale
• Up to
– 32 XL nodes (64 Terabytes)
– 100 8XL nodes (1.6 Petabytes)
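The cluster limits above imply a per-node capacity for each node type. A short sketch of that arithmetic, assuming decimal units (1 PB = 1,000 TB); the 10 TB dataset in the sizing example is a made-up figure, not OneSpot's.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units


def per_node_tb(total_tb, node_count):
    """Storage per node implied by a cluster's total capacity."""
    return total_tb / node_count


def nodes_needed(data_tb, node_capacity_tb):
    """Smallest whole number of nodes that can hold data_tb."""
    return math.ceil(data_tb / node_capacity_tb)


xl_tb = per_node_tb(64, 32)                     # 32 XL nodes hold 64 TB
eight_xl_tb = per_node_tb(1.6 * TB_PER_PB, 100)  # 100 8XL nodes hold 1.6 PB

print(f"XL node:  {xl_tb:.0f} TB")
print(f"8XL node: {eight_xl_tb:.0f} TB")
print("XL nodes for a hypothetical 10 TB dataset:", nodes_needed(10, xl_tb))
```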
Managed Service from AWS
• Managed Service
– Incredibly easy
– Nice UI
– Most SQL tools
• From AWS
– Free data transfer
– Easy load from S3
– Use AWS Data Pipeline
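Loading from S3 is done with Redshift's `COPY` command. The sketch below only builds the statement text; the table name, bucket path, and role ARN are placeholders, and the choice of the `IAM_ROLE`, `DELIMITER`, and `GZIP` options is an assumption for illustration, since the deck does not show the load statements OneSpot actually ran.

```python
def build_copy_statement(table, s3_path, iam_role_arn, delimiter="|", gzip=True):
    """Return the text of a Redshift COPY statement for delimited files in S3.

    Values are interpolated directly, so pass trusted identifiers only.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return "\n".join(parts) + ";"


# Placeholder names, not taken from the talk.
statement = build_copy_statement(
    table="events",
    s3_path="s3://example-bucket/events/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The resulting string can be run through any SQL client connected to the cluster, which fits the deck's point that most SQL tools work with the service.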
The TL;DR
• Pros
– Standard SQL
– Super easy
– Very fast
– Affordable
– Integrates with AWS
– No DBA
– No Sysadmin
• Cons
– Standard SQL
– Almost no SQL extensions
– Best with Star Schema
• Big joins can be slow
– No MapReduce
– Fixed columns
– Consistency
– 1.6 PB limit