CS 839: Design the Next-Generation Database
Lecture 22: Snowflake
Xiangyao Yu, 4/9/2020
Announcements
Course project
• Submission deadline: Apr. 23
• Peer review: Apr. 23 – Apr. 30
• Presentation: Apr. 28 & 30
• Submission deadline: May 4
A Google Sheet will be created for presentation sign-up.
Discussion Highlights
Optimal design that combines the advantages?
• Athena with instances pre-running
• Hybrid instance store and S3; decide caching based on the workload
• High-quality code compilers
• Heterogeneous system that combines all the existing systems together
Optimization opportunities for serverless databases?
• Optimize resource sharing among users (e.g., cache, computation)
• SW/HW codesign
• Heterogeneous hardware and storage (e.g., different functions on different hardware)
• Scale computation and storage on demand
• Keep instances pre-warmed to reduce cold starts
Can cloud databases benefit from new hardware?
• Using GPUs
• SmartSSD
• RDMA and SmartNIC (e.g., shared cache in SSD, computation offloading)
• Persistent memory to improve bandwidth and aid fast restarts
Today’s Paper
The Snowflake Elastic Data Warehouse (SIGMOD 2016)
On-Premises vs. Cloud
On-premises
• Fixed and limited hardware resources
Cloud
• Virtually infinite computation and storage
• Pay-as-you-go
[Figure: a fixed set of servers (CPU, Mem, HDD) on premises vs. a seemingly unlimited pool of such servers in the cloud]
Shared Nothing – Advantages
[Figure: four shared-nothing nodes, each a VM with its own CPU, memory, and local HDD]
Scalability: horizontal scaling
• Scales well for star-schema queries
[Figure: star schema, a central fact table joined with surrounding dimension tables]
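A minimal Python sketch of why star-schema queries scale horizontally in a shared-nothing cluster. It assumes a common strategy that the slide does not spell out: the large fact table is hash-partitioned across nodes and the small dimension table is replicated to every node, so each node joins and aggregates its own partition locally and only small partial results are combined. The tables, `NUM_NODES`, and all function names are hypothetical.

```python
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size

# Hypothetical dimension table (small, replicated on every node): product_id -> category
dim_product = {1: "toys", 2: "books", 3: "toys"}

# Hypothetical fact table (large, partitioned): (sale_id, product_id, amount)
fact_sales = [(i, 1 + i % 3, 10) for i in range(1000)]

def partition(rows, n):
    """Hash-partition fact rows across n nodes by sale_id."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts

def local_join_aggregate(part, dim):
    """Runs independently on one node: join with the local dimension copy, sum per category."""
    totals = Counter()
    for _, product_id, amount in part:
        totals[dim[product_id]] += amount
    return totals

def run_query(rows, dim, n):
    partials = [local_join_aggregate(p, dim) for p in partition(rows, n)]
    result = Counter()
    for partial in partials:  # only small per-node aggregates cross the network
        result.update(partial)
    return dict(result)

print(run_query(fact_sales, dim_product, NUM_NODES))
```

Adding nodes shrinks each node's partition without adding cross-node joins, which is why this query shape scales out well.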
Shared Nothing – Disadvantages
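Heterogeneous workload
• The hardware is homogeneous, but the workload is not: one fixed configuration cannot suit both workloads
[Figure: Workload A (more CPU intensive) and Workload B (less CPU intensive) both run on the same fixed set of shared-nothing nodes (CPU, Mem, VM, HDD)]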
Membership changes
• Add a node: data redistribution
• Delete a node: fault tolerance
[Figure: a fifth node joining (or leaving) the four-node shared-nothing cluster; each node is a VM with its own CPU, memory, and HDD]
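To see why adding a node forces data redistribution, the sketch below assumes the cluster places rows with static `hash(key) % num_nodes` partitioning (a hypothetical but common scheme) and counts how many keys change owner when a fifth node joins a four-node cluster. In CPython, `hash` of a small non-negative `int` is the integer itself, so the result is deterministic: a key keeps its node only when `key % 4 == key % 5`, which holds for 4 of every 20 consecutive keys.

```python
def owner(key: int, num_nodes: int) -> int:
    """Static hash partitioning: node index = hash(key) mod num_nodes."""
    return hash(key) % num_nodes

def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose owning node changes when the cluster is resized."""
    moved = sum(1 for k in keys if owner(k, old_n) != owner(k, new_n))
    return moved / len(keys)

keys = range(20_000)
print(fraction_moved(keys, 4, 5))  # 0.8
```

So growing the cluster by one node reshuffles most of the data, which is the cost Snowflake avoids by keeping persistent data in S3.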
Online upgrade
• Similar to membership change
[Figure: the four-node shared-nothing cluster (each node a VM with its own CPU, memory, and HDD)]
Web User Interface
Serverless (similar to Athena)
Multi-Cluster Shared-Data Architecture
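[Figure: Snowflake's multi-cluster shared-data architecture with three layers]
• Control layer: Cloud Services
• Compute layer: Virtual Warehouses (VWs)
• Storage layer: S3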
Architecture – Storage
Data format: PAX (hybrid columnar)
Data is horizontally partitioned into immutable files (~16MB)
• An update = remove and add an entire file
• Queries download the file headers and only the columns they are interested in
Intermediate data (e.g., from large joins) can spill to S3 once local disk is exhausted
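A minimal sketch of how a PAX-style file lets a query fetch only its header and the columns it needs. The layout (a 4-byte header length, a JSON header of column offsets, then uncompressed 64-bit integer columns stored contiguously) is a hypothetical stand-in; Snowflake's real files are compressed and use their own format. `read_range` imitates a ranged GET against object storage such as S3.

```python
import json
import struct

def write_pax_file(columns: dict) -> bytes:
    """Serialize columns (name -> list of ints) into one immutable blob:
    [4-byte header length][JSON header: name -> [offset, size]][column data...]"""
    body = b""
    offsets = {}
    for name, values in columns.items():
        data = struct.pack(f"<{len(values)}q", *values)  # one column stored contiguously
        offsets[name] = [len(body), len(data)]
        body += data
    header = json.dumps(offsets).encode()
    return struct.pack("<I", len(header)) + header + body

def read_range(blob: bytes, start: int, length: int) -> bytes:
    """Stand-in for a ranged GET against object storage."""
    return blob[start:start + length]

def read_columns(blob: bytes, wanted: list) -> dict:
    """Fetch the header first, then only the byte ranges of the requested columns."""
    (hlen,) = struct.unpack("<I", read_range(blob, 0, 4))
    header = json.loads(read_range(blob, 4, hlen))
    base = 4 + hlen
    result = {}
    for name in wanted:
        off, size = header[name]
        raw = read_range(blob, base + off, size)
        result[name] = list(struct.unpack(f"<{size // 8}q", raw))
    return result

f = write_pax_file({"id": [1, 2, 3], "price": [10, 20, 30], "qty": [5, 6, 7]})
print(read_columns(f, ["price"]))  # {'price': [10, 20, 30]}
```

Because the file is immutable, an update would write a whole new blob rather than patching bytes in place.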
Architecture – Virtual Warehouse
T-shirt sizes: XS to 4XL
Elasticity and isolation
• VWs can be created, destroyed, or resized at any point (a user may even shut down all VWs)
• A user may create multiple VWs for multiple queries
[Figure: Workload A (more CPU intensive) runs on a large VW; Workload B (less CPU intensive) runs on a small VW]
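A small model of elasticity, isolation, and pay-as-you-go with per-workload VWs. The assumption that each T-shirt size step doubles compute (and hourly cost, measured here in hypothetical "credits") is not stated on the slide; the class, method, and warehouse names are illustrative only and are not Snowflake's API.

```python
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL"]

def credits_per_hour(size: str) -> int:
    """Assumption: each size step doubles compute, so XS=1 ... 4XL=128."""
    return 2 ** SIZES.index(size)

class VirtualWarehouse:
    def __init__(self, name: str, size: str):
        self.name, self.size, self.running = name, size, True

    def resize(self, size: str) -> None:
        # Persistent data lives in S3, so resizing needs no data redistribution.
        self.size = size

    def suspend(self) -> None:
        # Pay-as-you-go: a suspended VW accrues no compute charges.
        self.running = False

    def cost(self, hours: float) -> float:
        return credits_per_hour(self.size) * hours if self.running else 0.0

# Isolation: each workload gets its own VW, sized independently.
vw_a = VirtualWarehouse("vw_a", "L")   # Workload A: more CPU intensive
vw_b = VirtualWarehouse("vw_b", "XS")  # Workload B: less CPU intensive
print(vw_a.cost(0.5), vw_b.cost(0.5))  # 4.0 0.5
vw_a.resize("XL")
vw_b.suspend()
print(vw_a.cost(0.5), vw_b.cost(0.5))  # 8.0 0.0
```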
Local caching
• S3 data can be cached in local memory or disk
[Figure: VW worker nodes (CPU) caching S3 files on local disks (HDD); a new worker is added]
Consistent hashing
• When a hash table with n keys and m slots is resized, only about n/m keys need to be remapped on average
• When a VW is resized, no data shuffle is required; LRU gradually replaces the cache contents
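A minimal consistent-hashing ring that assigns cached files to VW workers, sketched under assumptions not in the slides: MD5 as the ring hash, 100 virtual nodes per worker, and hypothetical worker and file names. It shows the property the slide relies on: after a worker joins, the only files that change owner are the ones now owned by the new worker, so nothing is shuffled between existing workers. Stale copies on old owners would simply age out via LRU (not modeled here).

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Virtual nodes spread each worker over many ring positions for balance.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Owner = first ring position at or after the key's hash, wrapping around."""
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]

files = [f"file_{i}" for i in range(10_000)]
ring = ConsistentHashRing(["w1", "w2", "w3", "w4"])
before = {f: ring.node_for(f) for f in files}
ring.add_node("w5")  # resize the VW from 4 to 5 workers
after = {f: ring.node_for(f) for f in files}
moved = [f for f in files if before[f] != after[f]]
assert all(after[f] == "w5" for f in moved)  # files only move to the new worker
print(len(moved) / len(files))  # fraction of files whose owner changed
```

The printed fraction should be around one fifth, in contrast with the roughly 80% that static `hash % N` partitioning would move for the same resize.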
File stealing to tolerate skew
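A discrete-event sketch of file stealing under a skewed assignment. The slide only names the idea; the stealing policy here (an idle worker takes a not-yet-started file from the tail of the peer queue with the most remaining files), the per-file costs, and the worker and file names are hypothetical choices for illustration.

```python
from collections import deque

def simulate(assignments, cost, stealing):
    """Each worker scans its queue of files one at a time.
    Returns (time when the last worker finishes, file -> worker that scanned it)."""
    queues = {w: deque(files) for w, files in assignments.items()}
    busy_until = {w: 0 for w in queues}
    scanned_by = {}
    active = set(queues)
    while active:
        # Advance the worker that becomes free first (ties broken by name).
        w = min(active, key=lambda x: (busy_until[x], x))
        if queues[w]:
            f = queues[w].popleft()
        elif stealing and any(queues.values()):
            victim = max(queues, key=lambda x: len(queues[x]))
            f = queues[victim].pop()  # steal an unstarted file from the busiest peer
        else:
            active.discard(w)  # nothing left for this worker
            continue
        busy_until[w] += cost[f]
        scanned_by[f] = w
    return max(busy_until.values()), scanned_by

# Skewed assignment: w1 owns four expensive files, w2 owns one cheap file.
assignments = {"w1": ["a", "b", "c", "d"], "w2": ["e"]}
cost = {"a": 5, "b": 5, "c": 5, "d": 5, "e": 1}
print(simulate(assignments, cost, stealing=False)[0])  # 20
print(simulate(assignments, cost, stealing=True)[0])   # 11
```

Without stealing, the query waits for the overloaded worker; with stealing, the idle worker absorbs part of the skew and every file is still scanned exactly once.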
Execution engine
• Columnar: SIMD, compression
• Vectorized: process a group of elements at a time
• Push-based
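A toy push-based, vectorized pipeline: the scan operator drives execution by pushing batches of column values into a filter, which pushes surviving values into an aggregating sink. Operator names and `BATCH_SIZE` are hypothetical, and plain Python cannot show SIMD or compression; the point is only the control flow and the batch-at-a-time interface.

```python
from typing import Callable, List

BATCH_SIZE = 4  # hypothetical; real engines use much larger batches

class SumSink:
    def __init__(self):
        self.total = 0

    def push(self, batch: List[int]) -> None:
        self.total += sum(batch)

class Filter:
    def __init__(self, predicate: Callable[[int], bool], downstream):
        self.predicate, self.downstream = predicate, downstream

    def push(self, batch: List[int]) -> None:
        out = [v for v in batch if self.predicate(v)]  # one call handles a whole batch
        if out:
            self.downstream.push(out)

def scan(column: List[int], downstream, batch_size: int = BATCH_SIZE) -> None:
    """Push-based: the producer calls downstream.push() rather than waiting to be pulled."""
    for i in range(0, len(column), batch_size):
        downstream.push(column[i:i + batch_size])

sink = SumSink()
scan(list(range(10)), Filter(lambda v: v % 2 == 0, sink))
print(sink.total)  # 20
```

In a pull-based (iterator) engine the sink would instead repeatedly call `next()` on its child; pushing lets a producer feed data straight into consumers without that per-call control hand-off.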
Architecture – Cloud Services
Multi-tenant layer shared across multiple users
Query optimization
Concurrency control
• Isolation: snapshot isolation (SI)
• S3 data is immutable, so updates rewrite entire files using MVCC
• Versioned snapshots are also used for time travel
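A minimal sketch of MVCC over immutable files: each committed table version is an immutable set of file ids, an update writes a new file and commits a new set without the old file, and a reader (or a time-travel query) pins an older version. The class and method names are hypothetical, and SI write-conflict detection is deliberately omitted.

```python
class VersionedTable:
    def __init__(self, initial_files):
        self.files = {}     # file id -> rows; a file is never modified after writing
        self.versions = []  # version number -> frozenset of file ids (a snapshot)
        self._next_id = 0
        self.versions.append(frozenset(self._write(rows) for rows in initial_files))

    def _write(self, rows):
        fid = self._next_id
        self._next_id += 1
        self.files[fid] = tuple(rows)
        return fid

    def update(self, old_fid, new_rows):
        """Replace a whole file: write a new one, commit a version without the old one."""
        current = self.versions[-1]
        new_fid = self._write(new_rows)
        self.versions.append((current - {old_fid}) | {new_fid})
        return len(self.versions) - 1

    def snapshot(self, version=None):
        """Read the latest version, or an older one for time travel."""
        v = len(self.versions) - 1 if version is None else version
        return sorted(r for fid in self.versions[v] for r in self.files[fid])

t = VersionedTable([[1, 2], [3, 4]])  # file 0 = (1, 2), file 1 = (3, 4)
t.update(1, [3, 40])                  # rewrite file 1 as a new file 2
print(t.snapshot())                   # [1, 2, 3, 40]
print(t.snapshot(version=0))          # [1, 2, 3, 4]
```

Old files stay readable as long as some version references them, which is what makes both snapshot reads and time travel cheap.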
Pruning
• Snowflake has no indexes (same in Athena, Presto, Hive, etc.)
• Min-max based pruning: store the min and max values for each data block
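A minimal sketch of min-max pruning for a range predicate: every block carries its min and max, and a block whose [min, max] range cannot overlap the predicate is skipped without reading its values. The `Block` type and sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Block:
    values: List[int]
    min_v: int = field(init=False)
    max_v: int = field(init=False)

    def __post_init__(self):
        # Metadata computed once when the block is written.
        self.min_v, self.max_v = min(self.values), max(self.values)

def scan_with_pruning(blocks, lo, hi):
    """Return (values with lo <= v <= hi, number of blocks actually scanned)."""
    hits, scanned = [], 0
    for b in blocks:
        if b.max_v < lo or b.min_v > hi:
            continue  # pruned from metadata alone; block contents never read
        scanned += 1
        hits.extend(v for v in b.values if lo <= v <= hi)
    return hits, scanned

blocks = [Block([1, 5, 9]), Block([10, 12, 19]), Block([20, 25, 29])]
print(scan_with_pruning(blocks, 11, 15))  # ([12], 1)
```

Pruning is most effective when data is clustered on the filtered column; if values are randomly spread, every block's [min, max] tends to be wide and few blocks can be skipped.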
High Availability and Fault Tolerance
Stateless services
Replicated metadata
One node failure in a VW
• Re-execute with the failed node immediately replaced
• Re-execute with a reduced number of nodes
Whole AZ failure
• Re-execute by re-provisioning a new VW
Hot-standby nodes
S3 is highly available and durable
Online Upgrade
Deploy new versions of services and VWs
Semi-Structured Data
Extensible Markup Language (XML), JavaScript Object Notation (JSON)
Extract-Transform-Load (ETL)
Transform (e.g., converting to column format) adds latency to the system
ETL vs. ELT
[Figure: ETL vs. ELT. Picture from https://aws.amazon.com/blogs/big-data/etl-and-elt-design-patterns-for-lake-house-architecture-using-amazon-redshift-part-1/]
Optimization for Semi-Structured Data
Automatic type inference
Hybrid columnar format
• Frequent paths are detected, projected out, and stored in separate columns in the table file (typed and compressed)
• Metadata is collected on these columns for optimization (e.g., pruning)
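A small sketch of the two ideas above applied to JSON documents: find paths that appear in most documents, infer a type for each, store them as typed columns with min/max metadata, and keep everything else in a leftover semi-structured column. The 80% threshold, dotted path names, Python-type inference rules, and sample documents are hypothetical; Snowflake's actual format and heuristics differ.

```python
import json
from collections import Counter

def flatten(doc, prefix=""):
    """Yield (path, value) for every scalar leaf, e.g. ("user.id", 7)."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value

def infer_type(values):
    """Pick the narrowest type that fits all non-null values."""
    seen = {type(v) for v in values if v is not None}
    if seen <= {int}:
        return int
    if seen <= {int, float}:
        return float
    return str

def columnarize(raw_docs, threshold=0.8):
    rows = [dict(flatten(json.loads(d))) for d in raw_docs]
    counts = Counter(path for r in rows for path in r)
    frequent = sorted(p for p, c in counts.items() if c / len(rows) >= threshold)
    columns, stats = {}, {}
    for path in frequent:
        typ = infer_type([r.get(path) for r in rows])
        col = [None if r.get(path) is None else typ(r[path]) for r in rows]
        columns[path] = col
        present = [v for v in col if v is not None]
        if typ in (int, float) and present:
            stats[path] = (min(present), max(present))  # usable for min-max pruning
    # Infrequent paths stay in a semi-structured leftover column.
    leftover = [json.dumps({p: v for p, v in r.items() if p not in columns}) for r in rows]
    return columns, stats, leftover

docs = ['{"user": {"id": 7}, "price": 9.5, "tag": "a"}',
        '{"user": {"id": 8}, "price": 3}',
        '{"user": {"id": 9}, "price": 4.25, "note": "x"}']
cols, stats, leftover = columnarize(docs)
print(cols["price"], stats["user.id"])  # [9.5, 3.0, 4.25] (7, 9)
```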
Summary
Snowflake vs. shared nothing
• Heterogeneous workload
• Membership changes
Snowflake vs. Redshift (Spectrum)
Snowflake vs. Athena
Snowflake vs. Presto/Hive/Vertica
Snowflake – Q/A
• Storage system better than S3 (e.g., allow updates)?
• Row store for transaction processing?
• Server-side cursor?
• Min-max based pruning replacing indices?
• Other systems similar to Snowflake?
• Pay-as-you-go?
• Push vs. pull?
• Pruning requires sorting?
• Snowflake autoscaling compute based on demand?
Group Discussion
How far away is Snowflake from the “optimal design” that you discussed last time?
• High-quality code compilers
• Athena with instances pre-running
• Hybrid instance store and S3; decide caching based on the workload
• Heterogeneous system that combines all the existing systems together
Can you come up with a nice way of combining cloud data warehousing (e.g., Snowflake) with cloud transaction processing (e.g., Aurora)?