Course Review - Stanford University (web.stanford.edu/class/cs245/slides/19-Review.pdf)
What Are Data-Intensive Systems?
Relational databases: most popular type of data-intensive system (MySQL, Oracle, etc.)
Many systems face similar concerns: message queues, key-value stores, streaming systems, ML frameworks, your custom app?
CS 245 2
Goal: learn the main issues and principles that span all data-intensive systems
Typical System Challenges
Reliability in the face of hardware crashes, bugs, bad user input, etc
Concurrency: access by multiple users
Performance: throughput, latency, etc
Access interface from many, changing apps
Security and data privacy
Basic Components
[Diagram labels: Clients / users, Administrator, Queries, Data mgmt. system, Logical dataset (e.g. table, graph), Physical storage (data structures)]
Two Big Ideas
Declarative interfaces
» Apps specify what they want, not how to do it
» Example: “store a table with 2 integer columns”, but not how to encode it on disk
» Example: “count records where column1 = 5”
Transactions
» Encapsulate multiple app actions into one atomic request (fails or succeeds as a whole)
» Concurrency models for multiple users
» Clear interactions with failure recovery
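Both ideas can be tried out with Python's built-in sqlite3 module. This is only a small sketch: the table name `t` and the sample rows are made up for illustration, and the failure is simulated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declarative: describe the table (2 integer columns), not its on-disk encoding.
conn.execute("CREATE TABLE t (column1 INTEGER, column2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(5, 1), (5, 2), (7, 3)])
conn.commit()

# Declarative: say what to count, not how to scan or which index to use.
count = conn.execute("SELECT COUNT(*) FROM t WHERE column1 = 5").fetchone()[0]

# Transaction: the update below either takes effect as a whole or not at all.
try:
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE t SET column2 = column2 + 10 WHERE column1 = 5")
        raise RuntimeError("simulated failure in the middle of the transaction")
except RuntimeError:
    pass

# The update was rolled back, so the sum is unchanged (1 + 2 + 3).
total_after = conn.execute("SELECT SUM(column2) FROM t").fetchone()[0]
```

Here `count` is 2 and `total_after` is 6, because the failed transaction left no partial effects.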
Key Concepts: Architecture
Traditional RDBMS: self-contained end-to-end system
Data lake: separate storage from compute engines to let many engines use same data
Key Concepts: Hardware
[Figure labels: CPU; latency (s); throughput (bytes/s); storage capacity (bytes, bytes/$)]
Latency, throughput, capacity
Random vs sequential I/Os
Caching & 5-minute rule
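A rough sketch of two of these calculations follows. The break-even formula is the usual statement of the five-minute rule. The simple latency-plus-transfer cost model and all numeric values are illustrative assumptions, not figures from the slides.

```python
def break_even_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                       price_per_disk, price_per_mb_ram):
    """Five-minute rule: keep a page cached in RAM if it is re-read
    more often than once per this many seconds."""
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (
        price_per_disk / price_per_mb_ram)


def read_time_seconds(num_bytes, num_ios, latency_s, throughput_bytes_per_s):
    """Assumed cost model: each I/O pays one latency, plus transfer time."""
    return num_ios * latency_s + num_bytes / throughput_bytes_per_s


# Hypothetical hardware: 8 KB pages (128 per MB), 64 random I/Os per second,
# a $2000 disk, and RAM at $15 per MB.
interval = break_even_seconds(128, 64, 2000, 15)  # about 267 seconds

# Reading 1000 pages of 8 KB from a device with 10 ms latency and 100 MB/s:
PAGE = 8192
sequential = read_time_seconds(1000 * PAGE, 1, 0.010, 100e6)    # one seek
random_io = read_time_seconds(1000 * PAGE, 1000, 0.010, 100e6)  # 1000 seeks
```

With these made-up numbers the break-even interval is close to five minutes, and the random read pattern is dominated by latency rather than transfer time.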
Key Concepts: Data Storage
Field encoding
Record encoding: fixed/variable format, etc
Table encoding: row or column oriented
Data ordering
Indexes: dense, sparse, B+ trees, hashing, multi-dimensional
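As one illustration of the index types above, here is a minimal sparse-index sketch: the data is sorted by key and split into blocks, and the index stores only the first key of each block. The helper names and the block size are hypothetical, and keys are assumed unique.

```python
from bisect import bisect_right


def build_blocks(sorted_records, block_size):
    """Split records (already sorted by key) into fixed-size blocks."""
    return [sorted_records[i:i + block_size]
            for i in range(0, len(sorted_records), block_size)]


def build_sparse_index(blocks):
    """Sparse index: one entry per block, holding that block's first key."""
    return [block[0][0] for block in blocks]


def lookup(blocks, sparse_index, key):
    """Find the last block whose first key is <= key, then scan that block."""
    pos = bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None


records = [(k, f"row{k}") for k in range(0, 100, 5)]  # keys 0, 5, ..., 95
blocks = build_blocks(records, 4)
index = build_sparse_index(blocks)  # [0, 20, 40, 60, 80]
```

A dense index would instead hold one entry per record, which costs more space but can answer "does this key exist?" without touching the data blocks.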
Key Concepts: Query Execution
Query representation (e.g. SQL)
→ Logical query plan (e.g. relational algebra)
→ Optimized logical plan
→ Physical plan (code/operators to run)
Many execution methods: per-record exec, vectorization, compilation
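To make the difference between per-record and vectorized execution concrete, the sketch below runs the same "count records where column1 = value" plan both ways. The operator names, the batch size, and the data are assumptions for illustration. Real engines implement these loops in compiled code.

```python
def scan(rows):
    """Per-record scan operator: yields one row at a time."""
    for row in rows:
        yield row


def select(child, predicate):
    """Per-record filter operator: pulls rows from its child one by one."""
    for row in child:
        if predicate(row):
            yield row


def count(child):
    """Per-record aggregate: consumes its child and counts the rows."""
    return sum(1 for _ in child)


def count_per_record(rows, value):
    return count(select(scan(rows), lambda r: r[0] == value))


def count_vectorized(column1, value, batch_size=1024):
    """Vectorized style: process one column in batches, so the per-operator
    overhead is paid once per batch instead of once per record."""
    total = 0
    for start in range(0, len(column1), batch_size):
        batch = column1[start:start + batch_size]
        total += batch.count(value)  # one pass over the whole batch
    return total


rows = [(i % 10, i) for i in range(10000)]
column1 = [r[0] for r in rows]  # column-oriented copy of the first field
```

Both plans return the same answer. They differ only in how much interpretation overhead each record incurs.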
Key Concepts: Relational Algebra
∩, ∪, –, ×, σ (selection), Π (projection), ⨝ (join), G (grouping/aggregation)
Algebraic rules involving these
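One such rule is selection pushdown: σ_p(R ⨝ S) = σ_p(R) ⨝ S when p only uses attributes of R. The sketch below checks it on a tiny example. The relations, attribute names, and predicate are invented for illustration, and results are compared as sets.

```python
def select(rel, pred):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def natural_join(r, s):
    """⨝: combine tuples that agree on all shared attribute names."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out


def as_set(rel):
    """Order-independent view of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}


R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 10}]
S = [{"b": 10, "c": "x"}, {"b": 20, "c": "y"}]


def p(t):
    return t["a"] >= 2  # predicate that mentions only R's attribute a


lhs = select(natural_join(R, S), p)   # join first, then filter
rhs = natural_join(select(R, p), S)   # filter first, then join
```

The two sides produce the same tuples, but the right-hand side joins fewer rows, which is why an optimizer prefers it.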
Key Concepts: Optimization
Rule-based: systematically replace some expressions with other expressions
Cost-based: propose several execution plans and pick best based on a cost model
Adaptive: update execution plan at runtime
Data statistics: can be computed or estimated cheaply to guide decisions
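A minimal sketch of how statistics can feed a cost-based choice, using textbook-style estimates under uniformity assumptions. T is a tuple count and V a distinct-value count. The cost model (sum of intermediate result sizes) and every number are hypothetical.

```python
def estimate_equality_selection(t_r, v_r_a):
    """Estimated size of σ_{a = const}(R): T(R) / V(R, a)."""
    return t_r / v_r_a


def estimate_equijoin(t_r, v_r_b, t_s, v_s_b):
    """Estimated size of R ⨝ S on attribute b: T(R) T(S) / max(V(R,b), V(S,b))."""
    return t_r * t_s / max(v_r_b, v_s_b)


def pick_cheapest(plans):
    """Cost-based choice: return the (name, cost) pair with the lowest cost."""
    return min(plans, key=lambda plan: plan[1])


# Hypothetical statistics for R(a, b) and S(b, c).
T_R, V_R_a, V_R_b = 10_000, 100, 50
T_S, V_S_b = 5_000, 200

# Plan A: join everything, then apply the selection on a.
cost_a = estimate_equijoin(T_R, V_R_b, T_S, V_S_b)

# Plan B: select on a first, then join the smaller input.
selected = estimate_equality_selection(T_R, V_R_a)
cost_b = selected + estimate_equijoin(selected, min(V_R_b, selected), T_S, V_S_b)

best = pick_cheapest([("join-then-select", cost_a), ("select-then-join", cost_b)])
```

With these numbers the optimizer picks "select-then-join", matching what the pushdown rule would suggest.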
Key Concepts: Correctness
Consistency constraints: generic way to define correctness with Boolean predicates
Transaction: collection of actions that preserve consistency
Transaction API: commit, abort, etc
[Figure: Consistent DB → T → Consistent DB′]
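The sketch below treats consistency constraints as Boolean predicates over a toy in-memory database and commits a transaction only if it leads from a consistent state to a consistent state. The account names, the constraints, and the `run_transaction` helper are hypothetical.

```python
def consistent(db, constraints):
    """A state is consistent when every constraint predicate holds."""
    return all(check(db) for check in constraints)


def run_transaction(db, actions, constraints):
    """Apply all actions to a private copy, then commit or abort as a whole."""
    working = dict(db)
    for action in actions:
        action(working)  # intermediate states may violate the constraints
    if consistent(working, constraints):
        return working, "commit"
    return db, "abort"  # the original state is left untouched


def debit(account, amount):
    def action(db):
        db[account] -= amount
    return action


def credit(account, amount):
    def action(db):
        db[account] += amount
    return action


constraints = [
    lambda db: sum(db.values()) == 100,          # money is conserved
    lambda db: all(v >= 0 for v in db.values()),  # no negative balances
]

db = {"A": 60, "B": 40}
db1, outcome1 = run_transaction(db, [debit("A", 30), credit("B", 30)], constraints)
db2, outcome2 = run_transaction(db1, [debit("A", 200), credit("B", 200)], constraints)
```

The first transfer commits even though the state between its debit and credit breaks the conservation constraint. The second aborts and leaves the database as it was.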
Key Concepts: Recovery
Failure models
Undo, redo, and undo/redo logging
Recovery rules for various algorithms (including handling crashes during recovery)
Checkpointing and its effect on recovery
External actions → idempotence, 2PC
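As a reminder of how undo-log recovery works, here is a simplified sketch. The tuple-based log record format is an assumption, and checkpoints are ignored. The log is scanned backwards and the old values written by unfinished transactions are restored.

```python
def recover_undo(log, disk):
    """Undo-log recovery: roll back every transaction with no commit/abort record."""
    finished = set()   # transactions with a commit or abort record
    incomplete = []    # transactions that must be rolled back
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind in ("commit", "abort"):
            finished.add(txn)
        elif txn not in finished:
            if kind == "write":
                _, _, item, old_value = record
                disk[item] = old_value  # restore the value from before the write
            if txn not in incomplete:
                incomplete.append(txn)
    for txn in incomplete:
        log.append(("abort", txn))  # record that the rollback is complete
    return disk


# T1 committed; T2 was still running at the crash and had modified A and B.
log = [
    ("start", "T1"), ("write", "T1", "A", 8), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 5), ("write", "T2", "A", 16),
]
disk = {"A": 32, "B": 10}

recover_undo(log, disk)
state_after_first = dict(disk)
recover_undo(log, disk)  # e.g. recovery is restarted after another crash
```

Restoring old values is idempotent, so running recovery again after a crash during recovery leaves the same state.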
Key Concepts: Concurrency
Isolation levels, especially serializability
» Testing for serializability: conflict serializability, precedence graphs
Locking: lock modes, hierarchical locks, and lock schedules (well formed, legal, 2PL)
Optimistic validation: rules and pros+cons
Recoverable, ACR (avoids cascading rollback) & strict schedules
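The precedence-graph test for conflict serializability can be sketched as follows. A schedule is assumed to be a list of (transaction, operation, item) triples with operations "r" and "w"; that encoding is chosen here for illustration.

```python
def precedence_graph(schedule):
    """Add edge Ti -> Tj when an action of Ti precedes a conflicting action
    of Tj (same item, different transactions, at least one write)."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search that reports a cycle when it finds a back edge."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(u):
        state[u] = 1
        for v in graph[u]:
            if state.get(v) == 1 or (v not in state and visit(v)):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in graph)


def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# T2 follows T1 on both A and B: equivalent to the serial order T1, T2.
s1 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
      ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]

# T1 goes first on A but T2 goes first on B: the graph has a cycle.
s2 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
      ("T2", "r", "B"), ("T2", "w", "B"), ("T1", "r", "B"), ("T1", "w", "B")]
```

An acyclic graph means the schedule is conflict equivalent to some serial order, which can be obtained by topologically sorting the graph.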
Categories of Schedules
[Diagram: regions labeled Serial, 2PL, Val, Conflict serializable, and Serializable]
Key Concepts: Distributed
Partitioning and replication
Consensus: nodes eventually agree on one value despite up to F failures
2-Phase commit: parties all agree to commit unless one aborts (no permanent failures)
Parallel queries: communication cost, load balance, faults
BASE and relaxing consistency
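The core decision rule of two-phase commit can be sketched in a few lines. This toy version deliberately omits logging, timeouts, and crash handling, and the `Participant` class is hypothetical.

```python
class Participant:
    """Toy participant whose vote is fixed in advance."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (and promise to be able to commit) or vote no.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]      # phase 1: collect votes
    decision = "commit" if all(votes) else "abort"
    for p in participants:                           # phase 2: broadcast decision
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


all_yes = [Participant("a"), Participant("b")]
one_no = [Participant("a"), Participant("b", will_vote_yes=False)]
decision_yes = two_phase_commit(all_yes)
decision_no = two_phase_commit(one_no)
```

A single "no" vote forces every participant to abort, which is the "all agree to commit unless one aborts" behavior described above.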
Key Concepts: Security and Data Privacy
Threat models
Security goals: authentication, authorization, auditing, confidentiality, integrity, etc.
Differential privacy: definitions, computing sensitivity & stability
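As a small worked example of sensitivity, the sketch below answers a counting query with the Laplace mechanism. A count changes by at most 1 when one record is added or removed, so its sensitivity is 1 and the noise scale is 1/ε. The function names and the sample data are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two independent
    exponential samples with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Counting query released with noise calibrated to sensitivity / epsilon."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 52, 67, 38]  # made-up data
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
```

A smaller ε gives stronger privacy but a noisier answer. A query with higher sensitivity, such as a sum of unbounded values, would need proportionally more noise.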
Putting These Concepts Together
How can you integrate these different concepts into a coherent system design?
How can you change a system to meet various goals (performance, concurrency, security, etc.)?
Send Us Your Feedback!
We want to keep improving the course and tuning the content, so please write a course evaluation