CS345: Advanced Databases
![Page 1: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/1.jpg)
CS345: Advanced Databases
Chris Ré
![Page 2: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/2.jpg)
What this course is
Database fundamentals:
– Theory
– Old, crusty, good SQL stuff
– No/New/Not-Yet SQL
New stuff: Knowledge bases & Inference
Databases is a strange and beautiful area: Theory, Algorithms, Systems, & Applications
It’s a bit scattered, and I love it.
![Page 3: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/3.jpg)
A Brief, Biased Database History
![Page 4: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/4.jpg)
Three Turing Award Winners
Charles Bachman
Edgar Codd
Jim Gray
Seminal contributions made in industry
![Page 5: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/5.jpg)
The Birth of the Relational Model (1970)
Database: a handful of relations (tables), each with a fixed schema, e.g.
WorksIn(Employee, Dept)
Query with a small number of operations: Selection (filter), Projection, Join, Union.
Basically, an operational finite model theory.
![Page 6: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/6.jpg)
Data and Query Model
Data:
$R(A,B) = \{ (a_1,b_1), \ldots, (a_n,b_n) \}$
$S(B,C,D) = \{ (b'_1,c_1,d_1), \ldots, (b'_m,c_m,d_m) \}$
Projection: $\Pi_A(R) = \{ a : \exists b.\ (a,b) \in R \}$
Selection: $\sigma_F(R) = \{ (a,b) \in R : F((a,b)) \}$, where $F : D(R) \to \{\mathrm{True}, \mathrm{False}\}$
Join: $\mathrm{Join}(R,S) = \{ (a,b,c,d) : (a,b) \in R \wedge (b,c,d) \in S \}$
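A minimal sketch of these definitions in Python, assuming relations are represented as sets of tuples (set semantics) and that the join matches on the shared attribute B. The sample data and the function names `project_A`, `select`, `join`, and `union` are illustrative, not part of the course material.

```python
from typing import Callable, Set, Tuple

RelR = Set[Tuple[str, str]]         # R(A, B)
RelS = Set[Tuple[str, str, str]]    # S(B, C, D)

# Hypothetical sample instances.
R: RelR = {("a1", "b1"), ("a2", "b1"), ("a3", "b2")}
S: RelS = {("b1", "c1", "d1"), ("b2", "c2", "d2")}

def project_A(r: RelR) -> Set[str]:
    """Pi_A(R): keep the A value of each tuple; duplicates collapse in a set."""
    return {a for (a, _b) in r}

def select(r: RelR, f: Callable[[Tuple[str, str]], bool]) -> RelR:
    """sigma_F(R): keep the tuples of R on which the predicate F is True."""
    return {t for t in r if f(t)}

def join(r: RelR, s: RelS) -> Set[Tuple[str, str, str, str]]:
    """Join(R, S): pair tuples that agree on B (naive nested loops)."""
    return {(a, b, c, d) for (a, b) in r for (b2, c, d) in s if b == b2}

def union(r1: RelR, r2: RelR) -> RelR:
    """Union of two relations with the same schema."""
    return r1 | r2
```

The nested-loop `join` mirrors the set-builder definition directly; real systems evaluate the same definition with much faster join algorithms.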
![Page 7: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/7.jpg)
Key idea of the Relational Model
Declarative: the user says what they want, not how to get it.
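To make "what, not how" concrete, here is a sketch that computes the same answer twice over a hypothetical WorksIn(Employee, Dept) table: once declaratively in SQL through Python's built-in `sqlite3` module, where the engine decides how to evaluate the query, and once imperatively with a hand-written loop. The sample rows are made up for illustration.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical sample data for WorksIn(Employee, Dept).
rows: List[Tuple[str, str]] = [("alice", "db"), ("bob", "ml"), ("carol", "db")]

def declarative(rows: List[Tuple[str, str]]) -> List[str]:
    """Say *what*: a SQL query; SQLite chooses the access path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE WorksIn (Employee TEXT, Dept TEXT)")
    conn.executemany("INSERT INTO WorksIn VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT Employee FROM WorksIn WHERE Dept = ? ORDER BY Employee", ("db",)
    ).fetchall()
    conn.close()
    return [employee for (employee,) in result]

def imperative(rows: List[Tuple[str, str]]) -> List[str]:
    """Say *how*: an explicit scan, filter, and sort written by hand."""
    out = []
    for employee, dept in rows:
        if dept == "db":
            out.append(employee)
    return sorted(out)
```

Both functions return the same list, but only the imperative version commits to an evaluation strategy; the SQL version leaves the engine free to pick one.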
![Page 8: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/8.jpg)
Key question: Can one implement the Relational Model efficiently?
![Page 9: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/9.jpg)
System R
In 1974, System R showed that it is possible to get good performance.
First implementation of SQL.
IBM didn’t push it, worried about cannibalizing IMS, but…
Pat Selinger
![Page 10: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/10.jpg)
Others Come on to the Scene…
Larry Ellison hears about IBM’s research prototype and founds a company…
![Page 11: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/11.jpg)
Fast Forward to Today
The relational model is the dominant model of data.
![Page 12: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/12.jpg)
Takeaways about Database Research
Started with mathematical elegance and with close ties to industry.
Improve runtime performance as a proxy for increasing programmer productivity.
![Page 13: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/13.jpg)
The Big Ideas
![Page 14: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/14.jpg)
Independence
Declarative languages can improve productivity.
– Different team members work independently
  • Backend, storage, UI, BI, etc.
– Transactional model
– Challenge: How do we support efficient concurrent access?
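As a small illustration of the transactional model, here is a sketch using Python's built-in `sqlite3` module with a hypothetical Account table: a transfer either applies both updates or neither. Using the connection as a context manager commits on success and rolls back if the block raises. This shows atomicity only; efficient concurrent access, the challenge named above, is not addressed by this sketch.

```python
import sqlite3
from typing import Dict

def make_db() -> sqlite3.Connection:
    """Create a hypothetical Account(name, balance) table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO Account VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so a failed transfer leaves no partial update behind.
    with conn:
        conn.execute("UPDATE Account SET balance = balance - ? WHERE name = ?", (amount, src))
        (bal,) = conn.execute("SELECT balance FROM Account WHERE name = ?", (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE Account SET balance = balance + ? WHERE name = ?", (amount, dst))

def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM Account"))
```

After a failed `transfer`, the debit already executed inside the block is undone by the rollback, so `balances` reports the pre-transfer state.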
![Page 15: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/15.jpg)
Performance
Parallel programming is hard; SQL is the most popular parallel programming language.
– How do you deal with the asymmetry of the memory hierarchy (disk/main memory/cache)?
– How do you structure parallel optimization?
– Concurrency?
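Part of why relational queries parallelize well is that many operators decompose over partitions of the data. A minimal sketch, assuming an equi-join of R(A,B) and S(B,C,D) on B: hash-partition both inputs on B so that matching tuples land in the same bucket, then join each bucket pair independently. This illustrates partitioned parallelism in general, not the design of any particular system; Python threads are used only to show that the per-bucket tasks are independent (the GIL limits real CPU speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

def partition(rel, key: int, n: int) -> List[list]:
    """Hash-partition tuples into n buckets by the value at position `key`."""
    parts: List[list] = [[] for _ in range(n)]
    for t in rel:
        parts[hash(t[key]) % n].append(t)
    return parts

def local_join(r_part, s_part) -> Set[Tuple]:
    """Join one bucket pair: build a hash index on S.B, probe it with R."""
    index: Dict[str, list] = {}
    for (b, c, d) in s_part:
        index.setdefault(b, []).append((c, d))
    return {(a, b, c, d) for (a, b) in r_part for (c, d) in index.get(b, [])}

def parallel_join(r, s, n: int = 4) -> Set[Tuple]:
    r_parts = partition(r, 1, n)  # B is position 1 in R(A, B)
    s_parts = partition(s, 0, n)  # B is position 0 in S(B, C, D)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each bucket pair is joined independently; results are unioned.
        results = pool.map(local_join, r_parts, s_parts)
        return set().union(*results)
```

Because equal B values always hash to the same bucket, no matching pair is split across buckets, so the union of per-bucket joins equals the full join.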
![Page 16: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/16.jpg)
Manageability
Systems live over time, and the system should automate many routine tasks.
– Maintain derived data products (views)
– Self-monitoring systems (autonomic)
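Maintaining derived data (views) means keeping it consistent as the base data changes. A minimal sketch of incremental maintenance, assuming a hypothetical count-per-department view over WorksIn(Employee, Dept) with set semantics: each insert or delete applies a small delta to the stored counts instead of recomputing the view. The class name `DeptCountView` is illustrative.

```python
from collections import Counter
from typing import Set, Tuple

class DeptCountView:
    """Materialized view: SELECT Dept, COUNT(*) FROM WorksIn GROUP BY Dept."""

    def __init__(self) -> None:
        self.base: Set[Tuple[str, str]] = set()  # base relation WorksIn(Employee, Dept)
        self.counts: Counter = Counter()         # derived data, kept in sync by deltas

    def insert(self, employee: str, dept: str) -> None:
        if (employee, dept) not in self.base:    # set semantics: ignore duplicates
            self.base.add((employee, dept))
            self.counts[dept] += 1               # apply the delta, no recomputation

    def delete(self, employee: str, dept: str) -> None:
        if (employee, dept) in self.base:
            self.base.remove((employee, dept))
            self.counts[dept] -= 1
            if self.counts[dept] == 0:
                del self.counts[dept]            # drop empty groups, as GROUP BY would

    def recompute(self) -> Counter:
        """Full recomputation from the base relation, used to check the view."""
        return Counter(dept for (_employee, dept) in self.base)
```

Each update touches one counter, while `recompute` scans the whole base relation; the two should always agree.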
![Page 17: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/17.jpg)
Course Topics
![Page 18: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/18.jpg)
A user says what they want—not how to get it.
![Page 19: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/19.jpg)
Topic 1: Query Processing (QP) Fundamentals
1. Empirical join evaluation from the ’70s!
2. System R: The Archetype (Cardinality)
3. Formal Query Languages
4. Acyclic Query Evaluation (Structure)
5. Worst-case Optimal Join Algorithms (S + C)
This will be the most formal part of the course.
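Worst-case optimal join algorithms evaluate a multi-way join one variable at a time, intersecting the candidate values from every relation that mentions that variable, instead of joining two relations at a time. A minimal sketch for the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(a,c) in that variable-at-a-time style; the relation names and the helper `index` are illustrative. The sketch does not attempt to reproduce the runtime guarantees, which depend on each intersection costing time proportional to the smaller input.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]

def index(pairs: Iterable[Pair]) -> Dict[Hashable, Set[Hashable]]:
    """Map each first-column value to the set of its second-column values."""
    idx: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx

def triangles(R: Iterable[Pair], S: Iterable[Pair], T: Iterable[Pair]) -> Set[Tuple]:
    """Q(a,b,c) :- R(a,b), S(b,c), T(a,c), binding a, then b, then c."""
    R_a, S_b, T_a = index(R), index(S), index(T)
    out = set()
    for a in R_a.keys() & T_a.keys():      # a must occur in R and in T
        for b in R_a[a] & S_b.keys():      # (a,b) in R and b occurs in S
            for c in S_b[b] & T_a[a]:      # (b,c) in S and (a,c) in T
                out.add((a, b, c))
    return out
```

Each loop level only enumerates values consistent with every relation seen so far, which is the property a pairwise join plan lacks on this query.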
![Page 20: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/20.jpg)
Analyzing your data before it was big (when it was just very large…)
![Page 21: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/21.jpg)
Topic 2: OLAP-Style Analytics
Building new and old data systems:
1. Theory of Materialized Views
2. Gamma (Parallel DBs)
3. MapReduce & the Rise of NoSQL (2000s)
4. NewSQL & Optimizing Joins on MR (theory)
5. Fagin’s Algorithm (theory)
6. Statistical Analytic Systems
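Fagin’s Algorithm answers top-k queries over several graded lists (for example, one ranked list per attribute) combined by a monotone aggregation function. A minimal sketch of the textbook version, assuming in-memory lists sorted by grade in descending order, every object present exactly once in every list, and `min` as the default aggregate: sorted access proceeds in parallel until k objects have been seen in all lists, then random access fills in the missing grades.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Graded = List[Tuple[Hashable, float]]

def fagins_algorithm(lists: Sequence[Graded], k: int,
                     agg: Callable[[List[float]], float] = min) -> List[Tuple[Hashable, float]]:
    """Top-k objects by agg(grades). Assumes each list is sorted by grade
    (descending), each object appears once per list, and agg is monotone."""
    m = len(lists)
    grades = [dict(lst) for lst in lists]        # random access: object -> grade
    seen: Dict[Hashable, int] = {}               # object -> number of lists seen in
    max_depth = max(len(lst) for lst in lists)
    depth = 0
    # Phase 1: sorted access to all lists in parallel, one position at a time,
    # until at least k objects have been seen in every list.
    while depth < max_depth and sum(c == m for c in seen.values()) < k:
        for lst in lists:
            if depth < len(lst):
                obj = lst[depth][0]
                seen[obj] = seen.get(obj, 0) + 1
        depth += 1
    # Phase 2: random access to fetch every seen object's grades and aggregate.
    scored = [(obj, agg([g[obj] for g in grades])) for obj in seen]
    # Phase 3: return the k best by aggregate score.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The stopping rule is safe for monotone aggregates: any unseen object is graded no higher than the k objects already seen in every list, so it cannot beat all of them.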
![Page 22: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/22.jpg)
My biased view of the future…
![Page 23: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/23.jpg)
Topic 3: Next-Generation Systems
1. Information Extraction
2. Probabilistic Query Evaluation (Theory)
3. Scalable Inference
4. Knowledge Bases
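As a small taste of probabilistic query evaluation, here is a sketch assuming the tuple-independent model: each tuple of a hypothetical table R is present independently with a given probability, and a query's probability is the total probability of the possible worlds where it holds. For the Boolean query "there exists x with R(x)", independence gives the closed form $1 - \prod_i (1 - p_i)$; the brute-force enumeration over worlds checks it.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, Hashable, Set

def prob_exists(tuples: Dict[Hashable, float]) -> float:
    """P(exists x. R(x)): one minus the probability that every tuple is absent."""
    return 1 - prod(1 - p for p in tuples.values())

def prob_by_worlds(tuples: Dict[Hashable, float],
                   query: Callable[[Set[Hashable]], bool]) -> float:
    """Brute force: sum the probabilities of the possible worlds where query holds."""
    items = list(tuples.items())
    total = 0.0
    for present in product([False, True], repeat=len(items)):
        world = {t for (t, _p), keep in zip(items, present) if keep}
        weight = prod(p if keep else 1 - p for (_t, p), keep in zip(items, present))
        if query(world):
            total += weight
    return total
```

Enumeration costs $2^n$ worlds for $n$ tuples, which is why closed forms like the one above (when a query admits them) matter.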
![Page 24: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/24.jpg)
Transactions.
![Page 25: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/25.jpg)
Topic 4: OLTP-Style Transactional Systems
1. The Rise of Key-Value Stores
2. The Case for Determinism
3. CALM & CAP
4. The Return of Main-Memory DBs
5. Spanner, F1, and Data Centers
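A toy sketch of the intuition behind the case for determinism, under two stated assumptions: every transaction is a deterministic function of the database state, and every replica receives the same totally ordered log of transactions. Replicas that replay the log then end in identical states without agreeing on each transaction's outcome separately. The helpers `deposit`, `double`, and `replay` are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
Txn = Callable[[State], None]

def deposit(key: str, amount: int) -> Txn:
    """Hypothetical deterministic transaction: add `amount` to `key`."""
    def txn(state: State) -> None:
        state[key] = state.get(key, 0) + amount
    return txn

def double(key: str) -> Txn:
    """Hypothetical deterministic transaction: double the value at `key`."""
    def txn(state: State) -> None:
        state[key] = state.get(key, 0) * 2
    return txn

def replay(log: List[Txn]) -> State:
    """One replica: apply the shared log in order, starting from empty state."""
    state: State = {}
    for txn in log:
        txn(state)
    return state
```

The order is what replicas must agree on: replaying the same transactions in a different order can produce a different state.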
![Page 26: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/26.jpg)
Course Logistics
![Page 27: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/27.jpg)
Grading
• Course Project (more on the next slide)
  – Do something interesting with data.
  – Teams OK
  – Form teams soon and email me by Jan 12.
• Midterm Exam
![Page 28: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/28.jpg)
Projects in each topic
1. Knowledge Base Construction
   – Pick a domain and build a KBC system for it with DeepDive
2. Join Algorithms
   – Certificate versions (see me)
   – MapReduce? GraphLab? Spark?
3. Analytics Systems
4. Transactional Systems
You are free to choose other projects.
![Page 29: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/29.jpg)
Datasets
• Snapshot of the web marked up with NLP tools, plus structured data (KBP and KBA challenges)
• 500k+ documents used by paleobiologists, plus structured data
• We can mark up even more.
• Benchmark ML and graph datasets, if you want to work on analytics or join evaluation
![Page 30: CS345: Advanced Databases](https://reader035.fdocuments.us/reader035/viewer/2022062316/5681690f550346895de026d2/html5/thumbnails/30.jpg)
Wednesday
• Wednesday we begin the ancient art of join evaluation. All who pass this way must pass through this ancient topic!
• Read: Shapiro.
  – Not too carefully; we’ll go through the details.