CPS 216: Advanced Database Systems

CPS 216: Advanced Database Systems Shivnath Babu


Page 1: CPS 216: Advanced Database Systems

CPS 216: Advanced Database Systems

Shivnath Babu

Page 2: CPS 216: Advanced Database Systems

Outline for Today

• What this class is about: Data management
• What we will cover in this class
• Logistics

What does a Database System mean to you? (Hint: What are they used for? Give examples)

Page 3: CPS 216: Advanced Database Systems

Data Management

[Diagram labels: User/Application; Query; DataBase Management System (DBMS); Data]

Page 4: CPS 216: Advanced Database Systems

Example: At a Company

Employee

ID   Name   DeptID   Salary   …
10   Nemo   12       120K     …
20   Dory   156      79K      …
40   Gill   89       76K      …
52   Ray    34       85K      …
…    …      …        …        …

Department

ID    Name        …
12    IT          …
34    Accounts    …
89    HR          …
156   Marketing   …
…     …           …

Query 1: Is there an employee named “Nemo”?
Query 2: What is “Nemo’s” salary?
Query 3: How many departments are there in the company?
Query 4: What is the name of “Nemo’s” department?
Query 5: How many employees are there in the “Accounts” department?
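The five queries can be written declaratively. The following is a minimal sketch using Python's built-in sqlite3 module, not something taken from the slides. It loads only the four rows visible in each table (the "…" rows are unknown, so the answers hold for this visible subset only), and storing Salary as an integer number of thousands is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (
        ID INTEGER PRIMARY KEY,
        Name TEXT,
        DeptID INTEGER REFERENCES Department(ID),
        Salary INTEGER            -- assumption: thousands, so 120 means 120K
    );
    INSERT INTO Department VALUES
        (12, 'IT'), (34, 'Accounts'), (89, 'HR'), (156, 'Marketing');
    INSERT INTO Employee VALUES
        (10, 'Nemo', 12, 120), (20, 'Dory', 156, 79),
        (40, 'Gill', 89, 76), (52, 'Ray', 34, 85);
""")

def one(sql, *args):
    """Run a query and return the first column of its first row."""
    return conn.execute(sql, args).fetchone()[0]

# Query 1: is there an employee named Nemo? (1 = yes, 0 = no)
q1 = one("SELECT EXISTS (SELECT 1 FROM Employee WHERE Name = ?)", "Nemo")
# Query 2: Nemo's salary
q2 = one("SELECT Salary FROM Employee WHERE Name = ?", "Nemo")
# Query 3: number of departments
q3 = one("SELECT COUNT(*) FROM Department")
# Query 4: name of Nemo's department (needs a join)
q4 = one("""SELECT d.Name FROM Employee e
            JOIN Department d ON e.DeptID = d.ID
            WHERE e.Name = ?""", "Nemo")
# Query 5: number of employees in Accounts
q5 = one("""SELECT COUNT(*) FROM Employee e
            JOIN Department d ON e.DeptID = d.ID
            WHERE d.Name = ?""", "Accounts")

print(q1, q2, q3, q4, q5)
```

Each query states what is wanted and leaves the choice of access path to the system.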

Page 5: CPS 216: Advanced Database Systems

DataBase Management System (DBMS)

[Diagram: High-level Query Q → DBMS (over Data) → Answer]

Translates Q into best execution plan for current conditions, runs plan

Page 6: CPS 216: Advanced Database Systems

Example: Store that Sells Cars

Cars

Make     Model    OwnerID
Honda    Accord   12
Toyota   Camry    34
Mini     Cooper   89
Honda    Accord   156
…        …        …

Owners

ID    Name   Age
12    Nemo   22
34    Ray    42
89    Gill   36
156   Dory   21
…     …      …

Query: Owners of Honda Accords who are <=23 years old

Plan:

1. Filter (Make = Honda and Model = Accord)
2. Join (Cars.OwnerID = Owners.ID)

   Make    Model    OwnerID   ID    Name   Age
   Honda   Accord   12        12    Nemo   22
   Honda   Accord   156       156   Dory   21

3. Filter (Age <= 23)
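The filter, join, filter plan can be traced by hand in a few lines of Python. This is a sketch over the visible rows only. The helper names `filter_rows` and `join` are made up for illustration, and building a dictionary on the join key is one possible join strategy, not necessarily the one a real DBMS would pick.

```python
cars = [
    {"Make": "Honda", "Model": "Accord", "OwnerID": 12},
    {"Make": "Toyota", "Model": "Camry", "OwnerID": 34},
    {"Make": "Mini", "Model": "Cooper", "OwnerID": 89},
    {"Make": "Honda", "Model": "Accord", "OwnerID": 156},
]
owners = [
    {"ID": 12, "Name": "Nemo", "Age": 22},
    {"ID": 34, "Name": "Ray", "Age": 42},
    {"ID": 89, "Name": "Gill", "Age": 36},
    {"ID": 156, "Name": "Dory", "Age": 21},
]

def filter_rows(rows, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def join(left, right, left_key, right_key):
    """Equi-join: index the right input by key, then probe with each left row."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

# Step 1: Filter (Make = Honda and Model = Accord)
accords = filter_rows(
    cars, lambda c: c["Make"] == "Honda" and c["Model"] == "Accord")
# Step 2: Join (Cars.OwnerID = Owners.ID)
joined = join(accords, owners, "OwnerID", "ID")
# Step 3: Filter (Age <= 23)
young = filter_rows(joined, lambda row: row["Age"] <= 23)

names = [row["Name"] for row in young]
print(names)
```

Filtering Cars before the join keeps the join input small, which is the kind of decision a query optimizer makes when it chooses a plan.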

Page 7: CPS 216: Advanced Database Systems

DataBase Management System (DBMS)

[Diagram: High-level Query Q → DBMS (over Data) → Answer]

Translates Q into best execution plan for current conditions, runs plan

Keeps data safe and correct despite failures, concurrent updates, online processing, etc.

Page 8: CPS 216: Advanced Database Systems

DBMS is multi-user

• Example
    Get account balance from database;
    If balance > amount of withdrawal then
        balance = balance - amount of withdrawal;
        dispense cash;
        store new balance into database;

• Homer at ATM1 withdraws $100
• Marge at ATM2 withdraws $50
• Initial balance = $400, final balance = ?
  – Should be $250 no matter who goes first

Page 9: CPS 216: Advanced Database Systems

Final balance = $250

Homer withdraws $100:
    read balance; $400
    if balance > amount then balance = balance - amount; $300
    write balance; $300

Marge withdraws $50:
    read balance; $300
    if balance > amount then balance = balance - amount; $250
    write balance; $250

Page 10: CPS 216: Advanced Database Systems

Final balance = $300

Homer withdraws $100:
    read balance; $400
    if balance > amount then balance = balance - amount; $300
    write balance; $300

Marge withdraws $50:
    read balance; $400
    if balance > amount then balance = balance - amount; $350
    write balance; $350

(Both ATMs read $400 before either writes. Homer's write lands last, so Marge's withdrawal is lost.)

Page 11: CPS 216: Advanced Database Systems

Final balance = $350

Homer withdraws $100:
    read balance; $400
    if balance > amount then balance = balance - amount; $300
    write balance; $300

Marge withdraws $50:
    read balance; $400
    if balance > amount then balance = balance - amount; $350
    write balance; $350

(Both ATMs read $400 before either writes. Marge's write lands last, so Homer's withdrawal is lost.)
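The three outcomes ($250, $300, $350) can be reproduced with a small deterministic replay of the read/update/write steps. This is an illustrative sketch: the `run` function and the schedule encoding are assumptions, not part of the slides.

```python
def run(schedule, initial=400):
    """Replay (customer, step) pairs against one shared stored balance."""
    amounts = {"Homer": 100, "Marge": 50}
    db_balance = initial   # the value stored in the database
    local = {}             # each ATM's private copy of the balance
    for who, step in schedule:
        if step == "read":
            local[who] = db_balance
        elif step == "update":
            if local[who] > amounts[who]:
                local[who] -= amounts[who]
        elif step == "write":
            db_balance = local[who]
    return db_balance

H = [("Homer", s) for s in ("read", "update", "write")]
M = [("Marge", s) for s in ("read", "update", "write")]

# Serial execution: Homer finishes before Marge starts.
serial = run(H + M)
# Interleaved: both read first; Homer writes last.
homer_writes_last = run([H[0], M[0], H[1], M[1], M[2], H[2]])
# Interleaved: both read first; Marge writes last.
marge_writes_last = run([H[0], M[0], H[1], M[1], H[2], M[2]])

print(serial, homer_writes_last, marge_writes_last)
```

In both interleaved schedules one withdrawal dispenses cash without being reflected in the stored balance, which is the anomaly concurrency control has to prevent.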

Page 12: CPS 216: Advanced Database Systems

Concurrency control in DBMS

• Similar to concurrent programming problems
  – But data is not all in main-memory

• Appears similar to file system concurrent access?
  – Approach taken by MySQL initially; now MySQL offers better alternatives

• But want to control at much finer granularity

• Or else one withdrawal would lock up all accounts!
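The granularity point can be illustrated with an in-memory analogy: one lock per account instead of one lock for the whole bank. This is a sketch using Python threads, not how any particular DBMS implements locking. The `Bank` class and the account names are hypothetical.

```python
import threading

class Bank:
    """Toy bank with one lock per account (fine-grained locking)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.locks = {acct: threading.Lock() for acct in balances}

    def withdraw(self, acct, amount):
        # Only this account is locked; other accounts stay available.
        with self.locks[acct]:
            balance = self.balances[acct]
            if balance > amount:
                self.balances[acct] = balance - amount
                return True
            return False

bank = Bank({"simpsons": 400, "flanders": 1000})
threads = [
    threading.Thread(target=bank.withdraw, args=("simpsons", 100)),  # Homer
    threading.Thread(target=bank.withdraw, args=("simpsons", 50)),   # Marge
    threading.Thread(target=bank.withdraw, args=("flanders", 200)),  # unrelated
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(bank.balances)
```

The two withdrawals on the same account are serialized by that account's lock, while the withdrawal on the other account never waits for them. A single bank-wide lock would also be correct, but it would make every customer wait for every other customer.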

Page 13: CPS 216: Advanced Database Systems

Recovery in DBMS

• Example: balance transfer
    decrement the balance of account X by $100;
    increment the balance of account Y by $100;

• Scenario 1: Power goes out after the first instruction

• Scenario 2: DBMS buffers and updates data in memory (for efficiency); before they are written back to disk, power goes out

• Log updates; undo/redo during recovery
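Both scenarios can be handled from the log alone. The sketch below is a heavily simplified illustration of undo/redo, not a real recovery protocol: the record layout (`"update"`, transaction, key, old value, new value) and the starting balances of $500 and $200 are assumptions chosen for the example.

```python
def recover(disk, log):
    """Bring `disk` to a consistent state using only the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo: reapply new values of committed transactions, oldest first.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _old, new = rec
            disk[key] = new
    # Undo: restore old values of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _new = rec
            disk[key] = old
    return disk

# Scenario 1: the decrement of X reached disk, then the power went out.
disk1 = {"X": 400, "Y": 200}
log1 = [("update", "T1", "X", 500, 400)]
after1 = recover(disk1, log1)   # T1 never committed, so X is restored

# Scenario 2: T1 committed, but its updates were still only in memory.
disk2 = {"X": 500, "Y": 200}
log2 = [
    ("update", "T1", "X", 500, 400),
    ("update", "T1", "Y", 200, 300),
    ("commit", "T1"),
]
after2 = recover(disk2, log2)   # T1 committed, so both updates are redone

print(after1, after2)
```

The sketch relies on log records reaching stable storage before the corresponding data does. Either way the transfer ends up fully applied or not applied at all.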

Page 14: CPS 216: Advanced Database Systems

DataBase Management System (DBMS)

[Diagram: High-level Query Q → DBMS (over Data) → Answer]

Translates Q into best execution plan for current conditions, runs plan

Keeps data safe and correct despite failures, concurrent updates, online processing, etc.

Page 15: CPS 216: Advanced Database Systems

Summary of modern DBMS features

• Persistent storage of data
• Logical data model; declarative queries and updates → physical data independence
• Multi-user concurrent access
• Safety from system failures
• Performance, performance, performance
  – Massive amounts of data (terabytes ~ petabytes)
  – High throughput (thousands ~ millions transactions per minute)
  – High availability (≥ 99.999% uptime)
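To put the availability figure in perspective, the downtime budget it implies can be computed directly. This is a small worked calculation assuming a 365-day year. The function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes in a 365-day year

def max_downtime_minutes(uptime_fraction):
    """Downtime per year allowed by a given uptime fraction."""
    return (1 - uptime_fraction) * MINUTES_PER_YEAR

five_nines = max_downtime_minutes(0.99999)
print(f"{five_nines:.2f} minutes of downtime per year")
```

At 99.999% uptime the budget is a little over five minutes per year, which leaves almost no room for a slow restart after a failure.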

Page 16: CPS 216: Advanced Database Systems

Modern DBMS Architecture

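[Diagram: layers from top to bottom, with the interface between adjacent layers]

Applications
    | SQL
Parser
    | Logical query plan
Query Optimizer
    | Physical query plan
Query Executor
    | Access method API calls
Storage Manager
    | File system API calls, Storage system API calls
OS
    |
Disk(s)

(Parser, Query Optimizer, Query Executor, and Storage Manager together form the DBMS.)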
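One way to watch the optimizer stage at work is to ask a real engine for its plan. The sketch below uses SQLite through Python's sqlite3 module, purely as a convenient stand-in for the generic DBMS in the diagram. The table contents and the index name `employee_name` are assumptions for illustration, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(ID INTEGER PRIMARY KEY, Name TEXT, DeptID INTEGER, Salary INTEGER)")
conn.execute("INSERT INTO Employee VALUES (10, 'Nemo', 12, 120)")

query = "SELECT Salary FROM Employee WHERE Name = 'Nemo'"

def plan(sql):
    """Return the human-readable steps of the plan SQLite chose for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]   # the last column holds the description

before = plan(query)   # no index on Name yet: expect a full table scan
conn.execute("CREATE INDEX employee_name ON Employee(Name)")
after = plan(query)    # same SQL, different conditions: expect an index search

print(before)
print(after)
```

The SQL text is identical in both calls. Only the physical design changed, and the chosen plan changed with it.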

Page 17: CPS 216: Advanced Database Systems

Course Outline

• 40% of the class is about core DBMS concepts
  – Query execution, query optimization, transactions, recovery, etc.
  – Textbook material

• 60% of the class is on “what is happening today in data management”
  – New developments on textbook material
  – Data streams
  – Web search – Google, Yahoo!
  – Data integration (structured data + unstructured data)
  – Data mining
  – Unsolved challenges

Page 18: CPS 216: Advanced Database Systems

Using a Traditional DBMS

[Diagram labels: User/Application; Loader; Table R; Table S; Query → Result; Query… → Result…]

Page 19: CPS 216: Advanced Database Systems

New Approach for Data Streams

[Diagram labels: User/Application; Register Continuous Query (Standing Query); Input streams; Stream Query Processor; Result]

Page 20: CPS 216: Advanced Database Systems

Example Continuous (Standing) Queries

• Web
  – Amazon’s best sellers over last hour

• Network Intrusion Detection
  – Track HTTP packets with destination address matching a prefix in given table and content matching “*\.ida”

• Finance
  – Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes
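The Finance query can be sketched as a standing query that is registered once and then evaluated on every arriving tick. This is a minimal illustration and not a stream engine. The `DropMonitor` class and the "ACME" symbol are hypothetical, and "moved down more than 2%" is interpreted here as being more than 2% below the oldest price still inside the 20-minute window, which is only one possible reading.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 20 * 60   # the last 20 minutes

class DropMonitor:
    """Standing query over a stream of (symbol, timestamp, price) ticks."""

    def __init__(self):
        # symbol -> deque of (timestamp, price) still inside the window
        self.history = defaultdict(deque)

    def on_tick(self, symbol, timestamp, price):
        """Process one stream element; return True if the query fires."""
        window = self.history[symbol]
        window.append((timestamp, price))
        # Expire ticks that have fallen out of the 20-minute window.
        while window[0][0] < timestamp - WINDOW_SECONDS:
            window.popleft()
        oldest_price = window[0][1]
        return 20 <= price <= 200 and price < oldest_price * 0.98

monitor = DropMonitor()
ticks = [
    ("ACME", 0, 100.0),     # first tick, nothing to compare against
    ("ACME", 600, 99.0),    # down 1% in 10 minutes
    ("ACME", 1100, 97.0),   # down 3% within 20 minutes
    ("ACME", 2400, 96.5),   # older ticks have expired
]
alerts = [monitor.on_tick(*tick) for tick in ticks]
print(alerts)
```

Unlike the traditional model, the data is never loaded into a table first. The query stays fixed while the data flows past it, and only the window of recent ticks is kept.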

Page 21: CPS 216: Advanced Database Systems

New Challenges in DBMSs

[Diagram: High-level Query Q → DBMS → Answer]

Data: TeraBytes, PetaBytes

<CD>
  <TITLE>Empire B.</TITLE>
  <ARTIST>Bob Dylan</ARTIST>
  <COUNTRY>USA</COUNTRY>
  <COMPANY>Columbia</COMPANY>
  <PRICE>10.90</PRICE>
</CD>

Page 22: CPS 216: Advanced Database Systems

Course Logistics

• Reference: Database Systems: The Complete Book, by H. Garcia-Molina, J. D. Ullman, and J. Widom

• Web site: http://www.cs.duke.edu/courses/fall07/cps216

• Grading:
  – Project 30%
  – Homework Assignments 20%
  – Midterm 20%
  – Final 30%

Page 23: CPS 216: Advanced Database Systems

Summary: Data Management is Important

• Core aspect of most sciences and engineering today

• Core need in industry

• Cool mix of theory and systems

• Chances are you will find something interesting even if your primary interest is elsewhere