CISC 7610 Lecture 2: Review of relational databases (m.mr-pc.org/t/cisc7610/2018fa/lecture02.pdf)
CISC 7610 Lecture 2
Review of relational databases
Topics:
Relational database management systems
Example data modeling problem
Schema normalization
SQL queries
Relational database management systems
A relational database management system (RDBMS)
● Uses relational data structures
● Has a declarative data manipulation language at least as powerful as the relational algebra
● Not required, but typically also
– Supports ACID transactions
– Uses SQL as the data manipulation language
Uses relational data structures
● Relation: table with rows and columns
● Attribute: column
● Tuple: row
● Key: combination of attributes that uniquely identifies each row
● Integrity rules: Constraints imposed upon the database
Uses relational data structures

| Artist | Album | Track Num | Released | Track | Dur |
| --- | --- | --- | --- | --- | --- |
| David Bowie | Space Oddity | 1 | 1969 | Space Oddity | 5:15 |
| David Bowie | … Ziggy Stardust ... | 10 | 1972 | Suffragette city | 3:25 |
| David Bowie | Best of Bowie | 1 | 2002 | Space Oddity | 5:15 |
| David Bowie | Best of Bowie | 8 | 2002 | Suffragette city | 3:25 |
| Queen | Hot space | 11 | 1982 | Under pressure | 4:02 |

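[Slide annotations on the table: the whole table is labeled Relation, a row is labeled Tuple, a column is labeled Attribute, and a Key is marked.]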
Has a declarative data manipulation language
● Declarative: says what, not how to manipulate data
– Other examples of declarative programming languages?
● Relational algebra
– Selection: extract a subset of tuples
– Projection: extract a subset of attributes
– Cartesian product: extract all combinations of pairs of tuples from two relations
– Union: combine two sets of tuples
– Set difference: remove one set of tuples from another
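The five operators listed above can be tried out on small in-memory relations. The following is a minimal sketch in plain Python, assuming relations are modeled as sets of tuples; the helper names `select`, `project`, and `product` are hypothetical, the two relations are small illustrations modeled on the music example, and this is not how an RDBMS implements the algebra.

```python
# Relations as sets of tuples; attribute names are tracked separately.
artists_attrs = ("Id", "Name")
artists = {(1, "David Bowie"), (2, "Queen")}
albums_attrs = ("Id", "Name", "ArtistId")
albums = {(1, "Space Oddity", 1), (4, "Hot space", 2)}

def select(rel, pred):
    """Selection: keep the tuples that satisfy a predicate."""
    return {t for t in rel if pred(t)}

def project(rel, attrs, keep):
    """Projection: keep only the named attributes (duplicates collapse)."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}

def product(r1, r2):
    """Cartesian product: pair every tuple of r1 with every tuple of r2."""
    return {t1 + t2 for t1 in r1 for t2 in r2}

queen = select(artists, lambda t: t[1] == "Queen")
names = project(artists, artists_attrs, ["Name"])
pairs = product(artists, albums)
both = queen | select(artists, lambda t: t[1] == "David Bowie")  # union
rest = artists - queen                                           # set difference
```

Because relations are sets, union and set difference map directly onto Python's `|` and `-` operators, and projection drops duplicate tuples automatically.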
Supports ACID transactions
● Transaction: A sequence of DB operations that represents a single real-world operation
● ACID properties (guaranteed by RDBMSs)
– Atomicity: all operations happen or none
– Consistency: transaction moves DB from one state that meets integrity constraints to another
– Isolation: concurrent transactions have the same effect as if they were run serially
– Durability: once committed, transaction’s effects are permanent
● Example: bank account transfer
● Relaxed by NoSQL databases in various combinations
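The bank account transfer can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration of atomicity under stated assumptions: the `accounts` table, its `CHECK` constraint, the balances, and the `transfer` helper are all hypothetical and not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK, credit is undone
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

In the failed transfer the credit to account 2 has already run when the debit is rejected, so the rollback is what keeps the two balances consistent with each other.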
Structured query language (SQL)
● Data definition language
– Define relational schemata (plural of “schema”)
– Create/alter/delete tables and their attributes
● Data manipulation language
– Insert/delete/modify tuples in relations
– Query one or more tables
● Can implement relational algebra, but also takes some liberties with it
Example data modeling problem
Example data: Music collection
● Artists: Name
● Albums: Name, Release date
● Tracks: Name, Duration, Number
● Each album has one artist
● Tracks can appear on multiple albums (compilations)
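Entity-relationship diagrams
[Diagram legend: an Entity with an Attribute is connected through a Relationship to Entity2; each end of the relationship is labeled with its cardinality (Cardinality, Cardinality2).]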
Do: Draw an ER diagram for the example data
Entity-relationship diagrams
[Diagram: Artist (Name) is linked by "Created" to Album (Name, Release date), one artist to many albums. Album is linked by "Contains" to Track (Name, Duration, Track number), many to many.]
Translating ER diagrams to schema
● Entities become tables
● Attributes become their attributes
● Many-to-many relationships become join tables
– Can have additional attributes
● Other relationships become foreign keys
– One-to-one, many-to-one, one-to-many
– Attributes added to table
Do: Translate ER diagram to schema for example data
Translating ER diagrams to schema
Artists(Id, Name)
Albums(Id, Name, Release, ArtistId)
AlbumsHaveTracks(AlbumId, TrackId, Number)
Track(Id, Name, Duration)
SQL CREATE statement
CREATE TABLE table_name
(
column_name1 data_type(size),
column_name2 data_type(size),
column_name3 data_type(size),
....
);
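Applied to the music schema (Artists, Albums, AlbumsHaveTracks, Track), the CREATE statement might look like the sketch below, run here through Python's built-in `sqlite3` module. The column types, sizes, and constraints are assumptions, since the slides give only table and column names. `Release` is quoted because RELEASE is a keyword in SQLite.

```python
import sqlite3

# Column types and constraints are assumptions; the slides list names only.
SCHEMA = """
CREATE TABLE Artists (
    Id   INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);
CREATE TABLE Albums (
    Id        INTEGER PRIMARY KEY,
    Name      VARCHAR(100) NOT NULL,
    "Release" INTEGER,                               -- release year
    ArtistId  INTEGER NOT NULL REFERENCES Artists(Id)
);
CREATE TABLE Track (
    Id       INTEGER PRIMARY KEY,
    Name     VARCHAR(100) NOT NULL,
    Duration VARCHAR(8)                              -- stored as 'm:ss' text
);
CREATE TABLE AlbumsHaveTracks (
    AlbumId INTEGER NOT NULL REFERENCES Albums(Id),
    TrackId INTEGER NOT NULL REFERENCES Track(Id),
    Number  INTEGER,                                 -- position on the album
    PRIMARY KEY (AlbumId, TrackId)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

The many-to-one relationship from albums to artists becomes the `ArtistId` foreign key, while the many-to-many relationship between albums and tracks becomes the join table with its extra `Number` attribute. SQLite only enforces the foreign keys after `PRAGMA foreign_keys = ON`.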
SQL INSERT statement
INSERT INTO table_name
(column1,column2,column3,...)
VALUES
(value1,value2,value3,...);
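Do: Populate tables with example data

Artists

| Id | Name |
| --- | --- |
| 1 | David Bowie |
| 2 | Queen |

Albums

| Id | Name | Release | ArtistId |
| --- | --- | --- | --- |
| 1 | Space oddity | 1969 | 1 |
| 2 | … Ziggy Stardust ... | 1972 | 1 |
| 3 | Best of Bowie | 2002 | 1 |
| 4 | Hot space | 1982 | 2 |

AlbumsHaveTracks

| AlbumId | TrackId | Number |
| --- | --- | --- |
| 1 | 1 | 1 |
| 2 | 2 | 10 |
| 3 | 1 | 1 |
| 3 | 2 | 8 |
| 4 | 3 | 11 |

Track

| Id | Name | Duration |
| --- | --- | --- |
| 1 | Space oddity | 5:15 |
| 2 | Suffragette city | 3:25 |
| 3 | Under pressure | 4:02 |
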
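One way to load these rows is a parameterized INSERT per table. The sketch below uses Python's built-in `sqlite3` module; the column types are assumptions, and the elided album name is stored exactly as it appears in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT,
                     "Release" INTEGER, ArtistId INTEGER REFERENCES Artists(Id));
CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER REFERENCES Albums(Id),
                               TrackId INTEGER REFERENCES Track(Id),
                               Number INTEGER);
""")

# Parents first (Artists, Albums, Track), then the join table that refers to them.
conn.executemany("INSERT INTO Artists (Id, Name) VALUES (?, ?)",
                 [(1, "David Bowie"), (2, "Queen")])
conn.executemany(
    'INSERT INTO Albums (Id, Name, "Release", ArtistId) VALUES (?, ?, ?, ?)',
    [(1, "Space oddity", 1969, 1),
     (2, "… Ziggy Stardust ...", 1972, 1),
     (3, "Best of Bowie", 2002, 1),
     (4, "Hot space", 1982, 2)])
conn.executemany("INSERT INTO Track (Id, Name, Duration) VALUES (?, ?, ?)",
                 [(1, "Space oddity", "5:15"),
                  (2, "Suffragette city", "3:25"),
                  (3, "Under pressure", "4:02")])
conn.executemany(
    "INSERT INTO AlbumsHaveTracks (AlbumId, TrackId, Number) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11)])
conn.commit()

counts = {table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
          for table in ("Artists", "Albums", "Track", "AlbumsHaveTracks")}
```

Each track is stored once in `Track` even when it appears on two albums; only the short join-table row is repeated.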
Schema normalization
Schema normalization: Unnormalized data

| Artist | Album | Released | Track Num | Track | Dur |
| --- | --- | --- | --- | --- | --- |
| David Bowie | Space Oddity | 1969 | 1 | Space Oddity | 5:15 |
| David Bowie | … Ziggy Stardust ... | 1972 | 10 | Suffragette city | 3:25 |
| David Bowie | Best of Bowie | 2002 | 1 | Space Oddity | 5:15 |
| David Bowie | Best of Bowie | 2002 | 8 | Suffragette city | 3:25 |
| Queen | Hot space | 1982 | 11 | Under pressure | 4:02 |

Schema normalization: Anomalies in unnormalized data
● The above example unnormalized schema can suffer from three types of “anomalies”
– Update anomaly: repeated data could be inconsistent between rows
– Insertion anomaly: can’t add info on artist or album that doesn’t have a track
– Deletion anomaly: deleting the last track deletes an album or artist
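Two of these anomalies can be reproduced on the single unnormalized table. The sketch below uses Python's built-in `sqlite3` module with a subset of the example rows; the table name `Collection` and the "corrected" year 2003 are hypothetical values chosen only to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Collection (
    Artist TEXT, Album TEXT, Released INTEGER,
    Num INTEGER, Track TEXT, Dur TEXT)""")
conn.executemany("INSERT INTO Collection VALUES (?, ?, ?, ?, ?, ?)", [
    ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
])

# Update anomaly: changing the year on one row leaves the other row stale,
# so the same album now reports two release years.
conn.execute("UPDATE Collection SET Released = 2003 "
             "WHERE Album = 'Best of Bowie' AND Num = 1")
years = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT Released FROM Collection WHERE Album = 'Best of Bowie'"))

# Deletion anomaly: removing Queen's only track also removes every
# record of the artist and the album.
conn.execute("DELETE FROM Collection WHERE Track = 'Under pressure'")
artists_left = [row[0] for row in conn.execute(
    "SELECT DISTINCT Artist FROM Collection")]
```

The insertion anomaly is the mirror image of the deletion: there is no row in which to record an artist or album until at least one of its tracks is known.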
Schema normalization: Normal forms
● Schema normalization factors logically independent data into independent relations and links them using foreign key relationships
● Projection is the process of factoring an unnormalized relation into separate normalized relations
● Boyce-Codd normal form: every non-trivial functional dependency goes from a superkey (a set of attributes that uniquely identifies entities) to other attributes
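The Boyce-Codd condition can be probed on the unnormalized example rows. The sketch below is plain Python with two hypothetical helpers, `holds` and `is_superkey`. It only tests a sample of data, so it can refute a dependency or a key but cannot prove one, since those are properties of the schema rather than of any one table instance.

```python
COLUMNS = ("Artist", "Album", "Released", "Num", "Track", "Dur")
DATA = [
    ("David Bowie", "Space Oddity", 1969, 1, "Space Oddity", "5:15"),
    ("David Bowie", "… Ziggy Stardust ...", 1972, 10, "Suffragette city", "3:25"),
    ("David Bowie", "Best of Bowie", 2002, 1, "Space Oddity", "5:15"),
    ("David Bowie", "Best of Bowie", 2002, 8, "Suffragette city", "3:25"),
    ("Queen", "Hot space", 1982, 11, "Under pressure", "4:02"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def is_superkey(rows, attrs):
    """True if no two rows in this sample agree on all of attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)

album_determines_year = holds(rows, ["Album"], ["Released"])
album_is_superkey = is_superkey(rows, ["Album"])
album_num_is_superkey = is_superkey(rows, ["Album", "Num"])
```

In this sample, Album determines Released but Album alone is not a superkey, which is the kind of dependency Boyce-Codd normal form rules out. Moving the release year into a separate Albums relation removes it.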
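Schema normalization: Normalized data

Artists

| Id | Name |
| --- | --- |
| 1 | David Bowie |
| 2 | Queen |

Albums

| Id | Name | Release | ArtistId |
| --- | --- | --- | --- |
| 1 | Space oddity | 1969 | 1 |
| 2 | … Ziggy Stardust ... | 1972 | 1 |
| 3 | Best of Bowie | 2002 | 1 |
| 4 | Hot space | 1982 | 2 |

AlbumsHaveTracks

| AlbumId | TrackId | Number |
| --- | --- | --- |
| 1 | 1 | 1 |
| 2 | 2 | 10 |
| 3 | 1 | 1 |
| 3 | 2 | 8 |
| 4 | 3 | 11 |

Track

| Id | Name | Duration |
| --- | --- | --- |
| 1 | Space oddity | 5:15 |
| 2 | Suffragette city | 3:25 |
| 3 | Under pressure | 4:02 |
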
Reminder: Main question of course
How can systems process and store multimedia data so that users can find what they are looking for in the future?
SQL queries
Queries: find what they are looking for
● Search through the data
● Search through complex relationships
● Aggregate over the data for reporting
● And do all of this efficiently...
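As an illustration of aggregating for reporting, the sketch below counts the tracks on each album with `GROUP BY`, run through Python's built-in `sqlite3` module. It loads only the columns and a subset of the rows the report needs, and the column types and the `NumTracks` alias are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
INSERT INTO Albums VALUES (1, 'Space oddity'), (3, 'Best of Bowie'), (4, 'Hot space');
INSERT INTO AlbumsHaveTracks VALUES (1, 1, 1), (3, 1, 1), (3, 2, 8), (4, 3, 11);
""")

# One output row per album, with the number of join-table rows that refer to it.
report = conn.execute("""
    SELECT al.Name, COUNT(*) AS NumTracks
    FROM Albums AS al, AlbumsHaveTracks AS aht
    WHERE aht.AlbumId = al.Id
    GROUP BY al.Id, al.Name
    ORDER BY al.Id
""").fetchall()
```

The query states only which groups and counts are wanted; how to scan and combine the two tables is left to the database.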
SQL SELECT, single table
SELECT attribute1, attribute2
FROM relation
WHERE attribute1 = 'condition'
ORDER BY attribute2;
Do: Write a select query to answer
What is the duration of “Suffragette City”?
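One possible answer, sketched with Python's built-in `sqlite3` module and the Track rows from the example data. The question capitalizes "City" while the stored name does not, so this sketch assumes a case-insensitive comparison is wanted and uses SQLite's `COLLATE NOCASE`; other systems spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany("INSERT INTO Track VALUES (?, ?, ?)",
                 [(1, "Space oddity", "5:15"),
                  (2, "Suffragette city", "3:25"),
                  (3, "Under pressure", "4:02")])

# Selection on Name, projection onto Duration.
rows = conn.execute("""
    SELECT Duration
    FROM Track
    WHERE Name = 'Suffragette City' COLLATE NOCASE
""").fetchall()
```

Without `COLLATE NOCASE`, SQLite compares text case-sensitively and the query would return no rows for this spelling.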
SQL SELECT, multiple tables
SELECT r1.attribute1, r2.attribute1
FROM relation1 AS r1,
relation2 AS r2
WHERE r1.attribute1 = 'condition'
AND r1.attribute1 = r2.attribute2
ORDER BY r1.attribute1;
Do: Write a select query to answer
Find the AlbumIds of all of David Bowie's albums
Do: Write a select query to answer
Find the TrackIds of all of David Bowie's tracks
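Possible answers to the album and track questions, sketched with Python's built-in `sqlite3` module in the multi-table SELECT style shown earlier. Only the columns these queries touch are created, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Albums (Id INTEGER PRIMARY KEY, Name TEXT, ArtistId INTEGER);
CREATE TABLE AlbumsHaveTracks (AlbumId INTEGER, TrackId INTEGER, Number INTEGER);
INSERT INTO Artists VALUES (1, 'David Bowie'), (2, 'Queen');
INSERT INTO Albums VALUES (1, 'Space oddity', 1), (2, '… Ziggy Stardust ...', 1),
                          (3, 'Best of Bowie', 1), (4, 'Hot space', 2);
INSERT INTO AlbumsHaveTracks VALUES
    (1, 1, 1), (2, 2, 10), (3, 1, 1), (3, 2, 8), (4, 3, 11);
""")

# Albums: join Artists to Albums through the ArtistId foreign key.
album_ids = [row[0] for row in conn.execute("""
    SELECT al.Id
    FROM Artists AS ar, Albums AS al
    WHERE ar.Name = 'David Bowie'
      AND al.ArtistId = ar.Id
    ORDER BY al.Id
""")]

# Tracks: extend the join through the AlbumsHaveTracks join table.
# DISTINCT drops tracks that appear on more than one album.
track_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT aht.TrackId
    FROM Artists AS ar, Albums AS al, AlbumsHaveTracks AS aht
    WHERE ar.Name = 'David Bowie'
      AND al.ArtistId = ar.Id
      AND aht.AlbumId = al.Id
    ORDER BY aht.TrackId
""")]
```

Every column reference is qualified with its table alias, which avoids ambiguity when two tables share a column name such as `Id` or `Name`.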
How would you write a select query to answer
● Find all songs containing David Bowie's vocals
● Find all songs at 120 beats per minute
● Find all songs sampled by other artists
– These all require further modeling or analysis of the audio...
How do we make databases that are
● Effective (correct, durable, coherent, ...)
– Transactions
● Efficient
– Concurrency
– Memory hierarchy
– Indexing
– Query optimization
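To make indexing and query optimization concrete, the sketch below asks SQLite how it plans to run the same query before and after an index is created. It uses Python's built-in `sqlite3` module; the 1000 synthetic track names, the index name `idx_track_name`, and the `plan` helper are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Track (Id INTEGER PRIMARY KEY, Name TEXT, Duration TEXT)")
conn.executemany("INSERT INTO Track (Name, Duration) VALUES (?, ?)",
                 [(f"Track {i}", "3:00") for i in range(1000)])  # synthetic rows

def plan(conn, sql, params=()):
    """Return the descriptive text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT Duration FROM Track WHERE Name = ?"

before = plan(conn, query, ("Track 42",))   # no index: every row is scanned
conn.execute("CREATE INDEX idx_track_name ON Track (Name)")
after = plan(conn, query, ("Track 42",))    # optimizer now searches the index
```

The SQL text of the query is identical in both runs. Only the access path chosen by the optimizer changes, which is the practical payoff of a declarative query language.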