CS304: Database Systemsdjmvfb/courses/cs2300/static...Displaying a relational database schema and...

Post on 12-Jul-2020

7 views 1 download

Transcript of CS304: Database Systemsdjmvfb/courses/cs2300/static...Displaying a relational database schema and...

CS2300: File Structures and

Introduction to Database Systems

Lecture 9: Relational Model &

Relational Algebra

Doug McGeehan

1

222

Brief Review

• Relational model conceptsInformal Terms Formal Terms

Table Relation

Column Attribute

Row Tuple

All possible values in a column Domain

Table Definition Schema of Relation

Populated Table Extension/State

333

Brief Review

• Relational model concepts

• Keys and Superkeys

– e.g. Vehicle(VIN, Reg#, State, MPG, Odometer)

– Keys: {VIN} and {Reg#, State}

– Superkeys: {VIN, Odometer}

or {Reg#, State, MPG}

• They both include a key; one attribute can be removed

– Candidate keys: {VIN}, {Reg#, State}

– Primary key: {VIN}

444

Brief Review

• Relational model concepts

• Keys and Superkeys

• Relational model constraints

– Domain / NOT NULL constraints (on attributes)

– Key constraints (on a single relation)

– Entity integrity constraint (on a single relation)

– Referential integrity constraint (on two relations)

Referential Integrity

A referential integrity constraint can be

displayed in a relational database schema as

a directed arc from R1.FK to R2.

SID Name Address SID Course Grade

R1 R2

5

Valid and Invalid State

• Valid state: A database state satisfies all

integrity constraints.

• Invalid state: A database state that does not

obey some integrity constraint(s).

SID Name Address

101 Alice Rolla

SID Course Grade

111 CS238 A

101 CS304 A

R1 R2Invalid state

6

Valid and Invalid State

• Valid state: A database state satisfies all

integrity constraints.

• Invalid state: A database state that does not

obey some integrity constraint(s).

SID Name Address

101 Alice Rolla

SID Course Grade

101 CS238 A

101 CS304 A

R1 R2Valid state

7

8

Displaying a relational database

schema and its constraints

• Relation schema: displayed as a row of attribute names

• Name of the relation: above the attribute names

• Primary key attribute(s): underlined

• Foreign key (referential integrity) constraint:– A directed arc (arrow) from the foreign key attributes to the

referenced relation

– For clarity, can also point to the primary key of the referenced relation.

How to identify a foreign key?

• Start from a Candidate Key

• Check if it is referenced by other

relations or even in the same relation

9

SID Name Address SID Course Grade

R1 R2

10

Referencing and

referenced relations

can be the same

11

ExerciseConsider the following relations for a database that keeps track of student

enrollment in courses and the books adopted for each course:

STUDENT(SSN, Name, Major, Bdate)

COURSE(Course#, Cname, Dept)

ENROLL(SSN, Course#, Quarter, Grade)

BOOK_ADOPTION(Course#, Quarter, Book_ISBN)

TEXT(Book_ISBN, Book_Title, Publisher, Author)

Draw a relational schema diagram specifying the foreign keys for

this schema.

11

Question

• Car(State, Reg#, SerialNo, Make, Model, Year)

– Car has two candidate keys: {state, reg#}, {SerialNo}

• Accident(SerialNo, date)

Is SerialNo a foreign key in Car or Accident relation?

Answer: SerialNo is a foreign key in Accident relation

because it is a candidate key in Car relation.

12

Operations of the Relational Model

• Operations can be categorized into

– Retrieval

• Query a database

– Updates

• Change the database

• Basic update operations for changing the database:

– INSERT a new tuple in a relation

– DELETE an existing tuple from a relation

– MODIFY an attribute of an existing tuple

13

Update Operations on Relations

• Integrity constraints should not be

violated by the update operations.

• Updates may propagate to cause

other updates automatically.

– This may be necessary to maintain

integrity constraints.

14

Example Relations

coursesidtid

os20220

calculus45010

db 20220

db24020 depttnametid

mathcohen10

cslevy20

yearsnamesid

3white240

3jones202

1adams450

S

T

R

15

Example Relations

coursesidtid

os20220

calculus45010

db 20220

db24020 depttnametid

mathcohen10

cslevy20

yearsnamesid

3white240

3jones200

1adams450

S

T

R

16

Example Relations

coursesidtid

os20020

calculus45010

db 20020

db24020 depttnametid

mathcohen10

cslevy20

yearsnamesid

3white240

3jones200

1adams450

S

T

R

17

Possible Violations for Insert Operation

• INSERT may violate any of the constraints:

– Domain constraint / NOT NULL constraint:

• A new attribute in new tuple is not in attribute’s domain

– Key constraint:

• Key attribute in new tuple already exists in its relation

– Entity integrity:

• Primary key value is NULL in the new tuple

– Referential integrity:

• Foreign key in new tuple references non-existent primary key

18

Possible Violations for Delete Operation

• DELETE may violate only referential integrity:

– If the primary key value of the tuple being deleted is

referenced from other tuples in the database

– Can be remedied by several actions

• Reject the deletion

• Propagate the change to the referencing tuples

• Set the foreign keys of the referencing tuples to NULL

when the referencing attributes are not part of the

primary key

– One of the above options must be specified during

database design for each foreign key constraint

19

Possible Violations for Modify Operation

• MODIFY may violate any of the constraints when

– Updating the primary key (PK)

• Similar to a DELETE followed by an INSERT

• Need to specify similar remedies to DELETE

– Updating a foreign key (FK)

• May violate referential integrity

– Updating an ordinary attribute (neither PK nor FK):

• Can only violate domain constraints

or NOT NULL constraint

20

How to Deal With Violations?

• In case of integrity violation caused by any operation,

several actions can be taken:

– Reject the operation that causes the violation

– Correct the violation by triggering additional

updates

– Perform the operation but inform the user of the

violation

– Execute a user-specified error-correction routine

21

Relational Algebra

Chapter 8

22

What is an “Algebra”?

• Mathematical system consisting of:

– Operands : variables or values from which

new values can be constructed.

– Operators : symbols denoting procedures

that construct new values from given

values.

23

What is Relational Algebra?

• The relational model is an abstract

(mathematical) modeling of a table

• Relational algebra (RA): an algebra whose

operands are relations or variables that

represent relations.

– A (mathematical) query language for relations

– Query language = languages for writing questions

about the data

24

Why do we need to understand RA?

• Real queries are written in SQL, but are

translated by the query processor into

relational algebra

• Why?

– SQL is declarative, RA provides operations for

execution

– Optimization is easier in RA, since we can take

advantage of (provable) expression equivalences

25

What you should know from this course

1. How to write queries in relational algebra

2. How to calculate the result of a relational

algebra expression over a set of relations

3. How to determine whether two relational

algebra expressions are equivalent

26

Relational Algebra

• Relational algebra has a collection of operators

on relations

• Operators may be unary or binary

• The output of an operator is a relation

– Another way to say this is that the algebra is “closed”

– Therefore, operators can be composed one on

another

27

Basic Operators

• Relational Algebra has 5 basic operators:

– Project (π)

– Select ()

– Union (U)

– Set difference (-)

– Cartesian product (X)

• Other operators can be defined using the basic ones:

Intersection, Join, Division …

• A useful syntactic operator: Rename

28

Example Relations

coursesidtid

os20220

calculus45010

db 20220

db24020 depttnametid

mathcohen10

cslevy20

yearsnamesid

3white240

3jones202

1adams450

S = Students

T = Teachers

R = Studies

S

T

R

29

The Project Operation

• The Project operation is unary (i.e., it is

applied to a single relation)

• Denoted as: A1,…,An (R)

– A1,…,An are attributes

• Returns a new relation of only A1,…,An

from the original relation

• Output relation does not have a name

30

Example: Find the teacher’s ID (tid) for each course

coursetid

os20

calculus10

db20

coursesidtid

os20220

calculus45010

db20220

db24020

R

Less tuples in result. Why?

tid,course R

31

The Project Operation

• Duplicate elimination

– The Project operation removes any duplicate tuples,

so the output is a valid relation

• When computing a projection on a relation with

n tuples:

– What is the minimum cardinality of the result?

– What is the maximum cardinality?

32

The Project Operation

• list1 (list2 R) = list1 R

• Note:

– list2 must contain attributes in list1

– i.e. Intersection must not be empty

– Otherwise, the left-hand side is incorrect.

33

The Select Operation

• Unary operator, written as C(R)

– C is a Boolean condition over each single

tuple in R

– e.g. name = ‘Doug’(Instructors)

• Returns the tuples that satisfy C

34

Example

• Return the courses taught by teacher number 20

35

coursesidtid

os20220

db20220

db24020

coursesidtid

os20220

calculus45010

db20220

db24020

Rtid = 20 R

What types of Conditions can be used?

• The condition is made up of comparisons that

are connected using logical operators (and, or)

• Comparisons are between 2 attributes or

between an attribute and a constant

– attribute1 op attribute2 (e.g. balance<credit_limit)

– attribute1 op constant (e.g. tid=20)

Important! Conditions are evaluated a

single tuple at a time. 36

Types of Comparisons

• We can use any of the operators:

• When comparing an attribute with a

string, the string is written in single

quotes

= ,,,,,

)('' Tcohentname=37

The Select Operation

• Examples

– tid = 20 and course = `os’ R

38

coursesidtid

os20220

calculus45010

db20220

db24020

R

What are the results

of the above queries?

The Select Operation

• Examples

– tid = 20 and course = `os’ R

– tid = 10 and course = `db’ R

39

coursesidtid

os20220

calculus45010

db20220

db24020

R

What are the results

of the above queries?

The Select Operation

• Examples

– tid = 20 and course = `os’ R

– tid = 10 and course = `db’ R

– tid = 20 or sid=202 or sid=450 R

40

coursesidtid

os20220

calculus45010

db20220

db24020

R

What are the results

of the above queries?

The Select Operation

• c1 (c2 (R)) = c2 (c1 (R))

= c1 and c2 (R)

• When computing a selection on a relation

with n tuples,

– Minimum cardinality of result?

– Maximum cardinality of result?

41

Combining Selection and Projection

• What does this compute?

42

coursesidtid

os20220

calculus45010

db20220

db24020

Can we change

the order of

these two

operations?

course(tid = 20 R)

42

Example

• How would you find the names of the

third year students?

43

S yearsnamesid

3white240

3jones202

1adams450

sname

white

Jones

?

43

Example

• Find id of students named jones who are in

their first year or third year of studies

44

))(( )31('' Syearyearjonessnamesid ===

S yearsnamesid

3white240

3jones202

1adams450

sid

202

44

Once Again

45

• What is in the result of the query when

we have this instance of S?

S yearsnamesid

3white240

3jones202

1jones450

sid

202

45045

))(( )31('' Syearyearjonessnamesid ===

What’s Next

• More relational algebra operators …

• Project Phase 1 report due Monday

11:59pm

46