Week 5-relational algebra


Page 1: Week 5-relational algebra

1

Relational Algebra

• Basic Operators

–Select, project, cross, diff, union, rename

• Advanced Operators

– Joins (inner, outer)

• Extended Relational Algebra

Relational Query Languages

• Query languages: Allow manipulation and retrieval of data from a database.
• Relational model supports simple, powerful QLs:
– Strong formal foundation based on logic.
– Allows for much optimization.
• Query languages != programming languages
– QLs not expected to be “Turing complete”.
– QLs not intended to be used for complex calculations.
– QLs support easy, efficient access to large data sets.

Formal Relational Query Languages

• Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation:
– Relational Algebra: More operational (procedural), useful for representing execution plans.
– Relational Calculus: Allows users to describe what they want, rather than how to compute it: non-operational, declarative.

Preliminaries

• A query is applied to relation instances, and the result of a query is also a relation instance.
– Schemas of input relations for a query are fixed.
– The schema for the result of a given query is also fixed! It is determined by the definition of the query language constructs.
• Positional vs. named-field notation:
– Positional notation is easier for formal definitions, named-field notation is more readable.
– Both are used in SQL.

Relational Algebra

o Procedural language
o Six basic operators
o select: σ
o project: π
o union: ∪
o set difference: −
o Cartesian product: ×
o rename: ρ
o The operators take one or two relations as inputs and produce a new relation as a result.

Relational Algebra

• Basic operations:
– Selection (σ): Selects a subset of rows from relation.
– Projection (π): Deletes unwanted columns from relation.
– Cross-product (×): Allows us to combine two relations.
– Set-difference (−): Tuples in reln. 1, but not in reln. 2.
– Union (∪): Tuples in reln. 1 or in reln. 2 (or in both).
– Renaming (ρ): Not essential, but (very!) useful.
• Additional operations:
– Intersection, join, division
• Since each operation returns a relation, operations can be composed: the algebra is “closed”.


Formal Definition

• A basic expression in the relational algebra consists of one of the following:
– A relation in the database
– A constant relation
• Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:
– E1 ∪ E2
– E1 − E2
– E1 × E2
– σ_P(E1), where P is a predicate on attributes in E1
– π_S(E1), where S is a list consisting of some of the attributes in E1
– ρ_x(E1), where x is the new name for the result of E1

Select Operation

• Notation: σ_p(r)
• p is called the selection predicate
• Defined as:
σ_p(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not).
Each term is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤.
• Example of selection:
σ_{branch_name = “Perryridge”}(account)
The select operation returns a relation containing the tuples of the original relation that satisfy the given predicate.
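The definition of selection can be tried out directly in code. The sketch below is only an illustration: it models a relation as a Python set of tuples with a separate tuple of attribute names, and the account rows and the helper name `select` are assumptions made for this example, not part of the slides.

```python
# Hypothetical account instance; attribute order follows the account schema.
ACCOUNT_ATTRS = ("account_number", "branch_name", "balance")
account = {
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
}

def select(attrs, rows, predicate):
    """sigma_p(r): the set of tuples t in r for which p(t) is true."""
    # Each tuple is presented to the predicate as an attribute -> value dict.
    return {t for t in rows if predicate(dict(zip(attrs, t)))}

# sigma_{branch_name = "Perryridge"}(account)
perryridge = select(ACCOUNT_ATTRS, account,
                    lambda t: t["branch_name"] == "Perryridge")
print(sorted(perryridge))
```

The result has the same attributes as the input; only the set of rows shrinks.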

Select Operation – Example

Relation r:

A  B  C   D
α  α  1   7
α  β  5   7
β  β  12  3
β  β  23  10

σ_{A=B ∧ D>5}(r):

A  B  C   D
α  α  1   7
β  β  23  10

Project Operation

• Notation: π_{A1, A2, …, Ak}(r)
where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained by erasing the columns that are not listed.
• Duplicate rows are removed from the result, since relations are sets.
• Example: To eliminate the branch_name attribute of account:
π_{account_number, balance}(account)
Returns a relation with only the specified attributes.
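A small sketch of projection, under the same assumed modelling (a set of tuples plus a tuple of attribute names). The rows and the helper name `project` are hypothetical; building the result as a set is what removes duplicate rows.

```python
# Hypothetical account instance, used only for illustration.
ACCOUNT_ATTRS = ("account_number", "branch_name", "balance")
account = {
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
}

def project(attrs, rows, keep):
    """pi_keep(r): keep only the listed columns; the set drops duplicates."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in positions) for t in rows}

# pi_{account_number, balance}(account): branch_name is erased.
no_branch = project(ACCOUNT_ATTRS, account, ("account_number", "balance"))

# pi_{branch_name}(account): three input rows collapse to two distinct values.
branches = project(ACCOUNT_ATTRS, account, ("branch_name",))
print(sorted(branches))
```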

Project Operation – Example

• Relation r:

A  B   C
α  10  1
α  20  1
β  30  1
β  40  2

• π_{A,C}(r):

A  C
α  1
α  1
β  1
β  2

which, after duplicate elimination, is

A  C
α  1
β  1
β  2

Union Operation

• Notation: r ∪ s
• Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: the 2nd column of r deals with the same type of values as does the 2nd column of s)
• Example: to find all customers with either an account or a loan:
π_{customer_name}(depositor) ∪ π_{customer_name}(borrower)
Results in a relation with all of the tuples that appear in either or both of the argument relations.


Union Operation – Example

• Relations r, s:

r:
A  B
α  1
α  2
β  1

s:
A  B
α  2
β  3

• r ∪ s:

A  B
α  1
α  2
β  1
β  3

Set Difference Operation

• Notation: r − s
• Defined as:
r − s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible relations.
– r and s must have the same arity
– attribute domains of r and s must be compatible
r − s produces all tuples that are in r but not in s.
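Union and set difference map onto Python's built-in set operators once the compatibility requirement is checked. The following is a sketch under assumptions: the customer names are made up for illustration, only the arity condition is checked (domain compatibility is not), and the helper names are hypothetical.

```python
# Hypothetical results of pi_{customer_name}(depositor) and
# pi_{customer_name}(borrower); each row is a 1-tuple.
depositor_names = {("Hayes",), ("Johnson",), ("Jones",)}
borrower_names = {("Hayes",), ("Jones",), ("Smith",)}

def check_same_arity(r, s):
    """Reject inputs whose tuples do not all have the same number of fields."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same arity")

def union(r, s):
    """r UNION s: tuples that appear in r, in s, or in both."""
    check_same_arity(r, s)
    return r | s

def difference(r, s):
    """r MINUS s: tuples that appear in r but not in s."""
    check_same_arity(r, s)
    return r - s

either = union(depositor_names, borrower_names)
loan_only = difference(borrower_names, depositor_names)
print(sorted(either), sorted(loan_only))
```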

Set Difference – Example

• Relations r, s:

r:
A  B
α  1
α  2
β  1

s:
A  B
α  2
β  3

• r − s:

A  B
α  1
β  1

Cartesian-Product Operation

• Notation: r × s
• Defined as:
r × s = {t q | t ∈ r and q ∈ s}
• Assume that attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅.)
• If attributes of r and s are not disjoint, then renaming must be used.
Combines any two relations.
Output has the attributes of both relations.
Repeated attribute names are preceded by the relation they originated from.
Example: r = borrower × loan, with schema
(borrower.customer-name, borrower.loan-number, loan.loan-number, loan.branch-name, loan.amount)
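The borrower × loan example can be sketched as follows. The rows are the loan and borrower instances that appear in the outer-join example of these slides; the helper name `product` is an assumption, and for simplicity it prefixes every attribute with its relation name, which matches the schema listed above.

```python
BORROWER_ATTRS = ("customer-name", "loan-number")
LOAN_ATTRS = ("loan-number", "branch-name", "amount")
borrower = {("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")}
loan = {
    ("L-170", "Downtown", 3000),
    ("L-230", "Redwood", 4000),
    ("L-260", "Perryridge", 1700),
}

def product(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows):
    """r x s: every tuple of r concatenated with every tuple of s."""
    attrs = (tuple(f"{r_name}.{a}" for a in r_attrs)
             + tuple(f"{s_name}.{a}" for a in s_attrs))
    rows = {t + q for t in r_rows for q in s_rows}
    return attrs, rows

attrs, rows = product("borrower", BORROWER_ATTRS, borrower,
                      "loan", LOAN_ATTRS, loan)
print(attrs)
print(len(rows))  # |borrower| * |loan| tuples
```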

Cartesian-Product Operation – Example

Relations r, s:

r:
A  B
α  1
β  2

s:
C  D   E
α  10  a
β  10  a
β  20  b
γ  10  b

r × s:

A  B  C  D   E
α  1  α  10  a
α  1  β  10  a
α  1  β  20  b
α  1  γ  10  b
β  2  α  10  a
β  2  β  10  a
β  2  β  20  b
β  2  γ  10  b

Rename Operation

• Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
• Example:
ρ_x(E)
returns the expression E under the name x.
• If a relational-algebra expression E has arity n, then
ρ_{x(A1, A2, …, An)}(E)
returns the result of expression E under the name x, and with the attributes renamed to A1, A2, …, An.
Useful for naming the unnamed relations returned from other operations.


Composition of Operations

• Can build expressions using multiple operations
• Example: σ_{A=C}(r × s)

• r × s:

A  B  C  D   E
α  1  α  10  a
α  1  β  10  a
α  1  β  20  b
α  1  γ  10  b
β  2  α  10  a
β  2  β  10  a
β  2  β  20  b
β  2  γ  10  b

• σ_{A=C}(r × s):

A  B  C  D   E
α  1  α  10  a
β  2  β  10  a
β  2  β  20  b

• Results of relational operations are relations themselves.
• Compositions of operations form a relational-algebra expression.

Set-Intersection Operation

• Notation: r ∩ s
• Defined as:
r ∩ s = {t | t ∈ r and t ∈ s}
• Assume:
– r, s have the same arity
– attributes of r and s are compatible
• Note: r ∩ s = r − (r − s)
Results in a relation that contains only the tuples that appear in both relations.
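The identity r ∩ s = r − (r − s) is why intersection is not counted among the basic operators. A quick check in code, using made-up rows (the string labels "a" and "b" are placeholders chosen for this sketch):

```python
# Two small union-compatible relations over attributes (A, B).
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

# Intersection computed directly ...
direct = r & s

# ... and expressed with set difference only: r - (r - s).
# (r - s) holds the tuples of r that are missing from s; removing
# those from r leaves exactly the tuples r shares with s.
via_difference = r - (r - s)

print(direct, via_difference)
```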

Set-Intersection – Example

• Relations r, s:

r:
A  B
α  1
α  2
β  1

s:
A  B
α  2
β  3

• r ∩ s:

A  B
α  2

Example Instances

• “Sailors” and “Reserves” relations for our examples.

R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2:
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

Projection

• Deletes attributes that are not in projection list.
• Schema of result contains exactly the fields in the projection list, with the same names that they had in the input relation.
• Projection operator has to eliminate duplicates! Why?
– Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it (by DISTINCT). Why not?

π_{sname, rating}(S2):
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

π_{age}(S2):
age
35.0
55.5

Selection

• Selects rows that satisfy selection condition.
• No duplicates in result! Why?
• Schema of result identical to schema of input relation.
• What is operator composition?
• Selection is distributive over binary operators
• Selection is commutative

σ_{rating > 8}(S2):
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

π_{sname, rating}(σ_{rating > 8}(S2)):
sname  rating
yuppy  9
rusty  10


Union, Intersection, Set-Difference

• All of these operations take two input relations, which must be union-compatible:
– Same number of fields.
– “Corresponding” fields have the same type.
• What is the schema of result?

S1 ∪ S2:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

S1 ∩ S2:
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0

S1 − S2:
sid  sname   rating  age
22   dustin  7       45.0

Banking Example

branch (BN, BC, Assets)

customer (CN, CS, CC)

account (AN, BN, balance)

depositor (CN, AN)

loan (LN, BN, amount)

borrower (CN, LN)

Banking Example

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower (customer_name, loan_number)

Example Queries

• Find all loans of over $1200
σ_{amount > 1200}(loan)
• Find the loan number for each loan of an amount greater than $1200
π_{loan_number}(σ_{amount > 1200}(loan))
• Find the names of all customers who have a loan, an account, or both, from the bank
π_{customer_name}(borrower) ∪ π_{customer_name}(depositor)
• Find the names of all customers who have a loan at the Perryridge branch.
π_{customer_name}(σ_{branch_name = “Perryridge”}(σ_{borrower.loan_number = loan.loan_number}(borrower × loan)))
• Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank.
π_{customer_name}(σ_{branch_name = “Perryridge”}(σ_{borrower.loan_number = loan.loan_number}(borrower × loan))) − π_{customer_name}(depositor)
• Find the names of all customers who have a loan at the Perryridge branch. The same query can be written in two equivalent ways:
π_{customer_name}(σ_{loan.loan_number = borrower.loan_number}((σ_{branch_name = “Perryridge”}(loan)) × borrower))
π_{customer_name}(σ_{branch_name = “Perryridge”}(σ_{borrower.loan_number = loan.loan_number}(borrower × loan)))


Bank Example Queries

o Find the largest account balance
o Strategy:
o Find those balances that are not the largest
o Rename account relation as d so that we can compare each account balance with all others
o Use set difference to find those account balances that were not found in the earlier step.
o The query is:
π_{balance}(account) − π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_d(account)))
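The strategy can be traced step by step in code. This is a sketch, not an efficient way to compute a maximum: it uses the account rows from the aggregation example in these slides, and the variable names are assumptions made for the illustration.

```python
# account(account_number, branch_name, balance)
account = [
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-217", "Brighton", 750),
    ("A-215", "Brighton", 750),
    ("A-222", "Redwood", 700),
]

# rho_d(account): a second name for the same relation.
d = account

# account x d, then sigma_{account.balance < d.balance}, then
# pi_{account.balance}: every balance that is smaller than some other one.
not_largest = {a[2] for a in account for b in d if a[2] < b[2]}

# pi_{balance}(account) minus the balances that are not the largest.
all_balances = {a[2] for a in account}
largest = all_balances - not_largest
print(largest)
```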

Natural-Join Operation

• Notation: r ⋈ s
• Let r and s be relations on schemas R and S respectively. Then r ⋈ s is a relation on schema R ∪ S obtained as follows:
– Consider each pair of tuples t_r from r and t_s from s.
– If t_r and t_s have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where
• t has the same value as t_r on the attributes of R
• t has the same value as t_s on the attributes of S
• Example:
R = (A, B, C, D)
S = (E, B, D)
– Result schema = (A, B, C, D, E)
– r ⋈ s is defined as:
π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))
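The definition of natural join as product, then selection, then projection can be followed literally in a short sketch. The helper name `natural_join` and the sample rows are hypothetical; the schemas R = (A, B, C, D) and S = (E, B, D) are the ones from the example.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """r JOIN s: pair up tuples that agree on all shared attributes."""
    common = [a for a in r_attrs if a in s_attrs]     # R intersect S
    extra = [a for a in s_attrs if a not in r_attrs]  # attributes only in S
    out_attrs = tuple(r_attrs) + tuple(extra)         # schema R union S
    out_rows = set()
    for t in r_rows:                                  # r x s
        for q in s_rows:
            tr, ts = dict(zip(r_attrs, t)), dict(zip(s_attrs, q))
            if all(tr[a] == ts[a] for a in common):   # selection
                out_rows.add(t + tuple(ts[a] for a in extra))  # projection
    return out_attrs, out_rows

R = ("A", "B", "C", "D")
S = ("E", "B", "D")
r = {(1, "x", 10, "p"), (2, "y", 20, "q")}
s = {("e1", "x", "p"), ("e2", "x", "p"), ("e3", "y", "z")}

attrs, rows = natural_join(R, r, S, s)
print(attrs)
print(sorted(rows))
```

The second tuple of r matches no tuple of s on both B and D, so it does not appear in the result.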

Natural Join Operation – Example

• Relations r, s:

r:
A  B  C  D
α  1  α  a
β  2  γ  a
γ  4  β  b
α  1  γ  a
δ  2  β  b

s:
B  D  E
1  a  α
3  a  β
1  a  γ
2  b  δ
3  b  ε

r ⋈ s:
A  B  C  D  E
α  1  α  a  α
α  1  α  a  γ
α  1  γ  a  α
α  1  γ  a  γ
δ  2  β  b  δ

• Find the names of all customers who have a loan at the bank, along with the loan number and the loan amount
π_{customer_name, loan_number, amount}(borrower ⋈ loan)
or, written with the basic operators only:
π_{customer_name, loan.loan_number, amount}(σ_{borrower.loan_number = loan.loan_number}(borrower × loan))

Theta Join

• Condition join:
R ⋈_c S = σ_c(R × S)
• Result schema same as that of cross-product.
• Fewer tuples than cross-product, might be able to compute more efficiently
• Sometimes called a theta-join.

S1 ⋈_{S1.sid < R1.sid} R1:
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96
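The condition join S1 ⋈ R1 on S1.sid < R1.sid can be reproduced with the S1 and R1 instances from the slides. The helper name `theta_join` is an assumption; the sketch applies the condition while enumerating the cross-product, which gives the same result as filtering afterwards.

```python
# S1(sid, sname, rating, age) and R1(sid, bid, day) from the example instances.
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def theta_join(r, s, condition):
    """sigma_c(r x s): concatenated pairs of tuples that satisfy c."""
    return [t + q for t in r for q in s if condition(t, q)]

# Condition c: S1.sid < R1.sid (sid is the first field in both relations).
result = theta_join(S1, R1, lambda sailor, res: sailor[0] < res[0])
for row in result:
    print(row)
```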

Extended RA Operations

• Generalized Projection

• Outer Join

• Aggregate Functions


Generalized Projection

• Extends the projection operation by allowing arithmetic functions to be used in the projection list.
π_{F1, F2, …, Fn}(E)
• E is any relational-algebra expression
• Each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the schema of E
• Given relation credit-info(customer-name, limit, credit-balance), find how much more each person can spend:
π_{customer-name, limit − credit-balance}(credit-info)
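A minimal sketch of the credit-info query. The two rows are invented for illustration; only the schema credit-info(customer-name, limit, credit-balance) comes from the slide.

```python
# Hypothetical credit-info(customer-name, limit, credit-balance) instance.
credit_info = {("Jones", 6000, 700), ("Smith", 2000, 400)}

# pi_{customer-name, limit - credit-balance}(credit-info):
# the second output column is computed rather than copied.
remaining = {(name, limit - balance) for name, limit, balance in credit_info}
print(sorted(remaining))
```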

Aggregate Functions and Operations

• An aggregation function takes a collection of values and returns a single value as a result.
– Ex: Avg, Min, Max, Sum, Count
• Aggregate operation in relational algebra:
_{G1, G2, …, Gn} 𝒢_{F1(A1), F2(A2), …, Fn(An)}(E)
– E is any relational-algebra expression
– G1, G2, …, Gn is a list of attributes on which to group (can be empty)
– Each Fi is an aggregate function
– Each Ai is an attribute name

Aggregate Operation – Examples

• Relation r:

A  B  C
α  α  7
α  β  7
β  β  3
β  β  10

𝒢_{sum(C)}(r):

sum-C
27

• List all the branch names along with the total balance of all accounts
_{branch-name} 𝒢_{sum(balance)}(account)

Relation account:
branch-name  account-number  balance
Perryridge   A-102           400
Perryridge   A-201           900
Brighton     A-217           750
Brighton     A-215           750
Redwood      A-222           700

Result:
branch-name  balance
Perryridge   1300
Brighton     1500
Redwood      700

• Result of aggregation does not have a name
– Can use rename operation to give it a name
– For convenience, we permit renaming as part of aggregate operation
_{branch-name} 𝒢_{sum(balance) as sum-balance}(account)
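Grouping by branch-name and summing balance can be sketched with a dictionary keyed on the grouping attribute. The rows are the account instance from the slide; the variable names are assumptions of this example.

```python
from collections import defaultdict

# account(branch-name, account-number, balance), as in the slide.
account = [
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
]

# Group on branch-name, then apply sum to the balance values of each group.
sum_balance = defaultdict(int)
for branch_name, _account_number, balance in account:
    sum_balance[branch_name] += balance

for branch_name, total in sum_balance.items():
    print(branch_name, total)
```

Note that the two Brighton accounts both contribute 750: aggregation works on the multiset of values in a group, so equal balances are not collapsed before summing.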

Null Values

o It is possible for tuples to have a null value, denoted by null, for some of their attributes
o null signifies an unknown value or that a value does not exist
o The result of any arithmetic expression involving null is null
o Aggregate functions simply ignore null values
o For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same
o We follow the semantics of SQL in its handling of null values
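These rules can be mimicked in a sketch that uses Python's `None` as a stand-in for null. The helper names are hypothetical, and returning null for a sum over no non-null values is an assumption modelled on SQL's SUM.

```python
def null_add(x, y):
    """Arithmetic involving null yields null."""
    return None if x is None or y is None else x + y

def agg_sum(values):
    """Aggregates ignore nulls; with no non-null input the result is null."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

print(null_add(3000, None))           # null propagates through arithmetic
print(agg_sum([3000, None, 1700]))    # the null amount is skipped

# Duplicate elimination treats two nulls as the same value:
rows = {("L-155", None), ("L-155", None)}
print(len(rows))
```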


Outer Join

• An extension of the join operation that avoids loss of information.
• Computes the join and then adds to the result of the join the tuples from one relation that do not match any tuple in the other relation.
• Uses null values:
– null signifies that the value is unknown or does not exist

Outer Join – Example

Relation loan:
loan-number  branch-name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700

Relation borrower:
customer-name  loan-number
Jones          L-170
Smith          L-230
Hayes          L-155

Inner Join: loan ⋈ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith

Left Outer Join: loan ⟕ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null

Right Outer Join: loan ⟖ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes

Full Outer Join: loan ⟗ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
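The four joins of this example can be computed from the inner join plus padding of the unmatched tuples. This is a sketch specialised to the loan and borrower relations (it is not a general join routine), with `None` standing in for null and variable names chosen for the illustration.

```python
loan = [("L-170", "Downtown", 3000), ("L-230", "Redwood", 4000),
        ("L-260", "Perryridge", 1700)]
borrower = [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")]

# Inner (natural) join on loan-number:
# rows are (loan-number, branch-name, amount, customer-name).
inner = [(ln, br, amt, cn)
         for (ln, br, amt) in loan
         for (cn, bln) in borrower if ln == bln]
matched = {row[0] for row in inner}  # loan numbers that found a partner

# Left outer join: keep unmatched loan tuples, padding customer-name with null.
left_outer = inner + [(ln, br, amt, None)
                      for (ln, br, amt) in loan if ln not in matched]

# Right outer join: keep unmatched borrower tuples, padding the loan attributes.
right_outer = inner + [(bln, None, None, cn)
                       for (cn, bln) in borrower if bln not in matched]

# Full outer join: unmatched tuples from both sides.
full_outer = left_outer + [row for row in right_outer if row not in inner]

for row in full_outer:
    print(row)
```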

Division Operation

• Notation: r ÷ s
• Suited to queries that include the phrase “for all”
• Let r and s be relations on schemas R and S respectively where
– R = (A1, …, Am, B1, …, Bn)
– S = (B1, …, Bn)
• The result of r ÷ s is a relation on schema R − S = (A1, …, Am) such that
r ÷ s = { t | t ∈ π_{R−S}(r) ∧ ∀u ∈ s (tu ∈ r) }
• Find all customers who have an account at all branches located in Brooklyn city.
π_{customer-name, branch-name}(depositor ⋈ account) ÷ π_{branch-name}(σ_{branch-city = “Brooklyn”}(branch))


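Division Operation – Example 1

[Tables: relation r(A, B) with B values 1, 2, 3, 1, 1, 1, 3, 4, 6, 1, 2; relation s(B) with values 1, 2; result r ÷ s over the single attribute A. The A values are not recoverable here.]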

Division Operation – Example 2

Relations r, s:

r:
A  B  C  D  E
α  a  α  a  1
α  a  γ  a  1
α  a  γ  b  1
β  a  γ  a  1
β  a  γ  b  3
γ  a  γ  a  1
γ  a  γ  b  1
γ  a  β  b  1

s:
D  E
a  1
b  1

r ÷ s:
A  B  C
α  a  γ
γ  a  γ

Assignment Operation

• Notation: ← or :=
• The assignment operation provides a convenient way to express complex queries
– Write query as a sequential program consisting of
• a series of assignments
• followed by an expression whose value is displayed as a result of the query.
– Assignment must be made to a temporary relation variable.

Assignment Operation

• Example: Write r ÷ s as
temp1 ← π_{R−S}(r)
temp2 ← π_{R−S}((temp1 × s) − π_{R−S,S}(r))
result = temp1 − temp2
– The result to the right of the ← is assigned to the relation variable on the left of the ←.
– May use variable in subsequent expressions.
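The three assignments translate almost line for line into code. The sketch below uses the r(A, B) and s(B) instance shown elsewhere in these slides for division; the intermediate name `candidates` is an assumption added for readability.

```python
# r(A, B) and s(B); here R - S = (A) and S = (B).
r = {("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1), ("δ", 1),
     ("δ", 3), ("δ", 4), ("δ", 6), ("ε", 1), ("ε", 2)}
s = {(1,), (2,)}

# temp1 <- pi_{R-S}(r): every A value that occurs in r.
temp1 = {(a,) for (a, b) in r}

# temp1 x s: every (A, B) pair that would have to be in r for A to qualify.
candidates = {t + u for t in temp1 for u in s}

# temp2 <- pi_{R-S}((temp1 x s) - pi_{R-S,S}(r)):
# A values for which some required pair is missing from r.
temp2 = {(a,) for (a, b) in candidates - r}

# result = temp1 - temp2
result = temp1 - temp2
print(sorted(result))
```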


Assignment Operator

• It is often convenient to write relational algebra expressions in parts, using assignment to temporary relation variables.
• For this purpose use the assignment operator, denoted by :=
• E.g., Who makes more than their manager?
– Schemas: E(emp, dept, sal), M(mgr, dept)
– ESM(emp, sal, mgr) := π_{emp, sal, mgr}(E ⋈ M)
– π_{ESM.emp}(ESM ⋈_{mgr = E.emp ∧ ESM.sal > E.sal} E)

Division (con’t)

Relations r, s:

r:
A  B
α  1
α  2
α  3
β  1
γ  1
δ  1
δ  3
δ  4
δ  6
ε  1
ε  2

s:
B
1
2

q = r ÷ s:
A
α
ε

Properties of Division Operation

• Let q = r ÷ s
Then q is the largest relation satisfying: q × s ⊆ r

Schema for Student Registration System

Student (Id, Name, Addr, Status)

Professor (Id, Name, DeptId)

Course (DeptId, CrsCode, CrsName, Descr)

Transcript (StudId, CrsCode, Semester, Grade)

Teaching (ProfId, CrsCode, Semester)

Department (DeptId, Name)

Division - Example

• List the Ids of students who have passed all courses that were taught in second semester 2007-08
• Numerator: StudId and CrsCode for every course passed by every student
– π_{StudId, CrsCode}(σ_{Grade ≠ 'NC'}(Transcript))
• Denominator: CrsCode of all courses taught in second semester 2007-08
– π_{CrsCode}(σ_{Semester = 'S2007'}(Teaching))
• Result is numerator/denominator

Division

• Not supported as a primitive operator, but useful for expressing queries like:
Find sailors who have reserved all boats.
• Let A have 2 fields, x and y; B have only field y:
– A/B = { ⟨x⟩ | ∀⟨y⟩ ∈ B (⟨x, y⟩ ∈ A) }
– i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A.
– Or: If the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is in A/B.
• In general, x and y can be any lists of fields; y is the list of fields in B, and x ∪ y is the list of fields of A.

Examples of Division A/B

A:
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1:
pno
p2

B2:
pno
p2
p4

B3:
pno
p1
p2
p4

A/B1:
sno
s1
s2
s3
s4

A/B2:
sno
s1
s4

A/B3:
sno
s1

Expressing A/B Using Basic Operators

• Division is not an essential op; just a useful shorthand.
– (Also true of joins, but joins are so common that systems implement joins specially.)
• Idea: For A/B, compute all x values that are not “disqualified” by some y value in B.
– An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A.
Disqualified x values: π_x((π_x(A) × B) − A)
A/B: π_x(A) − all disqualified tuples
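The disqualification idea can be wrapped in a small function and checked against the sno/pno tables from the division examples. The function name `divide` is an assumption; A is modelled as a set of (x, y) pairs and B as a set of 1-tuples.

```python
def divide(a, b):
    """A/B for A(x, y) and B(y), using only projection, product, difference."""
    xs = {(x,) for (x, y) in a}                      # pi_x(A)
    required = {t + u for t in xs for u in b}        # pi_x(A) x B
    disqualified = {(x,) for (x, y) in required - a} # pi_x((pi_x(A) x B) - A)
    return xs - disqualified

A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

for b in (B1, B2, B3):
    print(sorted(divide(A, b)))
```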


Examples

Sailors:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Boats:
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red

Reserves:
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Find names of sailors who’ve reserved boat #103

• Solution 1: π_{sname}((σ_{bid=103}(Reserves)) ⋈ Sailors)
• Solution 2: π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))
• Solution 3:
ρ(Temp1, σ_{bid=103}(Reserves))
ρ(Temp2, Temp1 ⋈ Sailors)
π_{sname}(Temp2)
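That the solutions agree can be checked on the Sailors and Reserves instances from the slides. This is a sketch with a hand-written join on sid (the helper name `join_on_sid` is hypothetical); it compares selecting before the join with selecting after it.

```python
# Sailors(sid, sname, rating, age) and Reserves(sid, bid, day).
sailors = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
reserves = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def join_on_sid(res, sail):
    """Natural join of Reserves and Sailors; sid is their only shared field.
    Result rows are (sid, bid, day, sname, rating, age)."""
    return [r + s[1:] for r in res for s in sail if r[0] == s[0]]

# Solution 1: select bid = 103 first, then join, then project sname.
only_103 = [r for r in reserves if r[1] == 103]
solution1 = {row[3] for row in join_on_sid(only_103, sailors)}

# Solution 2: join first, then select bid = 103, then project sname.
solution2 = {row[3] for row in join_on_sid(reserves, sailors) if row[1] == 103}

print(solution1, solution2)
```

Selecting first means the join only has to look at the reservations for boat 103, which is the kind of rewrite a query optimizer is expected to find.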

Find names of sailors who’ve reserved a red boat

• Information about boat color only available in Boats; so need an extra join:
π_{sname}((σ_{color='red'}(Boats)) ⋈ Reserves ⋈ Sailors)
• A more efficient solution:
π_{sname}(π_{sid}((π_{bid}(σ_{color='red'}(Boats))) ⋈ Reserves) ⋈ Sailors)
A query optimizer can find this given the first solution!

Find sailors who’ve reserved a red or a green boat

• Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
ρ(Tempboats, σ_{color='red' ∨ color='green'}(Boats))
π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)
Can also define Tempboats using union! (How?)

Find sailors who’ve reserved a red and a green boat

• Cut-and-paste previous slide?
ρ(Tempboats, σ_{color='red' ∧ color='green'}(Boats))
π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)

Find sailors who’ve reserved a red and a green boat

• Previous approach won’t work! Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
ρ(Tempred, π_{sid}((σ_{color='red'}(Boats)) ⋈ Reserves))
ρ(Tempgreen, π_{sid}((σ_{color='green'}(Boats)) ⋈ Reserves))
π_{sname}((Tempred ∩ Tempgreen) ⋈ Sailors)


Find the names of sailors who’ve reserved all boats

• Uses division; schemas of the input relations to / must be carefully chosen:
ρ(Tempsids, (π_{sid, bid}(Reserves)) / (π_{bid}(Boats)))
π_{sname}(Tempsids ⋈ Sailors)
To find sailors who’ve reserved all ‘Interlake’ boats:
..... / π_{bid}(σ_{bname='Interlake'}(Boats))