Week 5-relational algebra
Uploaded by garapatiavinash
Relational Algebra
• Basic Operators
–Select, project, cross, diff, union, rename
• Advanced Operators
– Joins (inner, outer)
• Extended Relational Algebra
Relational Query Languages
• Query languages: Allow manipulation and
retrieval of data from a database.
• Relational model supports simple, powerful QLs:
– Strong formal foundation based on logic.
– Allows for much optimization.
• Query languages != programming languages:
– QLs not expected to be “Turing complete”.
– QLs not intended to be used for complex calculations.
– QLs support easy, efficient access to large data sets.
Formal Relational Query Languages
• Two mathematical Query Languages form
the basis for “real” languages (e.g. SQL),
and for implementation:
– Relational Algebra: More operational
(procedural), useful for representing execution
plans.
– Relational Calculus: Allows users to describe
what they want, rather than how to compute it:
Non-operational, declarative.
Preliminaries
• A query is applied to relation instances, and the result of
a query is also a relation instance.
– Schemas of input relations for a query are fixed.
– The schema for the result of a given query is also
fixed! - determined by definition of query language
constructs.
• Positional vs. named-field notation:
– Positional notation easier for formal definitions,
named-field notation more readable.
– Both used in SQL
Relational Algebra
o Procedural language
o Six basic operators
o select: σ
o project: π
o union: ∪
o set difference: –
o Cartesian product: ×
o rename: ρ
o The operators take one or two relations as inputs
and produce a new relation as a result.
Relational Algebra
• Basic operations:
– Selection (σ): Selects a subset of rows from relation.
– Projection (π): Deletes unwanted columns from relation.
– Cross-product (×): Allows us to combine two relations.
– Set-difference (–): Tuples in reln. 1, but not in reln. 2.
– Union (∪): Tuples in reln. 1 or in reln. 2 (or both).
– Renaming (ρ): Not essential, but (very!) useful.
• Additional operations:
– Intersection, join, division
• Since each operation returns a relation, operations can be
composed: the algebra is “closed”.
Formal Definition
• A basic expression in the relational algebra consists of
one of the following:
– A relation in the database
– A constant relation
• Let E1 and E2 be relational-algebra expressions; the
following are all relational-algebra expressions:
– E1 ∪ E2
– E1 – E2
– E1 × E2
– σ_P(E1), where P is a predicate on attributes in E1
– π_S(E1), where S is a list consisting of some of the attributes in E1
– ρ_x(E1), where x is the new name for the result of E1
Select Operation
• Notation: σ_p(r)
• p is called the selection predicate
• Defined as:
σ_p(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not). Each term is one of:
<attribute> op <attribute> or <attribute> op <constant>
where op is one of: =, ≠, >, ≥, <, ≤.
• Example of selection:
σ_{branch_name=“Perryridge”}(account)
The select operation returns the tuples of the original
relation that satisfy the given predicate.
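The definition above can be sketched in a few lines of Python. This is only an illustration: it assumes a relation is modelled as a list of dicts, and the helper name `select` is hypothetical. The sample rows are borrowed from the account relation used in the aggregation slides.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
def select(relation, predicate):
    """sigma_p(r): keep the tuples t of r for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]

account = [
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-217", "branch_name": "Brighton", "balance": 750},
]

# sigma_{branch_name = "Perryridge"}(account)
perryridge = select(account, lambda t: t["branch_name"] == "Perryridge")
```

The predicate is an ordinary function of one tuple, so conjunctions and disjunctions of comparisons can be written with `and` / `or`.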
Select Operation – Example
• Relation r:
A  B  C   D
…  …  1   7
…  …  5   7
…  …  12  3
…  …  23  10
• σ_{A=B ∧ D>5}(r):
A  B  C   D
…  …  1   7
…  …  23  10
Project Operation
• Notation: π_{A1, A2, …, Ak}(r)
where A1, A2, …, Ak are attribute names and r is a relation name.
• The result is defined as the relation of k columns obtained
by erasing the columns that are not listed
• Duplicate rows removed from result, since relations are
sets
• Example: To eliminate the branch_name attribute of
account
π_{account_number, balance}(account)
Returns a relation with only the specified attributes.
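A minimal Python sketch of projection with duplicate elimination follows. The list-of-dicts representation and the helper name `project` are assumptions made for illustration; the two sample rows come from the account relation shown in the aggregation slides.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
def project(relation, attrs):
    """pi_attrs(r): keep only attrs and drop duplicate rows (set semantics)."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:          # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

account = [
    {"account_number": "A-215", "branch_name": "Brighton", "balance": 750},
    {"account_number": "A-217", "branch_name": "Brighton", "balance": 750},
]

# Dropping account_number makes the two rows identical, so one survives.
branch_balance = project(account, ["branch_name", "balance"])
```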
Project Operation – Example
• Relation r:
A  B   C
…  10  1
…  20  1
…  30  1
…  40  2
• π_{A,C}(r), before and after duplicate elimination:
A  C
…  1
…  1
…  1
…  2
=
A  C
…  1
…  1
…  2
Union Operation
• Notation: r ∪ s
• Defined as:
r ∪ s = {t | t ∈ r or t ∈ s}
• For r ∪ s to be valid:
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd column
of r deals with the same type of values as does the 2nd
column of s)
• Example: to find all customers with either an account or a
loan
π_{customer_name}(depositor) ∪ π_{customer_name}(borrower)
Results in a relation with all of the tuples that appear in either
or both of the argument relations.
Union Operation – Example
• Relations r, s:
r:
A  B
…  1
…  2
…  1
s:
A  B
…  2
…  3
• r ∪ s:
A  B
…  1
…  2
…  1
…  3
Set Difference Operation
• Notation: r – s
• Defined as:
r – s = {t | t ∈ r and t ∉ s}
• Set differences must be taken between compatible
relations.
– r and s must have the same arity
– attribute domains of r and s must be compatible
R – S produces all tuples in R but not in S
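Union and set difference map directly onto Python's built-in set operations. The sketch below assumes a relation is a set of value tuples; the helper names and the sample values are hypothetical. Only the arity half of the compatibility condition is checked, since attribute domains are not modelled here.

```python
# Assumption: a relation is a set of plain tuples (positional notation).
def check_compatible(r, s):
    """Raise ValueError if the tuples of r and s do not all have the same arity."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same arity")

def union(r, s):
    check_compatible(r, s)
    return r | s          # tuples in r or in s (or both)

def difference(r, s):
    check_compatible(r, s)
    return r - s          # tuples in r but not in s

# Hypothetical sample data.
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

r_union_s = union(r, s)
r_minus_s = difference(r, s)
```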
Set Difference – Example
• Relations r, s:
r:
A  B
…  1
…  2
…  1
s:
A  B
…  2
…  3
• r – s:
A  B
…  1
…  1
Cartesian-Product Operation
• Notation: r × s
• Defined as:
r × s = {t q | t ∈ r and q ∈ s}
• Assume that attributes of r(R) and s(S) are disjoint.
(That is, R ∩ S = ∅).
• If attributes of r and s are not disjoint, then renaming
must be used.
• Combines any two relations.
• Output has the attributes of both relations.
• Repeated attribute names are preceded by the relation they originated
from.
• Example: r = borrower × loan has the schema
(borrower.customer-name, borrower.loan-number,
loan.loan-number, loan.branch-name, loan.amount)
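The naming rule for repeated attributes can be sketched as follows. This assumes relations are lists of dicts and that only attribute names shared by both inputs are prefixed; the helper name `cross` and its signature are hypothetical, and underscores replace hyphens in attribute names.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
def cross(r, r_name, s, s_name):
    """r x s: pair every tuple of r with every tuple of s.

    Attribute names that occur in both inputs are prefixed with the
    name of the relation they came from.
    """
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])

    def qualify(t, name):
        return {(f"{name}.{k}" if k in shared else k): v for k, v in t.items()}

    return [{**qualify(t, r_name), **qualify(q, s_name)} for t in r for q in s]

borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
]
loan = [{"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000}]

pairs = cross(borrower, "borrower", loan, "loan")
```

The result has |r| · |s| tuples; a later selection on `borrower.loan_number == loan.loan_number` would keep only the matching pairs.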
Cartesian-Product Operation – Example
• Relations r, s:
r:
A  B
…  1
…  2
s:
C  D   E
…  10  a
…  10  a
…  20  b
…  10  b
• r × s:
A  B  C  D   E
…  1  …  10  a
…  1  …  10  a
…  1  …  20  b
…  1  …  10  b
…  2  …  10  a
…  2  …  10  a
…  2  …  20  b
…  2  …  10  b
Rename Operation
• Allows us to name, and therefore to refer to, the results of
relational-algebra expressions.
• Allows us to refer to a relation by more than one name.
• Example:
ρ_x(E)
returns the expression E under the name x
• If a relational-algebra expression E has arity n, then
ρ_{x(A1, A2, …, An)}(E)
returns the result of expression E under the name x, and with the
attributes renamed to A1, A2, …, An.
Useful for naming the unnamed relations returned from
other operations.
Composition of Operations
• Can build expressions using multiple operations
• Example: σ_{A=C}(r × s)
• r × s:
A  B  C  D   E
…  1  …  10  a
…  1  …  10  a
…  1  …  20  b
…  1  …  10  b
…  2  …  10  a
…  2  …  10  a
…  2  …  20  b
…  2  …  10  b
• σ_{A=C}(r × s):
A  B  C  D   E
…  1  …  10  a
…  2  …  10  a
…  2  …  20  b
• Results of relational operations are relations themselves.
• Compositions of operations form a relational-algebra expression.
Set-Intersection Operation
• Notation: r ∩ s
• Defined as:
r ∩ s = { t | t ∈ r and t ∈ s }
• Assume:
– r, s have the same arity
– attributes of r and s are compatible
• Note: r ∩ s = r – (r – s)
Results in a relation that contains only the tuples
that appear in both relations.
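The identity in the note, that intersection needs only set difference, is easy to check with Python sets. The relations below are hypothetical sample data, modelled as sets of tuples.

```python
# Assumption: a relation is a set of plain tuples; sample values are made up.
r = {("a", 1), ("a", 2), ("b", 1)}
s = {("a", 2), ("b", 3)}

direct = r & s                 # intersection computed directly
via_difference = r - (r - s)   # intersection built from set difference only
```

Here `r - s` removes exactly the tuples of r that are also in s, so subtracting that from r leaves the common tuples.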
Set-Intersection – Example
• Relations r, s:
r:
A  B
…  1
…  2
…  1
s:
A  B
…  2
…  3
• r ∩ s:
A  B
…  2
Example Instances
• “Sailors” and “Reserves” relations for our examples.
R1:
sid  bid  day
22   101  10/10/96
58   103  11/12/96
S1:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
S2:
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Projection
• π_{sname, rating}(S2):
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10
• π_{age}(S2):
age
35.0
55.5
• Deletes attributes that are not in
projection list.
• Schema of result contains exactly
the fields in the projection list, with
the same names that they had in
the input relation.
• Projection operator has to
eliminate duplicates! Why?
– Note: real systems typically
don’t do duplicate elimination
unless the user explicitly asks
for it (by DISTINCT). Why not?
Selection
• σ_{rating>8}(S2):
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0
• π_{sname, rating}(σ_{rating>8}(S2)):
sname  rating
yuppy  9
rusty  10
• Selects rows that satisfy
selection condition.
• No duplicates in result!
Why?
• Schema of result identical
to schema of input
relation.
• What is operator
composition?
• Selection distributes over the
binary set operators (∪, ∩, –)
• Selection is commutative:
σ_p(σ_q(r)) = σ_q(σ_p(r))
Union, Intersection, Set-Difference
• All of these operations take
two input relations, which
must be union-compatible:
– Same number of fields.
– ‘Corresponding’ fields
have the same type.
• What is the schema of
result?
• S1 ∪ S2:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0
• S1 ∩ S2:
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0
• S1 – S2:
sid  sname   rating  age
22   dustin  7       45.0
Banking Example
branch (BN, BC, Assets)
customer (CN, CS, CC)
account (AN, BN, balance)
depositor (CN, AN)
loan (LN, BN, amount)
borrower (CN, LN)
Banking Example
branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
account (account_number, branch_name, balance)
loan (loan_number, branch_name, amount)
depositor (customer_name, account_number)
borrower (customer_name, loan_number)
Example Queries
• Find all loans of over $1200
σ_{amount > 1200}(loan)
• Find the loan number for each loan of an amount greater than
$1200
π_{loan_number}(σ_{amount > 1200}(loan))
• Find the names of all customers who have a loan, an account, or both,
from the bank
π_{customer_name}(borrower) ∪ π_{customer_name}(depositor)
• Find the names of all customers who have a loan at the Perryridge
branch.
π_{customer_name}(σ_{branch_name=“Perryridge”}
(σ_{borrower.loan_number = loan.loan_number}(borrower × loan)))
• Find the names of all customers who have a loan at the
Perryridge branch but do not have an account at any branch of
the bank.
π_{customer_name}(σ_{branch_name=“Perryridge”}
(σ_{borrower.loan_number = loan.loan_number}(borrower × loan))) –
π_{customer_name}(depositor)
• Find the names of all customers who have a loan at the
Perryridge branch (two equivalent formulations).
π_{customer_name}(σ_{loan.loan_number = borrower.loan_number}
((σ_{branch_name=“Perryridge”}(loan)) × borrower))
π_{customer_name}(σ_{branch_name=“Perryridge”}
(σ_{borrower.loan_number = loan.loan_number}(borrower × loan)))
Bank Example Queries
o Find the largest account balance
o Strategy:
o Find those balances that are not the largest
o Rename account relation as d so that we can compare
each account balance with all others
o Use set difference to find those account balances that were
not found in the earlier step.
o The query is:
π_{balance}(account) – π_{account.balance}
(σ_{account.balance < d.balance}(account × ρ_d(account)))
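The strategy above can be traced in Python without any built-in maximum function. The sketch assumes the account relation is a list of (account_number, branch_name, balance) tuples; the rows are those of the account table shown in the aggregation slides.

```python
# Assumption: account is a list of (account_number, branch_name, balance) tuples.
account = [
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-217", "Brighton", 750),
    ("A-215", "Brighton", 750),
    ("A-222", "Redwood", 700),
]

# pi_balance(account)
all_balances = {a[2] for a in account}

# pi_{account.balance}(sigma_{account.balance < d.balance}(account x rho_d(account)))
# The second loop variable d plays the role of the renamed copy of account.
not_largest = {a[2] for a in account for d in account if a[2] < d[2]}

# Set difference leaves only the balance that is smaller than no other.
largest = all_balances - not_largest
```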
Natural-Join Operation
• Notation: r ⋈ s
• Let r and s be relations on schemas R and S respectively. Then, r ⋈ s is a relation on schema R ∪ S obtained as
follows:
– Consider each pair of tuples tr from r and ts from s.
– If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where
• t has the same value as tr on R
• t has the same value as ts on S
• Example:
R = (A, B, C, D)
S = (E, B, D)
– Result schema = (A, B, C, D, E)
– r ⋈ s is defined as:
π_{r.A, r.B, r.C, r.D, s.E}(σ_{r.B = s.B ∧ r.D = s.D}(r × s))
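The pairwise procedure described above can be sketched as a nested loop. The list-of-dicts representation and the helper name `natural_join` are assumptions for illustration; the sample rows are the loan and borrower relations from the outer-join slides, with underscores in place of hyphens.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
def natural_join(r, s):
    """r join s: combine tuples that agree on every attribute name they share."""
    result = []
    for tr in r:
        for ts in s:
            common = set(tr) & set(ts)            # attributes in both schemas
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}             # shared attributes appear once
                if merged not in result:          # keep set semantics
                    result.append(merged)
    return result

borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]

joined = natural_join(borrower, loan)
```

Hayes and loan L-260 have no partner, so they drop out of the result; the outer joins described later keep such tuples.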
Natural Join Operation – Example
• Relations r, s:
r:
A  B  C  D
…  1  …  a
…  2  …  a
…  4  …  b
…  1  …  a
…  2  …  b
s:
B  D  E
1  a  …
3  a  …
1  a  …
2  b  …
3  b  …
• r ⋈ s:
A  B  C  D  E
…  1  …  a  …
…  1  …  a  …
…  1  …  a  …
…  1  …  a  …
…  2  …  b  …
• Find the names of all customers who have a loan at the bank,
along with the loan number and the loan amount
π_{customer_name, loan_number, amount}(borrower ⋈ loan)
Equivalently, using only basic operators:
π_{customer_name, loan.loan_number, amount}
(σ_{borrower.loan_number = loan.loan_number}(borrower × loan))
Theta Join
• Condition join: R ⋈_c S = σ_c(R × S)
• Result schema same as that of cross-product.
• Fewer tuples than cross-product, might be able to
compute more efficiently
• Sometimes called a theta-join.
• Example: S1 ⋈_{S1.sid < R1.sid} R1
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96
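A condition join is a selection applied to a cross-product, which the following Python sketch makes explicit. It assumes tuples in positional notation and takes the join condition to be S1.sid < R1.sid, which reproduces the two result rows shown for this example; the helper name `theta_join` is hypothetical.

```python
# Assumption: relations are lists of plain tuples (positional notation).
S1 = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

def theta_join(r, s, condition):
    """sigma_c(r x s): concatenate each pair of tuples that satisfies c."""
    return [t + q for t in r for q in s if condition(t, q)]

# Field 0 is sid in both relations.
result = theta_join(S1, R1, lambda t, q: t[0] < q[0])
```

A real system would avoid materialising the full cross-product, which is why the slide notes that the join might be computed more efficiently.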
Extended RA Operations
• Generalized Projection
• Outer Join
• Aggregate Functions
Generalized Projection
• Extends the projection operation by allowing arithmetic
functions to be used in the projection list.
π_{F1, F2, …, Fn}(E)
• E is any relational-algebra expression
• Each of F1, F2, …, Fn is an arithmetic expression
involving constants and attributes in the schema of E
• Given relation credit-info(customer-name, limit, credit-balance),
find how much more each person can spend
π_{customer-name, limit – credit-balance}(credit-info)
Aggregate Functions and Operations
• Aggregation function takes a collection of values and
returns a single value as a result.
– Ex: avg, min, max, sum, count
• Aggregate operation in relational algebra
G1, G2, …, Gn g F1(A1), F2(A2), …, Fn(An) (E)
– E is any relational-algebra expression
– G1, G2, …, Gn is a list of attributes on which to group (can
be empty)
– Each Fi is an aggregate function
– Each Ai is an attribute name
Aggregate Operation – Examples
• Relation r:
A  B  C
…  …  7
…  …  7
…  …  3
…  …  10
• g sum(C) (r):
sum-C
27
• List all the branch names along with the total balance of all accounts
branch-name g sum(balance) (account)
• Relation account:
branch-name  account-number  balance
Perryridge   A-102           400
Perryridge   A-201           900
Brighton     A-217           750
Brighton     A-215           750
Redwood      A-222           700
• Result:
branch-name  balance
Perryridge   1300
Brighton     1500
Redwood      700
• Result of aggregation does not have a
name
– Can use rename operation to give it a
name
– For convenience, we permit renaming as
part of aggregate operation
branch-name g sum(balance) as sum-balance (account)
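Grouping followed by a sum can be sketched with a dictionary of running totals. The helper name `group_sum` and the list-of-dicts representation are assumptions for illustration; the rows are the account table from this example.

```python
from collections import defaultdict

# Assumption: a relation is a list of dicts, one dict per tuple.
def group_sum(relation, group_attr, sum_attr):
    """Group tuples on group_attr and sum sum_attr within each group."""
    totals = defaultdict(int)
    for t in relation:
        totals[t[group_attr]] += t[sum_attr]
    return dict(totals)

account = [
    {"branch_name": "Perryridge", "account_number": "A-102", "balance": 400},
    {"branch_name": "Perryridge", "account_number": "A-201", "balance": 900},
    {"branch_name": "Brighton", "account_number": "A-217", "balance": 750},
    {"branch_name": "Brighton", "account_number": "A-215", "balance": 750},
    {"branch_name": "Redwood", "account_number": "A-222", "balance": 700},
]

# branch-name g sum(balance) (account)
sum_balance = group_sum(account, "branch_name", "balance")
```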
Null Values
o It is possible for tuples to have a null value, denoted by null,
for some of their attributes
o null signifies an unknown value or that a value does not
exist
o The result of any arithmetic expression involving null is null
o Aggregate functions simply ignore null values
o For duplicate elimination and grouping, null is treated like
any other value, and two nulls are assumed to be the same
o We follow the semantics of SQL in its handling of null
values
Outer Join
• An extension of the join operation that avoids loss
of information.
• Computes the join and then adds to the result those tuples
from one relation that do not match any tuple in the other
relation.
• Uses null values:
– null signifies that the value is unknown or does not exist
Outer Join – Example
• Relation loan:
loan-number  branch-name  amount
L-170        Downtown     3000
L-230        Redwood      4000
L-260        Perryridge   1700
• Relation borrower:
customer-name  loan-number
Jones          L-170
Smith          L-230
Hayes          L-155
• Inner join: loan ⋈ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
• Left outer join: loan ⟕ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
• Right outer join: loan ⟖ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-155        null         null    Hayes
• Full outer join: loan ⟗ borrower
loan-number  branch-name  amount  customer-name
L-170        Downtown     3000    Jones
L-230        Redwood      4000    Smith
L-260        Perryridge   1700    null
L-155        null         null    Hayes
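A left outer join can be sketched by padding unmatched left tuples with nulls, represented here by Python's `None`. The helper name, its signature, and the list-of-dicts representation are assumptions for illustration; the rows are the loan and borrower relations from this example.

```python
# Assumption: a relation is a list of dicts; None stands for null.
def left_outer_join(r, s, key, s_attrs):
    """Join r and s on key; keep unmatched tuples of r, padded with None."""
    result = []
    for tr in r:
        matches = [ts for ts in s if ts[key] == tr[key]]
        if matches:
            result.extend({**tr, **ts} for ts in matches)
        else:
            result.append({**tr, **{a: None for a in s_attrs}})
    return result

loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]

left = left_outer_join(loan, borrower, "loan_number", ["customer_name"])
```

Swapping the arguments gives the right outer join; the full outer join adds the unmatched tuples of both sides.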
Division Operation
• Notation: r ÷ s
• Suited to queries that include the phrase “for all”
• Let r and s be relations on schemas R and S
respectively where
– R = (A1, …, Am, B1, …, Bn)
– S = (B1, …, Bn)
• The result of r ÷ s is a relation on schema
R – S = (A1, …, Am) such that
r ÷ s = { t | t ∈ π_{R-S}(r) ∧ ∀ u ∈ s (tu ∈ r) }
• Find all customers who have an account at all
branches located in Brooklyn city.
π_{customer-name, branch-name}(depositor ⋈ account)
÷ π_{branch-name}(σ_{branch-city = “Brooklyn”}(branch))
Division Operation – Example 1
• Relations r, s:
r:
A  B
…  1
…  2
…  3
…  1
…  1
…  1
…  3
…  4
…  6
…  1
…  2
s:
B
1
2
• r ÷ s:
A
…
Division Operation – Example 2
• Relations r, s:
r:
A  B  C  D  E
…  a  …  a  1
…  a  …  a  1
…  a  …  b  1
…  a  …  a  1
…  a  …  b  3
…  a  …  a  1
…  a  …  b  1
…  a  …  b  1
s:
D  E
a  1
b  1
• r ÷ s:
A  B  C
…  a  …
…  a  …
Assignment Operation
• Notation: ← or :=
• The assignment operation provides a convenient way to
express complex queries
– Write query as a sequential program consisting of
• a series of assignments
• followed by an expression whose value is displayed
as a result of the query.
– Assignment must be made to a temporary relation
variable.
Assignment Operation
• Example: Write r ÷ s as
temp1 ← π_{R-S}(r)
temp2 ← π_{R-S}((temp1 × s) – π_{R-S,S}(r))
result = temp1 – temp2
– The result to the right of the ← is assigned to the relation
variable on the left of the ←.
– May use variable in subsequent expressions.
Assignment Operator
• It is often convenient to write relational algebra
expressions in parts, using assignment to temporary relational variables.
• For this purpose use the assignment operator, denoted by :=
• E.g., Who makes more than their manager?
– Relations: E(emp, dept, sal), M(mgr, dept)
– ESM(emp, sal, mgr) := Proj[emp, sal, mgr](E ⋈ M)
– Proj[ESM.emp](ESM ⋈[mgr = E.emp & ESM.sal > E.sal] E)
Division (cont’d)
• Relations r, s:
r:
A  B
α  1
α  2
α  3
β  1
γ  1
δ  1
δ  3
δ  4
δ  6
ε  1
ε  2
s:
B
1
2
• q = r ÷ s:
A
α
ε
Properties of Division Operation
• Let q = r ÷ s
• Then q is the largest relation satisfying: q × s ⊆ r
Schema for Student Registration System
Student (Id, Name, Addr, Status)
Professor (Id, Name, DeptId)
Course (DeptId, CrsCode, CrsName, Descr)
Transcript (StudId, CrsCode, Semester, Grade)
Teaching (ProfId, CrsCode, Semester)
Department (DeptId, Name)
Division - Example
• List the Ids of students who have passed all courses that
were taught in second semester 2007-08
• Numerator: StudId and CrsCode for every course passed by every student
– π_{StudId, CrsCode}(σ_{Grade ≠ ‘NC’}(Transcript))
• Denominator: CrsCode of all courses taught in second semester 2007-08
– π_{CrsCode}(σ_{Semester=‘S2007’}(Teaching))
• Result is numerator/denominator
Division
• Not supported as a primitive operator, but useful for
expressing queries like:
Find sailors who have reserved all boats.
• Let A have 2 fields, x and y; B have only field y:
– A/B = { ⟨x⟩ | ∃ ⟨x, y⟩ ∈ A ∀ ⟨y⟩ ∈ B }
– i.e., A/B contains all x tuples (sailors) such that for
every y tuple (boat) in B, there is an xy tuple in A.
– Or: If the set of y values (boats) associated with an x value
(sailor) in A contains all y values in B, the x value is in A/B.
• In general, x and y can be any lists of fields; y is the list of
fields in B, and x ∪ y is the list of fields of A.
Examples of Division A/B
A:
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4
B1:
pno
p2
B2:
pno
p2
p4
B3:
pno
p1
p2
p4
A/B1:
sno
s1
s2
s3
s4
A/B2:
sno
s1
s4
A/B3:
sno
s1
Expressing A/B Using Basic Operators
• Division is not essential op; just a useful shorthand.
– (Also true of joins, but joins are so common that systems
implement joins specially.)
• Idea: For A/B, compute all x values that are not ‘disqualified’
by some y value in B.
– x value is disqualified if by attaching y value from B, we
obtain an xy tuple that is not in A.
• Disqualified x values: π_x((π_x(A) × B) – A)
• A/B: π_x(A) – all disqualified tuples
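The disqualification idea can be sketched with Python sets. The sketch assumes A is a set of (x, y) pairs and B a set of y values; the helper name `divide` is hypothetical, and the data are the A, B1, B2 and B3 relations from the division examples.

```python
# Assumption: A is a set of (x, y) pairs and B is a set of y values.
def divide(A, B):
    """A/B via basic operators: pi_x(A) minus the disqualified x values."""
    xs = {x for (x, _) in A}                                     # pi_x(A)
    # x is disqualified if some pair (x, y) with y in B is missing from A.
    disqualified = {x for x in xs for y in B if (x, y) not in A}
    return xs - disqualified

A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

q1, q2, q3 = divide(A, B1), divide(A, B2), divide(A, B3)
```

The comprehension for `disqualified` corresponds to forming π_x(A) × B, subtracting A, and projecting the remainder onto x.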
Examples
Sailors:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
Boats:
bid  bname      color
101  Interlake  Blue
102  Interlake  Red
103  Clipper    Green
104  Marine     Red
Reserves:
sid  bid  day
22   101  10/10/96
58   103  11/12/96
Find names of sailors who’ve reserved
boat #103
• Solution 1: π_sname((σ_{bid=103} Reserves) ⋈ Sailors)
• Solution 2: π_sname(σ_{bid=103}(Reserves ⋈ Sailors))
• Solution 3:
ρ(Temp1, σ_{bid=103} Reserves)
ρ(Temp2, Temp1 ⋈ Sailors)
π_sname(Temp2)
Find names of sailors who’ve reserved a red boat
• Information about boat color only available in
Boats; so need an extra join:
π_sname((σ_{color=‘red’} Boats) ⋈ Reserves ⋈ Sailors)
• A more efficient solution:
π_sname(π_sid((π_bid σ_{color=‘red’} Boats) ⋈ Reserves) ⋈ Sailors)
• A query optimizer can find this given the first solution!
Find sailors who’ve reserved a red or a
green boat
• Can identify all red or green boats, then
find sailors who’ve reserved one of
these boats:
ρ(Tempboats, (σ_{color=‘red’ ∨ color=‘green’} Boats))
π_sname(Tempboats ⋈ Reserves ⋈ Sailors)
• Can also define Tempboats using union! (How?)
Find sailors who’ve reserved a red and a
green boat
• Cut-and-paste previous slide?
ρ(Tempboats, (σ_{color=‘red’ ∧ color=‘green’} Boats))
π_sname(Tempboats ⋈ Reserves ⋈ Sailors)
Find sailors who’ve reserved a red and a
green boat
• Previous approach won’t work! Must identify
sailors who’ve reserved red boats, sailors
who’ve reserved green boats, then find the
intersection (note that sid is a key for Sailors):
ρ(Tempred, π_sid((σ_{color=‘red’} Boats) ⋈ Reserves))
ρ(Tempgreen, π_sid((σ_{color=‘green’} Boats) ⋈ Reserves))
π_sname((Tempred ∩ Tempgreen) ⋈ Sailors)
Find the names of sailors who’ve
reserved all boats
• Uses division; schemas of the input
relations to / must be carefully chosen:
ρ(Tempsids, (π_{sid,bid} Reserves) / (π_bid Boats))
π_sname(Tempsids ⋈ Sailors)
• To find sailors who’ve reserved all ‘Interlake’ boats:
..... / π_bid(σ_{bname=‘Interlake’} Boats)