Concepts of DB(3) Relational Operation & SQL
Page 1Spring, 2006 KUT
Concepts of DB (3): Relational Operation & SQL
Korea University of Technology, School of Internet Media Engineering
민준기 (Min Jun-ki)
Page 2Spring, 2006 KUT
Relational Data Operation
• Procedural language
  – The user instructs the system to perform a sequence of operations on the database to compute the desired result (what & how)
  – Relational algebra
• Nonprocedural language
  – The user describes the information desired without giving a specific procedure for obtaining it (what)
  – Relational calculus
    • 1. Tuple Relational Calculus
    • 2. Domain Relational Calculus
• Relational Algebra and Relational Calculus have the same expressive (computing) power
Page 3Spring, 2006 KUT
Relational Algebra
• Relational Algebra consists of several groups of operations
  – Relational Algebra Operations From Set Theory
    • UNION ( ∪ )
    • INTERSECTION ( ∩ )
    • DIFFERENCE (or MINUS, – )
    • CARTESIAN PRODUCT ( × )
  – Relational Operations
    • Unary Relational Operations
      – SELECT (symbol: σ (sigma))
      – PROJECT (symbol: π (pi))
      – RENAME (symbol: ρ (rho))
    • Binary Relational Operations
      – JOIN (several variations of JOIN exist, ⋈ )
      – DIVISION ( ÷ )
  – Additional Relational Operations
    • OUTER JOINS
    • OUTER UNION
    • AGGREGATE FUNCTIONS (these compute summaries of information: for example, SUM, COUNT, AVG, MIN, MAX)
• Closure property
  – Operands and operation results are both relations
  – This supports nested expressions
Page 4Spring, 2006 KUT
[Figure: schematic diagrams of the relational algebra operations UNION, INTERSECT, DIFFERENCE, PRODUCT, RESTRICTION, PROJECTION, JOIN and DIVIDE]
Page 5Spring, 2006 KUT
Relational Algebra Operations From Set Theory
ⅰ. union, ∪
    R ∪ S = { t | t ∈ R ∨ t ∈ S }
    |R ∪ S| ≤ |R| + |S|
ⅱ. intersect, ∩
    R ∩ S = { t | t ∈ R ∧ t ∈ S }
    |R ∩ S| ≤ min{ |R|, |S| }
ⅲ. difference, −
    R − S = { t | t ∈ R ∧ t ∉ S }
    |R − S| ≤ |R|
ⅳ. Cartesian product, ×
    R × S = { r·s | r ∈ R ∧ s ∈ S }
    · : concatenation
    |R × S| = |R| × |S|
    degree = R's degree + S's degree
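The four set-theoretic operations and their cardinality bounds can be checked on a small example. This is only a sketch: modelling a relation as a Python set of tuples is an assumption made for illustration, and the sample relations R and S are invented.

```python
# Relations are modelled as Python sets of tuples (an illustrative assumption).
R = {("a", "x"), ("b", "x"), ("c", "y")}
S = {("a", "x"), ("d", "y")}

union = R | S
intersection = R & S
difference = R - S
# Cartesian product: r·s is the concatenation of tuple r with tuple s.
product = {r + s for r in R for s in S}

# Cardinality and degree properties listed on the slide.
assert len(union) <= len(R) + len(S)
assert len(intersection) <= min(len(R), len(S))
assert len(difference) <= len(R)
assert len(product) == len(R) * len(S)
assert all(len(t) == 2 + 2 for t in product)
```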
Page 6Spring, 2006 KUT
▶ Relational Operations
• Relation: R(X) = R(A1, ..., An)
• A tuple r of R: <a1, ..., an>
    R = { r | r = <a1, ..., an> }
  – ai: the value of attribute Ai in tuple r
  – ai = r.Ai = r[Ai]
• In general,
  – <r.A1, r.A2, …, r.An> = <r[A1], r[A2], …, r[An]> = r[A1, A2, …, An] = r[X]
Page 7Spring, 2006 KUT
Unary Relational Operations: SELECT
• The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition.
  – The selection condition acts as a filter
  – Keeps only those tuples that satisfy the qualifying condition
  – Tuples satisfying the condition are selected, whereas the other tuples are discarded (filtered out): a horizontal subset
      σ_{Aθv}(R) = { r | r ∈ R ∧ r.A θ v }
      σ_{AθB}(R) = { r | r ∈ R ∧ r.A θ r.B }
      where θ (theta) is one of <, >, ≤, ≥, =, ≠
• Examples:
  – Select the EMPLOYEE tuples whose department number is 4:
      σ_{DNO = 4}(EMPLOYEE)
  – Select the EMPLOYEE tuples whose salary is greater than $30,000:
      σ_{SALARY > 30,000}(EMPLOYEE)
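A minimal sketch of SELECT as a row filter, assuming each tuple is a Python dict. The helper `select`, the `THETA` table and the EMPLOYEE rows are hypothetical; only the attribute names DNO and SALARY come from the slide.

```python
import operator

# The θ comparison operators from the slide, mapped to Python functions.
THETA = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
         ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

def select(relation, attr, theta, value):
    """σ_{attr θ value}(relation): keep only the rows satisfying the condition."""
    return [row for row in relation if THETA[theta](row[attr], value)]

# Hypothetical EMPLOYEE tuples (names and values are made up).
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"LNAME": "Wallace", "DNO": 4, "SALARY": 43000},
    {"LNAME": "Zelaya", "DNO": 4, "SALARY": 25000},
]

dept4 = select(EMPLOYEE, "DNO", "=", 4)            # σ_{DNO = 4}(EMPLOYEE)
high_paid = select(EMPLOYEE, "SALARY", ">", 30000)  # σ_{SALARY > 30,000}(EMPLOYEE)
```

The result has the same attributes as the input; only the number of rows changes.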
Page 8Spring, 2006 KUT
Unary Relational Operations: SELECT (contd.)
• SELECT Operation Properties
  – The SELECT operation σ_<selection condition>(R) produces a relation S that has the same schema (same attributes) as R
  – SELECT is commutative:
      σ_<condition1>(σ_<condition2>(R)) = σ_<condition2>(σ_<condition1>(R))
  – Because of the commutativity property, a cascade (sequence) of SELECT operations may be applied in any order:
      σ_<cond1>(σ_<cond2>(σ_<cond3>(R))) = σ_<cond2>(σ_<cond3>(σ_<cond1>(R)))
  – A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions:
      σ_<cond1>(σ_<cond2>(σ_<cond3>(R))) = σ_<cond1> AND <cond2> AND <cond3>(R)
  – The number of tuples in the result of a SELECT is less than (or equal to) the number of tuples in the input relation R
  – Selectivity: the fraction of tuples selected by a selection condition
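The commutativity and cascade properties can be checked on a toy relation. This is a sketch under assumptions: the predicate-based `select` helper, the relation R and the two conditions are all invented for illustration.

```python
def select(relation, predicate):
    """σ_predicate(relation) for rows stored as dicts."""
    return [row for row in relation if predicate(row)]

# Hypothetical relation R(A, B) with 25 tuples.
R = [{"A": a, "B": b} for a in range(5) for b in range(5)]
cond1 = lambda r: r["A"] > 1
cond2 = lambda r: r["B"] < 3

cascade_12 = select(select(R, cond2), cond1)      # σ_cond1(σ_cond2(R))
cascade_21 = select(select(R, cond1), cond2)      # σ_cond2(σ_cond1(R))
conjunction = select(R, lambda r: cond1(r) and cond2(r))

# Selectivity: fraction of the input tuples that survive the selection.
selectivity = len(conjunction) / len(R)
```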
Page 9Spring, 2006 KUT
Join
• Theta-join
  For R(X), S(Y), A ∈ X, B ∈ Y:
    R ⋈_{AθB} S = { r·s | r ∈ R ∧ s ∈ S ∧ (r.A θ s.B) }
  – A, B: joining attributes
  – Degree of the result = degree of R + degree of S
• Example
  – STUDENT ⋈_{SNO = SNO} ENROLL (students joined with enrollments on the student number)
• Equi-join
  A theta-join in which θ is "=":
    R ⋈_{A=B} S = { r·s | r ∈ R ∧ s ∈ S ∧ (r.A = s.B) }
Page 10Spring, 2006 KUT
Unary Relational Operations: PROJECT
• PROJECT Operation is denoted by π (pi)
• In relation R(X), if Y ⊆ X and Y = {B1, B2, …, Bm},
      π_Y(R) = { <r.B1, ..., r.Bm> | r ∈ R }    (a vertical subset)
• This operation keeps certain columns (attributes) from a relation and discards the other columns.
  – PROJECT creates a vertical partitioning
    • The list of specified columns (attributes) is kept in each tuple
    • The other attributes in each tuple are discarded
• Example: To list each employee's first and last name and salary, the following is used:
      π_{LNAME, FNAME, SALARY}(EMPLOYEE)
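A short sketch of PROJECT, assuming dict-based tuples. Because the result is a set of tuples, duplicate rows collapse; the helper `project` and the EMPLOYEE rows below are hypothetical.

```python
def project(relation, attrs):
    """π_attrs(relation): keep only the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:          # a relation is a set: no duplicate tuples
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

# Hypothetical EMPLOYEE tuples (values are made up).
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Zelaya", "FNAME": "Alicia", "SALARY": 25000, "DNO": 4},
]

names_salaries = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])
departments = project(EMPLOYEE, ["DNO"])   # duplicates of DNO = 5 collapse
```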
Page 11Spring, 2006 KUT
Binary Relational Operations: JOIN
• JOIN Operation (denoted by ⋈)
  – The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to identify and select related tuples from two relations
  – A special operation, called JOIN, combines this sequence into a single operation
  – This operation is very important for any relational database with more than a single relation, because it allows us to combine related tuples from various relations
  – The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
      R ⋈_<join condition> S
  – where R and S can be any relations that result from general relational algebra expressions.
Page 12Spring, 2006 KUT
Some properties of JOIN
• Consider the following JOIN operation:
  – R(A1, A2, . . ., An) ⋈_{R.Ai=S.Bj} S(B1, B2, . . ., Bm)
  – The result is a relation Q of degree n + m:
    • Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
  – The resulting relation state has one tuple for each combination of tuples (r from R and s from S), but only if they satisfy the join condition r[Ai] = s[Bj]
  – Hence, if R has nR tuples and S has nS tuples, the join result has at most nR * nS tuples, and generally fewer.
  – Only related tuples (based on the join condition) will appear in the result
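JOIN as "Cartesian product followed by SELECT" can be sketched as follows. The helper `theta_join`, the attribute names and the sample tuples are hypothetical, and the attribute names of R and S are assumed to be distinct.

```python
def theta_join(R, S, condition):
    """R ⋈_condition S: concatenate r and s for every pair that satisfies the condition."""
    return [{**r, **s} for r in R for s in S if condition(r, s)]

# Hypothetical relations R(A1, A2) and S(B1, B2).
R = [{"A1": 1, "A2": "x"}, {"A1": 2, "A2": "y"}, {"A1": 3, "A2": "z"}]
S = [{"B1": 2, "B2": "p"}, {"B1": 3, "B2": "q"}, {"B1": 3, "B2": "r"}]

equi = theta_join(R, S, lambda r, s: r["A1"] == s["B1"])   # equi-join, θ is "="
less = theta_join(R, S, lambda r, s: r["A1"] < s["B1"])    # theta-join with θ = "<"

# Degree is n + m; cardinality never exceeds nR * nS.
assert all(len(t) == 2 + 2 for t in equi)
assert len(equi) <= len(R) * len(S) and len(less) <= len(R) * len(S)
```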
Page 13Spring, 2006 KUT
Theta JOIN
For R(X), S(Y), A ∈ X, B ∈ Y:
    R ⋈_{AθB} S = { r·s | r ∈ R ∧ s ∈ S ∧ (r.A θ s.B) }
  – A, B: join attributes
  – θ can be any general boolean expression on the attributes of R and S; for example:
    • R.Ai < S.Bj AND (R.Ak = S.Bl OR R.Ap < S.Bq)
Page 14Spring, 2006 KUT
NATURAL JOIN Operation
• NATURAL JOIN Operation
  – Another variation of JOIN, called NATURAL JOIN and denoted by ⋈N, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition.
    • One of each pair of attributes with identical values is superfluous
• Natural join (⋈N)
  If the join attributes of R(X) and S(Y) are Z (= X ∩ Y):
    R ⋈N S = { <r·s>[X ∪ Y] | r ∈ R ∧ s ∈ S ∧ r[Z] = s[Z] }
           = π_{X∪Y}(σ_{Z=Z}(R × S))
           = π_{X∪Y}(R ⋈_{Z=Z} S)
  – The standard definition of natural join requires that the two join attributes, or each pair of corresponding join attributes, have the same name in both relations
Page 15Spring, 2006 KUT
NATURAL JOIN Operation (contd.)
• Example: Q ← R(A,B,C,D) ⋈N S(C,D,E)
  – The implicit join condition includes each pair of attributes with the same name, "AND"ed together:
    • R.C = S.C AND R.D = S.D
  – The result keeps only one attribute of each such pair:
    • Q(A,B,C,D,E)
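A minimal sketch of the natural join for R(A,B,C,D) and S(C,D,E), assuming dict-based tuples. The helper `natural_join` and the tuple values are hypothetical; the join attributes are found by matching attribute names.

```python
def natural_join(R, S):
    """R ⋈N S: equi-join on all same-named attributes, keeping one copy of each."""
    if not R or not S:
        return []
    common = [a for a in R[0] if a in S[0]]      # Z = X ∩ Y
    return [{**r, **s} for r in R for s in S
            if all(r[a] == s[a] for a in common)]

# Hypothetical tuples for R(A,B,C,D) and S(C,D,E).
R = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 7, "D": 8}]
S = [{"C": 3, "D": 4, "E": 9}, {"C": 3, "D": 0, "E": 1}]

Q = natural_join(R, S)   # implicit condition: R.C = S.C AND R.D = S.D
```

Merging the two dicts keeps a single copy of C and D, which mirrors the result schema Q(A,B,C,D,E).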
Page 16Spring, 2006 KUT
DIVISION: ÷ (1)
• DIVISION Operation
  – The division operation is applied to two relations R(X) and S(Y), where Y ⊆ X.
  – Let D = X − Y (and hence X = D ∪ Y); that is, let D be the set of attributes of R that are not attributes of S.
  – R ÷ S = { t | t ∈ π_D(R) ∧ t·s ∈ R for all s ∈ S }
  – The result of DIVISION is a relation T(D) that includes a tuple t if tuples tR appear in R with tR[D] = t, and with
    • tR[Y] = ts for every tuple ts in S.
  – For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S.
• Note: ((R ÷ S) × S) ⊆ R
Page 17Spring, 2006 KUT
DIVISION example

Enrollment (SC)
  Student no. (SNO)   Course no. (CNO)
  100                 C413
  100                 E412
  200                 C123
  300                 C312
  300                 C324
  300                 C413
  400                 C312
  400                 C324
  400                 C413
  400                 E412
  500                 C312

Course 1 (C1), CNO: C413
Course 2 (C2), CNO: C312, C413
Course 3 (C3), CNO: C312, C413, E412

SC ÷ C1, SNO: 100, 300, 400
SC ÷ C2, SNO: 300, 400
SC ÷ C3, SNO: 400
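The three divisions above can be reproduced with a small sketch. The data are the SC, C1, C2 and C3 relations from the slide; the helper `divide` is a hypothetical name and assumes R is a set of (SNO, CNO) pairs and S a set of CNO values.

```python
# Enrollment relation SC(SNO, CNO) from the slide.
SC = {(100, "C413"), (100, "E412"), (200, "C123"), (300, "C312"),
      (300, "C324"), (300, "C413"), (400, "C312"), (400, "C324"),
      (400, "C413"), (400, "E412"), (500, "C312")}
C1 = {"C413"}
C2 = {"C312", "C413"}
C3 = {"C312", "C413", "E412"}

def divide(R, S):
    """R ÷ S: the d values that appear in R together with every y in S."""
    candidates = {d for d, _ in R}                       # π_D(R)
    return {d for d in candidates if all((d, y) in R for y in S)}

q1, q2, q3 = divide(SC, C1), divide(SC, C2), divide(SC, C3)

# Property from the previous slide: (R ÷ S) × S ⊆ R.
assert {(d, y) for d in q2 for y in C2} <= SC
```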
Page 18Spring, 2006 KUT
Extension of Relational Algebra (1)
ⅰ. semijoin, ⋉
  – R ⋉ S: the tuples of R that can be natural-joined with S
    • If the join attributes of R(X) and S(Y) are Z (= X ∩ Y),
        R ⋉ S = R ⋈N (π_Z(S)) = π_X(R ⋈N S)
    • Properties
      – R ⋉ S ≠ S ⋉ R
      – R ⋈N S = (R ⋉ S) ⋈N S = (S ⋉ R) ⋈N R
Page 19Spring, 2006 KUT
Natural Join & Semijoin

R                 S
A   B   C         B   C   D
a1  b1  c1        b1  c1  d1
a2  b1  c1        b1  c1  d2
a3  b1  c2        b2  c3  d3
a4  b2  c3

π_{X∩Y}(S)
B   C
b1  c1
b2  c3

R ⋈N S (natural join)
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a2  b1  c1  d1
a2  b1  c1  d2
a4  b2  c3  d3

R ⋉ S (semijoin)
A   B   C
a1  b1  c1
a2  b1  c1
a4  b2  c3
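The natural join and semijoin of this slide's R(A,B,C) and S(B,C,D) can be sketched with positional tuples. The variable names are assumptions; the data are taken from the figure.

```python
# R(A, B, C) and S(B, C, D) from the slide.
R = [("a1", "b1", "c1"), ("a2", "b1", "c1"), ("a3", "b1", "c2"), ("a4", "b2", "c3")]
S = [("b1", "c1", "d1"), ("b1", "c1", "d2"), ("b2", "c3", "d3")]

# Natural join: match on the shared attributes (B, C) and append D.
natural = [r + (s[2],) for r in R for s in S if r[1:] == s[:2]]

# Semijoin: R ⋉ S = R ⋈N π_{B,C}(S), i.e. the tuples of R that have a partner in S.
s_proj = {s[:2] for s in S}
semi = [r for r in R if r[1:] in s_proj]

# Equivalent definition: R ⋉ S = π_{A,B,C}(R ⋈N S).
assert sorted(set(t[:3] for t in natural)) == sorted(semi)
```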
Page 20Spring, 2006 KUT
Extension of Relational Algebra (2)
ⅱ. outerjoin, ⋈+
• The OUTER JOIN Operation
  – In NATURAL JOIN and EQUIJOIN, tuples without a matching (or related) tuple are eliminated from the join result
    • Tuples with null in the join attributes are also eliminated
    • This amounts to a loss of information.
  – A set of operations, called OUTER joins, can be used when we want to keep all the tuples in R, or all those in S, or all those in both relations in the result of the join, regardless of whether or not they have matching tuples in the other relation.
Page 21Spring, 2006 KUT
Extension of Relational Algebra (3)
• The left outer join operation keeps every tuple in the first, or left, relation R in R ⟕ S; if no matching tuple is found in S, then the attributes of S in the join result are filled, or "padded", with null values.
• A similar operation, right outer join, keeps every tuple in the second, or right, relation S in the result of R ⟖ S.
• A third operation, full outer join, denoted by ⟗ or ⋈+, keeps all tuples in both the left and the right relations even when no matching tuples are found, padding them with null values as needed.
Page 22Spring, 2006 KUT
Natural Join and Outer Join

R                 S
A   B   C         B   C   D
a1  b1  c1        b1  c1  d1
a2  b1  c1        b1  c1  d2
a3  b1  c2        b2  c3  d3
a4  b2  c3        b3  c3  d3

R ⋈+ S (outer join)
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a2  b1  c1  d1
a2  b1  c1  d2
a3  b1  c2  (null)
a4  b2  c3  d3
(null) b3 c3 d3

R ⋈N S (natural join)
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a2  b1  c1  d1
a2  b1  c1  d2
a4  b2  c3  d3
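A sketch of the full outer join on this slide's data, with Python's `None` standing in for null. The function name `full_outer_join` is hypothetical, and it is written specifically for R(A,B,C) and S(B,C,D) stored as positional tuples.

```python
# R(A, B, C) and S(B, C, D) from the slide.
R = [("a1", "b1", "c1"), ("a2", "b1", "c1"), ("a3", "b1", "c2"), ("a4", "b2", "c3")]
S = [("b1", "c1", "d1"), ("b1", "c1", "d2"), ("b2", "c3", "d3"), ("b3", "c3", "d3")]

def full_outer_join(R, S):
    """Natural join on (B, C), plus unmatched tuples of R and S padded with None."""
    matched = [r + (s[2],) for r in R for s in S if r[1:] == s[:2]]
    matched_r = {t[:3] for t in matched}
    matched_s = {t[1:] for t in matched}
    dangling_r = [r + (None,) for r in R if r not in matched_r]   # kept by a left outer join
    dangling_s = [(None,) + s for s in S if s not in matched_s]   # kept by a right outer join
    return matched + dangling_r + dangling_s

outer = full_outer_join(R, S)
```

Dropping `dangling_s` from the result would give the left outer join, and dropping `dangling_r` the right outer join.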
Page 23Spring, 2006 KUT
Extension of Relational Algebra (4)
ⅲ. outer-union, ∪+
  – The outer union operation was developed to take the union of tuples from two relations that are not type compatible.
  – This operation takes the union of tuples in two relations R(X, Y) and S(X, Z) that are partially compatible, meaning that only some of their attributes, say X, are type compatible.
  – The attributes that are type compatible are represented only once in the result, and the attributes that are not type compatible from either relation are also kept in the result relation T(X, Y, Z).
Page 24Spring, 2006 KUT
Outer Union

R                 S
A   B   C         B   C   D
a1  b1  c1        b1  c1  d1
a2  b1  c1        b1  c1  d2
a3  b1  c2        b2  c2  d3
a4  b2  c3

R ∪+ S
A   B   C   D
a1  b1  c1  (null)
a2  b1  c1  (null)
a3  b1  c2  (null)
a4  b2  c3  (null)
(null) b1 c1 d1
(null) b1 c1 d2
(null) b2 c2 d3
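The outer union shown in this figure can be sketched by padding each relation to the combined schema (A, B, C, D). This sketch follows the figure, which pads every tuple and does not merge tuples that agree on the shared attributes; `None` stands in for null and the variable names are assumptions.

```python
# R(A, B, C) and S(B, C, D) from the slide; B and C are the compatible attributes.
R = [("a1", "b1", "c1"), ("a2", "b1", "c1"), ("a3", "b1", "c2"), ("a4", "b2", "c3")]
S = [("b1", "c1", "d1"), ("b1", "c1", "d2"), ("b2", "c2", "d3")]

# Pad R with a null D and S with a null A, then take the union.
outer_union = [r + (None,) for r in R] + [(None,) + s for s in S]
```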
Page 25Spring, 2006 KUT
Extension of Relational Algebra (5)
• A type of request that cannot be expressed in the basic relational algebra is to specify mathematical aggregate functions on collections of values from the database.
• Examples of such functions include retrieving the average or total salary of all employees or the total number of employee tuples.
  – These functions are used in simple statistical queries that summarize information from the database tuples.
• Common functions applied to collections of numeric values include
  – SUM, AVERAGE, MAXIMUM, and MINIMUM.
• The COUNT function is used for counting tuples or values.
  – AVG_score(enroll)
    • Retrieves the average score value from the enroll relation
  – GROUP_year(student)
    • Groups student into subgroups with respect to year
  – GROUP_cno AVG_score(enroll)
    • Retrieves the average score of each cno group of enroll
  – General form: G_A F_B(E)
    • E: relational algebra expression
    • F: aggregate function (SUM, AVG, MAX, MIN, COUNT)
    • B: aggregate attribute
    • G: GROUP function
    • A: group attribute
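The general form G_A F_B(E) can be sketched as "group by A, then apply F to the B values of each group". The helper names and the enroll tuples (including the scores) are hypothetical.

```python
from collections import defaultdict

def group_aggregate(relation, group_attr, func, agg_attr):
    """G_A F_B(E): group rows of E by attribute A and apply F to attribute B."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row[agg_attr])
    return {key: func(values) for key, values in groups.items()}

def avg(values):
    return sum(values) / len(values)

# Hypothetical enroll tuples; the scores are made up.
enroll = [
    {"sno": 100, "cno": "C413", "score": 90},
    {"sno": 300, "cno": "C413", "score": 70},
    {"sno": 300, "cno": "C312", "score": 80},
    {"sno": 400, "cno": "C312", "score": 60},
]

overall = avg([row["score"] for row in enroll])               # AVG_score(enroll)
by_course = group_aggregate(enroll, "cno", avg, "score")      # GROUP_cno AVG_score(enroll)
```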
Page 26Spring, 2006 KUT
▶ Algebra Expression
• Retrieve the name and department of all students
    π_{sname, dept}(student)
• Retrieve the name and score of the students who registered for course C413
    π_{sname, score}(σ_{cno='C413'}(student ⋈N Register))
• Retrieve the name of the professor who teaches 'database'
    π_{profname}(σ_{cname='database'}(course))
Page 27Spring, 2006 KUT
SQL
• SQL
  – A standard language for relational databases (RDB)
  – Based on relational algebra and relational calculus
• Features of SQL
  – Nonprocedural language
    • An SQL statement handles the set of data satisfying the given conditions
  – Interactive or embedded
Page 28Spring, 2006 KUT
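Sample Table

STUDENT
  Student no. (SNO)   Name (SNAME)   Year (YEAR)   Department (DEPT)
  100                 나연묵          4             Computer
  200                 이찬영          3             Electrical
  300                 정기태          1             Computer
  400                 송병호          4             Computer
  500                 박종화          2             Industrial Engineering

COURSE
  Course no. (CNO)   Course name (CNAME)   Credits (CREDIT)   Department (DEPT)   Professor (PRNAME)
  C123               Programming           3                  Computer            김성기
  C312               Data Structures       3                  Computer            황수찬
  C324               File Processing       3                  Computer            이규철
  C413               Database              3                  Computer            이석호
  C412               Semiconductors        3                  Electronics         홍봉희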
Page 29Spring, 2006 KUT
Basics of SQL
• SELECT statement
  – Retrieves data from a table
  – The SELECT clause and the FROM clause are mandatory in a SELECT statement
  – The WHERE clause is optional and describes the condition

    SELECT SNAME
    FROM STUDENT

  (evaluated against the STUDENT sample table; returns the SNAME column of every row)
Page 30Spring, 2006 KUT
Basics of SQL (contd.)

    SELECT SNAME
    FROM STUDENT
    WHERE YEAR = 4

  (evaluated against the STUDENT sample table; returns the names of the students with SNO 100 and 400, who are in year 4)
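The two queries can be tried out with Python's built-in `sqlite3` module. This is a sketch only: the choice of SQLite, the column types and the English department names are assumptions, and the `ORDER BY` clauses are added just to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (SNO INTEGER PRIMARY KEY, SNAME TEXT, YEAR INTEGER, DEPT TEXT)"
)
# Rows of the STUDENT sample table.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [
        (100, "나연묵", 4, "Computer"),
        (200, "이찬영", 3, "Electrical"),
        (300, "정기태", 1, "Computer"),
        (400, "송병호", 4, "Computer"),
        (500, "박종화", 2, "Industrial Engineering"),
    ],
)

# SELECT and FROM only: every student's name.
all_names = [row[0] for row in conn.execute("SELECT SNAME FROM STUDENT ORDER BY SNO")]

# With the optional WHERE clause: only fourth-year students.
seniors = [row[0] for row in conn.execute(
    "SELECT SNAME FROM STUDENT WHERE YEAR = 4 ORDER BY SNO")]
conn.close()
```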
Page 31Spring, 2006 KUT
SQL
• UNION:
  – SQL ex: SELECT a FROM R UNION SELECT b FROM S;
• INTERSECT:
  – SQL ex: SELECT a FROM R INTERSECT SELECT b FROM S;
• DIFFERENCE:
  – SQL ex: SELECT a FROM R MINUS SELECT b FROM S;
    (MINUS is the Oracle keyword; standard SQL uses EXCEPT)
• PRODUCT:
  – SQL ex: SELECT a, b FROM R, S;
• RESTRICTION (SELECTION):
  – SQL ex: SELECT * FROM R WHERE R.A = 10;
• PROJECTION:
  – SQL ex: SELECT R.A1, R.A2 FROM R;
• JOIN:
  – SQL ex: SELECT R.A, S.B FROM R, S WHERE R.A = S.B;
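A small sketch that runs the set-style queries with Python's `sqlite3` module. The single-column tables R(a) and S(b) and their contents are invented; SQLite spells DIFFERENCE as `EXCEPT`, so that keyword is used here instead of `MINUS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER)")
conn.executemany("INSERT INTO R VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO S VALUES (?)", [(2,), (3,), (4,)])

def column(sql):
    """Run a query and return the first column of the result, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

union = column("SELECT a FROM R UNION SELECT b FROM S")
intersect = column("SELECT a FROM R INTERSECT SELECT b FROM S")
difference = column("SELECT a FROM R EXCEPT SELECT b FROM S")
product = conn.execute("SELECT a, b FROM R, S").fetchall()
join = sorted(conn.execute("SELECT R.a, S.b FROM R, S WHERE R.a = S.b").fetchall())
conn.close()
```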
Page 32Spring, 2006 KUT
Query Processing
• Index
  – Random access to data (tuples) is inefficient without an index
  – An index is an additional data structure that supports such access
Page 33Spring, 2006 KUT
▶ Index method
• An indexed file consists of
  – an index file
  – a data file
[Figure: an index file whose entries (key, address) for keys K1, K2, K3 point into the data file]
Page 34Spring, 2006 KUT
Traditional Index Structure
• B-Tree
• B+-Tree
Page 35Spring, 2006 KUT
Index: (1) B-tree
• B-tree (degree = m)
  – An m-way search tree in which
    1. Every internal node except the root has at least ⌈m/2⌉ and at most m subtrees
       • so it holds at least ⌈m/2⌉ − 1 keys (and at most m − 1)
    2. If the root is not a leaf, the root has at least two subtrees
    3. All leaves are at the same level
       • balanced tree

Note: degree is the maximum number of subtrees
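The subtree and key bounds follow directly from the degree m. A tiny sketch, with `btree_bounds` as a hypothetical helper name:

```python
import math

def btree_bounds(m):
    """Subtree and key limits for a non-root internal node of a B-tree of degree m."""
    min_children = math.ceil(m / 2)
    return {
        "min_children": min_children,
        "max_children": m,
        "min_keys": min_children - 1,   # a node with k subtrees holds k - 1 keys
        "max_keys": m - 1,
    }

bounds_m3 = btree_bounds(3)
bounds_m5 = btree_bounds(5)
```

For m = 5 this gives between 2 and 4 keys per node, which is the underflow and overflow threshold used in the m = 5 examples later in the deck.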
Page 36Spring, 2006 KUT
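B-tree of degree 3
[Figure: example B-tree with nodes a to v; each internal node is listed with its keys, followed by its leaves from left to right]
a: 69
  b: 19 43
    d: 16      leaves j: 7 15 | k: 18
    e: 26 40   leaves l: 20 | m: 30 36 | n: 42
    f: 60      leaves o: 50 58 | p: 62 65
  c: 128 138
    g: 100     leaves q: 70 | r: 110 120
    h: 132     leaves s: 130 | t: 136
    i: 145     leaves u: 140 | v: 150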
Page 37Spring, 2006 KUT
▶ Operation (1)
• B-tree
  – Random access: branch by search key
  – Sequential access: in-order traversal
  – Insert/delete: keep the tree balanced
    • split: on node overflow
    • merge: on node underflow
• Insert
  – Insertion is done at a leaf node
    • The leaf has free space: simple insertion
    • Overflow (no free space), i.e. the leaf would hold m keys:
      1) split the node
      2) insert the ⌈m/2⌉-th key into the parent node
      3) the remaining keys go to the left and right nodes
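The insertion steps above can be sketched as a recursive routine that inserts at a leaf and propagates splits upward. This is a minimal illustration, not a full B-tree: the class and method names are assumptions, duplicate keys and deletion are not handled, and the sample tree is the m = 5 example used later in this deck.

```python
import bisect
import math

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list for a leaf

    @property
    def is_leaf(self):
        return not self.children

class BTree:
    def __init__(self, m):
        self.m = m                          # degree: maximum number of subtrees
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                           # the root overflowed: the tree grows one level
            mid, right = split
            self.root = Node([mid], [self.root, right])

    def _insert(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if node.is_leaf:
            node.keys.insert(i, key)        # insertion always happens at a leaf
        else:
            split = self._insert(node.children[i], key)
            if split:                       # child split: take its promoted key
                mid, right = split
                node.keys.insert(i, mid)
                node.children.insert(i + 1, right)
        if len(node.keys) < self.m:         # at most m - 1 keys: no overflow
            return None
        # Overflow (m keys): promote the ceil(m/2)-th key, split the rest left/right.
        p = math.ceil(self.m / 2) - 1       # 0-based position of the promoted key
        mid = node.keys[p]
        right = Node(node.keys[p + 1:], node.children[p + 1:])
        node.keys = node.keys[:p]
        node.children = node.children[:p + 1]
        return mid, right

# The m = 5 example: inserting 77 splits the leaf (74 75 76 78).
tree = BTree(5)
tree.root = Node([72, 84], [Node([2, 7, 40]), Node([74, 75, 76, 78]), Node([89, 90, 91])])
tree.insert(77)
root_keys = tree.root.keys
leaf_keys = [child.keys for child in tree.root.children]
```

Returning the promoted key and the new right node from `_insert` lets each parent decide whether it overflows in turn, which is how a split can cascade up to the root.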
Page 38Spring, 2006 KUT
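Example
[Figure: node f under parent b, with leaves o, o' and p]
• Insert 59
  – Before: f = (60), leaf o = (50 58)
  – Leaf o overflows and splits into o = (50) and o' = (59); 58 moves up into f
  – After: f = (58 60), leaves o = (50), o' = (59), p
• Insert 57
  – Leaf o has free space: o = (50 57)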
Page 39Spring, 2006 KUT
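Example
• Split caused by inserting 54
  – Leaf o = (50 54 57) overflows and splits into o = (50) and o'' = (57); 54 goes to parent node f
• Insert 54 into parent node f
  – f = (54 58 60) overflows and splits into f = (54), with leaves o and o'', and f' = (60), with leaves o' and p; 58 goes to parent node b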
Page 40Spring, 2006 KUT
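Example
• Insert 58 into parent node b
  – b = (19 43 58) overflows and splits into b = (19), with children d and e, and b' = (58), with children f and f'; 43 goes to parent node a
• Insert 43 into parent node a
  – a = (69), with children b and c, becomes a = (43 69), with children b, b' and c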
Page 41Spring, 2006 KUT
▶ Operation (2)
• Delete
  – Deletion is done at a leaf node
  – If the key to delete is not in a leaf node:
    • swap it with the following key (which is in a leaf)
    • then delete it from the leaf
  – If the number of keys falls below ⌈m/2⌉ − 1: underflow
    1. Redistribution
       – requires a sibling node holding at least ⌈m/2⌉ keys
         (parent node key → underflow node)
         (sibling node key → parent node)
    2. Merge
       – used when redistribution is not possible
         (sibling node + parent node key + underflow node)
Page 42Spring, 2006 KUT
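Example
• Delete 60
  – Before: f = (60), leaves o = (50), p = (62 65)
  – 60 is not in a leaf, so it is swapped with the following key 62: f = (62), p = (60 65)
  – 60 is then deleted from leaf p
  – After: f = (62), leaves o = (50), p = (65)
• Delete 20
  – Before: e = (26 40), leaves l = (20), m = (30 36), n = (42)
  – Leaf l underflows; keys are redistributed with sibling m: 26 moves down into l and 30 moves up into e
  – After: e = (30 40), leaves l = (26), m = (36), n = (42)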
Page 43Spring, 2006 KUT
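B-Tree Insertion (m = 5)
• Insert 77
  – Before: root (72 84); leaves (2 7 40) (74 75 76 78) (89 90 91)
  – Split: leaf (74 75 76 77 78) overflows and 76 moves up to the root
  – After: root (72 76 84); leaves (2 7 40) (74 75) (77 78) (89 90 91)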
Page 44Spring, 2006 KUT
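B-Tree Deletion (m = 5)
• Delete 84
  – Before: root (72 76 84); leaves (2 7 40) (74 75) (77 78) (89 90 91)
  – Swap & delete: 84 is swapped with the following key 89, then deleted from the leaf
  – After the swap: root (72 76 89); leaves (2 7 40) (74 75) (77 78) (84 90 91); deleting 84 leaves (90 91)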
Page 45Spring, 2006 KUT
B-Tree Deletion
• Delete 74
  – Before: root (72 76 89); leaves (2 7 40) (74 75) (77 78) (90 91)
  – After: leaf (74 75) becomes (75)
  – Underflow occurs: the leaf holds fewer than ⌈m/2⌉ − 1 = 2 keys
Page 46Spring, 2006 KUT
B-Tree Deletion
• Redistribution using an adjacent sibling whose number of keys is greater than or equal to ⌈m/2⌉
  – Before: root (72 76 89); leaves (2 7 40) (75) (77 78) (90 91)
  – {2, 7, 40, 72, 75} is redistributed; the ⌈m/2⌉-th value (that is, 40) goes to the parent node
  – After: root (40 76 89); leaves (2 7) (72 75) (77 78) (90 91)
Page 47Spring, 2006 KUT
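B-Tree Deletion
• Delete 40
  – Before: root (40 76 89); leaves (2 7) (72 75) (77 78) (90 91)
  – Swap: 40 is swapped with the following key 72, giving root (72 76 89) and leaf (40 75); 40 is then deleted from the leaf
  – The leaf (75) underflows, and redistribution is not possible because no adjacent sibling has ⌈m/2⌉ or more keys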
Page 48Spring, 2006 KUT
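B-Tree Deletion
• Merge with the right sibling and the parent key
  – Before: root (72 76 89); leaves (2 7) (75) (77 78) (90 91)
  – The underflow leaf (75), the parent key 76 and the right sibling (77 78) are merged
  – After: root (72 89); leaves (2 7) (75 76 77 78) (90 91)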
Page 49Spring, 2006 KUT
(2) B+-Tree
• A B+-tree consists of an index set and a sequence set
  1. Index set
     – consists of the internal nodes
     – provides the access path to the leaf nodes
     – supports direct access
  2. Sequence set
     – consists of the leaf nodes
     – the leaf nodes store all the keys, which supports sequential access
     – leaf nodes and internal nodes have different structures
Page 50Spring, 2006 KUT
▶ B+-Tree (2)
• B+-tree of degree m
  – Node structure: <n, P0, K1, P1, K2, P2, …, Pn-1, Kn, Pn>
    • n: number of keys (1 ≤ n < m)
    • P0, …, Pn: pointers to subtrees
    • K1, …, Kn: key values
  – The root has 0, or between 2 and m, subtrees
  – Every internal node except the root has between ⌈m/2⌉ and m subtrees
  – All leaf nodes are at the same level
  – Key values in a node are in ascending order
Page 51Spring, 2006 KUT
Difference between B-tree and B+-tree
• In a B-tree, pointers to data records exist at all levels of the tree
• In a B+-tree, all pointers to data records exist at the leaf-level nodes
• A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree
Page 52Spring, 2006 KUT
▶ B+-tree operation
• Search
  – The B+-tree index set is an m-way search tree
  – The record is obtained at a leaf node
• Insert
  – Similar to the B-tree
• Delete
  – Done at a leaf (unless redistribution or merge is needed)
    • The key in the index set is not deleted, because it still acts as a separator on the access path
  – Redistribution: changes a key in the index set
  – Merge: deletes a key in the index set
Page 53Spring, 2006 KUT
B+-tree of degree 3
[Figure: example B+-tree with nodes a to h]
Index set
  a: 69
    b: 20 43
    c: 110
Sequence set (leaves, left to right)
  d: 15 20 | e: 35 40 43 | f: 55 69 | g: 70 90 110 | h: 120 125
Page 54Spring, 2006 KUT
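Example
• B+-tree, delete 43: the key 43 in the index set is not removed
  – Index set: 69; 20 43; 110
  – Sequence set: (15 20) (35 40) (55 69) (70 90 110) (120 125)
• Delete 125 (underflow → redistribution)
  – Index set: 69; 20 43; 90
  – Sequence set: (15 20) (35 40) (55 69) (70 90) (110 120)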
Page 55Spring, 2006 KUT
Example
• Delete 55 (underflow → merge)
  – Index set: 69; 20; 90
  – Sequence set: (15 20) (35 40 69) (70 90) (110 120)