CG096 Advanced Database Technologies
Lecture 1: Relational Algebra and SQL
CG096 Advanced Database Technologies Lecture 1: Relational Languages 2
Content
1 Models, Languages and their Use in Databases
2 The Relational Algebra
3 The Relational Calculus
4 Implementation of relational algebraic operations in SQL
1. Models, Languages and their Use
Data model and Data Specification: studying global properties of the data and the operations for processing it
Example: Consistency of the query, completeness of the answer
Database and Data Processing: processing a particular piece of data stored in the database using a particular operation
Example: Memory storage for the result, complexity of the query
Information system and Data Communication: sending data-processing instructions from a particular location to the database
Example: Access to the data, Security of the operation specification
1.1 Relational and other data models
Relational models (relational algebra, relational calculus) – most of the contemporary RDBMS are based on them
Tree models (hierarchical, object-relational) – both legacy systems and new systems use them
Object-oriented models (ODMG) – recent development, still not widely employed
Note: Native XML databases have some similarities with the hierarchical database systems (legacy systems), but they have a more elaborate model and query languages, which are close to OQL (the standard query language of object-oriented databases)
1.2 Relational Languages and their Use
Data Manipulation Language (DML)
Use: Populates, updates, and queries a relational DB
Example: relational algebra, SQL DML
Data Definition Language (DDL)
Use: Specifies the data structures and defines the relational schema
Example: domain calculus, SQL DDL
Data Control Language (DCL)
Use: Specifies operation permissions, resource access discipline and user profiles
Example: SQL DCL, LDAP
Note: Contemporary relational languages often incorporate some object-relational features of the model – e.g. Oracle 8i SQL has types, Oracle 9i SQL has type inheritance
1.3 Can DB live without formal model?
The answer is NO, for several reasons:
As we will see, SQL has ambiguities, while the relational algebra is unambiguous, so it can provide a semantic interpretation for SQL
Moreover, for the same reason SQL cannot be executed directly; it first needs to be translated into a concrete structure of operations, which can then be interpreted
Finally, if we want to control the execution of SQL statements, we need to know how they work
2 The Relational Algebra
Proposed by Codd in 1970 as a formal data model. Describes the relations and the operations to manipulate relations.
Relational operations in relational algebra transform either a single relation (unary operation) or a pair of relations (binary operation) into another relation.
Can also be used to specify retrieval requests (queries). The query result is also in the form of a relation.
Relational operations:
RESTRICT (σ) and PROJECT (π) operations.
Set operations: UNION (∪), INTERSECTION (∩), DIFFERENCE (−), CARTESIAN PRODUCT (×).
JOIN operations (⋈).
Other relational operations: DIVISION, OUTER JOIN, AGGREGATE FUNCTIONS.
2.1 RESTRICT
RESTRICT operation (also called SELECT, denoted by σ):
Selects the tuples (rows) from a relation R that satisfy a certain selection condition c on the attributes of R: σ_{c}(R)
The resulting relation includes each tuple in R whose attribute values satisfy c, and it has the same attributes as R.
Examples:
σ_{DNO=4}(EMPLOYEE)
σ_{SALARY>30000}(EMPLOYEE)
σ_{(DNO=4 AND SALARY>25000) OR DNO=5}(EMPLOYEE)
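The three RESTRICT examples can be tried out on a toy relation. The following is a minimal Python sketch, not part of the original slides: it assumes a relation is modelled as a list of dicts, and both the helper name `restrict` and the sample EMPLOYEE rows are made up for illustration.

```python
# A relation is modelled here as a list of dicts (one dict per tuple).
# The EMPLOYEE rows below are made-up sample data.
EMPLOYEE = [
    {"FNAME": "John", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Alicia", "DNO": 4, "SALARY": 25000},
    {"FNAME": "Jennifer", "DNO": 4, "SALARY": 43000},
    {"FNAME": "Ahmad", "DNO": 4, "SALARY": 26000},
]


def restrict(relation, condition):
    """sigma_c(R): keep the tuples of R for which condition(t) is true."""
    return [t for t in relation if condition(t)]


# sigma_{DNO=4}(EMPLOYEE)
dept4 = restrict(EMPLOYEE, lambda t: t["DNO"] == 4)
# sigma_{SALARY>30000}(EMPLOYEE)
high_paid = restrict(EMPLOYEE, lambda t: t["SALARY"] > 30000)
# sigma_{(DNO=4 AND SALARY>25000) OR DNO=5}(EMPLOYEE)
mixed = restrict(
    EMPLOYEE,
    lambda t: (t["DNO"] == 4 and t["SALARY"] > 25000) or t["DNO"] == 5,
)

print([t["FNAME"] for t in dept4])
print([t["FNAME"] for t in high_paid])
print([t["FNAME"] for t in mixed])
```

Every result has the same attributes as EMPLOYEE; only the number of tuples changes.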
2.2 PROJECT
PROJECT operation (denoted by π):
Keeps only certain attributes (columns) from a relation R, specified in an attribute list L: π_{L}(R)
The resulting relation has only those attributes of R specified in L.
Example: π_{FNAME,LNAME,SALARY}(EMPLOYEE)
The PROJECT operation eliminates duplicate tuples in the resulting relation so that it remains a true set (no duplicate elements).
Example: π_{SEX,SALARY}(EMPLOYEE)
If several male employees have salary 30000, only a single tuple <M, 30000> is kept in the resulting relation.
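The duplicate elimination performed by PROJECT can be illustrated with a short Python sketch. It assumes the same list-of-dicts model of a relation; the helper name `project` and the sample rows are hypothetical.

```python
# Made-up sample data: two male employees share the same salary.
EMPLOYEE = [
    {"FNAME": "John", "SEX": "M", "SALARY": 30000},
    {"FNAME": "Franklin", "SEX": "M", "SALARY": 30000},
    {"FNAME": "Alicia", "SEX": "F", "SALARY": 25000},
]


def project(relation, attributes):
    """pi_L(R): keep only the listed attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:  # duplicate elimination keeps the result a true set
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# pi_{SEX,SALARY}(EMPLOYEE): the two <M, 30000> tuples collapse into one
sex_salary = project(EMPLOYEE, ["SEX", "SALARY"])
print(sex_salary)
```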
2.3 Combining Operations
Because of closure, several operations can be combined to form a relational algebra expression.
For example, the names and salaries of employees in Department 4:
π_{FNAME,LNAME,SALARY}(σ_{DNO=4}(EMPLOYEE))
Alternatively, we could specify explicit intermediate relations for each step:
TEMP ← σ_{DNO=4}(EMPLOYEE)
R ← π_{FNAME,LNAME,SALARY}(TEMP)
Attributes can optionally be renamed in the left-hand-side relation (this may be required for some operations that will be presented later), e.g.
R(FIRSTNAME,LASTNAME,SALARY) ← π_{FNAME,LNAME,SALARY}(TEMP)
2.4 Set Operations
Binary operations from set theory:
UNION: R1 ∪ R2
INTERSECTION: R1 ∩ R2
DIFFERENCE: R1 − R2
For ∪, ∩, −, the operand relations R1(A1, ..., An) and R2(B1, ..., Bn) must have the same number of attributes, and the domains of corresponding attributes must be compatible; that is, dom(Ai) = dom(Bi) for i = 1, 2, ..., n. This condition is called union compatibility.
The resulting relation for ∪, ∩, or − has the same attribute names as the first operand relation R1 (by convention).
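A rough Python sketch of the three set operations follows. It assumes a relation is a dict with an attribute tuple and a set of rows; the helper `set_op` and the STUDENT and INSTRUCTOR data are invented for illustration. Only the number of attributes is checked here, so domain compatibility is left out of this simplified check.

```python
# Each relation: attribute names plus a set of tuples (made-up sample data).
STUDENT = {"attrs": ("FN", "LN"), "rows": {("Susan", "Yao"), ("Amy", "Ford")}}
INSTRUCTOR = {
    "attrs": ("FNAME", "LNAME"),
    "rows": {("Susan", "Yao"), ("John", "Smith")},
}


def set_op(r1, r2, op):
    """Apply a set operation after a (simplified) union-compatibility check."""
    if len(r1["attrs"]) != len(r2["attrs"]):
        raise ValueError("relations are not union compatible")
    # By convention the result takes the attribute names of the first operand.
    return {"attrs": r1["attrs"], "rows": op(r1["rows"], r2["rows"])}


union = set_op(STUDENT, INSTRUCTOR, set.union)
intersection = set_op(STUDENT, INSTRUCTOR, set.intersection)
difference = set_op(STUDENT, INSTRUCTOR, set.difference)

print(sorted(union["rows"]))
print(sorted(intersection["rows"]))
print(sorted(difference["rows"]))
```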
More Set Operations
CARTESIAN PRODUCT
R(A1, A2, ..., Am, B1, ..., Bn) ← R1(A1, A2, ..., Am) × R2(B1, ..., Bn)
A tuple t exists in R for each combination of tuples t1 from R1 and t2 from R2 such that:
t[A1, A2, ..., Am] = t1 and t[B1, B2, ..., Bn] = t2
The resulting relation R has m + n columns. If R1 has n1 tuples and R2 has n2 tuples, then R will have n1*n2 tuples.
Example: R = R1 × R2

R1:
A B C
1 2 3
2 3 4
3 4 5

R2:
D E
a b
b c

R:
A B C D E
1 2 3 a b
1 2 3 b c
2 3 4 a b
2 3 4 b c
3 4 5 a b
3 4 5 b c

3 attributes + 2 attributes = 5 attributes
3 tuples * 2 tuples = 6 tuples
CARTESIAN PRODUCT is of little use on its own, since it generates all possible combinations of tuples. It can combine related tuples from two relations in a more informative way if followed by an appropriate RESTRICT operation.
Example: Retrieve a list of the names of dependents for each female employee
FEMALE_EMPS ← σ_{SEX='F'}(EMPLOYEE)
EMPNAMES ← π_{FNAME,LNAME,SSN}(FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
ACTUAL_DEPENDENTS ← σ_{SSN=ESSN}(EMP_DEPENDENTS)
RESULT ← π_{FNAME,LNAME,DEPENDENT_NAME}(ACTUAL_DEPENDENTS)
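The last three steps of the dependents example (product, restrict, project) can be sketched in Python as below. This is an illustration only: the list-of-dicts model, the helper `cartesian_product`, and the sample rows are assumptions, and EMPNAMES is taken as already computed.

```python
from itertools import product

# Made-up sample data; EMPNAMES stands for the already-projected female employees.
EMPNAMES = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999887777"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987654321"},
]
DEPENDENT = [
    {"ESSN": "987654321", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "123456789", "DEPENDENT_NAME": "Alice"},
]


def cartesian_product(r1, r2):
    """R1 x R2: one tuple per combination (attribute names assumed disjoint)."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


# EMP_DEPENDENTS <- EMPNAMES x DEPENDENT (all 2 * 2 combinations)
EMP_DEPENDENTS = cartesian_product(EMPNAMES, DEPENDENT)
# ACTUAL_DEPENDENTS <- sigma_{SSN=ESSN}(EMP_DEPENDENTS)
ACTUAL_DEPENDENTS = [t for t in EMP_DEPENDENTS if t["SSN"] == t["ESSN"]]
# RESULT <- pi_{FNAME,LNAME,DEPENDENT_NAME}(ACTUAL_DEPENDENTS)
RESULT = [
    (t["FNAME"], t["LNAME"], t["DEPENDENT_NAME"]) for t in ACTUAL_DEPENDENTS
]

print(len(EMP_DEPENDENTS), RESULT)
```

Only the combination whose SSN matches an ESSN survives the restriction; the other product tuples pair unrelated employees and dependents.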
2.5 JOIN Operations
THETA JOIN: Similar to a CARTESIAN PRODUCT followed by a RESTRICT. The condition c is called a join condition.
R(A1, A2, ..., Am, B1, B2, ..., Bn) ← R1(A1, A2, ..., Am) ⋈_{c} R2(B1, B2, ..., Bn)
EQUIJOIN: The join condition c consists of equality comparisons involving attributes from R1 and R2. That is, c is of the form:
(Ai=Bj) AND ... AND (Ah=Bk); 1 ≤ i,h ≤ m, 1 ≤ j,k ≤ n
In the above EQUIJOIN operation:
Ai, ..., Ah are called the join attributes of R1
Bj, ..., Bk are called the join attributes of R2
Example: Retrieve each department's name and its manager's name:
T ← DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
RESULT ← π_{DNAME,FNAME,LNAME}(T)
DEPT_MGR ← DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
Columns of DEPT_MGR: DNAME, …, MGRSSN, …, LNAME, …, SSN, …
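The department-manager EQUIJOIN can be sketched in Python as a product filtered by the join condition. The helper name `theta_join` and the sample rows are assumptions made for this illustration.

```python
# Made-up sample data in the list-of-dicts model of a relation.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987654321"},
]
EMPLOYEE = [
    {"FNAME": "Franklin", "LNAME": "Wong", "SSN": "333445555"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987654321"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123456789"},
]


def theta_join(r1, r2, condition):
    """R1 join_c R2: the combinations of tuples that satisfy the join condition."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if condition(t1, t2)]


# T <- DEPARTMENT join_{MGRSSN=SSN} EMPLOYEE
T = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
# RESULT <- pi_{DNAME,FNAME,LNAME}(T)
RESULT = [(t["DNAME"], t["FNAME"], t["LNAME"]) for t in T]

print(RESULT)
```

Each tuple of T carries both MGRSSN and SSN with equal values, which is the redundancy that NATURAL JOIN removes.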
Natural Join
In an EQUIJOIN R ← R1 ⋈_{c} R2, the join attributes of R2 appear redundantly in the result relation R. In a NATURAL JOIN, the join attributes of R2 are eliminated from R. The equality is implied and there is no need to specify it. The form of the operator is
R ← R1 * R2
Example: Retrieve each project's details along with the details of its department:
Step 1: (Rename DNUMBER to DNUM)
DEPT(DNAME, DNUM, MGRSSN, MGRSTARTDATE) ← π_{DNAME,DNUMBER,MGRSSN,MGRSTARTDATE}(DEPARTMENT)
Step 2: (Now both DEPT and PROJECT have DNUM)
PROJ_DEPT ← PROJECT * DEPT
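The two steps of the natural join example (rename, then join on the shared attribute) might look as follows in Python. The helpers `rename` and `natural_join` and the shortened sample schemas are hypothetical.

```python
# Made-up sample data with shortened schemas.
PROJECT = [
    {"PNAME": "ProductX", "PNUMBER": 1, "DNUM": 5},
    {"PNAME": "Computerization", "PNUMBER": 10, "DNUM": 4},
]
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]


def rename(relation, mapping):
    """Rename attributes according to mapping (old name -> new name)."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


def natural_join(r1, r2):
    """R1 * R2: equate all attributes with the same name and keep one copy."""
    common = [a for a in r1[0] if a in r2[0]] if r1 and r2 else []
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in common)
    ]


# Step 1: rename DNUMBER to DNUM
DEPT = rename(DEPARTMENT, {"DNUMBER": "DNUM"})
# Step 2: PROJ_DEPT <- PROJECT * DEPT
PROJ_DEPT = natural_join(PROJECT, DEPT)

print(PROJ_DEPT)
```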
Multiple Join
Two relations can have more than one set of join attributes, each expressing a different relationship:
JOIN ATTRIBUTES                      RELATIONSHIP
EMPLOYEE.SSN = DEPARTMENT.MGRSSN     EMPLOYEE manages the DEPARTMENT
EMPLOYEE.DNO = DEPARTMENT.DNUMBER    EMPLOYEE works in the DEPARTMENT
Example: Retrieve each employee's name and the name of the department he/she works for:
T ← EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
RESULT ← π_{FNAME,LNAME,DNAME}(T)
Joining on SSN=MGRSSN instead would pair each manager with the department he/she manages.
Self Join
A relation can have a set of join attributes to join it with itself:
JOIN ATTRIBUTES                           RELATIONSHIP
EMPLOYEE(1).SUPERSSN = EMPLOYEE(2).SSN    EMPLOYEE(2) supervises EMPLOYEE(1)
One can think of this as joining two distinct copies of the relation, although only one relation actually exists.
In this case, renaming can be useful.
Example: Retrieve each employee's name and the name of his/her supervisor:
SUPERVISOR(SSSN,SFN,SLN) ← π_{SSN,FNAME,LNAME}(EMPLOYEE)
T ← EMPLOYEE ⋈_{SUPERSSN=SSSN} SUPERVISOR
RESULT ← π_{FNAME,LNAME,SFN,SLN}(T)
2.6 Complete Set of Operations
All the operations discussed so far can be described as a sequence of only the operations RESTRICT, PROJECT, UNION, SET DIFFERENCE, and CARTESIAN PRODUCT.
Hence, the set {σ, π, ∪, −, ×} is called a complete set of relational algebra operations. Any query language equivalent to these operations is called relationally complete.
For database applications, additional operations are needed that were not part of the original relational algebra. These include:
1. Aggregate functions and grouping.
2. OUTER JOIN and OUTER UNION.
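The claim that other operations reduce to the basic ones can be checked on a small case. The sketch below uses Python sets of tuples as stand-in relations (the sample tuples are made up). It applies the standard identity R1 ∩ R2 = R1 − (R1 − R2) and builds a join as a RESTRICT over a CARTESIAN PRODUCT.

```python
from itertools import product

# Two union-compatible stand-in relations (made-up tuples).
R1 = {(1, "a"), (2, "b"), (3, "c")}
R2 = {(2, "b"), (3, "c"), (4, "d")}

# INTERSECTION expressed with DIFFERENCE only: R1 - (R1 - R2)
derived_intersection = R1 - (R1 - R2)

# JOIN on the first attribute expressed as RESTRICT over CARTESIAN PRODUCT
derived_join = {t1 + t2 for t1, t2 in product(R1, R2) if t1[0] == t2[0]}

print(sorted(derived_intersection))
print(sorted(derived_join))
```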
2.7 Additional Relational Operations
AGGREGATIONS
Functions such as SUM, COUNT, AVERAGE, MIN, MAX are often applied to sets of tuples; they aggregate the result by applying the function to certain attributes of the relation:
_{<grouping attributes>}ℱ_{<function list>}(R)
Note: The grouping attributes are optional. If they are not present, there will be only one group of values consisting of the entire relation.
Example: Retrieve the average salary of all employees (no grouping):
R(AVGSAL) ← ℱ_{AVERAGE SALARY}(EMPLOYEE)
Example: For each department, retrieve the department number, the number of employees, and the average salary (grouping by department, counting the employees and averaging the salary in each group):
R(DNO,NUMEMPS,AVGSAL) ← _{DNO}ℱ_{COUNT SSN, AVERAGE SALARY}(EMPLOYEE)
In this example DNO is the grouping attribute; NUMEMPS and AVGSAL name the results of COUNT SSN and AVERAGE SALARY.
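Grouping and aggregation can be sketched in Python as follows. The helper `group_aggregate`, its argument format, and the sample rows are assumptions for this illustration; an empty grouping list puts the whole relation into a single group.

```python
from collections import defaultdict

# Made-up sample data.
EMPLOYEE = [
    {"SSN": "1", "DNO": 5, "SALARY": 30000},
    {"SSN": "2", "DNO": 5, "SALARY": 40000},
    {"SSN": "3", "DNO": 4, "SALARY": 26000},
]


def average(values):
    return sum(values) / len(values)


def group_aggregate(relation, grouping_attrs, functions):
    """functions maps a result attribute name to (aggregate, source attribute)."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in grouping_attrs)].append(t)
    result = []
    for key, tuples in groups.items():
        row = dict(zip(grouping_attrs, key))
        for name, (func, attr) in functions.items():
            row[name] = func([t[attr] for t in tuples])
        result.append(row)
    return result


# R(DNO, NUMEMPS, AVGSAL): group by DNO, count SSN, average SALARY
R = group_aggregate(
    EMPLOYEE, ["DNO"], {"NUMEMPS": (len, "SSN"), "AVGSAL": (average, "SALARY")}
)
# R(AVGSAL): no grouping attributes, so the entire relation is one group
OVERALL = group_aggregate(EMPLOYEE, [], {"AVGSAL": (average, "SALARY")})

print(R)
print(OVERALL)
```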
2.8 More Relational Operations
OUTER JOIN
In a regular EQUIJOIN or NATURAL JOIN operation, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result. Some queries require all tuples in R1 (or R2 or both) to appear in the result.
When no matching tuples are found, nulls are placed in the missing attributes.
LEFT OUTER JOIN: R1 ⟕ R2 lets every tuple in R1 appear in the result
RIGHT OUTER JOIN: R1 ⟖ R2 lets every tuple in R2 appear in the result
FULL OUTER JOIN: R1 ⟗ R2 lets every tuple in both R1 and R2 appear in the result
Example:
TEMP ← EMPLOYEE ⟕_{SSN=MGRSSN} DEPARTMENT
RESULT ← π_{FNAME,MINIT,LNAME,DNAME}(TEMP)
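A LEFT OUTER JOIN differs from a regular join only in how unmatched tuples of R1 are handled. The Python sketch below assumes the list-of-dicts model, uses None for null, and omits MINIT for brevity; the helper name `left_outer_join` and the data are made up.

```python
# Made-up sample data: only Franklin manages a department.
EMPLOYEE = [
    {"FNAME": "Franklin", "LNAME": "Wong", "SSN": "333445555"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123456789"},
]
DEPARTMENT = [{"DNAME": "Research", "MGRSSN": "333445555"}]


def left_outer_join(r1, r2, condition, r2_attrs):
    """Keep every tuple of R1; pad with None where R2 has no matching tuple."""
    result = []
    for t1 in r1:
        matches = [t2 for t2 in r2 if condition(t1, t2)]
        if matches:
            result.extend({**t1, **t2} for t2 in matches)
        else:
            # no partner in R2: the R2 attributes are filled with nulls
            result.append({**t1, **{a: None for a in r2_attrs}})
    return result


TEMP = left_outer_join(
    EMPLOYEE,
    DEPARTMENT,
    lambda e, d: e["SSN"] == d["MGRSSN"],
    ["DNAME", "MGRSSN"],
)
RESULT = [(t["FNAME"], t["LNAME"], t["DNAME"]) for t in TEMP]

print(RESULT)
```

A RIGHT OUTER JOIN can be obtained from the same helper by swapping the operands.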
3 Structured Query Language – SQL
A standard language for working with relational databases; a mixture of DDL, DML and DCL constructs
It is a database-specific language, not a general-purpose programming language; all its constructs are primarily for manipulating tables, rows, columns, database schemas and users, not for direct processing of the data
There are several SQL standards; among them SQL-92 (SQL2) and SQL-99 (SQL3) are the most widely used, and SQL-92 is still regarded as up to date. Different vendors implement them to varying levels of compliance; for example, Oracle 8i is SQL2 compliant, while Oracle 9i is SQL3 compliant
SQL is purely declarative; procedural extensions of SQL exist, but they are vendor-specific (e.g. Oracle PL/SQL, Microsoft Transact-SQL, Informix 4GL, etc.)
3.1 SQL DML
For querying and manipulating relational databases; its constructs are recognised by the first keyword: SELECT, INSERT, UPDATE or DELETE
The SELECT statements implement the relational algebra operations; all relational operations can be expressed in a standard SQL implementation using SELECT statements only
INSERT statements add tuples to the relations defined by the relational schema
DELETE statements remove tuples from the relations
UPDATE statements change some of the attributes of existing tuples
3.2 SELECT statement
The general form of this statement in SQL is
SELECT <list-of-columns>
FROM <list-of-tables>
WHERE <conditions-on-the-tables>
The result of executing the statement is directed to the system output by default. It corresponds to the following form
<algebraic expression>
Example: The following statement
SELECT e.name, a.address
FROM EMP e, ADDR a
WHERE a.no = e.no
corresponds to the algebraic expression
π_{name,address}(EMP * ADDR)
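The correspondence between the SELECT statement and π_{name,address}(EMP * ADDR) can be checked with a small script. This sketch uses Python's standard sqlite3 module with an in-memory SQLite database rather than Oracle; the table definitions and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMP (no INTEGER, name TEXT);
    CREATE TABLE ADDR (no INTEGER, address TEXT);
    INSERT INTO EMP VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO ADDR VALUES (1, '1 High St'), (3, '9 Low Rd');
    """
)

# The statement from the slide: product of EMP and ADDR, restricted and projected
rows = conn.execute(
    "SELECT e.name, a.address FROM EMP e, ADDR a WHERE a.no = e.no"
).fetchall()

# The same query written as an explicit natural join on the shared column no
natural = conn.execute(
    "SELECT name, address FROM EMP NATURAL JOIN ADDR"
).fetchall()

print(rows)
print(natural)
conn.close()
```

Both forms return the same rows because `no` is the only column the two tables share.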
In some cases (e.g. in Oracle extended SQL), the query result can be stored
SELECT <list-of-columns> INTO <storage-place>
FROM <list-of-tables>
WHERE <conditions-on-the-tables>
In this case, the result is stored into <storage-place> and the statement corresponds to the following algebraic expression
<relation> ← <algebraic expression>
where <relation> is the relation calculated as a result of the expression
There are variations of these templates, which support the implementation of relational operations by varying the clauses of the statement:
The SELECT clause can explicitly list the required columns; alternatively, it can contain the special symbol *, which projects full rows without skipping any columns
The FROM clause can contain more than one table, which makes it possible to implement both unary and binary operations
The WHERE clause can be used to restrict the rows to be selected using additional conditions on the columns
The INTO clause, when present, points to a data storage that should store the result of the query; it should have the same structure as the expected result
3.3 Simulating relational operations
All relational algebra operations are simulated in SQL using SELECT statements only. For this purpose the following parts of the SELECT statement can be varied:
The list of attributes to be selected in the SELECT clause
The number of tables to be looked into in the FROM clause
The conditions specified in the WHERE clause
The resulting relation, if an INTO clause is present
There is no structural correspondence between the expressions of relational algebra and the SQL expressions; in a single SQL statement the following combinations are possible:
a single unary operator applied to one single relation
a single binary operator applied to a pair of relations
a sequence of unary operators applied from the inside out to one relation and all the intermediate results
combinations of binary and unary operators applied to several relations and the intermediate results according to nesting rules
Restriction and Projection
The SELECT statements of SQL, when applied to one table only, correspond to three possible combinations of the relational algebra operators σ (RESTRICT) and π (PROJECT)
Examples:
List of all student ids, names and addresses
SELECT StudentId, Name, Address FROM Student
RA: π_{StudentId,Name,Address}(STUDENT)
Full information about the students in Business Computing
SELECT * FROM Student WHERE CourseName = 'Business Computing'
RA: σ_{CourseName='Business Computing'}(STUDENT)
List of ids and names for students in Business Computing
SELECT StudentId, Name FROM Student WHERE CourseName = 'Business Computing'
RA: π_{StudentId,Name}(σ_{CourseName='Business Computing'}(STUDENT))
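The three single-table combinations can be run against a toy Student table. The sketch below uses Python's standard sqlite3 module with an in-memory SQLite database; the column types and rows are assumptions, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (StudentId INTEGER, Name TEXT, Address TEXT, CourseName TEXT);
    INSERT INTO Student VALUES
        (1, 'Ann', 'Leeds', 'Business Computing'),
        (2, 'Bob', 'York', 'Mathematics'),
        (3, 'Cy', 'Hull', 'Business Computing');
    """
)

# PROJECT only
projection = conn.execute(
    "SELECT StudentId, Name, Address FROM Student ORDER BY StudentId"
).fetchall()

# RESTRICT only
restriction = conn.execute(
    "SELECT * FROM Student WHERE CourseName = 'Business Computing' ORDER BY StudentId"
).fetchall()

# PROJECT applied to the result of RESTRICT
both = conn.execute(
    "SELECT StudentId, Name FROM Student "
    "WHERE CourseName = 'Business Computing' ORDER BY StudentId"
).fetchall()

print(projection)
print(restriction)
print(both)
conn.close()
```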
Cartesian Product
When applied to two tables simultaneously without a WHERE clause, the SELECT statements correspond to combinations of the unary relational algebra operator π with the binary operator ×
Examples:
Full information about students and courses
SELECT * FROM Student, Course
OR
SELECT * FROM Student CROSS JOIN Course
RA: STUDENT × COURSE
List of ids for both students and lecturers
SELECT StudentId, LecturerId FROM Student, Lecturer
RA: π_{StudentId,LecturerId}(STUDENT × LECTURER)
Note: When more than one table appears in a single SELECT statement, we precede columns that have the same name with the names of their respective tables to avoid ambiguity.
Names of all students and lecturers
SELECT STUDENT.Name, LECTURER.Name
FROM STUDENT CROSS JOIN LECTURER
Inner Joins
When applied to two tables simultaneously, the SELECT statements with a WHERE clause correspond to a combination of the two unary relational algebra operators σ (RESTRICT) and π (PROJECT) with the proper binary operators
Examples:
Full information about the students and their courses
SELECT * FROM Student, Course
WHERE Student.Course = Course.Name
OR
SELECT * FROM Student INNER JOIN Course
ON Student.Course = Course.Name
RA: STUDENT ⋈_{Course=Name} COURSE
List of student names with their course leaders
SELECT StudentName, CourseLeader FROM Student NATURAL JOIN Course
RA: π_{StudentName,CourseLeader}(STUDENT * COURSE)
Full information about students from Business Computing with details about the course
SELECT * FROM Student INNER JOIN Course ON Student.Course = Course.Name
WHERE Course.CourseName = 'Business Computing'
RA: σ_{CourseName='Business Computing'}(STUDENT ⋈_{Course=Name} COURSE)
Note: To avoid ambiguity when referring to columns from different tables that have the same name, we should qualify them explicitly in the WHERE clause. This can be done by preceding their names with the aliases of the respective tables.
The names of all unit leaders of current units together with the names of the units themselves
SELECT l.Name, u.Name FROM LECTURER l INNER JOIN UNIT u ON u.Leader = l.Name
WHERE u.Status = 'Current'
Set Operations
The tuples returned as answers to two independent queries can be combined in a single relation using the set operators ∪ (UNION), for the union of two relations, and − (DIFFERENCE or MINUS), for the difference between them
Examples: (using Oracle SQL syntax)
Full details for students in either Mathematics or Computing
SELECT * FROM Student WHERE Course = 'Computing'
UNION
SELECT * FROM Student WHERE Course = 'Mathematics'
RA: σ_{Course='Computing'}(STUDENT) ∪ σ_{Course='Mathematics'}(STUDENT)
Full details for students in Computing after first year
SELECT * FROM Student WHERE Course = 'Computing'
MINUS
SELECT * FROM Student WHERE Level = 1
RA: σ_{Course='Computing'}(STUDENT) − σ_{Level=1}(STUDENT)
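The two set-operation queries can be tried outside Oracle as well. The sketch below uses Python's standard sqlite3 module with an in-memory SQLite database. SQLite spells the difference operator EXCEPT rather than Oracle's MINUS, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (Name TEXT, Course TEXT, Level INTEGER);
    INSERT INTO Student VALUES
        ('Ann', 'Computing', 1),
        ('Bob', 'Computing', 2),
        ('Cy', 'Mathematics', 1),
        ('Di', 'History', 3);
    """
)

# UNION of two restrictions of the same table
union_rows = conn.execute(
    """
    SELECT * FROM Student WHERE Course = 'Computing'
    UNION
    SELECT * FROM Student WHERE Course = 'Mathematics'
    ORDER BY Name
    """
).fetchall()

# DIFFERENCE: SQLite uses EXCEPT where Oracle uses MINUS
difference_rows = conn.execute(
    """
    SELECT * FROM Student WHERE Course = 'Computing'
    EXCEPT
    SELECT * FROM Student WHERE Level = 1
    """
).fetchall()

print(union_rows)
print(difference_rows)
conn.close()
```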
Outer Joins (Oracle 8i)
Outer joins are not always directly expressible in commercial SQL implementations, but they can be modelled if needed
Examples: (using Oracle SQL syntax)
List of unit leaders with their units, including leaders still without assigned units
SELECT LecturerName, Unit.UnitName FROM Lecturer, Unit
WHERE Lecturer.UnitName = Unit.UnitName (+)
RA: LECTURER ⟕_{UnitName=UnitName} UNIT
List of units with their leaders, including units still without appointed leaders
SELECT Unit.UnitName, LecturerName FROM Lecturer, Unit
WHERE Lecturer.UnitName (+) = Unit.UnitName
RA: LECTURER ⟖_{UnitName=UnitName} UNIT
Combined list of units and leaders, including units without leaders and leaders without units (full outer join)
SELECT Lecturer.UnitName, Unit.UnitName FROM Lecturer, Unit
WHERE Lecturer.UnitName = Unit.UnitName (+)
UNION
SELECT Lecturer.UnitName, Unit.UnitName FROM Lecturer, Unit
WHERE Lecturer.UnitName (+) = Unit.UnitName
RA: LECTURER ⟗_{UnitName=UnitName} UNIT
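The idea of modelling a full outer join as the UNION of two one-sided outer joins can be tried without Oracle. The sketch below uses Python's standard sqlite3 module with an in-memory SQLite database and the LEFT OUTER JOIN keywords in place of Oracle's (+) notation. The table contents are made up, and LecturerName is selected instead of Lecturer.UnitName so that the padded rows are easy to tell apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Lecturer (LecturerName TEXT, UnitName TEXT);
    CREATE TABLE Unit (UnitName TEXT);
    INSERT INTO Lecturer VALUES ('Ann', 'Databases'), ('Bob', NULL);
    INSERT INTO Unit VALUES ('Databases'), ('Networks');
    """
)

# Left outer join: every lecturer appears, with NULL where no unit matches
left_rows = conn.execute(
    """
    SELECT LecturerName, Unit.UnitName
    FROM Lecturer LEFT OUTER JOIN Unit ON Lecturer.UnitName = Unit.UnitName
    ORDER BY LecturerName
    """
).fetchall()

# Full outer join modelled as the UNION of two one-sided outer joins
full_rows = conn.execute(
    """
    SELECT LecturerName, Unit.UnitName
    FROM Lecturer LEFT OUTER JOIN Unit ON Lecturer.UnitName = Unit.UnitName
    UNION
    SELECT LecturerName, Unit.UnitName
    FROM Unit LEFT OUTER JOIN Lecturer ON Lecturer.UnitName = Unit.UnitName
    """
).fetchall()

print(left_rows)
print(sorted(full_rows, key=str))
conn.close()
```

UNION also removes the matched rows that both halves produce, so each lecturer-unit pair appears once.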
Outer Joins (Oracle 9i)
From Oracle 9i, outer joins can be expressed directly with the OUTER JOIN keywords
Examples: (using Oracle SQL syntax)
List of unit leaders with their units, including leaders still without assigned units
SELECT LecturerName, Unit.UnitName FROM Lecturer LEFT OUTER JOIN Unit
ON Lecturer.UnitName = Unit.UnitName
RA: LECTURER ⟕_{UnitName=UnitName} UNIT
List of units with their leaders, including units still without appointed leaders
SELECT Unit.UnitName, LecturerName FROM Lecturer RIGHT OUTER JOIN Unit
ON Lecturer.UnitName = Unit.UnitName
RA: LECTURER ⟖_{UnitName=UnitName} UNIT
Combined list of units and leaders, including units without leaders and leaders without units (full outer join)
SELECT Lecturer.UnitName, Unit.UnitName FROM Lecturer FULL OUTER JOIN Unit
ON Lecturer.UnitName = Unit.UnitName
RA: LECTURER ⟗_{UnitName=UnitName} UNIT
When more than two tables are queried, the SQL statement corresponds to a combination of unary and binary operations
Examples: (using Oracle SQL syntax)
List of students in Computing after first year
SELECT StudentId, StudentName FROM Student WHERE CourseName = 'Computing'
MINUS
SELECT StudentId, StudentName FROM Student WHERE CourseLevel = 1
RA: π_{StudentId,StudentName}(σ_{CourseName='Computing'}(STUDENT)) − π_{StudentId,StudentName}(σ_{CourseLevel=1}(STUDENT))
Note: According to relational algebra theory, relations do not contain duplicate tuples. However, the result of an SQL query can contain several identical rows. If we want a true relation as the result, the keyword DISTINCT should be specified after SELECT to eliminate possible duplicates.
All units to which students have subscribed, without duplicates
SELECT DISTINCT e.Unit
FROM STUDENT s, ENROLMENT e
WHERE e.Student = s.Name
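The effect of DISTINCT is easy to see by running the same query with and without it. The sketch below uses Python's standard sqlite3 module with an in-memory SQLite database; the rows are made up, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE STUDENT (Name TEXT);
    CREATE TABLE ENROLMENT (Student TEXT, Unit TEXT);
    INSERT INTO STUDENT VALUES ('Ann'), ('Bob');
    INSERT INTO ENROLMENT VALUES
        ('Ann', 'Databases'), ('Bob', 'Databases'), ('Bob', 'Networks');
    """
)

# Without DISTINCT the result is a multiset: 'Databases' appears twice
with_duplicates = conn.execute(
    "SELECT e.Unit FROM STUDENT s, ENROLMENT e "
    "WHERE e.Student = s.Name ORDER BY e.Unit"
).fetchall()

# With DISTINCT the result is a true relation (no duplicate rows)
without_duplicates = conn.execute(
    "SELECT DISTINCT e.Unit FROM STUDENT s, ENROLMENT e "
    "WHERE e.Student = s.Name ORDER BY e.Unit"
).fetchall()

print(with_duplicates)
print(without_duplicates)
conn.close()
```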
3.1 Summary of RA vs. SQL
Relational Algebra
Formal languages have unambiguous syntax and clear semantics, which makes them good for specifications
Formal languages hide the details of implementation and can't give very deep insight into the real DBMS
Formal languages adequately formalize only the part of the database operations whose semantics are inherited from the relational model; they do not cover DCL at all
SQL
SQL syntax is ambiguous about the order of operations, which requires knowledge of the way it is interpreted
SQL is not a programming language but a language for communication with the DBMS, and it still can't give very deep insight into the real data processing
SQL as a mixture of DDL, DML and DCL is the ultimate choice for practical relational databases