SD202: Databases - moodle.r2.enst.fr
Pedagogical Team
Louis Jachiet
Associate Professor
Antoine Amarilli
Associate Professor
Nedeljko Radulovic
PhD
Louis JACHIET 2 / 57
Why are databases everywhere?
A database can handle everything data-related:
• It structures the data.
• It stores the data persistently.
• It allows efficient retrieval of data.
• It allows updates while checking integrity constraints.
• It handles controlled and concurrent access.
• It separates the data from any specific application:
- several applications can use it
- dedicated monitoring and administration tools exist.
Example 1
The database for a train company has to deal with:
• Structured data for trains, seats, clients, bookings, prices
• Never lose any data
• Efficiently retrieve data even for complex queries (e.g. what are
the available seats on the train from Paris to Lille?)
• Regular updates (e.g. booking a seat)
• Several applications using the data with several views: the
website, the agents, the inspectors
• Lots of updates (several per second)
Example 2
The database for a Synapse-like system has to deal with:
• Structured data for courses, rooms, students, curriculums,
grades, etc.
• Never lose any data
• Efficiently retrieve data even for complex queries (e.g.
which rooms are available on a particular date?)
• Update data easily (e.g. a booking)
• Handle constraints (e.g. we only have grades for students
registered in a class)
• Many updates
Example 3
The database of your favorite social network has to deal with:
• Structured data for users, pages, groups, etc.
• Prefers not to lose any data
• Complex queries
• LOTS of updates
• Concurrent updates
Naive implementation: what needs to be done
• Choose your favorite language and use the classical
data structures to represent data
• Design a file format to store data on disk and a
synchronization mechanism between the data in memory and
the data on disk
• Find efficient data structures and algorithms that will help
answer all of the queries efficiently (e.g. B-trees, tries,
hash tables)
• Handle updates together with synchronization, persistence,
etc.
• Invent an access-rights system that fits your use case
• Implement an API so that the various applications that need
to access the data can do so
• Be future-proof
Naive implementation
You need to be an expert in:
• OOP
• Serialization
• Disaster recovery
• Data structures and algorithms
• Integrity constraints
• Access-rights systems
• Networking
• Parallelism
Or, be an expert in databases!
What is a Database Management System?
Codd’s Definition
A DBMS should provide the following functions:
• Data storage, retrieval and update
• User-accessible catalog or data dictionary describing the
metadata
• Support for transactions and concurrency
• Facilities for recovering the database should it become damaged
• Support for authorization of access and update of data
• Access support from remote locations
• Enforcing constraints to ensure data in the database abides by
certain rules
What is a Database Management System?
In practice
Databases come in all shapes and forms, but a DBMS is always a tool designed to
simplify the design of applications that need to store and retrieve
data, generally through some form of dedicated API.
Properties of Database Management Systems
• Physical data independence: users of a database can ignore
how their data is stored in practice.
• Logical independence: users can be given a partial view of
the data.
• Fourth-generation language: the database should be
controlled (queries and updates) through an interface where
the users express their intention regardless of how the
database actually computes it.
• Query optimization: queries and updates should be
automatically optimized to be as efficient as possible.
Properties of Database Management Systems
• Logical integrity: the DBMS should verify that updates keep
the data in a consistent state with regard to the constraints on
the structure of the data.
• Physical integrity: the DBMS should try to stay coherent in
the case of events like losing power, etc.
• Data sharing: multiple users can access the data while
preserving the logical and physical integrity.
• Standardized: a DBMS should use a standardized interface so
one could swap one DBMS vendor for another without
any major change to the code.
Some standards for databases
• Relational: simple but powerful model (tables), the one we
will focus on!
• XML: recursive data, complex queries, used to be hyped.
• JSON/Documents: similar to XML, but the fad moved to this
one.
• Graph: data is modeled as a graph, complex queries, very
fashionable.
• Object: complex hierarchical data model, inspired by OOP.
• Key-Value: simple queries and simple data model,
performance oriented.
• OLAP cube: oriented towards the performance of analytical queries.
Proximity with other systems
Databases are generally installed
• Have dedicated files / memory
• Always running
Installation-less alternative
• Store the whole database in a file (SQL/XML/JSON/...)
• Simple files with data (pure text, or CSV, or ...)
In summary
A variety of properties
Independence, Integrity, etc.
and a variety of standards
Relational, Graph, Key-value, etc.
but a DBMS always handles data and is accessible through some
interface
A brief history of programming languages
From the 40s to mid 70s
A boom in programming languages: almost every PL paradigm was
invented then:
• 1938, the first computers are programmed with 0s and 1s
• 1940s, assembly languages start to be used
• 1954, FORTRAN, an imperative language, still in use
• 1958, Lisp, a functional programming language
• 1962, Simula, an object oriented programming language
• 1972, Prolog, a logic language
• 1973, C, the well-known language
• 1978, ML, a statically typed programming language.
A brief history of DBMS
From the 60s
• 1960s, the navigational model and the first DBMS (e.g. IMS,
IDS)
• 1970, Codd writes the foundational paper A Relational Model of
Data for Large Shared Data Banks.
• 1975, first version of a relational DBMS, System R
• 1979, multiple relational DBMS and a standard query
language (SQL).
• 2000s, the No/Not only/New SQL movement with e.g. XML
databases and XQuery
• 2010s, very large databases and the return of SQL
• 2020s, graph databases?
Relational Database Management Systems (RDBMS)
• The most popular DBMSes
• Based on the relational model (more on that later)
• A standardized query language: SQL
• Centralized systems
Properties of DBMS
• High decoupling between
- the data model AND how it is stored
- queries AND how they are executed
• Allows for complex queries
• High optimization of queries with indexes
• Software that is reliable, stable, and feature-rich
• Supports integrity constraints
• Can cover large or very large datasets (gigabytes or even
terabytes)
• Supports transactions with ACID properties
ACID properties
Atomicity
A transaction block is either fully executed or completely canceled.
Consistency (or Correctness)
The resulting database is valid w.r.t. the integrity constraints.
Isolation
The effect of two concurrent transactions is the same as if one was
scheduled before the other.
Durability
Once confirmed, a transaction cannot be rolled back.
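Atomicity and consistency can be observed on a toy case. The sketch below uses Python's standard sqlite3 module; the accounts table, the names alice and bob, and the helper functions are hypothetical and only serve the illustration. A transfer that would violate the CHECK constraint is rolled back as a whole, so the credit already applied to the destination account disappears too.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    """Create an in-memory database (hypothetical schema) with a CHECK constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst in one transaction; True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This second update may violate CHECK (balance >= 0).
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 150) on a fresh database returns False and leaves both balances untouched, whereas a transfer of 40 commits both updates together.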
Sidenote: the CAP theorem
CAP theorem
No cluster of computers can simultaneously guarantee:
• Consistency (coherence), i.e. every successful read sees the latest write
• Availability, i.e. all queries are successful
• Partition tolerance, i.e. the system continues to operate even if network
messages are lost
Immediate consequence
When a network failure happens, one has to either:
• cancel the current operations
• proceed with the risk of being inconsistent
Indirect consequence
It is hard to build a distributed SQL engine with ACID properties.
Weaknesses of traditional RDBMS
• Hard to scale to very large datasets (petabyte scale)
• Hard to scale to several thousand queries per second
• Do not inherently model recursive data (e.g. trees)
• ACID incurs noticeable overhead (latency, disk, CPU) due to
locks and journaling
• Generally disk-bound for typical workloads
NoSQL
Key idea: remove some of the properties expected from a
traditional RDBMS to overcome some of its weaknesses
• multifaceted ecosystem,
• different data models (key-value, documents),
• relaxation of ACID properties,
• simplification of queries,
• etc.
NewSQL
How to get SQL
• Complex queries
• ACID properties
and better performance/scalability?
• More queries per second
• Handle larger datasets
• Better query time on complex queries
NewSQL
Solution:
• remove any bottleneck (locks, journaling, caches)
• put everything in RAM
• lockless parallelism
• use clusters
Spanner
The future of SQL
SQL has been around for a long time
• created almost 50 years ago
SQL is here to stay for a long time
• Wide adoption
• Much new software relies on it
• Successive revisions of the standard: 86, 89, 92, 99, 03, 06, 08,
11, 16 and more to come
• The SQL standard now captures more than just relational data:
XML, JSON, window functions, temporal data and soon graphs!
An example
Theaters
Name        | Address            | nbRooms
“La Nef”    | “bd Edouard Rey”   | 7
“Le Melies” | “caserne de Bonne” | 3
“Le Club”   | “rue Phalanstere”  | 3
Casting
Movie         | Person              | Role
“Inception”   | “Ellen Page”        | Actor
“Inception”   | “Leonardo DiCaprio” | Actor
“Inception”   | “Christopher Nolan” | Director
“Toy Story 3” | “Tom Hanks”         | Voice Actor
“Mamma Mia”   | “Meryl Streep”      | Actor
“Mamma Mia”   | “Phyllida Lloyd”    | Director
Projection
Title                      | Date           | Theater
“Inception”                | 12/08/2010 20h | “Le Melies”
“Toy Story 3”              | 13/08/2010 17h | “Le Club”
“Toy Story 3”              | 13/08/2010 20h | “Le Club”
“Toy Story 3”              | 10/08/2010 17h | “Le Melies”
“Akmareul boatda”          | 10/08/2010 16h | “Le Club”
“How to train your dragon” | 12/03/2010 18h | “Pathe Chavant”
The relational model: intuition
A schema is composed of:
• Several tables or relations.
• Each relation has several columns or attributes.
• Each column has a type (INTEGER, BIGINT, VARCHAR, ...)
The data is stored as records (or tuples) in these tables.
The relational model: a perspective from logic
We have sets
• $\mathcal{L}$ of labels
• $\mathcal{V}$ of values
• $\mathcal{T}$ of types (each $\tau \in \mathcal{T}$ is a set of values, $\tau \subseteq \mathcal{V}$)
Relation schema
A relation schema is an $n$-tuple $(A_1, \dots, A_n)$ where each $A_i$ is
a pair $(L_i, \tau_i)$ with $L_i \in \mathcal{L}$ and $\tau_i \in \mathcal{T}$.
Relational schema
A finite subset $L$ of $\mathcal{L}$ and, for each $l \in L$, a relation
schema.
The relational model: an example
Sets
• $\mathcal{L}$ is the set of strings
• $\mathcal{V}$ is the set of sequences of bytes
• $\mathcal{T}$ is the set of “classical” types (INTEGER, TEXT, etc.).
Relation schema for Theaters
(Name, TEXT), (Address, TEXT), (nbRooms, INTEGER)
Relation schema for Casting
(Movie, TEXT), (Person, TEXT), (Role, ENUM ROLE)
Relational schema
{Theaters, Casting, Projection} with Theaters having the schema
above, etc.
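As a rough illustration, the schema above can be written down as plain Python data. This is only a sketch: Python classes stand in for the types, the Role enumeration is a hypothetical rendering of ENUM ROLE, and the attribute types chosen for Projection are an assumption, since the slides do not spell them out.

```python
from enum import Enum
from typing import Dict, Tuple

class Role(Enum):
    """Hypothetical stand-in for the ENUM ROLE type."""
    ACTOR = "Actor"
    DIRECTOR = "Director"
    VOICE_ACTOR = "Voice Actor"

# A relation schema: an ordered tuple of (label, type) pairs.
RelationSchema = Tuple[Tuple[str, type], ...]

THEATERS: RelationSchema = (("Name", str), ("Address", str), ("nbRooms", int))
CASTING: RelationSchema = (("Movie", str), ("Person", str), ("Role", Role))
# Assumption: types for Projection are not given in the slides.
PROJECTION: RelationSchema = (("Title", str), ("Date", str), ("Theater", str))

# A relational schema: a finite set of relation names, each with its relation schema.
SCHEMA: Dict[str, RelationSchema] = {
    "Theaters": THEATERS,
    "Casting": CASTING,
    "Projection": PROJECTION,
}

def labels(rel: RelationSchema) -> Tuple[str, ...]:
    """Return the attribute labels of a relation schema, in order."""
    return tuple(label for label, _ in rel)
```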
The relational model: an exercise
How would you model a very simple Synapse-like system
covering:
• Students
• Courses
• Rooms
• Assignment of courses to rooms (we suppose that a course
always takes place in one room)
• Assignment of students to courses
The relational model: a perspective from logic
Instance of a relation schema
An instance of a relation schema $(L_1, \tau_1), \dots, (L_n, \tau_n)$ is a finite set
of $n$-tuples $(v^1_1, \dots, v^1_n), \dots, (v^m_1, \dots, v^m_n)$ such that $\forall i, j:\ v^i_j \in \tau_j$.
Instance of a relational schema
An instance of a relational schema is an instance for each relation
in the schema.
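The definition of an instance translates almost word for word into a check. In this sketch, Python classes again play the role of types and isinstance plays the role of membership in a type; both are simplifying assumptions, and is_instance is a hypothetical helper name.

```python
from typing import Iterable, Tuple

RelationSchema = Tuple[Tuple[str, type], ...]

def is_instance(schema: RelationSchema, tuples: Iterable[tuple]) -> bool:
    """Check that every tuple has the right arity and that its j-th
    component belongs to the j-th type of the relation schema."""
    for t in tuples:
        if len(t) != len(schema):
            return False
        if not all(isinstance(v, ty) for v, (_, ty) in zip(t, schema)):
            return False
    return True

THEATERS: RelationSchema = (("Name", str), ("Address", str), ("nbRooms", int))

theaters = {
    ("La Nef", "bd Edouard Rey", 7),
    ("Le Melies", "caserne de Bonne", 3),
    ("Le Club", "rue Phalanstere", 3),
}
```

Here is_instance(THEATERS, theaters) holds, while a tuple whose nbRooms is the string "7", or a tuple with a missing component, is rejected.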
Some notation
Schema
The word “schema” is deliberately ambiguous here: it can refer to a
relation or to a whole database.
A-component of a tuple
Given a tuple t in a relation s of schema τ where A is the i-th
attribute, we define t[A] as the i-th component of t.
A-component of a tuple (A a set of attributes)
Given a tuple t in a relation s of schema τ where A is a subset of
the attributes in τ , we define t[A] as the tuple where we keep
components from t associated to an attribute in A.
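Both readings of t[A] can be sketched in a few lines. The helper names component and restrict are hypothetical, and the schema is simplified to its tuple of attribute labels.

```python
from typing import Collection, Sequence, Tuple

def component(attrs: Sequence[str], t: tuple, a: str):
    """t[A] for a single attribute A: the i-th component, where A is the i-th attribute."""
    return t[attrs.index(a)]

def restrict(attrs: Sequence[str], t: tuple, subset: Collection[str]) -> Tuple:
    """t[A] for a set of attributes A: keep only the components whose attribute is in A."""
    return tuple(v for name, v in zip(attrs, t) if name in subset)

ATTRS = ("Name", "Address", "nbRooms")
t = ("La Nef", "bd Edouard Rey", 7)
```

With these definitions, component(ATTRS, t, "nbRooms") is 7 and restrict(ATTRS, t, {"Name", "nbRooms"}) is ("La Nef", 7).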
Different types of integrity constraints
• Key: a set of attributes A is a key when there are no two distinct
tuples t1 and t2 such that t1[A] = t2[A].
• Foreign key: given two relations R1 and R2, A → B is a
foreign-key constraint between R1 and R2 when, for each tuple t1 in
R1, there exists a unique tuple t2 in R2 such that t1[A] = t2[B].
Note that there might exist another t′1 in R1 with t′1[A] = t2[B], and
there might be a t′2 in R2 with no corresponding t1.
• Check: given a relation R, a check constraint is a boolean
function f such that for each t ∈ R we have f(t) = ⊤.
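The three kinds of constraints can be checked naively on small instances. The sketch below represents tuples as dictionaries (named perspective) and uses hypothetical helper names; a real DBMS would rely on indexes rather than these scans. The data is a subset of the Theaters and Projection tables used in these slides.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]

def satisfies_key(rows: List[Row], attrs: Sequence[str]) -> bool:
    """No two tuples agree on all attributes of `attrs`."""
    seen = set()
    for r in rows:
        k = tuple(r[a] for a in attrs)
        if k in seen:
            return False
        seen.add(k)
    return True

def satisfies_foreign_key(r1: List[Row], a: Sequence[str],
                          r2: List[Row], b: Sequence[str]) -> bool:
    """Each tuple of r1 matches exactly one tuple of r2 (t1[A] = t2[B])."""
    for t1 in r1:
        key = tuple(t1[x] for x in a)
        matches = sum(1 for t2 in r2 if tuple(t2[x] for x in b) == key)
        if matches != 1:
            return False
    return True

def satisfies_check(rows: List[Row], f: Callable[[Row], bool]) -> bool:
    """The boolean function f holds on every tuple."""
    return all(f(r) for r in rows)

theaters = [
    {"Name": "La Nef", "Address": "bd Edouard Rey", "nbRooms": 7},
    {"Name": "Le Melies", "Address": "caserne de Bonne", "nbRooms": 3},
    {"Name": "Le Club", "Address": "rue Phalanstere", "nbRooms": 3},
]
projection = [
    {"Title": "Inception", "Date": "12/08/2010 20h", "Theater": "Le Melies"},
    {"Title": "Toy Story 3", "Date": "13/08/2010 17h", "Theater": "Le Club"},
    {"Title": "Toy Story 3", "Date": "13/08/2010 20h", "Theater": "Le Club"},
    {"Title": "How to train your dragon", "Date": "12/03/2010 18h",
     "Theater": "Pathe Chavant"},
]
```

On this data, Name is a key of Theaters and Title is not a key of Projection. The foreign key Theater → Name from Projection to Theaters fails because “Pathe Chavant” has no row in Theaters.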
Examples of constraints
What are the constraints in the example tables (Theaters, Casting, Projection)?
Not all desirable constraints can be expressed...
Named and unnamed
Named perspective
Attributes are identified by their names, so we can forget about their order.
Unnamed perspective
A very similar model is one where the i-th column has no name
and is always referred to as column i.
Equally expressive but less understandable from a practical
viewpoint
Set vs Multiset semantics
Set semantics
A relation is a set, which means that no two tuples are equal.
Equivalent to considering that the set of all attributes is a key;
generally what we use for the theory
Multiset semantics
Two (or more) equal tuples can appear in the same relation
Generally what SQL does, but it has some drawbacks
In most cases we can ignore the set vs multiset semantics
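One case where the difference does show is when columns are dropped before counting or summing. The following sketch contrasts the two semantics on the nbRooms column of the Theaters example, using a Python set for set semantics and collections.Counter as a stand-in for a multiset.

```python
from collections import Counter

# nbRooms values left after dropping Name and Address from Theaters.
nb_rooms = [7, 3, 3]

# Set semantics: equal tuples collapse into one.
as_set = set(nb_rooms)

# Multiset semantics: each value keeps its multiplicity.
as_multiset = Counter(nb_rooms)

total_set = sum(as_set)                                    # duplicates lost
total_multiset = sum(v * n for v, n in as_multiset.items())  # duplicates kept
```

Under set semantics the two theaters with 3 rooms become a single tuple, so the total is 10; under multiset semantics the total is 13, which is the actual number of rooms.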
Typed vs Untyped
Typed
Usually we have types and a column can only have values from one
type.
Untyped
We can consider an untyped variant, equivalent to the case where
all types are $\mathcal{V}$ (the set of all values).
A relational language
Given a schema S we introduce a “relational language” containing
the following constructs:
• R, for R a relation in S
• RENAME(t,a,b)
• DROP(t,a)
• FILTER(t,cond)
• PRODUCT(t,t ′)
• UNION(t,t ′)
• DIFFERENCE(t,t ′)
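The constructs listed above can be prototyped in a few lines. This is a minimal sketch under set semantics and the named perspective: a tuple is a frozenset of (attribute, value) pairs and a relation is a frozenset of tuples. The function names, including the helper relation and the trailing underscore in filter_ (used to avoid shadowing Python's built-in filter), are choices made for this illustration. No validity checks are performed, and PRODUCT assumes that its two arguments have disjoint attribute names.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Row = FrozenSet[Tuple[str, object]]   # a tuple in the named perspective
Relation = FrozenSet[Row]             # set semantics: no duplicate tuples

def relation(*rows: Dict[str, object]) -> Relation:
    """Build a relation from dictionaries mapping attribute names to values."""
    return frozenset(frozenset(r.items()) for r in rows)

def rename(t: Relation, a: str, b: str) -> Relation:
    """RENAME(t,a,b): attribute a becomes b."""
    return frozenset(frozenset((b if k == a else k, v) for k, v in row) for row in t)

def drop(t: Relation, a: str) -> Relation:
    """DROP(t,a): remove attribute a (equal tuples collapse)."""
    return frozenset(frozenset((k, v) for k, v in row if k != a) for row in t)

def filter_(t: Relation, cond: Callable[[Dict[str, object]], bool]) -> Relation:
    """FILTER(t,cond): keep the tuples satisfying cond."""
    return frozenset(row for row in t if cond(dict(row)))

def product(t: Relation, u: Relation) -> Relation:
    """PRODUCT(t,t'): every tuple of t paired with every tuple of t'."""
    return frozenset(r | s for r in t for s in u)

def union(t: Relation, u: Relation) -> Relation:
    """UNION(t,t')."""
    return t | u

def difference(t: Relation, u: Relation) -> Relation:
    """DIFFERENCE(t,t')."""
    return t - u

theaters = relation(
    {"Name": "La Nef", "Address": "bd Edouard Rey", "nbRooms": 7},
    {"Name": "Le Melies", "Address": "caserne de Bonne", "nbRooms": 3},
    {"Name": "Le Club", "Address": "rue Phalanstere", "nbRooms": 3},
)
names = drop(drop(theaters, "Address"), "nbRooms")
```

One limitation of this encoding is that an empty relation carries no attribute names, so a fuller implementation would store the schema next to the tuples.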
Operators: R, RENAME, DROP, FILTER, PRODUCT,
UNION, DIFFERENCE
How to compute all the projections?
Solution: Projection
Title                      | Date           | Theater
“Inception”                | 12/08/2010 20h | “Le Melies”
“Toy Story 3”              | 13/08/2010 17h | “Le Club”
“Toy Story 3”              | 13/08/2010 20h | “Le Club”
“Toy Story 3”              | 10/08/2010 17h | “Le Melies”
“Akmareul boatda”          | 10/08/2010 16h | “Le Club”
“How to train your dragon” | 12/03/2010 18h | “Pathe Chavant”
How to compute the names of all theaters?
Start from Theaters:
Name        | Address            | nbRooms
“La Nef”    | “bd Edouard Rey”   | 7
“Le Melies” | “caserne de Bonne” | 3
“Le Club”   | “rue Phalanstere”  | 3
Then DROP(Theaters,Address):
Name        | nbRooms
“La Nef”    | 7
“Le Melies” | 3
“Le Club”   | 3
Solution: DROP(DROP(Theaters,Address),nbRooms)
Name
“La Nef”
“Le Melies”
“Le Club”
How to compute the names of really all theaters (including those that appear only in Projection)?
UNION(
DROP(DROP(Theaters,Address),nbRooms),
RENAME(DROP(DROP(Projection,Title),Date),Theater,Name)
)
How to compute the projections in August?
FILTER(Projection, Date in August)
How to compute pairs (a, t) where actor a has a movie in
theater t?
• Actors = DROP(FILTER(Casting,Role=Actor),Role)
• AllProjAct = PRODUCT(Projection,Actors)
• join = FILTER(AllProjAct, proj.Title=act.Movie)
• solution = DROP(join,[proj.Title,act.Movie,Date])
Joins
A PRODUCT followed by a FILTER is (generally) a JOIN.
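The product-then-filter recipe can be traced on the example data. The sketch below uses lists of dictionaries instead of a relational engine; the function name actor_theater_pairs and the act.Movie column name are hypothetical. Each step mirrors one step of the computation: select the actors, build the product, keep the pairs whose titles match, then keep only Person and Theater.

```python
from typing import Dict, List, Set, Tuple

casting = [
    {"Movie": "Inception", "Person": "Ellen Page", "Role": "Actor"},
    {"Movie": "Inception", "Person": "Leonardo DiCaprio", "Role": "Actor"},
    {"Movie": "Inception", "Person": "Christopher Nolan", "Role": "Director"},
    {"Movie": "Toy Story 3", "Person": "Tom Hanks", "Role": "Voice Actor"},
    {"Movie": "Mamma Mia", "Person": "Meryl Streep", "Role": "Actor"},
    {"Movie": "Mamma Mia", "Person": "Phyllida Lloyd", "Role": "Director"},
]
projection = [
    {"Title": "Inception", "Date": "12/08/2010 20h", "Theater": "Le Melies"},
    {"Title": "Toy Story 3", "Date": "13/08/2010 17h", "Theater": "Le Club"},
    {"Title": "Toy Story 3", "Date": "13/08/2010 20h", "Theater": "Le Club"},
    {"Title": "Toy Story 3", "Date": "10/08/2010 17h", "Theater": "Le Melies"},
    {"Title": "Akmareul boatda", "Date": "10/08/2010 16h", "Theater": "Le Club"},
    {"Title": "How to train your dragon", "Date": "12/03/2010 18h",
     "Theater": "Pathe Chavant"},
]

def actor_theater_pairs(casting: List[Dict[str, str]],
                        projection: List[Dict[str, str]]) -> Set[Tuple[str, str]]:
    # FILTER on Role, then DROP Role; Movie is renamed to avoid a name clash.
    actors = [{"Person": c["Person"], "act.Movie": c["Movie"]}
              for c in casting if c["Role"] == "Actor"]
    # PRODUCT: every projection paired with every actor row.
    all_proj_act = [{**p, **a} for p in projection for a in actors]
    # FILTER: keep pairs where the projected title is the actor's movie.
    joined = [r for r in all_proj_act if r["Title"] == r["act.Movie"]]
    # DROP everything except Person and Theater (a set removes duplicates).
    return {(r["Person"], r["Theater"]) for r in joined}
```

The product has 6 × 3 = 18 rows before filtering, which is why real systems evaluate joins directly rather than materializing the product.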
How to compute pairs (a, t) where actor a has a movie in
theater t, this time with a JOIN?
• First compute Actors = FILTER(Casting,Role=Actor)
• Then compute the join: JOIN(Projection,Actors, Title=Movie)
• Then discard all columns except “Person” and “Theater”
JOIN example
Theaters
Name        | Address            | nb
“La Nef”    | “bd Edouard Rey”   | 7
“Le Melies” | “caserne de Bonne” | 3
“Le Club”   | “rue Phalanstere”  | 3
Projection
Title                      | Date           | Theater
“Inception”                | 12/08/2010 20h | “Le Melies”
“Toy Story 3”              | 13/08/2010 17h | “Le Club”
“Toy Story 3”              | 13/08/2010 20h | “Le Club”
“Toy Story 3”              | 10/08/2010 17h | “Le Melies”
“Akmareul boatda”          | 10/08/2010 16h | “Le Club”
“How to train your dragon” | 12/03/2010 18h | “Pathe Chavant”
JOIN(Theaters,Projection), joining on Theater = Name
(columns from Projection, then columns from Theaters)
Title             | Date           | Theater     | Name        | Address            | nb
“Inception”       | 12/08/2010 20h | “Le Melies” | “Le Melies” | “caserne de Bonne” | 3
“Toy Story 3”     | 13/08/2010 17h | “Le Club”   | “Le Club”   | “rue Phalanstere”  | 3
“Toy Story 3”     | 13/08/2010 20h | “Le Club”   | “Le Club”   | “rue Phalanstere”  | 3
“Toy Story 3”     | 10/08/2010 17h | “Le Melies” | “Le Melies” | “caserne de Bonne” | 3
“Akmareul boatda” | 10/08/2010 16h | “Le Club”   | “Le Club”   | “rue Phalanstere”  | 3
The projection at “Pathe Chavant” has no matching row in Theaters, so it does not appear in the result; “La Nef” has no projection and does not appear either.
Valid computations
A computation is not necessarily valid:
• We cannot use a relation name that does not exist
• We cannot use attributes that do not exist
• We cannot UNION two relations with different attributes
• We cannot create a table with the same attribute twice
This can be checked just by looking at the schema
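A schema-only validity check can be sketched as a function that computes the attributes of an expression without touching any data. The encoding is an assumption made for this illustration: expressions are nested tuples such as ("DROP", expr, attr), and FILTER carries the list of attributes its condition mentions. The function raises ValueError for each of the invalid cases listed above.

```python
from typing import Dict, FrozenSet, Tuple

def attrs_of(expr, schema: Dict[str, Tuple[str, ...]]) -> FrozenSet[str]:
    """Return the attributes of `expr`, or raise ValueError if it is invalid."""
    if isinstance(expr, str):  # a relation name
        if expr not in schema:
            raise ValueError(f"unknown relation {expr!r}")
        return frozenset(schema[expr])
    op, *args = expr
    if op == "RENAME":
        t, a, b = args
        attrs = attrs_of(t, schema)
        if a not in attrs:
            raise ValueError(f"unknown attribute {a!r}")
        if b in attrs and b != a:
            raise ValueError(f"attribute {b!r} would appear twice")
        return (attrs - {a}) | {b}
    if op == "DROP":
        t, a = args
        attrs = attrs_of(t, schema)
        if a not in attrs:
            raise ValueError(f"unknown attribute {a!r}")
        return attrs - {a}
    if op == "FILTER":
        t, used = args  # `used`: attributes mentioned by the condition
        attrs = attrs_of(t, schema)
        if not set(used) <= attrs:
            raise ValueError("condition uses unknown attributes")
        return attrs
    if op == "PRODUCT":
        left, right = (attrs_of(a, schema) for a in args)
        if left & right:
            raise ValueError("same attribute on both sides of PRODUCT")
        return left | right
    if op in ("UNION", "DIFFERENCE"):
        left, right = (attrs_of(a, schema) for a in args)
        if left != right:
            raise ValueError(f"{op} of relations with different attributes")
        return left
    raise ValueError(f"unknown operator {op!r}")

SCHEMA = {
    "Theaters": ("Name", "Address", "nbRooms"),
    "Projection": ("Title", "Date", "Theater"),
}
theater_names = ("DROP", ("DROP", "Theaters", "Address"), "nbRooms")
projected_in = ("DROP", ("DROP", "Projection", "Title"), "Date")
```

For instance, ("UNION", theater_names, projected_in) is rejected because one side has the attribute Name and the other Theater, whereas wrapping the second argument in ("RENAME", projected_in, "Theater", "Name") makes the expression valid.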
Extensions
Many ways of extending the relational algebra:
• Multiset semantics
- UNION and DROP can introduce duplicates
• NULLs
• Aggregation (e.g. total number of rooms in all theaters)
• Recursion
Exercises 1/3
Schema
Room (Name,Time,MovieTitle)
Movie (MovieTitle, Director, Actor)
Producer (ProducerName, MovieTitle)
Seen (Spectator, MovieTitle)
Like (Spectator, MovieTitle)
• Where and when can we see the movie “Mad Max...”?
• What are the movies directed by Welles?
• Who are the actors of “Ran”?
• Where can we see a movie in which Signoret plays?
• Among the actors, who produced at least one movie?
• Among the actors, who directed a movie that they played in?
• Who plays in one (or more) movies by Varda?
Exercises 2/3
Schema
Room (Name,Time,MovieTitle)
Movie (MovieTitle, Director, Actor)
Producer (ProducerName, MovieTitle)
Seen (Spectator, MovieTitle)
Like (Spectator, MovieTitle)
• Who are the actors playing in all the films from Chloe Zhao?
• Who produces all the movies from Kurosawa?
• Who are the spectators watching all the movies?
• Among the spectators, who likes all the movies they see?
• Where can we see Adele Haenel after 16:00?
• What are the movies with no room projecting them?
Exercises 3/3
Schema
Room (Name,Time,MovieTitle)
Movie (MovieTitle, Director, Actor)
Producer (ProducerName, MovieTitle)
Seen (Spectator, MovieTitle)
Like (Spectator, MovieTitle)
• Among the producers, who produces a movie shown
nowhere?
• Among the producers, who saw all the movies they directed?
• Among the spectators, who saw all the movies from
Kurosawa?
• Among the spectators, who liked a movie they did not watch?
• Among the spectators, who liked 0 movies?
• Among the producers, who did not produce a single movie
from Doillon?
• Among the producers, who watched only movies that they
produced?
Course organization
Date       | Type   | Content
10/09/2021 | Lesson | Introduction
17/09/2021 | Lesson | SQL
24/09/2021 | Lab    | SQL lab
01/10/2021 | Lesson | Schemas, views, and constraints
08/10/2021 | Lesson | Evaluation algorithms
15/10/2021 | Lab    | Constraints and evaluation algorithms
22/10/2021 | Lab    | Recap
29/10/2021 | Exam   |