Week 2-data models



 

Transcript of Week 2-data models

Page 1: Week 2-data models

1

Today’s Class

Data Models

Relational Model

2

Data Models

Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey.

Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.

3

Categories of data models

Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)

Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.

Implementation (representational) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.

4

Schemas versus Instances

• Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database.

• Schema Diagram: A diagrammatic display of (some aspects of) a database schema.

• Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE.

• Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence).

5

Database Schema Vs. Database State

• Database State: Refers to the content of a database at a moment in time.

• Initial Database State: Refers to the database state when it is first loaded into the system.

• Valid State: A state that satisfies the structure and constraints of the database.

• Distinction:

• The database schema changes very infrequently. The database state changes every time the database is updated.

• Schema is also called intension, whereas state is called extension.
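The schema/state distinction can be made concrete with Python's built-in sqlite3 module. This is a minimal sketch, assuming a hypothetical STUDENT table: the schema text stored in SQLite's catalog (sqlite_master) never changes, while the state (the set of rows) moves from empty, to initial, to updated, and an update that would break a constraint is rejected so that the state stays valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the schema (intension): structure plus constraints.
conn.execute(
    "CREATE TABLE STUDENT ("
    " StudentId INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Age INTEGER CHECK (Age >= 0))"
)

def schema_text():
    """The schema as recorded in SQLite's catalog."""
    row = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'STUDENT'").fetchone()
    return row[0]

def state():
    """The current state (extension): the set of rows right now."""
    return set(conn.execute("SELECT StudentId, Name, Age FROM STUDENT"))

schema_before = schema_text()
empty_state = state()  # just after definition: no rows yet

# Load: produces the initial state.
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", [(1, "Chan", 23), (2, "Lam", 17)])
initial_state = state()

# Update: produces a new (valid) state.
conn.execute("UPDATE STUDENT SET Age = 24 WHERE StudentId = 1")
current_state = state()

# An update that would yield an invalid state is rejected by the CHECK constraint.
try:
    conn.execute("UPDATE STUDENT SET Age = -5 WHERE StudentId = 2")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

After all three steps the catalog entry is byte-for-byte the same, which is the sense in which the schema "changes very infrequently".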

6

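[Figure: the database goes from the empty state (after the schema is defined) to the initial state (after loading), then through successive updates to new states; a valid state must satisfy the database schema.]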


7

Importance of Data Models

Data models: representations, usually graphical, of complex real-world data structures

Facilitate interaction among the designer, the applications programmer and the end user

End-users have different views and needs for data

Data model organizes data for various users

8

Data Model Basic Building Blocks

Entity: anything about which data will be collected/stored

Attribute: a characteristic of an entity

Relationship: describes an association among entities

• One-to-one (1:1) relationship

• One-to-many (1:M) relationship

• Many-to-many (M:N or M:M) relationship

Constraint: a restriction placed on the data

9

History of Data Models

Relational Model: proposed in 1970 by E.F. Codd (IBM), first commercial system in 1981-82. Now in several commercial products (DB2, ORACLE, SQL Server, SYBASE, INFORMIX).

Network Model: the first one to be implemented, by Honeywell in 1964-65 (IDS System). Widely adopted because of support from CODASYL (CODASYL DBTG report of 1971). Later implemented in a large variety of systems: IDMS (Cullinet, now CA), DMS 1100 (Unisys), IMAGE (H.P.), VAX-DBMS (Digital Equipment Corp.).

Hierarchical Data Model: implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems, the most popular implementation of this model. Another system based on this model: System 2k (SAS Inc.).

10

History of Data Models

Object-oriented Data Model(s): several models have been proposed for implementation in a database system. One set comprises models of persistent O-O programming languages such as C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE). Other examples include systems such as O2, ORION (at MCC, later ITASCA), and IRIS (at H.P., used in Open OODB).

Object-Relational Models: the most recent trend. Started with Informix Universal Server. Exemplified in the later versions of Oracle, DB2, SQL Server, and other systems.

11

Hierarchical Database Model

Logically represented by an upside-down tree

Each parent can have many children

Each child has only one parent

12

Hierarchical Database Model

Advantages

Conceptual simplicity

Database security and integrity

Data independence

Efficiency

Disadvantages

Complex implementation

Difficult to manage and lack of standards

Lacks structural independence

Applications programming and use complexity

Implementation limitations


13

Hierarchical Data Model

• ADVANTAGES:

• The hierarchical model is simple to construct and operate on

• Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in manufacturing, personnel organization in companies

• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.

• DISADVANTAGES:

• Navigational and procedural nature of processing

• Database is visualized as a linear arrangement of records

• Little scope for "query optimization"

14

Network Database Model

Each record can have multiple parents

Composed of sets

Each set has owner record and member record

Member may have several owners

15

Network Database Model

Advantages

Conceptual simplicity

Handles more relationship types

Data access flexibility

Promotes database integrity

Data independence

Conformance to standards

Disadvantages

System complexity

Lack of structural independence

16

Network Data Model

• ADVANTAGES:

• The network model is able to model complex relationships and represents the semantics of add/delete on the relationships.

• Can handle most situations for modeling using record types and relationship types.

• Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database.

• DISADVANTAGES:

• Navigational and procedural nature of processing

• The database contains a complex array of pointers that thread through a set of records.

• Little scope for automated "query optimization"

17

Network Model

Data are represented by collections of records, each similar to an entity in the E-R model.

Records and their fields are represented as record types:

type customer = record
    customer-name: string;
    customer-street: string;
    customer-city: string;
end

type account = record
    account-number: integer;
    balance: integer;
end

Relationships among data are represented by links, similar to a restricted (binary) form of an E-R relationship.

Restrictions on links depend on whether the relationship is many-to-many, many-to-one, or one-to-one.

18

Data-Structure Diagrams

o Schema representing the design of a network database.

o A data-structure diagram consists of two basic components:

o Boxes, which correspond to record types.

o Lines, which correspond to links.

o Specifies the overall logical structure of the database.

o For every E-R diagram, there is a corresponding data-structure diagram.


19

20

Network Model Data Structure

A record type may be the owner of one set type and a member of another.

21

The DBTG CODASYL Model

o All links are treated as many-to-one relationships.

o To model many-to-many relationships, a record type is defined to represent the relationship and two links are used.

o Create a new record type Rlink (referred to as a dummy record type).
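A minimal Python sketch of this DBTG technique, using the customer and account record types from the earlier slide; the sample customer names and account numbers are hypothetical. Each Rlink record has exactly one customer owner and one account owner, so both links are many-to-one, yet together they express the many-to-many relationship.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_name: str

@dataclass
class Account:
    account_number: int
    balance: int

@dataclass
class Rlink:
    # Dummy record type: each Rlink points to exactly ONE customer and ONE account,
    # so each link is many-to-one (many Rlinks share one owner).
    customer: Customer
    account: Account

hayes = Customer("Hayes")
johnson = Customer("Johnson")
a101 = Account(101, 500)
a201 = Account(201, 900)

# M:N: both customers share account 101; Johnson also holds account 201.
links = [Rlink(hayes, a101), Rlink(johnson, a101), Rlink(johnson, a201)]

def accounts_of(customer):
    """Navigate from a customer through its Rlink records to the accounts."""
    return [l.account.account_number for l in links if l.customer is customer]

def customers_of(account):
    """Navigate from an account through its Rlink records to the customers."""
    return [l.customer.customer_name for l in links if l.account is account]
```

Every query on the M:N relationship goes through the dummy records, which mirrors the navigational style of the network model.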

22

23

Hierarchical Model

o A hierarchical database consists of a collection of records which are connected to one another through links.

o A record is a collection of fields, each of which contains only one data value.

o A link is an association between precisely two records.

o The hierarchical model differs from the network model in that the records are organized as collections of trees rather than as arbitrary graphs.

24

Tree-Structure Diagrams

The schema for a hierarchical database consists of

boxes, which correspond to record types

lines, which correspond to links

Record types are organized in the form of a rooted tree.

No cycles in the underlying graph.

Relationships formed in the graph must be such that only one-to-many or one-to-one relationships exist between a parent and a child.
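These rules can be checked mechanically. The sketch below is a hypothetical helper, assuming a schema given as a list of record type names plus (parent, child) links: it accepts the schema only if every child has one parent, there is exactly one root, and every record type is reachable from that root (which rules out cycles).

```python
from collections import defaultdict

def is_tree_structure(record_types, links):
    """Return True if the (parent, child) links form a single rooted tree."""
    parent_of = {}
    children = defaultdict(list)
    for parent, child in links:
        if child in parent_of:          # a child may have only one parent
            return False
        parent_of[child] = parent
        children[parent].append(child)

    roots = [r for r in record_types if r not in parent_of]
    if len(roots) != 1:                 # exactly one root
        return False

    # Walk down from the root; every record type must be reached exactly once.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:                # defensive: a revisit would signal a cycle
            return False
        seen.add(node)
        stack.extend(children[node])
    # Record types caught in a cycle are never reached from the root.
    return seen == set(record_types)
```

For example, customer with child account is a valid tree, whereas giving account a second parent (say, branch) is rejected, since a child must have only one parent.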


25

General Structure

A parent may have an arrow pointing to a child, but a child must have an arrow pointing to its parent.

26

Database schema is represented as a collection of tree-structure diagrams.

A single instance of a database tree:

The root of this tree is a dummy node.

The children of that node are actual instances of the appropriate record type.

27

Single Relationships

If the relationship depositor is one-to-one, then the link depositor has two arrows.

Only one-to-many and one-to-one relationships can be directly represented in the hierarchical model.

28

The Relational Model

29

Introduction

Proposed by Edgar F. Codd (1923-2003) in the early seventies (Turing Award, 1981).

Most of the modern DBMS are relational.

Simple and elegant model with a mathematical basis.

Led to the development of a theory of data dependencies and database design.

Relational algebra operations play a crucial role in query optimization and execution.

Laid the foundation for the development of tuple relational calculus and, later, the database standard SQL.
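Because a relation is a set of tuples, the two simplest relational algebra operations, selection and projection, can be sketched with Python sets. This is an illustration only, not how a DBMS executes them; the data reuses the STUDENT example relation from these notes with Student-id omitted. Note how projection removes duplicate tuples automatically.

```python
def select(relation, predicate):
    """Selection: the tuples of the relation that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, schema, attributes):
    """Projection: keep only the listed attributes; the set drops duplicates."""
    positions = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in positions) for t in relation}

SCHEMA = ["Name", "Age", "CGPA"]
STUDENT = {
    ("Chan Kin Ho", 23, 8.19),
    ("Lam Wai Kin", 17, 10.00),
    ("Man Ko Yee", 22, 8.75),
    ("Lee Chin Cheung", 16, 10.00),
    ("Alvin Lam", 15, 9.65),
}

adults = select(STUDENT, lambda t: t[SCHEMA.index("Age")] >= 18)
# Two students have CGPA 10.00, but the projection holds a single tuple.
perfect = project(select(STUDENT, lambda t: t[2] == 10.00), SCHEMA, ["CGPA"])
```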

30

Basic Concepts

• Entities and relationships are stored in tables

• Relationships are captured by including the key of one table in another

• Languages for manipulating the tables

• All popular DBMSs today are based on the relational data model (or an extension of it, e.g., the object-relational data model)


31

Why is it so good?

• Simplicity: everybody knows how to manipulate tables

• Tables are simple enough that solutions to complicated problems such as concurrency control and query optimization can be obtained

• It has a theoretical basis for studying database design problems

• Tables are logical concepts; physically, tables can be stored in different ways, which supports data independence

32

Terminology

• Relation: table; denoted by R(A1, A2, ..., An), where R is the relation name and (A1, A2, ..., An) is the relation schema of R

• Attribute: column; denoted by Ai

• Tuple: row

• Attribute value: value stored in a table cell

• Domain: legal type and range of values of an attribute, denoted by dom(Ai)

– Attribute: Age; Domain: [0-100]

– Attribute: EmpName; Domain: 50 alphabetic chars

– Attribute: Salary; Domain: non-negative integer

• Ideally, a domain can be defined in terms of another domain; e.g., the domain of EmpName is PersonName. This is NOT allowed in most basic DBMSs.

• However, many recent DBMSs, such as Oracle 10g, allow this (object-relational) extension.
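A domain can be pictured as a predicate that decides which values are legal for an attribute. A minimal Python sketch of the three example domains above; the function names are hypothetical, "50 alphabetic chars" is read as "at most 50", and spaces between name parts are assumed to be allowed.

```python
def age_domain(v):
    # Age: integer in [0, 100] (booleans excluded, although Python treats them as ints)
    return isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 100

def emp_name_domain(v):
    # EmpName: 1 to 50 alphabetic characters; spaces between parts assumed allowed
    return isinstance(v, str) and 0 < len(v) <= 50 and v.replace(" ", "").isalpha()

def salary_domain(v):
    # Salary: non-negative integer
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0

DOMAINS = {"Age": age_domain, "EmpName": emp_name_domain, "Salary": salary_domain}

def in_domain(attribute, value):
    """True if value belongs to dom(attribute)."""
    return DOMAINS[attribute](value)
```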

33

Relational Database: Definitions

Relational database: a set of relations

Relation: made up of 2 parts:

• Instance: a table, with rows and columns. #Rows = cardinality, #fields = degree (arity).

• Schema: specifies the name of the relation, plus the name and type of each column, e.g., Students(sid: string, name: string, login: string, age: integer, gpa: real).

Can think of a relation as a set of rows or tuples (i.e., all rows are distinct).

34

An Example Relation

STUDENT (relation name / table name)

Name            | Student-id | Age | CGPA
Chan Kin Ho     | 99223367   | 23  | 8.19
Lam Wai Kin     | 96882145   | 17  | 10.00
Man Ko Yee      | 96452165   | 22  | 8.75
Lee Chin Cheung | 96154292   | 16  | 10.00
Alvin Lam       | 96520934   | 15  | 9.65

The attributes/columns (Name, Student-id, Age, CGPA) are collectively called the schema.

Cardinality = 5, degree = 4, all rows distinct
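The counts in the last line can be checked directly. A short Python sketch using the rows above (storing Student-id as a string is an assumption):

```python
STUDENT_SCHEMA = ("Name", "Student-id", "Age", "CGPA")
STUDENT = [
    ("Chan Kin Ho", "99223367", 23, 8.19),
    ("Lam Wai Kin", "96882145", 17, 10.00),
    ("Man Ko Yee", "96452165", 22, 8.75),
    ("Lee Chin Cheung", "96154292", 16, 10.00),
    ("Alvin Lam", "96520934", 15, 9.65),
]

cardinality = len(STUDENT)          # number of rows (tuples)
degree = len(STUDENT_SCHEMA)        # number of attributes (arity)
all_distinct = len(set(STUDENT)) == cardinality  # a relation has no duplicate rows
```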

35

Another Relation Example

enrollment (studentName, rollNumber, courseNo, sectionNo)


36

Relational Model

Sets: collections of items of the same type; no order; no duplicates

Mappings (domain → range): 1:many, many:1, 1:1, many:many


37

Relational Model Concepts

Relational Model of data is based on the concept of RELATION

A Relation is a mathematical concept based on the idea of SETS

The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations

38

Relational Model Concepts

The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper:"A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.

The above paper caused a major revolution in the field of Database management and earned Codd the coveted ACM Turing Award in 1981

39

Relational Model Concepts

The relational model represents the database as a collection of relations

Each relation resembles a table of values

When a relation is thought of as a table of values, each row in the table represents a collection of related data values

40

Some Formal Definitions

• A relation is denoted by: R(A1, A2, ..., An)

– STUDENT(Name, Student-id, Age, CGPA)

• Degree of a relation: the number of attributes n in the relation.

• Tuple t of R(A1, A2, ..., An): an ordered set of values <v1, v2, ..., vn>, where each vi is an element of dom(Ai).

• Relation instance r(R): a set of tuples in R

r(R) = {t1, t2, ..., tm}, or alternatively

r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)

41

Domain

A Domain D is a set of atomic values.

Atomic means that each value in the domain is indivisible as far as the relational model is concerned

That is, if we split an atomic value into parts, the parts are meaningless on their own. Examples:

Local_phone_number

Names

Employee_ages

42

Domains & Data Types

Smallest semantic unit of data: an individual part number, individual supplier number, individual city name, etc.

Atomic values, or scalar values

A domain is a named set of atomic values: a pool of legal values

Example: supplier number is an integer in [0, 10000]


43

Relation and Cartesian Product

• A relation is any subset of the Cartesian product of domains of values

• Example: Let Dom(Name) = { Lee, Cheung }

Dom(Grade) = { A, B, C }

Then the Cartesian product of the domains is

Dom(Name) × Dom(Grade) = { <Lee, A>, <Lee, B>, <Lee, C>, <Cheung, A>, <Cheung, B>, <Cheung, C> }

• A relation StudentGrade(Name, Grade) can be defined as any subset of the Cartesian product Dom(Name) × Dom(Grade)

r(StudentGrade) = { <Lee, A>, <Cheung, C> } ⊆ Dom(Name) × Dom(Grade)
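The same construction in Python, using itertools.product for the Cartesian product; is_relation_over is a hypothetical helper name for the subset test.

```python
from itertools import product

dom_name = {"Lee", "Cheung"}
dom_grade = {"A", "B", "C"}

# Dom(Name) x Dom(Grade): every possible <Name, Grade> pair
cartesian = set(product(dom_name, dom_grade))

# r(StudentGrade) is one particular subset of that product
r_student_grade = {("Lee", "A"), ("Cheung", "C")}

def is_relation_over(r, *domains):
    """True if r is a subset of the Cartesian product of the given domains."""
    return r <= set(product(*domains))
```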

44

Characteristics of Relations

• Tuples in a relation are not considered to be ordered, even though they appear to be in a tabular form. (Recall that a relation is a set of tuples.)

• The ordering of attributes in a relation schema R is significant (under the definition above, where a tuple is an ordered list of values).

• Values in a tuple: all values are considered atomic. (Recall that a domain is a set of atomic values.) A special null value is used to represent values that are unknown or inapplicable to certain tuples.

45

Identical Relations

46

Relational Model Notation

An attribute A can be qualified with the relation name R to which it belongs by using the dot notation R.A

For example, STUDENT.Name or STUDENT.Age

47

Relational Model Notation

We refer to component values of a tuple t by:

• t[Ai] or t.Ai: the value vi of attribute Ai for tuple t

• t[Ai, Aj, Ak] or t.(Ai, Aj, Ak): the list of values of t for the listed attributes from R

For example, consider the tuple t = <“Barbara Benson”, “533-69-1238”, “839-8461”, “7384 Fontana Lane”, NULL, 19, 3.25> from the STUDENT relation in Figure 5.1.

We have t.Name = <“Barbara Benson”> and t.(Ssn, Gpa, Age) = <“533-69-1238”, 3.25, 19>.
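Python's namedtuple supports both notations almost directly. A minimal sketch; the attribute names are assumed from the usual layout of the STUDENT relation in Figure 5.1 (not reproduced here), and sub_tuple is a hypothetical helper for t.(Ai, Aj, Ak).

```python
from collections import namedtuple

# Attribute names assumed (Figure 5.1 is not shown in these notes).
Student = namedtuple(
    "Student", ["Name", "Ssn", "Home_phone", "Address", "Office_phone", "Age", "Gpa"]
)

t = Student("Barbara Benson", "533-69-1238", "839-8461", "7384 Fontana Lane", None, 19, 3.25)

def sub_tuple(t, *attributes):
    """t.(A1, ..., Ak): the values of t for the listed attributes, in that order."""
    return tuple(getattr(t, a) for a in attributes)
```

Here t.Name plays the role of t.Name (or t[Name]), None stands in for NULL, and sub_tuple(t, "Ssn", "Gpa", "Age") gives the sub-tuple from the example.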

48

Domain Constraints

The value of each attribute A must be an atomic value from dom(A).

The data types associated with domains typically include standard numeric data types for integers and real numbers, characters, Booleans, fixed-length strings, time, date, money, or other special data types.

Domain-constrained comparisons:

Select …..
From P, SP
Where P.P# = SP.P#

Select …..
From P, SP
Where P.weight = SP.qty

Both are valid queries in SQL, but the second one makes no sense: it compares a weight with a quantity, values drawn from different domains.
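SQL accepts the second query because weight and qty are both numeric columns. A minimal Python sketch of what domain-constrained comparison would mean, assuming each value is tagged with a hypothetical domain name (PartNo, Weight, Qty): comparing values from different domains raises an error instead of returning a meaningless answer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainValue:
    domain: str      # hypothetical domain name, e.g. "PartNo", "Weight", "Qty"
    value: object

def domain_equal(a: DomainValue, b: DomainValue) -> bool:
    """Compare two values only if they are drawn from the same domain."""
    if a.domain != b.domain:
        raise TypeError(f"cannot compare {a.domain} with {b.domain}")
    return a.value == b.value

p_pno = DomainValue("PartNo", "P1")
sp_pno = DomainValue("PartNo", "P1")
p_weight = DomainValue("Weight", 12)
sp_qty = DomainValue("Qty", 12)
```

With this rule, the first query's condition (P.P# = SP.P#) is evaluated normally, while the second (P.weight = SP.qty) is rejected even though both raw values happen to be 12.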


49

Key Constraints

A relation is defined as a set of tuples. By definition, all elements of a set are distinct. This means that no two tuples can have the same combination of values for all their attributes.

Superkey: a set of attributes SK such that no two distinct tuples in any state r of R have the same value for SK

Every relation has at least one default superkey: the set of all its attributes

50

Key Constraints

A superkey can have redundant attributes, so a more useful concept is that of a KEY, which has no redundancy

A key satisfies two constraints:

Two distinct tuples in any state of the relation cannot have identical values for the attributes in the key

It is a minimal superkey

51

Key Constraints

For example, consider STUDENT relation

The attribute set {IDNO} is a key of STUDENT because no two students can have the same value for IDNO

Any set of attributes that includes IDNO – for example {IDNO, Name, Age} – is a superkey

52

Key Constraints

In general, a relation schema may have more than one key; in this case, each of the keys is called a candidate key

Example: Consider the CAR relation schema CAR(State, Reg#, SerialNo, Make, Model, Year). CAR has two keys:

• Key1 = {State, Reg#}

• Key2 = {SerialNo}

Both are also superkeys of CAR. {SerialNo, Make} is a superkey but not a key.

53

Keys

Let K ⊆ R (i.e., K is a set of attributes that is a subset of the schema of R)

K is a superkey of R if K uniquely identifies each tuple in any relation state r(R)

Customer(CusNo, Name, Address, …), where customers have unique customer numbers and unique names. Possible superkeys: {CusNo}, {CusNo, Name}, {CusNo, Name, Address}, plus many others

• K is a candidate key if K is a minimal superkey

• Every relation is guaranteed to have at least one key.

Why?

54

Key(Candidate key)

A key cannot be determined from any particular instance of the data

it is an intrinsic property of a schema

it can only be determined from the meaning of the attributes

A relation can have more than one key.

Superkey: a set of attributes that contains a key as a subset. A key can also be defined as a minimal superkey

Primary Key: one of the candidate keys, chosen for indexing purposes (more details later…)
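Because a key comes from the meaning of the attributes, an instance can only refute a candidate superkey, never prove one. The sketch below uses hypothetical helpers and hypothetical rows for the CAR schema to make that concrete: on this small instance, Model and Year happen to be unique, so they show up as "keys" even though they are not keys of CAR.

```python
from itertools import combinations

CAR_SCHEMA = ("State", "Reg#", "SerialNo", "Make", "Model", "Year")
car = [  # hypothetical instance
    ("TX", "ABC-123", "A69352", "Ford", "Mustang", 2002),
    ("TX", "XYZ-789", "B43696", "Ford", "Focus", 2005),
    ("CA", "ABC-123", "X83554", "Toyota", "Camry", 2004),
]

def holds_on_instance(rows, schema, attrs):
    """True if no two rows agree on all of attrs (the superkey is NOT refuted)."""
    idx = [schema.index(a) for a in attrs]
    projected = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projected)) == len(projected)

def minimal_keys_on_instance(rows, schema):
    """Minimal attribute sets that are unique on this instance (candidates only)."""
    found = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller unique set, so it is not minimal
            if holds_on_instance(rows, schema, attrs):
                found.append(attrs)
    return found
```

On this instance the function reports SerialNo and {State, Reg#} (the real keys) together with spurious candidates such as Model, Year, and {Reg#, Make}; only the meaning of the attributes can tell them apart.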


55

Key Constraints

If a relation has several candidate keys, one is chosen arbitrarily to be the primary key.

Example: Consider the CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) We chose SerialNo as the primary key

The primary key value is used to uniquely identify each tuple in a relation. It provides the tuple identity.

It is also used to reference the tuple from another tuple. General rule: choose as primary key the smallest of the candidate keys (in terms of size). This is not always applicable; the choice is sometimes subjective.
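A minimal SQLite sketch of this choice, run from Python's sqlite3 module: SerialNo is declared PRIMARY KEY and the other candidate key {State, Reg#} is kept as a UNIQUE constraint. The column Reg# is renamed RegNo here to avoid quoting the # character; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SerialNo is the chosen primary key; {State, RegNo} stays enforced as an alternate key.
conn.execute("""
    CREATE TABLE CAR (
        State    TEXT NOT NULL,
        RegNo    TEXT NOT NULL,
        SerialNo TEXT NOT NULL PRIMARY KEY,
        Make     TEXT,
        Model    TEXT,
        Year     INTEGER,
        UNIQUE (State, RegNo)
    )""")
conn.execute("INSERT INTO CAR VALUES ('TX', 'ABC-123', 'A69352', 'Ford', 'Mustang', 2002)")

def try_insert(row):
    """Return None on success, or the IntegrityError message on a key violation."""
    try:
        conn.execute("INSERT INTO CAR VALUES (?, ?, ?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as e:
        return str(e)
```

Declaring the unused candidate key as UNIQUE keeps its key constraint enforced even though only one key becomes the primary key.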

56

CAR table with two candidate keys: LicenseNumber chosen as primary key

57

COMPANY Database Schema

58

Key Constraints and Constraints on NULL values

Another constraint on attributes specifies whether NULL values are or are not permitted

For example, if every STUDENT tuple must have a valid, non-NULL value for the Name attribute, then Name of STUDENT is constrained to be NOT NULL

59

Entity Integrity

Entity Integrity:

The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R).

• This is because primary key values are used to identify the individual tuples.

• t[PK] ≠ null for any tuple t in r(R)

• If PK has several attributes, null is not allowed in any of these attributes

Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key.
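In SQL these constraints are declared on the table. A minimal SQLite sketch with hypothetical EMPLOYEE columns, run from Python: NOT NULL is written out on the primary key because SQLite, unlike the SQL standard, lets an ordinary table store NULL in a non-INTEGER PRIMARY KEY column, as the second table (LOOSE) shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity integrity: NOT NULL spelled out on the primary key.
# NOT NULL on Name: a non-key attribute that also disallows nulls.
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT NOT NULL PRIMARY KEY, Name TEXT NOT NULL, Dno INTEGER)")
# Same kind of key without NOT NULL, to show the SQLite quirk.
conn.execute("CREATE TABLE LOOSE (Ssn TEXT PRIMARY KEY, Name TEXT)")

def violates(sql, params):
    """True if SQLite rejects the statement with a constraint error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

pk_null = violates("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", (None, "Cecilia", 4))
name_null = violates("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", ("123456789", None, 4))
quirk = violates("INSERT INTO LOOSE VALUES (?, ?)", (None, "Cecilia"))
```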

60

Referential Integrity Constraint

Referential Integrity Constraint is specified between two relations and is used to maintain the consistency among tuples in the two relations

Informally: a tuple in one relation that refers to another relation must refer to an existing tuple in that relation

For example, the Dno in EMPLOYEE gives the number of the department for which each employee works; this number must match the Dnumber value of some tuple in DEPARTMENT


61

Referential Integrity Constraint

Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.

A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
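The condition t1[FK] = t2[PK] can be checked directly. A minimal Python sketch with hypothetical EMPLOYEE and DEPARTMENT rows stored as dicts; it assumes, as SQL does, that a NULL (None) foreign key does not count as a violation.

```python
def referential_integrity_violations(r1, fk, r2, pk):
    """Tuples t1 in r1 whose t1[fk] matches no t2[pk] in r2.

    r1, r2: lists of dicts (one dict per tuple); fk, pk: attribute names.
    A None foreign key is assumed to be allowed (it references nothing).
    """
    pk_values = {t2[pk] for t2 in r2}
    return [t1 for t1 in r1 if t1[fk] is not None and t1[fk] not in pk_values]

DEPARTMENT = [{"Dnumber": 1}, {"Dnumber": 4}, {"Dnumber": 5}]
EMPLOYEE = [
    {"Ssn": "123456789", "Dno": 5},
    {"Ssn": "999887777", "Dno": 4},
    {"Ssn": "667788999", "Dno": 7},    # no DEPARTMENT with Dnumber 7
    {"Ssn": "888665555", "Dno": None},
]
```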

62

Displaying a relational database schema and its constraints

Each relation schema can be displayed as a row of attribute names

The name of the relation is written above the attribute names

The primary key attribute (or attributes) will be underlined

A foreign key (referential integrity) constraint is displayed as a directed arc (arrow) from the foreign key attributes to the referenced table

Can also point to the primary key of the referenced relation for clarity

Next slide shows the COMPANY relational schema diagram

63

Referential Integrity Constraints for COMPANY database

64

65

Other Types of Constraints

Semantic Integrity Constraints:

based on application semantics and cannot be expressed by the model per se

Example: “the max. no. of hours per employee for all projects he or she works on is 56 hrs per week”

A constraint specification language may have to be used to express these
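Constraints like this one are typically enforced in application code or with triggers rather than in the relational schema. A minimal Python sketch of the check itself, assuming WORKS_ON rows of the hypothetical form (essn, pno, hours):

```python
from collections import defaultdict

MAX_HOURS_PER_WEEK = 56

def overloaded_employees(works_on):
    """Employees whose total weekly hours over all projects exceed the limit.

    works_on: iterable of (essn, pno, hours) tuples (hypothetical layout).
    """
    total = defaultdict(float)
    for essn, _pno, hours in works_on:
        total[essn] += hours
    return {essn for essn, h in total.items() if h > MAX_HOURS_PER_WEEK}
```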

66

Modification and Updates

In this section, we concentrate on database updates and modifications

There are three basic operations: Insert, Delete, and Modify

Insert is used to insert a new tuple or tuples in a relation

Delete is used to delete tuples

Update (or Modify) is used to change the values of some attributes


67

Modification and Updates

Insert: insert a new tuple, specifying values for all of its attributes

Delete: delete a tuple by giving the relation name and the key of the tuple

Modify: modify a value by giving the relation name, the key of the target tuple, and the attribute to modify

68

Possible violations for each operation

INSERT may violate any of the constraints:

Domain constraint:
• if one of the attribute values provided for the new tuple is not of the specified attribute domain

Key constraint:
• if the value of a key attribute in the new tuple already exists in another tuple in the relation

Referential integrity:
• if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation

Entity integrity:
• if the primary key value is null in the new tuple

69

Insert Example

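Insert <'Cecilia', 'F', 'Kolonsky', NULL, '1960-04-05', '6357 Windy lane, Kate, TX', F, 28000, NULL, 4> into EMPLOYEE (violates entity integrity: the primary key Ssn is NULL)

Insert <'Cecilia', 'F', 'Kolonsky', 999887777, '1960-04-05', '6357 Windy lane, Kate, TX', F, 28000, NULL, 4> into EMPLOYEE (violates the key constraint if another EMPLOYEE tuple already has Ssn 999887777)

Insert <'Cecilia', 'F', 'Kolonsky', 667788999, '1960-04-05', '6357 Windy lane, Kate, TX', F, 28000, NULL, 7> into EMPLOYEE (violates referential integrity if no DEPARTMENT tuple has Dnumber 7)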
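These three inserts can be replayed against a reduced EMPLOYEE/DEPARTMENT schema in SQLite. This is a sketch with hypothetical, simplified columns (only Fname, Lname, Ssn, Dno) and a hypothetical existing row; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The classification below relies on SQLite's usual error-message wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE EMPLOYEE (
    Fname TEXT, Lname TEXT,
    Ssn   TEXT NOT NULL PRIMARY KEY,
    Dno   INTEGER REFERENCES DEPARTMENT (Dnumber))""")
conn.executemany("INSERT INTO DEPARTMENT VALUES (?)", [(1,), (4,), (5,)])
conn.execute("INSERT INTO EMPLOYEE VALUES ('Alicia', 'Zelaya', '999887777', 4)")  # hypothetical

def insert_employee(row):
    """Return 'ok', or which family of constraint the insert violates."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as e:
        msg = str(e)
        if "NOT NULL" in msg:
            return "entity integrity"
        if "UNIQUE" in msg:
            return "key constraint"
        if "FOREIGN KEY" in msg:
            return "referential integrity"
        return msg
```

Run in order, the three inserts from the slide are rejected for entity integrity, the key constraint, and referential integrity respectively, and the same tuple with Dno 4 is then accepted.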

70

Possible violations for each operation

DELETE may violate only referential integrity:

If the primary key value of the tuple being deleted is referenced from other tuples in the database

• Can be remedied by several actions: RESTRICT, CASCADE, SET NULL

• RESTRICT option: reject the deletion

• CASCADE option: propagate the deletion by also deleting the referencing tuples

• SET NULL option: set the foreign keys of the referencing tuples to NULL

One of the above options must be specified during database design for each foreign key constraint
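The three options can be compared in SQLite, where they are written ON DELETE RESTRICT, ON DELETE CASCADE, and ON DELETE SET NULL. A minimal sketch with hypothetical two-column tables: delete_department builds a fresh database for each option and deletes a department that one employee still references.

```python
import sqlite3

def delete_department(action):
    """Delete department 5 while one EMPLOYEE references it, under ON DELETE <action>.

    Returns the remaining EMPLOYEE rows, or 'rejected' if the delete is refused.
    action is a fixed SQL keyword supplied by this sketch, never user input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, "
        f"Dno INTEGER REFERENCES DEPARTMENT (Dnumber) ON DELETE {action})"
    )
    conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
    conn.execute("INSERT INTO EMPLOYEE VALUES ('333445555', 5)")
    try:
        conn.execute("DELETE FROM DEPARTMENT WHERE Dnumber = 5")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT Ssn, Dno FROM EMPLOYEE").fetchall()
```

RESTRICT refuses the delete, CASCADE removes the referencing employee as well, and SET NULL keeps the employee with a null Dno.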

71

Delete Example

Delete the EMPLOYEE tuple with Ssn='999887777'

Delete the EMPLOYEE tuple with Ssn='333445555'

Delete the WORKS_ON tuple with Essn='999887777' and Pno=10

72

Possible violations for each operation

UPDATE may violate the domain constraint and the NOT NULL constraint on an attribute being modified

Any of the other constraints may also be violated, depending on the attribute being updated:

Updating the primary key (PK):

• Similar to a DELETE followed by an INSERT

• Need to specify similar options to DELETE

Updating a foreign key (FK):

• May violate referential integrity

Updating an ordinary attribute (neither PK nor FK):

• Can only violate domain constraints


73

Update Example

Update the salary of the EMPLOYEE tuple with Ssn='999887777' to 2800

Update the Dno of the EMPLOYEE tuple with Ssn='999887777' to 1

Update the Dno of the EMPLOYEE tuple with Ssn='999887777' to 7

Update the Ssn of the EMPLOYEE tuple with Ssn='999887777' to '987654321'
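The four updates above can be replayed in SQLite, with ON UPDATE CASCADE on a hypothetical WORKS_ON foreign key to handle the primary key change. This is a sketch with simplified, hypothetical columns and starting values (including the existing departments 1, 4, and 5):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE EMPLOYEE (
    Ssn TEXT NOT NULL PRIMARY KEY, Salary INTEGER,
    Dno INTEGER REFERENCES DEPARTMENT (Dnumber))""")
# WORKS_ON references EMPLOYEE; ON UPDATE CASCADE propagates a changed Ssn.
conn.execute("""CREATE TABLE WORKS_ON (
    Essn TEXT REFERENCES EMPLOYEE (Ssn) ON UPDATE CASCADE,
    Pno  INTEGER,
    PRIMARY KEY (Essn, Pno))""")
conn.executemany("INSERT INTO DEPARTMENT VALUES (?)", [(1,), (4,), (5,)])
conn.execute("INSERT INTO EMPLOYEE VALUES ('999887777', 25000, 4)")
conn.execute("INSERT INTO WORKS_ON VALUES ('999887777', 10)")

def run(sql):
    """Execute an UPDATE; True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_salary = run("UPDATE EMPLOYEE SET Salary = 2800 WHERE Ssn = '999887777'")  # ordinary attribute
ok_dno_1 = run("UPDATE EMPLOYEE SET Dno = 1 WHERE Ssn = '999887777'")         # FK, target exists
ok_dno_7 = run("UPDATE EMPLOYEE SET Dno = 7 WHERE Ssn = '999887777'")         # FK, no department 7
ok_ssn = run("UPDATE EMPLOYEE SET Ssn = '987654321' WHERE Ssn = '999887777'")  # PK, cascades
```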

74

Summary

In relational systems, the DB is perceived by the user as relations and nothing else

Relations are only logical structures

At the physical level, the system is free to store the data in any way it likes, using sequential files, indexing, hashing…

Provided it can map stored representations to relations

75

Relational Systems

Consider the relations:

Dept(dept#, dname, budget)
D1 | MKTNG | 10M
D2 | DEV   | 12M
D3 | RES   | 5M

Emp(emp#, ename, dept#, salary)
E1 | LOPEZ | D1 | 40K

There is a connection between tuples E1 and D1. The connection is represented not by a pointer, but by the occurrence of the value D1 in E1.

In non-relational systems, such information is typically represented by some kind of pointer that is visible to the user.
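The connection by value is exactly what a join exploits. A minimal Python sketch using the Dept and Emp tuples above: no pointer is followed; the two tuples are matched only because they share the value D1. The function name is hypothetical.

```python
# Dept(dept#, dname, budget) and Emp(emp#, ename, dept#, salary), as on the slide
DEPT = [("D1", "MKTNG", "10M"), ("D2", "DEV", "12M"), ("D3", "RES", "5M")]
EMP = [("E1", "LOPEZ", "D1", "40K")]

def join_emp_dept(emp, dept):
    """Pair each Emp tuple with every Dept tuple whose dept# equals the Emp's dept#."""
    return [e + d[1:] for e in emp for d in dept if e[2] == d[0]]

joined = join_emp_dept(EMP, DEPT)
```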

76

Relational Systems

In relational systems, there are no pointers at the logical level

Pointers will be there at the physical level

Physical storage details are concealed from the user in relational systems

77

Properties of Relations

There are no duplicate tuples
• Body of a relation is a mathematical set

Tuples are unordered, top to bottom
• Body of a relation is a mathematical set
• No such thing as fifth tuple, next tuple, ...
• No concept of positional addressing

Attributes are unordered, left to right
• Heading of a relation is a mathematical set
• No concept of positional addressing

All attribute values are atomic
• Normalized (1st Normal Form)
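Both "unordered" properties can be mirrored in Python by treating a tuple as a set of (attribute, value) pairs and a relation body as a set of such tuples. This sketch follows this slide's view that attributes are unordered (unlike the ordered-tuple definition used earlier in these notes); the helper names are hypothetical.

```python
def make_tuple(**values):
    """A tuple as a set of (attribute, value) pairs: no left-to-right order."""
    return frozenset(values.items())

def make_relation(*tuples):
    """A relation body as a set of tuples: no duplicates, no top-to-bottom order."""
    return frozenset(tuples)

r1 = make_relation(
    make_tuple(dept_no="D1", dname="MKTNG"),
    make_tuple(dept_no="D2", dname="DEV"),
    make_tuple(dept_no="D1", dname="MKTNG"),   # duplicate: collapses into one tuple
)
r2 = make_relation(
    make_tuple(dname="DEV", dept_no="D2"),     # attributes listed in a different order
    make_tuple(dname="MKTNG", dept_no="D1"),   # tuples listed in a different order
)
```

r1 and r2 compare equal: neither the order of tuples nor the order of attributes matters, and the repeated tuple counts only once.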

78

Types of Relations

Base Relations
• The original (given) relations

Derived Relations
• Relations obtained from base relations

Views
• "Virtual" derived relation
• Only the definition is stored in the catalog
• Definition executed at run-time

Snapshots
• "Real" derived relation

Query Result
• Unnamed derived relation
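SQLite can illustrate three of these kinds. A minimal sketch with hypothetical Emp rows (salary written as a plain integer): a view stores only its definition and so reflects later changes to the base relation; CREATE TABLE ... AS SELECT stands in for a snapshot, since SQLite has no separate snapshot construct; and an ordinary query returns an unnamed derived relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (emp_no TEXT PRIMARY KEY, ename TEXT, dept_no TEXT, salary INTEGER)")
conn.execute("INSERT INTO Emp VALUES ('E1', 'LOPEZ', 'D1', 40000)")

# View: only the definition is stored; it is evaluated whenever it is queried.
conn.execute("CREATE VIEW D1_Emp AS SELECT emp_no, ename FROM Emp WHERE dept_no = 'D1'")
# Snapshot stand-in: the derived rows are materialized once, at creation time.
conn.execute("CREATE TABLE D1_Emp_Snapshot AS SELECT emp_no, ename FROM Emp WHERE dept_no = 'D1'")

# Change the base relation after both derived relations exist.
conn.execute("INSERT INTO Emp VALUES ('E2', 'CHEN', 'D1', 38000)")

view_rows = conn.execute("SELECT * FROM D1_Emp ORDER BY emp_no").fetchall()
snapshot_rows = conn.execute("SELECT * FROM D1_Emp_Snapshot ORDER BY emp_no").fetchall()
# An ad-hoc query result: an unnamed derived relation.
query_result = conn.execute("SELECT ename FROM Emp WHERE salary > 39000").fetchall()
```

The view now shows both employees, while the snapshot still holds only the row that existed when it was created.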