Introduction to Database

53
1 Introduction to Database

description

Introduction to Database. Data versus Information. When people distinguish between data and information, Data is simply a set of individual numbers or facts. E.g. John Smith got a 87 on Test 2 in CSC 240. - PowerPoint PPT Presentation

Transcript of Introduction to Database

Page 1: Introduction to Database

1

Introduction to Database

Page 2: Introduction to Database

2

Data versus Information• When people distinguish between data and

information, – Data is simply a set of individual numbers or facts.

• E.g. John Smith got a 87 on Test 2 in CSC 240.

– Information arises from integration – putting the individual facts together to see the larger picture or by asking a question not answerable by just one piece of data.

• John Smith scored more than 10 points over the class average on all of the tests in CSC 240.

• Jane Jones got the highest score on Test 2 is CSC 240.

Page 3: Introduction to Database

3

Database

• The purpose of a database is to bring together a set of data so that it is centralized, organized, easy to extract information from, and so forth.

• The way the facts are organized and how they relate to each other is known as the database design.

• The software application that facilitates the storing of the data and the accessing of the information is known as a database management system.

Page 4: Introduction to Database

4

What do we want?• Desired Features of our database

– Storage:• We want to store data efficiently, have it centralized

(or at least seemingly centralized).

– Retrieval: • We want to have the data at our fingertips when we

want it.

– Querying: • We want to ask various questions about the data

(and get answers in a timely manner).

• (These desires are to some extent in conflict.)

Page 5: Introduction to Database

5

Database Modeling

• In database design, one tries to organize the data in a way that reflects the situation from which the data comes.

• One approach is known as entity-relationship modeling or ER modeling.

• The design is often represented pictorially in an ER Diagram.

Page 6: Introduction to Database

6

Design Implementation

• After a database is designed, one uses a database management system to implement (actually make) the database.

• In database implementation, one encounters terms like table, record, field, keys, and so forth.

Page 7: Introduction to Database

7

Entities• One identifies units of information that clearly

belong together. These are known as the entities.• An entity is somewhat similar to an object in

programming, it collects data that belongs together in some immediate way.

• Entities also separate the data into distinct units. • Database entities often reflect real objects/entities

(persons, buildings, courses, etc.)

Page 8: Introduction to Database

8

Fields

• The lower-level pieces of data gathered together to form an entity are known as fields or attributes or properties. – The Person entity might consist of fields like

FirstName, LastName, JobType, SocSecNum, etc.

– Fields are analogous to properties of an object.

• Fields have a type (Text, Number, Yes/No, Memo, Date/Time, etc.) which indicate how the information is to be stored and interpreted.

Page 9: Introduction to Database

9

Relationship

• The various entities may be distinct, but they are not completely disconnected. – E.g. a Customer places an order

• An association between two entities is known as a relationship.

Page 10: Introduction to Database

10

ER Diagram

• One can visualize the entities and their relationship using an Entity-Relationship (ER) diagram. – The entities are represented by rectangles. – The relationships are represented by arrows

between the rectangles. • The arrow may include a verb to capture the nature

of the relationship (as well as other notations).

Page 11: Introduction to Database

11

ER Diagram Example

CustomerCustomerID

CustomerFirstName

OrderOrderID

ShippingCost

ItemItemID

ItemDescription

Places

Is part of

Page 12: Introduction to Database

12

DBMS

• Before databases a programmer had to worry about – where files are to be found,– the order in which the fields occur,– the length of the fields and/or the delimiter used,

• Such features of the file-based approach are called program-data dependence.

• These are now taken care of by the database management system (DBMS).

Page 13: Introduction to Database

13

Database vs. DBMS

• The generalized routines for reading, parsing, searching, sorting etc. are in the DBMS.

• But information specific to a particular case (number of fields, their type, size and so on) is still required. This data is placed together with the “actual” data in the database.

Page 14: Introduction to Database

14

Meta-data

• This data about the data is known as meta-data.– Meta: a prefix meaning: after, along with or

beyond

• The meta-data describes the actual data, and so databases are sometimes called self-describing.

• Related terms include: data dictionary, system catalog and schema.

Page 15: Introduction to Database

15

Meta-data: Open a database file (Access in this

example) in Notepad

Some actual data

One can see there’s more to this file than just customer data.

Page 16: Introduction to Database

16

Database/DBMS Distinction Database

Raw-data and meta-data

DBMS

User Application

Users and applications interact with a database only through the DBMS.

Page 17: Introduction to Database

17

Pros of Database Approach

• Control of data redundancy

• Data consistency

• More info from same data

• Sharing of data

• Improved data integrity

• Improved security

• Enforcement of standards

• Economy of scale• Balancing of conflicting

requirements• Improved accessibility and

responsiveness• Improved maintenance

through data independence• Increased concurrency• Improved backup and

recovery services

Page 18: Introduction to Database

18

Client-Server• The client-server model is a way for transactions to take

place. – The transaction is viewed as a service. – The client requests the service. – The server provides the service.

• For example, to query a networked database– A client would request the network server(s) to connect it to the

database server– The database server queries the database– The result is passed from database server to network server to client.

• The client-server terminology can be applied to both software and hardware.

Page 19: Introduction to Database

19

SQL

• SQL (Structured Query Language), pronounced S-Q-L or See-Quel, has become the standard language for relational databases.

• SQL is part third generation and part fourth generation.

Page 20: Introduction to Database

20

What’s it made of?

• SQL has 3 components: – Data Definition Language (DDL)

• The part that allows you to establish the structure of the database

– Data Manipulation Language (DML)• The part that allows you to enter data, update data

and ask questions of the data (queries)

– Data Control Language (DCL) • The part that allows you to add security features

(e.g. user authentication), concurrency (multi-user) features, recovery features, etc.

Page 21: Introduction to Database

21

Express Relationships as Verbs

• Relationships are generally expressed as verbs. For example, – Athlete comes from Country

– Athlete competes in Event

• The relationship can be represented by a line between the two rectangles representing the participating entities. The verb is written on the line. In the Chen model, the verb is placed in a diamond.

Page 22: Introduction to Database

22

Basic Structure: Relationships

• Relationships are said to have a multiplicity.

• Relationships are categorized by how many things are related to how many things.

– 1:M (one-to-many)

– M:N (many-to-many)

– 1:1 (one-to-one)

Page 23: Introduction to Database

23

Relationship Examples

• One-to-many– A country will be represented by many athletes, but

each athlete represents only one country.

• Many-to-many– An athlete may compete in many events, and an event

has many athletes competing in it.

• One-to-one– Each country has one athlete serve as flag bearer in the

opening ceremony.

Page 24: Introduction to Database

24

Degree of a Relationship

• Relationships are said to have a degree (the number of entity types involved). – Binary: involves two entities– Ternary: involves three entities – Quaternary: involves four entities

• Even if not using the Chen model, ternary and higher degree relationships are represented using a diamond.

Page 25: Introduction to Database

25

Ternary Relationship Example

StockHolder StockBroker

Stock

Buy/Sell

A StockHolder buys or sells a stock through a StockBroker.

No arrows in relationships with degree higher than 2.

Page 26: Introduction to Database

26

Quaternary Relationship Example

Make a movieActor Director

Producer

Writer

Page 27: Introduction to Database

27

Recursive Relationship

• If an entity (type) has a relationship with itself, that relationship is called recursive.

• The entity occurrences in the relationship may be distinct.

• Since the subject and object are of the same entity type, the “roles” the occurrences are playing in the relationship may be added to the diagram.

Page 28: Introduction to Database

28

Recursive Relationship

• A typical example here is if one employee serves as the supervisor of another employee. – While the relationship involves different entity

occurrences (i.e. two different employees), it involves only one entity type. Thus as far as entity type goes, the Employee entity has a relationship with itself.

Page 29: Introduction to Database

29

Role names are used to clarify situations with multiple relationships

Faculty Student

Teaches

Advises

Teacher Student

Advisor Advisee

Page 30: Introduction to Database

30

Attributes

• Attributes are the properties of an entity. – E.g. the attributes of a Customer are FirstName,

LastName, etc.

• Relationships can also have properties. – E.g. a stock is bought or sold on a particular

date (at a particular price). – (One may consider introducing a new entity

called a transaction.)

Page 31: Introduction to Database

31

Attribute Domain

• An attribute’s domain is the set of values that a property is allowed to take on.

• For example,– The quantity of items ordered is 0

– The price paid is 0

– Gender would be ‘M’ or ‘F’ (or perhaps NULL)

– Phone numbers consist of numbers only. One can also specify the number of digits or a range thereof.

Page 32: Introduction to Database

32

Self-documenting attribute names• When it comes time to implement the database

and one is turning attributes into the corresponding fields, resist the temptation to use abbreviated field names.

• If you use descriptive field names, your implementation will be self-documenting – in that many people will know what you mean simply by the name you have used. – Some designers suggest that the first part of a field

name refer to the table/entity it belongs to. This can be especially useful with common fields like IDs and names.

Page 33: Introduction to Database

33

Attributes: Simple or Composite

• A property that takes on a value that cannot be broken down into pieces is called simple, (aka “atomic”)– E.g. quantity, price, gender

• A property that can be broken into constituent properties is called composite.– E.g. address street, city, state, zipcode– name firstName, lastName

Page 34: Introduction to Database

34

Attributes: Single-valued or Multi-valued

• Single-valued: a property that takes on only one value at a time for a given entity occurrence.– E.g. dateOfBirth, you only have one

• Multi-valued: a property that can take on more than one value at the same time for a given entity occurrence – E.g. phoneNumber (home and cell number)

– E.g. beneficiary

Page 35: Introduction to Database

35

Attributes: Derived

• If a property can be determined from other properties, it is said to be derived. – E.g. age or AgeCategory (20-29, 30-39, etc.)

can be derived from dateOfBirth– E.g. taxBracket can be derived from

grossIncome and deductions– E.g. city might be derivable from zipcode

Page 36: Introduction to Database

36

Keys

• A candidate key is a minimal set of properties that uniquely determine each entity occurrence (record, row, tuple).

• A candidate key may be composite, i.e. consisting of more than one property.

• Minimal above means that if one property is removed from the set, the set no longer uniquely determines an occurrence.

Page 37: Introduction to Database

37

Keys (Cont.)• There can be more than one candidate key.

– In the school’s database (banner), a person can be identified by his or her

• socialSecNumber• idNumber• pidm

• The primary key is the candidate key that is selected to identify the entity occurrence internally (within the database).

• A candidate not chosen to be the primary is sometimes called an alternate key.

Page 38: Introduction to Database

38

Relational Model: Ingredients • The main components of the Relational Model are

tables (a two-dimensional array). • Tables are a realization of the mathematical

concept of a relation. • Tables are reminiscent of the files used in a file-

based approach. • Table Relation File• The table is logical and the data does not

necessarily take this form physically. • A table has a name.

Page 39: Introduction to Database

39

Table Relation File

Page 40: Introduction to Database

40

Relational Model: Ingredients (Cont.)

• A table collects together associated data. • A table is thought of in terms of rows and columns. • The data in a single column is all of the same type, i.e.

all the same property. – E.g. all of the people’s last names.

• The column (a.k.a. field) has a name and a type (e.g. text, number, etc.).

• A table is distinct from a similar looking mathematical object, the matrix, in that the order of the columns does not matter.

• Column Field Attribute Property

Page 41: Introduction to Database

41

Column Field Attribute Property

Page 42: Introduction to Database

42

Relational Model: Ingredients (Cont.)

• The row (a.k.a. a record) collects together the various properties that belong to a particular object. – E.g. a person’s first name, last name, date of

birth, etc.

• Again a table is distinct from a matrix, in that the order of the rows does not matter.

• Row Record Tuple

Page 43: Introduction to Database

43

Row Record Tuple

Page 44: Introduction to Database

44

More Relational Model Vocabulary

• In addition to having a type, a field has a domain, the set of values that the particular property is allowed to have. – E.g. a number must fall between 0 and 100.

– E.g. some text (string) must have two letters followed by four numbers.

– E.g. a person’s gender must be M or F.

• Ensuring that a value falls within the domain is called applying the domain constraint.

Page 45: Introduction to Database

45

Input masks and Validation Rules are ways to impose domain constraints in Access

Page 46: Introduction to Database

46

Validation Rule example

Page 47: Introduction to Database

47

Since data was entered before the validation rule was introduced. The old data must be tested for consistency.

Page 48: Introduction to Database

48

More Relational Model Vocabulary (Cont.)

• The number of fields (tuples) in a table is known as its degree.– Unary relations (1-tuples)– Binary relations (2-tuples)– Ternary relations (3-tuples)– N-ary relations (N-tuples)

• The number of records in a table is called its cardinality.

• The degree is a property of the schema, while the cardinality is a property of the instance.

Page 49: Introduction to Database

49

Degree and Cardinality degree

card

inal

ity

Page 50: Introduction to Database

50

Keys• A fields or set of fields that can be used to uniquely

identify all of the rows in a table is known as a key. • A key should not have any extraneous fields.

– E.g. if SocSecNum uniquely identifies a person, then you don’t need SocSecNum and LastName.

• A table may have more than one field or set of fields that serve this purpose, they are called collectively the candidate keys.

• One key is chosen from the candidate keys to be the primary key.

Page 51: Introduction to Database

51

Keys (Cont.)

• When choosing a primary key, make sure that it must be unique, as opposed to simply happening to be unique for the instance you have or have in mind.

• Because of redundancy issues, it should not contain too many fields or fields that might change.

• Be mindful of privacy issues, SocSecNum can be a bad choice.

• For the reasons above, one often introduces an ID field to serve as a primary field.

Page 52: Introduction to Database

52

Purpose of Keys

• Keys are used to – Uniquely identify a record as in a query– Sort the data– Establish relationships

• When one table’s key is found in another table for the purpose of establishing a relationship, it is known as a foreign key.

Page 53: Introduction to Database

53

References

• Database Systems Rob and Coronel• Database Systems, Connolly and Begg• SQL for Dummies, Taylor• http://www.metacard.com/wp1a.html • http://www.oracle.com/glossary/index.html?axx.html • http://www.itworld.com/nl/db_mgr/05072001/ • Concepts of Database Management, Pratt and Adamski• http://www.whatis.com

• http://www.webopedia.com