Pclec 01 / 1

My name is: Rod Simpson

My current office is C 4.46

My phone number is (03) 990 32352

My email is rod.simpson@csse.monash.edu.au

Pearcey Centre Course CO24 Database Design using SQL

Pclec 01 / 2

Introduction to Database Technology, and Database Design using SQL

Objectives: To introduce you to

– Database Technology

– RDB Management Systems

– The Relational Database Model

– Relational Database Design concepts

– Structured Query Language

– Data Warehousing

Pclec 01 / 3

Introduction to Database Technology, and Database Design using SQL

• And some insight into some of the components and structure of the Oracle DBMS (version 8i)

Pclec 01 / 4

Database and Associated Topics

The objective of this lecture is to introduce you to a cross section of the material which will be covered over the next 9 lectures.

You will look at

- the scope of database,

- why this form of data management is so deeply entrenched in the Information Technology world,

- the different ‘sizes’ of database - and the reasons for this

- the aspects of security, recovery, accuracy and integrity

- and some of the advantages and disadvantages of database technology

Pclec 01 / 5

SQL Development

Selected SQL commands (user level), with examples, will be included in the lecture material

AND

there will be some exercises based on SQL and its functions in each laboratory session.

There will also be some discussions and review material at the laboratory sessions.

Pclec 01 / 6

Database Theory

Why database?

Data is a valuable corporate resource which needs

accuracy,

consistency,

and security controls.

Pclec 01 / 7

Database Theory

The ‘centralised’ control of data means that for many applications the data will already exist, which facilitates quicker development.

Data will no longer be related by application programs, but by the structure defined in the database.

And this also means Easier, Faster and Less Costly User System Maintenance

Pclec 01 / 8

Traditional File Systems

Consider some of the problems of traditional file systems.

In the past, as new applications were written, they either used existing files or created a new file or files for their use.

Frequently, several existing files needed to be sorted and merged to obtain the new file. Thus, it is probable that several files contained the same information stored in different ways. In other words, there would have been redundant and possibly inconsistent data.

Consider the files for an insurance company:

PREMIUMS data: POLICYNUMBER, POLICYHOLDER, ADDRESS, PREMIUM-PA, PREMIUM-TOTAL

AGENCY data: POLICYNUMBER, POLICYHOLDER, ADDRESS, AGENT-CODE, RENEWAL-DATE, RENEWAL-AMT
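
The redundancy problem can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration: the two tables stand in for the separate PREMIUMS and AGENCY files, the column names follow the slide, and the policy number, name, addresses and amounts are invented.

```python
import sqlite3

# Two separate "files", modelled here as tables for convenience.
# Column names follow the slide; the sample values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE premiums (policynumber TEXT, policyholder TEXT, address TEXT,
                           premium_pa REAL, premium_total REAL);
    CREATE TABLE agency   (policynumber TEXT, policyholder TEXT, address TEXT,
                           agent_code TEXT, renewal_date TEXT, renewal_amt REAL);
    INSERT INTO premiums VALUES ('P100', 'A. Holder', '1 Old Street', 500.0, 1500.0);
    INSERT INTO agency   VALUES ('P100', 'A. Holder', '1 Old Street', 'AG7', '2001-06-30', 500.0);
""")

# The policyholder moves, but only the premiums application is told.
conn.execute("UPDATE premiums SET address = '9 New Road' WHERE policynumber = 'P100'")

# Find policies whose address now differs between the two files.
inconsistent = conn.execute("""
    SELECT p.policynumber, p.address, a.address
    FROM premiums p JOIN agency a ON a.policynumber = p.policynumber
    WHERE p.address <> a.address
""").fetchall()
print(inconsistent)
```

Because POLICYHOLDER and ADDRESS are stored twice, one update leaves the two copies disagreeing. Holding them once, in a shared structure, removes that possibility.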

Pclec 01 / 9

Information / Data

A General Definition:

Data: raw (unprocessed or part-processed) facts which represent the state of entities (things) which have occurred.

Information: data which has been processed into a form useful to the user.

What is information to one user may be data to another user.

Pclec 01 / 10

Basic Definitions

Database: A collection of related data

Data: Known facts that can be captured and recorded

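Mini-world (the ‘universe of discourse’): Some part of the real world about which data is stored in the database. A schema, by contrast, is the description of how that data is structured.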

Database Management System (DBMS): A software package to facilitate the creation and maintenance of a computerised database.

Pclec 01 / 11

What is a Database ?

A DATABASE is a shared collection of inter-related data which is designed to meet the needs of multiple types of users and applications.

Thus the concept of USER VIEWS

• Data stored is INDEPENDENT of the programs which use it

• Data is structured to provide a foundation for future applications

• Data may be physically distributed

Pclec 01 / 12

Database Management System

The Primary Objectives of a DBMS are to provide facilities for:

1. Definition of Database Logical Structures

2. Definition of Physical Structures

3. Access to the Database

4. Definition of Storage Structures to store user data

These components are known as the ‘database architecture’

Pclec 01 / 13

Database Management System

• Software - Provides access to a database in an integrated and controlled manner.

• Must contain

(1) Definition/Structure capabilities

(2) Data manipulation capabilities

Pclec 01 / 14

DBMS Components

1. Data Description Language (DDL)

- used to describe data at the database level

2 levels:

(1) Schema - complete description of a database

(2) Sub-Schema - user view

2. Data Manipulation Language (DML)

Provides for Create, Insert, Delete, Drop, Retrieve, Report, Update, Modify, Calculate (derive)

---> Common term ‘QUERY’
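
As a rough illustration of the two components, the sketch below uses Python's standard-library sqlite3 module. The `emp` table and its values are illustrative only. Note that in SQL itself, CREATE and DROP are usually classed as DDL statements, while INSERT, UPDATE, DELETE and SELECT make up the DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the structure of the data.
conn.execute("CREATE TABLE emp (empno TEXT PRIMARY KEY, salary INTEGER, dept TEXT)")

# DML: insert, update and delete the data itself.
conn.execute("INSERT INTO emp VALUES ('E1', 50000, 'D1')")
conn.execute("INSERT INTO emp VALUES ('E2', 18000, 'D1')")
conn.execute("UPDATE emp SET salary = 20000 WHERE empno = 'E2'")
conn.execute("DELETE FROM emp WHERE empno = 'E1'")

# Retrieval, the common 'query'.
rows = conn.execute("SELECT empno, salary, dept FROM emp").fetchall()
print(rows)
```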

Pclec 01 / 15

Three Schema Architecture

ANSI & ISO suggest that a DBMS should have three schemas

Conceptual Schema - the global logical model of the data and processing of the enterprise. i.e. community user view.

External Schema(s) - the logical application views of the Conceptual Schema. i.e. individual user views.

Internal Schema - the internal level storage view.

Pclec 01 / 16

Database Architecture

3 Schema Architecture

1. User Views - External Schema

2. Complete Database - Conceptual Schema

3. Physical Database - Internal Schema

Pclec 01 / 17

Three Schema Architecture

External Schema 1

External Schema 2

External Schema n

Conceptual Schema

Internal Schema
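
One common way to picture the external and conceptual levels is with SQL views. The sketch below (Python, standard-library sqlite3) is an assumption-laden illustration rather than a definition: the `policy` table, the two view names and the sample row are hypothetical, and the internal level (how SQLite lays the data out in storage) is deliberately not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the complete logical description.
    CREATE TABLE policy (policynumber TEXT PRIMARY KEY, policyholder TEXT,
                         address TEXT, premium_pa REAL, agent_code TEXT);
    INSERT INTO policy VALUES ('P100', 'A. Holder', '1 Old Street', 500.0, 'AG7');

    -- External level: each application sees only its own view.
    CREATE VIEW premiums_view AS
        SELECT policynumber, policyholder, premium_pa FROM policy;
    CREATE VIEW agency_view AS
        SELECT policynumber, agent_code FROM policy;
""")

# The column names each user view exposes.
premiums_columns = [d[0] for d in conn.execute("SELECT * FROM premiums_view").description]
agency_columns = [d[0] for d in conn.execute("SELECT * FROM agency_view").description]
print(premiums_columns, agency_columns)
```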

Pclec 01 / 18

Application Development

Applications and their data needs are not considered in isolation.

Centralised control of one or several databases takes place. i.e. database administration.

Data administration is seen as an important part of system development.

[Figure: the CLAIMS and PREMIUMS applications share the CLAIMS and PREMIUMS data through the DBMS.]

Pclec 01 / 19

Data Integrity

Validation or integrity rules may be defined and automatically invoked at run time by the DBMS regardless of the source of update i.e. application program, 4GL screen or query language.

Significant variation exists among DBMS in the level of support for semantic data integrity.

ISO suggest that 100% of all enterprise rules should be held in the conceptual schema, and specifically none in application programs.

An area of significant development during the 1990s.
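
A minimal sketch of a rule held in the schema rather than in an application, using Python's standard-library sqlite3 module. The `emp` table, the non-negative salary rule and the helper `try_insert` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule lives in the schema, not in any one application program.
conn.execute("""
    CREATE TABLE emp (
        empno  TEXT PRIMARY KEY,
        salary INTEGER NOT NULL CHECK (salary >= 0),
        dept   TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO emp VALUES ('E1', 50000, 'D1')")

def try_insert(empno, salary, dept):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?)", (empno, salary, dept))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("E2", 18000, "D1"))   # a valid row
print(try_insert("E3", -1, "D2"))      # violates the CHECK rule
print(try_insert("E1", 40000, "D1"))   # violates the primary key
```

Whichever program or screen submits the row, the same rule is applied by the DBMS at run time.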

Pclec 01 / 20

Data Integrity

[Figure: Application Programs, 4GL Screens & Stored Procs. and the Interactive Query Language all reach the STORED DATA through the DBMS, which uses the CATALOGUE of Data Definitions & Integrity Rules.]

Pclec 01 / 21

Inter-Related Data

[Figure: CLAIMS, RENEWALS and AGENCY applications reach the CLAIMS, RENEWALS and AGENCY data through the DBMS, which also serves a QUERY.]

Data related by structure

Flexible enquiry easier

Pclec 01 / 22

Multiple Applications

[Figure: one DATABASE with LOCAL VIEWS for the CLAIMS, AGENCY and RENEWALS applications.]

Pclec 01 / 23

Important Database Functions (1)

Data Integrity

Data Independence

Referential Integrity

Concurrency Control

Database Consistency

• Multi Users

• Distributed Database

• Replicated Database

• Partitioned Database
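
Of the functions listed, referential integrity is easy to demonstrate. The sketch below uses Python's standard-library sqlite3 module; the `dept` and `emp` tables and the helper `rejected` are hypothetical. SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE dept (dept TEXT PRIMARY KEY);
    CREATE TABLE emp  (empno TEXT PRIMARY KEY,
                       dept  TEXT NOT NULL REFERENCES dept(dept));
    INSERT INTO dept VALUES ('D1');
    INSERT INTO emp  VALUES ('E1', 'D1');
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan_rejected = rejected("INSERT INTO emp VALUES ('E2', 'D9')")  # no such department
delete_rejected = rejected("DELETE FROM dept WHERE dept = 'D1'")   # E1 still refers to it
print(orphan_rejected, delete_rejected)
```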

Pclec 01 / 24

Important Database Functions (2)

Recovery from Failure

• Transaction

• Media

Determinacy

• Consistent Results

• Respond to ALL events

• and cater for unpredictable order

Scalability

Pclec 01 / 25

Database Environment

Databases Can Be:

• Transaction Intensive Databases

• Decision Support Databases

• Mixed Load Databases

• Small Databases

• VLDB - Very Large Databases

• Non-traditional Databases - weather forecasting

Pclec 01 / 26

The Many Faces of Database

They can be:

Data Warehouses

Data Marts (and Data Martlets)

How is a database size measured ?

There are a number of ‘measurements’

Raw data size

Total database size

Total usable disk space size (which includes media protection such as mirroring)

Pclec 01 / 27

The Many Faces of Database

Hardware        Database    Raw Data    Total Disk
HP9000          Oracle      100GB       643GB
Digital 8400    Oracle      100GB       361GB
IBM SP2         DB2/6000    100GB       377GB
NCR5100         Teradata    100GB       880GB
NCR5100         Teradata    1,000GB     3,280GB
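
To make the gap between raw data and total disk concrete, the short Python sketch below works out the ratio for each row of the table above. The figures are copied from the slide; the ratio is simple arithmetic added here for illustration.

```python
# (hardware, database, raw data in GB, total disk in GB), copied from the table above.
configurations = [
    ("HP9000", "Oracle", 100, 643),
    ("Digital 8400", "Oracle", 100, 361),
    ("IBM SP2", "DB2/6000", 100, 377),
    ("NCR5100", "Teradata", 100, 880),
    ("NCR5100", "Teradata", 1000, 3280),
]

# Ratio of total disk to raw data for each configuration.
ratios = [round(total / raw, 2) for _, _, raw, total in configurations]

for (hardware, database, raw, total), ratio in zip(configurations, ratios):
    print(f"{hardware:<13}{database:<10}{raw:>6}GB {total:>6}GB  x{ratio}")
```

In every row the total disk is several times the raw data, between roughly 3 and 9 times in these figures.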

Pclec 01 / 28

The Many Faces of Database

The first databases were stored on large centralised mainframe computers.

They were accessed from terminals which had no processing capability.

As distributed computing and microcomputers became available during the early 1980s, 2 new kinds of databases emerged:

personal databases

client/server databases

Pclec 01 / 29

The Many Faces of Database

Personal databases (Microsoft Access and FoxPro) are aimed at single-user database applications which are stored on the single user’s desktop computer - a client workstation.

When a personal DBMS is used for a multiuser application, the database application files are stored on a file server and transmitted to the individual users across a network.

A Server refers to any computer able to accept requests from other computers and to share some or all of its resources, such as printers, files and programs.

Pclec 01 / 30

The Many Faces of Database

A network is an infrastructure of telecommunications hardware and software which enables computers to transmit messages to each other.

With a personal DBMS, each client workstation must load the entire application into memory along with the client database application in order to view, insert, update or print data.

A client request for a small amount of data from a large database might require the server to transmit the entire database to the client’s workstation.

Pclec 01 / 31

The Many Faces of Database

Newer personal databases use indexed files which enable the server to send only part of the database. In either case there is a heavy demand on client workstations and on the network.

Pclec 01 / 32

The Many Faces of Database

Client/server databases split the work between a DBMS ‘process’ running on the server and the applications running on the client.

The client application sends data requests across the network.

When the server receives a request, the server DBMS process retrieves the data from the database, performs the requested functions, and sends only the final query results back via the network to the client.

This generates less network traffic than personal databases.
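
The difference in traffic can be sketched as follows. This is only an analogy written in Python with the standard-library sqlite3 module: SQLite runs inside the client process, so no real network is involved, and the `policy` table with its 1000 invented rows is hypothetical. What it shows is how many rows would have to be handed to the client under each style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policynumber INTEGER PRIMARY KEY, agent_code TEXT)")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?)",
    [(n, "AG7" if n % 100 == 0 else "AG1") for n in range(1, 1001)],
)

# File-server style: every row crosses the 'network', the client filters.
all_rows = conn.execute("SELECT policynumber, agent_code FROM policy").fetchall()
client_side = [row for row in all_rows if row[1] == "AG7"]

# Client/server style: the DBMS filters, only the result crosses the 'network'.
server_side = conn.execute(
    "SELECT policynumber, agent_code FROM policy WHERE agent_code = ? ORDER BY policynumber",
    ("AG7",),
).fetchall()

print(len(all_rows), len(server_side))
```

Both styles arrive at the same answer, but the first moves every row to the client in order to keep a handful of them.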

Pclec 01 / 33

The Many Faces of Database

Another important difference between client/server and personal databases is in the handling of client failures.

In a personal database system, when a client workstation fails, the database is likely to be damaged due to interrupted updates, deletes or insertions.

Records in use at the failure time are locked. They are unavailable to other users. The database may be able to be repaired, but all users must log off during the repair process.

Often the processes active at the time of failure cannot be reconstructed. The database must be restored from the last regular backup, and transactions made since that backup are not normally recovered automatically.

Pclec 01 / 34

The Many Faces of Database

A client/server database is not affected when a client workstation fails. The failed client’s in-process transactions are lost, but the failure of a single client should not affect other users.

In the case of a server failure, a central synchronised transaction log, which contains a record of all current database changes, enables in-progress transactions from all clients to be either fully completed or rolled back.

Pclec 01 / 35

The Many Faces of Database

Rolling Back has the effect of the database never having processed the transactions. Client transactions can then be resubmitted. Most client/server database servers have additional features to minimise the risk of failure and have fast recovery mechanisms. Rolling back is a bit similar to the ‘undo’ which you have met in some of Microsoft’s office software.

(There is a small exercise with commit and rollback in a few weeks’ time.)
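
Ahead of that exercise, here is a minimal sketch of commit and rollback using Python's standard-library sqlite3 module rather than Oracle. The `emp` table, its values and the helper `salary_of` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno TEXT PRIMARY KEY, salary INTEGER)")
conn.execute("INSERT INTO emp VALUES ('E1', 50000)")
conn.commit()  # make the starting state permanent

def salary_of(empno):
    """Look up the current salary for one employee."""
    return conn.execute("SELECT salary FROM emp WHERE empno = ?", (empno,)).fetchone()[0]

conn.execute("UPDATE emp SET salary = 99999 WHERE empno = 'E1'")
during = salary_of("E1")      # the change is visible inside the open transaction
conn.rollback()               # ... and is undone as if it never happened
after_rollback = salary_of("E1")

conn.execute("UPDATE emp SET salary = 55000 WHERE empno = 'E1'")
conn.commit()                 # this change is now permanent
after_commit = salary_of("E1")

print(during, after_rollback, after_commit)
```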

Pclec 01 / 36

The Many Faces of Database

Client/server systems also differ in the way in which they handle competing transactions. A system of locking is normally applied which forces transactions other than the current one to wait until the lock is released.

A personal database uses optimistic locking - it assumes that 2 or more competing transactions will not occur at the same time. User code can be written if this situation is not acceptable.

Transaction processing: This refers to the grouping of related database changes into batches which must either all succeed or all fail.

Pclec 01 / 37

Database Environment

All databases require:

– Querying Capabilities

– Data Display facilities

– Database navigation

– Data entry (Initial Load, Transactions)

– Data validation

– Data deletion

– Committing capability

Pclec 01 / 38

Database Transactions

· Sometimes several database operations need to be treated as one atomic unit which must either succeed or fail as a whole.

EMP

EMPNO   SALARY   DEPT
E3      30,000   D2
E4      60,000   D2
E1      50,000   D1
E2      18,000   D1

BUDGET

DEPT   TOTAL SALARY
D1     68,000
D2     90,000

To keep the budget correct, any alteration to EMP would need to flow on to BUDGET.
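
The EMP and BUDGET example can be sketched as one atomic unit in Python with the standard-library sqlite3 module. The data is taken from the slide. The function `give_raise`, the helper `state` and the 100,000 cap on a department's total (a CHECK rule added purely to force a failure) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp    (empno TEXT PRIMARY KEY, salary INTEGER, dept TEXT);
    CREATE TABLE budget (dept TEXT PRIMARY KEY,
                         total_salary INTEGER CHECK (total_salary <= 100000));
    INSERT INTO emp VALUES ('E3', 30000, 'D2'), ('E4', 60000, 'D2'),
                           ('E1', 50000, 'D1'), ('E2', 18000, 'D1');
    INSERT INTO budget VALUES ('D1', 68000), ('D2', 90000);
""")

def give_raise(empno, amount):
    """Change EMP and BUDGET as one unit: both succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE emp SET salary = salary + ? WHERE empno = ?",
                         (amount, empno))
            conn.execute(
                "UPDATE budget SET total_salary = total_salary + ? "
                "WHERE dept = (SELECT dept FROM emp WHERE empno = ?)",
                (amount, empno),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def state():
    """Return per-department totals computed from EMP, and as stored in BUDGET."""
    emp_totals = conn.execute(
        "SELECT dept, SUM(salary) FROM emp GROUP BY dept ORDER BY dept").fetchall()
    budget = conn.execute(
        "SELECT dept, total_salary FROM budget ORDER BY dept").fetchall()
    return emp_totals, budget

first_ok = give_raise("E2", 2000)     # D1 becomes 70,000 in both tables
second_ok = give_raise("E3", 20000)   # would push D2 over the hypothetical cap
print(first_ok, second_ok, state())
```

In the second call the EMP update succeeds on its own, but the BUDGET update is refused, so the whole unit is rolled back and the two tables stay in step.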

Pclec 01 / 39

Concurrency Control

· The DBMS should support multiple concurrent users of the same data and ensure that the data remains consistent at all times.

Start: Part 2 QOH 10

TX 1: Supply 5 items (QOH=QOH-5), leaving Part 2 QOH 5

TX 2: Delivery of 10 items, leaving Part 2 QOH 20

What is the correct result ?
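
One way to explore the question is to simulate it. The sketch below (Python, standard-library sqlite3) uses a single connection to mimic the interleaving, so it is an illustration of the ‘lost update’ problem rather than a real multi-user test. The `part` table and the helper functions are hypothetical. Applying both changes to a starting QOH of 10 should give 10 - 5 + 10 = 15.

```python
import sqlite3

def fresh_stock():
    """Create a new in-memory database holding Part 2 with QOH 10."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE part (partno INTEGER PRIMARY KEY, qoh INTEGER)")
    conn.execute("INSERT INTO part VALUES (2, 10)")
    return conn

def read_qoh(conn):
    """Read the current quantity on hand for Part 2."""
    return conn.execute("SELECT qoh FROM part WHERE partno = 2").fetchone()[0]

# Uncontrolled interleaving: both transactions read QOH = 10 before either writes.
conn = fresh_stock()
tx1_seen = read_qoh(conn)
tx2_seen = read_qoh(conn)
conn.execute("UPDATE part SET qoh = ? WHERE partno = 2", (tx1_seen - 5,))   # TX 1 writes 5
conn.execute("UPDATE part SET qoh = ? WHERE partno = 2", (tx2_seen + 10,))  # TX 2 writes 20
lost_update_result = read_qoh(conn)

# Controlled: each change is applied to the current value, one after the other.
conn = fresh_stock()
conn.execute("UPDATE part SET qoh = qoh - 5 WHERE partno = 2")   # TX 1
conn.execute("UPDATE part SET qoh = qoh + 10 WHERE partno = 2")  # TX 2
serial_result = read_qoh(conn)

print(lost_update_result, serial_result)
```

In the uncontrolled run the supply of 5 items is silently overwritten. Concurrency control exists to make the outcome match the one-after-the-other result.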

Pclec 01 / 40

Security

Each user may require identification with a user-id and password.

Users may be limited in the data they can see and what actions they can perform on that data.

The DBMS may encrypt and decrypt data as it is stored and retrieved.

Many systems now provide data value sensitive security.

There is an article on ‘security’ in about Week 5.

Pclec 01 / 41

Disadvantages of Database Processing

• Complexity

• Expense

• Vulnerability

• Size

• Training Costs

• Compatibility

• Technology Lock-In

Pclec 01 / 42

Advantages of Database Processing

• Reduction in Data Redundancy

• Data Integrity

• Data Independence

• Data Security

• Data Consistency

• Easier Use of Data via DBMS Tools (Query Language, 4GLs)

• Less Disk Storage

Pclec 01 / 43

Costs Associated with Database

The initial purchase cost

Planning and design

Database education and training

Application and data conversion

System overheads (response)

Management and Administration

Complexity of support

Pclec 01 / 44

The Users

So, who are the users ?

There are 4 main groups

1. Unsophisticated or ‘naïve’ users

They interact with the system by invoking one of the application programs which have been written as part of the design and implementation processes.

E.g. a person wishing to find a bank account balance uses an ATM or Web program which has a ‘form’ the person can complete and ‘send’.

The balance detail will be returned.

Pclec 01 / 45

The Users

2. Application Programmers

Normally these are computer professionals who write application programs. They can choose from many tools to develop the interfaces. RAD tools, for instance, enable a programmer to construct forms and reports.

There are languages, known as 4th generation languages, which combine imperative control structures (for loops, if-then-else statements) with statements of the data manipulation language.

Pclec 01 / 46

The Users

3. ‘Sophisticated’ users.

They interact with the system without writing programs. They develop their database requests using a database query language.

The queries are submitted to a query processor, which interprets the query and converts it into instructions. (non-procedural language).

On line analytical processing (OLAP) tools simplify analysts’ tasks by the ‘viewing’ of results in a variety of ways.

E.g. sales by region, or by region and product, or by city within a region.

Another class of tools is found in Data Mining applications.
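
The three ‘views’ of sales mentioned above can be approximated with ordinary GROUP BY queries. The sketch below uses Python's standard-library sqlite3 module. The `sales` table, its rows and the helper `total_by` are invented for illustration, and a real OLAP tool would offer far more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Alpha", "Widget", 100),
        ("North", "Alpha", "Gadget", 50),
        ("North", "Beta", "Widget", 70),
        ("South", "Gamma", "Widget", 30),
    ],
)

def total_by(*columns):
    """Sum sales grouped by the given columns (fixed, trusted names in this sketch)."""
    cols = ", ".join(columns)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()

by_region = total_by("region")
by_region_product = total_by("region", "product")
by_city_in_north = conn.execute(
    "SELECT city, SUM(amount) FROM sales WHERE region = ? GROUP BY city ORDER BY city",
    ("North",),
).fetchall()
print(by_region, by_region_product, by_city_in_north)
```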

Pclec 01 / 47

The Users

4. Specialised Users.

These are sophisticated users who write specialised database applications which don’t fit into the ‘traditional’ or ‘normal’ data processing framework.

Examples include computer aided design, knowledge based and expert systems, systems which store data with complex data types such as graphics and audio data, and environment modelling systems such as the Country Fire Authority and the Ambulance systems.

These are gaining in popularity and use.

Pclec 01 / 48

And that’s it for the first session !