Databases and DBMS Paul Wong bld39 rm6 Tues 12:30-2:30 Thurs 8:30-1030.

Date posted: 21-Dec-2015

Databases and DBMS

Paul Wong

bld39 rm6

Tues 12:30-2:30

Thurs 8:30-10:30

A Brief Recap

Data: individual facts

Information: organisation of one or more pieces of data to answer a complex question

Knowledge: inferred and applied information to establish useful patterns

Database System

Databases store data that has a common structure. For example:

    the customers in a bank

    the car parts in an inventory system

    the books in a library

    the students enrolled in a university

For every customer we need more or less the same data: name, address, phone, account number etc.

Database Concepts

Each of the things we want to know about a customer (car part, book or student) is known as an attribute and is stored in a separate “field”

    Name and phone number are fields

    Name could be divided into three fields: Title, Family Name and Given Name

All of the fields that describe a customer are arranged to form a “record”

    G. Smith, 3 Square St, Perth, 5554321, C50716

A set of records is known as a “file”. A “database” may be a single file but usually it is several files

Databases, flat files and tables

When the data in an individual file is displayed it looks like this:

    ID  Num   Name  Size  Type
    1   0175  Nail  15mm  Clout
    2   0254  Nut   35mm  Hex head

Such files only have two dimensions and are called flat files or tables.

Some applications only need a single flat file

Field attributes

Each field in the record must have:

    a unique name (not a unique identifier, see later)

        Part_name, Customer_name, PhoneWork, PhoneHome

    a field type - number, character, date, logical etc.

        numbers may be defined as integers or decimals for most business databases, but floating-point may be required for scientific data

        logical is just Yes or No (or True and False)

            useful for indicating status: Payment received - No

    a field width - the number of spaces required

Each record usually has a unique number (identifier)
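
The name, type and width of each field can be sketched as a table definition. The example below is only an illustration: the parts table and its columns are assumptions, and it uses Python's built-in sqlite3 module, which records the declared types but does not enforce the declared widths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parts (
        part_id    INTEGER PRIMARY KEY,   -- unique record identifier
        part_name  VARCHAR(20) NOT NULL,  -- character field, declared width 20
        size_mm    INTEGER,               -- integer number
        price      DECIMAL(8,2),          -- decimal number for business data
        added_on   DATE,                  -- date field
        in_stock   BOOLEAN                -- logical (Yes/No) field
    )
""")

# The engine keeps each field's name and declared type as metadata.
columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(parts)")}
print(columns)
```

Every field name appears exactly once in the definition, which is the "unique name" requirement; the unique record identifier is the separate part_id column.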

Transaction systems

If you were building a system that stored data about which parts a customer had purchased, the system would be a Transaction Processing System (TPS)

You could do this with a single file but it would be VERY redundant. This is called the data redundancy problem. Redundancies can be wasteful and degrade the performance of the system. BUT they may also lead to more serious problems.

         Name     Phone    Address  Name (part)  Type      Size
    1    G.Smith  5554321           Nut          hex head  35mm
    2    G.Smith  5554321           Nail         clout     15mm
    3    P.Ng     5556789           Nut          hex head  35mm
    4    P.Ng     5556789           Nail         clout     15mm

Every time we make a sale, we have to enter all the customer’s details and all the details for every part. If a customer changes his address, it is now wrong on every previous sale - which address do we use? Hence redundancies may lead to data inconsistencies.
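
A minimal sketch of how redundancy turns into inconsistency, using the four sales rows from the table. The slide talks about a changed address; because the table shows no address values, this sketch changes the phone number instead, and the new number is invented.

```python
# Single flat file: customer details are repeated on every sale.
sales = [
    {"name": "G.Smith", "phone": "5554321", "part": "Nut",  "type": "hex head", "size": "35mm"},
    {"name": "G.Smith", "phone": "5554321", "part": "Nail", "type": "clout",    "size": "15mm"},
    {"name": "P.Ng",    "phone": "5556789", "part": "Nut",  "type": "hex head", "size": "35mm"},
    {"name": "P.Ng",    "phone": "5556789", "part": "Nail", "type": "clout",    "size": "15mm"},
]

# G.Smith's details change, but only the first matching row gets updated.
new_phone = "5550000"  # hypothetical new number
for row in sales:
    if row["name"] == "G.Smith":
        row["phone"] = new_phone
        break  # the careless update stops after one row

# The file now holds two different phone numbers for the same customer.
phones_for_smith = {row["phone"] for row in sales if row["name"] == "G.Smith"}
print(phones_for_smith)
```

Nothing in a flat file stops this partial update, so the file alone cannot tell us which value is correct.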

Database organisation

To solve these problems we need to rethink how we structure our data.

One possibility is to use multiple files, one for each entity that we are interested in.

In our previous example, we need a customer file, a parts file and a sales file.

These files form a database that can be

    relational - based on table structures

    hierarchical - based on tree structures

    network - based on graph structures

Relational databases are the most common so we’ll stick to those, but the other two are also useful. The underlying structures are all based on precisely defined concepts, so these terms have special meanings.

Relational database terminology

Relational databases use slightly different terms

The “files” in a relational database are known as tables and each table describes a collection of real-world entities, e.g. the collection of customers, the collection of parts etc.

Each table consists of a set of records or tuples

Each tuple is made up of fields or attributes

The name relational is often explained by the special relationships that are defined between the tables or entities in the database, e.g. through the sales file (strictly, the name comes from the mathematical term “relation”, the formal name for a table). These relationships are known as “one to one”, “one to many” and “many to many”

How Does it Work?

[Figure: a customers table (G. Smith, 1 Bruce Rd) and a parts table (nut, 35mm; nail, 15mm), linked through a sales table]
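
The three files named on this slide (customers, parts, sales) can be sketched as linked tables. This is an illustrative sketch using Python's built-in sqlite3 module; the column names, ID values and the new address are assumptions, not part of the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE parts     (part_id INTEGER PRIMARY KEY, name TEXT, size TEXT);
    -- Each sale only stores references to a customer and a part.
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        cust_id INTEGER REFERENCES customers(cust_id),
        part_id INTEGER REFERENCES parts(part_id)
    );
    INSERT INTO customers VALUES (1, 'G. Smith', '1 Bruce Rd');
    INSERT INTO parts VALUES (1, 'nut', '35mm'), (2, 'nail', '15mm');
    INSERT INTO sales VALUES (1, 1, 1), (2, 1, 2);
""")

# The address is stored once, so one update corrects it for every sale.
conn.execute("UPDATE customers SET address = '3 Square St' WHERE cust_id = 1")

# Joining the tables rebuilds the full picture of each sale on demand.
rows = conn.execute("""
    SELECT c.name, c.address, p.name, p.size
    FROM sales AS s
    JOIN customers AS c ON c.cust_id = s.cust_id
    JOIN parts     AS p ON p.part_id = s.part_id
    ORDER BY s.sale_id
""").fetchall()
print(rows)
```

One customer has many sales here, which is a “one to many” relationship; customers and parts are “many to many”, linked through the sales table.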

File organisation

Unlike text files, DB files have their own internal structure - the record. Records in a DB file are often stored in sequential order e.g. part number order. This is useful for reporting on all parts etc.

When searching a DB file, we often want a single record or group of records with a common property. In this case, it is better to store the file for direct access - the address of every record is known in advance. If we know we want part #512, we can go directly to the record for that part.

Direct access files use relative or absolute addresses
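
Relative addressing can be sketched with fixed-width records: the byte offset of a record is its record number times the record size. The record layout, the in-memory file and the part #512 row ("Bolt") below are assumptions made for illustration.

```python
import io
import struct

# Fixed-width record: 4-byte part number, 10-byte name, 6-byte size.
RECORD = struct.Struct("<i10s6s")

def write_parts(f, parts):
    # Append each part as one fixed-width record (short strings are zero-padded).
    for num, name, size in parts:
        f.write(RECORD.pack(num, name.encode(), size.encode()))

def read_record(f, record_number):
    # Relative address: record number times record size gives the byte offset.
    f.seek(record_number * RECORD.size)
    num, name, size = RECORD.unpack(f.read(RECORD.size))
    return num, name.rstrip(b"\0").decode(), size.rstrip(b"\0").decode()

f = io.BytesIO()  # stands in for a file on disk
write_parts(f, [(175, "Nail", "15mm"), (254, "Nut", "35mm"), (512, "Bolt", "50mm")])
print(read_record(f, 2))
```

The read jumps straight to record 2 without scanning records 0 and 1, which is the difference between direct and sequential access.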

Indexing

A file cannot be physically sorted on two different fields at once, e.g. part_num and part_name

If we need to search on both these fields we can build an index file for each one.

An index file could have all part_nums in order and, beside each, the corresponding record number

An index on part name would have every part name in alphabetical order with a corresponding record number
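
The two index files described above can be sketched as sorted lists of (key, record number) pairs. The helper names build_index and lookup are hypothetical, and the third record is invented.

```python
from bisect import bisect_left

# The data file, in arrival order; position in the list is the record number.
records = [
    {"part_num": "0254", "part_name": "Nut"},
    {"part_num": "0175", "part_name": "Nail"},
    {"part_num": "0512", "part_name": "Bolt"},  # invented row
]

def build_index(records, field):
    # Sorted (key, record number) pairs, one index per searchable field.
    return sorted((rec[field], n) for n, rec in enumerate(records))

def lookup(index, key):
    # Binary search the sorted index, then return the record number (or None).
    i = bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

by_num = build_index(records, "part_num")
by_name = build_index(records, "part_name")
print(lookup(by_num, "0175"), lookup(by_name, "Nut"))
```

The data file itself is never re-sorted; each index simply offers a different ordering that points back to the same record numbers.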

Deletion and indexing

When we delete a record in a database, we only mark it as deleted - it still exists and it still has a valid record number.

When we purge deleted records, the record that used to be number 23 could become 21.

We should regenerate the index files immediately after any purge procedure
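
A small sketch of why the index must be regenerated after a purge. The records, the deleted flag and the build_index helper are assumptions made for illustration.

```python
# Each record carries a "deleted" flag; its list position is the record number.
records = [
    {"part_name": "Nut",    "deleted": False},
    {"part_name": "Nail",   "deleted": False},
    {"part_name": "Washer", "deleted": False},  # invented row
]

def build_index(records):
    # Map part name -> record number, skipping records marked as deleted.
    return {r["part_name"]: n for n, r in enumerate(records) if not r["deleted"]}

index = build_index(records)
records[1]["deleted"] = True                        # mark only: record 1 still exists
records = [r for r in records if not r["deleted"]]  # purge: later records move up

stale = index["Washer"]        # the old index still says record 2
index = build_index(records)   # regenerate immediately after the purge
print(stale, index["Washer"])
```

Between the purge and the rebuild, the old index points past the end of the file, which is exactly the window the slide warns about.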

DBMS

Organisations often have a number of applications that use overlapping data

Our customer file may be used by both the sales department and the marketing department.

When the Sales TPS was written, it may have used the field name: Customer_ID

While the Marketing application uses: CustId

Without a DBMS, the names used by each application must coincide exactly with the field names in the database, which makes application development more difficult.

Database Management Systems - DBMS

A DBMS makes it possible for different applications to use different names for the same fields and entities

Alternative names are called aliases and are stored in a data dictionary.

The data dictionary also contains all the definitions of tables and fields. These are called metadata - data about the structure of data

Application developers do not need to inspect each table or file to find out the type or length of a field; they simply use the data dictionary

Data dictionaries also contain access rights
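
One way to picture aliases and metadata is a view that renames a field, plus a query against the engine's own catalogue. This sketch uses Python's built-in sqlite3 module; the view name marketing_customers is an assumption, and a view is only one possible way to provide an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (Customer_ID INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'G. Smith');
    -- Alias for the marketing application: same data, different field name.
    CREATE VIEW marketing_customers AS
        SELECT Customer_ID AS CustId, name FROM customers;
""")

# Metadata: field names and types come from the catalogue, not from the data.
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customers)")]
alias_row = conn.execute("SELECT CustId, name FROM marketing_customers").fetchone()
print(metadata, alias_row)
```

The Sales application keeps using Customer_ID while the Marketing application uses CustId, and neither needs to inspect the stored file to learn the field types.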

Goals of DBMS

Data efficiency

    reduces redundancy of data compared to paper ???

Access flexibility

    provides simultaneous access to many users

Data integrity

    the quality of the data is more reliable, up to date

Data independence

    developers interact with the DBMS, not the data

Data security

    granting of authorization to modify data, access etc.

Database Administrator (DBA)

Documents all databases and database applications (as they affect the DBMS)

Sets up new databases and integrates new database applications

Modifies database structures

Notifies staff of database alterations

Monitors and tunes the databases

Carries out routine functions like purging and re-indexing

Ensures correct access for authorised users

Selects DBMS and develops policies

User and developer views

Many DB packages (engines), particularly for the PC, provide the developer with the ability to view all the records or step through the records one at a time. The records can be edited interactively. This is handy BUT dangerous

    e.g. an adventurous user could delete all index files or delete the primary key from a table, because he or she “never uses them”

Users do NOT usually access the database engine itself. Instead, a DB application provides each user with the required data and functions

DB procedural languages & SQL

Many DB packages (DB4, Oracle etc) provide their own procedural languages to access the DB

Having many different languages made things difficult. IBM developed the Structured Query Language (SQL), which is now widely used by DB systems

    SELECT ITEM, PRICE
    FROM SALES_FILE
    WHERE NUMBER > 3
    ORDER BY ITEM

This produces a list of the items and prices for all sales where more than 3 items were purchased. The list is in alphabetical order on item name
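
The query on this slide can be tried out with Python's built-in sqlite3 module. This sketch reads the comparison as NUMBER > 3, to match the description "more than 3 items"; the column types and all three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES_FILE (ITEM TEXT, PRICE REAL, NUMBER INTEGER)")
conn.executemany(
    "INSERT INTO SALES_FILE VALUES (?, ?, ?)",
    [("Nut", 0.50, 10), ("Nail", 0.10, 2), ("Bolt", 1.20, 5)],  # invented rows
)

# Same query as on the slide: filter on NUMBER, sort on ITEM.
rows = conn.execute(
    "SELECT ITEM, PRICE FROM SALES_FILE WHERE NUMBER > 3 ORDER BY ITEM"
).fetchall()
print(rows)
```

The Nail sale is filtered out because only 2 items were purchased, and the remaining rows come back in alphabetical order of item name.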

Uses of SQL

If a user typed in the code we just saw, we would say it was interactive SQL use

SQL commands can also be embedded in other procedural languages, like Cobol - embedded SQL

Although intended to be simple, SQL was still too hard for most managers to use

Complex queries need to “join” relational tables to get the required result. Joins can be complex to write and slow to run

Access control and Data sharing

DBs, generally, and SQL, specifically, provide mechanisms to control who gets access to data.

Remember that DBs allow multiple users to share the data simultaneously. This has some problems.

If one user accesses a client RECORD to change the phone number while another user is changing the address, they BOTH have copies of the record open. The user who saves the record first will lose his or her changes when the second user saves, because the second copy overwrites the whole record.

    Smith, 5554321, 3 Big St Perth
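
The lost update described on this slide can be sketched with two private copies of the example record. The open_record and save_record helpers and the new phone and address values are assumptions made for illustration.

```python
# The stored record, using the example values from the slide.
database = {"name": "Smith", "phone": "5554321", "address": "3 Big St Perth"}

def open_record():
    # Each user works on a private copy of the whole record.
    return dict(database)

def save_record(copy):
    # Saving writes the whole copy back, overwriting every field.
    database.update(copy)

user_a = open_record()
user_b = open_record()
user_a["phone"] = "5550000"       # hypothetical new phone number
user_b["address"] = "1 Bruce Rd"  # hypothetical new address

save_record(user_a)  # first save: the phone change is stored
save_record(user_b)  # second save: B's stale phone overwrites A's change
print(database)
```

After both saves the address is new but the phone number is back to its old value, so the first user's change has silently disappeared.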

Concurrency

When two or more users access a record simultaneously it is called concurrency.

DBMS provide file locking and record locking to overcome concurrency problems.

Several users can view the record without problems

The first person to use a system function to EDIT or DELETE a record also places a lock on that record. No other user can then start to edit or delete that same record until the first user has finished

This causes delays and doesn’t solve all problems
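
Record locking can be sketched with one lock per record number. This is a simplified, single-process illustration using Python's threading module; the RecordLocks class and its method names are hypothetical, and a real DBMS manages locks quite differently.

```python
import threading

class RecordLocks:
    """One lock per record number, created on first use."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dictionary of locks

    def try_edit(self, record_number):
        # Returns True if the caller now holds the edit lock on this record.
        with self._guard:
            lock = self._locks.setdefault(record_number, threading.Lock())
        return lock.acquire(blocking=False)

    def finish_edit(self, record_number):
        # Release the lock so that another user can start editing.
        self._locks[record_number].release()

locks = RecordLocks()
first = locks.try_edit(23)   # first user starts editing record 23
second = locks.try_edit(23)  # second user is refused while the lock is held
other = locks.try_edit(24)   # a different record is unaffected
locks.finish_edit(23)
retry = locks.try_edit(23)   # after the first user finishes, the edit succeeds
print(first, second, other, retry)
```

The refused second attempt is the delay the slide mentions: the second user has to wait or retry until the first edit is finished.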

Object oriented DBs

Object Oriented (OO) DBs store special data called objects. An object contains not only the data but the procedures associated with that data.

They are more flexible than traditional databases and can handle more complex queries than SQL

Graphics/video - MDDB

Most early databases were text based - all of the entities described in all the fields were either characters or numbers

More modern databases often include graphics - e.g. mug shots, house photos etc. They can also include video clips and sound files

One recent development is the MultiDimensional database (MDDB) which allows users to rapidly analyse statistics about a company’s performance

    e.g. no. of fridges sold in each state last month
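
The fridge example can be pictured as facts stored along three dimensions (product, state, month). The sketch below is a toy model in plain Python, not how an MDDB product is implemented; the function name and all figures are invented.

```python
from collections import defaultdict

# Facts keyed by three dimensions: (product, state, month). Figures are invented.
sales = {
    ("fridge", "WA",  "2015-11"): 40,
    ("fridge", "NSW", "2015-11"): 95,
    ("fridge", "WA",  "2015-10"): 35,
    ("oven",   "WA",  "2015-11"): 12,
}

def units_by_state(sales, product, month):
    # Fix two dimensions (product, month) and report units along the third (state).
    totals = defaultdict(int)
    for (prod, state, mon), units in sales.items():
        if prod == product and mon == month:
            totals[state] += units
    return dict(totals)

print(units_by_state(sales, "fridge", "2015-11"))
```

Fixing a different pair of dimensions (say, state and month) would answer a different question from the same facts, which is the idea behind analysing data along several dimensions.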

Client Server

Many of the DBs we have seen use a powerful search engine to access data. This engine may be located on a remote computer whose task is to serve up the requested data - a DB server

As PCs have become more powerful, they have taken on some of the processing tasks from DB servers. They do this by running their own “client” software which integrates with the DB server

The client asks for a set of records and then it orders them and filters out unwanted data

The DB engine/server is often called the “back end” and the client software is called the “front end”

Data Warehouses (DW) & the Web

Many organisations have a lot of DBs and DB applications. Frequently, the data in these DBs has grown in an ad hoc fashion and cannot be easily integrated. To overcome this, many organisations are now developing data warehouses, where snapshots of the active DBs can be stored

Many organisations prefer to provide access to DWs through browser-based applications. This allows people all over the organisation to access the data through a common interface