Data administration


CHAPTER 6: DATA ADMINISTRATION

Prof. Erwin M. Globio, MSIT

Chapter Objectives

At the end of this chapter, you should be able to:

- define data administration, database administration, locking, versioning, deadlock, and transaction;

- explain the difference between data administration and database administration;

- describe the function of a DBMS and its major components;

- describe the optimistic and pessimistic systems of concurrency control;

- describe the problem of database security and the techniques to enhance security;

- describe the problem of database recovery and the facilities to recover a database.

Essential Reading

Modern Database Management (4th Edition), Fred R. McFadden & Jeffrey A. Hoffer (1994), Benjamin/Cummings. [Chapter 12, pages 425-458]

Fundamentals of Database Systems, Ramez Elmasri & Shamkant B. Navathe (1989), Benjamin/Cummings.

Practical Database Techniques, S. Misbah Deen.

Useful Websites to learn Database and Programming:

http://erwinglobio.wix.com/ittraining

http://ittrainingsolutions.webs.com/

http://erwinglobio.sulit.com.ph/

http://erwinglobio.multiply.com/

DB212 CHAPTER 6: DATA ADMINISTRATION


6.1 Data and Database Administrator

6.1.1 Introduction

There are many causes of poor data utilization:

Multiple definitions of the same data entity and inconsistent representations of the same data elements in separate databases, which makes linking data across different databases difficult.

Missing key data elements, which makes existing data useless.

Low levels of data quality due to inappropriate sources of data or timing of data transfers from one system to another.

Not knowing what data exist, where to find them, and what they really mean.

Therefore, the data administration function is essential to the success of managing the data resource.

6.1.2 Data Administration

A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards.

6.1.3 Database Administration

A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and

recovery.

6.1.4 Functions of Data and Database Administration

There are 6 stages in the life cycle of a typical database system:

Database planning

This develops a strategic plan for database development that supports the overall organizational business plan. This is usually the responsibility of top management.

Database analysis

The process of analysis is concerned with identifying data entities currently used by the

organization and their relationships.

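Database design

This produces the logical and physical design of a database that meets the information needs identified during analysis.

Database implementation

This builds the designed database: the structures are defined in the DBMS, the data are loaded, and the applications that use them are put into service.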


Operation and maintenance

This is a process to update the database to keep it current.

Growth and change

Data administrators must plan for change, such as adding new record types and accommodating growth. They must monitor the performance of the database and take corrective action whenever necessary.

The manner in which these functions are performed varies from one organization to the next

and is influenced by the use of specific methodologies and CASE tools.

6.2 DBMS

A DBMS is a software application system that is used to create, maintain, and provide controlled access to user databases.

6.2.1 Components of a DBMS

DBMS Engine

This is the central component of a DBMS. It provides access to the repository and the database and coordinates all of the other functional elements of the DBMS.

Interface subsystem

The interface subsystem provides facilities for users and applications to access the various

components of the DBMS. Most DBMS products provide a range of languages and other

interfaces. The subsystem is used both by programmers and by users with little or no programming experience. For example:

- A data definition language (DDL), which is used to define database structures such as records, tables, files, and views.

- An interactive query language (such as SQL), which is used to display data extracted from the database and to perform simple updates.

- A graphical interface (such as Query-by-Example).

- A DBMS programming language (such as the dBASE IV command language or Access Basic).

- An interface to standard third-generation programming languages such as BASIC and COBOL.

Information Repository Dictionary Subsystem

This is also known as the Data Dictionary which is used to manage and control access to

the repository.


Performance Management Subsystem

This provides facilities to optimize DBMS performance. Two of its important functions

follow:

Query optimization: Structuring SQL queries to minimize response time.

DBMS reorganization: Maintaining statistics on database usage and taking actions such as reorganizing the database and creating indexes.
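The effect of creating an index can be observed directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module stands in for the DBMS; the table `customer`, the index `idx_customer_name`, and the helper `query_plan` are hypothetical names chosen for illustration.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's own description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (name) VALUES (?)",
    [("c%d" % i,) for i in range(1000)],
)

sql = "SELECT id FROM customer WHERE name = 'c500'"
plan_before = query_plan(conn, sql)   # no index on name yet: full table scan

conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
plan_after = query_plan(conn, sql)    # the optimizer can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only relies on the index name appearing in the second plan and not in the first.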

Backup and Recovery Subsystem

This subsystem provides facilities for logging transactions and database changes, periodically making backup copies of the database, and recovering the database in the event of some type of failure.

Application Development Subsystem

This subsystem provides facilities that allow end users and programmers to develop complete database applications.

Security Management Subsystem

This subsystem provides facilities to protect and control access to the database and repository.

6.3 Concurrency Control

This is concerned with preventing loss of data integrity due to interference between users in a multi-user environment.

6.3.1 Single-user versus Multi-user Systems

One criterion for classifying a database system is by the number of users who can use the

system concurrently. A DBMS is single-user if at most one user at a time can use the system

and is multi-user if many users can use the system concurrently.

In a multi-user DBMS, the stored data items are the primary resources that may be accessed

concurrently by user programs, which are constantly retrieving and modifying the database.

The execution of a program that accesses or changes the contents of the database is called a

transaction. The transactions submitted by the various users may execute concurrently and

may access and update the same database records. If this concurrent execution is uncontrolled, it may lead to problems such as an inconsistent database.


6.3.2 Why Concurrency Control is Needed?

Problems

The lost update problem

Consider the situation illustrated in the diagram below, which is read from top to bottom as time advances:

Time   Transaction A                      Transaction B
----   --------------------------------   --------------------------------
t1     1. Read account balance
          (balance = $1,000)
t2                                        1. Read account balance
                                             (balance = $1,000)
t3     2. Update record
          (withdraw $200; balance $800)
t4                                        2. Update record
                                             (withdraw $300; balance $700)
                                          ERROR!

Transaction A retrieves some record R at time t1.

Transaction B retrieves that same record R at time t2.

Transaction A updates the record at time t3, on the basis of the value seen at t1.

Transaction B updates the same record at time t4, on the basis of the value seen at t2.

Thus transaction A's update is lost at time t4, because transaction B overwrites it without even looking at it. The effect of A's update has been lost due to interference between the transactions: the balance ends at $700 instead of $500.
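The lost update schedule can be replayed step by step. This is a minimal sketch, assuming a plain integer stands in for the account record and no concurrency control is applied; the function name `replay_lost_update` is hypothetical.

```python
def replay_lost_update(start_balance=1000):
    """Replay the t1..t4 schedule with no concurrency control."""
    balance = start_balance      # the stored account record
    a_copy = balance             # t1: Transaction A reads the balance
    b_copy = balance             # t2: Transaction B reads the same balance
    balance = a_copy - 200       # t3: A writes back its withdrawal
    balance = b_copy - 300       # t4: B overwrites A's result
    return balance


final = replay_lost_update()
expected = 1000 - 200 - 300      # what a serial execution would leave
print(final, expected)           # prints: 700 500
```

The $200 difference between the two numbers is exactly transaction A's withdrawal, which B's write erased.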

The temporary update problem

This occurs when one transaction updates a database item and then fails for some reason, and the updated item is accessed by another transaction before it is changed back to its original value. For example, T1 updates item X and then fails before completion, so the system must change X back to its original value. Before it does so, transaction T2 reads the "temporary" value of X, which will not be recorded permanently in the database because of the failure of T1.

Transaction 1 (T1)        Transaction 2 (T2)
-----------------------   -----------------------
Read-item (X)
X = X - N
Write-item (X)
                          Read-item (X)
                          X = X + M
                          Write-item (X)
Read-item

Transaction T1 fails and must change the value of X back to its old value; but meanwhile, T2 has read the "temporary" incorrect value of X.
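The temporary update problem can be replayed in the same style. This is a rough sketch, assuming integers stand in for item X and that rollback simply restores a saved copy; the function name `replay_temporary_update` and the starting values are hypothetical.

```python
def replay_temporary_update(x=100, n=10, m=5):
    """T2 reads X between T1's write and T1's rollback."""
    original_x = x               # before-image kept so T1 can be rolled back
    x = x - n                    # T1: read X, subtract N, write X (not committed)
    t2_seen = x                  # T2: reads the temporary value of X
    x = original_x               # T1 fails; the system restores the old value
    t2_result = t2_seen + m      # T2 computes from a value that never committed
    return x, t2_result


restored_x, t2_result = replay_temporary_update()
print(restored_x, t2_result)     # prints: 100 95
```

Had T2 read the restored value instead, it would have computed 105; the result 95 is built on data that officially never existed.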


Inconsistent Analysis Problem

Another problem occurs when one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records. The aggregate function may read some values before they are updated and others after they are updated. For example, suppose a transaction T3 is calculating the total number of reservations on all the flights while transaction T1 is executing. If the interleaving of operations shown in the figure below occurs, the result of T3 will be off by an amount N, because T3 reads the value of X after N seats are subtracted from it and reads the value of Y before those N seats are added to it.
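The size of the error can be checked with a small replay. This is a minimal sketch, assuming a dictionary stands in for the flight records A, X, and Y; the function name `replay_inconsistent_analysis` and the seat counts are hypothetical.

```python
def replay_inconsistent_analysis(a=50, x=80, y=40, n=5):
    """T3 sums reservations while T1 moves N seats from X to Y."""
    seats = {"A": a, "X": x, "Y": y}
    total = 0
    total += seats["A"]          # T3 reads A
    seats["X"] -= n              # T1 subtracts N from X and writes it
    total += seats["X"]          # T3 reads X after the subtraction
    total += seats["Y"]          # T3 reads Y before the addition
    seats["Y"] += n              # T1 adds N to Y and writes it
    return total, sum(seats.values())


t3_total, true_total = replay_inconsistent_analysis()
print(t3_total, true_total)      # prints: 165 170
```

The two totals differ by exactly N (5 in this run), as the text predicts.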

6.3.3 Basic Approaches to Concurrency Control

In short, concurrency control is concerned with preventing loss of data integrity due to

interference between users in a multi-user environment.

There are two basic approaches to concurrency control : a pessimistic approach and an

optimistic approach.

Locking (Pessimistic Approach)

Locking mechanisms are the most common type of concurrency control mechanism. With locking, any data that is retrieved by a user for updating must be locked, or denied to other users, until the update is completed.

Locking data is much like checking a book out of the library: it is unavailable to others until it is returned by the borrower.

There are several types of lock. The two most common are described below.

Shared locks

Shared locks (also called S locks, or read locks) allow other transactions to read (but not update) a record (or other resource).

A transaction should place a shared lock on a record when it will only read (but not update) that record. A shared lock prevents another user from placing an exclusive lock on that record.

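Figure: interleaving of T1 and T3 for the inconsistent analysis problem (Section 6.3.2)

Transaction 1 (T1)        Transaction 3 (T3)
-----------------------   -----------------------
                          Sum = 0
                          Read-item (A)
                          Sum = Sum + A
Read-item (X)
X = X - N
Write-item (X)
                          Read-item (X)
                          Sum = Sum + X
                          Read-item (Y)
                          Sum = Sum + Y
Read-item (Y)
Y = Y + N
Write-item (Y)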


Exclusive locks

Exclusive locks (also called X locks, or write locks) prevent another transaction from

reading (and therefore updating) a record until it is unlocked.

A transaction should place an exclusive lock on a record when it is about to update that record. An exclusive lock prevents another user from placing any type of lock on that record.

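Lock compatibility (True means a lock of the row type can be granted while another transaction holds a lock of the column type):

                  Shared Lock (S lock)   Exclusive Lock (X lock)
Shared Lock       True                   False
Exclusive Lock    False                  False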
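The compatibility table translates directly into a lookup. This is a minimal sketch of how a lock manager might consult it, not a description of any particular DBMS; the names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# (lock already held, lock requested) -> may both be held at once?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_locks):
    """Grant `requested` only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)


print(can_grant("S", ["S", "S"]))   # True: any number of readers may share
print(can_grant("X", ["S"]))        # False: a writer must wait for readers
print(can_grant("S", ["X"]))        # False: a reader must wait for the writer
print(can_grant("X", []))           # True: nothing is held on the record
```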

Deadlock

Locking (say at the record level) solves the problem of erroneous updates but may lead to another problem, called deadlock. This may result when two (or more) transactions have locked a common resource and each must wait for the other to unlock that resource.

For example, user A has locked record X and user B has locked record Y. User A

then requests record Y and user B requests record X. Both requests are denied, since

the requested records are already locked. Thus, unless the DBMS intervenes, both

users will wait indefinitely.

Time   User A                  User B
----   ---------------------   ---------------------
t1     1. Lock record X
t2                             1. Lock record Y
t3     2. Request record Y     2. Request record X
t4     (Wait for Y)            (Wait for X)

Managing deadlock

There are two basic ways to resolve deadlocks:

- Deadlock prevention

When deadlock prevention is employed, user programs must lock all the records they will require at the beginning of a transaction (rather than one at a time).

- Deadlock resolution

This allows deadlocks to occur but builds mechanisms into the DBMS for detecting and breaking them.
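One common way to detect a deadlock is to record who is waiting for whom and look for a cycle. The text does not name a specific detection mechanism, so the following is only a sketch of that wait-for idea, assuming each user waits for at most one other user; the function name `find_deadlock` and the dictionary layout are hypothetical.

```python
def find_deadlock(waits_for):
    """Return the users forming a waiting cycle, or None if there is none.

    `waits_for` maps each blocked user to the user holding the lock it wants.
    """
    def visit(user, path):
        if user in path:
            return path[path.index(user):]    # we came back around: a cycle
        holder = waits_for.get(user)
        if holder is None:
            return None                       # this user is not waiting
        return visit(holder, path + [user])

    for user in waits_for:
        cycle = visit(user, [])
        if cycle:
            return cycle
    return None


# User A holds X and wants Y; user B holds Y and wants X.
print(find_deadlock({"A": "B", "B": "A"}))    # prints: ['A', 'B']
# Only A is waiting, so B will eventually finish and release its lock.
print(find_deadlock({"A": "B"}))              # prints: None
```

Once a cycle is found, the DBMS can break it by aborting one of the transactions in the cycle and releasing its locks.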


Optimistic approach (Versioning)

This approach assumes that most of the time other users do not want the same record, or if they do, they only want to read it. With versioning, there is no form of locking. Each transaction sees a view of the database as it was when the transaction started. When a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record. If there is no conflict, the user's changes are used to update the central database.

However, suppose there is a conflict, such as two users making conflicting changes to their private copies of the database. Then the changes made by one of the users are committed to the database ("committed" means made permanent after successful completion). The other user must be told that there was a conflict and that his work cannot be incorporated into the central database. His update must then be attempted again later.

The main advantage of versioning over locking is improved performance, since read-only transactions can run concurrently with updating transactions.

For example, user A reads the record containing the account balance, successfully withdraws $200, and the new balance of $800 is posted to the account with a COMMIT statement. Meanwhile, user B has also read the account record and requested a withdrawal, which is posted to her local version of the account record. When her transaction attempts to COMMIT, it discovers the update conflict and is aborted. The transaction can be restarted later with the correct balance of $800.
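The commit-time conflict check can be sketched with a version counter. This is an assumption about how the conflict might be detected, not a description of a specific DBMS; the class `VersionedRecord` is hypothetical, and user B's withdrawal of $300 is an assumed amount reused from the lost update example.

```python
class VersionedRecord:
    """A record whose version number increases on every committed update."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # A transaction remembers which version it based its work on.
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Reject the commit if someone else committed since we read.
        if read_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


account = VersionedRecord(1000)
a_balance, a_version = account.read()               # user A reads $1,000
b_balance, b_version = account.read()               # user B reads $1,000

a_ok = account.commit(a_balance - 200, a_version)   # A commits: balance $800
b_ok = account.commit(b_balance - 300, b_version)   # B conflicts and is aborted

b_balance, b_version = account.read()               # B restarts with $800
b_retry_ok = account.commit(b_balance - 300, b_version)

print(a_ok, b_ok, b_retry_ok, account.value)        # prints: True False True 500
```

Unlike the lost update schedule, no withdrawal disappears: the second writer is forced to redo its work against the committed balance.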

6.3.4 Why Recovery Is Needed?

Whenever a transaction is submitted to a DBMS for execution, the system is responsible for

making sure that either (a) all operations in the transaction are completed successfully and

their effect is recorded permanently in the database or (b) the transaction has no effect on the database or on any other transactions. The DBMS must not permit some operations of a transaction T to be applied to the database while other operations of T are not. However, this can happen if a transaction fails after executing some of its operations but before executing all of them.
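The all-or-nothing requirement can be seen in a small experiment. This is a minimal sketch, assuming Python's built-in sqlite3 module stands in for the DBMS; the table `account`, the function names `transfer` and `balances`, and the amounts are hypothetical.

```python
import sqlite3


def transfer(conn, amount, fail_midway):
    """Move `amount` from account X to account Y as one transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 'X'",
                (amount,),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 'Y'",
                (amount,),
            )
    except RuntimeError:
        pass  # the partial update has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 100), ("Y", 50)])
conn.commit()

transfer(conn, 30, fail_midway=True)
after_failure = balances(conn)    # neither update survives
transfer(conn, 30, fail_midway=False)
after_success = balances(conn)    # both updates are applied

print(after_failure, after_success)
```

After the simulated failure the balances are unchanged, so the database never shows the money leaving X without arriving in Y.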

Types of Failures

There are several possible reasons for a transaction to fail in the middle of execution. For example:

Computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. If the hardware crashes, the contents of the computer's internal memory may be lost.

A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero.

Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or write operation of the transaction.

Physical problems and catastrophes: This is an endless list that includes power or air conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, etc.


6.4 Database Recovery

Database recovery means restoring a database quickly and accurately after loss or damage.

The basic recovery facilities include:

Backup facilities, which provide periodic backup copies of the entire database. The copy should be stored in a secure location where it is protected from loss or damage.

Journalizing facilities, which maintain an audit trail of transactions and database changes. There are two logs: the transaction log and the database change log.

The transaction log contains a record of the essential data for each transaction that is processed against the database.

The database change log contains before- and after-images of records that have been modified by transactions.

A checkpoint facility, by which the DBMS periodically suspends all processing and synchronizes its files and journals. Checkpoints should be taken frequently (say, several times an hour). When failures do occur, it is often possible to resume processing from the most recent checkpoint, so only a few minutes of processing work must be repeated. Consider the following example, which shows the possible timings of transactions in relation to the time of the crash and the time of the last checkpoint.

[Figure: timeline of transactions T1 to T5 relative to the time of the last checkpoint and the time of the crash.]

[Figure: recovery facilities, showing the database management system, the current database, the transaction log, the database change log, and the backup database.]

Transaction T1 was completed before the last checkpoint, so it is not listed in the checkpoint record and has no records in the log subsequent to the last checkpoint.

Transaction T2 was active at the time of the last checkpoint, so it is listed in the checkpoint record, and it completed before the crash, so it has a COMMIT or ABORT record in the log subsequent to the last checkpoint.

Transaction T3 is also listed in the checkpoint record, but it had not completed by the time of the failure, so it has no COMMIT or ABORT record in the log.

Transaction T4 was executed fully between the time of the last checkpoint and the crash, so it has both a BEGIN TRANSACTION and a COMMIT or ABORT record in the log subsequent to the last checkpoint record.

Transaction T5 was begun after the checkpoint but not completed. It therefore has a BEGIN TRANSACTION record, but no COMMIT or ABORT record, in the log subsequent to the last checkpoint.

Therefore, at the time of the crash, the effects of transactions T3 and T5 have to be undone, since they are incomplete. Transactions of type T1 present no problem, since they are known to have completed and their updates are known to have been written to the database by the time of the last checkpoint. Transactions of type T2 and T4 have completed, but it is not known whether all of their updates reached the database (some changed pages may still have been in the buffers and consequently lost). The system therefore checks how each of them ended: if the transaction committed, all of its updates are redone; otherwise, all of its updates are undone.

In short, recovery means redoing the effects of transactions that committed after the last checkpoint but before the crash, and undoing the effects of transactions that were incomplete at the point of the crash.

A recovery manager, which allows the DBMS to restore the database to a correct condition and restart processing transactions.
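The undo and redo steps can be sketched using the before- and after-images kept in the database change log. This is a simplified illustration, assuming the log is a list of tuples written after the last checkpoint and the database is a dictionary; the record layout and the function name `recover` are hypothetical.

```python
def recover(database, log):
    """Undo incomplete transactions and redo committed ones.

    Log records are assumed to look like:
      ("BEGIN", txn), ("UPDATE", txn, key, before_image, after_image),
      ("COMMIT", txn) or ("ABORT", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Undo: walk backward, restoring before-images of anything not committed.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _txn, key, before, _after = rec
            database[key] = before

    # Redo: walk forward, reapplying after-images of committed transactions.
    for rec in log:
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _txn, key, _before, after = rec
            database[key] = after
    return database


log = [
    ("BEGIN", "T4"),
    ("UPDATE", "T4", "A", 10, 20),
    ("BEGIN", "T5"),
    ("UPDATE", "T5", "B", 5, 9),
    ("COMMIT", "T4"),
]
# State found on disk after the crash: T4's change to A was still in a
# buffer and was lost, while T5's uncommitted change to B had reached disk.
disk = {"A": 10, "B": 9}
recovered = recover(disk, log)
print(recovered)                  # prints: {'A': 20, 'B': 5}
```

In this run T4 plays the role of a completed transaction whose update must be redone, and T5 the role of an incomplete transaction whose update must be undone.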


6.4 Database Security

Database security is defined as protection of the database against accidental or intentional loss,

destruction or misuse. Data administration uses several facilities provided by data

management software in carrying out these functions. These include:

Views or subschemas, which restrict the portion of the database that a user can see. For example:

-- ORDER is a reserved word in SQL, so the table name is quoted.
CREATE VIEW ITEM_ORDER
AS SELECT ITEM.ITEM_NAME, ITEM.ORDER_NO
FROM ITEM, "ORDER"
WHERE ITEM.ORDER_NO = "ORDER".ORDER_NO;

Authorization rules, which identify users and restrict the actions they may take against the database. For example, the use of passwords.

User-defined procedures, which define additional constraints or limitations on using the database. For example, users may implement their own password login on their own PCs.

Encryption procedures, which encode data in an unrecognizable form, for example in electronic funds transfer systems. The encryption procedures should also include a decoding facility.

Authentication schemes, which positively identify a person attempting to gain access to a database.
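How a view restricts what a user sees can be tried out directly. This is a minimal sketch, assuming Python's built-in sqlite3 module stands in for the DBMS; identifiers are written with underscores because hyphens are not valid in unquoted SQL names, and the extra columns (PRICE, CUSTOMER) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ITEM (ITEM_NAME TEXT, PRICE REAL, ORDER_NO INTEGER);
    CREATE TABLE "ORDER" (ORDER_NO INTEGER, CUSTOMER TEXT);
    INSERT INTO ITEM VALUES ('Widget', 9.5, 1), ('Gadget', 20.0, 2);
    INSERT INTO "ORDER" VALUES (1, 'Acme'), (2, 'Globex');

    -- The view exposes only the item name and order number.
    CREATE VIEW ITEM_ORDER AS
        SELECT ITEM.ITEM_NAME AS ITEM_NAME, ITEM.ORDER_NO AS ORDER_NO
        FROM ITEM, "ORDER"
        WHERE ITEM.ORDER_NO = "ORDER".ORDER_NO;
""")

cursor = conn.execute("SELECT * FROM ITEM_ORDER ORDER BY ORDER_NO")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)   # prints: ['ITEM_NAME', 'ORDER_NO']
print(view_rows)      # prints: [('Widget', 1), ('Gadget', 2)]
```

A user who is given access only to ITEM_ORDER cannot see prices or customer names, even though both are stored in the underlying tables.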


6.5 Review Questions

1. Contrast the following terms:

a. data administration vs database administration

b. deadlock prevention vs deadlock resolution

c. optimistic concurrency control vs pessimistic concurrency control

d. shared locks vs exclusive locks

2. Describe the DBMS facilities that are required for database backup and recovery.

3. For each of the situations described below, indicate which of the following security measures is most appropriate:

i. authorization rules

ii. encryption

iii. authentication schemes

a. A national brokerage firm uses a simple password system to protect its

database but finds it needs a more comprehensive system to grant

different privileges (such as read versus create or update) to different

users.

b. A manufacturing firm uses a simple password system to protect its

database but finds it needs a more comprehensive system to grant

different privileges (such as read versus create or update) to different

users.

c. A university has experienced considerable difficulty with unauthorized users who access files and databases by appropriating passwords from legitimate users.


Prof. Erwin M. Globio, MSIT

Senior IT Trainer

Mobile Numbers: 09393741359 or 09323956678


Skype Id: erwinglobio