DBMS Notes-ization 2014
Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 1
Paper: BCA-302
DATABASE
MANAGEMENT
SYSTEM
DEPARTMENT OF COMPUTER SCIENCE
DEV SANSKRITI VISHWAVIDYALAYA, SHANTIKUNJ,HARIDWAR (UK)
July-Dec 2014. Notes-ization @ DSVV.
PREAMBLE
ACKNOWLEDGEMENTS
The Department of Computer Science at Dev Sanskriti Vishwavidyalaya, Shantikunj,
Haridwar (Uttarakhand) was established in the year 2006. The Department started the
Bachelor of Computer Applications (BCA) programme in 2012. The serene and vibrant
environment of the university is a boon for the students. Academically they learn
new things every day, and along with that the curriculum of life management
instils the virtues of humanity in them.
It was an initiative taken by students of the BCA (2013-2016) batch to work as a team
and, instead of doing revision only, to do a prevision on the subject. They gave it the
name "Notes-ization". Everyone contributed to it as per his/her own caliber, but it is
finally a sincere effort by Manan Singh (Student, BCA III Sem) to make the work
presentable and reliable, and to make the effort of his teammates fruitful and
significant. Special thanks to all the web sources. Thank you everyone for this
inspirational work. Hope it will benefit one and all. Thanks again for carrying
the spirit of SHARE-CARE-PROSPER.
TABLE OF CONTENTS
UNIT 1  Introduction to Database: Definition of Database, Components of DBMS,
Three-Level Architecture Proposal for DBMS, Advantages & Disadvantages of DBMS,
Data Independence, Purpose of Database Management Systems, Structure of DBMS,
DBA and its Responsibilities, Data Dictionary, Advantages of Data Dictionary.
UNIT 2  Data Models: Introduction to Data Models, Object Based Logical Model,
Record Based Logical Model - Relational Model, Network Model, Hierarchical Model.
Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity
Relationship Diagram (ERD), Extended Features of ERD.
UNIT 3.1  Relational Databases: Introduction to Relational Databases and
Terminology - Relation, Tuple, Attribute, Cardinality, Degree, Domain.
Keys - Super Key, Candidate Key, Primary Key, Foreign Key.
UNIT 3.2  Relational Algebra: Operations - Select, Project, Union, Difference,
Intersection, Cartesian Product, Join, Natural Join.
UNIT 4  Structured Query Language (SQL): Introduction to SQL, History of SQL,
Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple Queries,
Nested Queries. Normalization: Benefits of Normalization, Normal Forms
(1NF, 2NF, 3NF, BCNF) and Functional Dependency.
UNIT 5  Relational Database Design: Introduction to Relational Database Design,
DBMS v/s RDBMS, Integrity Rules, Concept of Concurrency Control and
Database Security.
UNIT 1
INTRODUCTION TO DATABASE
Introduction to Database: Definition of Database, Components of DBMS, Three-Level
Architecture Proposal for DBMS, Advantages & Disadvantages of DBMS, Data
Independence, Purpose of Database Management Systems, Structure of DBMS, DBA and
its Responsibilities, Data Dictionary, Advantages of Data Dictionary.
DEFINITION OF DATABASE
A database can be summarily described as a repository for data: a database is a
structured collection of data. Thus, card indices, printed catalogues of
archaeological artifacts and telephone directories are all examples of databases. A
database may be stored on a computer and examined using a program. These programs
are often called `databases', but more strictly they are database management
systems (DBMS).
Computer-based databases are usually organized into one or more tables. A table stores data in a
format similar to a published table and consists of a series of rows and columns. To carry the
analogy further, just as a published table will have a title at the top of each column, so each
column in a database table will have a name, often called a field name. The term field is often
used instead of column. Each row in a table will represent one example of the type of object
about which data has been collected.
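The table analogy above can be sketched concretely. This is a minimal illustration using Python's built-in sqlite3 module; the students table and its fields are hypothetical examples, not taken from any particular syllabus.

```python
import sqlite3

# Each column has a field name; each row is one example of the object described
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, semester INTEGER)")
conn.execute("INSERT INTO students VALUES (1, 'Asha', 3)")
conn.execute("INSERT INTO students VALUES (2, 'Ravi', 3)")

# The field names play the role of the column titles in a published table
cur = conn.execute("SELECT * FROM students")
field_names = [d[0] for d in cur.description]
rows = cur.fetchall()
conn.close()
```

Here field_names is ['roll_no', 'name', 'semester'], and each tuple in rows is one row of the table.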
COMPONENTS OF DBMS
A database management system (DBMS) consists of several components. Each component
plays a very important role in the database management system environment. The
major components of a database management system are:
Software
Hardware
Data
Procedures
Database Access Language
Users
Software
The main component of a DBMS is the software: the set of programs used to handle
the database and to control and manage the overall computerized database system.
1. The DBMS software itself is the most important software component in the overall system.
2. The operating system, including the network software used to share the data of
the database among multiple users on a network.
3. Application programs developed in programming languages such as C++ or Visual
Basic that are used to access the database in the database management system. Each
program contains statements that request the DBMS to perform operations on the
database, such as retrieving, updating or deleting data. The application programs
may be conventional batch programs or online programs run from workstations or terminals.
Hardware
Hardware consists of the physical electronic devices such as computers (together
with associated I/O devices like disk drives), storage devices, I/O channels, and
the electromechanical devices that interface between computers and real-world
systems. It is impossible to implement the DBMS without the hardware devices. In a
network, a powerful computer with high data processing speed and a storage device
with large storage capacity are required as the database server.
Characteristics:
It is helpful to categorize computer memory into two classes: internal memory and external
memory. Although some internal memory is permanent, such as ROM, we are interested here
only in memory that can be changed by programs. This memory is often known as RAM. This
memory is volatile, and any electrical interruption causes the loss of data.
By contrast, magnetic disks and tapes are common forms of external memory. They are
non-volatile and retain their content for practically unlimited amounts of time. The
physical characteristics of magnetic tapes force them to be accessed sequentially, making them
useful for backup purposes, but not for quick access to specific data.
In examining the memory needs of a DBMS, we need to consider the following issues:
- Data of a DBMS must have a persistent character; in other words, data must remain
available long after any program that is using it has completed its work. Also,
data must remain intact even if the system breaks down.
- A DBMS must access data at a relatively high rate.
- Such a large quantity of data needs to be stored that the storage medium must be low cost.
These requirements are satisfied at the present stage of technological development
only by magnetic disks.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process
the data. In DBMS, databases are defined, constructed and then data is stored, updated and
retrieved to and from the databases. The database contains both the actual (or operational) data
and the metadata (data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and
to use the DBMS. The users that operate and manage the DBMS require documented
procedures on how to use and run the database management system. These may include:
1. Procedure to install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of database.
5. To change the structure of database.
6. To generate the reports of data retrieved from database.
Database Access Language
The database access language is used to move data to and from the database. The
users use the database access language to enter new data, change the existing data
in the database and retrieve required data from the database. The user writes a set
of appropriate commands in the database access language and submits these to the
DBMS. The DBMS translates the user commands and sends them to a specific part of
the DBMS called the database engine. The database engine generates a set of results
according to the commands submitted by the user, converts these into a
user-readable form (a report) and then displays them on the screen.
The administrators may also use the database access language to create and maintain the
databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
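As a minimal sketch of this command/result cycle, assuming a hypothetical product table and using Python's sqlite3 module to stand in for the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the structure (DDL)
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Enter new data and change existing data (DML)
conn.execute("INSERT INTO product (name, price) VALUES ('pen', 10.0)")
conn.execute("UPDATE product SET price = 12.0 WHERE name = 'pen'")

# Retrieve required data: the engine returns a result set
result = conn.execute("SELECT name, price FROM product").fetchall()
conn.close()
```

The commands are plain SQL text; the DBMS translates them, executes them, and hands back the result set, here [('pen', 12.0)].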
Users
The users are the people who manage the databases and perform different operations
on the databases in the database system. There are three kinds of people who play
different roles in a database system:
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual
Basic, Java, or C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is
called the database administrator, or simply the DBA.
End-Users
The end-users are the people who interact with database management system to perform
different operations on database such as retrieving, updating, inserting, deleting data etc.
THREE-LEVEL ARCHITECTURE PROPOSAL FOR DBMS
The logical architecture, also known as the ANSI/SPARC architecture, was elaborated at the
beginning of the 1970s. It distinguishes three layers of data abstraction:
1. The physical layer contains specific and detailed information that describes how
data are stored: addresses of various data components, lengths in bytes, etc.
DBMSs aim to achieve data independence, which means that the database organization
at the physical level should be indifferent to application programs.
2. The logical layer describes data in a manner that is similar to, say, definitions
of structures in C. This layer has a conceptual character; it shields the user from
the tedium of details contained in the physical layer, but is essential in
formulating queries for the DBMS.
3. The user layer contains each user's perspective of the content of the database.
The logical architecture describes how data in the database is perceived by users. It is not
concerned with how the data is handled and processed by the DBMS, but only with how it looks.
The method of data storage on the underlying file system is not revealed, and the users can
manipulate the data without worrying about where it is located or how it is actually stored. This
results in the database having different levels of abstraction.
The majority of commercial database management systems available today are based on
the ANSI/SPARC generalized DBMS architecture, as proposed by the ANSI/SPARC Study
Group on Data Base Management Systems. Hence this is also called the ANSI/SPARC
model. It divides the system into three levels of abstraction: the internal or
physical level, the conceptual level, and the external or view level.
The External or View Level:
The external or view level is the highest level of abstraction of the database. It provides a window on
the conceptual view, which allows the user to see only the data of interest to them. The user can
be either an application program or an end user. There can be many external views as any
number of external schemas can be defined and they can overlap each other. It consists of the
definition of logical records and relationships in the external view. It also contains the method of
deriving the objects such as entities, attributes and relationships in the external view from the
conceptual view.
The Conceptual Level or Global Level:
The conceptual level presents a logical view of the entire database as a unified whole. It allows
the user to bring all the data in the database together and see it in a consistent manner. Hence,
there is only one conceptual schema per database. The first stage in the design of a database is to
define the conceptual view, and a DBMS provides a data definition language for this purpose. It
describes all the records and relationships included in the database.
The data definition language used to create the conceptual level must not specify any physical
storage considerations that should be handled by the physical level. It does not provide any
storage or access details, but defines the information content only.
The Internal or Physical Level:
The collection of files permanently stored on secondary storage devices is known as
the physical database. The physical or internal level is the one closest to
physical storage; it provides a low-level description of the physical database, and
an interface between the operating system's file system and the record structures
used in higher levels of abstraction. It is at this level that record
types and methods of storage are defined, as well as how stored fields are represented, what
physical sequence the stored records are in, and what other physical structures exist.
ADVANTAGES & DISADVANTAGES OF DBMS
Advantages of the DBMS:
The DBMS serves as the intermediary between the user and the database. The database structure
itself is stored as a collection of files, and the only way to access the data in those files is through
the DBMS. The DBMS receives all application requests and translates them into the complex
operations required to fulfill those requests. The DBMS hides much of the database's internal
complexity from the application programs and users.
The different advantages of DBMS are as follows:
1. Improved data sharing.
The DBMS helps create an environment in which end users have better access to more and
better-managed data. Such access makes it possible for end users to respond quickly to changes
in their environment.
2. Improved data security.
The more users access the data, the greater the risks of data security breaches. Corporations
invest considerable amounts of time, effort, and money to ensure that corporate data are used
properly. A DBMS provides a framework for better enforcement of data privacy and security
policies.
3. Better data integration.
Wider access to well-managed data promotes an integrated view of the organization's
operations and a clearer view of the big picture. It becomes much easier to see how
actions in one segment of the company affect other segments.
4. Minimized data inconsistency.
Data inconsistency exists when different versions of the same data appear in different places.
For example, data inconsistency exists when a company's sales department stores a
sales representative's name as "Bill Brown" and the company's personnel department
stores that same person's name as "William G. Brown," or when the company's
regional sales office shows the price of a product as $45.95 and its national sales
office shows the same product's price as $43.95. The probability of data
inconsistency is greatly reduced in a properly designed database.
5. Improved data access.
The DBMS makes it possible to produce quick answers to ad hoc queries. From a database
perspective, a query is a specific request issued to the DBMS for data manipulation—for
example, to read or update the data. Simply put, a query is a question, and an ad hoc query is a
spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to
the application. For example, end users, when dealing with large amounts of sales data, might
want quick answers to questions (ad hoc queries) such as:
- What was the dollar volume of sales by product during the past six months?
- What is the sales bonus figure for each of our salespeople during the past three months?
- How many of our customers have credit balances of $3,000 or more?
6. Improved decision making.
Better-managed data and improved data access make it possible to generate better-quality
information, on which better decisions are based. The quality of the information generated
depends on the quality of the underlying data. Data quality is a comprehensive approach to
promoting the accuracy, validity, and timeliness of the data. While the DBMS does not guarantee
data quality, it provides a framework to facilitate data quality initiatives.
7. Increased end-user productivity.
The availability of data, combined with the tools that transform data into usable information,
empowers end users to make quick, informed decisions that can make the difference between
success and failure in the global economy.
Disadvantages of Database:
Although the database system yields considerable advantages over previous data management
approaches, database systems do carry significant disadvantages. For example:
1. Increased costs.
Database systems require sophisticated hardware and software and highly skilled personnel. The
cost of maintaining the hardware, software, and personnel required to operate and manage a
database system can be substantial. Training, licensing, and regulation compliance costs are
often overlooked when database systems are implemented.
2. Management complexity.
Database systems interface with many different technologies and have a significant impact on a
company‘s resources and culture. The changes introduced by the adoption of a database system
must be properly managed to ensure that they help advance the company‘s objectives. Given the
fact that database systems hold crucial company data that are accessed from multiple sources,
security issues must be assessed constantly.
3. Maintaining currency.
To maximize the efficiency of the database system, you must keep your system
current. Therefore, you must perform frequent updates and apply the latest patches
and security measures to all components. Because database technology advances
rapidly, personnel training costs tend to be significant.
4. Vendor dependence.
Given the heavy investment in technology and personnel training, companies might be
reluctant to change database vendors. As a consequence, vendors are less likely to
offer pricing point advantages to existing customers, and those customers might be
limited in their choice of database system components.
5. Frequent upgrade/replacement cycles.
DBMS vendors frequently upgrade their products by adding new functionality. Such new
features often come bundled in new upgrade versions of the software. Some of these versions
require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs
money to train database users and administrators to properly use and manage the new features.
DATA INDEPENDENCE
A major objective for three-level architecture is to provide data independence, which means that
upper levels are unaffected by changes in lower levels.
There are two kinds of data independence:
• Logical data independence
• Physical data independence
Logical Data Independence
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping between
the external and conceptual levels. Logical data independence also insulates application
programs from operations such as combining two records into one or splitting an existing record
into two or more records. This would require a change in the external/conceptual mapping so as
to leave the external view unchanged.
Physical Data Independence
Physical data independence indicates that the physical storage structures or
devices can be changed without affecting the conceptual schema. The change would be
absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved by the
presence of the internal level of the database and the mapping or transformation from the
conceptual level of the database to the internal level. Conceptual level to internal level mapping,
therefore provides a means to go from the conceptual view (conceptual records) to the internal
view and hence to the stored data in the database (physical records).
If there is a need to change the file organization or the type of physical device
used, as a result of growth in the database or new technology, a change is required
in the conceptual/internal mapping. This change is necessary to maintain the
conceptual level invariant. The physical data independence criterion requires that the conceptual
level does not specify storage structures or the access methods (indexing, hashing etc.) used to
retrieve the data from the physical storage medium. Making the conceptual schema physically
data independent means that the external schema, which is defined on the conceptual schema, is
in turn physically data independent.
Logical data independence is more difficult to achieve than physical data
independence, as it requires flexibility in the design of the database, and the
programmer has to foresee future requirements or modifications in the design.
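Logical data independence can be illustrated with a view serving as an external schema. In this hypothetical sketch (Python's sqlite3 module standing in for the DBMS, with made-up table and view names), the conceptual schema changes, yet the external view defined on it is unaffected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Meera', 30000.0)")

# External schema: a view exposing only the data of interest to one user
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name FROM employee")
before = conn.execute("SELECT * FROM emp_public").fetchall()

# Change the conceptual schema: add a column to the base table
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

# The external view, and any program written against it, is unchanged
after = conn.execute("SELECT * FROM emp_public").fetchall()
conn.close()
```

The change to the base table is absorbed by the view definition, i.e., by the external/conceptual mapping.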
PURPOSE OF DBMS
Database management systems were developed to handle the following difficulties of
typical file-processing systems supported by conventional operating systems: data
redundancy and inconsistency, difficulty in accessing data, data isolation
(multiple files and formats), integrity problems, atomicity of updates, concurrent
access by multiple users, and security problems.
In the early days, database applications were built directly on top of the file system.
Drawbacks of using file systems to store data:
- Data redundancy and inconsistency: multiple file formats, duplication of information in different files.
- Difficulty in accessing data: the need to write a new program to carry out each new task.
- Data isolation: multiple files and formats.
- Integrity constraints: hard to add new constraints or change existing ones.
These problems and others led to the development of database management systems.
STRUCTURE OF DBMS
The components in the structure of DBMS are described below:
DBA: - DBA means Database Administrator. He/she is the person responsible for the
installation, configuration, upgrading, administration, monitoring, maintenance,
and security of databases in an organization.
Database Schema: - A database schema defines its entities and the relationships
among them. A database schema is a descriptive detail of the database, which can be
depicted by means of schema diagrams. All these activities are done by the database
designer to help programmers understand all aspects of the database.
DDL Processor: - The DDL Processor or Compiler converts the data definition statements into a
set of tables. These tables contain the metadata concerning the database and are in a form that
can be used by other components of DBMS.
Data Dictionary: - Information pertaining to the structure and usage of data contained in the
database, the metadata, is maintained in a data dictionary. The term system catalog
also describes this metadata. The data dictionary, which is a database itself,
documents the data. Each database
user can consult the data dictionary to learn what each piece of data and various synonyms of the
data fields mean.
Integrity Checker: - It checks the integrity constraints so that only valid data can be entered into
the database.
User: - The users are either application programmers or on-line terminal users of any degree of
sophistication. Each user has a language at his or her disposal. For the application programmer it
will be a conventional programming language, such as COBOL or PL/I; for the terminal
user it will be either a query language or a special-purpose language tailored to
that user's requirements and supported by an on-line application program.
Queries: - In a DBMS, a search question that instructs the program to locate
records that meet specific criteria is called a query.
Query Processor: - The query processor transforms user queries into a series of low level
instructions. It is used to interpret the online user's query and convert it into an efficient series of
operations in a form capable of being sent to the run time data manager for execution. The query
processor uses the data dictionary to find the structure of the relevant portion of
the database and uses this information in modifying the query and preparing an
optimal plan to access the database.
Programmer:- Programmer can manipulate the database in all possible ways.
Application Program: - A complete, self-contained computer program that performs a
specific useful task (other than system maintenance functions) is called an
application program.
DML Processor: - The DML processor processes the data manipulation statements, such
as select, update and delete, that the application programmer embeds in a program,
turning them into operations that perform the specified task (for example, deleting
rows from a table).
Authorization Control: - The authorization control module checks the authorization of users in
terms of various privileges to users.
Command Processor: - The command processor processes the queries passed by the
authorization control module.
Query Optimizer: - The query optimizer determines an optimal strategy for the query
execution.
Transaction Manager: - The transaction manager ensures that the transaction
properties (atomicity, consistency, isolation, durability) are maintained by the system.
Scheduler: - It provides an environment in which multiple users can work on the
same piece of data at the same time; in other words, it supports concurrency.
Buffer Manager: - The buffer manager is the software layer responsible for bringing pages from
disk to main memory as needed. The buffer manager manages the available main memory by
partitioning it into a collection of pages, which we collectively refer to as the buffer pool.
Recovery Manager: - The recovery manager is responsible for maintaining a log and
restoring the system to a consistent state after a crash. It is responsible for
ensuring transaction atomicity and durability.
Physical Database: - The physical database specifies additional storage details. We
must decide what file organization to use to store the relations, and create the
auxiliary data structures called indexes.
DBA & ITS RESPONSIBILITIES
A Database Administrator (acronym: DBA) is an IT professional responsible for the
installation, configuration, upgrading, administration, monitoring, maintenance and
security of databases in an organization.
Database administrator responsibilities are as follows:-
1. Database Installation and upgrading
2. Database configuration including configuration of background Processes
3. Database performance optimization & fine tuning
4. Configuring the Database in Archive log mode
5. Maintaining Database in archive log mode
6. Devising Database backup strategy
7. Monitoring & checking the Database backup & recovery process
8. Database troubleshooting
9. Database recovery in case of crash
10. Database security
11. Enabling auditing features wherever required
12. Table space management
13. Database Analysis report
14. Database health monitoring
15. Centralized control
The skills required to become a database administrator are:
Communication skills
Knowledge of database theory
Knowledge of database design
Knowledge about the RDBMS itself, e.g. Oracle Database, IBM DB2, Microsoft SQL
Server, Adaptive Server Enterprise, MaxDB, PostgreSQL
Knowledge of Structured Query Language (SQL) e.g. SQL/PSM, Transact-SQL
General understanding of distributed computing architectures, e.g. Client/Server,
Internet/Intranet, Enterprise
General understanding of the underlying operating system, e.g. Windows, Unix, Linux.
General understanding of storage technologies, memory management, disk arrays,
NAS/SAN, networking
General understanding of routine maintenance, recovery, and handling failover of a
Database
DATA DICTIONARY & ITS ADVANTAGES
A data dictionary, or metadata repository, as defined in the Dictionary of Computing, is a
"centralized repository of information about data such as meaning, relationships to other data,
origin, usage, and format." The term may have one of several closely related meanings pertaining
to databases and database management systems (DBMS):
- a document describing a database or collection of databases;
- an integral component of a DBMS that is required to determine its structure;
- a piece of middleware that extends or supplants the native data dictionary of a DBMS.
The terms data dictionary and data repository indicate a more general software
utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the
information stored in it to the user and the DBA, but it is mainly accessed by the various
software modules of the DBMS itself, such as DDL and DML compilers, the query optimizer,
the transaction processor, report generators, and the constraint enforcer. On the other hand, a
data dictionary is a data structure that stores metadata, i.e., (structured) data about data.
Any well-designed database will include a data dictionary, as it gives database
administrators and other users easy access to the type of data that they should
expect to see in every table, row, and column of the database, without actually
accessing the database.
Since a database is meant to be built and used by multiple users, making sure that everyone is
aware of the types of data each field will accept becomes a challenge, especially when there is a
lack of consistency when assigning data types to fields. A data dictionary is a simple yet
effective add-on to ensure data consistency.
Some of the typical components of a data dictionary entry are:
• Name of the table
• Name of the fields in each table
• Data type of the field (integer, date, text…)
• Brief description of the expected data for each field
• Length of the field
• Default value for that field
• Whether the field is nullable or not nullable
• Constraints that apply to each field, if any
Not all of these fields (and many others) will apply to every single entry in the data dictionary.
For example, if the entry were about the root description of the table, it might not require any
information regarding fields. Some data dictionaries also include location details,
such as each field's current location, where it actually came from, and details of
the physical location such as the IP address or DNS name of the server.
Format and Storage
There exists no standard format for creating a data dictionary. Meta-data differs from table to
table. Some database administrators prefer to create simple text files, while others use diagrams
and flow charts to display all their information. The only prerequisite for a data dictionary is that
it should be easily searchable.
Again, the only applicable rule for data dictionary storage is that it should be at a convenient
location that is easily accessible to all database users. The types of files used
to store data dictionaries range from text files, XML files, spreadsheets, and an
additional table in the database itself, to handwritten notes. It is the database
administrator's duty to make sure that this document is always up to date,
accurate, and easily accessible.
Creating the Data Dictionary
First, all the information required to create the data dictionary must be identified and recorded in
the design documents. If the design documents are in a compatible format, it should be possible
to directly export the data in them to the desired format for the data dictionary. For example,
applications like Microsoft Visio allow database creation directly from the design structure and
would make creation of the data dictionary simpler. Even without the use of such tools, scripts
can be deployed to export data from the database to the document. There is always the option of
manually creating these documents as well.
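The export-by-script idea above can be sketched with Python's built-in sqlite3 module. The student table and its columns below are invented for illustration, not taken from any real design document:

```python
import sqlite3

# A small throwaway database to extract a dictionary from.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER DEFAULT 18
    )
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
# per column -- the raw material for a data dictionary entry.
dictionary = []
for cid, name, ctype, notnull, default, pk in conn.execute(
        "PRAGMA table_info(student)"):
    dictionary.append({
        "field": name,
        "type": ctype,
        "nullable": not notnull and not pk,
        "default": default,
        "primary_key": bool(pk),
    })
```

Each resulting entry records the field name, data type, nullability, default value, and key status listed earlier; exporting the list to a text file or spreadsheet is then a simple final step.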
Advantages of a Data Dictionary
The primary advantage of creating an informative and well-designed data dictionary is that it
lends clarity to the rest of the database documentation. Also, when a new user is introduced to
the system or a new administrator takes over, identifying table structures and types
becomes simpler. In scenarios involving large databases, where it is impossible for an
administrator to remember specific bits of information about thousands of fields, a
data dictionary becomes a crucial necessity.
UNIT 2
DATA MODELS
Data Models: Introduction to Data Models, Object Based Logical Model, Record
Base Logical Model- Relational Model, Network Model, Hierarchical Model.
Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity
Relationship Diagram (ERD), Extended features of ERD.
INTRODUCTION TO DATA MODELS
A data model can be defined as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization.
Data models are important because they facilitate interaction among the designer,
the application programmer, and the end user. Also, a well-developed data model can even foster
improved understanding of the organization for which the database design is developed. Data
models are a communication tool as well.
A data model comprises three components:
• A structural part, consisting of a set of rules according to which databases can be constructed.
• A manipulative part, defining the types of operation that are allowed on the data (this includes
the operations used for updating or retrieving data from the database and for changing the
structure of the database).
• Possibly a set of integrity rules, which ensure that the data is accurate.
The purpose of a data model is to represent data and to make the data understandable. There
have been many data models proposed in the literature. They fall into three broad categories:
• Object Based Data Models
• Physical Data Models
• Record Based Data Models
OBJECT BASED LOGICAL MODEL
Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct
object (a person, place, concept, or event) in the organization that is to be represented in the database.
An attribute is a property that describes some aspect of the object that we wish to record, and a
relationship is an association between entities.
Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Semantic
• Functional
RECORD BASED LOGICAL MODEL & ITS TYPES
Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical structure of the
database and to provide a higher-level description of the implementation. Record based models
are so named because the database is structured in fixed format records of several types. Each
record type defines a fixed number of fields, or attributes, and each field is usually of a fixed
length.
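The fixed-format record idea can be illustrated with a small Python sketch. The StudentRecord type and its fields are invented for this example: every record of a given type has the same fixed set of fields, each of a declared type.

```python
from typing import NamedTuple

class StudentRecord(NamedTuple):
    # Every StudentRecord has exactly these three fields, in this order,
    # mirroring a record type with a fixed number of attributes.
    roll_no: int
    name: str
    course: str

r1 = StudentRecord(1, "Manan", "BCA")
r2 = StudentRecord(2, "Gopal", "BCA")

# All records of this type share one fixed field layout.
layout = StudentRecord._fields
```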
The three most widely accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model
RELATIONAL MODEL
The relational model is a database model based on first-order predicate logic, first
formulated and proposed in 1969 by Edgar F. Codd. In the relational model, all
data is represented in terms of tuples, grouped into relations. A database organized in terms of
the relational model is a relational database.
Advantages of Relational Model:
Conceptual Simplicity: We have seen that both the hierarchical and network models are
conceptually simple, but the relational model is simpler than either of them.
Structural Independence: In the relational model, changes in the structure do not affect
data access.
Design Implementation: The relational model achieves both data independence and structural
independence.
Ad hoc query capability: The presence of a very powerful, flexible, and easy-to-use query
capability is one of the main reasons for the immense popularity of the relational database model.
Disadvantages of Relational Model:
Hardware overheads: Relational database systems hide the implementation complexities and the
physical data storage details from the user. To do this, relational database systems need
more powerful computers and data storage devices.
Ease of design can lead to bad design: The relational database is easy to design and use, and the
user need not know the complexities of the data storage. This ease of design and use can lead
to the development and implementation of very poorly designed databases.
NETWORK MODEL
The network model is a database model conceived as a flexible way of representing objects and
their relationships. Its distinguishing feature is that the schema, viewed as a graph in which
object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or
lattice.
While the hierarchical database model structures data as a tree of records, with each record
having one parent record and many children, the network model allows each record to have
multiple parent and child records, forming a generalized graph structure.
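The difference can be sketched with plain Python dictionaries (the record names below are invented): a tree gives every child exactly one parent, while a network allows several.

```python
# Hierarchical model: each child record has exactly one parent.
tree_parent = {
    "dept_cs": "college",
    "dept_math": "college",
    "student_1": "dept_cs",
}

# Network model: a child record may have multiple parents,
# so parents are stored as a list and the schema forms a graph.
network_parents = {
    "student_1": ["dept_cs", "hostel_A"],
    "student_2": ["dept_math", "hostel_A"],
}

# Records with more than one parent are what a tree cannot express.
multi_parent = [r for r, ps in network_parents.items() if len(ps) > 1]
```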
Advantages of Network Model:
Conceptual Simplicity: Like the hierarchical model, it is simple and easy to implement.
Capability to handle more relationship types: The network model can handle one-to-one (1:1)
and many-to-many (N:N) relationships.
Ease of data access: Data access is easier than in the hierarchical model.
Data Integrity: Since it is based on the parent-child relationship, there is always a link between
a parent segment and the child segments under it.
Data Independence: The network model is better than the hierarchical model with respect to data
independence.
Disadvantages of Network Model:
System Complexity: All records are maintained using pointers, so the database structure
becomes more complex.
Operational Anomalies: As discussed earlier, the network model requires a large number of
pointers, so insertion, deletion, and updating become more complex.
Absence of Structural Independence: There is a lack of structural independence, because when
the structure is changed, it becomes compulsory to change the applications too.
HIERARCHICAL MODEL
A hierarchical database model is a data model in which the data is organized into a tree-like
structure. The data is stored as records which are connected to one another through links. A
record is a collection of fields, with each field containing only one value. The entity type of a
record defines which fields the record contains.
Advantages of Hierarchical Model:
1. Simplicity: Since the database is based on a hierarchical structure, the relationship between
the various layers is logically simple.
2. Data Security: The hierarchical model was the first database model to offer data security
enforced by the DBMS.
3. Data Integrity: Since it is based on the parent-child relationship, there is always a link
between a parent segment and the child segments under it.
4. Efficiency: It is very efficient when the database contains a large number of 1:N
relationships and the users require a large number of transactions.
Disadvantages of Hierarchical Model:
1. Implementation complexity: Although it is simple and easy to design, it is quite complex to
implement.
2. Database management problems: If you make any change in the database structure, you
need to make changes in every application program that accesses the database.
3. Lack of structural independence: There is a lack of structural independence, because when
the structure is changed, it becomes compulsory to change the applications too.
4. Operational anomalies: The hierarchical model suffers from insert, delete, and update
anomalies; retrieval operations are also difficult.
ENTITY RELATIONSHIP MODEL
In DBMS, an entity–relationship model (ER model) is a data model for describing the data or
information aspects of a business domain or its process requirements, in an abstract way that
lends itself to ultimately being implemented in a database such as a relational database. The main
components of ER models are entities (things) and the relationships that can exist among them.
Entity–relationship modeling was developed by Peter Chen and published in a 1976 paper.
However, variants of the idea existed previously, and others have been devised subsequently,
such as supertype and subtype data entities and commonality relationships.
The ER model represents real-world situations using concepts that are commonly used by
people. It allows defining a representation of the real world at the logical level; the ER model
has no facilities to describe machine-related aspects.
In the ER model, the logical structure of data is captured by indicating the grouping of data into
entities. The ER model also supports a top-down approach, by which details can be given in
successive stages.
Entity:- An entity is something which is described in the database by storing its data; it
may be a concrete entity or a conceptual entity.
Entity set:- An entity set is a collection of similar entities.
Attribute:- An attribute describes a property associated with entities. An attribute has a
name and a value for each entity.
Domain:- A domain defines the set of permitted values for an attribute.
ENTITY SET
Entity set:- An entity set is a collection of similar entities.
A database can be modeled as:
• a collection of entities,
• relationships among entities.
An entity is an object that exists and is distinguishable from other objects.
Ex:- a specific person, company, event, plant.
Entities have attributes.
Ex:- people have names and addresses.
An entity set is a set of entities of the same type that share the same properties.
Ex:- the set of all persons, companies, trees, holidays.
An entity is a thing in the real world with an independent existence, and an entity set is the
collection of all entities of a particular entity type at any point of time. Take an example: a
company has many employees, and these employees are entities (e1, e2, e3, ...). All these
entities, having the same attributes, are defined under the ENTITY TYPE employee, and the set
{e1, e2, ...} is called the entity set. We can also understand this by an analogy: an entity type is
like "fruit", which is a class; we have never seen "fruit" itself, though we have seen instances of
fruit such as apple, banana, and mango. Hence fruit = entity type = EMPLOYEE; apple = entity
= e1 (or e2 or e3); and the entity set = the basket of apples, bananas, mangoes, etc. = {e1, e2, ...}.
ATTRIBUTE
In a database management system (DBMS), an attribute may describe a component of the
database, such as a table or a field, or may itself be used as another term for a field.
A table contains one or more columns, and these columns are the attributes. For example, if
you have a table named "employee information" with the columns ID, NAME, and ADDRESS,
then id, name, and address are the attributes of employee.
RELATIONSHIP SET
An association among entities is called a relationship. For example, the employee entity has
the relationship Works At with the department entity. Another example is a student who enrolls
in some course. Here, Works At and Enrolls are called relationships.
Relationship Set
A set of relationships of the same type is called a relationship set. Like entities, a relationship
too can have attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set which can be associated with the
number of entities of another set via a relationship set.
One-to-one: One entity from entity set A can be associated with at most one entity of entity set
B, and vice versa.
One-to-many: One entity from entity set A can be associated with more than one entity of
entity set B, but an entity from entity set B can be associated with at most one entity of A.
Many-to-one: More than one entity from entity set A can be associated with at most one entity
of entity set B, but one entity from entity set B can be associated with more than one entity from
entity set A.
Many-to-many: One entity from A can be associated with more than one entity from B, and
vice versa.
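A one-to-many mapping can be sketched in SQL using Python's built-in sqlite3 module; the dept and emp tables and their rows below are invented for illustration. Many emp rows may reference one dept row, but each emp row references at most one dept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE emp (
        id      INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES dept(id)   -- each emp -> one dept
    )
""")
conn.execute("INSERT INTO dept VALUES (1, 'Accounts')")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 'Asha', 1), (2, 'Ravi', 1)])

# One dept row is associated with many emp rows: a one-to-many mapping.
emp_count = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE dept_id = 1").fetchone()[0]
```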
ENTITY RELATIONSHIP DIAGRAM (ERD)
Definition: An entity-relationship (ER) diagram is a specialized graphic that illustrates the
relationships between entities in a database. ER diagrams often use symbols to represent three
different types of information. Boxes are commonly used to represent entities. Diamonds are
normally used to represent relationships and ovals are used to represent attributes.
Components of ER Diagram
The ER diagram has three main components:
1) Entity
An entity can be an object, place, person, or class. In an ER diagram, an entity is represented
using a rectangle. Consider the example of an organization: Employee, Manager, Department,
Product, and many more can be taken as entities.
Weak Entity
A weak entity is an entity that must be defined through a foreign key relationship with another
entity, as it cannot be uniquely identified by its own attributes alone. A weak entity depends
on another entity and does not have a key attribute of its own. A double rectangle represents a
weak entity.
2) Attribute
An attribute describes a property or characteristic of an entity. For example, Name, Age, and
Address can be attributes of a Student. Databases contain information about each entity, tracked
in individual fields known as attributes, which normally correspond to the columns of a
database table. An attribute is represented using an ellipse.
Key Attribute
A key attribute is the unique, distinguishing characteristic of the entity. For example, an
employee's social security number might be the employee's key attribute. The key attribute
represents the main characteristic of an entity and is used to represent the primary key. An
ellipse with an underlined label represents a key attribute.
Composite Attribute
An attribute can itself have attributes of its own. Such attributes are known as composite
attributes.
3) Relationship
Relationships illustrate how two entities share information in the database structure. A
relationship describes relations between entities and is represented using a diamond.
Three types of relationship exist between entities:
Binary Relationship
Recursive Relationship
Ternary Relationship
Binary Relationship
Binary Relationship means relation between two Entities. This is further divided into three types.
1. One to One: This type of relationship is rarely seen in the real world.
The example describes that one student can enroll in only one course and a course
will also have only one student. This is not what you will usually see in practice.
2. One to Many: This reflects the business rule that one entity instance is associated with
many instances of another entity. For example, a Student enrolls in only one Course, but
a Course can have many Students.
The arrows in the diagram describe that one student can enroll in only one course.
3. Many to Many:
The diagram represents that many students can enroll in more than one course.
Recursive Relationship
In some cases, entities can be self-linked. For example, employees can supervise other
employees.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
EXTENDED FEATURES OF ERD
The ER model has the power of expressing database entities in a conceptual hierarchical
manner such that, as we go up the hierarchy, we generalize the view of the entities, and as we
go deeper into the hierarchy, we see the details of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be
generalized along with all students; the entity shall be Student, and further, a student is a
Person. The reverse is called specialization, where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entity contains
the properties of all the specialized entities, is called generalization. In generalization, a number
of entities are brought together into one generalized entity based on their similar characteristics.
For example, pigeon, house sparrow, crow, and dove can all be generalized as Birds.
Specialization
Specialization is a process opposite to generalization, as mentioned above. In
specialization, a group of entities is divided into sub-groups based on their characteristics. Take
the group Person, for example. A person has a name, date of birth, gender, etc. These properties
are common to all persons. But in a company, a person can be identified as an employee,
employer, customer, or vendor, based on the role they play in the company.
Similarly, in a school database, a person can be specialized as a teacher, student, or staff
member, based on the role they play in the school.
Inheritance
We use all the above features of the ER model to create classes of objects in object oriented
programming. This makes it easier for the programmer to concentrate on what she is
programming. Details of entities are generally hidden from the user; this process is known as
abstraction.
One of the important features of generalization and specialization is inheritance; that is, the
attributes of higher-level entities are inherited by the lower-level entities.
For example, attributes of a person such as name, age, and gender can be inherited by
lower-level entities such as student and teacher.
Aggregation
The E-R model cannot express relationships among relationships.
When would we need such a thing?
Consider a DB with information about employees who work on a particular project and use a
number of machines doing that work. We get the E-R diagram shown in Figure below.
Figure 2.20: E-R diagram with redundant relationships
Relationship sets work and uses could be combined into a single set. However, they shouldn't be,
as this would obscure the logical structure of this scheme.
The solution is to use aggregation.
An abstraction through which relationships are treated as higher-level entities.
For our example, we treat the relationship set work and the entity sets employee and
project as a higher-level entity set called work.
Figure below shows the E-R diagram with aggregation.
Figure 2.21: E-R diagram with aggregation
Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for
each entity and relationship set as before.
The table for the relationship set uses contains a column for each attribute in the primary keys
of the entity set machinery and the relationship set work.
Aggregation is an abstraction in which relationship sets are treated as higher level entity sets.
Here a relationship set is embedded inside an entity set, and these entity sets can participate in
relationships.
UNIT 3.1
RELATIONAL DATABASES
Relational Databases: Introduction to Relational Databases and Terminology-
Relation, Tuple, Attribute, Cardinality, Degree, Domain. Keys- Super Key,
Candidate Key, Primary Key, Foreign Key.
INTRODUCTION TO RELATIONAL DATABASES
Relational database was proposed by Edgar Codd (of IBM Research) around 1969. It has since
become the dominant database model for commercial applications (in comparison with other
database models such as hierarchical, network and object models). Today, there are many
commercial Relational Database Management System (RDBMS), such as Oracle, IBM DB2 and
Microsoft SQL Server. There are also many free and open-source RDBMS, such as MySQL,
mSQL (mini-SQL) and the embedded JavaDB.
A relational database organizes data in tables (or relations). A table is made up of rows and
columns. A row is also called a record (or tuple). A column is also called a field (or attribute). A
database table is similar to a spreadsheet, but the relationships that can be created among
the tables enable a relational database to efficiently store huge amounts of data and effectively
retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with relational
databases.
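These terms can be made concrete with Python's built-in sqlite3 module and a short SQL session; the employee table and its rows are made up for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation (table) with three attributes (columns).
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")

# Each inserted row is one tuple (record) of the relation.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "Sales"), (2, "Ravi", "IT"), (3, "Meena", "IT")])

# SQL retrieves selected data from the relation.
it_staff = conn.execute(
    "SELECT name FROM employee WHERE dept = ?", ("IT",)).fetchall()
```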
Features of RDBMS
Features and characteristics of an RDBMS can best be understood through Codd's twelve rules.
Codd's Twelve Rules
Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F.
Codd, a pioneer of the relational model for databases, designed to define what is required from a
database management system in order for it to be considered relational, i.e., a relational database
management system (RDBMS). They are sometimes jokingly referred to as "Codd's Twelve
Commandments". They are as follows:
Rule 0: The Foundation rule:
A relational database management system must manage its stored data using only its
relational capabilities. The system must qualify as relational, as a database, and as a
management system. For a system to qualify as a relational database management system
(RDBMS), that system must use its relational facilities (exclusively) to manage the
database.
Rule 1: The information rule:
All information in a relational database (including table and column names) is
represented in only one way, namely as a value in a table.
Rule 2: The guaranteed access rule:
All data must be accessible. It says that every individual scalar value in the database must
be logically addressable by specifying the name of the containing table, the name of the
containing column and the primary key value of the containing row.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support
a representation of "missing information and inapplicable information" that is systematic,
distinct from all regular values (for example, "distinct from zero or any other number", in
the case of numeric values), and independent of data type. It is also implied that such
representations must be manipulated by the DBMS in a systematic way.
Rule 4: Active online catalog based on the relational model:
The system must support an online, inline, relational catalog that is accessible to
authorized users by means of their regular query language. That is, users must be able to
access the database's structure (catalog) using the same query language that they use to
access the database's data.
Rule 5: The comprehensive data sublanguage rule:
The system must support at least one relational language that
1. Has a linear syntax
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data
manipulation operations (update as well as retrieval), security and integrity
constraints, and transaction management operations (begin, commit, and
rollback).
Rule 6: The view updating rule:
All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete:
The system must support set-at-a-time insert, update, and delete operators. This means
that data can be retrieved from a relational database in sets constructed of data from
multiple rows and/or multiple tables. This rule states that insert, update, and delete
operations should be supported for any retrievable set rather than just for a single row in a
single table.
Rule 8: Physical data independence:
Changes to the physical level (how the data is stored, whether in arrays or linked lists
etc.) must not require a change to an application based on the structure.
Rule 9: Logical data independence:
Changes to the logical level (tables, columns, rows, and so on) must not require a change
to an application based on the structure. Logical data independence is more difficult to
achieve than physical data independence.
Rule 10: Integrity independence:
Integrity constraints must be specified separately from application programs and stored in
the catalog. It must be possible to change such constraints as and when appropriate
without unnecessarily affecting existing applications.
Rule 11: Distribution independence:
The distribution of portions of the database to various locations should be invisible to
users of the database. Existing applications should continue to operate successfully:
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.
Rule 12: The non-subversion rule:
If the system provides a low-level (record-at-a-time) interface, then that interface cannot
be used to subvert the system, for example, bypassing a relational security or integrity
constraint.
Advantages of RDBMS
An RDBMS offers an extremely structured way of managing data (although a good database
design is needed), as everything in an RDBMS is represented as values in relations (i.e., tables).
Many of its advantages are also visible within the thirteen rules stated by Codd.
Disadvantages of RDBMS
An RDBMS is very good for related data, but unorganized and unrelated data creates only
chaos within it. That is one reason why emerging trends such as Big Data (where a lot of data
from various sources is to be analyzed) do not favour an RDBMS, but rather non-relational
(NoSQL) DBMSs.
TERMINOLOGIES: (RELATION, TUPLE, ATTRIBUTE,
CARDINALITY, DEGREE, DOMAIN)
Relation:
Definition-
A database relation is a predefined row/column format for storing information in a relational
database. A relation is equivalent to a table; it is also known as a table.
Example-
Tuple:
Definition-
In the context of databases, a tuple is one record (one row).
Example-
Attribute:
Definition-
In general, an attribute is a characteristic. In a database management system (DBMS), an
attribute refers to a database component, such as a table, or may refer to a database field.
Attributes describe the instances in the rows of a database table.
Example-
Degree:
Definition-
The degree of a relation is the number of attributes in its relation schema. (In the ER context,
the degree of a relationship is the number of entity sets that participate in it.)
Example-
Cardinality:
Definition-
In the context of databases, cardinality refers to the uniqueness of the data values contained in
a column.
Less commonly, cardinality also refers to the relationships between tables, which can be
one-to-one, many-to-one, or many-to-many.
Example-
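As a sketch (with invented data), the cardinality of a column can be measured with SQL's COUNT(DISTINCT ...) via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, gender TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Amit", "M"), ("Bina", "F"), ("Chand", "M"), ("Devi", "F")])

# name: 4 rows, 4 unique values -> high cardinality.
# gender: 4 rows, 2 unique values -> low cardinality.
name_card = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM person").fetchone()[0]
gender_card = conn.execute(
    "SELECT COUNT(DISTINCT gender) FROM person").fetchone()[0]
```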
Domain
Definition-
In database technology, a domain refers to the description of an attribute's allowed values. The
physical description is the set of values the attribute can have, and the semantic, or logical,
description is the meaning of the attribute.
Example-
KEYS: (SUPER KEYS, CANDIDATE KEY, PRIMARY
KEY, FOREIGN KEY)
Definition of a Key-
A key simply consists of one or more attributes that determine other attributes. A key is
defined over one or more columns (attributes) of a database table; for example, a table with the
columns id, name, and address might use id as its key. Keys are used to identify each record in
the database table.
The following are the various types of keys available in the DBMS system.
Super key
Candidate key
Primary key
Foreign key
Super Key-
A superkey is a combination of columns that uniquely identifies any row within a relational
database management system (RDBMS) table. A candidate key is a closely related concept
where the superkey is reduced to the minimum number of columns required to uniquely identify
each row.
For example, imagine a table used to store customer master details that contains columns such
as:
customer name
customer id
social security number (SSN)
address
date of birth
A certain set of columns may be extracted and guaranteed unique to each customer. Examples of
superkeys are as follows:
Name, SSN, Birthdate
ID, Name, SSN
However, this set may be further reduced. It can be assumed that each customer id is unique
to each customer, so the superkey may be reduced to just one field, customer id, which is the
candidate key. To ensure absolute uniqueness, a composite candidate key may be
formed by combining customer id with SSN.
A primary key is a special term for a candidate key designated as the unique identifier for all
table rows. Once a candidate key is decided upon, it may be defined as the primary key at the
point of table creation.
Candidate key-
A candidate key is a column, or set of columns, in a table that can uniquely identify any database
record without referring to any other data. Each table may have one or more candidate keys, but
one candidate key is special, and it is called the primary key. This is usually the best among the
candidate keys.
When a key is composed of more than one column, it is known as a composite key.
The best way to explain candidate keys is with an example. Suppose a bank's database is
being designed. To uniquely identify each customer's account, a combination of the customer's ID
or social security number (SSN) and a sequential number for each of his or her accounts can be
used. So, Mr. Andrew Smith's checking account can be numbered 223344-1, and his savings
account 223344-2. A candidate key has just been created.
Alternatively, the bank's database can issue unique account numbers of its own, which are
guaranteed to avoid any ambiguity. For good measure, these account numbers can have some built-in
logic: for example, checking accounts can begin with a 'C', followed by the year and month of
creation and, within that month, a sequential number.
Note that it was also possible to uniquely identify each account using the aforementioned SSNs and a
sequential number (assuming no government mix-up in which the same number is issued to
two people), so that combination is a candidate key that can potentially be used to identify records.
However, the generated account number just described is a better candidate key for the same purpose.
In fact, if the chosen candidate key is so good that it can certainly uniquely identify each and
every record, then it should be used as the primary key. All databases allow the definition of one,
and only one, primary key per table.
Primary key-
It is the candidate key that is chosen by the database designer to identify entities within an entity
set. A primary key is a minimal superkey. In an ER diagram, the primary key is represented by
underlining the primary key attribute. Ideally a primary key is composed of only a single
attribute, but it is possible to have a primary key composed of more than one attribute.
A primary key is a special relational database table column (or combination of columns)
designated to uniquely identify all table records.
A primary key‘s main features are:
It must contain a unique value for each row of data.
It cannot contain null values.
A primary key is either an existing table column or a column that is specifically generated by the
database according to a defined sequence.
For example, students are routinely assigned unique identification (ID) numbers, and citizens
are assigned uniquely identifiable Social Security numbers.
For example, a database must hold all of the data stored by a commercial bank. Two of the
database tables include the CUSTOMER_MASTER, which stores basic and static customer data
(e.g., name, date of birth, address and Social Security number, etc.) and the
ACCOUNTS_MASTER, which stores various bank account data (e.g., account creation date,
account type, withdrawal limits or corresponding account information, etc.).
To uniquely identify customers, a column or combination of columns is selected to guarantee
that two customers never have the same unique value. Thus, certain columns are immediately
eliminated, e.g., surname and date of birth. A good primary key candidate is the column that is
designated to hold unique and government-assigned Social Security numbers. However, some
account holders (e.g., children) may not have Social Security numbers, and this column‘s
candidacy is eliminated. The next logical option is to use a combination of columns, such as the
surname, date of birth and email address, resulting in a long and cumbersome composite primary
key.
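The choice of a candidate key as primary key can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch, not part of the notes: the table loosely follows the CUSTOMER_MASTER example, and all table and column names here are assumptions.

```python
import sqlite3

# In-memory database; names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE customer_master (
        customer_id INTEGER PRIMARY KEY,  -- the chosen candidate key
        name        TEXT NOT NULL,
        ssn         TEXT,                 -- may be NULL (e.g., children)
        birth_date  TEXT
    )
""")
cur.execute("INSERT INTO customer_master VALUES (1, 'Andrew Smith', '223344', '1990-01-01')")
try:
    # A second row with the same customer_id violates the primary key
    cur.execute("INSERT INTO customer_master VALUES (1, 'Jane Doe', '556677', '1985-05-05')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The database enforces the primary key's two features from the list above: the duplicate customer_id is rejected, and a NULL customer_id would likewise be refused.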
Foreign Key-
A foreign key is a column or group of columns in a relational database table that provides a link
between data in two tables. It acts as a cross-reference between tables because it references the
primary key of another table, thereby establishing a link between them.
In complex databases, data in a domain must be spread across multiple tables while maintaining a
relationship between them. The concept of referential integrity is derived from foreign key
theory.
Foreign keys and their implementation are more complex than primary keys.
For any column acting as a foreign key, a corresponding value should exist in the linked table.
Special care must be taken while inserting data and removing data from the foreign key column,
as a careless deletion or insertion might destroy the relationship between the two tables.
For instance, if there are two tables, customer and order, a relationship can be created between
them by introducing a foreign key into the order table that refers to the customer ID in the
customer table. The customer ID column exists in both customer and order tables. The customer
ID in the order table becomes the foreign key, referring to the primary key in the customer table.
To insert an entry into the order table, the foreign key constraint must be satisfied.
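The customer/order relationship above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under the assumption that SQLite is used (where foreign key enforcement must be switched on explicitly); the table is named orders here because "order" is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1)")       # satisfies the constraint
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```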
Some referential actions associated with a foreign key include the following:
Cascade: When rows in the parent table are deleted, the matching rows in the
child table are also deleted, creating a cascading delete.
Set Null: When a referenced row in the parent table is deleted or updated, the foreign key values
in the referencing row are set to null to maintain the referential integrity.
Triggers: Referential actions are normally implemented as triggers. In many ways foreign key
actions are similar to user-defined triggers. To ensure proper execution, ordered referential
actions are sometimes replaced with their equivalent user-defined triggers.
Set Default: This referential action is similar to "set null." The foreign key values in the child
table are set to the default column value when the referenced row in the parent table is deleted or
updated.
Restrict: This is the normal referential action associated with a foreign key. A value in the parent
table cannot be deleted or updated as long as it is referred to by a foreign key in another table.
No Action: This referential action is similar in function to the "restrict" action, except that the
no-action check is performed only after trying to alter the table.
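The cascade action described above can be demonstrated with a short sqlite3 sketch (the parent/child table names are hypothetical, chosen only for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")
conn.execute("DELETE FROM parent WHERE id = 1")  # cascades to the child row
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

Deleting the parent row silently removes the referencing child row; with the default (restrict-like) behaviour the DELETE would instead have been rejected.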
UNIT 3.2
RELATIONAL ALGEBRA
Relational Algebra: Operations, Select, Project, Union, Difference, Intersection
Cartesian product, Join, Natural Join.
INTRODUCTION
Relational algebra, first described by E.F. Codd while at IBM, is a family of algebras with a
well-founded semantics used for modelling the data stored in relational databases, and for defining
queries on it.
In relational algebra, queries are composed using a collection of operators, and each query
describes a step-by-step procedure for computing the desired result, based on the order in which
the operators are applied. Because queries are specified in this operational, procedural manner,
relational algebra is also called a procedural language.
The procedural nature of the algebra allows us to think of an algebra expression as a recipe, or a
plan, for evaluating a query, and relational systems in fact use algebra expressions to represent
query evaluation plans.
Relational algebra expression
An expression composed of these operators, forming a complex query, is called a relational
algebra expression.
A unary algebra operator is applied to a single expression, and a binary algebra operator is applied
to two expressions.
Fundamental operations of Relational algebra:
Select
Project
Union
Set difference
Cartesian product
Rename
SELECT
The SELECT operation (denoted by σ, sigma) is used to select a subset of the tuples from a
relation based on a selection condition.
The selection condition acts as a filter:
Only those tuples that satisfy the qualifying condition are kept.
Tuples satisfying the condition are selected, whereas the other tuples are discarded
(filtered out).
Examples:
A. Select the STUDENT tuples whose age is 18:
σage=18 (STUDENT)
B. Select the STUDENT tuples whose course is BCA:
σcourse=BCA (STUDENT)
C. Select the students from the student relation instance whose gender is male:
σgender=M (STUDENT)
Student name Age gender course
Ritika 18 F BCA
Prerna 19 F Bsc.
Ankush 20 M BA
Preeti 18 F Bsc.
Pragyan 20 M BA
Ritu 18 F BCA
Janvi 20 F BCA
Answer of the first select statement is :
A.
Student name Age gender course
Ritika 18 F BCA
Preeti 18 F Bsc.
Ritu 18 F BCA
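The SELECT operation above can be sketched in plain Python, treating the STUDENT relation as a list of tuples and σ as a filter (the helper name `select` is just for illustration):

```python
# The STUDENT relation from the notes, as (student_name, age, gender, course)
students = [
    ("Ritika", 18, "F", "BCA"),
    ("Prerna", 19, "F", "Bsc."),
    ("Ankush", 20, "M", "BA"),
    ("Preeti", 18, "F", "Bsc."),
    ("Pragyan", 20, "M", "BA"),
    ("Ritu", 18, "F", "BCA"),
    ("Janvi", 20, "F", "BCA"),
]

def select(relation, condition):
    """sigma: keep only the tuples satisfying the selection condition."""
    return [t for t in relation if condition(t)]

# sigma age=18 (STUDENT)
age_18 = select(students, lambda t: t[1] == 18)
```

The filter keeps Ritika, Preeti and Ritu, matching the answer table shown for query A.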
PROJECT
The PROJECT operation is denoted by π (pi).
If we are interested in only certain attributes of a relation, we use PROJECT.
This operation keeps certain columns (attributes) from a relation and discards the other columns.
Example:
To list only the name and course of all students in the student relation:
πstudent_name, course (STUDENT)
(output from the STUDENT table above)
Student-name Course
Ritika BCA
Prerna Bsc.
Ankush BA
Preeti Bsc.
Pragyan BA
Ritu BCA
Janvi BCA
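The PROJECT operation can likewise be sketched in Python; since relations are sets of tuples, a projection also eliminates duplicate rows (the helper name `project` is illustrative):

```python
# The STUDENT relation as (student_name, age, gender, course)
students = [
    ("Ritika", 18, "F", "BCA"),
    ("Prerna", 19, "F", "Bsc."),
    ("Ankush", 20, "M", "BA"),
    ("Preeti", 18, "F", "Bsc."),
    ("Pragyan", 20, "M", "BA"),
    ("Ritu", 18, "F", "BCA"),
    ("Janvi", 20, "F", "BCA"),
]

def project(relation, indices):
    """pi: keep only the chosen attributes, eliminating duplicates
    while preserving first-seen order."""
    seen = dict.fromkeys(tuple(t[i] for i in indices) for t in relation)
    return list(seen)

# pi student_name, course (STUDENT)
name_course = project(students, (0, 3))
```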
UNION
It is a binary operation, denoted by the union sign (∪) from set theory. The result of R ∪ S is a
relation that includes all tuples that are in R, in S, or in both R and S. Duplicate tuples are
eliminated.
The two operand relations R and S must be "type compatible" (or UNION compatible): R and
S must have the same number of attributes, and each pair of corresponding attributes must be type
compatible (have the same or compatible domains). E.g., in the bank enterprise, the depositor and
borrower relations have almost identical attributes and types.
Customer name Id no.
RITA 301
GITA 302
RAM 303
(DEPOSITOR'S RELATIONAL MODEL)
Customer name Id no.
Sham 300
Surbhi 304
Rita 301
Ram 303
(Borrower's relational model)
(Output: DEPOSITOR ∪ BORROWER)
Customer_name Id no
Rita 301
Gita 302
Ram 303
Sham 300
Surbhi 304
DIFFERENCE
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by −. The result of R − S is a
relation that includes all tuples that are in R but not in S. The attribute names in the result are
the same as the attribute names in R. The two operand relations R and S must be "type
compatible".
Output: DEPOSITOR − BORROWER
Customer name Idno
Gita 302
Only one tuple (Gita, 302) belongs to DEPOSITOR but not to BORROWER, so the result contains a single tuple.
INTERSECTION
The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S.
The attribute names in the result are the same as the attribute names in R.
The two operand relations R and S must be "type compatible". For the depositor and borrower
relations above, the intersection contains (Rita, 301) and (Ram, 303).
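The three set operations on the depositor and borrower relations above map directly onto Python's set operators (names normalised to title case for comparability):

```python
depositor = {("Rita", 301), ("Gita", 302), ("Ram", 303)}
borrower  = {("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)}

union        = depositor | borrower   # all customers, duplicates eliminated
difference   = depositor - borrower   # depositors who are not borrowers
intersection = depositor & borrower   # customers who are both
```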
CARTESIAN PRODUCT
The resulting relation state has one tuple for each combination of tuples—one from R and one
from S. Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S will have
nR * nS tuples.
The two operands do NOT have to be "type compatible".
Example:
R.
A 1
B 2
D 3
F 4
S.
D 3
E 4
Output: R × S
A 1 D 3
A 1 E 4
B 2 D 3
B 2 E 4
D 3 D 3
D 3 E 4
F 4 D 3
F 4 E 4
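The same Cartesian product can be computed with `itertools.product`, pairing every tuple of R with every tuple of S (so |R| × |S| = 4 × 2 = 8 result tuples):

```python
from itertools import product

R = [("A", 1), ("B", 2), ("D", 3), ("F", 4)]
S = [("D", 3), ("E", 4)]

# R x S: concatenate each pair of tuples, one from R and one from S
cartesian = [r + s for r, s in product(R, S)]
```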
JOIN
The JOIN operation can be seen as a Cartesian product of two relations followed by a selection.
A join lets you evaluate a join condition between the attributes of the relations on
which the join operation is undertaken.
It is used to combine related tuples from two relations.
The join condition is called theta (θ).
Notation:
R ⋈θ S (R JOIN, with join condition θ, S)
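A theta-join can be sketched as a Cartesian product filtered by the join condition. The employees/departments relations below are hypothetical, used only for illustration:

```python
from itertools import product

# Hypothetical relations: employees (name, dept_id) and departments (dept_id, dept_name)
employees   = [("Asha", 10), ("Ravi", 20), ("Meena", 10)]
departments = [(10, "Sales"), (20, "HR")]

# Theta-join: Cartesian product, keeping only tuples satisfying the condition
# (here, equality on dept_id, i.e. an equi-join)
joined = [e + d for e, d in product(employees, departments) if e[1] == d[0]]
```

Note that the joining attribute appears twice in each result tuple; a natural join, described next, removes that duplicate.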
NATURAL JOIN
Another variation of JOIN, called NATURAL JOIN, is denoted by *.
Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such
joins result in two attributes in the resulting relation having exactly the same value. A natural
join removes the duplicate attribute(s).
In most systems a natural join requires that the attributes have the same name, in order to
identify the attribute(s) to be used in the join. This may require a renaming mechanism.
If you do use natural joins, make sure that the relations do not have two attributes with the
same name by accident.
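A natural join can be sketched as an equi-join on the same-named attribute, keeping that attribute only once in the result. The course durations below are hypothetical, added only to give the second relation something to contribute:

```python
# Two relations sharing the attribute "course"
students = [{"name": "Ritu", "course": "BCA"}, {"name": "Ankush", "course": "BA"}]
courses  = [{"course": "BCA", "duration": 3}, {"course": "BA", "duration": 3}]

common = {"course"}  # attributes with the same name in both relations

# Natural join: match on all common attributes; merging the dicts keeps
# the common attribute only once in each result tuple
natural = [
    {**s, **c}
    for s in students
    for c in courses
    if all(s[a] == c[a] for a in common)
]
```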
SUMMARY OF OPERATIONS
SELECT (σ): selects the tuples satisfying a condition.
PROJECT (π): keeps only the chosen attributes.
UNION (∪): tuples in either relation; duplicates eliminated.
SET DIFFERENCE (−): tuples in the first relation but not in the second.
INTERSECTION (∩): tuples in both relations.
CARTESIAN PRODUCT (×): every combination of tuples from the two relations.
JOIN (⋈): Cartesian product filtered by a join condition.
NATURAL JOIN (*): equi-join on same-named attributes, with the duplicate attribute removed.
UNIT 4
STRUCTURED QUERY LANGUAGE (SQL)
&
NORMALIZATION
Structured Query Language (SQL): Introduction to SQL, History of SQL,
Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple
Queries, Nested Queries,
Normalization: Benefits of Normalization, Normal Forms- 1NF, 2NF, 3NF,
BCNF & and Functional Dependency.
INTRODUCTION TO SQL
Introduction & Brief History:
SQL is a special-purpose programming language designed for managing data held in a relational
database management system (RDBMS). Originally based upon relational algebra and tuple
relational calculus, SQL consists of a data definition language and a data manipulation language.
The scope of SQL includes data insert, query, update and delete, schema creation and
modification, and data access control.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks." Despite not entirely adhering to the relational model as described by Codd, it became the
most widely used database language.
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the
International Organization for Standardization (ISO) in 1987. Since then, the standard has been
revised to include a larger set of features.
Why SQL?
Allows users to access data in relational database management systems.
Allows users to describe the data.
Allows users to define the data in database and manipulate that data.
Allows embedding within other languages using SQL modules, libraries & pre-compilers.
Allows users to create and drop databases and tables.
Allows users to create views, stored procedures and functions in a database.
Allows users to set permissions on tables, procedures and views.
Advantages of SQL:
High Speed: SQL Queries can be used to retrieve large amounts of records from a
database quickly and efficiently.
Well-Defined Standards Exist: SQL databases use long-established standards
adopted by ANSI & ISO. Non-SQL databases do not adhere to any clear standard.
No Coding Required: Using standard SQL, it is easier to manage database systems
without having to write a substantial amount of code.
Emergence of ORDBMS: Previously, SQL databases were synonymous with relational
databases. With the emergence of object-oriented DBMSs, object storage capabilities
have been extended to relational databases.
Disadvantages of SQL:
Difficulty in Interfacing: Interfacing an SQL database is more complex than adding a few
lines of code.
More Features Implemented in a Proprietary Way: Although SQL databases conform to
ANSI & ISO standards, some vendors add proprietary extensions to standard SQL to
ensure vendor lock-in.
HISTORY OF SQL
In 1970, Edgar F. Codd, a member of the IBM Research Lab, published the classic paper
'A Relational Model of Data for Large Shared Data Banks'.
Codd's paper triggered a great deal of research and experimentation, which led to the design
and prototype implementation of a number of relational languages.
One such language was the Structured English Query Language (SEQUEL), defined by
Donald D. Chamberlin and Raymond F. Boyce.
The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark
of the UK-based Hawker Siddeley aircraft company.
A revised version of SEQUEL, called SEQUEL/2, was released in 1976-77.
In the late 1970s, IBM developed Codd's ideas into a research prototype named System R.
In 1979, Relational Software, Inc. (which later became Oracle Corporation) released the
first commercially available relational database implementing SQL.
In 1986, ANSI published an SQL standard called 'SQL-86'; ISO followed in 1987.
The next versions of the standard were SQL-89 and SQL-92, followed by SQL:1999,
SQL:2003, SQL:2006 and SQL:2008.
Given industry trends, it seems clear that the relational model and SQL will continue
to strengthen their position in the near future.
CONCEPT BEHIND SQL
SQL Process
When you execute an SQL command against any RDBMS, the system determines the best way
to carry out your request, and the SQL engine figures out how to interpret the task.
There are various components included in the process. These components are:-
Query Dispatcher
Optimization Engines
Classic Query Engine
SQL Query Engine
The classic query engine handles all non-SQL queries, but the SQL query engine does not handle
logical files.
Types of SQL Commands
The following sections discuss the basic categories of commands used in SQL to perform various
functions . The main categories are:-
DDL (Data Definition Language)
DML (Data Manipulation Language)
DQL (Data Query Language)
DCL (Data Control Language)
TCL (Transactional Control Language)
DDL COMMANDS
DDL (Data Definition Language) Commands of SQL allow the Data Definition functions like
creating, altering and dropping the tables.
The following are the various DDL Commands, along with their syntax, use and examples:
#1. CREATE
USE: creates a new table, view of a table, or other objects in database.
SYNTAX:
CREATE TABLE table_name(
Column_name1 data_type(size),
Column_name2 data_type(size),
….
);
EXAMPLE :
CREATE TABLE Persons
(PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
#2. ALTER
USE : modifies an existing database object such as a table.
SYNTAX :
ALTER TABLE table_name
ADD column_name datatype;
or
ALTER TABLE table_name
DROP COLUMN column_name;
or
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
EXAMPLE :
ALTER TABLE Persons
ADD DateOfBirth date;
or
ALTER TABLE Persons
DROP COLUMN DateOfBirth;
or
ALTER TABLE Persons
ALTER COLUMN DateOfBirth year;
#3. DROP
USE : deletes an entire table, a view of a table, or other object in the database.
SYNTAX : DROP TABLE table_name;
EXAMPLE : DROP TABLE Persons;
#4. TRUNCATE
USE : removes all records from a table, including the space allocated for the
records; it may also reset auto-increment (identity) counters.
SYNTAX : TRUNCATE TABLE table_name;
EXAMPLE : TRUNCATE TABLE persons;
#5. COMMENT
USE : adds comments to the data dictionary.
SYNTAX : COMMENT ON TABLE table_name IS 'text';
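The DDL commands above can be tried out with Python's built-in sqlite3 module. This is a sketch under the assumption that SQLite is used; note that older SQLite versions support only ADD COLUMN and RENAME forms of ALTER TABLE, not MODIFY COLUMN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: the Persons table from the example
cur.execute("""
    CREATE TABLE Persons (
        PersonID  INTEGER,
        LastName  VARCHAR(255),
        FirstName VARCHAR(255),
        Address   VARCHAR(255),
        City      VARCHAR(255)
    )
""")

# ALTER: add the DateOfBirth column, as in the example
cur.execute("ALTER TABLE Persons ADD COLUMN DateOfBirth DATE")
columns = [row[1] for row in cur.execute("PRAGMA table_info(Persons)")]

# DROP: remove the table entirely
cur.execute("DROP TABLE Persons")
```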
DML COMMANDS
DML (Data Manipulation Language) Commands of SQL allow the Data Manipulation functions
like inserting, updating and deleting data values in the tables created using DDL Commands.
The following are the various DML Commands, along with their syntax, use and examples:
#1. INSERT
USE : creates a record.
SYNTAX :
INSERT INTO table_name
VALUES (value1,value2,value3,...);
or
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
EXAMPLE :
INSERT INTO Persons VALUES (1,'manan','07-08-1994');
#2. UPDATE
USE : modifies records.
SYNTAX :
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
EXAMPLE :
UPDATE Students
SET Fine=0
WHERE Stu_ID=404;
#3. DELETE
USE : delete records (but the structure remain intact).
SYNTAX :
DELETE FROM table_name
WHERE some_column=some_value;
EXAMPLE :
DELETE FROM Persons
WHERE Stu_ID=21;
#4. CALL
USE : call a PL/SQL or java subprogram.
#5. EXPLAIN PLAN
USE : explain access path to data.
SYNTAX :
EXPLAIN PLAN FOR
SQL_Statement;
EXAMPLE :
EXPLAIN PLAN FOR
SELECT last_name FROM employees;
#6. LOCK TABLE
USE : control concurrency.
SYNTAX :
LOCK TABLE table_name
IN EXCLUSIVE MODE
NOWAIT;
EXAMPLE :
LOCK TABLE employees
IN EXCLUSIVE MODE
NOWAIT;
This locks the table in exclusive mode but does not wait if another user has already locked the table.
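The INSERT/UPDATE/DELETE commands above can be exercised with a short sqlite3 sketch, loosely following the Students examples in the text (column values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (Stu_ID INTEGER, Name TEXT, Fine INTEGER)")

# INSERT: create records
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(404, "Manan", 50), (21, "Ritu", 0)])

# UPDATE: waive the fine for student 404, as in the example
cur.execute("UPDATE Students SET Fine = 0 WHERE Stu_ID = 404")

# DELETE: remove student 21; the table structure remains intact
cur.execute("DELETE FROM Students WHERE Stu_ID = 21")

fine = cur.execute("SELECT Fine FROM Students WHERE Stu_ID = 404").fetchone()[0]
count = cur.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```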
DCL COMMANDS
DCL (Data Control Language) Commands of SQL allow data control functions like
granting and revoking permissions, committing changes, rolling back, etc.
The following are the various DCL Commands, along with their syntax, use and examples:
#1. GRANT
USE : gives a privilege to user(s).
SYNTAX :
GRANT permission [, ...]
ON [schema_name.]object_name [(column [, ...])]
TO database_principal[, ...]
[WITH GRANT OPTION]
EXAMPLE :
GRANT SELECT
ON Invoices
TO AnneRoberts;
#2. REVOKE
USE : takes back privileges/grants from users.
SYNTAX :
REVOKE [GRANT OPTION FOR] permission [, ...]
ON [schema_name.]object_name [(column [, ...])]
FROM database_principal[, ...]
[CASCADE]
EXAMPLE :
REVOKE SELECT
ON Invoices
FROM AnneRoberts;
#3. COMMIT
USE : save work done.
SYNTAX : COMMIT;
#4. ROLLBACK
USE : restores the database to its state as of the last COMMIT.
SYNTAX : ROLLBACK;
#5. SAVEPOINT
USE : identify a point in a transaction in which you can later rollback.
SYNTAX : SAVEPOINT SAVEPOINT_NAME;
& then, ROLLBACK TO SAVEPOINT_NAME;
RELEASE SAVEPOINT SAVEPOINT_NAME;
#6. SET TRANSACTION
USE : establishes properties for the current transaction, e.g. changing transaction
options such as which rollback segment to use.
SYNTAX : SET TRANSACTION [ READ WRITE | READ ONLY ];
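COMMIT and ROLLBACK can be demonstrated with sqlite3, whose Python API exposes them as `commit()` and `rollback()` on the connection (the table here is a throwaway example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")
conn.rollback()   # ROLLBACK: undo work done since the last COMMIT
after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

conn.execute("INSERT INTO t VALUES (2)")
conn.commit()     # COMMIT: save work done
after_commit = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The rolled-back insert leaves the table empty, while the committed insert persists.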
SIMPLE QUERIES & NESTED QUERIES
A Simple Query is a query that searches using just one parameter. A simple query might use all
of the fields in a table, or only those fields from which information is required, but it will still
use just one parameter (search criterion).
The following are some types of queries:
• A select query retrieves data from one or more of the tables in your database, or other
queries there, and displays the results in a datasheet. You can also use a select query to
group data, and to calculate sums, averages, counts, and other types of totals.
• A parameter query is a type of select query that prompts you for input before it runs. The
query then uses your input as criteria that control your results. For example, a typical
parameter query asks you for starting high and low values, and only returns records that
fall within those values.
• A cross-tab query uses row headings and column headings so you can see your data in
terms of two categories at once.
• An action query alters your data or your database. For example, you can use an action
query to create a new table, or add, delete, or change your data.
A Nested Query or a subquery or inner query is a query in a query.
A subquery is usually added in the WHERE clause of an SQL statement. Most of the time, a
subquery is used when you know how to search for a value using a SELECT statement, but do
not know the exact value.
A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
A query result can be used in the condition of a WHERE clause; in such a case the query is called a
subquery, and the complete SELECT statement is called a nested query. A subquery can also be
placed within a HAVING clause, but a subquery cannot be used in an ORDER BY
clause.
Subqueries are queries nested inside other queries, marked off with parentheses, and sometimes
referred to as "inner" queries within "outer" queries. Most often, you see subqueries in WHERE
or HAVING clauses.
A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT,
UPDATE, or DELETE statement, or inside another subquery.
A subquery can appear anywhere an expression can be used, if it returns a single value.
Statements that include a subquery usually take one of these formats:
WHERE expression [NOT] IN (subquery).
WHERE expression comparison_operator [ANY | ALL] (subquery).
WHERE [NOT] EXISTS (subquery).
Following are the TYPES of Nested Queries:
Single - Row Subqueries
The single-row subquery returns one row. A special case is the scalar subquery, which returns a
single row with one column. Scalar subqueries are acceptable (and often very useful) in virtually
any situation where you could use a literal value, a constant, or an expression. The single-row
query can use any comparison operator (=, <=, >=, <>, <, >). If any of these operators is used
with a subquery that returns more than one row, the query will fail.
Multiple-row subqueries
Multiple-row subqueries return sets of rows. These queries are commonly used to generate result
sets that will be passed to a DML or SELECT statement for further processing. Both single-row
and multiple-row subqueries will be evaluated once, before the parent query is run. Since it
returns multiple values, the query must use the set comparison operators (IN, ALL, ANY). If you
use a multiple-row subquery with the equals comparison operator, the database will return an error
if more than one row is returned. The operators in the following table can be used with multiple-row
subqueries:
Symbol Meaning
IN equal to any member in a list
ANY returns rows that match any value on a list
ALL returns rows that match all the values in a list
Multiple–Column Subquery
A subquery that compares more than one column between the parent query and the subquery is
called a multiple-column subquery. In multiple-column subqueries, rows in the subquery
results are evaluated in the main query in pair-wise comparison: that is, column-to-column
comparison and row-to-row comparison.
Correlated Subquery
A correlated subquery has a more complex method of execution than single- and multiple-row
subqueries and is potentially much more powerful. If a subquery references columns in the
parent query, then its result will be dependent on the parent query. This makes it impossible to
evaluate the subquery before evaluating the parent query.
Some points to remember about subqueries are:
• Subqueries are queries nested inside other queries, marked off with parentheses.
• The result of the inner query is passed to the outer query for the preparation of the final result.
• The ORDER BY clause is not supported for nested queries.
• The BETWEEN operator cannot be applied to a subquery (though it can be used within one).
• A single-row subquery returns only a single value to the outer query.
• A subquery must be placed on the right-hand side of the comparison operator.
• A query can contain more than one subquery.
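A single-row (scalar) subquery and a multiple-row subquery with IN can both be demonstrated with sqlite3; the Students table and fine values here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (Stu_ID INTEGER, Name TEXT, Fine INTEGER)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(1, "Adam", 10), (2, "Alex", 30), (3, "Stuart", 30), (4, "Ritu", 0)])

# Single-row (scalar) subquery: students paying the maximum fine
rows = cur.execute("""
    SELECT Name FROM Students
    WHERE Fine = (SELECT MAX(Fine) FROM Students)
""").fetchall()

# Multiple-row subquery with IN: students whose fine matches any fine over 20
rows_in = cur.execute("""
    SELECT Name FROM Students
    WHERE Fine IN (SELECT Fine FROM Students WHERE Fine > 20)
""").fetchall()
```

The inner SELECT runs first and its result is passed to the outer query's WHERE clause, exactly as described above.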
NORMALIZATION
Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminating redundant data (for example, storing the same data in
more than one table) and ensuring data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.
Normalization is a process, in which we systematically examine relations for anomalies and,
when detected, remove those anomalies by splitting up the relation into two new, related
relations.
Normalization is an important part of the database development process: Often during
normalization, the database designers get their first real look into how the data are going to
interact in the database.
Finding problems with the database structure at this stage is strongly preferred to finding
problems further along in the development process because at this point it is fairly easy to cycle
back to the conceptual model (Entity Relationship model) and make changes. Normalization can
also be thought of as a trade-off between data redundancy and performance. Normalizing a
relation reduces data redundancy but introduces the need for joins when all of the data is required
by an application such as a report query.
Problems without Normalization
Without normalization, it becomes difficult to handle and update the database without facing
data loss. Insertion, updation and deletion anomalies are very frequent if the database is not
normalized. To understand these anomalies, let us take the example of a Student table.
S_id S_name S_address Subject_opted
401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physic
Updation Anomaly:
To update the address of a student who occurs twice or more in the table, we have to
update the S_address column in all those rows; otherwise the data will become inconsistent.
Insertion Anomaly:
Suppose that for a new admission we have the S_id (student id), name and address of the student,
but the student has not opted for any subject yet; then we have to insert NULL there, leading to an
insertion anomaly.
Deletion Anomaly:
If S_id 401 has only one subject and temporarily drops it, deleting that row will delete the entire
student record along with it.
BENEFITS OF NORMALIZATION
Normalization produces smaller tables with smaller rows:
More rows per page (less logical I/O)
More rows per I/O (more efficient)
More rows fit in cache (less physical I/O)
The benefits of normalization include:
Searching, sorting, and creating indexes are faster, since tables are narrower, and more
rows fit on a data page.
You usually have more tables.
You can have more clustered indexes (one per table), so you get more flexibility in tuning
queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs
change.
NORMAL FORMS (1NF, 2NF, 3NF, BCNF)
Relations can fall into one or more categories (or classes) called Normal Forms .
Normal Form: A class of relations free from a certain set of modification anomalies.
Normal forms are given names such as:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
These forms are cumulative: a relation in Third Normal Form is also in 2NF and 1NF.
The Normalization Process for a given relation consists of:
Apply the definition of each normal form (starting with 1NF).
If a relation fails to meet the definition of a normal form, change the relation (most often by
splitting the relation into two new relations) until it meets the definition.
Re-test the modified/new relations to ensure they meet the definitions of each normal form.
First Normal Form (1NF)
A relation is in first normal form if it meets the definition of a relation:
1. Each attribute (column) value must be a single value only.
2. All values for a given attribute (column) must be of the same type.
3. Each attribute (column) name must be unique.
4. The order of attributes (columns) is insignificant
5. No two tuples (rows) in a relation can be identical.
6. The order of the tuples (rows) is insignificant
Each table should be organized into rows, and each row should have a primary key that
distinguishes it as unique. The primary key is usually a single column, but sometimes more than
one column can be combined to create a single primary key.
For example consider a table is not in first normal form
Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 83
In first normal form, no row may have a column in which more than one value is saved (for
example, values separated with commas); instead, we must separate such data into multiple rows.
Table in first Normal Form
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
In first normal form, data redundancy increases, as many rows will repeat the same data in
some columns, but each row as a whole will be unique.
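The flattening step described above can be sketched in a few lines of Python (a minimal illustration; the tuples simply mirror the example tables, and are not part of any DBMS):

```python
# Flattening a non-1NF table (multi-valued Subject column) into 1NF:
# each comma-separated subject becomes its own row.
non_1nf = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

first_nf = [
    (student, age, subject.strip())
    for student, age, subjects in non_1nf
    for subject in subjects.split(",")
]
# Each resulting row holds exactly one value per column,
# and no two rows are identical.
```

Note that the row for Adam becomes two rows, matching the 1NF table shown above.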
Second Normal Form (2NF)
A relation is in second normal form (2NF) if all of its non-key attributes are dependent on
all of the key.
Another way to say this: A relation is in second normal form if it is free from partial-key
dependencies
Relations that have a single attribute for a key are automatically in 2NF. This is one reason why
we often use artificial identifiers (non-composite keys) as keys.
As per the second normal form, there must not be any partial dependency of any column on the
primary key. This means that, for a table with a concatenated (composite) primary key, each
column that is not part of the primary key must depend upon the entire concatenated key for its
existence. If any column depends on only one part of the concatenated key, the table fails
second normal form.
In the example of First Normal Form, there are two rows for Adam, to include the multiple
subjects that he has opted for. While this is searchable, and follows First Normal Form, it is an
inefficient use of space. The original, unnormalized table stored the subjects in a single column:
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
Also, in the table in first normal form, the candidate key is {Student, Subject}, yet Age depends
on the Student column alone, which is incorrect as per second normal form. To achieve second
normal form, it would be helpful to split out the subjects into an independent table, and match
them up using the student names as foreign keys.
New student table following second normal form will be:
Student Age
Adam 15
Alex 14
Stuart 17
In the student table, the candidate key will be the Student column, because the only other
column, Age, depends on it.
New subject table introduced for second normal form will be:
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the subject table, the candidate key will be the {Student, Subject} combination. Both of the
above tables now qualify for second normal form and will never suffer update anomalies.
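The decomposition above can be sketched with Python's built-in sqlite3 module (an illustrative sketch; the table and column names simply follow the example):

```python
import sqlite3

# 2NF decomposition of the example above: Age depends only on Student,
# so it moves into its own table keyed by Student.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (name TEXT PRIMARY KEY, age INTEGER)")
conn.execute("""CREATE TABLE subject (
                    name TEXT REFERENCES student(name),
                    subject TEXT,
                    PRIMARY KEY (name, subject))""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [("Adam", "Biology"), ("Adam", "Maths"),
                  ("Alex", "Maths"), ("Stuart", "Maths")])
# Updating Adam's age now touches exactly one row -- no update anomaly.
conn.execute("UPDATE student SET age = 16 WHERE name = 'Adam'")
```

Before the split, the same update would have had to change every row listing one of Adam's subjects.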
Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in second normal form and it contains no
transitive dependencies. Consider relation R containing attributes A, B and C. R(A, B, C)
If A → B and B → C then A → C
Transitive Dependency: Three attributes with the above dependencies
Third normal form requires that every non-prime attribute of the table depend directly on the
primary key; transitive functional dependencies must be removed. The table must also be in
second normal form. For example, consider a table with the following fields.
Student_Detail table:
Student_id Student_name DOB Street City State Zip
In this table, student_id is the primary key, but street, city, and state depend upon zip. The
dependency between zip and these other fields is a transitive dependency. Hence, to apply third
normal form, we need to move street, city, and state to a new table, with zip as the primary key.
New Student_Detail Table:
Student_id Student_name DOB Zip
Address_Table:
Zip Street City State
The advantages of removing transitive dependency are:
1. The amount of data duplication is reduced.
2. Data integrity is achieved.
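The 3NF split above can be sketched in sqlite3 (the sample values are made up purely for illustration):

```python
import sqlite3

# 3NF decomposition sketched above: street/city/state depend on zip,
# so they move to an address table keyed by zip.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE address (
                    zip TEXT PRIMARY KEY,
                    street TEXT, city TEXT, state TEXT)""")
conn.execute("""CREATE TABLE student_detail (
                    student_id INTEGER PRIMARY KEY,
                    student_name TEXT, dob TEXT,
                    zip TEXT REFERENCES address(zip))""")
conn.execute("INSERT INTO address VALUES ('249411', 'Main Road', 'Haridwar', 'UK')")
conn.execute("INSERT INTO student_detail VALUES (1, 'Ravi', '1995-01-01', '249411')")
# A join reconstructs the original flat record on demand, while the
# address details are stored exactly once per zip.
row = conn.execute("""SELECT s.student_name, a.city
                      FROM student_detail s
                      JOIN address a ON s.zip = a.zip""").fetchone()
```

If a zip's city name ever changes, only the one row in address needs updating.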
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if, and only if, every determinant is a candidate key.
The difference between 3NF and BCNF is that, for a functional dependency A -> B,
3NF allows this dependency in a relation if B is a primary-key attribute and A is not a
candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must
be a candidate key.
Boyce-Codd normal form is a stricter version of third normal form. It deals with a certain type
of anomaly that is not handled by third normal form. A third normal form table which does not
have multiple overlapping candidate keys is guaranteed to be in BCNF.
ClientInterview relation:
ClientNo interviewDate InterviewTime StaffNo roomNo
CR76 13/5/02 10:30 SG5 G101
CR76 13/5/02 12:00 SG5 G101
CR74 13/5/02 12:00 SG37 G102
CR56 1/7/02 10:30 SG5 G102
1. FD1: clientNo, interviewDate -> interviewTime, staffNo, roomNo (primary key)
2. FD2: staffNo, interviewDate, interviewTime -> clientNo (candidate key)
3. FD3: roomNo, interviewDate, interviewTime -> clientNo, staffNo (candidate key)
4. FD4: staffNo, interviewDate -> roomNo (not a candidate key)
As a consequence the ClientInterview relation may suffer from update anomalies.
For example, two tuples have to be updated if the roomNo needs to be changed for staffNo
SG5 on 13-May-02.
To transform the ClientInterview relation to BCNF, we must remove the violating
functional dependency by creating two new relations called Interview and StaffRoom as
shown below:
1. Interview (clientNo, interviewDate, interviewTime, staffNo)
2. StaffRoom (staffNo, interviewDate, roomNo)
Interview
ClientNo InterviewDate InterviewTime StaffNo
CR76 13/5/02 10:30 SG5
CR76 13/5/02 12:00 SG5
CR74 13/5/02 12:00 SG37
CR56 1/7/02 10:30 SG5
StaffRoom
staffNo InterviewDate RoomNo
SG5 13/5/02 G101
SG37 13/5/02 G102
SG5 1/7/02 G102
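After the split, the determinant of FD4 is a key of its own relation. A quick Python check (using the StaffRoom rows above as plain tuples) confirms that no (staffNo, interviewDate) pair repeats, so changing a room now updates exactly one tuple:

```python
# The StaffRoom relation after BCNF decomposition.
staff_room = [
    ("SG5",  "13/5/02", "G101"),
    ("SG37", "13/5/02", "G102"),
    ("SG5",  "1/7/02",  "G102"),
]

# FD4's determinant (staffNo, interviewDate) should now be a key:
# every determinant value must identify exactly one tuple.
determinants = [(staff, date) for staff, date, _ in staff_room]
is_key = len(determinants) == len(set(determinants))
```

With `is_key` true, the update anomaly described above (two tuples to change for SG5's room) can no longer occur in StaffRoom.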
FUNCTIONAL DEPENDENCY
A functional dependency is a relationship that exists when one attribute uniquely determines
another attribute. This can be written A -> B, which is the same as stating "B is functionally
dependent upon A".
Example: If R is a relation with attributes X and Y, a functional dependency between the
attributes is represented as X->Y, which specifies Y is functionally dependent on X. Here X is a
determinant set and Y is a dependent attribute. Each value of X is associated precisely with one
Y value.
Functional dependency in a database serves as a constraint between two sets of attributes.
Defining functional dependencies is an important part of relational database design and
underpins normalization.
Consider an Example:
REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade), where:
Student# - student number
Course# - course number
CourseName - course name
IName - name of the instructor who delivered the course
Room# - room number assigned to the respective instructor
Marks - marks scored in course Course# by student Student#
Grade - grade obtained by student Student# in course Course#
Student# and Course# together (called a composite attribute) define EXACTLY
ONE value of Marks. This can be symbolically represented as
(Student#, Course#) -> Marks
REMARK: This type of dependency is called a functional dependency. In the above example,
Marks is functionally dependent on (Student#, Course#).
Other functional dependencies in the above example are:
• Course# -> CourseName
• Course# -> IName (assuming one course is taught by one and only one instructor)
• IName -> Room# (assuming each instructor has his/her own, non-shared room)
• Marks ->Grade
Formally, we can define functional dependency as: in a given relation R with attributes X and
Y, attribute Y is functionally dependent on attribute X if each value of X determines exactly
one value of Y. This is represented as X -> Y; note that X may be composite in nature.
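The formal definition translates directly into a small check: X -> Y holds if no two tuples agree on X but differ on Y. A hedged Python sketch (column indices and sample rows are illustrative, not part of any standard API):

```python
def holds_fd(rows, x_cols, y_cols):
    """Return True if the attributes at positions x_cols functionally
    determine those at y_cols in the given list of tuples."""
    seen = {}
    for row in rows:
        x = tuple(row[i] for i in x_cols)
        y = tuple(row[i] for i in y_cols)
        if seen.setdefault(x, y) != y:
            return False   # same X value, two different Y values
    return True

# REPORT-style rows: (Student#, Course#, Marks)
report = [(1, "C1", 80), (1, "C2", 75), (2, "C1", 90)]
fd_ok = holds_fd(report, x_cols=(0, 1), y_cols=(2,))   # Student#,Course# -> Marks
fd_bad = holds_fd([(1, "C1", 80), (1, "C1", 85)], (0, 1), (2,))
```

Here `fd_ok` is true because each (Student#, Course#) pair maps to one Marks value, while `fd_bad` is false because the same pair maps to two different values.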
UNIT 5
RELATIONAL DATABASE DESIGN
Relational Database Design: Introduction to Relational Database Design, DBMS
v/s RDBMS. Integrity rule, Concept of Concurrency Control and Database
Security.
INTRODUCTION TO RELATIONAL DATABASE
DESIGN
Just as a house without a foundation will fall over, a database with poorly designed tables and
relationships will fail to meet the needs of its users. And hence, the need of a sound relational
database design originates.
The History of Relational Database Design
Dr. E. F. Codd first introduced formal relational database design in 1969 while he was at IBM.
Relational theory, which is based on set theory, applies to both databases and database
applications. Codd developed 12 rules that determine how well an application and its data adhere
to the relational model. Since Codd first conceived these 12 rules, the number of rules has
expanded into the hundreds.
Goals of Relational Database Design
The number one goal of relational database design is to, as closely as possible, develop a
database that models some real-world system. This involves breaking the real-world system into
tables and fields and determining how the tables relate to each other. Although on the surface
this task might appear to be trivial, it can be an extremely cumbersome process to translate a
real-world system into tables and fields.
A properly designed database has many benefits. The processes of adding, editing, deleting, and
retrieving table data are greatly facilitated by a properly designed database. In addition, reports
are easier to build. Most importantly, the database becomes easy to modify and maintain.
Rules of Relational Database Design
To adhere to the relational model, tables must follow certain rules. These rules determine what is
stored in tables and how the tables are related.
1. The Rules of Tables
Each table in a system must store data about a single entity. An entity usually represents a real-
life object or event. Examples of objects are customers, employees, and inventory items.
Examples of events include orders, appointments, and doctor visits.
2. The Rules of Uniqueness and Keys
Tables are composed of rows and columns. To adhere to the relational model, each table must
contain a unique identifier. Without a unique identifier, it becomes programmatically impossible
to uniquely address a row. You guarantee uniqueness in a table by designating a primary key,
which is a single column or a set of columns that uniquely identifies a row in a table.
Each column or set of columns in a table that contains unique values is considered a candidate
key. One candidate key becomes the primary key. The remaining candidate keys become
alternate keys. A primary key made up of one column is considered a simple key. A primary key
comprising multiple columns is considered a composite key. It is generally a good idea to pick a
primary key that is
Minimal (has as few columns as possible)
Stable (rarely changes)
Simple (is familiar to the user)
Following these rules greatly improves the performance and maintainability of your database
application, particularly if you are dealing with large volumes of data.
3. The Rules of Foreign Keys and Domains
A foreign key in one table is the field that relates to the primary key in a second table. For
example, the CustomerID is the primary key in the Customers table. It is the foreign key in
the Orders table. A domain is a pool of values from which columns are drawn. A simple
example of a domain is the specific data range of employee hire dates. In the case of the
Orders table, the domain of the CustomerID column is the range of values for the
CustomerID in the Customers table.
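The CustomerID relationship described above can be sketched in sqlite3 (an illustration, with lowercase column names; the key point is that the foreign key's domain is the set of existing CustomerID values):

```python
import sqlite3

# CustomerID is the primary key of Customers and a foreign key in Orders.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
                    order_id INTEGER PRIMARY KEY,
                    customer_id INTEGER REFERENCES customers(customer_id))""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1)")   # valid: customer 1 exists

# A customer_id outside the domain of existing customers is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Note the `PRAGMA foreign_keys = ON`: unlike most servers, SQLite leaves foreign-key enforcement off by default.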
4. Normalization and Normal Forms
Some of the most difficult decisions that you face as a developer are what tables to create and
what fields to place in each table, as well as how to relate the tables that you create.
Normalization is the process of applying a series of rules to ensure that your database achieves
optimal structure. Normal forms are a progression of these rules. Each successive normal form
achieves a better database design than the previous form did. Although there are several levels of
normal forms, it is generally sufficient to apply only the first three levels of normal forms.
5. Denormalization - Purposely Violating the Rules
Although the developer's goal is normalization, often it makes sense to deviate from normal
forms. We refer to this process as denormalization. The primary reason for applying
denormalization is to enhance performance. If you decide to denormalize, document your
decision. Make sure that you make the necessary application adjustments to ensure that you
properly maintain the denormalized fields. Finally, test to ensure that the denormalization
process actually improves performance.
6. Integrity Rules
Although integrity rules are not part of normal forms, they are definitely part of the database
design process. Integrity rules are broken into two categories. They include overall integrity rules
and database-specific integrity rules.
7. Database-Specific Rules
The other set of rules applied to a database are not applicable to all databases but are, instead,
dictated by business rules that apply to a specific application. Database-specific rules are as
important as overall integrity rules. They ensure that only valid data is entered into a database.
An example of a database-specific integrity rule is that the delivery date for an order must fall
after the order date.
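The delivery-date rule mentioned above can be expressed as a CHECK constraint, sketched here in sqlite3 (real systems often enforce such business rules in triggers or application code instead; dates are illustrative):

```python
import sqlite3

# Database-specific rule: the delivery date must fall after the order date.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
                    order_id INTEGER PRIMARY KEY,
                    order_date TEXT,
                    delivery_date TEXT,
                    CHECK (delivery_date > order_date))""")
conn.execute("INSERT INTO orders VALUES (1, '2014-07-01', '2014-07-05')")  # valid

# A delivery date before the order date violates the business rule.
try:
    conn.execute("INSERT INTO orders VALUES (2, '2014-07-01', '2014-06-30')")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True
```

ISO-format date strings (`YYYY-MM-DD`) compare correctly as text, which is what makes the CHECK expression work here.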
(Also, see Codd’s 12 rules)
Examining the Types of Relationships
Three types of relationships can exist between tables in a database: one-to-many, one-to-one, and
many-to-many. Setting up the proper type of relationship between two tables in your database is
imperative. The right type of relationship between two tables ensures
Data integrity
Optimal performance
Ease of use in designing system objects
The reasons behind these benefits are covered throughout this chapter. Before you can
understand the benefits of relationships, though, you must understand the types of relationships
available.
One-to-Many
A one-to-many relationship is by far the most common type of relationship. In a one-to-many
relationship, a record in one table can have many related records in another table. A common
example is a relationship set up between a Customers table and an Orders table. For each
customer in the Customers table, you want to have more than one order in the Orders table.
On the other hand, each order in the Orders table can belong to only one customer. The
Customers table is on the one side of the relationship, and the Orders table is on the many
side. For you to implement this relationship, the field joining the two tables on the one side of the
relationship must be unique.
One-to-One
In a one-to-one relationship, each record in one table can have at most one matching record in
the other table. This relationship is not common and is used only in special circumstances.
Usually, if you have set up a one-to-one relationship, you should instead have combined the
fields from both tables into one table.
Many-to-Many
In a many-to-many relationship, records in both tables have matching records in the other table.
An example is an Orders table and a Products table. Each order probably will contain
multiple products, and each product is found on many different orders. The solution is to create a
third table called OrderDetails. You relate the OrderDetails table to the Orders table
in a one-to-many relationship based on the OrderID field. You relate it to the Products table
in a one-to-many relationship based on the ProductID field.
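The OrderDetails resolution described above can be sketched in sqlite3 (illustrative names; the junction table carries one foreign key to each side):

```python
import sqlite3

# Two one-to-many relationships resolve the many-to-many
# between Orders and Products via an OrderDetails junction table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE order_details (
                    order_id INTEGER REFERENCES orders(order_id),
                    product_id INTEGER REFERENCES products(product_id),
                    quantity INTEGER,
                    PRIMARY KEY (order_id, product_id))""")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10, 'pen'), (11, 'ink')])
conn.executemany("INSERT INTO order_details VALUES (?, ?, ?)",
                 [(1, 10, 2), (1, 11, 1), (2, 10, 5)])
# Product 10 appears on two different orders; order 1 contains two products.
n_orders_with_pen = conn.execute(
    "SELECT COUNT(*) FROM order_details WHERE product_id = 10").fetchone()[0]
```

The composite primary key on (order_id, product_id) prevents listing the same product twice on one order.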
DBMS VS RDBMS
History of DBMS and RDBMS
Database management systems first appeared on the scene in the 1960s as computers began to
grow in power and speed. By the mid-1960s, there were several commercial applications on the
market that were capable of producing "navigational" databases. These navigational databases
maintained records that could only be processed sequentially, which required a lot of computer
resources and time.
Relational database management systems were first suggested by Edgar Codd in the 1970s.
Because navigational databases could not be "searched", Edgar Codd suggested another model
that could be followed to construct a database. This was the relational model, which allowed
users to "search" it for data. It included the integration of the navigational model, along with a
tabular and hierarchical model.
Difference between DBMS and RDBMS
DBMS:
A DBMS is a storage area that persists the data in files. To perform database
operations, the file must be in use.
A relationship can be established between two files.
There are limitations on storing records in a single database file, depending upon the
database manager used.
Data is stored in flat files with metadata.
DBMS does not support client/server architecture.
DBMS does not follow normalization.
Only a single user can access the data at a time.
DBMS does not impose integrity constraints.
ACID properties of the database must be implemented by the user or the developer.
DBMS is used for simpler applications.
Small sets of data can be managed by a DBMS.
RDBMS:-
RDBMS stores the data in tabular form.
It imposes additional conditions to support a tabular structure and to enforce
relationships among tables.
RDBMS supports client/server architecture.
RDBMS follows normalization.
RDBMS allows simultaneous access to data tables by multiple users.
RDBMS imposes integrity constraints.
ACID properties of the database are defined in the integrity constraints.
RDBMS is used for more complex applications.
Large sets of data require an RDBMS solution.
INTEGRITY RULE
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its
entire life-cycle and is a critical aspect to the design, implementation and usage of any system
which stores, processes, or retrieves data.
Data integrity is the opposite of data corruption, which is a form of data loss. The overall intent
of any data integrity technique is the same: ensure data is recorded exactly as intended (such as
a database correctly rejecting mutually exclusive possibilities), and upon later retrieval, ensure
the data is the same as it was when originally recorded. In short, data integrity aims to prevent
unintentional changes to information. Data integrity is not to be confused with data security,
the discipline of protecting data from unauthorized parties.
Any unintended change to data as the result of a storage, retrieval or processing operation,
including malicious intent, unexpected hardware failure, and human error, is a failure of data
integrity. If the change results from unauthorized access, it may also be a failure of data
security.
TYPES OF INTEGRITY RULES/CONSTRAINTS
Data integrity is normally enforced in a database system by a series of integrity constraints or
rules. Three types of integrity constraints are an inherent part of the relational data model: entity
integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule
which states that every table must have a primary key and that the column or columns
chosen to be the primary key should be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential integrity rule
states that any foreign-key value can only be in one of two states. The usual state of
affairs is that the foreign key value refers to a primary key value of some table in the
database. Occasionally, and this will depend on the rules of the data owner, a foreign-key
value can be null. In this case we are explicitly saying that either there is no relationship
between the objects represented in the database or that this relationship is unknown.
Domain integrity specifies that all columns in relational database must be declared upon
a defined domain. The primary unit of data in the relational data model is the data item.
Such data items are said to be non-decomposable or atomic. A domain is a set of values
of the same type. Domains are therefore pools of values from which actual values
appearing in the columns of a table are drawn.
User-defined integrity refers to a set of rules specified by a user, which do not belong to
the entity, domain and referential integrity categories.
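Entity and domain integrity from the list above can be demonstrated together in sqlite3 (illustrative table; the primary key enforces entity integrity, the CHECK clause defines the column's domain):

```python
import sqlite3

# Entity integrity: the primary key must be unique and not null.
# Domain integrity: grade values must come from a declared domain.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
                    emp_id INTEGER PRIMARY KEY NOT NULL,
                    grade TEXT CHECK (grade IN ('A', 'B', 'C')))""")
conn.execute("INSERT INTO employee VALUES (1, 'A')")

violations = []
for row in [(1, 'B'),     # duplicate key: entity integrity violation
            (2, 'Z')]:    # value outside the domain: domain integrity violation
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        violations.append(row)
```

Both bad rows are rejected by the database itself, with no application-level checking required.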
If a database supports these features, it is the responsibility of the database to ensure data
integrity as well as the consistency model for data storage and retrieval. If a database does not
support these features, it is the responsibility of the applications to ensure data integrity while
the database supports the consistency model for data storage and retrieval.
Having a single, well-controlled, and well-defined data-integrity system increases:
stability (one centralized system performs all data integrity operations)
performance (all data integrity operations are performed in the same tier as the
consistency model)
re-usability (all applications benefit from a single centralized data integrity system)
maintainability (one centralized system for all data integrity administration).
Many companies, and indeed many database systems themselves, offer products and services to
migrate out-dated and legacy systems to modern databases to provide these data-integrity
features. This offers organizations substantial savings in time, money, and resources because
they do not have to develop per-application data-integrity systems that must be re-factored each
time business requirements change.
Example
An example of a data-integrity mechanism is the parent-and-child relationship of related
records. If a parent record owns one or more related child records, all of the referential
integrity processes are handled by the database itself, which automatically ensures the accuracy
and integrity of the data, so that no child record can exist without a parent (also called being
orphaned) and no parent loses its child records. It also ensures that no parent record can be
deleted while it owns any child records. All of this is handled at the database level and does not
require coding integrity checks into each application.
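The parent-and-child mechanism can be seen directly in sqlite3 (a sketch; with foreign keys enforced, the default referential action blocks deleting a parent that still owns children):

```python
import sqlite3

# A parent row that owns child rows cannot be deleted,
# so no child is ever orphaned.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
                    id INTEGER PRIMARY KEY,
                    parent_id INTEGER REFERENCES parent(id))""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")

try:
    conn.execute("DELETE FROM parent WHERE id = 1")   # would orphan child 10
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```

Declaring `ON DELETE CASCADE` instead would delete the children along with the parent; either way, orphans never appear.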
CONCEPT OF CONCURRENCY CONTROL
Definition
Concurrency control is a database management system (DBMS) concept used to address
conflicts from the simultaneous accessing or altering of data that can occur in a multi-user
system. Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous
transactions while preserving data integrity. In short, concurrency control governs multi-user
access to the database.
Illustrative Example
To illustrate the concept of concurrency control, consider two travelers who go to electronic
kiosks at the same time to purchase a train ticket to the same destination on the same train.
There's only one seat left in the coach, but without concurrency control, it's possible that both
travelers will end up purchasing a ticket for that one seat. However, with concurrency control,
the database wouldn't allow this to happen. Both travellers would still be able to access the train
seating database, but concurrency control would preserve data accuracy and allow only one
traveler to purchase the seat.
This example also illustrates the importance of addressing this issue in a multi-user database.
Obviously, one could quickly run into problems with the inaccurate data that can result from
several transactions occurring simultaneously and writing over each other. The following section
provides strategies for implementing concurrency control.
Database transaction and the ACID rules
The concept of a database transaction (or atomic transaction) has evolved in order to enable
both a well understood database system behavior in a faulty environment where crashes can
happen any time, and recovery from a crash to a well understood database state. A database
transaction is a unit of work, typically encapsulating a number of operations over a database
(e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in
database and also other systems. Each transaction has well defined boundaries in terms of which
program/code executions are included in that transaction (determined by the transaction's
programmer via special transaction commands). Every database transaction obeys the following
rules (by support in the database system; i.e., a database system is designed to guarantee them for
the transactions it runs):
Atomicity - Either the effects of all or none of its operations remain ("all or nothing"
semantics) when a transaction is completed (committed or aborted respectively). In other
words, to the outside world a committed transaction appears (by its effects on the
database) to be indivisible (atomic), and an aborted transaction does not affect the
database at all, as if never happened.
Consistency - Every transaction must leave the database in a consistent (correct) state,
i.e., maintain the predetermined integrity rules of the database (constraints upon and
among the database's objects). A transaction must transform a database from one
consistent state to another consistent state (however, it is the responsibility of the
transaction's programmer to make sure that the transaction itself is correct, i.e., performs
correctly what it intends to perform (from the application's point of view) while the
predefined integrity rules are enforced by the DBMS). Thus since a database can be
normally changed only by transactions, all the database's states are consistent.
Isolation - Transactions cannot interfere with each other (as an end result of their
executions). Moreover, usually (depending on concurrency control method) the effects of
an incomplete transaction are not even visible to another transaction. Providing isolation
is the main goal of concurrency control.
Durability - Effects of successful (committed) transactions must persist through crashes
(typically by recording the transaction's effects and its commit event in a non-volatile
memory).
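Atomicity, the first of the rules above, can be seen in miniature with sqlite3 (a sketch with illustrative account names; the simulated crash stands in for any mid-transaction failure):

```python
import sqlite3

# "All or nothing": a transfer that fails part-way leaves no trace
# after rollback.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("INSERT INTO account VALUES ('B', 0)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("crash before the matching credit")   # simulated failure
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
except RuntimeError:
    conn.rollback()   # the partial debit is undone

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the rollback the database is back in its last committed, consistent state: A still holds 100 and B holds 0.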
Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction
concurrency exists. However, if concurrent transactions with interleaving operations are allowed
in an uncontrolled manner, some unexpected, undesirable result may occur, such as:
1. The lost update problem: A second transaction writes a second value of a data-item
(datum) on top of a first value written by a first concurrent transaction, and the first value
is lost to other transactions running concurrently which need, by their precedence, to read
the first value. The transactions that have read the wrong value end with incorrect results.
2. The dirty read problem: Transactions read a value written by a transaction that has been
later aborted. This value disappears from the database upon abort, and should not have
been read by any transaction ("dirty read"). The reading transactions end with incorrect
results.
3. The incorrect summary problem: While one transaction takes a summary over the values
of all the instances of a repeated data-item, a second transaction updates some instances
of that data-item. The resulting summary does not reflect a correct result for any (usually
needed for correctness) precedence order between the two transactions (if one is executed
before the other), but rather some random result, depending on the timing of the updates,
and whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to meet their
performance requirements. Thus, without concurrency control such systems can neither provide
correct results nor maintain their databases consistent.
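The lost update problem from the list above can be reproduced deterministically in Python (a single-threaded simulation of two interleaved transactions; the lock shows the pessimistic fix):

```python
import threading

# Lost update: two transactions read the same balance, then both write,
# and the first write is overwritten.
balance = 0
t1_read = balance          # T1 reads 0
t2_read = balance          # T2 also reads 0, before T1 writes
balance = t1_read + 100    # T1 writes 100
balance = t2_read + 100    # T2 overwrites with 100 -- T1's update is lost
lost_update_balance = balance

# With a lock serializing each read-modify-write, both deposits survive.
balance = 0
lock = threading.Lock()
for _ in range(2):
    with lock:
        balance = balance + 100
safe_balance = balance
```

The unsafe interleaving ends with 100 instead of 200; the locked version, in which no transaction can read between another's read and write, ends with 200.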
Concurrency Control Locking Strategies
Pessimistic Locking: This concurrency control strategy involves keeping an entity in a database
locked the entire time it exists in the database's memory. This limits or prevents users from
altering the data entity that is locked. There are two types of locks that fall under the category of
pessimistic locking: write lock and read lock.
With write lock, everyone but the holder of the lock is prevented from reading, updating, or
deleting the entity. With read lock, other users can read the entity, but no one except for the lock
holder can update or delete it.
Optimistic Locking: This strategy can be used when instances of simultaneous transactions, or
collisions, are expected to be infrequent. In contrast with pessimistic locking, optimistic locking
doesn't try to prevent the collisions from occurring. Instead, it aims to detect these collisions and
resolve them on the chance occasions when they occur.
Pessimistic locking provides a guarantee that database changes are made safely. However, it
becomes less viable as the number of simultaneous users or the number of entities involved in a
transaction increase because the potential for having to wait for a lock to release will increase.
Optimistic locking can alleviate the problem of waiting for locks to release, but then users have
the potential to experience collisions when attempting to update the database.
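A common way to implement optimistic locking is a version column, sketched here in sqlite3 against the train-seat example from earlier (illustrative schema; an update succeeds only if the version is unchanged since it was read):

```python
import sqlite3

# Optimistic locking: each row carries a version number; a writer's
# UPDATE matches only if the version it read is still current.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER)")
conn.execute("INSERT INTO seat VALUES (1, NULL, 0)")

read_version = 0   # both travellers read the row at version 0
# Traveller 1 commits first; the version check passes and bumps it to 1.
cur = conn.execute("""UPDATE seat SET owner = 'T1', version = version + 1
                      WHERE id = 1 AND version = ?""", (read_version,))
t1_won = cur.rowcount == 1
# Traveller 2's update finds version 1, not 0: the collision is detected.
cur = conn.execute("""UPDATE seat SET owner = 'T2', version = version + 1
                      WHERE id = 1 AND version = ?""", (read_version,))
t2_won = cur.rowcount == 1
```

No row is ever locked, yet only one traveller's purchase succeeds; the other detects the collision (zero rows updated) and can re-read and retry.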
Lock Problems:
Deadlock:
When dealing with locks, two problems can arise, the first of which is deadlock. Deadlock
refers to a situation where two or more processes are each waiting for another to
release a resource, or more than two processes are waiting for resources in a circular chain.
Deadlock is a common problem in multiprocessing where many processes share a specific type
of mutually exclusive resource. Some computers, usually those intended for the time-sharing
and/or real-time markets, are often equipped with a hardware lock, or hard lock, which
guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly
disconcerting because there is no general solution to avoid them.
A fitting analogy of the deadlock problem could be a situation like when you go to unlock your
car door and your passenger pulls the handle at the exact same time, leaving the door still locked.
If you have ever been in a situation where the passenger is impatient and keeps trying to open the
door, it can be very frustrating. Basically you can get stuck in an endless cycle, and since both
actions cannot be satisfied, deadlock occurs.
Livelock:
Livelock is a special case of resource starvation. A livelock is similar to a deadlock, except that
the states of the processes involved constantly change with regard to one another while never
progressing. The general definition only states that a specific process is not progressing. For
example, the system keeps selecting the same transaction for rollback causing the transaction to
never finish executing. Another livelock situation can come about when the system is deciding
which transaction gets a lock and which waits in a conflict situation.
An illustration of livelock occurs when numerous people arrive at a four way stop, and are not
quite sure who should proceed next. If no one makes a solid decision to go, and all the cars just
keep creeping into the intersection afraid that someone else will possibly hit them, then a kind of
livelock can happen.
Basic Timestamping:
Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method
doesn't use locks to control concurrency, so it is impossible for deadlock to occur. According to
this method a unique timestamp is assigned to each transaction, usually showing when it was
started. This effectively allows an age to be assigned to transactions and an order to be assigned.
Data items have both a read-timestamp and a write-timestamp. These timestamps are updated
each time the data item is read or updated respectively.
Problems arise in this system when a transaction tries to read a data item that has already been written by a younger transaction. This is called a late read: the data item has changed since the reading transaction started. The solution is to roll back the offending transaction and restart it with a new (larger) timestamp. The symmetric problem occurs when a transaction tries to write a data item that has already been read by a younger transaction. This is called a late write: the data item has been read by another transaction since the start time of the transaction that is altering it. The solution is the same as for the late read: the transaction is rolled back and restarted with a new timestamp.
Adhering to these rules allows the transactions to be serialized, so a chronological schedule of transactions can be constructed. Timestamping may not be practical for larger databases with high transaction volumes, however, since a considerable amount of storage space must be dedicated to the timestamps.
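The rules above can be sketched in a few lines of Python. This is an illustrative toy, not a real DBMS component; the class and method names are assumptions made for the example.

```python
# Hypothetical sketch of basic timestamp ordering. Each data item keeps a
# read-timestamp and a write-timestamp; each transaction carries the
# unique timestamp assigned at its start. Late reads and late writes
# abort the transaction, which restarts with a fresh (larger) timestamp.

import itertools

_clock = itertools.count(1)     # monotonically increasing timestamps

class Abort(Exception):
    """Raised on a late read / late write; restart with a new timestamp."""

class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0
        self.value = None

class Transaction:
    def __init__(self):
        self.ts = next(_clock)  # start timestamp = the transaction's age

    def read(self, item):
        if self.ts < item.write_ts:                  # late read
            raise Abort("item written by a younger transaction")
        item.read_ts = max(item.read_ts, self.ts)
        return item.value

    def write(self, item, value):
        if self.ts < item.read_ts or self.ts < item.write_ts:  # late write
            raise Abort("item read/written by a younger transaction")
        item.write_ts = self.ts
        item.value = value

# A late read in action: older t1 tries to read what younger t2 wrote.
x = Item()
t1, t2 = Transaction(), Transaction()    # t1 is older (smaller ts)
t2.write(x, 42)
try:
    t1.read(x)
except Abort:
    t1 = Transaction()                   # restart with a fresh timestamp
    print(t1.read(x))                    # 42
```

Because no transaction ever waits for a lock, no wait-for cycle can form, which is why deadlock is impossible under this scheme.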
DATABASE SECURITY
"Secret Passwords, iron bolts, gated driveways, access cards, etc. - layers of physical security in
the real world are also found in the database world as well …. Creating and enforcing security
procedures helps to protect what is rapidly becoming the most important corporate asset:
DATA."
Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. The three main objectives of database security
are:
1. Secrecy / confidentiality: Information is not disclosed to unauthorized users. Private data remains private.
2. Integrity: Ensuring data are accurate; data must be protected from unauthorized modification or destruction (only authorized users can modify data).
3. Availability: Ensuring data are accessible whenever the organization needs them (authorized users should not be denied access).
To achieve these objectives, the following are employed:
1. A clear and consistent security policy: what security measures are to be enforced, what data is to be protected, and which users get access to which portions of the data.
2. Security mechanisms of the underlying DBMS and OS, together with external mechanisms such as securing access to buildings. In other words, security measures must be taken at several levels to ensure proper security.
Authorization and Authentication are the two A's of security that every secure system must be good at.
The sources of external security threats are:
1. Physical threats: physical threats to the hardware of the database system. These may arise from dangers to buildings or the network, or from human error (e.g. privileged accounts left logged in).
2. Hackers & crackers:
White hat hackers: the "good guys", hired to test and fix systems; they do not release information about a system's vulnerabilities to the public until they are fixed.
Script kiddies: hacker "wannabes" with little programming skill who rely on tools written by others.
Black hat hackers: hackers motivated by greed or a desire to cause harm; the most dangerous kind, very knowledgeable, and their activities are often undetectable.
Cyber-terrorists: hackers motivated by a political, religious or philosophical agenda. They may try to deface websites that support opposing positions; in the current global climate there are fears they may even attempt to disable networks that handle utilities such as nuclear plants and water systems.
3. Types of Attacks:
Denial of Service (DoS) attack: A denial-of-service (DoS) or distributed denial-of-service (DDoS) attack is an attempt to make a machine or network resource unavailable to its intended users. Although the means, motives, and targets of a DoS attack vary, it generally consists of efforts to temporarily or indefinitely interrupt or suspend the services of a host connected to the Internet. By way of clarification: distributed denial-of-service attacks are launched by two or more persons or bots, while denial-of-service attacks are launched by one person or system.
Buffer Overflow: This attack exploits a programming error in the system. A buffer overflow occurs when data written to a buffer corrupts data values in memory addresses adjacent to the destination buffer, due to insufficient bounds checking. This can occur when copying data from one buffer to another without first checking that the data fits within the destination buffer. (SQL injection, another very popular attack, likewise exploits unchecked input, although it is an injection flaw rather than an overflow.)
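The SQL injection mentioned above can be both demonstrated and prevented with Python's standard sqlite3 module. This is a minimal sketch with a made-up table and data, not production code.

```python
# Hypothetical sketch: why unsanitized input is dangerous, and how a
# parameterized query avoids interpreting user input as SQL code.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

evil = "' OR '1'='1"            # a classic injection payload

# UNSAFE: string concatenation lets the attacker's quote break out of
# the string literal, turning the WHERE clause into a tautology.
unsafe_sql = "SELECT * FROM users WHERE password = '" + evil + "'"
print(conn.execute(unsafe_sql).fetchall())   # [('alice', 's3cret')] -- leaked!

# SAFE: the ? placeholder binds the whole input as a value, never as SQL.
safe = conn.execute("SELECT * FROM users WHERE password = ?", (evil,))
print(safe.fetchall())                       # [] -- no match
```

The design lesson is general: user input should reach the query engine only as bound parameters, never by splicing strings into SQL text.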
Malware: Malware, short for malicious software, is any software used to disrupt computer
operation, gather sensitive information, or gain access to private computer systems. It can appear
in the form of executable code, scripts, active content, and other software. 'Malware' is a general
term used to refer to a variety of forms of hostile or intrusive software.
Social Engineering: The psychological manipulation of people into performing actions or
divulging confidential information. A type of confidence trick for the purpose of information
gathering, fraud, or system access, it differs from a traditional "con" in that it is often one of
many steps in a more complex fraud scheme.
Brute force: A cryptanalytic attack that can, in theory, be used against any encrypted data (except data encrypted in an information-theoretically secure manner). Such an attack might be used when it is not possible to exploit other weaknesses in the encryption system (if any exist) that would make the task easier. It consists of systematically checking all possible keys or passwords until the correct one is found; in the worst case, this involves traversing the entire search space.
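The "systematically check every candidate" idea can be sketched over a deliberately tiny key space. This is an illustrative Python toy; real key spaces are astronomically larger, which is exactly what makes brute force impractical against strong keys.

```python
# Hypothetical sketch: exhaustive (brute-force) search over a tiny
# 3-letter alphabet, recovering a password from its leaked hash.

import hashlib
from itertools import product

ALPHABET = "abc"
TARGET = hashlib.sha256(b"cab").hexdigest()   # pretend this hash leaked

def brute_force(max_len):
    """Try every string over ALPHABET up to max_len; worst case this
    traverses the entire search space."""
    for length in range(1, max_len + 1):
        for combo in product(ALPHABET, repeat=length):
            guess = "".join(combo)
            if hashlib.sha256(guess.encode()).hexdigest() == TARGET:
                return guess
    return None

print(brute_force(3))   # 'cab'
```

Here the search space is only 3 + 9 + 27 = 39 candidates; an 8-character password over 94 printable characters already has about 6 x 10^15.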
Having seen the sources of external security threats, let us study the sources of internal security threats.
Employee threats may be either intentional or accidental.
Intentional employee threats:
personnel who employ hacking techniques to upgrade their legitimate access to root or administrator;
personnel who take advantage of legitimate access to divulge trade secrets or steal money, for personal or political gain;
family members of employees who are visiting the office and have been given access;
personnel who break into a secure machine room to gain physical access to mainframe and other large-system consoles;
former employees seeking revenge.
Unintentional / accidental employee threats:
falling victim to a social engineering attack (unknowingly helping a hacker)
unknowingly revealing confidential information
accidental physical damage leading to data loss
inaccurate / improper usage
Other threats:
electrical power fluctuations
hardware failures
natural disasters: fires, floods
Knowing the sources of both external and internal security threats, let us move to the solutions, which are likewise both external and internal.
Some External solutions to the security issues are:
1. Securing the perimeter: firewalls
2. Handling malware
3. Fixing buffer overflows
4. Physical server security:
security cameras; smart locks; removal of signs from machine/server room or hallways
(so that no one can locate sensitive hardware rooms easily); privileged accounts must
never be left logged in.
5. User Authentication:
Positive user identification requires three things:
a) something the user knows: user IDs and passwords
b) something the user has: physical login devices (e.g., for $5, PayPal sends a small device that generates a one-time password)
c) something the user is: biometrics
6. VPNs:
provide encryption for data transmissions over the Internet; use the IPSec protocol.
7. Combating Social Engineering
8. Handling other employee threats:
policies; employee training sessions; when an employee is fired, their accounts are promptly disabled; etc.
Some internal solutions to the security threats are:
1. Internal database user IDs and passwords.
2. Control of access rights to tables, views and their components. The typical SQL-based DBMS provides six types of access rights: SELECT (to retrieve rows), INSERT, UPDATE, DELETE, REFERENCES (to reference the table via a foreign key), and ALL PRIVILEGES.
3. Using an authorization matrix: the set of roles required by each business user, typically kept as a spreadsheet listing the roles and, for each role, the transactions it permits. When a new user joins the organization, the roles they require can be found from their FUG (Functional User Group) in the authorization matrix.
4. Database implementations (data dictionary): a data dictionary is one tool organizations can use to help ensure data accuracy.
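An authorization matrix can be modelled as a mapping from roles to permitted actions and from users to roles. The sketch below uses hypothetical role, user, and action names purely for illustration; real matrices are usually maintained in a spreadsheet or the DBMS catalog.

```python
# Hypothetical sketch of an authorization matrix: roles map to the
# actions (here "PRIVILEGE:table" strings) they permit, and users map
# to roles. A user may act only if some role of theirs permits it.

ROLE_MATRIX = {
    "acctg_mgr": {"SELECT:order_summary", "UPDATE:order_summary"},
    "intern":    {"SELECT:order_summary"},
}

USER_ROLES = {
    "alice": {"acctg_mgr"},
    "bob":   {"intern"},
}

def is_allowed(user, action):
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLE_MATRIX[r] for r in USER_ROLES.get(user, ()))

print(is_allowed("alice", "UPDATE:order_summary"))   # True
print(is_allowed("bob", "UPDATE:order_summary"))     # False
```

Keeping permissions on roles rather than on individual users means a new joiner only needs to be assigned the right roles, exactly as the FUG lookup described above.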
GRANTING & REVOKING ACCESS RIGHTS:
Granting and revoking access rights is one of the most visible security features of a DBMS. Using the corresponding commands, permissions on the various objects of the database can be granted or revoked. The following SQL commands grant and revoke access rights on a table or view to one or more users.
Granting Rights:
Syntax:
GRANT type_of_rights ON table_or_view_name TO user_id
Examples:
GRANT SELECT ON order_summary TO acctg_mgr
GRANT SELECT ON order_summary TO acctg_mgr WITH GRANT OPTION
(now the user can also grant / pass the rights on to others)
GRANT SELECT, UPDATE (retail_price, distributor_name) ON item TO intern1, intern2, intern3
GRANT SELECT ON order_summary TO PUBLIC
Revoking Rights:
Syntax:
REVOKE type_of_rights ON table_or_view_name FROM user_id
Examples:
The examples are similar to those for granting rights. If the user has already passed the rights on to others, then:
REVOKE SELECT ON order_summary FROM acctg_mgr RESTRICT
(fails if the rights have been passed on to anyone)
REVOKE SELECT ON order_summary FROM acctg_mgr CASCADE
(also revokes the rights from all users to whom they have been passed)
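The interplay of WITH GRANT OPTION with RESTRICT and CASCADE revocation can be sketched by tracking who granted a right to whom. This illustrative Python model (the user names echo the SQL examples above) is a deliberate simplification: each user has at most one grantor, and only a single right is modelled.

```python
# Hypothetical sketch: a grant chain for one right. grants maps each
# grantee to the grantor who gave them the right; users who received
# the right WITH GRANT OPTION may pass it on, forming a chain.

grants = {}   # grantee -> grantor

def grant(grantor, grantee):
    grants[grantee] = grantor

def revoke(user, cascade):
    """RESTRICT (cascade=False) fails if the user has passed the right
    on; CASCADE (cascade=True) also revokes it from everyone downstream."""
    dependents = [g for g, by in grants.items() if by == user]
    if dependents and not cascade:
        raise RuntimeError("RESTRICT: %s has passed rights on" % user)
    for d in dependents:
        revoke(d, cascade=True)
    grants.pop(user, None)

grant("dba", "acctg_mgr")        # GRANT ... TO acctg_mgr WITH GRANT OPTION
grant("acctg_mgr", "intern1")    # acctg_mgr passes the right on

try:
    revoke("acctg_mgr", cascade=False)   # RESTRICT fails: right was passed
except RuntimeError as e:
    print(e)

revoke("acctg_mgr", cascade=True)        # CASCADE also strips intern1
print(sorted(grants))                    # []
```

The CASCADE walk mirrors what the DBMS records in its catalog: revoking from the middle of a grant chain must also invalidate everything granted further down it.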
End.