DBMS Notes-ization 2014
Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 1
Paper: BCA-302
DATABASE
MANAGEMENT
SYSTEM
DEPARTMENT OF COMPUTER SCIENCE
DEV SANSKRITI VISHWAVIDYALAYA, SHANTIKUNJ,HARIDWAR (UK)
July-Dec 2014. Notes-ization @ DSVV.
PREAMBLE
ACKNOWLEDGEMENTS
The Department of Computer Science at Dev Sanskriti Vishwavidyalaya, Shantikunj,
Haridwar (Uttarakhand) was established in the year 2006. The Department started the
Bachelor of Computer Applications (BCA) programme in 2012. The serene and vibrant
environment of the university is a boon for the students. Academically they learn
new things every day, and along with that the curriculum of life management
instils the virtues of humanity in them.
It was an initiative taken by students of the BCA (2013-2016) batch to work as a team
and, instead of doing revision only, to do a prevision on the subject. They gave it the
name "Notes-ization". Everyone contributed to it as per his/her own caliber, but it is
finally a sincere effort by Manan Singh (Student, BCA III Sem) to make the work
presentable and reliable, and to make the effort of his teammates fruitful and
significant. Special thanks to all the web sources. Thank you everyone for this
inspirational work. Hope it will benefit one and all. Thanks again for carrying
the spirit of SHARE-CARE-PROSPER.
TABLE OF CONTENTS
UNIT 1  Introduction to Database: Definition of Database, Components of DBMS,
Three-Level Architecture Proposal for DBMS, Advantages & Disadvantages of DBMS,
Data Independence, Purpose of Database Management Systems, Structure of DBMS,
DBA and its Responsibilities, Data Dictionary, Advantages of Data Dictionary.
UNIT 2  Data Models: Introduction to Data Models, Object Based Logical Model,
Record Based Logical Model - Relational Model, Network Model, Hierarchical Model.
Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity
Relationship Diagram (ERD), Extended Features of ERD.
UNIT 3.1  Relational Databases: Introduction to Relational Databases and
Terminology - Relation, Tuple, Attribute, Cardinality, Degree, Domain.
Keys - Super Key, Candidate Key, Primary Key, Foreign Key.
UNIT 3.2  Relational Algebra: Operations - Select, Project, Union, Difference,
Intersection, Cartesian Product, Join, Natural Join.
UNIT 4  Structured Query Language (SQL): Introduction to SQL, History of SQL,
Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple Queries,
Nested Queries. Normalization: Benefits of Normalization, Normal Forms
(1NF, 2NF, 3NF, BCNF) and Functional Dependency.
UNIT 5  Relational Database Design: Introduction to Relational Database Design,
DBMS v/s RDBMS, Integrity Rules, Concept of Concurrency Control and
Database Security.
UNIT 1
INTRODUCTION TO DATABASE
Introduction to Database: Definition of Database, Components of DBMS, Three-Level
Architecture Proposal for DBMS, Advantages & Disadvantages of DBMS, Data
Independence, Purpose of Database Management Systems, Structure of DBMS, DBA and
its Responsibilities, Data Dictionary, Advantages of Data Dictionary.
DEFINITION OF DATABASE
A database can be summarily described as a repository for data: a database is a
structured collection of data. Thus, card indices, printed catalogues of
archaeological artifacts and telephone directories are all examples of databases. A
database may be stored on a computer and examined using a program. These programs
are often called `databases', but more strictly they are database management
systems (DBMS).
Computer-based databases are usually organized into one or more tables. A table stores data in a
format similar to a published table and consists of a series of rows and columns. To carry the
analogy further, just as a published table will have a title at the top of each column, so each
column in a database table will have a name, often called a field name. The term field is often
used instead of column. Each row in a table will represent one example of the type of object
about which data has been collected.
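The table analogy above can be sketched concretely. This is a minimal illustration using Python's built-in sqlite3 module; the students table and its fields are hypothetical examples, not taken from any particular syllabus.

```python
import sqlite3

# Each column has a field name; each row is one example of the object described
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, semester INTEGER)")
conn.execute("INSERT INTO students VALUES (1, 'Asha', 3)")
conn.execute("INSERT INTO students VALUES (2, 'Ravi', 3)")

# The field names play the role of the column titles in a published table
cur = conn.execute("SELECT * FROM students")
field_names = [d[0] for d in cur.description]
rows = cur.fetchall()
conn.close()
```

Here field_names is ['roll_no', 'name', 'semester'], and each tuple in rows is one row of the table.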
COMPONENTS OF DBMS
A database management system (DBMS) consists of several components. Each component
plays a very important role in the database management system environment. The
major components of a database management system are:
Software
Hardware
Data
Procedures
Database Access Language
Users
Software
The main component of a DBMS is the software: the set of programs used to handle
the database and to control and manage the overall computerized database system.
1. The DBMS software itself is the most important software component in the overall system.
2. The operating system, including the network software used to share the data of
the database among multiple users on a network.
3. Application programs developed in programming languages such as C++ or Visual
Basic that are used to access the database in the database management system. Each
program contains statements that request the DBMS to perform operations on the
database, such as retrieving, updating or deleting data. The application programs
may be conventional batch programs or online programs run from workstations or terminals.
Hardware
Hardware consists of the physical electronic devices such as computers (together
with associated I/O devices like disk drives), storage devices, I/O channels, and
the electromechanical devices that interface between computers and real-world
systems. It is impossible to implement the DBMS without the hardware devices. In a
network, a powerful computer with high data processing speed and a storage device
with large storage capacity are required as the database server.
Characteristics:
It is helpful to categorize computer memory into two classes: internal memory and external
memory. Although some internal memory is permanent, such as ROM, we are interested here
only in memory that can be changed by programs. This memory is often known as RAM. This
memory is volatile, and any electrical interruption causes the loss of data.
By contrast, magnetic disks and tapes are common forms of external memory. They are
non-volatile and retain their content for practically unlimited amounts of time. The
physical characteristics of magnetic tapes force them to be accessed sequentially, making them
useful for backup purposes, but not for quick access to specific data.
In examining the memory needs of a DBMS, we need to consider the following issues:
- Data of a DBMS must have a persistent character; in other words, data must remain
available long after any program that is using it has completed its work. Also,
data must remain intact even if the system breaks down.
- A DBMS must access data at a relatively high rate.
- Such a large quantity of data needs to be stored that the storage medium must be low cost.
These requirements are satisfied at the present stage of technological development
only by magnetic disks.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process
the data. In DBMS, databases are defined, constructed and then data is stored, updated and
retrieved to and from the databases. The database contains both the actual (or operational) data
and the metadata (data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and
to use the DBMS. The users that operate and manage the DBMS require documented
procedures on how to use and run the database management system. These may include:
1. Procedure to install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of database.
5. To change the structure of database.
6. To generate the reports of data retrieved from database.
Database Access Language
The database access language is used to move data to and from the database. The
users use the database access language to enter new data, change the existing data
in the database and retrieve required data from the database. The user writes a set
of appropriate commands in the database access language and submits these to the
DBMS. The DBMS translates the user commands and sends them to a specific part of
the DBMS called the database engine. The database engine generates a set of results
according to the commands submitted by the user, converts these into a
user-readable form (a report) and then displays them on the screen.
The administrators may also use the database access language to create and maintain the
databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
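As a minimal sketch of this command/result cycle, assuming a hypothetical product table and using Python's sqlite3 module to stand in for the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define the structure (DDL)
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Enter new data and change existing data (DML)
conn.execute("INSERT INTO product (name, price) VALUES ('pen', 10.0)")
conn.execute("UPDATE product SET price = 12.0 WHERE name = 'pen'")

# Retrieve required data: the engine returns a result set
result = conn.execute("SELECT name, price FROM product").fetchall()
conn.close()
```

The commands are plain SQL text; the DBMS translates them, executes them, and hands back the result set, here [('pen', 12.0)].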
Users
The users are the people who manage the databases and perform different operations
on the databases in the database system. There are three kinds of people who play
different roles in a database system:
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual
Basic, Java, or C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is
called the database administrator, or simply the DBA.
End-Users
The end-users are the people who interact with database management system to perform
different operations on database such as retrieving, updating, inserting, deleting data etc.
THREE-LEVEL ARCHITECTURE PROPOSAL FOR DBMS
The logical architecture, also known as the ANSI/SPARC architecture, was elaborated at the
beginning of the 1970s. It distinguishes three layers of data abstraction:
1. The physical layer contains specific and detailed information that describes how
data are stored: addresses of various data components, lengths in bytes, etc.
DBMSs aim to achieve data independence, which means that the database organization
at the physical level should be indifferent to application programs.
2. The logical layer describes data in a manner that is similar to, say, definitions
of structures in C. This layer has a conceptual character; it shields the user from
the tedium of details contained in the physical layer, but is essential in
formulating queries for the DBMS.
3. The user layer contains each user's perspective of the content of the database.
The logical architecture describes how data in the database is perceived by users. It is not
concerned with how the data is handled and processed by the DBMS, but only with how it looks.
The method of data storage on the underlying file system is not revealed, and the users can
manipulate the data without worrying about where it is located or how it is actually stored. This
results in the database having different levels of abstraction.
The majority of commercial database management systems available today are based on
the ANSI/SPARC generalized DBMS architecture, as proposed by the ANSI/SPARC Study
Group on Data Base Management Systems. Hence this is also called the ANSI/SPARC
model. It divides the system into three levels of abstraction: the internal or
physical level, the conceptual level, and the external or view level.
The External or View Level:
The external or view level is the highest level of abstraction of the database. It provides a window on
the conceptual view, which allows the user to see only the data of interest to them. The user can
be either an application program or an end user. There can be many external views as any
number of external schemas can be defined and they can overlap each other. It consists of the
definition of logical records and relationships in the external view. It also contains the method of
deriving the objects such as entities, attributes and relationships in the external view from the
conceptual view.
The Conceptual Level or Global Level:
The conceptual level presents a logical view of the entire database as a unified whole. It allows
the user to bring all the data in the database together and see it in a consistent manner. Hence,
there is only one conceptual schema per database. The first stage in the design of a database is to
define the conceptual view, and a DBMS provides a data definition language for this purpose. It
describes all the records and relationships included in the database.
The data definition language used to create the conceptual level must not specify any physical
storage considerations that should be handled by the physical level. It does not provide any
storage or access details, but defines the information content only.
The Internal or Physical Level:
The collection of files permanently stored on secondary storage devices is known as
the physical database. The physical or internal level is the one closest to
physical storage; it provides a low-level description of the physical database, and
an interface between the operating system's file system and the record structures
used in higher levels of abstraction. It is at this level that record
types and methods of storage are defined, as well as how stored fields are represented, what
physical sequence the stored records are in, and what other physical structures exist.
ADVANTAGES & DISADVANTAGES OF DBMS
Advantages of the DBMS:
The DBMS serves as the intermediary between the user and the database. The database structure
itself is stored as a collection of files, and the only way to access the data in those files is through
the DBMS. The DBMS receives all application requests and translates them into the complex
operations required to fulfill those requests. The DBMS hides much of the database's internal
complexity from the application programs and users.
The different advantages of DBMS are as follows:
1. Improved data sharing.
The DBMS helps create an environment in which end users have better access to more and
better-managed data. Such access makes it possible for end users to respond quickly to changes
in their environment.
2. Improved data security.
The more users access the data, the greater the risks of data security breaches. Corporations
invest considerable amounts of time, effort, and money to ensure that corporate data are used
properly. A DBMS provides a framework for better enforcement of data privacy and security
policies.
3. Better data integration.
Wider access to well-managed data promotes an integrated view of the organization's
operations and a clearer view of the big picture. It becomes much easier to see how
actions in one segment of the company affect other segments.
4. Minimized data inconsistency.
Data inconsistency exists when different versions of the same data appear in different places.
For example, data inconsistency exists when a company's sales department stores a
sales representative's name as "Bill Brown" and the company's personnel department
stores that same person's name as "William G. Brown," or when the company's
regional sales office shows the price of a product as $45.95 and its national sales
office shows the same product's price as $43.95. The probability of data
inconsistency is greatly reduced in a properly designed database.
5. Improved data access.
The DBMS makes it possible to produce quick answers to ad hoc queries. From a database
perspective, a query is a specific request issued to the DBMS for data manipulation—for
example, to read or update the data. Simply put, a query is a question, and an ad hoc query is a
spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to
the application. For example, end users, when dealing with large amounts of sales data, might
want quick answers to questions (ad hoc queries) such as:
- What was the dollar volume of sales by product during the past six months?
- What is the sales bonus figure for each of our salespeople during the past three months?
- How many of our customers have credit balances of $3,000 or more?
6. Improved decision making.
Better-managed data and improved data access make it possible to generate better-quality
information, on which better decisions are based. The quality of the information generated
depends on the quality of the underlying data. Data quality is a comprehensive approach to
promoting the accuracy, validity, and timeliness of the data. While the DBMS does not guarantee
data quality, it provides a framework to facilitate data quality initiatives.
7. Increased end-user productivity.
The availability of data, combined with the tools that transform data into usable information,
empowers end users to make quick, informed decisions that can make the difference between
success and failure in the global economy.
Disadvantages of Database:
Although the database system yields considerable advantages over previous data management
approaches, database systems do carry significant disadvantages. For example:
1. Increased costs.
Database systems require sophisticated hardware and software and highly skilled personnel. The
cost of maintaining the hardware, software, and personnel required to operate and manage a
database system can be substantial. Training, licensing, and regulation compliance costs are
often overlooked when database systems are implemented.
2. Management complexity.
Database systems interface with many different technologies and have a significant impact on a
company‘s resources and culture. The changes introduced by the adoption of a database system
must be properly managed to ensure that they help advance the company‘s objectives. Given the
fact that database systems hold crucial company data that are accessed from multiple sources,
security issues must be assessed constantly.
3. Maintaining currency.
To maximize the efficiency of the database system, you must keep your system
current. Therefore, you must perform frequent updates and apply the latest patches
and security measures to all components. Because database technology advances
rapidly, personnel training costs tend to be significant.
4. Vendor dependence.
Given the heavy investment in technology and personnel training, companies might be
reluctant to change database vendors. As a consequence, vendors are less likely to
offer pricing point advantages to existing customers, and those customers might be
limited in their choice of database system components.
5. Frequent upgrade/replacement cycles.
DBMS vendors frequently upgrade their products by adding new functionality. Such new
features often come bundled in new upgrade versions of the software. Some of these versions
require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs
money to train database users and administrators to properly use and manage the new features.
DATA INDEPENDENCE
A major objective for three-level architecture is to provide data independence, which means that
upper levels are unaffected by changes in lower levels.
There are two kinds of data independence:
• Logical data independence
• Physical data independence
Logical Data Independence
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping between
the external and conceptual levels. Logical data independence also insulates application
programs from operations such as combining two records into one or splitting an existing record
into two or more records. This would require a change in the external/conceptual mapping so as
to leave the external view unchanged.
Physical Data Independence
Physical data independence indicates that the physical storage structures or
devices can be changed without affecting the conceptual schema. The change would be
absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved by the
presence of the internal level of the database and the mapping or transformation from the
conceptual level of the database to the internal level. Conceptual level to internal level mapping,
therefore provides a means to go from the conceptual view (conceptual records) to the internal
view and hence to the stored data in the database (physical records).
If there is a need to change the file organization or the type of physical device
used, as a result of growth in the database or new technology, a change is required
in the conceptual/internal mapping. This change is necessary to maintain the
conceptual level invariant. The physical data independence criterion requires that the conceptual
level does not specify storage structures or the access methods (indexing, hashing etc.) used to
retrieve the data from the physical storage medium. Making the conceptual schema physically
data independent means that the external schema, which is defined on the conceptual schema, is
in turn physically data independent.
Logical data independence is more difficult to achieve than physical data
independence, as it requires flexibility in the design of the database, and the
programmer has to foresee future requirements or modifications in the design.
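Logical data independence can be illustrated with a view serving as an external schema. In this hypothetical sketch (Python's sqlite3 module standing in for the DBMS, with made-up table and view names), the conceptual schema changes, yet the external view defined on it is unaffected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Meera', 30000.0)")

# External schema: a view exposing only the data of interest to one user
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name FROM employee")
before = conn.execute("SELECT * FROM emp_public").fetchall()

# Change the conceptual schema: add a column to the base table
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

# The external view, and any program written against it, is unchanged
after = conn.execute("SELECT * FROM emp_public").fetchall()
conn.close()
```

The change to the base table is absorbed by the view definition, i.e., by the external/conceptual mapping.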
PURPOSE OF DBMS
Database management systems were developed to handle the following difficulties of
typical file-processing systems supported by conventional operating systems: data
redundancy and inconsistency, difficulty in accessing data, data isolation
(multiple files and formats), integrity problems, atomicity of updates, concurrent
access by multiple users, and security problems.
In the early days, database applications were built directly on top of the file system.
Drawbacks of using file systems to store data:
- Data redundancy and inconsistency: multiple file formats, duplication of information in different files.
- Difficulty in accessing data: the need to write a new program to carry out each new task.
- Data isolation: multiple files and formats.
- Integrity constraints: hard to add new constraints or change existing ones.
These problems and others led to the development of database management systems.
STRUCTURE OF DBMS
The components in the structure of DBMS are described below:
DBA: - DBA means Database Administrator. He/she is the person responsible for the
installation, configuration, upgrading, administration, monitoring, maintenance,
and security of databases in an organization.
Database Schema: - A database schema defines its entities and the relationships
among them. A database schema is a descriptive detail of the database, which can be
depicted by means of schema diagrams. All these activities are done by the database
designer to help programmers understand all aspects of the database.
DDL Processor: - The DDL Processor or Compiler converts the data definition statements into a
set of tables. These tables contain the metadata concerning the database and are in a form that
can be used by other components of DBMS.
Data Dictionary: - Information pertaining to the structure and usage of data contained in the
database, the metadata, is maintained in a data dictionary. The term system catalog
also describes this metadata. The data dictionary, which is a database itself,
documents the data. Each database
user can consult the data dictionary to learn what each piece of data and various synonyms of the
data fields mean.
Integrity Checker: - It checks the integrity constraints so that only valid data can be entered into
the database.
User: - The users are either application programmers or on-line terminal users of any degree of
sophistication. Each user has a language at his or her disposal. For the application programmer it
will be a conventional programming language, such as COBOL or PL/I; for the terminal
user it will be either a query language or a special-purpose language tailored to
that user's requirements and supported by an on-line application program.
Queries: - In a DBMS, a search question that instructs the program to locate
records that meet specific criteria is called a query.
Query Processor: - The query processor transforms user queries into a series of low level
instructions. It is used to interpret the online user's query and convert it into an efficient series of
operations in a form capable of being sent to the run time data manager for execution. The query
processor uses the data dictionary to find the structure of the relevant portion of
the database and uses this information in modifying the query and preparing an
optimal plan to access the database.
Programmer:- Programmer can manipulate the database in all possible ways.
Application Program: - A complete, self-contained computer program that performs a
specific useful task (other than system maintenance functions) is called an
application program.
DML Processor: - The DML processor processes the data manipulation statements, such
as select, update and delete, that the application programmer embeds in a program,
turning them into operations that perform the specified task (for example, deleting
rows from a table).
Authorization Control: - The authorization control module checks the authorization of users in
terms of various privileges to users.
Command Processor: - The command processor processes the queries passed by the
authorization control module.
Query Optimizer: - The query optimizer determines an optimal strategy for the query
execution.
Transaction Manager: - The transaction manager ensures that the transaction
properties (atomicity, consistency, isolation, durability) are maintained by the system.
Scheduler: - It provides an environment in which multiple users can work on the
same piece of data at the same time; in other words, it supports concurrency.
Buffer Manager: - The buffer manager is the software layer responsible for bringing pages from
disk to main memory as needed. The buffer manager manages the available main memory by
partitioning it into a collection of pages, which we collectively refer to as the buffer pool.
Recovery Manager: - The recovery manager is responsible for maintaining a log and
restoring the system to a consistent state after a crash. It is responsible for
ensuring transaction atomicity and durability.
Physical Database: - The physical database specifies additional storage details. We
must decide what file organization to use to store the relations, and create the
auxiliary data structures called indexes.
DBA & ITS RESPONSIBILITIES
A Database Administrator (acronym: DBA) is an IT professional responsible for the
installation, configuration, upgrading, administration, monitoring, maintenance and
security of databases in an organization.
Database administrator responsibilities are as follows:-
1. Database Installation and upgrading
2. Database configuration including configuration of background Processes
3. Database performance optimization & fine tuning
4. Configuring the Database in Archive log mode
5. Maintaining Database in archive log mode
6. Devising Database backup strategy
7. Monitoring & checking the Database backup & recovery process
8. Database troubleshooting
9. Database recovery in case of crash
10. Database security
11. Enabling auditing features wherever required
12. Table space management
13. Database Analysis report
14. Database health monitoring
15. Centralized control
The skills required to become a database administrator are:
Communication skills
Knowledge of database theory
Knowledge of database design
Knowledge about the RDBMS itself, e.g. Oracle Database, IBM DB2, Microsoft SQL
Server, Adaptive Server Enterprise, MaxDB, PostgreSQL
Knowledge of Structured Query Language (SQL) e.g. SQL/PSM, Transact-SQL
General understanding of distributed computing architectures, e.g. Client/Server,
Internet/Intranet, Enterprise
General understanding of the underlying operating system, e.g. Windows, Unix, Linux.
General understanding of storage technologies, memory management, disk arrays,
NAS/SAN, networking
General understanding of routine maintenance, recovery, and handling failover of a
Database
DATA DICTIONARY & ITS ADVANTAGES
A data dictionary, or metadata repository, as defined in the Dictionary of Computing, is a
"centralized repository of information about data such as meaning, relationships to other data,
origin, usage, and format." The term may have one of several closely related meanings pertaining
to databases and database management systems (DBMS):
- a document describing a database or collection of databases;
- an integral component of a DBMS that is required to determine its structure;
- a piece of middleware that extends or supplants the native data dictionary of a DBMS.
The terms data dictionary and data repository indicate a more general software
utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the
information stored in it to the user and the DBA, but it is mainly accessed by the various
software modules of the DBMS itself, such as DDL and DML compilers, the query optimizer,
the transaction processor, report generators, and the constraint enforcer. On the other hand, a
data dictionary is a data structure that stores metadata, i.e., (structured) data about data.
Any well-designed database will include a data dictionary, as it gives database
administrators and other users easy access to the type of data that they should
expect to see in every table, row, and column of the database, without actually
accessing the database.
Since a database is meant to be built and used by multiple users, making sure that everyone is
aware of the types of data each field will accept becomes a challenge, especially when there is a
lack of consistency when assigning data types to fields. A data dictionary is a simple yet
effective add-on to ensure data consistency.
Some of the typical components of a data dictionary entry are:
• Name of the table
• Name of the fields in each table
• Data type of the field (integer, date, text…)
• Brief description of the expected data for each field
• Length of the field
• Default value for that field
• Whether the field is nullable or not nullable
• Constraints that apply to each field, if any
Not all of these fields (and many others) will apply to every single entry in the data dictionary.
For example, if the entry were about the root description of the table, it might not require any
information regarding fields. Some data dictionaries also include location details,
such as each field's current location, where it actually came from, and details of
the physical location such as the IP address or DNS name of the server.
Format and Storage
There exists no standard format for creating a data dictionary. Meta-data differs from table to
table. Some database administrators prefer to create simple text files, while others use diagrams
and flow charts to display all their information. The only prerequisite for a data dictionary is that
it should be easily searchable.
Again, the only applicable rule for data dictionary storage is that it should be at a convenient
location that is easily accessible to all database users. The types of files used
to store data dictionaries range from text files, XML files, spreadsheets, and an
additional table in the database itself, to handwritten notes. It is the database
administrator's duty to make sure that this document is always up to date,
accurate, and easily accessible.
Creating the Data Dictionary
First, all the information required to create the data dictionary must be identified and recorded in
the design documents. If the design documents are in a compatible format, it should be possible
to directly export the data in them to the desired format for the data dictionary. For example,
applications like Microsoft Visio allow database creation directly from the design structure and
would make creation of the data dictionary simpler. Even without the use of such tools, scripts
can be deployed to export data from the database to the document. There is always the option of
manually creating these documents as well.
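The export-by-script idea above can be sketched with Python's built-in sqlite3 module. The student table and its columns below are invented for illustration, not taken from any real design document:

```python
import sqlite3

# A small throwaway database to extract a dictionary from.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER DEFAULT 18
    )
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
# per column -- the raw material for a data dictionary entry.
dictionary = []
for cid, name, ctype, notnull, default, pk in conn.execute(
        "PRAGMA table_info(student)"):
    dictionary.append({
        "field": name,
        "type": ctype,
        "nullable": not notnull and not pk,
        "default": default,
        "primary_key": bool(pk),
    })
```

Each resulting entry records the field name, data type, nullability, default value, and key status listed earlier; exporting the list to a text file or spreadsheet is then a simple final step.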
Advantages of a Data Dictionary
The primary advantage of creating an informative and well-designed data dictionary is that it
lends clarity to the rest of the database documentation. Also, when a new user is introduced to
the system or a new administrator takes over, identifying table structures and types
becomes simpler. In scenarios involving large databases, where it is impossible for an
administrator to remember specific bits of information about thousands of fields, a
data dictionary becomes a crucial necessity.
UNIT 2
DATA MODELS
Data Models: Introduction to Data Models, Object Based Logical Model, Record
Base Logical Model- Relational Model, Network Model, Hierarchical Model.
Entity Relationship Model, Entity Set, Attribute, Relationship Set. Entity
Relationship Diagram (ERD), Extended features of ERD.
INTRODUCTION TO DATA MODELS
A data model can be defined as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an organization.
Data models are important because they facilitate interaction among the designer,
the application programmer, and the end user. Also, a well-developed data model can even foster
improved understanding of the organization for which the database design is developed. Data
models are a communication tool as well.
A data model comprises three components:
• A structural part, consisting of a set of rules according to which databases can be constructed.
• A manipulative part, defining the types of operation that are allowed on the data (this includes
the operations used for updating or retrieving data from the database and for changing the
structure of the database).
• Possibly a set of integrity rules, which ensure that the data is accurate.
The purpose of a data model is to represent data and to make the data understandable. There
have been many data models proposed in the literature. They fall into three broad categories:
• Object Based Data Models
• Physical Data Models
• Record Based Data Models
OBJECT BASED LOGICAL MODEL
Object based data models use concepts such as entities, attributes, and relationships. An entity is a distinct
object (a person, place, concept, or event) in the organization that is to be represented in the database.
An attribute is a property that describes some aspect of the object that we wish to record, and a
relationship is an association between entities.
Some of the more common types of object based data model are:
• Entity-Relationship
• Object Oriented
• Semantic
• Functional
RECORD BASED LOGICAL MODEL & ITS TYPES
Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical structure of the
database and to provide a higher-level description of the implementation. Record based models
are so named because the database is structured in fixed format records of several types. Each
record type defines a fixed number of fields, or attributes, and each field is usually of a fixed
length.
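The fixed-format record idea can be illustrated with a small Python sketch. The StudentRecord type and its fields are invented for this example: every record of a given type has the same fixed set of fields, each of a declared type.

```python
from typing import NamedTuple

class StudentRecord(NamedTuple):
    # Every StudentRecord has exactly these three fields, in this order,
    # mirroring a record type with a fixed number of attributes.
    roll_no: int
    name: str
    course: str

r1 = StudentRecord(1, "Manan", "BCA")
r2 = StudentRecord(2, "Gopal", "BCA")

# All records of this type share one fixed field layout.
layout = StudentRecord._fields
```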
The three most widely accepted record based data models are:
• Hierarchical Model
• Network Model
• Relational Model
RELATIONAL MODEL
The relational model is a database model based on first-order predicate logic, first
formulated and proposed in 1969 by Edgar F. Codd. In the relational model, all
data is represented in terms of tuples, grouped into relations. A database organized in terms of
the relational model is a relational database.
Advantages of Relational Model:
Conceptual Simplicity: We have seen that both the hierarchical and network models are
conceptually simple, but the relational model is simpler than either of them.
Structural Independence: In the relational model, changes in the structure do not affect
data access.
Design Implementation: The relational model achieves both data independence and structural
independence.
Ad hoc query capability: The presence of a very powerful, flexible, and easy-to-use query
capability is one of the main reasons for the immense popularity of the relational database model.
Disadvantages of Relational Model:
Hardware overheads: Relational database systems hide the implementation complexities and the
physical data storage details from the user. To do this, relational database systems need
more powerful computers and data storage devices.
Ease of design can lead to bad design: The relational database is easy to design and use, and the
user need not know the complexities of the data storage. This ease of design and use can lead
to the development and implementation of very poorly designed databases.
NETWORK MODEL
The network model is a database model conceived as a flexible way of representing objects and
their relationships. Its distinguishing feature is that the schema, viewed as a graph in which
object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or
lattice.
While the hierarchical database model structures data as a tree of records, with each record
having one parent record and many children, the network model allows each record to have
multiple parent and child records, forming a generalized graph structure.
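The difference can be sketched with plain Python dictionaries (the record names below are invented): a tree gives every child exactly one parent, while a network allows several.

```python
# Hierarchical model: each child record has exactly one parent.
tree_parent = {
    "dept_cs": "college",
    "dept_math": "college",
    "student_1": "dept_cs",
}

# Network model: a child record may have multiple parents,
# so parents are stored as a list and the schema forms a graph.
network_parents = {
    "student_1": ["dept_cs", "hostel_A"],
    "student_2": ["dept_math", "hostel_A"],
}

# Records with more than one parent are what a tree cannot express.
multi_parent = [r for r, ps in network_parents.items() if len(ps) > 1]
```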
Advantages of Network Model:
Conceptual Simplicity: Like the hierarchical model, it is simple and easy to implement.
Capability to handle more relationship types: The network model can handle one-to-one (1:1)
and many-to-many (N:N) relationships.
Ease of data access: Data access is easier than in the hierarchical model.
Data Integrity: Since it is based on the parent-child relationship, there is always a link between
a parent segment and the child segments under it.
Data Independence: The network model is better than the hierarchical model with respect to data
independence.
Disadvantages of Network Model:
System Complexity: All records are maintained using pointers, so the database structure
becomes more complex.
Operational Anomalies: As discussed earlier, the network model requires a large number of
pointers, so insertion, deletion, and updating become more complex.
Absence of Structural Independence: There is a lack of structural independence, because when
the structure is changed, it becomes compulsory to change the applications too.
HIERARCHICAL MODEL
A hierarchical database model is a data model in which the data is organized into a tree-like
structure. The data is stored as records which are connected to one another through links. A
record is a collection of fields, with each field containing only one value. The entity type of a
record defines which fields the record contains.
Advantages of Hierarchical Model:
1. Simplicity: Since the database is based on a hierarchical structure, the relationship between
the various layers is logically simple.
2. Data Security: The hierarchical model was the first database model to offer data security
enforced by the DBMS.
3. Data Integrity: Since it is based on the parent-child relationship, there is always a link
between a parent segment and the child segments under it.
4. Efficiency: It is very efficient when the database contains a large number of 1:N
relationships and the users require a large number of transactions.
Disadvantages of Hierarchical Model:
1. Implementation complexity: Although it is simple and easy to design, it is quite complex to
implement.
2. Database management problems: If you make any change in the database structure, you
need to make changes in every application program that accesses the database.
3. Lack of structural independence: There is a lack of structural independence, because when
the structure is changed, it becomes compulsory to change the applications too.
4. Operational anomalies: The hierarchical model suffers from insert, delete, and update
anomalies; retrieval operations are also difficult.
ENTITY RELATIONSHIP MODEL
In DBMS, an entity–relationship model (ER model) is a data model for describing the data or
information aspects of a business domain or its process requirements, in an abstract way that
lends itself to ultimately being implemented in a database such as a relational database. The main
components of ER models are entities (things) and the relationships that can exist among them.
Entity–relationship modeling was developed by Peter Chen and published in a 1976 paper.
However, variants of the idea existed previously, and others have been devised subsequently,
such as supertype and subtype data entities and commonality relationships.
The ER model represents real-world situations using concepts that are commonly used by
people. It allows defining a representation of the real world at the logical level; the ER model
has no facilities to describe machine-related aspects.
In the ER model, the logical structure of data is captured by indicating the grouping of data into
entities. The ER model also supports a top-down approach, by which details can be given in
successive stages.
Entity:- An entity is something which is described in the database by storing its data; it
may be a concrete entity or a conceptual entity.
Entity set:- An entity set is a collection of similar entities.
Attribute:- An attribute describes a property associated with entities. An attribute has a
name and a value for each entity.
Domain:- A domain defines the set of permitted values for an attribute.
ENTITY SET
Entity set:- An entity set is a collection of similar entities.
A database can be modeled as:
• a collection of entities,
• relationships among entities.
An entity is an object that exists and is distinguishable from other objects.
Ex:- a specific person, company, event, plant.
Entities have attributes.
Ex:- people have names and addresses.
An entity set is a set of entities of the same type that share the same properties.
Ex:- the set of all persons, companies, trees, holidays.
An entity is a thing in the real world with an independent existence, and an entity set is the
collection of all entities of a particular entity type at any point of time. Take an example: a
company has many employees, and these employees are entities (e1, e2, e3, ...). All these
entities, having the same attributes, are defined under the ENTITY TYPE employee, and the set
{e1, e2, ...} is called the entity set. We can also understand this by an analogy: an entity type is
like "fruit", which is a class; we have never seen "fruit" itself, though we have seen instances of
fruit such as apple, banana, and mango. Hence fruit = entity type = EMPLOYEE; apple = entity
= e1 (or e2 or e3); and the entity set = the basket of apples, bananas, mangoes, etc. = {e1, e2, ...}.
ATTRIBUTE
In a database management system (DBMS), an attribute may describe a component of the
database, such as a table or a field, or may itself be used as another term for a field.
A table contains one or more columns, and these columns are the attributes. For example, if
you have a table named "employee information" with the columns ID, NAME, and ADDRESS,
then id, name, and address are the attributes of employee.
RELATIONSHIP SET
An association among entities is called a relationship. For example, the employee entity has
the relationship Works At with the department entity. Another example is a student who enrolls
in some course. Here, Works At and Enrolls are called relationships.
Relationship Set
A set of relationships of the same type is called a relationship set. Like entities, a relationship
too can have attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Cardinalities
Cardinality defines the number of entities in one entity set which can be associated with the
number of entities of another set via a relationship set.
One-to-one: One entity from entity set A can be associated with at most one entity of entity set
B, and vice versa.
One-to-many: One entity from entity set A can be associated with more than one entity of
entity set B, but an entity from entity set B can be associated with at most one entity of A.
Many-to-one: More than one entity from entity set A can be associated with at most one entity
of entity set B, but one entity from entity set B can be associated with more than one entity from
entity set A.
Many-to-many: One entity from A can be associated with more than one entity from B, and
vice versa.
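A one-to-many mapping can be sketched in SQL using Python's built-in sqlite3 module; the dept and emp tables and their rows below are invented for illustration. Many emp rows may reference one dept row, but each emp row references at most one dept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE emp (
        id      INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES dept(id)   -- each emp -> one dept
    )
""")
conn.execute("INSERT INTO dept VALUES (1, 'Accounts')")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 'Asha', 1), (2, 'Ravi', 1)])

# One dept row is associated with many emp rows: a one-to-many mapping.
emp_count = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE dept_id = 1").fetchone()[0]
```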
ENTITY RELATIONSHIP DIAGRAM (ERD)
Definition: An entity-relationship (ER) diagram is a specialized graphic that illustrates the
relationships between entities in a database. ER diagrams often use symbols to represent three
different types of information. Boxes are commonly used to represent entities. Diamonds are
normally used to represent relationships and ovals are used to represent attributes.
Components of ER Diagram
The ER diagram has three main components:
1) Entity
An entity can be an object, place, person, or class. In an ER diagram, an entity is represented
using a rectangle. Consider the example of an organization: Employee, Manager, Department,
Product, and many more can be taken as entities.
Weak Entity
A weak entity is an entity that must be defined through a foreign key relationship with another
entity, as it cannot be uniquely identified by its own attributes alone. A weak entity depends
on another entity and does not have a key attribute of its own. A double rectangle represents a
weak entity.
2) Attribute
An attribute describes a property or characteristic of an entity. For example, Name, Age, and
Address can be attributes of a Student. Databases contain information about each entity, tracked
in individual fields known as attributes, which normally correspond to the columns of a
database table. An attribute is represented using an ellipse.
Key Attribute
A key attribute is the unique, distinguishing characteristic of the entity. For example, an
employee's social security number might be the employee's key attribute. The key attribute
represents the main characteristic of an entity and is used to represent the primary key. An
ellipse with an underlined label represents a key attribute.
Composite Attribute
An attribute can itself have attributes of its own. Such attributes are known as composite
attributes.
3) Relationship
Relationships illustrate how two entities share information in the database structure. A
relationship describes relations between entities and is represented using a diamond.
Three types of relationship exist between entities:
Binary Relationship
Recursive Relationship
Ternary Relationship
Binary Relationship
Binary Relationship means relation between two Entities. This is further divided into three types.
1. One to One: This type of relationship is rarely seen in the real world.
The example describes that one student can enroll in only one course and a course
will also have only one student. This is not what you will usually see in practice.
2. One to Many: This reflects the business rule that one entity instance is associated with
many instances of another entity. For example, a Student enrolls in only one Course, but
a Course can have many Students.
The arrows in the diagram describe that one student can enroll in only one course.
3. Many to Many:
The diagram represents that many students can enroll in more than one course.
Recursive Relationship
In some cases, entities can be self-linked. For example, employees can supervise other
employees.
Ternary Relationship
Relationship of degree three is called Ternary relationship.
EXTENDED FEATURES OF ERD
The ER model has the power of expressing database entities in a conceptual hierarchical
manner such that, as we go up the hierarchy, we generalize the view of the entities, and as we
go deeper into the hierarchy, we see the details of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to
represent a more generalized view. For example, a particular student named Mira can be
generalized along with all students; the entity shall be Student, and further, a student is a
Person. The reverse is called specialization, where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entity contains
the properties of all the specialized entities, is called generalization. In generalization, a number
of entities are brought together into one generalized entity based on their similar characteristics.
For example, pigeon, house sparrow, crow, and dove can all be generalized as Birds.
Specialization
Specialization is a process opposite to generalization, as mentioned above. In
specialization, a group of entities is divided into sub-groups based on their characteristics. Take
the group Person, for example. A person has a name, date of birth, gender, etc. These properties
are common to all persons. But in a company, a person can be identified as an employee,
employer, customer, or vendor, based on the role they play in the company.
Similarly, in a school database, a person can be specialized as a teacher, student, or staff
member, based on the role they play in the school.
Inheritance
We use all the above features of the ER model to create classes of objects in object oriented
programming. This makes it easier for the programmer to concentrate on what she is
programming. Details of entities are generally hidden from the user; this process is known as
abstraction.
One of the important features of generalization and specialization is inheritance; that is, the
attributes of higher-level entities are inherited by the lower-level entities.
For example, attributes of a person such as name, age, and gender can be inherited by
lower-level entities such as student and teacher.
Aggregation
The E-R model cannot express relationships among relationships.
When would we need such a thing?
Consider a DB with information about employees who work on a particular project and use a
number of machines doing that work. We get the E-R diagram shown in Figure below.
Figure 2.20: E-R diagram with redundant relationships
Relationship sets work and uses could be combined into a single set. However, they shouldn't be,
as this would obscure the logical structure of this scheme.
The solution is to use aggregation.
An abstraction through which relationships are treated as higher-level entities.
For our example, we treat the relationship set work and the entity sets employee and
project as a higher-level entity set called work.
Figure below shows the E-R diagram with aggregation.
Figure 2.21: E-R diagram with aggregation
Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for
each entity and relationship set as before.
The table for the relationship set uses contains a column for each attribute in the primary keys
of the entity set machinery and the relationship set work.
Aggregation is an abstraction in which relationship sets are treated as higher level entity sets.
Here a relationship set is embedded inside an entity set, and these entity sets can participate in
relationships.
UNIT 3.1
RELATIONAL DATABASES
Relational Databases: Introduction to Relational Databases and Terminology-
Relation, Tuple, Attribute, Cardinality, Degree, Domain. Keys- Super Key,
Candidate Key, Primary Key, Foreign Key.
INTRODUCTION TO RELATIONAL DATABASES
Relational database was proposed by Edgar Codd (of IBM Research) around 1969. It has since
become the dominant database model for commercial applications (in comparison with other
database models such as hierarchical, network and object models). Today, there are many
commercial Relational Database Management System (RDBMS), such as Oracle, IBM DB2 and
Microsoft SQL Server. There are also many free and open-source RDBMS, such as MySQL,
mSQL (mini-SQL) and the embedded JavaDB.
A relational database organizes data in tables (or relations). A table is made up of rows and
columns. A row is also called a record (or tuple). A column is also called a field (or attribute). A
database table is similar to a spreadsheet, but the relationships that can be created among
the tables enable a relational database to efficiently store huge amounts of data and effectively
retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with relational
databases.
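These terms can be made concrete with Python's built-in sqlite3 module and a short SQL session; the employee table and its rows are made up for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relation (table) with three attributes (columns).
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")

# Each inserted row is one tuple (record) of the relation.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", "Sales"), (2, "Ravi", "IT"), (3, "Meena", "IT")])

# SQL retrieves selected data from the relation.
it_staff = conn.execute(
    "SELECT name FROM employee WHERE dept = ?", ("IT",)).fetchall()
```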
Features of RDBMS
Features and characteristics of an RDBMS can best be understood through Codd's twelve rules.
Codd's Twelve Rules
Codd's twelve rules are a set of thirteen rules (numbered zero to twelve) proposed by Edgar F.
Codd, a pioneer of the relational model for databases, designed to define what is required from a
database management system in order for it to be considered relational, i.e., a relational database
management system (RDBMS). They are sometimes jokingly referred to as "Codd's Twelve
Commandments". They are as follows:
Rule 0: The Foundation rule:
A relational database management system must manage its stored data using only its
relational capabilities. The system must qualify as relational, as a database, and as a
management system. For a system to qualify as a relational database management system
(RDBMS), that system must use its relational facilities (exclusively) to manage the
database.
Rule 1: The information rule:
All information in a relational database (including table and column names) is
represented in only one way, namely as a value in a table.
Rule 2: The guaranteed access rule:
All data must be accessible. It says that every individual scalar value in the database must
be logically addressable by specifying the name of the containing table, the name of the
containing column and the primary key value of the containing row.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support
a representation of "missing information and inapplicable information" that is systematic,
distinct from all regular values (for example, "distinct from zero or any other number", in
the case of numeric values), and independent of data type. It is also implied that such
representations must be manipulated by the DBMS in a systematic way.
Rule 4: Active online catalog based on the relational model:
The system must support an online, inline, relational catalog that is accessible to
authorized users by means of their regular query language. That is, users must be able to
access the database's structure (catalog) using the same query language that they use to
access the database's data.
Rule 5: The comprehensive data sublanguage rule:
The system must support at least one relational language that
1. Has a linear syntax
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data
manipulation operations (update as well as retrieval), security and integrity
constraints, and transaction management operations (begin, commit, and
rollback).
Rule 6: The view updating rule:
All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete:
The system must support set-at-a-time insert, update, and delete operators. This means
that data can be retrieved from a relational database in sets constructed of data from
multiple rows and/or multiple tables. This rule states that insert, update, and delete
operations should be supported for any retrievable set rather than just for a single row in a
single table.
Rule 8: Physical data independence:
Changes to the physical level (how the data is stored, whether in arrays or linked lists
etc.) must not require a change to an application based on the structure.
Rule 9: Logical data independence:
Changes to the logical level (tables, columns, rows, and so on) must not require a change
to an application based on the structure. Logical data independence is more difficult to
achieve than physical data independence.
Rule 10: Integrity independence:
Integrity constraints must be specified separately from application programs and stored in
the catalog. It must be possible to change such constraints as and when appropriate
without unnecessarily affecting existing applications.
Rule 11: Distribution independence:
The distribution of portions of the database to various locations should be invisible to
users of the database. Existing applications should continue to operate successfully:
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.
Rule 12: The non-subversion rule:
If the system provides a low-level (record-at-a-time) interface, then that interface cannot
be used to subvert the system, for example, bypassing a relational security or integrity
constraint.
Advantages of RDBMS
An RDBMS offers an extremely structured way of managing data (although a good database
design is needed), as everything in an RDBMS is represented as values in relations (i.e., tables).
Many of its advantages are also visible within the thirteen rules stated by Codd.
Disadvantages of RDBMS
An RDBMS is very good for related data, but unorganized and unrelated data creates only
chaos within it. That is one reason why emerging trends such as Big Data (where a lot of data
from various sources is to be analyzed) do not favour an RDBMS, but rather non-relational
(NoSQL) DBMSs.
TERMINOLOGIES: (RELATION, TUPLE, ATTRIBUTE,
CARDINALITY, DEGREE, DOMAIN)
Relation:
Definition-
A database relation is a predefined row/column format for storing information in a relational
database. A relation is equivalent to a table; it is also known as a table.
Example-
Tuple:
Definition-
In the context of databases, a tuple is one record (one row).
Example-
Attribute:
Definition-
In general, an attribute is a characteristic. In a database management system (DBMS), an
attribute refers to a database component, such as a table, or may refer to a database field.
Attributes describe the instances in the rows of a database table.
Example-
Degree:
Definition-
The degree of a relation is the number of attributes in its relation schema. (In the ER context,
the degree of a relationship is the number of entity sets that participate in it.)
Example-
Cardinality:
Definition-
In the context of databases, cardinality refers to the uniqueness of the data values contained in
a column.
Less commonly, cardinality also refers to the relationships between tables, which can be
one-to-one, many-to-one, or many-to-many.
Example-
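As a sketch (with invented data), the cardinality of a column can be measured with SQL's COUNT(DISTINCT ...) via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, gender TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("Amit", "M"), ("Bina", "F"), ("Chand", "M"), ("Devi", "F")])

# name: 4 rows, 4 unique values -> high cardinality.
# gender: 4 rows, 2 unique values -> low cardinality.
name_card = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM person").fetchone()[0]
gender_card = conn.execute(
    "SELECT COUNT(DISTINCT gender) FROM person").fetchone()[0]
```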
Domain
Definition-
In database technology, a domain refers to the description of an attribute's allowed values. The
physical description is the set of values the attribute can have, and the semantic, or logical,
description is the meaning of the attribute.
Example-
KEYS: (SUPER KEYS, CANDIDATE KEY, PRIMARY
KEY, FOREIGN KEY)
Definition of a Key-
A key simply consists of one or more attributes that determine other attributes. A key is
defined over one or more columns (attributes) of a database table; for example, a table with the
columns id, name, and address might use id as its key. Keys are used to identify each record in
the database table.
The following are the various types of keys available in the DBMS system.
Super key
Candidate key
Primary key
Foreign key
Super Key-
A superkey is a combination of columns that uniquely identifies any row within a relational
database management system (RDBMS) table. A candidate key is a closely related concept
where the superkey is reduced to the minimum number of columns required to uniquely identify
each row.
For example, imagine a table used to store customer master details that contains columns such
as:
customer name
customer id
social security number (SSN)
address
date of birth
A certain set of columns may be extracted and guaranteed unique to each customer. Examples of
superkeys are as follows:
Name, SSN, Birthdate
ID, Name, SSN
However, this set may be further reduced. It can be assumed that each customer id is unique
to each customer, so the superkey may be reduced to just one field, customer id, which is the
candidate key. To ensure absolute uniqueness, a composite candidate key may be
formed by combining customer id with SSN.
A primary key is a special term for a candidate key designated as the unique identifier for all
table rows. Once a candidate key is decided upon, it may be defined as the primary key at the
point of table creation.
Candidate key-
A candidate key is a column, or set of columns, in a table that can uniquely identify any database
record without referring to any other data. Each table may have one or more candidate keys, but
one candidate key is special, and it is called the primary key. This is usually the best among the
candidate keys.
When a key is composed of more than one column, it is known as a composite key.
The best way to explain candidate keys is with an example. Suppose a bank's database is
being designed. To uniquely identify each customer's account, a combination of the customer's ID
or social security number (SSN) and a sequential number for each of his or her accounts can be
used. So, Mr. Andrew Smith's checking account can be numbered 223344-1, and his savings
account 223344-2. A candidate key has just been created.
Alternatively, the bank's database can issue unique account numbers of its own, which are
guaranteed to avoid any ambiguity. For good measure, these account numbers can have some built-in
logic: for example, checking accounts can begin with a 'C', followed by the year and month of
creation and, within that month, a sequential number.
Note that it was also possible to uniquely identify each account using the aforementioned SSNs and a
sequential number (assuming no government mix-up in which the same number is issued to
two people), so that combination is a candidate key that can potentially be used to identify records.
However, the generated account number just described is a better candidate key for the same purpose.
In fact, if the chosen candidate key is so good that it can certainly uniquely identify each and
every record, then it should be used as the primary key. All databases allow the definition of one,
and only one, primary key per table.
Primary key-
It is the candidate key that is chosen by the database designer to identify entities within an entity
set. A primary key is a minimal superkey. In an ER diagram, the primary key is represented by
underlining the primary key attribute. Ideally a primary key is composed of only a single
attribute, but it is possible to have a primary key composed of more than one attribute.
A primary key is a special relational database table column (or combination of columns)
designated to uniquely identify all table records.
A primary key‘s main features are:
It must contain a unique value for each row of data.
It cannot contain null values.
A primary key is either an existing table column or a column that is specifically generated by the
database according to a defined sequence.
For example, students are routinely assigned unique identification (ID) numbers, and citizens
are assigned uniquely identifiable Social Security numbers.
For example, a database must hold all of the data stored by a commercial bank. Two of the
database tables include the CUSTOMER_MASTER, which stores basic and static customer data
(e.g., name, date of birth, address and Social Security number, etc.) and the
ACCOUNTS_MASTER, which stores various bank account data (e.g., account creation date,
account type, withdrawal limits or corresponding account information, etc.).
To uniquely identify customers, a column or combination of columns is selected to guarantee
that two customers never have the same unique value. Thus, certain columns are immediately
eliminated, e.g., surname and date of birth. A good primary key candidate is the column that is
designated to hold unique and government-assigned Social Security numbers. However, some
account holders (e.g., children) may not have Social Security numbers, and this column‘s
candidacy is eliminated. The next logical option is to use a combination of columns, such as the
surname, date of birth and email address, resulting in a long and cumbersome composite primary
key.
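The choice of a candidate key as primary key can be sketched with Python's built-in sqlite3 module. This is an illustrative sketch, not part of the notes: the table loosely follows the CUSTOMER_MASTER example, and all table and column names here are assumptions.

```python
import sqlite3

# In-memory database; names are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE customer_master (
        customer_id INTEGER PRIMARY KEY,  -- the chosen candidate key
        name        TEXT NOT NULL,
        ssn         TEXT,                 -- may be NULL (e.g., children)
        birth_date  TEXT
    )
""")
cur.execute("INSERT INTO customer_master VALUES (1, 'Andrew Smith', '223344', '1990-01-01')")
try:
    # A second row with the same customer_id violates the primary key
    cur.execute("INSERT INTO customer_master VALUES (1, 'Jane Doe', '556677', '1985-05-05')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The database enforces the primary key's two features from the list above: the duplicate customer_id is rejected, and a NULL customer_id would likewise be refused.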
Foreign Key-
A foreign key is a column or group of columns in a relational database table that provides a link
between data in two tables. It acts as a cross-reference between tables because it references the
primary key of another table, thereby establishing a link between them.
In complex databases, data in a domain must be spread across multiple tables while maintaining a
relationship between them. The concept of referential integrity is derived from foreign key
theory.
Foreign keys and their implementation are more complex than primary keys.
For any column acting as a foreign key, a corresponding value should exist in the linked table.
Special care must be taken while inserting data and removing data from the foreign key column,
as a careless deletion or insertion might destroy the relationship between the two tables.
For instance, if there are two tables, customer and order, a relationship can be created between
them by introducing a foreign key into the order table that refers to the customer ID in the
customer table. The customer ID column exists in both customer and order tables. The customer
ID in the order table becomes the foreign key, referring to the primary key in the customer table.
To insert an entry into the order table, the foreign key constraint must be satisfied.
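The customer/order relationship above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under the assumption that SQLite is used (where foreign key enforcement must be switched on explicitly); the table is named orders here because "order" is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1)")       # satisfies the constraint
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99: rejected
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```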
Some referential actions associated with a foreign key include the following:
Cascade: When rows in the parent table are deleted, the matching rows in the
child table are also deleted, creating a cascading delete.
Set Null: When a referenced row in the parent table is deleted or updated, the foreign key values
in the referencing row are set to null to maintain the referential integrity.
Triggers: Referential actions are normally implemented as triggers. In many ways foreign key
actions are similar to user-defined triggers. To ensure proper execution, ordered referential
actions are sometimes replaced with their equivalent user-defined triggers.
Set Default: This referential action is similar to "set null." The foreign key values in the child
table are set to the default column value when the referenced row in the parent table is deleted or
updated.
Restrict: This is the normal referential action associated with a foreign key. A value in the parent
table cannot be deleted or updated as long as it is referred to by a foreign key in another table.
No Action: This referential action is similar in function to the "restrict" action, except that the
no-action check is performed only after trying to alter the table.
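The cascade action described above can be demonstrated with a short sqlite3 sketch (the parent/child table names are hypothetical, chosen only for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")
conn.execute("DELETE FROM parent WHERE id = 1")  # cascades to the child row
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

Deleting the parent row silently removes the referencing child row; with the default (restrict-like) behaviour the DELETE would instead have been rejected.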
UNIT 3.2
RELATIONAL ALGEBRA
Relational Algebra: Operations, Select, Project, Union, Difference, Intersection
Cartesian product, Join, Natural Join.
INTRODUCTION
Relational algebra, first described by E.F. Codd while at IBM, is a family of algebras with a
well-founded semantics used for modelling the data stored in relational databases, and for defining
queries on it.
In relational algebra, queries are composed using a collection of operators, and each query
describes a step-by-step procedure for computing the desired result, based on the order in which
the operators are applied. Because queries are specified in this operational, procedural manner,
relational algebra is also called a procedural language.
The procedural nature of the algebra allows us to think of an algebra expression as a recipe, or a
plan, for evaluating a query, and relational systems in fact use algebra expressions to represent
query evaluation plans.
Relational algebra expression
An expression composed of these operators, forming a complex query, is called a relational
algebra expression.
A unary algebra operator is applied to a single expression, and a binary algebra operator is applied
to two expressions.
Fundamental operations of Relational algebra:
Select
Project
Union
Set difference
Cartesian product
Rename
SELECT
The SELECT operation (denoted by σ, sigma) is used to select a subset of the tuples from a
relation based on a selection condition.
The selection condition acts as a filter:
Only those tuples that satisfy the qualifying condition are kept.
Tuples satisfying the condition are selected, whereas the other tuples are discarded
(filtered out).
Examples:
A. Select the STUDENT tuples whose age is 18:
σage=18 (STUDENT)
B. Select the STUDENT tuples whose course is BCA:
σcourse=BCA (STUDENT)
C. Select the students from the student relation instance whose gender is male:
σgender=M (STUDENT)
Student name Age gender course
Ritika 18 F BCA
Prerna 19 F Bsc.
Ankush 20 M BA
Preeti 18 F Bsc.
Pragyan 20 M BA
Ritu 18 F BCA
Janvi 20 F BCA
Answer of the first select statement is :
A.
Student name Age gender course
Ritika 18 F BCA
Preeti 18 F Bsc.
Ritu 18 F BCA
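The SELECT operation above can be sketched in plain Python, treating the STUDENT relation as a list of tuples and σ as a filter (the helper name `select` is just for illustration):

```python
# The STUDENT relation from the notes, as (student_name, age, gender, course)
students = [
    ("Ritika", 18, "F", "BCA"),
    ("Prerna", 19, "F", "Bsc."),
    ("Ankush", 20, "M", "BA"),
    ("Preeti", 18, "F", "Bsc."),
    ("Pragyan", 20, "M", "BA"),
    ("Ritu", 18, "F", "BCA"),
    ("Janvi", 20, "F", "BCA"),
]

def select(relation, condition):
    """sigma: keep only the tuples satisfying the selection condition."""
    return [t for t in relation if condition(t)]

# sigma age=18 (STUDENT)
age_18 = select(students, lambda t: t[1] == 18)
```

The filter keeps Ritika, Preeti and Ritu, matching the answer table shown for query A.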
PROJECT
The PROJECT operation is denoted by π (pi).
If we are interested in only certain attributes of a relation, we use PROJECT.
This operation keeps certain columns (attributes) from a relation and discards the other columns.
Example:
To list only the name and course of all students in the student relation:
πstudent_name, course (STUDENT)
(output from the STUDENT table above)
Student-name Course
Ritika BCA
Prerna Bsc.
Ankush BA
Preeti Bsc.
Pragyan BA
Ritu BCA
Janvi BCA
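The PROJECT operation can likewise be sketched in Python; since relations are sets of tuples, a projection also eliminates duplicate rows (the helper name `project` is illustrative):

```python
# The STUDENT relation as (student_name, age, gender, course)
students = [
    ("Ritika", 18, "F", "BCA"),
    ("Prerna", 19, "F", "Bsc."),
    ("Ankush", 20, "M", "BA"),
    ("Preeti", 18, "F", "Bsc."),
    ("Pragyan", 20, "M", "BA"),
    ("Ritu", 18, "F", "BCA"),
    ("Janvi", 20, "F", "BCA"),
]

def project(relation, indices):
    """pi: keep only the chosen attributes, eliminating duplicates
    while preserving first-seen order."""
    seen = dict.fromkeys(tuple(t[i] for i in indices) for t in relation)
    return list(seen)

# pi student_name, course (STUDENT)
name_course = project(students, (0, 3))
```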
UNION
It is a binary operation, denoted by the union sign (∪) from set theory. The result of R ∪ S is a
relation that includes all tuples that are in R, in S, or in both R and S. Duplicate tuples are
eliminated.
The two operand relations R and S must be "type compatible" (or UNION compatible): R and
S must have the same number of attributes, and each pair of corresponding attributes must be type
compatible (have the same or compatible domains). E.g., in the bank enterprise, the depositor and
borrower relations have almost identical attributes and types.
Customer name Id no.
RITA 301
GITA 302
RAM 303
(DEPOSITOR'S RELATIONAL MODEL)
Customer name Id no.
Sham 300
Surbhi 304
Rita 301
Ram 303
(Borrower's relational model)
(Output: DEPOSITOR ∪ BORROWER)
Customer_name Id no
Rita 301
Gita 302
Ram 303
Sham 300
Surbhi 304
DIFFERENCE
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by −. The result of R − S is a
relation that includes all tuples that are in R but not in S. The attribute names in the result are
the same as the attribute names in R. The two operand relations R and S must be "type
compatible".
Output: DEPOSITOR − BORROWER
Customer name Idno
Gita 302
Only one tuple (Gita, 302) belongs to DEPOSITOR but not to BORROWER, so the result contains a single tuple.
INTERSECTION
The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S.
The attribute names in the result are the same as the attribute names in R.
The two operand relations R and S must be "type compatible". For the depositor and borrower
relations above, the intersection contains (Rita, 301) and (Ram, 303).
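The three set operations on the depositor and borrower relations above map directly onto Python's set operators (names normalised to title case for comparability):

```python
depositor = {("Rita", 301), ("Gita", 302), ("Ram", 303)}
borrower  = {("Sham", 300), ("Surbhi", 304), ("Rita", 301), ("Ram", 303)}

union        = depositor | borrower   # all customers, duplicates eliminated
difference   = depositor - borrower   # depositors who are not borrowers
intersection = depositor & borrower   # customers who are both
```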
CARTESIAN PRODUCT
The resulting relation state has one tuple for each combination of tuples—one from R and one
from S. Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S will have
nR * nS tuples.
The two operands do NOT have to be "type compatible".
Example:
R.
A 1
B 2
D 3
F 4
S.
D 3
E 4
Output: R × S
A 1 D 3
A 1 E 4
B 2 D 3
B 2 E 4
D 3 D 3
D 3 E 4
F 4 D 3
F 4 E 4
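The same Cartesian product can be computed with `itertools.product`, pairing every tuple of R with every tuple of S (so |R| × |S| = 4 × 2 = 8 result tuples):

```python
from itertools import product

R = [("A", 1), ("B", 2), ("D", 3), ("F", 4)]
S = [("D", 3), ("E", 4)]

# R x S: concatenate each pair of tuples, one from R and one from S
cartesian = [r + s for r, s in product(R, S)]
```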
JOIN
The JOIN operation can be seen as a Cartesian product of two relations followed by a selection.
A join lets you evaluate a join condition between the attributes of the relations on
which the join operation is undertaken.
It is used to combine related tuples from two relations.
The join condition is called theta (θ).
Notation:
R ⋈θ S (R JOIN, with join condition θ, S)
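A theta-join can be sketched as a Cartesian product filtered by the join condition. The employees/departments relations below are hypothetical, used only for illustration:

```python
from itertools import product

# Hypothetical relations: employees (name, dept_id) and departments (dept_id, dept_name)
employees   = [("Asha", 10), ("Ravi", 20), ("Meena", 10)]
departments = [(10, "Sales"), (20, "HR")]

# Theta-join: Cartesian product, keeping only tuples satisfying the condition
# (here, equality on dept_id, i.e. an equi-join)
joined = [e + d for e, d in product(employees, departments) if e[1] == d[0]]
```

Note that the joining attribute appears twice in each result tuple; a natural join, described next, removes that duplicate.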
NATURAL JOIN
Another variation of JOIN, called NATURAL JOIN, is denoted by *.
Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such
joins result in two attributes in the resulting relation having exactly the same value. A natural
join removes the duplicate attribute(s).
In most systems a natural join requires that the attributes have the same name, in order to
identify the attribute(s) to be used in the join. This may require a renaming mechanism.
If you do use natural joins, make sure that the relations do not have two attributes with the
same name by accident.
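A natural join can be sketched as an equi-join on the same-named attribute, keeping that attribute only once in the result. The course durations below are hypothetical, added only to give the second relation something to contribute:

```python
# Two relations sharing the attribute "course"
students = [{"name": "Ritu", "course": "BCA"}, {"name": "Ankush", "course": "BA"}]
courses  = [{"course": "BCA", "duration": 3}, {"course": "BA", "duration": 3}]

common = {"course"}  # attributes with the same name in both relations

# Natural join: match on all common attributes; merging the dicts keeps
# the common attribute only once in each result tuple
natural = [
    {**s, **c}
    for s in students
    for c in courses
    if all(s[a] == c[a] for a in common)
]
```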
SUMMARY OF OPERATIONS
SELECT (σ): selects the tuples satisfying a condition.
PROJECT (π): keeps only the chosen attributes.
UNION (∪): tuples in either relation; duplicates eliminated.
SET DIFFERENCE (−): tuples in the first relation but not in the second.
INTERSECTION (∩): tuples in both relations.
CARTESIAN PRODUCT (×): every combination of tuples from the two relations.
JOIN (⋈): Cartesian product filtered by a join condition.
NATURAL JOIN (*): equi-join on same-named attributes, with the duplicate attribute removed.
UNIT 4
STRUCTURED QUERY LANGUAGE (SQL)
&
NORMALIZATION
Structured Query Language (SQL): Introduction to SQL, History of SQL,
Concept of SQL, DDL Commands, DML Commands, DCL Commands, Simple
Queries, Nested Queries,
Normalization: Benefits of Normalization, Normal Forms- 1NF, 2NF, 3NF,
BCNF & and Functional Dependency.
INTRODUCTION TO SQL
Introduction & Brief History:
SQL is a special-purpose programming language designed for managing data held in a relational
database management system (RDBMS). Originally based upon relational algebra and tuple
relational calculus, SQL consists of a data definition language and a data manipulation language.
The scope of SQL includes data insert, query, update and delete, schema creation and
modification, and data access control.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks." Despite not entirely adhering to the relational model as described by Codd, it became the
most widely used database language.
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the
International Organization for Standardization (ISO) in 1987. Since then, the standard has been
revised to include a larger set of features.
Why SQL?
Allows users to access data in relational database management systems.
Allows users to describe the data.
Allows users to define the data in database and manipulate that data.
Allows embedding within other languages using SQL modules, libraries & pre-compilers.
Allows users to create and drop databases and tables.
Allows users to create views, stored procedures and functions in a database.
Allows users to set permissions on tables, procedures and views.
Advantages of SQL:
High Speed: SQL Queries can be used to retrieve large amounts of records from a
database quickly and efficiently.
Well-Defined Standards Exist: SQL databases use long-established standards
adopted by ANSI & ISO. Non-SQL databases do not adhere to any clear standard.
No Coding Required: Using standard SQL, it is easier to manage database systems
without having to write a substantial amount of code.
Emergence of ORDBMS: Previously, SQL databases were synonymous with relational
databases. With the emergence of object-oriented DBMSs, object storage capabilities
have been extended to relational databases.
Disadvantages of SQL:
Difficulty in Interfacing: Interfacing an SQL database is more complex than adding a few
lines of code.
More Features Implemented in a Proprietary Way: Although SQL databases conform to
ANSI & ISO standards, some vendors add proprietary extensions to standard SQL to
ensure vendor lock-in.
HISTORY OF SQL
In 1970, Edgar F. Codd, a member of the IBM Research Lab, published the classic paper
'A Relational Model of Data for Large Shared Data Banks'.
Codd's paper triggered a great deal of research and experimentation, which led to the design
and prototype implementation of a number of relational languages.
One such language was the Structured English Query Language (SEQUEL), defined by
Donald D. Chamberlin and Raymond F. Boyce.
The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark
of the UK-based Hawker Siddeley aircraft company.
A revised version of SEQUEL, called SEQUEL/2, was released in 1976-77.
In the late 1970s, IBM developed Codd's ideas into a research prototype named System R.
In 1979, Relational Software, Inc. (which later became Oracle Corporation) released the
first commercially available relational database implementing SQL.
In 1986, ANSI published an SQL standard called 'SQL-86'; ISO followed in 1987.
The next versions of the standard were SQL-89 and SQL-92, followed by SQL:1999,
SQL:2003, SQL:2006 and SQL:2008.
Given industry trends, it seems clear that the relational model and SQL will continue
to strengthen their position in the near future.
CONCEPT BEHIND SQL
SQL Process
When you execute an SQL command against any RDBMS, the system determines the best way
to carry out your request, and the SQL engine figures out how to interpret the task.
There are various components included in the process. These components are:-
Query Dispatcher
Optimization Engines
Classic Query Engine
SQL Query Engine
The classic query engine handles all non-SQL queries, but the SQL query engine does not handle
logical files.
Types of SQL Commands
The following sections discuss the basic categories of commands used in SQL to perform various
functions . The main categories are:-
DDL (Data Definition Language)
DML (Data Manipulation Language)
DQL (Data Query Language)
DCL (Data Control Language)
TCL (Transactional Control Language)
DDL COMMANDS
DDL (Data Definition Language) Commands of SQL allow the Data Definition functions like
creating, altering and dropping the tables.
The following are the various DDL Commands, along with their syntax, use and examples:
#1. CREATE
USE: creates a new table, view of a table, or other objects in database.
SYNTAX:
CREATE TABLE table_name(
Column_name1 data_type(size),
Column_name2 data_type(size),
….
);
EXAMPLE :
CREATE TABLE Persons
(PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
#2. ALTER
USE : modifies an existing database object such as a table.
SYNTAX :
ALTER TABLE table_name
ADD column_name datatype;
or
ALTER TABLE table_name
DROP COLUMN column_name;
or
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
EXAMPLE :
ALTER TABLE Persons
ADD DateOfBirth date;
or
ALTER TABLE Persons
DROP COLUMN DateOfBirth;
or
ALTER TABLE Persons
ALTER COLUMN DateOfBirth year;
#3. DROP
USE : deletes an entire table, a view of a table, or other object in the database.
SYNTAX : DROP TABLE table_name;
EXAMPLE : DROP TABLE Persons;
#4. TRUNCATE
USE : removes all records from a table, including the space allocated for the
records; it may also reset auto-increment (identity) counters.
SYNTAX : TRUNCATE TABLE table_name;
EXAMPLE : TRUNCATE TABLE persons;
#5. COMMENT
USE : adds comments to the data dictionary.
SYNTAX : COMMENT ON TABLE table_name IS 'text';
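The DDL commands above can be tried out with Python's built-in sqlite3 module. This is a sketch under the assumption that SQLite is used; note that older SQLite versions support only ADD COLUMN and RENAME forms of ALTER TABLE, not MODIFY COLUMN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: the Persons table from the example
cur.execute("""
    CREATE TABLE Persons (
        PersonID  INTEGER,
        LastName  VARCHAR(255),
        FirstName VARCHAR(255),
        Address   VARCHAR(255),
        City      VARCHAR(255)
    )
""")

# ALTER: add the DateOfBirth column, as in the example
cur.execute("ALTER TABLE Persons ADD COLUMN DateOfBirth DATE")
columns = [row[1] for row in cur.execute("PRAGMA table_info(Persons)")]

# DROP: remove the table entirely
cur.execute("DROP TABLE Persons")
```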
DML COMMANDS
DML (Data Manipulation Language) Commands of SQL allow the Data Manipulation functions
like inserting, updating and deleting data values in the tables created using DDL Commands.
The following are the various DML Commands, along with their syntax, use and examples:
#1. INSERT
USE : creates a record.
SYNTAX :
INSERT INTO table_name
VALUES (value1,value2,value3,...);
or
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
EXAMPLE :
INSERT INTO Persons VALUES (1,'manan','07-08-1994');
#2. UPDATE
USE : modifies records.
SYNTAX :
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
EXAMPLE :
UPDATE Students
SET Fine=0
WHERE Stu_ID=404;
#3. DELETE
USE : delete records (but the structure remain intact).
SYNTAX :
DELETE FROM table_name
WHERE some_column=some_value;
EXAMPLE :
DELETE FROM Persons
WHERE Stu_ID=21;
#4. CALL
USE : call a PL/SQL or java subprogram.
#5. EXPLAIN PLAN
USE : explain access path to data.
SYNTAX :
EXPLAIN PLAN FOR
SQL_Statement;
EXAMPLE :
EXPLAIN PLAN FOR
SELECT last_name FROM employees;
#6. LOCK TABLE
USE : control concurrency.
SYNTAX :
LOCK TABLE table_name
IN EXCLUSIVE MODE
NOWAIT;
EXAMPLE :
LOCK TABLE employees
IN EXCLUSIVE MODE
NOWAIT;
This locks the table in exclusive mode but does not wait if another user has already locked the table.
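The INSERT/UPDATE/DELETE commands above can be exercised with a short sqlite3 sketch, loosely following the Students examples in the text (column values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (Stu_ID INTEGER, Name TEXT, Fine INTEGER)")

# INSERT: create records
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(404, "Manan", 50), (21, "Ritu", 0)])

# UPDATE: waive the fine for student 404, as in the example
cur.execute("UPDATE Students SET Fine = 0 WHERE Stu_ID = 404")

# DELETE: remove student 21; the table structure remains intact
cur.execute("DELETE FROM Students WHERE Stu_ID = 21")

fine = cur.execute("SELECT Fine FROM Students WHERE Stu_ID = 404").fetchone()[0]
count = cur.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```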
DCL COMMANDS
DCL (Data Control Language) Commands of SQL allow data control functions like
granting and revoking permissions, committing changes, rolling back, etc.
The following are the various DCL Commands, along with their syntax, use and examples:
#1. GRANT
USE : gives a privilege to user(s).
SYNTAX :
GRANT permission [, ...]
ON [schema_name.]object_name [(column [, ...])]
TO database_principal[, ...]
[WITH GRANT OPTION]
EXAMPLE :
GRANT SELECT
ON Invoices
TO AnneRoberts;
#2. REVOKE
USE : takes back privileges/grants from users.
SYNTAX :
REVOKE [GRANT OPTION FOR] permission [, ...]
ON [schema_name.]object_name [(column [, ...])]
FROM database_principal[, ...]
[CASCADE]
EXAMPLE :
REVOKE SELECT
ON Invoices
FROM AnneRoberts;
#3. COMMIT
USE : save work done.
SYNTAX : COMMIT;
#4. ROLLBACK
USE : restores the database to its state as of the last COMMIT.
SYNTAX : ROLLBACK;
#5. SAVEPOINT
USE : identify a point in a transaction in which you can later rollback.
SYNTAX : SAVEPOINT SAVEPOINT_NAME;
& then, ROLLBACK TO SAVEPOINT_NAME;
RELEASE SAVEPOINT SAVEPOINT_NAME;
#6. SET TRANSACTION
USE : establishes properties for the current transaction, e.g. changing transaction
options such as which rollback segment to use.
SYNTAX : SET TRANSACTION [ READ WRITE | READ ONLY ];
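COMMIT and ROLLBACK can be demonstrated with sqlite3, whose Python API exposes them as `commit()` and `rollback()` on the connection (the table here is a throwaway example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")
conn.rollback()   # ROLLBACK: undo work done since the last COMMIT
after_rollback = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

conn.execute("INSERT INTO t VALUES (2)")
conn.commit()     # COMMIT: save work done
after_commit = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The rolled-back insert leaves the table empty, while the committed insert persists.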
SIMPLE QUERIES & NESTED QUERIES
A Simple Query is a query that searches using just one parameter. A simple query might use all
of the fields in a table, or only those fields from which information is required, but it will still
use just one parameter (search criterion).
The following are some types of queries:
• A select query retrieves data from one or more of the tables in your database, or other
queries there, and displays the results in a datasheet. You can also use a select query to
group data, and to calculate sums, averages, counts, and other types of totals.
• A parameter query is a type of select query that prompts you for input before it runs. The
query then uses your input as criteria that control your results. For example, a typical
parameter query asks you for starting high and low values, and only returns records that
fall within those values.
• A cross-tab query uses row headings and column headings so you can see your data in
terms of two categories at once.
• An action query alters your data or your database. For example, you can use an action
query to create a new table, or add, delete, or change your data.
A Nested Query or a subquery or inner query is a query in a query.
A subquery is usually added in the WHERE clause of an SQL statement. Most of the time, a
subquery is used when you know how to search for a value using a SELECT statement, but do
not know the exact value.
A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
A query result can be used in the condition of a WHERE clause; in such a case the query is called a
subquery, and the complete SELECT statement is called a nested query. A subquery can also be
placed within a HAVING clause, but a subquery cannot be used in an ORDER BY
clause.
Subqueries are queries nested inside other queries, marked off with parentheses, and sometimes
referred to as "inner" queries within "outer" queries. Most often, you see subqueries in WHERE
or HAVING clauses.
A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT,
UPDATE, or DELETE statement, or inside another subquery.
A subquery can appear anywhere an expression can be used, if it returns a single value.
Statements that include a subquery usually take one of these formats:
WHERE expression [NOT] IN (subquery).
WHERE expression comparison_operator [ANY | ALL] (subquery).
WHERE [NOT] EXISTS (subquery).
Following are the TYPES of Nested Queries:
Single - Row Subqueries
The single-row subquery returns one row. A special case is the scalar subquery, which returns a
single row with one column. Scalar subqueries are acceptable (and often very useful) in virtually
any situation where you could use a literal value, a constant, or an expression. The single-row
query can use any comparison operator (=, <=, >=, <>, <, >). If any of these operators is used
with a subquery that returns more than one row, the query will fail.
Multiple-row subqueries
Multiple-row subqueries return sets of rows. These queries are commonly used to generate result
sets that will be passed to a DML or SELECT statement for further processing. Both single-row
and multiple-row subqueries will be evaluated once, before the parent query is run. Since it
returns multiple values, the query must use the set comparison operators (IN, ALL, ANY). If you
use a multiple-row subquery with the equals comparison operator, the database will return an error
if more than one row is returned. The operators in the following table can be used with multiple-row
subqueries:
Symbol Meaning
IN equal to any member in a list
ANY returns rows that match any value on a list
ALL returns rows that match all the values in a list
Multiple–Column Subquery
A subquery that compares more than one column between the parent query and the subquery is
called a multiple-column subquery. In multiple-column subqueries, rows in the subquery
results are evaluated in the main query in pair-wise comparison: that is, column-to-column
comparison and row-to-row comparison.
Correlated Subquery
A correlated subquery has a more complex method of execution than single- and multiple-row
subqueries and is potentially much more powerful. If a subquery references columns in the
parent query, then its result will be dependent on the parent query. This makes it impossible to
evaluate the subquery before evaluating the parent query.
Some points to remember about subqueries are:
• Subqueries are queries nested inside other queries, marked off with parentheses.
• The result of the inner query is passed to the outer query for the preparation of the final result.
• The ORDER BY clause is not supported for nested queries.
• The BETWEEN operator cannot be applied to a subquery (though it can be used within one).
• A single-row subquery returns only a single value to the outer query.
• A subquery must be placed on the right-hand side of the comparison operator.
• A query can contain more than one subquery.
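A single-row (scalar) subquery and a multiple-row subquery with IN can both be demonstrated with sqlite3; the Students table and fine values here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (Stu_ID INTEGER, Name TEXT, Fine INTEGER)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(1, "Adam", 10), (2, "Alex", 30), (3, "Stuart", 30), (4, "Ritu", 0)])

# Single-row (scalar) subquery: students paying the maximum fine
rows = cur.execute("""
    SELECT Name FROM Students
    WHERE Fine = (SELECT MAX(Fine) FROM Students)
""").fetchall()

# Multiple-row subquery with IN: students whose fine matches any fine over 20
rows_in = cur.execute("""
    SELECT Name FROM Students
    WHERE Fine IN (SELECT Fine FROM Students WHERE Fine > 20)
""").fetchall()
```

The inner SELECT runs first and its result is passed to the outer query's WHERE clause, exactly as described above.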
NORMALIZATION
Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminating redundant data (for example, storing the same data in
more than one table) and ensuring data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.
Normalization is a process, in which we systematically examine relations for anomalies and,
when detected, remove those anomalies by splitting up the relation into two new, related
relations.
Normalization is an important part of the database development process: Often during
normalization, the database designers get their first real look into how the data are going to
interact in the database.
Finding problems with the database structure at this stage is strongly preferred to finding
problems further along in the development process because at this point it is fairly easy to cycle
back to the conceptual model (Entity Relationship model) and make changes. Normalization can
also be thought of as a trade-off between data redundancy and performance. Normalizing a
relation reduces data redundancy but introduces the need for joins when all of the data is required
by an application such as a report query.
Problems without Normalization
Without normalization, it becomes difficult to handle and update the database without facing
data loss. Insertion, updation and deletion anomalies are very frequent if the database is not
normalized. To understand these anomalies, let us take the example of a Student table.
S_id S_name S_address Subject_opted
401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physic
Updation Anomaly:
To update the address of a student who occurs twice or more in the table, we have to
update the S_address column in all those rows; otherwise the data will become inconsistent.
Insertion Anomaly:
Suppose that for a new admission we have the S_id (student id), name and address of the student,
but the student has not opted for any subject yet; then we have to insert NULL there, leading to an
insertion anomaly.
Deletion Anomaly:
If S_id 401 has only one subject and temporarily drops it, deleting that row will delete the entire
student record along with it.
BENEFITS OF NORMALIZATION
Normalization produces smaller tables with smaller rows:
More rows per page (less logical I/O)
More rows per I/O (more efficient)
More rows fit in cache (less physical I/O)
The benefits of normalization include:
Searching, sorting, and creating indexes are faster, since tables are narrower, and more
rows fit on a data page.
You usually have more tables.
You can have more clustered indexes (one per table), so you get more flexibility in tuning
queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs
change.
NORMAL FORMS (1NF, 2NF, 3NF, BCNF)
Relations can fall into one or more categories (or classes) called Normal Forms .
Normal Form: A class of relations free from a certain set of modification anomalies.
Normal forms are given names such as:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
These forms are cumulative: a relation in Third Normal Form is also in 2NF and 1NF.
The Normalization Process for a given relation consists of:
Apply the definition of each normal form (starting with 1NF).
If a relation fails to meet the definition of a normal form, change the relation (most often by
splitting the relation into two new relations) until it meets the definition.
Re-test the modified/new relations to ensure they meet the definitions of each normal form.
First Normal Form (1NF)
A relation is in first normal form if it meets the definition of a relation:
1. Each attribute (column) value must be a single value only.
2. All values for a given attribute (column) must be of the same type.
3. Each attribute (column) name must be unique.
4. The order of attributes (columns) is insignificant
5. No two tuples (rows) in a relation can be identical.
6. The order of the tuples (rows) is insignificant
Each table should be organized into rows, and each row should have a primary key that
distinguishes it as unique. The primary key is usually a single column, but sometimes more than
one column can be combined to create a single primary key.
For example consider a table is not in first normal form
Dev Sanskriti Vishwavidyalaya,Haridwar,UK | www.dsvv.ac.in 83
In first normal form, no row may have a column in which more than one value is saved (for
example, values separated with commas); instead, we must separate such data into multiple rows.
Table in first Normal Form
Student Age Subject
Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths
In first normal form, data redundancy increases, as many rows will repeat the same data in
some columns, but each row as a whole will be unique.
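The flattening step described above can be sketched in a few lines of Python (a minimal illustration; the tuples simply mirror the example tables, and are not part of any DBMS):

```python
# Flattening a non-1NF table (multi-valued Subject column) into 1NF:
# each comma-separated subject becomes its own row.
non_1nf = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

first_nf = [
    (student, age, subject.strip())
    for student, age, subjects in non_1nf
    for subject in subjects.split(",")
]
# Each resulting row holds exactly one value per column,
# and no two rows are identical.
```

Note that the row for Adam becomes two rows, matching the 1NF table shown above.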
Second Normal Form (2NF)
A relation is in second normal form (2NF) if all of its non-key attributes are dependent on
all of the key.
Another way to say this: A relation is in second normal form if it is free from partial-key
dependencies
Relations that have a single attribute for a key are automatically in 2NF. This is one reason why
we often use artificial identifiers (non-composite keys) as keys.
As per the second normal form, there must not be any partial dependency of any column on the
primary key. This means that, for a table with a concatenated (composite) primary key, each
column that is not part of the primary key must depend upon the entire concatenated key for its
existence. If any column depends on only one part of the concatenated key, the table fails
second normal form.
In the example of First Normal Form, there are two rows for Adam, to include the multiple
subjects that he has opted for. While this is searchable, and follows First Normal Form, it is an
inefficient use of space. The original, unnormalized table stored the subjects in a single column:
Student Age Subject
Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths
Also, in the table in first normal form, the candidate key is {Student, Subject}, yet Age depends
on the Student column alone, which is incorrect as per second normal form. To achieve second
normal form, it would be helpful to split out the subjects into an independent table, and match
them up using the student names as foreign keys.
New student table following second normal form will be:
Student Age
Adam 15
Alex 14
Stuart 17
In the student table, the candidate key will be the Student column, because the only other
column, Age, depends on it.
New subject table introduced for second normal form will be:
Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths
In the subject table, the candidate key will be the {Student, Subject} combination. Both of the
above tables now qualify for second normal form and will never suffer update anomalies.
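The decomposition above can be sketched with Python's built-in sqlite3 module (an illustrative sketch; the table and column names simply follow the example):

```python
import sqlite3

# 2NF decomposition of the example above: Age depends only on Student,
# so it moves into its own table keyed by Student.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (name TEXT PRIMARY KEY, age INTEGER)")
conn.execute("""CREATE TABLE subject (
                    name TEXT REFERENCES student(name),
                    subject TEXT,
                    PRIMARY KEY (name, subject))""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Adam", 15), ("Alex", 14), ("Stuart", 17)])
conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [("Adam", "Biology"), ("Adam", "Maths"),
                  ("Alex", "Maths"), ("Stuart", "Maths")])
# Updating Adam's age now touches exactly one row -- no update anomaly.
conn.execute("UPDATE student SET age = 16 WHERE name = 'Adam'")
```

Before the split, the same update would have had to change every row listing one of Adam's subjects.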
Third Normal Form (3NF)
A relation is in third normal form (3NF) if it is in second normal form and it contains no
transitive dependencies. Consider relation R containing attributes A, B and C. R(A, B, C)
If A → B and B → C then A → C
Transitive Dependency: Three attributes with the above dependencies
Third normal form requires that every non-prime attribute of the table depend directly on the
primary key; transitive functional dependencies must be removed. The table must also be in
second normal form. For example, consider a table with the following fields.
Student_Detail table:
Student_id Student_name DOB Street City State Zip
In this table, student_id is the primary key, but street, city, and state depend upon zip. The
dependency between zip and these other fields is a transitive dependency. Hence, to apply third
normal form, we need to move street, city, and state to a new table, with zip as the primary key.
New Student_Detail Table:
Student_id Student_name DOB Zip
Address_Table:
Zip Street City State
The advantages of removing transitive dependency are:
1. The amount of data duplication is reduced.
2. Data integrity is achieved.
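The 3NF split above can be sketched in sqlite3 (the sample values are made up purely for illustration):

```python
import sqlite3

# 3NF decomposition sketched above: street/city/state depend on zip,
# so they move to an address table keyed by zip.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE address (
                    zip TEXT PRIMARY KEY,
                    street TEXT, city TEXT, state TEXT)""")
conn.execute("""CREATE TABLE student_detail (
                    student_id INTEGER PRIMARY KEY,
                    student_name TEXT, dob TEXT,
                    zip TEXT REFERENCES address(zip))""")
conn.execute("INSERT INTO address VALUES ('249411', 'Main Road', 'Haridwar', 'UK')")
conn.execute("INSERT INTO student_detail VALUES (1, 'Ravi', '1995-01-01', '249411')")
# A join reconstructs the original flat record on demand, while the
# address details are stored exactly once per zip.
row = conn.execute("""SELECT s.student_name, a.city
                      FROM student_detail s
                      JOIN address a ON s.zip = a.zip""").fetchone()
```

If a zip's city name ever changes, only the one row in address needs updating.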
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if, and only if, every determinant is a candidate key.
The difference between 3NF and BCNF is that, for a functional dependency A -> B,
3NF allows this dependency in a relation if B is a primary-key attribute and A is not a
candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must
be a candidate key.
Boyce-Codd normal form is a stricter version of third normal form. It deals with a certain type
of anomaly that is not handled by third normal form. A third normal form table which does not
have multiple overlapping candidate keys is guaranteed to be in BCNF.
ClientInterview relation:
ClientNo interviewDate InterviewTime StaffNo roomNo
CR76 13/5/02 10:30 SG5 G101
CR76 13/5/02 12:00 SG5 G101
CR74 13/5/02 12:00 SG37 G102
CR56 1/7/02 10:30 SG5 G102
1. FD1: clientNo, interviewDate -> interviewTime, staffNo, roomNo (primary key)
2. FD2: staffNo, interviewDate, interviewTime -> clientNo (candidate key)
3. FD3: roomNo, interviewDate, interviewTime -> clientNo, staffNo (candidate key)
4. FD4: staffNo, interviewDate -> roomNo (not a candidate key)
As a consequence the ClientInterview relation may suffer from update anomalies.
For example, two tuples have to be updated if the roomNo needs to be changed for staffNo
SG5 on 13-May-02.
To transform the ClientInterview relation to BCNF, we must remove the violating
functional dependency by creating two new relations called Interview and StaffRoom as
shown below:
1. Interview (clientNo, interviewDate, interviewTime, staffNo)
2. StaffRoom (staffNo, interviewDate, roomNo)
Interview
ClientNo InterviewDate InterviewTime StaffNo
CR76 13/5/02 10:30 SG5
CR76 13/5/02 12:00 SG5
CR74 13/5/02 12:00 SG37
CR56 1/7/02 10:30 SG5
StaffRoom
staffNo InterviewDate RoomNo
SG5 13/5/02 G101
SG37 13/5/02 G102
SG5 1/7/02 G102
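After the split, the determinant of FD4 is a key of its own relation. A quick Python check (using the StaffRoom rows above as plain tuples) confirms that no (staffNo, interviewDate) pair repeats, so changing a room now updates exactly one tuple:

```python
# The StaffRoom relation after BCNF decomposition.
staff_room = [
    ("SG5",  "13/5/02", "G101"),
    ("SG37", "13/5/02", "G102"),
    ("SG5",  "1/7/02",  "G102"),
]

# FD4's determinant (staffNo, interviewDate) should now be a key:
# every determinant value must identify exactly one tuple.
determinants = [(staff, date) for staff, date, _ in staff_room]
is_key = len(determinants) == len(set(determinants))
```

With `is_key` true, the update anomaly described above (two tuples to change for SG5's room) can no longer occur in StaffRoom.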
FUNCTIONAL DEPENDENCY
A functional dependency is a relationship that exists when one attribute uniquely determines
another attribute. This can be written A -> B, which is the same as stating "B is functionally
dependent upon A".
Example: If R is a relation with attributes X and Y, a functional dependency between the
attributes is represented as X->Y, which specifies Y is functionally dependent on X. Here X is a
determinant set and Y is a dependent attribute. Each value of X is associated precisely with one
Y value.
Functional dependency in a database serves as a constraint between two sets of attributes.
Defining functional dependencies is an important part of relational database design and
underpins normalization.
Consider an Example:
REPORT (Student#, Course#, CourseName, IName, Room#, Marks, Grade), where:
Student# - student number
Course# - course number
CourseName - course name
IName - name of the instructor who delivered the course
Room# - room number assigned to the respective instructor
Marks - marks scored in course Course# by student Student#
Grade - grade obtained by student Student# in course Course#
Student# and Course# together (called a composite attribute) define EXACTLY
ONE value of Marks. This can be symbolically represented as
(Student#, Course#) -> Marks
REMARK: This type of dependency is called a functional dependency. In the above example,
Marks is functionally dependent on (Student#, Course#).
Other functional dependencies in the above example are:
• Course# -> CourseName
• Course# -> IName (assuming one course is taught by one and only one instructor)
• IName -> Room# (assuming each instructor has his/her own, non-shared room)
• Marks ->Grade
Formally, we can define functional dependency as: in a given relation R with attributes X and
Y, attribute Y is functionally dependent on attribute X if each value of X determines exactly
one value of Y. This is represented as X -> Y; note that X may be composite in nature.
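The formal definition translates directly into a small check: X -> Y holds if no two tuples agree on X but differ on Y. A hedged Python sketch (column indices and sample rows are illustrative, not part of any standard API):

```python
def holds_fd(rows, x_cols, y_cols):
    """Return True if the attributes at positions x_cols functionally
    determine those at y_cols in the given list of tuples."""
    seen = {}
    for row in rows:
        x = tuple(row[i] for i in x_cols)
        y = tuple(row[i] for i in y_cols)
        if seen.setdefault(x, y) != y:
            return False   # same X value, two different Y values
    return True

# REPORT-style rows: (Student#, Course#, Marks)
report = [(1, "C1", 80), (1, "C2", 75), (2, "C1", 90)]
fd_ok = holds_fd(report, x_cols=(0, 1), y_cols=(2,))   # Student#,Course# -> Marks
fd_bad = holds_fd([(1, "C1", 80), (1, "C1", 85)], (0, 1), (2,))
```

Here `fd_ok` is true because each (Student#, Course#) pair maps to one Marks value, while `fd_bad` is false because the same pair maps to two different values.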
UNIT 5
RELATIONAL DATABASE DESIGN
Relational Database Design: Introduction to Relational Database Design, DBMS
v/s RDBMS. Integrity rule, Concept of Concurrency Control and Database
Security.
INTRODUCTION TO RELATIONAL DATABASE
DESIGN
Just as a house without a foundation will fall over, a database with poorly designed tables and
relationships will fail to meet the needs of its users. And hence, the need of a sound relational
database design originates.
The History of Relational Database Design
Dr. E. F. Codd first introduced formal relational database design in 1969 while he was at IBM.
Relational theory, which is based on set theory, applies to both databases and database
applications. Codd developed 12 rules that determine how well an application and its data adhere
to the relational model. Since Codd first conceived these 12 rules, the number of rules has
expanded into the hundreds.
Goals of Relational Database Design
The number one goal of relational database design is to, as closely as possible, develop a
database that models some real-world system. This involves breaking the real-world system into
tables and fields and determining how the tables relate to each other. Although on the surface
this task might appear to be trivial, it can be an extremely cumbersome process to translate a
real-world system into tables and fields.
A properly designed database has many benefits. The processes of adding, editing, deleting, and
retrieving table data are greatly facilitated by a properly designed database. In addition, reports
are easier to build. Most importantly, the database becomes easy to modify and maintain.
Rules of Relational Database Design
To adhere to the relational model, tables must follow certain rules. These rules determine what is
stored in tables and how the tables are related.
1. The Rules of Tables
Each table in a system must store data about a single entity. An entity usually represents a real-
life object or event. Examples of objects are customers, employees, and inventory items.
Examples of events include orders, appointments, and doctor visits.
2. The Rules of Uniqueness and Keys
Tables are composed of rows and columns. To adhere to the relational model, each table must
contain a unique identifier. Without a unique identifier, it becomes programmatically impossible
to uniquely address a row. You guarantee uniqueness in a table by designating a primary key,
which is a single column or a set of columns that uniquely identifies a row in a table.
Each column or set of columns in a table that contains unique values is considered a candidate
key. One candidate key becomes the primary key. The remaining candidate keys become
alternate keys. A primary key made up of one column is considered a simple key. A primary key
comprising multiple columns is considered a composite key. It is generally a good idea to pick a
primary key that is
Minimal (has as few columns as possible)
Stable (rarely changes)
Simple (is familiar to the user)
Following these rules greatly improves the performance and maintainability of your database
application, particularly if you are dealing with large volumes of data.
3. The Rules of Foreign Keys and Domains
A foreign key in one table is the field that relates to the primary key in a second table. For
example, the CustomerID is the primary key in the Customers table. It is the foreign key in
the Orders table. A domain is a pool of values from which columns are drawn. A simple
example of a domain is the specific data range of employee hire dates. In the case of the
Orders table, the domain of the CustomerID column is the range of values for the
CustomerID in the Customers table.
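The CustomerID relationship described above can be sketched in sqlite3 (an illustration, with lowercase column names; the key point is that the foreign key's domain is the set of existing CustomerID values):

```python
import sqlite3

# CustomerID is the primary key of Customers and a foreign key in Orders.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
                    order_id INTEGER PRIMARY KEY,
                    customer_id INTEGER REFERENCES customers(customer_id))""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1)")   # valid: customer 1 exists

# A customer_id outside the domain of existing customers is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Note the `PRAGMA foreign_keys = ON`: unlike most servers, SQLite leaves foreign-key enforcement off by default.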
4. Normalization and Normal Forms
Some of the most difficult decisions that you face as a developer are what tables to create and
what fields to place in each table, as well as how to relate the tables that you create.
Normalization is the process of applying a series of rules to ensure that your database achieves
optimal structure. Normal forms are a progression of these rules. Each successive normal form
achieves a better database design than the previous form did. Although there are several levels of
normal forms, it is generally sufficient to apply only the first three levels of normal forms.
5. Denormalization - Purposely Violating the Rules
Although the developer's goal is normalization, often it makes sense to deviate from normal
forms. We refer to this process as denormalization. The primary reason for applying
denormalization is to enhance performance. If you decide to denormalize, document your
decision. Make sure that you make the necessary application adjustments to ensure that you
properly maintain the denormalized fields. Finally, test to ensure that the denormalization
process actually improves performance.
6. Integrity Rules
Although integrity rules are not part of normal forms, they are definitely part of the database
design process. Integrity rules are broken into two categories. They include overall integrity rules
and database-specific integrity rules.
7. Database-Specific Rules
The other set of rules applied to a database are not applicable to all databases but are, instead,
dictated by business rules that apply to a specific application. Database-specific rules are as
important as overall integrity rules. They ensure that only valid data is entered into a database.
An example of a database-specific integrity rule is that the delivery date for an order must fall
after the order date.
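The delivery-date rule mentioned above can be expressed as a CHECK constraint, sketched here in sqlite3 (real systems often enforce such business rules in triggers or application code instead; dates are illustrative):

```python
import sqlite3

# Database-specific rule: the delivery date must fall after the order date.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
                    order_id INTEGER PRIMARY KEY,
                    order_date TEXT,
                    delivery_date TEXT,
                    CHECK (delivery_date > order_date))""")
conn.execute("INSERT INTO orders VALUES (1, '2014-07-01', '2014-07-05')")  # valid

# A delivery date before the order date violates the business rule.
try:
    conn.execute("INSERT INTO orders VALUES (2, '2014-07-01', '2014-06-30')")
    check_rejected = False
except sqlite3.IntegrityError:
    check_rejected = True
```

ISO-format date strings (`YYYY-MM-DD`) compare correctly as text, which is what makes the CHECK expression work here.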
(Also, see Codd’s 12 rules)
Examining the Types of Relationships
Three types of relationships can exist between tables in a database: one-to-many, one-to-one, and
many-to-many. Setting up the proper type of relationship between two tables in your database is
imperative. The right type of relationship between two tables ensures
Data integrity
Optimal performance
Ease of use in designing system objects
The reasons behind these benefits are covered throughout this chapter. Before you can
understand the benefits of relationships, though, you must understand the types of relationships
available.
One-to-Many
A one-to-many relationship is by far the most common type of relationship. In a one-to-many
relationship, a record in one table can have many related records in another table. A common
example is a relationship set up between a Customers table and an Orders table. For each
customer in the Customers table, you want to have more than one order in the Orders table.
On the other hand, each order in the Orders table can belong to only one customer. The
Customers table is on the one side of the relationship, and the Orders table is on the many
side. For you to implement this relationship, the field joining the two tables on the one side of the
relationship must be unique.
One-to-One
In a one-to-one relationship, each record in one table can have at most one matching record in
the other table. This relationship is not common and is used only in special circumstances.
Usually, if you have set up a one-to-one relationship, you should instead have combined the
fields from both tables into one table.
Many-to-Many
In a many-to-many relationship, records in both tables have matching records in the other table.
An example is an Orders table and a Products table. Each order probably will contain
multiple products, and each product is found on many different orders. The solution is to create a
third table called OrderDetails. You relate the OrderDetails table to the Orders table
in a one-to-many relationship based on the OrderID field. You relate it to the Products table
in a one-to-many relationship based on the ProductID field.
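The OrderDetails resolution described above can be sketched in sqlite3 (illustrative names; the junction table carries one foreign key to each side):

```python
import sqlite3

# Two one-to-many relationships resolve the many-to-many
# between Orders and Products via an OrderDetails junction table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE order_details (
                    order_id INTEGER REFERENCES orders(order_id),
                    product_id INTEGER REFERENCES products(product_id),
                    quantity INTEGER,
                    PRIMARY KEY (order_id, product_id))""")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10, 'pen'), (11, 'ink')])
conn.executemany("INSERT INTO order_details VALUES (?, ?, ?)",
                 [(1, 10, 2), (1, 11, 1), (2, 10, 5)])
# Product 10 appears on two different orders; order 1 contains two products.
n_orders_with_pen = conn.execute(
    "SELECT COUNT(*) FROM order_details WHERE product_id = 10").fetchone()[0]
```

The composite primary key on (order_id, product_id) prevents listing the same product twice on one order.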
DBMS VS RDBMS
History of DBMS and RDBMS
Database management systems first appeared on the scene in the 1960s as computers began to
grow in power and speed. By the mid-1960s, there were several commercial applications on the
market that were capable of producing "navigational" databases. These navigational databases
maintained records that could only be processed sequentially, which required a lot of computer
resources and time.
Relational database management systems were first suggested by Edgar Codd in the 1970s.
Because navigational databases could not be "searched", Edgar Codd suggested another model
that could be followed to construct a database. This was the relational model, which allowed
users to "search" it for data. It included the integration of the navigational model, along with a
tabular and hierarchical model.
Difference between DBMS and RDBMS
DBMS:
A DBMS is a storage area that persists the data in files. To perform database
operations, the file must be in use.
A relationship can be established between two files.
There are limitations on storing records in a single database file, depending upon the
database manager used.
Data is stored in flat files with metadata.
DBMS does not support client/server architecture.
DBMS does not follow normalization.
Only a single user can access the data at a time.
DBMS does not impose integrity constraints.
ACID properties of the database must be implemented by the user or the developer.
DBMS is used for simpler applications.
Small sets of data can be managed by a DBMS.
RDBMS:-
RDBMS stores the data in tabular form.
It imposes additional conditions to support a tabular structure and to enforce
relationships among tables.
RDBMS supports client/server architecture.
RDBMS follows normalization.
RDBMS allows simultaneous access to data tables by multiple users.
RDBMS imposes integrity constraints.
ACID properties of the database are defined in the integrity constraints.
RDBMS is used for more complex applications.
Large sets of data require an RDBMS solution.
INTEGRITY RULE
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its
entire life-cycle and is a critical aspect to the design, implementation and usage of any system
which stores, processes, or retrieves data.
Data integrity is the opposite of data corruption, which is a form of data loss. The overall intent
of any data integrity technique is the same: ensure data is recorded exactly as intended (such as
a database correctly rejecting mutually exclusive possibilities), and upon later retrieval, ensure
the data is the same as it was when originally recorded. In short, data integrity aims to prevent
unintentional changes to information. Data integrity is not to be confused with data security,
the discipline of protecting data from unauthorized parties.
Any unintended change to data as the result of a storage, retrieval or processing operation,
including malicious intent, unexpected hardware failure, and human error, is a failure of data
integrity. If the change results from unauthorized access, it may also be a failure of data
security.
TYPES OF INTEGRITY RULES/CONSTRAINTS
Data integrity is normally enforced in a database system by a series of integrity constraints or
rules. Three types of integrity constraints are an inherent part of the relational data model: entity
integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule
which states that every table must have a primary key and that the column or columns
chosen to be the primary key should be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential integrity rule
states that any foreign-key value can only be in one of two states. The usual state of
affairs is that the foreign key value refers to a primary key value of some table in the
database. Occasionally, and this will depend on the rules of the data owner, a foreign-key
value can be null. In this case we are explicitly saying that either there is no relationship
between the objects represented in the database or that this relationship is unknown.
Domain integrity specifies that all columns in relational database must be declared upon
a defined domain. The primary unit of data in the relational data model is the data item.
Such data items are said to be non-decomposable or atomic. A domain is a set of values
of the same type. Domains are therefore pools of values from which actual values
appearing in the columns of a table are drawn.
User-defined integrity refers to a set of rules specified by a user, which do not belong to
the entity, domain and referential integrity categories.
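Entity and domain integrity from the list above can be demonstrated together in sqlite3 (illustrative table; the primary key enforces entity integrity, the CHECK clause defines the column's domain):

```python
import sqlite3

# Entity integrity: the primary key must be unique and not null.
# Domain integrity: grade values must come from a declared domain.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
                    emp_id INTEGER PRIMARY KEY NOT NULL,
                    grade TEXT CHECK (grade IN ('A', 'B', 'C')))""")
conn.execute("INSERT INTO employee VALUES (1, 'A')")

violations = []
for row in [(1, 'B'),     # duplicate key: entity integrity violation
            (2, 'Z')]:    # value outside the domain: domain integrity violation
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        violations.append(row)
```

Both bad rows are rejected by the database itself, with no application-level checking required.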
If a database supports these features, it is the responsibility of the database to ensure data
integrity as well as the consistency model for data storage and retrieval. If a database does not
support these features, it is the responsibility of the applications to ensure data integrity while
the database supports the consistency model for data storage and retrieval.
Having a single, well-controlled, and well-defined data-integrity system increases:
stability (one centralized system performs all data integrity operations)
performance (all data integrity operations are performed in the same tier as the
consistency model)
re-usability (all applications benefit from a single centralized data integrity system)
maintainability (one centralized system for all data integrity administration).
Many companies, and indeed many database systems themselves, offer products and services to
migrate out-dated and legacy systems to modern databases to provide these data-integrity
features. This offers organizations substantial savings in time, money, and resources because
they do not have to develop per-application data-integrity systems that must be re-factored each
time business requirements change.
Example
An example of a data-integrity mechanism is the parent-and-child relationship of related
records. If a parent record owns one or more related child records, all of the referential
integrity processes are handled by the database itself, which automatically ensures the accuracy
and integrity of the data, so that no child record can exist without a parent (also called being
orphaned) and no parent loses its child records. It also ensures that no parent record can be
deleted while it owns any child records. All of this is handled at the database level and does not
require coding integrity checks into each application.
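The parent-and-child mechanism can be seen directly in sqlite3 (a sketch; with foreign keys enforced, the default referential action blocks deleting a parent that still owns children):

```python
import sqlite3

# A parent row that owns child rows cannot be deleted,
# so no child is ever orphaned.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
                    id INTEGER PRIMARY KEY,
                    parent_id INTEGER REFERENCES parent(id))""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")

try:
    conn.execute("DELETE FROM parent WHERE id = 1")   # would orphan child 10
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```

Declaring `ON DELETE CASCADE` instead would delete the children along with the parent; either way, orphans never appear.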
CONCEPT OF CONCURRENCY CONTROL
Definition
Concurrency control is a database management system (DBMS) concept used to address
conflicts from the simultaneous accessing or altering of data that can occur in a multi-user
system. Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous
transactions while preserving data integrity. In short, concurrency control governs multi-user
access to the database.
Illustrative Example
To illustrate the concept of concurrency control, consider two travelers who go to electronic
kiosks at the same time to purchase a train ticket to the same destination on the same train.
There's only one seat left in the coach, but without concurrency control, it's possible that both
travelers will end up purchasing a ticket for that one seat. However, with concurrency control,
the database wouldn't allow this to happen. Both travellers would still be able to access the train
seating database, but concurrency control would preserve data accuracy and allow only one
traveler to purchase the seat.
This example also illustrates the importance of addressing this issue in a multi-user database.
Obviously, one could quickly run into problems with the inaccurate data that can result from
several transactions occurring simultaneously and writing over each other. The following section
provides strategies for implementing concurrency control.
Database transaction and the ACID rules
The concept of a database transaction (or atomic transaction) has evolved in order to enable
both a well understood database system behavior in a faulty environment where crashes can
happen any time, and recovery from a crash to a well understood database state. A database
transaction is a unit of work, typically encapsulating a number of operations over a database
(e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in
database and also other systems. Each transaction has well defined boundaries in terms of which
program/code executions are included in that transaction (determined by the transaction's
programmer via special transaction commands). Every database transaction obeys the following
rules (by support in the database system; i.e., a database system is designed to guarantee them for
the transactions it runs):
Atomicity - Either the effects of all or none of its operations remain ("all or nothing"
semantics) when a transaction is completed (committed or aborted respectively). In other
words, to the outside world a committed transaction appears (by its effects on the
database) to be indivisible (atomic), and an aborted transaction does not affect the
database at all, as if never happened.
Consistency - Every transaction must leave the database in a consistent (correct) state,
i.e., maintain the predetermined integrity rules of the database (constraints upon and
among the database's objects). A transaction must transform a database from one
consistent state to another consistent state (however, it is the responsibility of the
transaction's programmer to make sure that the transaction itself is correct, i.e., performs
correctly what it intends to perform (from the application's point of view) while the
predefined integrity rules are enforced by the DBMS). Thus since a database can be
normally changed only by transactions, all the database's states are consistent.
Isolation - Transactions cannot interfere with each other (as an end result of their
executions). Moreover, usually (depending on concurrency control method) the effects of
an incomplete transaction are not even visible to another transaction. Providing isolation
is the main goal of concurrency control.
Durability - Effects of successful (committed) transactions must persist through crashes
(typically by recording the transaction's effects and its commit event in a non-volatile
memory).
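Atomicity, the first of the rules above, can be seen in miniature with sqlite3 (a sketch with illustrative account names; the simulated crash stands in for any mid-transaction failure):

```python
import sqlite3

# "All or nothing": a transfer that fails part-way leaves no trace
# after rollback.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("INSERT INTO account VALUES ('B', 0)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("crash before the matching credit")   # simulated failure
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
except RuntimeError:
    conn.rollback()   # the partial debit is undone

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the rollback the database is back in its last committed, consistent state: A still holds 100 and B holds 0.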
Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction
concurrency exists. However, if concurrent transactions with interleaving operations are allowed
in an uncontrolled manner, some unexpected, undesirable result may occur, such as:
1. The lost update problem: A second transaction writes a second value of a data-item
(datum) on top of a first value written by a first concurrent transaction, and the first value
is lost to other transactions running concurrently which need, by their precedence, to read
the first value. The transactions that have read the wrong value end with incorrect results.
2. The dirty read problem: Transactions read a value written by a transaction that has been
later aborted. This value disappears from the database upon abort, and should not have
been read by any transaction ("dirty read"). The reading transactions end with incorrect
results.
3. The incorrect summary problem: While one transaction takes a summary over the values
of all the instances of a repeated data-item, a second transaction updates some instances
of that data-item. The resulting summary does not reflect a correct result for any (usually
needed for correctness) precedence order between the two transactions (if one is executed
before the other), but rather some random result, depending on the timing of the updates,
and whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to meet their
performance requirements. Thus, without concurrency control such systems can neither provide
correct results nor maintain their databases consistent.
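The lost update problem from the list above can be reproduced deterministically in Python (a single-threaded simulation of two interleaved transactions; the lock shows the pessimistic fix):

```python
import threading

# Lost update: two transactions read the same balance, then both write,
# and the first write is overwritten.
balance = 0
t1_read = balance          # T1 reads 0
t2_read = balance          # T2 also reads 0, before T1 writes
balance = t1_read + 100    # T1 writes 100
balance = t2_read + 100    # T2 overwrites with 100 -- T1's update is lost
lost_update_balance = balance

# With a lock serializing each read-modify-write, both deposits survive.
balance = 0
lock = threading.Lock()
for _ in range(2):
    with lock:
        balance = balance + 100
safe_balance = balance
```

The unsafe interleaving ends with 100 instead of 200; the locked version, in which no transaction can read between another's read and write, ends with 200.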
Concurrency Control Locking Strategies
Pessimistic Locking: This concurrency control strategy involves keeping an entity in a database
locked the entire time it exists in the database's memory. This limits or prevents users from
altering the data entity that is locked. There are two types of locks that fall under the category of
pessimistic locking: write lock and read lock.
With write lock, everyone but the holder of the lock is prevented from reading, updating, or
deleting the entity. With read lock, other users can read the entity, but no one except for the lock
holder can update or delete it.
Optimistic Locking: This strategy can be used when instances of simultaneous transactions, or
collisions, are expected to be infrequent. In contrast with pessimistic locking, optimistic locking
doesn't try to prevent the collisions from occurring. Instead, it aims to detect these collisions and
resolve them on the chance occasions when they occur.
Pessimistic locking provides a guarantee that database changes are made safely. However, it
becomes less viable as the number of simultaneous users or the number of entities involved in a
transaction increase because the potential for having to wait for a lock to release will increase.
Optimistic locking can alleviate the problem of waiting for locks to release, but then users have
the potential to experience collisions when attempting to update the database.
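A common way to implement optimistic locking is a version column, sketched here in sqlite3 against the train-seat example from earlier (illustrative schema; an update succeeds only if the version is unchanged since it was read):

```python
import sqlite3

# Optimistic locking: each row carries a version number; a writer's
# UPDATE matches only if the version it read is still current.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER)")
conn.execute("INSERT INTO seat VALUES (1, NULL, 0)")

read_version = 0   # both travellers read the row at version 0
# Traveller 1 commits first; the version check passes and bumps it to 1.
cur = conn.execute("""UPDATE seat SET owner = 'T1', version = version + 1
                      WHERE id = 1 AND version = ?""", (read_version,))
t1_won = cur.rowcount == 1
# Traveller 2's update finds version 1, not 0: the collision is detected.
cur = conn.execute("""UPDATE seat SET owner = 'T2', version = version + 1
                      WHERE id = 1 AND version = ?""", (read_version,))
t2_won = cur.rowcount == 1
```

No row is ever locked, yet only one traveller's purchase succeeds; the other detects the collision (zero rows updated) and can re-read and retry.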
Lock Problems:
Deadlock:
When dealing with locks, two problems can arise, the first of which is deadlock. Deadlock
refers to a situation where two or more processes are each waiting for another to
release a resource, or more than two processes are waiting for resources in a circular chain.
Deadlock is a common problem in multiprocessing where many processes share a specific type
of mutually exclusive resource. Some computers, usually those intended for the time-sharing
and/or real-time markets, are often equipped with a hardware lock, or hard lock, which
guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly
disconcerting because there is no general solution to avoid them.
A fitting analogy of the deadlock problem could be a situation like when you go to unlock your
car door and your passenger pulls the handle at the exact same time, leaving the door still locked.
If you have ever been in a situation where the passenger is impatient and keeps trying to open the
door, it can be very frustrating. Basically you can get stuck in an endless cycle, and since both
actions cannot be satisfied, deadlock occurs.
Livelock:
Livelock is a special case of resource starvation. A livelock is similar to a deadlock, except that
the states of the processes involved constantly change with regard to one another while never
progressing. The general definition only states that a specific process is not progressing. For
example, the system keeps selecting the same transaction for rollback causing the transaction to
never finish executing. Another livelock situation can come about when the system is deciding
which transaction gets a lock and which waits in a conflict situation.
An illustration of livelock occurs when numerous people arrive at a four way stop, and are not
quite sure who should proceed next. If no one makes a solid decision to go, and all the cars just
keep creeping into the intersection afraid that someone else will possibly hit them, then a kind of
livelock can happen.
Basic Timestamping:
Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method
doesn't use locks to control concurrency, so it is impossible for deadlock to occur. According to
this method a unique timestamp is assigned to each transaction, usually showing when it was
started. This effectively allows an age to be assigned to transactions and an order to be assigned.
Data items have both a read-timestamp and a write-timestamp. These timestamps are updated
each time the data item is read or updated respectively.
Problems arise in this system when a transaction tries to read a data item that has already been written by a younger transaction. This is called a late read: the data item has changed since the reading transaction started. The solution is to roll back the offending transaction and restart it with a new (larger) timestamp. The symmetric problem occurs when a transaction tries to write a data item that has already been read by a younger transaction. This is called a late write: the data item has been read by another transaction since the start time of the transaction that is altering it. The solution is the same as for the late read: the transaction is rolled back and restarted with a new timestamp.
Adhering to these rules allows the transactions to be serialized, so a chronological schedule of transactions can be constructed. Timestamping may not be practical for larger databases with high transaction volumes, however, since a considerable amount of storage space must be dedicated to the timestamps.
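The rules above can be sketched in a few lines of Python. This is an illustrative toy, not a real DBMS component; the class and method names are assumptions made for the example.

```python
# Hypothetical sketch of basic timestamp ordering. Each data item keeps a
# read-timestamp and a write-timestamp; each transaction carries the
# unique timestamp assigned at its start. Late reads and late writes
# abort the transaction, which restarts with a fresh (larger) timestamp.

import itertools

_clock = itertools.count(1)     # monotonically increasing timestamps

class Abort(Exception):
    """Raised on a late read / late write; restart with a new timestamp."""

class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0
        self.value = None

class Transaction:
    def __init__(self):
        self.ts = next(_clock)  # start timestamp = the transaction's age

    def read(self, item):
        if self.ts < item.write_ts:                  # late read
            raise Abort("item written by a younger transaction")
        item.read_ts = max(item.read_ts, self.ts)
        return item.value

    def write(self, item, value):
        if self.ts < item.read_ts or self.ts < item.write_ts:  # late write
            raise Abort("item read/written by a younger transaction")
        item.write_ts = self.ts
        item.value = value

# A late read in action: older t1 tries to read what younger t2 wrote.
x = Item()
t1, t2 = Transaction(), Transaction()    # t1 is older (smaller ts)
t2.write(x, 42)
try:
    t1.read(x)
except Abort:
    t1 = Transaction()                   # restart with a fresh timestamp
    print(t1.read(x))                    # 42
```

Because no transaction ever waits for a lock, no wait-for cycle can form, which is why deadlock is impossible under this scheme.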
DATABASE SECURITY
"Secret Passwords, iron bolts, gated driveways, access cards, etc. - layers of physical security in
the real world are also found in the database world as well …. Creating and enforcing security
procedures helps to protect what is rapidly becoming the most important corporate asset:
DATA."
Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the
database systems, the database servers and the associated network links) against compromises of
their confidentiality, integrity and availability. The three main objectives of database security
are:
1. Secrecy / confidentiality: Information is not disclosed to unauthorized users. Private data remains private.
2. Integrity: Ensuring data are accurate; data must be protected from unauthorized modification or destruction (only authorized users can modify data).
3. Availability: Ensuring data are accessible whenever the organization needs them (authorized users should not be denied access).
To achieve these objectives, the following are employed:
1. A clear and consistent security policy: what security measures are to be enforced, what data is to be protected, and which users get access to which portions of the data.
2. Security mechanisms of the underlying DBMS and OS, together with external mechanisms such as securing access to buildings. In other words, security measures must be taken at several levels to ensure proper security.
Authorization and Authentication are the two A's of security that every secure system must be good at.
The sources of external security threats are:
1. Physical threats: physical threats to the hardware of the database system. These may arise from dangers to buildings or the network, or from human error (e.g. privileged accounts left logged in).
2. Hackers & crackers:
White hat hackers: the "good guys", hired to test and fix systems; they do not release information about a system's vulnerabilities to the public until they are fixed.
Script kiddies: hacker "wannabes" with little programming skill who rely on tools written by others.
Black hat hackers: hackers motivated by greed or a desire to cause harm; the most dangerous kind, very knowledgeable, and their activities are often undetectable.
Cyber-terrorists: hackers motivated by a political, religious or philosophical agenda. They may try to deface websites that support opposing positions; in the current global climate there are fears they may even attempt to disable networks that handle utilities such as nuclear plants and water systems.
3. Types of Attacks:
Denial of Service (DoS) attack: A denial-of-service (DoS) or distributed denial-of-service (DDoS) attack is an attempt to make a machine or network resource unavailable to its intended users. Although the means, motives, and targets of a DoS attack vary, it generally consists of efforts to temporarily or indefinitely interrupt or suspend the services of a host connected to the Internet. By way of clarification: distributed denial-of-service attacks are launched by two or more persons or bots, while denial-of-service attacks are launched by one person or system.
Buffer Overflow: This attack exploits a programming error in the system. A buffer overflow occurs when data written to a buffer corrupts data values in memory addresses adjacent to the destination buffer, due to insufficient bounds checking. This can occur when copying data from one buffer to another without first checking that the data fits within the destination buffer. (SQL injection, another very popular attack, likewise exploits unchecked input, although it is an injection flaw rather than an overflow.)
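The SQL injection mentioned above can be both demonstrated and prevented with Python's standard sqlite3 module. This is a minimal sketch with a made-up table and data, not production code.

```python
# Hypothetical sketch: why unsanitized input is dangerous, and how a
# parameterized query avoids interpreting user input as SQL code.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

evil = "' OR '1'='1"            # a classic injection payload

# UNSAFE: string concatenation lets the attacker's quote break out of
# the string literal, turning the WHERE clause into a tautology.
unsafe_sql = "SELECT * FROM users WHERE password = '" + evil + "'"
print(conn.execute(unsafe_sql).fetchall())   # [('alice', 's3cret')] -- leaked!

# SAFE: the ? placeholder binds the whole input as a value, never as SQL.
safe = conn.execute("SELECT * FROM users WHERE password = ?", (evil,))
print(safe.fetchall())                       # [] -- no match
```

The design lesson is general: user input should reach the query engine only as bound parameters, never by splicing strings into SQL text.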
Malware: Malware, short for malicious software, is any software used to disrupt computer
operation, gather sensitive information, or gain access to private computer systems. It can appear
in the form of executable code, scripts, active content, and other software. 'Malware' is a general
term used to refer to a variety of forms of hostile or intrusive software.
Social Engineering: The psychological manipulation of people into performing actions or
divulging confidential information. A type of confidence trick for the purpose of information
gathering, fraud, or system access, it differs from a traditional "con" in that it is often one of
many steps in a more complex fraud scheme.
Brute force: A cryptanalytic attack that can, in theory, be used against any encrypted data (except data encrypted in an information-theoretically secure manner). Such an attack might be used when it is not possible to exploit other weaknesses in the encryption system (if any exist) that would make the task easier. It consists of systematically checking all possible keys or passwords until the correct one is found; in the worst case, this involves traversing the entire search space.
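The "systematically check every candidate" idea can be sketched over a deliberately tiny key space. This is an illustrative Python toy; real key spaces are astronomically larger, which is exactly what makes brute force impractical against strong keys.

```python
# Hypothetical sketch: exhaustive (brute-force) search over a tiny
# 3-letter alphabet, recovering a password from its leaked hash.

import hashlib
from itertools import product

ALPHABET = "abc"
TARGET = hashlib.sha256(b"cab").hexdigest()   # pretend this hash leaked

def brute_force(max_len):
    """Try every string over ALPHABET up to max_len; worst case this
    traverses the entire search space."""
    for length in range(1, max_len + 1):
        for combo in product(ALPHABET, repeat=length):
            guess = "".join(combo)
            if hashlib.sha256(guess.encode()).hexdigest() == TARGET:
                return guess
    return None

print(brute_force(3))   # 'cab'
```

Here the search space is only 3 + 9 + 27 = 39 candidates; an 8-character password over 94 printable characters already has about 6 x 10^15.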
Having seen the sources of external security threats, let us study the sources of internal security threats.
Employee threats may be either intentional or accidental.
Intentional employee threats:
personnel who employ hacking techniques to upgrade their legitimate access to root or administrator;
personnel who take advantage of legitimate access to divulge trade secrets or steal money, for personal or political gain;
family members of employees who are visiting the office and have been given access;
personnel who break into a secure machine room to gain physical access to mainframe and other large-system consoles;
former employees seeking revenge.
Unintentional / accidental employee threats:
falling victim to a social engineering attack (unknowingly helping a hacker)
unknowingly revealing confidential information
accidental physical damage leading to data loss
inaccurate / improper usage
Other threats:
electrical power fluctuations
hardware failures
natural disasters: fires, floods
Knowing the sources of both external and internal security threats, let us move to the solutions, which are likewise both external and internal.
Some External solutions to the security issues are:
1. Securing the perimeter: firewalls
2. Handling malware
3. Fixing buffer overflows
4. Physical server security:
security cameras; smart locks; removal of signs from machine/server room or hallways
(so that no one can locate sensitive hardware rooms easily); privileged accounts must
never be left logged in.
5. User Authentication:
Positive user identification requires three things:
a) something the user knows: user IDs and passwords
b) something the user has: physical login devices (e.g., for $5, PayPal sends a small device that generates a one-time password)
c) something the user is: biometrics
6. VPNs:
provide encryption for data transmissions over the Internet; use the IPSec protocol.
7. Combating Social Engineering
8. Handling other employee threats:
policies; employee training sessions; when an employee is fired, their accounts are promptly disabled; etc.
Some internal solutions to the security threats are:
1. Internal database user IDs and passwords.
2. Control of access rights to tables, views and their components. The typical SQL-based DBMS provides six types of access rights: SELECT (to retrieve rows), INSERT, UPDATE, DELETE, REFERENCES (to reference the table via a foreign key), and ALL PRIVILEGES.
3. Using an authorization matrix: the set of roles required by each business user, typically kept as a spreadsheet listing the roles and, for each role, the transactions it permits. When a new user joins the organization, the roles they require can be found from their FUG (Functional User Group) in the authorization matrix.
4. Database implementations (data dictionary): a data dictionary is one tool organizations can use to help ensure data accuracy.
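An authorization matrix can be modelled as a mapping from roles to permitted actions and from users to roles. The sketch below uses hypothetical role, user, and action names purely for illustration; real matrices are usually maintained in a spreadsheet or the DBMS catalog.

```python
# Hypothetical sketch of an authorization matrix: roles map to the
# actions (here "PRIVILEGE:table" strings) they permit, and users map
# to roles. A user may act only if some role of theirs permits it.

ROLE_MATRIX = {
    "acctg_mgr": {"SELECT:order_summary", "UPDATE:order_summary"},
    "intern":    {"SELECT:order_summary"},
}

USER_ROLES = {
    "alice": {"acctg_mgr"},
    "bob":   {"intern"},
}

def is_allowed(user, action):
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLE_MATRIX[r] for r in USER_ROLES.get(user, ()))

print(is_allowed("alice", "UPDATE:order_summary"))   # True
print(is_allowed("bob", "UPDATE:order_summary"))     # False
```

Keeping permissions on roles rather than on individual users means a new joiner only needs to be assigned the right roles, exactly as the FUG lookup described above.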
GRANTING & REVOKING ACCESS RIGHTS:
Granting and revoking access rights is one of the most visible security features of a DBMS. Using the corresponding commands, permissions on the various objects of the database can be granted or revoked. The following SQL commands grant and revoke access rights on a table or view to one or more users.
Granting Rights:
Syntax:
GRANT type_of_rights ON table_or_view_name TO user_id
Examples:
GRANT SELECT ON order_summary TO acctg_mgr
GRANT SELECT ON order_summary TO acctg_mgr WITH GRANT OPTION
(now the user can also grant / pass the rights on to others)
GRANT SELECT, UPDATE (retail_price, distributor_name) ON item TO intern1, intern2, intern3
GRANT SELECT ON order_summary TO PUBLIC
Revoking Rights:
Syntax:
REVOKE type_of_rights ON table_or_view_name FROM user_id
Examples:
The examples are similar to those for granting rights. If the user has already passed the rights on to others, then:
REVOKE SELECT ON order_summary FROM acctg_mgr RESTRICT
(fails if the rights have been passed on to anyone)
REVOKE SELECT ON order_summary FROM acctg_mgr CASCADE
(also revokes the rights from all users to whom they have been passed)
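The interplay of WITH GRANT OPTION with RESTRICT and CASCADE revocation can be sketched by tracking who granted a right to whom. This illustrative Python model (the user names echo the SQL examples above) is a deliberate simplification: each user has at most one grantor, and only a single right is modelled.

```python
# Hypothetical sketch: a grant chain for one right. grants maps each
# grantee to the grantor who gave them the right; users who received
# the right WITH GRANT OPTION may pass it on, forming a chain.

grants = {}   # grantee -> grantor

def grant(grantor, grantee):
    grants[grantee] = grantor

def revoke(user, cascade):
    """RESTRICT (cascade=False) fails if the user has passed the right
    on; CASCADE (cascade=True) also revokes it from everyone downstream."""
    dependents = [g for g, by in grants.items() if by == user]
    if dependents and not cascade:
        raise RuntimeError("RESTRICT: %s has passed rights on" % user)
    for d in dependents:
        revoke(d, cascade=True)
    grants.pop(user, None)

grant("dba", "acctg_mgr")        # GRANT ... TO acctg_mgr WITH GRANT OPTION
grant("acctg_mgr", "intern1")    # acctg_mgr passes the right on

try:
    revoke("acctg_mgr", cascade=False)   # RESTRICT fails: right was passed
except RuntimeError as e:
    print(e)

revoke("acctg_mgr", cascade=True)        # CASCADE also strips intern1
print(sorted(grants))                    # []
```

The CASCADE walk mirrors what the DBMS records in its catalog: revoking from the middle of a grant chain must also invalidate everything granted further down it.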
End.