Introduction Database Concepts
CO attained : CO1
Hours Required: 05
Self Study: 08
Prepared and presented by :
Ms. Swati Abhang
Contents
Introduction
Characteristics of databases
File system vs. Database system
Users of a Database system
Data Models, Schemas, and Instances
Three-Schema Architecture and Data Independence
Database Administrator (DBA), Role of a DBA
What is a Database System?
[Figure labels: End-user, Application program, DBMS]
What is a DATABASE?
Definition
A database is a well-organized collection of data that are
related in a meaningful way and that can be accessed in
different logical orders but are stored only once. The data in
the database are therefore integrated, structured, and shared.
What is a Database System? (cont.)
Major components of a database system:
Data: integrated and shared.
Hardware: disk, CPU, Main Memory, ...
Software: DBMS
Users:
1. Application programmers
2. End users
3. Database administrator (DBA)
Defining external schema
Defining conceptual schema
Defining internal schema
Liaison with users
Defining security and integrity checks
Defining backup and recovery procedures
Monitoring performance and changing requirements
Why Database?
Redundancy can be reduced
Inconsistency can be avoided
The data can be shared
Standards can be enforced
Security restrictions can be applied
Integrity can be maintained
Provision of data independence
objective !
Characteristics of databases
Persistent Data
Metadata and self-describing Nature
Data Independence
Access flexibility and Security
Characteristics of DBMS
To incorporate the requirements of the organization, the
system should be designed for easy maintenance.
Information systems should allow interactive access to
data, so that new information can be obtained without
writing fresh programs.
The system should be designed to correlate different data to
meet new requirements.
An independent central repository that describes the
available data and its meaning is required.
Characteristics of DBMS (cont.)
An integrated database helps in understanding the
interrelationships between data stored in different applications.
The stored data should be available for access by
different users simultaneously.
An automatic recovery feature has to be provided to
overcome problems caused by processing or system failures.
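One common form of automatic recovery is the atomic transaction: if a failure interrupts a multi-step update, the DBMS undoes the steps already performed. The sketch below illustrates this with Python's standard `sqlite3` module. The `account` table, the `transfer` and `balances` helpers, and the rule that a balance may not go negative are assumptions made for the example, not part of the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between two accounts; either both updates happen or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a later failure leaves something to undo.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects this debit if it would overdraw src.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit above has been rolled back

def balances(conn):
    """Return the current balance of every account as a dict."""
    return dict(conn.execute("SELECT id, balance FROM account"))

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

Calling `transfer(conn, "A", "B", 500)` fails on the debit, and the credit already applied to B is rolled back, so both balances stay as they were.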
Functions of the DBMS
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Security and Integrity
Data Recovery and Concurrency
Data Dictionary
Performance
History of Database Systems
1950s and early 1960s:
Data processing using magnetic tapes for storage
Tapes provided only sequential access
Punched cards for input
Late 1960s and 1970s:
Hard disks allowed direct access to data
Network and hierarchical data models in widespread use
Ted Codd defines the relational data model
Would win the ACM Turing Award for this work
IBM Research begins System R prototype
UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing
History (cont.)
1980s:
Research relational prototypes evolve into commercial systems
SQL becomes industry standard
Parallel and distributed database systems
Object-oriented database systems
1990s:
Large decision support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
Early 2000s:
XML and XQuery standards
Automated database administration
Later 2000s:
Giant data storage systems
Google Bigtable, Yahoo PNUTS, Amazon, ..
Database: Historical Roots
Manual File System
to keep track of data
used tagged file folders in a filing cabinet
organized according to expected use
e.g. file per customer
easy to create, but hard to
locate data
aggregate/summarize data
Computerized File System
to accommodate the data growth and information need
manual file system structures were duplicated in the computer
Data Processing (DP) specialists wrote customized programs to
write, delete, update data (i.e. management)
extract and present data in various formats (i.e. report)
S511 Session 2, IU-SLIS 13
File System: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
File System: Weakness
Weakness
“Islands of data” in scattered file systems.
Problems
Duplication
same data may be stored in multiple files
Inconsistency
same data may be stored under different names and in different formats
Rigidity
requires customized programming to implement any changes
cannot do ad-hoc queries
Implications
Waste of space
Data inaccuracies
High overhead of data manipulation and maintenance
File System: Problem Case
File            Field name   Field size   Sample value
CUSTOMER file   A_Name       15 char      Carol Johnson
AGENT file      A_Name       20 char      Carol T. Johnson
SALES file      AGENT        20 char      Carol J. Smith
Problems shown:
- inconsistent field name, field size
- inconsistent data values
- data duplication
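The problem case above can be sketched in code. The first half models the three separate files, each with its own copy of the agent name, assuming the three sample values are meant to describe the same agent. The second half is one possible database layout in which the name is stored once and the other tables refer to it by key. The `agent_id` column, the table layouts, and the helper names are assumptions made for the example.

```python
import sqlite3

# The three separate files, each holding its own copy of the agent name.
customer_file = [{"A_Name": "Carol Johnson"}]
agent_file = [{"A_Name": "Carol T. Johnson"}]
sales_file = [{"AGENT": "Carol J. Smith"}]

def distinct_agent_spellings():
    """Collect every spelling of the agent name found across the files."""
    spellings = set()
    for record in customer_file + agent_file:
        spellings.add(record["A_Name"])
    for record in sales_file:  # same data, different field name
        spellings.add(record["AGENT"])
    return spellings

def build_shared_database():
    """Store the agent name once; other tables point to it by agent_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, a_name TEXT NOT NULL);
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                               agent_id INTEGER REFERENCES agent(agent_id));
        CREATE TABLE sales (sale_id INTEGER PRIMARY KEY,
                            agent_id INTEGER REFERENCES agent(agent_id));
        INSERT INTO agent VALUES (1, 'Carol T. Johnson');
        INSERT INTO customer VALUES (10, 1);
        INSERT INTO sales VALUES (100, 1);
    """)
    return conn

def agent_name_seen_by(conn, table):
    """Look up the agent name through the given referencing table."""
    if table not in ("customer", "sales"):  # fixed whitelist, so the f-string is safe
        raise ValueError("unknown table")
    row = conn.execute(
        f"SELECT a.a_name FROM {table} AS t "
        "JOIN agent AS a ON a.agent_id = t.agent_id"
    ).fetchone()
    return row[0]
```

With the shared layout, a single `UPDATE agent SET a_name = ...` changes the name seen from both `customer` and `sales`, so the two can no longer disagree.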
Database System vs. File System
Hierarchical Database
Background
Developed to manage large amounts of data for complex manufacturing projects
e.g., Information Management System (IMS)
IBM-Rockwell joint venture
clustered related data together
hierarchically associated data clusters using pointers
Hierarchical Database Model
Assumes data relationships are hierarchical
One-to-Many (1:M) relationships
Each parent can have many children
Each child has only one parent
Logically represented by an upside down tree
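The 1:M rule of the hierarchical model can be sketched as a tree whose nodes each hold a single parent pointer. This is only an illustration of the logical structure, not of how IMS stores data. The `Segment` class and the sample names are assumptions made for the example.

```python
class Segment:
    """One node of the hierarchy: many children, at most one parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None   # a child has only one parent
        self.children = []   # a parent can have many children

    def add_child(self, child):
        # Reject a second parent: that would be an M:N relationship,
        # which the hierarchical model cannot represent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)

    def path(self):
        """Follow parent pointers up to the root and return the access path."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

# Hypothetical manufacturing hierarchy.
project = Segment("Project")
assembly = Segment("Assembly")
part = Segment("Part")
project.add_child(assembly)
assembly.add_child(part)
```

Here `part.path()` walks the parent pointers and returns `["Project", "Assembly", "Part"]`, which is the only route by which the part can be reached.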
Hierarchical Database: Example
Hierarchical Database: Pros & Cons
Advantages
Conceptual simplicity
groups of data could be related to each other
related data could be viewed together
Centralization of data
reduced redundancy and promoted consistency
Disadvantages
Limited representation of data relationships
did not allow Many-to-Many (M:N) relations
Complex implementation
required in-depth knowledge of physical data storage
Structural Dependence
data access requires physical storage path
Lack of Standards
limited portability
Data Manipulation Language (DML)
Language for accessing and manipulating the data
organized by the appropriate data model
DML also known as query language
Two classes of languages
Procedural – user specifies what data is required and how to
get those data
Declarative (nonprocedural) – user specifies what data is
required without specifying how to get those data
SQL is the most widely used query language
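The difference between the two classes can be sketched with Python's standard `sqlite3` module. The declarative version states only what is wanted and leaves the access strategy to the DBMS. The procedural version spells out the scan, the filter, and the ordering itself (it still reads rows through SQL, since `sqlite3` offers no lower-level access). The sample rows and function names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-215", "Mianus", 700), ("A-102", "Perryridge", 400)],
)
conn.commit()

def declarative(conn, minimum):
    """Say WHAT is required; the DBMS decides how to find it."""
    rows = conn.execute(
        "SELECT account_number FROM account "
        "WHERE balance >= ? ORDER BY account_number",
        (minimum,),
    )
    return [row[0] for row in rows]

def procedural(conn, minimum):
    """Say WHAT is required and HOW: scan every row, test it, then sort."""
    result = []
    for account_number, balance in conn.execute(
        "SELECT account_number, balance FROM account"
    ):
        if balance >= minimum:
            result.append(account_number)
    result.sort()
    return result
```

Both functions return the same accounts for the same `minimum`; only the declarative one leaves the DBMS free to choose a different access path, such as an index.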
Data Definition Language (DDL)
Specification notation for defining the database schema
Example:
    create table account (
        account_number char(10),
        branch_name char(10),
        balance integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)
Database schema
Data storage and definition language
Specifies the storage structure and access methods used
Integrity constraints
Domain constraints
Referential integrity (e.g. branch_name must correspond to a valid branch in the branch table)
Authorization
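The DDL ideas above can be tried out with Python's standard `sqlite3` module. The sketch extends the `account` example with a `branch` table so that referential integrity has something to refer to, adds a `CHECK` clause as one possible domain constraint, and reads table names back from SQLite's catalog table `sqlite_master`, which plays the role of the data dictionary. The `branch` table layout, the non-negative balance rule, and the helper names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE branch (branch_name CHAR(10) PRIMARY KEY);
    CREATE TABLE account (
        account_number CHAR(10) PRIMARY KEY,
        branch_name    CHAR(10) REFERENCES branch(branch_name),  -- referential integrity
        balance        INTEGER CHECK (balance >= 0)              -- domain constraint (assumed rule)
    );
    INSERT INTO branch VALUES ('Downtown');
""")

def try_insert(conn, account_number, branch_name, balance):
    """Insert one account; return False if an integrity constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO account VALUES (?, ?, ?)",
                (account_number, branch_name, balance),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def catalog_tables(conn):
    """Read the table names back from the catalog (metadata, not user data)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

An account at the existing `Downtown` branch is accepted. One that names an unknown branch, or carries a negative balance, is rejected by the DBMS itself rather than by application code.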
Users of Database system
Database Users
Database Administrators
In a database environment, the primary resource is the
database itself and the secondary resource is the DBMS and
related software
authorizing access to the database
coordinating and monitoring its use
acquiring software and hardware resources as needed
Database Designers
identifying the data to be stored in the database
choosing appropriate structures to represent and store this
data
these tasks are mostly undertaken before the database is
actually implemented and populated with data
Database Users (cont.)
communicate with all prospective database users, in order to understand their requirements
develop a view of the database that meets the data and processing requirements for each group of users
These views are then analyzed and integrated with the views of other user groups. The final database design must be capable of supporting the requirements of all user groups
End Users
require access to the database for querying, updating, and generating reports
Casual end users: occasionally access the database
need different information each time
learn only a few facilities that they may use repeatedly.
Database Users (cont.)
use a sophisticated database query language to specify their requests
typically middle- or high-level managers or other occasional browsers
Naive or parametric end users
constantly query and update the database, using standard types of queries and updates, called canned transactions, that have been carefully programmed and tested
need to learn very little about the facilities provided by the DBMS
Bank tellers check account balances and post withdrawals and deposits
Reservation clerks for airlines, hotels, and car rental companies check availability for a given request and make reservations
Clerks at receiving stations for courier mail enter package identifications via bar codes and descriptive information through buttons to update a central database of received and in-transit packages
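A canned transaction can be pictured as a pre-written, tested routine in which the end user supplies only the parameters. The sketch below shows two such routines for the bank teller example, using Python's standard `sqlite3` module. The `account` table, the function names, and the rule that a withdrawal may not overdraw the account are assumptions made for the example.

```python
import sqlite3

def check_balance(conn, account_number):
    """Canned query: the teller supplies only the account number."""
    row = conn.execute(
        "SELECT balance FROM account WHERE account_number = ?",
        (account_number,),
    ).fetchone()
    return None if row is None else row[0]

def post_withdrawal(conn, account_number, amount):
    """Canned update: withdraw only if the account holds enough money."""
    with conn:  # one transaction per teller action
        cursor = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE account_number = ? AND balance >= ?",
            (amount, account_number, amount),
        )
        return cursor.rowcount == 1  # True when exactly one account was debited

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()
```

The teller never writes SQL. The statements are fixed in advance, and the `?` placeholders keep the supplied values separate from the query text.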
Database Users (cont.)
Sophisticated end users
Engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS so as to implement their applications to meet their complex requirements
Try to learn most of the DBMS facilities in order to achieve their complex requirements
Stand-alone users
Maintain personal databases by using ready-made program packages that provide easy-to-use menu- or graphics-based interfaces. An example is the user of a tax package that stores a variety of personal financial data for tax purposes
Typically become very proficient in using a specific software package
Database Users (cont.)
System Analysts and Application Programmers
Determine the requirements of end users, especially naive and
parametric end users, and develop specifications for canned transactions
that meet these requirements
Application programmers implement these specifications as programs;
then they test, debug, document, and maintain these canned transactions
Workers behind the Scene
Typically do not use the database for their own purposes
DBMS system designers and implementers
design and implement the DBMS modules (for implementing the catalog,
query language, interface processors, data access, concurrency control,
recovery, and security) and interfaces as a software package
Database Users (cont.)
Tool developers
Tools are optional packages that are often purchased
separately
include packages for database design, performance monitoring,
natural language or graphical interfaces, prototyping, simulation,
and test data generation.
Operators and maintenance personnel
system administration personnel who are responsible for the
actual running and maintenance of the hardware and software
environment for the database system
Data Models
A collection of tools for describing
Data
Data relationships
Data semantics
Data constraints
Relational model
Entity-Relationship data model (mainly for database design)
Object-based data models (Object-oriented and Object-relational)
Semistructured data model (XML)
Other older models:
Network model
Hierarchical model
Relational Model
All the data is stored in various tables.
Example of tabular data in the relational model
[Figure labels: Columns, Rows]
A Sample Relational Database
Instances and Schemas
Similar to types and variables in programming languages
Logical schema – the overall logical structure of the database
Example: The database consists of information about a set of customers and accounts in a bank and the relationship between them
Analogous to type information of a variable in a program
Physical schema – the overall physical structure of the database
Instance – the actual content of the database at a particular point in time
Analogous to the value of a variable
Physical Data Independence – the ability to modify the physical schema without changing the logical schema
Applications depend on the logical schema
In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
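The distinction between schema, instance, and physical data independence can be observed directly in a small experiment with Python's standard `sqlite3` module. In the sketch below, inserting rows changes the instance but not the logical schema, and adding an index changes the physical organization without affecting either the logical schema or the query the application runs. The sample rows, the index name `idx_account_branch`, and the helper names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_number CHAR(10), branch_name CHAR(10), balance INTEGER)"
)

def logical_schema(conn):
    """Column names and declared types: comparable to a variable's type."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

def instance(conn):
    """The rows stored right now: comparable to a variable's value."""
    return conn.execute(
        "SELECT account_number, branch_name, balance "
        "FROM account ORDER BY account_number"
    ).fetchall()

def downtown_accounts(conn):
    """An application query written against the logical schema only."""
    rows = conn.execute(
        "SELECT account_number FROM account "
        "WHERE branch_name = ? ORDER BY account_number",
        ("Downtown",),
    )
    return [row[0] for row in rows]

schema_at_creation = logical_schema(conn)

# The instance changes; the schema does not.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)],
)
conn.commit()

# A physical-level change: add an index. The application query is untouched.
result_before_index = downtown_accounts(conn)
conn.execute("CREATE INDEX idx_account_branch ON account (branch_name)")
result_after_index = downtown_accounts(conn)
```

The function `downtown_accounts` never mentions the index, which is the sense in which applications depend on the logical schema only.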
Architecture of DBMS
The three levels of database architecture are:
1. Physical Level
2. Conceptual Level
3. External Level
[Figure: Three Levels Architecture of DBMS]
Mapping is the process of transforming requests and
responses between the levels of the database architecture.
Mapping is not well suited to small databases, because it
takes extra time.
In external/conceptual mapping, the DBMS transforms a
request on an external schema into a request against the
conceptual schema.
In conceptual/internal mapping, the request is transformed
from the conceptual level to the internal level.
1. Physical Level
The physical level describes the physical storage structure of data in the database.
It is also known as the internal level.
This level is very close to the physical storage of data.
At the lowest level, data is stored in the form of bits at physical addresses on the secondary storage device.
At the highest level, it can be viewed in the form of files.
The internal schema defines the various stored data types.
It uses a physical data model.
2. Conceptual Level
The conceptual level describes the structure of the whole
database for a group of users.
It is also called the data model.
The conceptual schema is a representation of the entire
content of the database.
This schema contains all the information needed to build
the relevant external records.
It hides the internal details of physical storage.
3. External Level
The external level relates to the data as viewed by
individual end users.
This level includes a number of user views or external
schemas.
This level is closest to the user.
An external view describes the segment of the database that
is required for a particular user group and hides the rest
of the database from that user group.
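In a relational DBMS, external schemas are commonly realized as views defined over the conceptual schema. The sketch below, using Python's standard `sqlite3` module, defines one base table and two views for two hypothetical user groups. Each group sees only its own segment of the data, and the DBMS maps requests on a view onto the base table. The table contents, the view names, and the `columns` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the whole database.
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        branch_name    TEXT,
        balance        INTEGER
    );
    INSERT INTO account VALUES ('A-101', 'Downtown', 500);
    INSERT INTO account VALUES ('A-102', 'Perryridge', 400);
    INSERT INTO account VALUES ('A-201', 'Downtown', 900);

    -- External view for Downtown branch staff: only their own accounts.
    CREATE VIEW downtown_accounts AS
        SELECT account_number, balance
        FROM account
        WHERE branch_name = 'Downtown';

    -- External view for management: totals only, no individual accounts.
    CREATE VIEW branch_totals AS
        SELECT branch_name, SUM(balance) AS total
        FROM account
        GROUP BY branch_name;
""")

def columns(conn, relation):
    """Return the column names a user of the given table or view can see."""
    if relation not in ("account", "downtown_accounts", "branch_totals"):
        raise ValueError("unknown relation")  # fixed whitelist keeps the f-string safe
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]
```

A query such as `SELECT * FROM downtown_accounts` is answered from the `account` table, yet its users never see `branch_name` or the accounts of other branches.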
Architecture of DBMS / Database System Internals
Responsibilities of DBA
Installing, configuring, and upgrading database server software such as Microsoft SQL Server, MySQL, or Oracle.
Evaluating the features of various databases.
Establishing and maintaining sound backup and recovery policies and procedures.
Taking care of database design and implementation.
Implementing and maintaining database security.
Database tuning, application tuning, and performance monitoring.
Maintaining documentation and standards.
Providing technical troubleshooting and consultation to development teams.
Skill set required to be a successful
Database Administrator
Problem Management
Incident Management
Change Management
Capacity Planning
Types of DBA
1. Administrative DBA
2. Development DBA
3. Architect
4. Data Warehouse DBA
5. OLAP DBA
Types of DBA (cont.)
1. Administrative DBA
An administrative DBA maintains the server and
keeps it running.
An administrative DBA is mostly concerned with backups,
security, replication, etc.
2. Development DBA
A development DBA builds queries, stored procedures, etc.
that meet business needs.
A development DBA is equivalent to a programmer.
Types of DBA (cont.)
3. Architect
An architect designs the schema and builds tables, primary keys,
foreign keys, etc. to meet business needs.
4. Data Warehouse DBA
A data warehouse DBA is responsible for merging data from
multiple sources into a data warehouse.
5. OLAP DBA
An OLAP DBA builds multi-dimensional cubes for decision
support or OLAP systems.
In SQL Server (Analysis Services), the primary language for querying cubes is MDX.
Important Questions
1. What is a database? Discuss its main features and explain the importance of each feature.
2. List the characteristics of databases.
3. Describe the disadvantages of using a file processing system compared to a database system.
4. What is a DBMS? What tasks does a DBMS carry out?
5. What classes of users use database systems?
6. List the three views of a database system in the three-level architecture. Describe the role of each view.
7. List the major components of the DBMS architecture and describe their functions.
8. Describe the role of the DBA.
9. What is meant by the logical and physical structure of data?
10. Explain the difference between logical and physical data independence.
11. Explain the difference between the external, internal, and conceptual views in the three-tier database architecture. How are these schema layers related to the concepts of logical and physical data independence?