1 - Introduction

CS 222 Database Management System Spring 2010-11 Introduction Korra Sathya Babu Department of Computer Science NIT Rourkela CS 222 Introduction 1


CS 222 Database Management System

Spring 2010-11

Introduction

Korra Sathya Babu

Department of Computer Science, NIT Rourkela

CS 222 Introduction 1


Course Overview

• Introduction
– Historical Perspectives, Data Independence, Architectures, Data Models, Relational Languages
• Database Design
– Functional Dependency, Decomposition, Normalization
• Query Processing
– Query plan costing & optimization, join strategies, …
• Storage
– Hardware architecture, File Organization, Indexing, hashing, buffer management

Course Overview


• Concurrency Control
– Transaction, Serializability, Concurrency Control Mechanisms, Recovery Strategies
• Advanced Database Concepts
– Database Security, OO Databases, Distributed Databases, Real-time Databases

Historical Perspectives

• For thousands of years, people kept records on clay tablets, palm leaves, rock, timber, and bone, and through ceremony, dance, music, poetry, and story.

• When mankind moved into the infancy years of computing, only the storage medium changed: it became electromagnetic.

• Information storage has increased drastically

• Then came the era of file processing

Historical Perspectives

• During the File processing years, total data and intelligence of the organization was resident in the program code or


• Collections of individual files were accessed by application programs

• File processing had its own disadvantages
– Data duplication
– Inconsistency of data
– No integrity and security features
– Less data independence

Historical Perspectives

• First general-purpose DBMS (IDS) developed by Charles Bachman (Turing Award in 1973) at GE in the early 1960s
– Used the Network Model

• CODASYL formed (1959)
– To guide the development of a standard programming language and to promote the analysis, design, and implementation of data systems
– Charles Bachman, within CODASYL, founded the DBTG (Data Base Task Group); CODASYL was responsible for the development of COBOL

• IMS by IBM in the late 1960s
– Used the Hierarchical Model

• E. F. Codd proposed the Relational Model (1970) at the San Jose Research Lab
– IBM refused to implement it, to preserve revenue from IMS/DB
– The System R project started at IBM due to pressure from customers
– Managers working on the project misunderstood Codd's idea and used a non-relational language, SEQUEL (Structured English Query Language), instead of the Alpha language proposed by Codd
– Codd received the Turing Award in 1981 for his work on the Relational Model

Historical Perspectives


• SQL/DS (Structured Query Language/Data System) was IBM's first commercial implementation for its mainframe computers

• The relational model had a strong mathematical foundation
– Codd proposed twelve rules that a system must satisfy to be called relational
– The majority of systems available today are pseudo-relational

• SQL was standardized in the 1980s
– The SQL:1999 standard was adopted by ANSI and ISO

• James Gray (Turing Award in 1998) contributed to Transaction Management

• Many advances in the 1980s and 1990s
– Rich data models, powerful query languages, XML, data warehouses
– Emergence of ERP and MRP (Material Requirements Planning) packages


DBMS Today

• DBMSs continue to gain importance as more and more data are brought online and companies adopt these technologies for their day-to-day operations

• The field is driven by exciting visions such as
– Interactive video
– Streaming data
– Digital libraries
– Scientific projects like Human Genome Mapping
– NASA's Earth Observation System
– Consolidated databases for mining and decision-making processes

• There is still a long way to go


DBMS

• A collection of inter-related data is called a database

• Software that manages a collection of inter-related data is called a DBMS

Advantages of DBMS

• Data Independence
• Efficient Data Access
• Data Integrity and Security
• Data Administration
• Concurrent Access and Crash Recovery
• Reduced Application Development Time

Levels of Abstraction (ANSI/SPARC)


Data Independence is attained


Conceptual Schema

• The first schema to be considered
• Also called the Logical Schema
• It describes the data in terms of the data model

Students(sid: string, name: string, login: string, age: integer, gpa: real)
Faculty(fid: string, fname: string, sal: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In(cid: string, rno: integer, time: string)

Physical Schema

• Describes how a record (e.g. customer) is stored
• Specifies additional storage details


• Example
– Store all relations as unsorted files of records. A file in a DBMS is a collection of records
– Create indexes on the first column of the Students, Faculty, and Courses relations
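
The indexing decision in the example above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the index names are hypothetical, the Faculty key column is assumed to be named fid, and SQLite picks its own on-disk layout for tables, so the sketch shows the index part of a physical schema rather than unsorted files.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Conceptual schema: three relations from the slides (SQLite types are approximate).
conn.executescript("""
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Faculty  (fid TEXT, fname TEXT, sal REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
""")

# Physical schema decision: an index on the first column of each relation.
# The index names are hypothetical.
conn.executescript("""
CREATE INDEX idx_students_sid ON Students (sid);
CREATE INDEX idx_faculty_fid  ON Faculty (fid);
CREATE INDEX idx_courses_cid  ON Courses (cid);
""")

# List the indexes the engine now maintains.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)
```

The table definitions do not mention the indexes at all, which is the point: storage details live in the physical schema and can be chosen separately from the conceptual one.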

External Schema

• It allows data access to be customized (and authorized) at the level of individual users or groups of users.

• Any given database has exactly one conceptual schema and one physical schema, but it may have several external schemas, each tailored to a particular group of users.

• Each external schema consists of a collection of one or more views and relations from the conceptual schema.

• A view is conceptually a relation, but the records in a view are not stored in the DBMS.


• Example: we might want to let students find out the names of faculty members teaching courses, as well as course enrollments. This can be done by defining the following view:
– Courseinfo(cid: string, fname: string, enrollment: integer)
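
One way such a view might be defined is sketched below with Python's sqlite3 module. The slides give only the view's schema, so the defining query (a join of Teaches, Faculty, and Enrolled), the column name fid, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty  (fid TEXT, fname TEXT, sal REAL);
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Made-up sample rows, for illustration only.
INSERT INTO Faculty  VALUES ('f1', 'Rao', 50000.0);
INSERT INTO Teaches  VALUES ('f1', 'CS222');
INSERT INTO Enrolled VALUES ('s1', 'CS222', 'A');
INSERT INTO Enrolled VALUES ('s2', 'CS222', 'B');

-- The view is defined by a query; its rows are computed on demand, not stored.
CREATE VIEW Courseinfo AS
SELECT T.cid        AS cid,
       F.fname      AS fname,
       COUNT(E.sid) AS enrollment
FROM Teaches T
JOIN Faculty F ON F.fid = T.fid
LEFT JOIN Enrolled E ON E.cid = T.cid
GROUP BY T.cid, F.fname;
""")

# Students query the view like a relation, without seeing salaries or grades.
course_rows = conn.execute(
    "SELECT cid, fname, enrollment FROM Courseinfo"
).fetchall()
print(course_rows)
```

The view exposes faculty names and enrollment counts while hiding the sal and grade columns, which is how an external schema customizes and restricts access.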

Data Independence

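• Physical Independence
– Ability to modify the physical schema without changing the logical schema

• Logical Independence
– Ability to modify the logical schema without changing the external schemas or application programs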
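
Physical independence can be illustrated with a small sqlite3 sketch: the same query is run before and after a purely physical change (adding an index) and returns the same answer. The sample rows and the index name are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [("s1", "Asha", "asha", 19, 3.4), ("s2", "Ravi", "ravi", 20, 3.8)],
)

# The application's query is written against the logical schema only.
QUERY = "SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name"

before = conn.execute(QUERY).fetchall()

# Change the physical schema only: add an index (hypothetical name).
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")

after = conn.execute(QUERY).fetchall()
print(before == after)
```

The query text never changes; only the way the engine may evaluate it does.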

Data Models


• Hierarchical Model
• Network Model
• Relational Model
• E-R Model
• Object Oriented Model
• Object Relational Model


User Interfaces to Databases

• User Interfaces
– Forms & Menus
– Reports
– Graphical user interfaces

• Lots of Tools Available
– Native to product (e.g. Oracle, Microsoft)
– Also independent vendors (PowerBuilder)
– No standards

Storage Devices

• Main memory
– Volatile, lost on power failure
– Expensive and relatively small

• Hard disk
– Non-volatile, reasonably fast access
– Relatively cheap, and large
– Main storage system for databases
– Mean time to failure: ~5 years

DATABASE SERVERS

• Major players
– Oracle, IBM DB2, Microsoft SQL Server, Informix, Sybase, Ingres

• Wide range of performance, features, and price

Database Application Classes


• OLTP: Online Transaction Processing
– Supports many small transactions

• Decision Support
– Summaries/aggregates
– OLAP: Online Analytical Processing
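
The contrast between the two application classes can be sketched in Python with sqlite3. The Orders table, its columns, and the sample data are hypothetical; the point is only the shape of the work: many short single-row transactions versus one aggregate query over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def place_order(region, amount):
    """OLTP-style work: one short transaction touching a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO Orders (region, amount) VALUES (?, ?)", (region, amount)
        )


def sales_by_region():
    """Decision-support-style work: a summary computed over the whole table."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM Orders GROUP BY region ORDER BY region"
    ).fetchall()


# Many small transactions...
for region, amount in [("east", 10.0), ("west", 5.0), ("east", 2.5)]:
    place_order(region, amount)

# ...followed by one aggregate query.
summary = sales_by_region()
print(summary)
```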

Database Architectures

• Centralized
– Dumb terminals connected to a single server

• Client Server
– Smarter client machines connect to the server
– Main work still done at the server

• Parallel Servers
– Work divided between multiple CPUs

• Distributed
– Multiple independent databases in cooperation

CLIENT - SERVER TERMINOLOGY

• Service: Provided by the Server
– Each Client Is a Consumer

• Shared Resources: Managed by Server

• Client: Initiator of a Request

Client vs Server Functionality

• Server Functions
– Wait for requests and process them when they come
– Handle concurrent transactions
– Authentication, authorization
– Audit trails

• Client Functions
– Provide user interface
– Support graphics, multimedia


TWO TIER PARTITIONING

[Figure: two-tier partitioning options; surviving labels: Distributed, Remote, Distributed, Remote Data, Distributed Data]


Three Tier Applications

• Tier One: Client
– e.g. web browser

• Tier Two: Application Server
– e.g. enhanced web server

• Tier Three: Database Server
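
The division of work among the three tiers can be sketched in a single Python process. This is a toy model under stated assumptions: plain function calls stand in for the network, an in-memory SQLite database stands in for the database server, and the request path format, function names, and sample row (including the credit count) are made up.

```python
import sqlite3

# Tier three: the database server (an in-memory SQLite database stands in for it).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Courses (cid TEXT, cname TEXT, credits INTEGER)")
# Made-up sample row.
db.execute("INSERT INTO Courses VALUES ('CS222', 'Database Management System', 4)")


def application_server(request_path):
    """Tier two: turn a client request into a query and format the reply."""
    prefix = "/courses/"
    if not request_path.startswith(prefix):
        return "404 Not Found"
    cid = request_path[len(prefix):]
    row = db.execute(
        "SELECT cname, credits FROM Courses WHERE cid = ?", (cid,)
    ).fetchone()
    if row is None:
        return "404 Not Found"
    return f"{cid}: {row[0]} ({row[1]} credits)"


def client(request_path):
    """Tier one: the client only sends a request and displays the reply."""
    return application_server(request_path)


reply = client("/courses/CS222")
print(reply)
```

The client never issues SQL itself; all database access goes through the middle tier, which is what distinguishes this layout from a two-tier client-server design.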
