Chapter 2. Concepts & arch of DB sys

29
Chapter 2. Concepts & arch of DB sys 1. Introduction processing & analysis of spatial data – becoming increasingly dependent on the use of DBMS rather than conventional GIS → WHY ? 1. cost effective sys security, DB integrity, backup & recovery, data replication 2. close integration w/ mainstream business computing environments 2. DBs & DB sys 2.1 DB terminology problem space : business functions the DB is designed to address data model : describing the problem space in a comprehensible way DB schema : description of the DB including its tables & the relations among them DB engine (or DB server): a collection of programs that manipulate the data in DB data dictionary : describes the contents of the DB DB integrity rules : rules to be enforced to protect the data in the DB stored procedures : blocks of SQL code for defining, managing & querying data in DB user interface : DB front-end by which users access & interact w/ DB middleware : communication SW tools that support data transmission & data processing over networks DBMS : composed of DB, DB engine, user interface, application programs, middleware

Transcript of Chapter 2. Concepts & arch of DB sys

Chapter 2. Concepts & arch of DB sys 1. Introduction processing & analysis of spatial data – becoming increasingly dependent on the use of DBMS rather than conventional GIS → WHY ? 1. cost effective sys security, DB integrity, backup & recovery, data replication 2. close integration w/ mainstream business computing environments

2. DBs & DB sys

2.1 DB terminology problem space : business functions the DB is designed to address

data model : describing the problem space in a comprehensible way

DB schema : description of the DB including its tables & the relations among them

DB engine (or DB server): a collection of programs that manipulate the data in DB

data dictionary : describes the contents of the DB

DB integrity rules : rules to be enforced to protect the data in the DB

stored procedures : blocks of SQL code for defining, managing & querying data in DB

user interface : DB front-end by which users access & interact w/ DB

middleware : communication SW tools that support data transmission & data processing over networks

DBMS : composed of DB, DB engine, user interface, application programs, middleware

SoftwareDeveloper

DatabaseAdministrator

Database Management System(DBMS)

Development Tools andApplication Programs

End User

User Interface andMiddleware

DatabaseEngine /Server

Database

Data Files DataDictionary

Integrity RulesStored

Procedures

Data Model

DatabaseSchema

Problem Space

Data FilesData FilesData FilesData Files

Figure 2-1. Database terminology

Chapter 2. Concepts & arch of DB sys 2.2 Computer data organization & DB In early days – data were organized as independent data files – file processing systems development of modern DB

stores data + characteristics of data + integrity rules

separate data from applications – independence from business functions

provide essential tools for info resources or asset management in business, education, government

Zoning DataFiles

Buildidng PermitData Files

Assessment Data Files

ApplicationProgram in PL/1

ApplicationProgram in PL/1

ApplicationProgram in

COBOL

ApplicationProgram in

COBOL

(a) File-based data processing

Database Management System(DBMS)

Database

Data Files

IntegrityRules Stored

Procedures

DatabaseEngine /Server

UserInterface

UserInterface

(b) Integration of data files and processing procedures in a DBMS

ZoningReport

ZoningReport

BuildingPermit

BuildingPermit

AssessmentNotification

AssessmentNotification

PropertyTax Invoice

PropertyTax Invoice

UserInterface

OtherApplicationsOther

ApplicationsOtherApplications

Figure 2-2. File & DB processing

Chapter 2. Concepts & arch of DB sys 2-3 Classification of DB systems

Chapter 2. Concepts & arch of DB sys 3. DB operations 3.1 DB storage & manipulation DB - large volumes of data ex. Banking & retailing : terabytes, RS : several petabytes/year

thus, DB system usually stores data in secondary storage devices

(cf. central processing unit (CPU) – primary storage)

even bigger is tertiary storage devices : terabyte storage capacities

transmission of data – by disk blocks : storage locations of 4,000-16,000 bytes of data

not by data files

DB systems control the storage of and access to data by means of the DB engine

: 2 data manipulation SW in DB engine

a. buffer manager : handles the main memory by allocating the data read from secondary storage to

a specific page

b. file manager : keeps track of the location of data files & their relationships w/ disk blocks

Chapter 2. Concepts & arch of DB sys 3.2 DB security & integrity constraints designed to protect the data in DB from being corrupted, compromised, destroyed

security : all rules & measures that are designed to protect the DB against unauthorized use, modification,

destruction of contents

discretionary security : controls the ability of users to access specific data files ex. read-only read-write

mandatory security : classifies users & data into different security levels - access control

integrity : to protect the value of the data by safeguarding their accuracy, correctness, validity

enforced by applying certain rules – called business rules

form an integral part of the DB schema

3 integrity constraints – domain c. : specify the types of data values ex. Numeric, character, Boolean

key & relationship c. : govern the use of entities as primary, secondary, foreign

keys

semantic integrity c. : written rules stating what is allowed & not allowed in

data structure & data management

Chapter 2. Concepts & arch of DB sys 3.3 DB query query : question / task a user asks of a DB handled by a DB tool – called query manager : turn users SQL into a sequence of operations

query manager – perform query optimization : answer the query in the most efficient way

use an index created for the DB query is made up of one / more operators

ex. relational DB model – 8 operators

: SELECT : lists all of the row / those which match specific criteria

PROJECT : generate a subset of columns from a table, removing duplicate values from the result

JOIN : combine one row of a table w/ rows from another table, use columns relationships

PRODUCT : concatenate every row in one table w/ every row in another table

UNION : generate a new table by appending rows from one table w/ those of another table

INTERSECT : generate a new table consisting of all rows appearing in both of two tables

DIFFERENCE : generate a new table consisting of all rows that appear in the 1st table but not in

the 2nd of two tables

DIVIDE : create a new table consisting of all values of one column of the binary table that match,

in the other column all values in the unary table

K x y1 A 322 B 743 C 56

R

K x y1 A 322 B 743 C 56

S

SELECT all

T xB

SELECT x WHERE K = 2

SELECT x, y WHERE K = 3 x yC 56

U

(a) The SELECT relational operator

K x y1 A 322 B 743 C 32

R y3274

SPROJECT y

(b) The PROJECT relational operator

K x y1 A 322 B 743 C 56

R K z 1 273 749 88

S

K x y K z 1 A 32 1 273 C 56 3 74

T

K x y z 1 A 32 273 C 56 74

U

R JOIN S Equi-JOIN

Natural JOIN

(c) The JOIN relational operator

K x 1 A 2 B 3 C

RRK Rx SK Sz 1 A1 A2 B 2 B 3 C 3 C

T1 273 741 273 741 273 74

K z 1 273 74

SR TIME S

(d) The PRODUCT relational operator

Figure 2-3. Relational operators

(e) The UNION relational operator

(g) The DIFFERENCE relational operator

(h) The DIVIDE relational operator

R UNION S

K x y1 A 322 B 743 C 56

R K x y8 X 311 A 327 C 62

SK x y1 A 322 B 743 C 568 X 317 C 62

U

K x y1 A 322 B 743 C 56

R K x y8 X 311 A 327 C 62

SK x y1 A 32

I

(f) The INTERSECT relational operator

R INTERSECT S

K x y1 A 322 B 743 C 56

R

K x y1 A 322 B 743 C 56

R

K x y8 X 311 A 327 C 62

SR DIFFERENCE S

S DIFFERENCE R

K x y2 B 743 C 56

D

K x yE8 X 317 C 62

K x y8 X 311 A 327 C 62

S

vXY

K58

5 X 5 Y 5 Z 6 Q 6 W 7 M 7 N 8 G 8 X 8 Y 8 V

K v

S DIVIDE R

R

S Q

Figure 2-3. Relational operators

Chapter 2. Concepts & arch of DB sys 3.4 DB transactions transactions : changing values in DB

transaction design principles

: atomicity : transaction can never be completed only partially

consistency preservation : data remain in a consistent state as specified by the DB schema,

constraints, integrity rules

isolation : transactions to be independent of each other

durability / permanency : after transaction complete, its results can always be traced

transaction is controlled by the transaction manager of the DB engine

4 transaction control mechanisms

: concurrent control : locks the data items involved in the transaction

logging the transactions : keeps track of all changes made to the DB in a redo log

transaction commitment : prevents any changes to the DB unless the transaction is ready to complete

rollback : allows the DB to undo an incomplete transaction process

Database

Main Memory

Redo LogBuffer

Shared Memory

Database Buffer Cache

Redo Log

Control Files

Retrieve data

Lock

Update

Save updateProcess update

Write to log

Record update

Commit

Committed

UnlockServerComputer

2

48

3

6 5

7

2Lock

Unlock

Backup

Retrieve data

9

1Client Computer

Database Engine

1

10

10

Figure 2-4. Steps in a DB transaction

Chapter 2. Concepts & arch of DB sys 3.5 DB backup & recovery to restore the DB in a catastrophic failure such as a disk crash copy periodically the entire DB & the transaction log on to an external storage medium 3.6 DB replication & synchronization to support business needs in distributed DB systems : improve system performance, DB availability process of making a copy of a DB onto one / more additional computers located at different sites

Chapter 2. Concepts & arch of DB sys 3.7 Structured query language (SQL) standard language for querying & managing DB

non-procedural computer language – does not have IF, FOR, WHILE, GOTO, CASE

DB sub-language of about 200 words

SQL statements can be used in 5 ways

: interactive processing thru a command-line user interface

embeded in a high-level computer language

using a call level interface(CLI)

using 2 standard Java application programming interface(API) protocols : Java DB connectivity(JDBC)

embeded SQL for Java(SQLJ)

in a stand alone application program modules in the form of stored procedures, function, packages

used in typical DB operations

: DB query, data definition, data manipulation, DB connection & access control, data sharing, data

integrity

Figure 2-5. Examples of DB operations using SQL

SELECT parcel_id, areaFROM lu_2002WHERE lu_code = 'agr'

CREATE TABLE lu_2002 (parcel_id VARCHAR2(8) PRIMARY KEY, lu_code VARCHAR2(3) NOT NULL, area NUM(8,2) NOT NULL, survey_date DATE DEFAULT SYSDATE)

INSERT INTO lu_codeVALUES ('12322', 'res', 4500.00, '06-may-2002')

CONNECT system/passwordxxxCREATE ROLE db_maintenanceGRANT INSERT, UPDATE ON lu_2002 TO db_maintenanceGRANT db_maintenance TO john.young

CREATE TRIGGER copy_dataAFTER INSERT ON lu_2002FOR EACH ROWBEGIN INSERT INTO [email protected] VALUES (:new.parcel_id, :new.lu_code, :new.area, :new.survey_date);END;/

CREATE TABLE lu_2002 (parcel_id ARCHAR2(8) PRIMARY KEY, lu_code VARCHAR2(3) NOT NULL, area NUM(8,2) NOT NULL, survey_date DATE DEFAULT SYSDATE,CONSTRAINT UNIQUE (parcel_id),CONSTRAINT area_chk CHECK (area > 0.00)).

SQL Function / Example Explanation

(a) Data retrieval These SQL commands retrieve the identification numbers and areas of parcels whose land use code is 'agr' (agricultural) from data table lu_2002

(b) Data definition The SQL command CREATE TABLE specifies the structure and data types of the data table lu_2002. In the example, VARCHARS, NUM and DATE specify the data types of the four columns of the table; the numbers in the brackets specify the number of characters and digits string and numeric data types respectively. The value of survey_date will be set to the system date automatically if no date value is supplied duirng data entry. Note how the constraints PRIMARY KEY and NOT NULL are specified.

(c) Data manipulation These SQL statements add data into the table lu_2002. Character strings are put inside quotation marks but numerical values are not. Since a date value is given, the default system date as specified in the data structure in the previous example will be not used.

(d) Database connection and access control These SQL commands connect to the system, create a database maintenance role, and give this role to a user called john.young

This sequence of SQL commands copy newly input data to a data table at a remote site

(e) Data sharing

(f) Data integrity This example shows how data input constraints are imposed to ensure that the parcel_id is a unique number for each land parcel and that the area is not a negative value

Chapter 2. Concepts & arch of DB sys 4. HW & SW arch 4.1 Centralized & distributed DB arch 1) Centralized DB arch

early generations of DB systems

all processing is done in a single computer, all data are stored in the same secondary memory 2) Distributed DB arch

since 1990s, most DB systems today

include several components

: computers, OS, DB system SW, network cards & SW, communication network protocols(ex. TCP/IP),

DB engine, transaction processor(manager)

allow physical separation of processes & data

allow partitioning of a particular process / data file into smaller units

it has a number of transparency features

: distribution transparency : access any DB w/o knowing where & how the data are stored

performance transparency : behave as if it were a single centralized DB system

transaction transparency : update the DB at different sites

heterogeneity transparency : integration of different DB systems under a common schema

Figure 2-6. Configurations of a typical centralized DB system

Figure 2-7. Configuration of a typical distributed DB system

Chapter 2. Concepts & arch of DB sys 4.2 Client/server computing client – requests services, server – provides services

can be thin / fat client / server

two-tier client / server arch : one server one client

three-tier client / server arch : extension of the two-tier

client : used for interaction w/ the DB

application server : application programs are stored & executed

DB server : used for storage & retrieve of data

web-based DB systems are three-tier

Query and Display

Application Processing

Database Management

Query and Display

Application Processing

Application Processing

Database Management

Query and Display

Application Processing

Database Management

Query and Display

Application Processing

Database Management

Database Management

(a) (b) (c) (d)

Cli ent-side Server-s ide

Function s Fun ctions

Centralised Database Systems

Distributed Database Systems

Query and Display

Application Processing

Database Management

(e)

Two-tier

ArchitectureThree-tier

Architecture

Figure 2-8. Client-server computing for DB systems

Chapter 2. Concepts & arch of DB sys 4.3 DB SW DB SW comprises different modules of application programs

DB engine : storing, retrieving, manipulating, conversion, transaction logging, memory management

SQL (or program written in SQL extensions) : managing data storage & retrieval routines

network middleware : DB connectivity middleware ex. CORBA, JDBC, ODBC, ADO

* middleware tool : made up of API on client / communication SW on client & server

Operating System (OS)

Commands and Functions

DATABASEENGINE

SQL

NETWROK MIDDLEWARE

COMPILED DATABASEAPPLICATION INTERFACES

DYNAMIC DATA ANALYSIS TOOLS

Developed with Developed with

Data Analysis RoutinesSupplied by Database

Software Vendor

Third-party Data AnalysisRoutines

Third-party Prepackaged Applications

Prepackaged DatabaseApplications Supplied by

Database SoftwareVendor

Application Development

Tools Supplied by Database Software

Vendor

Third-party ApplicationDevelopment Tools

Client-sideApplications

Figure 2-9. SW layers of a DB system

Chapter 2. Concepts & arch of DB sys 4.4 Web-based DB arch WWW has significant impact on the development of DB system

Web browser interface enables a user to access data anywhere in the world

: universal access arch is introduced based on Internet standard

web-based DB – accessible to internal & external users

three-tier client/server configuration

components : a. client computer – user submits a request by HTML-formatted page thru HTTP

b. web server – equipped w/ program called a server-side extension

= web-to-DB middleware : understand, validate, process DB queries

using CGI / API protocol

functionality of web browser can be enhanced by adding client-side extensions

: plug-ins, Java, JavaScript, ActiveX, VBScript

Server Computer

Web Server

ScriptPage

HTMLPage

Web-to-dbMiddleware

Database System

Database DatabaseEngine

12

3

4

5

67

8

HTMLPage9

TCP/IP Network

Figure 2-10. Arch of a typical web-based DB

Chapter 2. Concepts & arch of DB sys 5. Data structure 5.1 Logical data structure how data are organized in a DB for optimal performance & ease of administration ex. Oracle Optimal Flexible Arch

: logical structure

table space – data file - dbf

TBLSPC-1 TBLSPC-2

d01.dbf d22.dbf

Datafile-1 Datafile-2 Datafile-3

d45.dbf

TBLSPC-1 consists of twodatafiles Datafile-1 and Datafile-2, and the table d01.dbf isallocated to Datafile-1

TBLSPC-2 consists of onedatafile Datafile-3, whichhouses two tables d22.dbfand d45.dbf

Figure 2-11. Relationship between tablespace, data files & tables in Oracle’s optimal flexible arch(OFA) logical DB structure

Chapter 2. Concepts & arch of DB sys 5.2 Physical data structure actual organization & placement of data files in the DB dependent on the model of its DB system ex. relational – values of attributes are stored in a table in a certain data type * data types character / string data type numeric data type date data type others including BLOB & user-defined abstract data type(ADT)

Chapter 2. Concepts & arch of DB sys 5.3 DB indexing an index = an element of data structure, used to speed up access to a specific part of the DB many indexing methods have been proposed

ex. B-tree – the most commonly used form

use case ) users issue a command to index → column for a row identification + index table are created

index table : root block – branch block – leaf block structure

since all leaf are at the same depth – all retrieval require the same amount of I/O

000049999999

0000166733334999

5000666683329999

0000(2311)0001(3400)

........1501(0010)

........1666(1001)1667(5423)

1668(0000)1669(9003)

........

...............

3332(8312)3333(1121)

3334(9872)3335(7500)

........

........

........4998(0200)4999(5601)

6667(0003)6668(0055)6669(0620)

........

........8331(9999)8332(8001)

5000(0211)........

6010(6089)................

6665(9001)6666(9004)

8333(0232)8334(1345)

.......................

9998(7500)9999(2001)

0000 1668 .......... ...................0010 1501 ............. ...................0620 6669 ............. .................. ........ ............. ..........6089 6010 ............. .................. ........ ............. ..........7500 3335 ............. .................. ........ ............. ..........9999 8331 ............. .........

ROWID FID Attribute1 Attribute 2 .....

Root (Header) Block

Leaf Blocks

Branch Blocks

Data File

Index File

Figure 2-12. A B-tree index