
8/8/2019 Week08 - Physical Design


Database I

Methodology

Physical Design



Physical Database Design

Throughout conceptual design, logical design and normalization, the primary objectives have been storage efficiency and the consistency of the database

In physical database design, however, the focus shifts from storage efficiency to efficiency of execution




(Cont.)

The physical DB design involves:

Transforming the logical DB design into technical specifications for storing and retrieving data

It does not include actually implementing the design; however, tool-specific decisions are involved

Physical design requires the following input:

Normalized relations

Definitions of each attribute (the purpose or objective of each attribute)




(Cont.)

Descriptions of data usage (how and by whom the data will be used)

Requirements for response time, data security, backup, etc.

The tool to be used

Decisions that are made during this process are:

Choosing data types

Deciding file organizations

Selecting structures

Preparing strategies for efficient access



De-normalization

De-normalization is a technique for moving from higher to lower normal forms of database modeling in order to speed up database access

De-normalization is applied when deriving a physical data model from a logical design

In logical design, we group attributes that are logically related through the same primary key

In physical database design, fields are grouped according to how they are physically stored and accessed by the DBMS



De-normalization (Cont.)

We should be aware that each new RDBMS release usually brings enhanced performance and improved access options that may reduce the need for de-normalization

A fully normalized database schema can fail to provide adequate system response time due to excessive table join operations




De-normalization Situation 1:

Merge two entity types that have a one-to-one relationship into one relation

If one of the entity types is optional, merging can waste storage (its attributes are null in many rows); however, if the two are accessed together very frequently, merging them might be a wise decision

So two relations with a one-to-one relationship that are frequently accessed together should be merged for better performance
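As a minimal sketch of Situation 1, the Python code below uses the standard sqlite3 module with two hypothetical relations, STUDENT and LOCKER, in which each student has at most one locker (so LOCKER is the optional side). The table names, columns and rows are invented for illustration; the point is only that the merged table answers the frequent "student with locker" query without a join, while students without a locker leave NULLs behind.

```python
import sqlite3

# Hypothetical 1:1 pair: each student has at most one locker (LOCKER is optional).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STUDENT (stId INTEGER PRIMARY KEY, sName TEXT);
    CREATE TABLE LOCKER  (stId INTEGER PRIMARY KEY REFERENCES STUDENT(stId),
                          lockerNo TEXT);
    INSERT INTO STUDENT VALUES (1, 'Ali'), (2, 'Sara'), (3, 'Omar');
    INSERT INTO LOCKER  VALUES (1, 'L-17'), (3, 'L-42');

    -- De-normalized design: both relations merged into one table.
    CREATE TABLE STUDENT_M AS
        SELECT s.stId AS stId, s.sName AS sName, l.lockerNo AS lockerNo
        FROM STUDENT s LEFT JOIN LOCKER l ON l.stId = s.stId;
""")

# Normalized design: the frequent query needs a join.
joined = con.execute("""
    SELECT s.stId, s.sName, l.lockerNo
    FROM STUDENT s LEFT JOIN LOCKER l ON l.stId = s.stId
    ORDER BY s.stId""").fetchall()

# Merged design: the same answer from one table, with no join.
merged = con.execute(
    "SELECT stId, sName, lockerNo FROM STUDENT_M ORDER BY stId").fetchall()

# Students without a locker keep a NULL (None) in lockerNo: the storage cost.
print(merged)
```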




De-normalization Situation 2:

Many-to-many binary relationships are mapped to three relations

Queries needing data from the two participating relations require joining three relations, which is expensive

Join is an expensive operation from the execution point of view

Consider the many-to-many relationship between EMP and PROJ, represented by WORK:

EMP (empID, eName, pjId, Sal)

PROJ (pjId, pjName)

WORK (empId, pjId, dtHired, Sal)




Now suppose we de-normalize these relations by merging the WORK relation into the PROJ relation

In this case the merged relation violates 2NF: its key becomes (pjId, empId), but pjName depends only on pjId, so the anomalies of 2NF would be present

But only one join operation, between two tables, would be involved, which increases efficiency

EMP (empID, eName, pjId, Sal)

PROJ (pjId, pjName, empId, dtHired, Sal)
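The join counts in Situation 2 can be checked with a small sqlite3 sketch. It reuses the relation names of the example but, as a simplifying assumption, trims EMP to (empId, eName) and keeps the merged relation under the hypothetical name PROJ_M so that both designs live in one database; all rows are invented. Both queries return the same pairs, but the merged design needs one join instead of two, and pjName is now stored once per employee working on the project.

```python
import sqlite3

# Sketch of Situation 2; EMP is trimmed to (empId, eName) and rows are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMP  (empId INTEGER PRIMARY KEY, eName TEXT);
    CREATE TABLE PROJ (pjId INTEGER PRIMARY KEY, pjName TEXT);
    CREATE TABLE WORK (empId INTEGER, pjId INTEGER, dtHired TEXT, Sal INTEGER,
                       PRIMARY KEY (empId, pjId));
    INSERT INTO EMP  VALUES (1, 'Ali'), (2, 'Sara');
    INSERT INTO PROJ VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO WORK VALUES (1, 10, '2019-01-05', 500),
                            (1, 20, '2019-03-01', 300),
                            (2, 10, '2019-02-11', 450);

    -- De-normalized: WORK merged into PROJ; the key becomes (pjId, empId).
    CREATE TABLE PROJ_M AS
        SELECT p.pjId AS pjId, p.pjName AS pjName, w.empId AS empId,
               w.dtHired AS dtHired, w.Sal AS Sal
        FROM PROJ p JOIN WORK w ON w.pjId = p.pjId;
""")

# Normalized: three relations, two joins.
three_way = con.execute("""
    SELECT e.eName, p.pjName
    FROM EMP e
    JOIN WORK w ON w.empId = e.empId
    JOIN PROJ p ON p.pjId = w.pjId
    ORDER BY e.eName, p.pjName""").fetchall()

# De-normalized: two relations, one join.
two_way = con.execute("""
    SELECT e.eName, m.pjName
    FROM EMP e
    JOIN PROJ_M m ON m.empId = e.empId
    ORDER BY e.eName, m.pjName""").fetchall()

# The 2NF price: pjName is repeated for every (project, employee) pair.
payroll_copies = con.execute(
    "SELECT COUNT(*) FROM PROJ_M WHERE pjName = 'Payroll'").fetchone()[0]
print(two_way, payroll_copies)
```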




De-normalization Situation 3:

In a 1:M situation, when the entity type on the one side does not participate in any other relationship, the entity type on the many side is appended with the reference data rather than the foreign key

In this case the reference table should be merged with the main table

Consider the STUDENT and HOBBY relations

One student can have one hobby, and one hobby can be adopted by many students

Here HOBBY can be merged with the STUDENT relation

There would thus be redundant data, but no join of two relations would be needed, which gives better performance
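A hedged sqlite3 sketch of Situation 3 follows. The columns hbId and hbName, the merged table name STUDENT_M and the sample rows are hypothetical. The merged table lists students with their hobbies without any join, while the final UPDATE shows the cost of the redundancy: renaming one hobby must touch every student row that holds a copy of it.

```python
import sqlite3

# Sketch of Situation 3; column names and rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Normalized: HOBBY is a reference table on the "one" side.
    CREATE TABLE HOBBY   (hbId INTEGER PRIMARY KEY, hbName TEXT);
    CREATE TABLE STUDENT (stId INTEGER PRIMARY KEY, sName TEXT,
                          hbId INTEGER REFERENCES HOBBY(hbId));
    INSERT INTO HOBBY   VALUES (1, 'Chess'), (2, 'Cricket');
    INSERT INTO STUDENT VALUES (1, 'Ali', 1), (2, 'Sara', 1), (3, 'Omar', 2);

    -- De-normalized: the reference data (hbName) is copied into the
    -- student rows in place of the foreign key hbId.
    CREATE TABLE STUDENT_M AS
        SELECT s.stId AS stId, s.sName AS sName, h.hbName AS hbName
        FROM STUDENT s JOIN HOBBY h ON h.hbId = s.hbId;
""")

# No join is needed to list students with their hobbies ...
rows = con.execute(
    "SELECT sName, hbName FROM STUDENT_M ORDER BY stId").fetchall()

# ... but the hobby name is now redundant: renaming it updates every copy.
cur = con.execute(
    "UPDATE STUDENT_M SET hbName = 'Chess Club' WHERE hbName = 'Chess'")
rows_updated = cur.rowcount
print(rows, rows_updated)
```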



Partitioning

Partitioning splits the same relation into two or more parts

The aims of data partitioning in a database are to:

Reduce workload (e.g., data access, communication costs, search space)

Balance workload

Speed up the rate of useful work (e.g., keeping frequently accessed objects in main memory)

There are two types of partitioning:

Horizontal Partitioning

Vertical Partitioning



Partitioning (Cont.)

Horizontal Partitioning

The table is split on the basis of rows, which means a larger table is split into smaller tables

The advantage is that accessing records takes much more time in a larger table than in a smaller one

Range Partitioning

In this type of partitioning, a range is imposed on a particular attribute

For example, students whose IDs are from 1 to 1000 are in partition 1, and so on
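The range rule can be expressed as a lookup over the partition bounds. This Python sketch assumes three hypothetical partitions on stId (1 to 1000, 1001 to 2000, 2001 to 3000) and uses the standard bisect module; a real DBMS keeps such bounds in its own catalog rather than in application code.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds of each range partition on stId:
# partition 1 holds 1-1000, partition 2 holds 1001-2000, partition 3 holds 2001-3000.
UPPER_BOUNDS = [1000, 2000, 3000]

def range_partition(st_id: int) -> int:
    """Return the 1-based partition number that stores the given student ID."""
    if st_id < 1 or st_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"no partition covers stId {st_id}")
    # bisect_left finds the first bound >= st_id, i.e. the range containing it.
    return bisect_left(UPPER_BOUNDS, st_id) + 1

print(range_partition(1000), range_partition(1001))
```

A range query such as stId between 1 and 500 only needs to search partition 1, which is where the reduced search space comes from.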




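Hash Partitioning

A particular hash algorithm is applied to the partitioning attribute to decide each row's partition; since the DBMS knows that algorithm, it can locate any row's partition directly

Because hashing spreads rows across the partitions, hash partitioning reduces the chances of unbalanced partitions to a large extent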
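As an illustration only, the sketch below uses a plain modulo function as the hash; actual DBMSs use their own internal hash functions, and the partition count of 4 is an assumption. Because the same function is applied on insert and on lookup, the partition of any stId can be computed directly, and the 1000 sequential IDs that the range example would place in a single partition are spread evenly here.

```python
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical number of hash partitions

def hash_partition(st_id: int, n: int = NUM_PARTITIONS) -> int:
    """Return the 0-based partition for a key, using modulo as a stand-in hash.

    The same function is used when a row is inserted and when it is looked
    up, so the partition of any stId can be computed without searching.
    """
    return st_id % n

# For these 1000 sequential IDs, every partition receives the same share.
sizes = Counter(hash_partition(i) for i in range(1, 1001))
print(dict(sizes))
```

Heavily skewed key values can still unbalance a simple hash like this one, which is why the claim is only that hashing reduces the chances of unbalanced partitions.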

List Partitioning

In this type of partitioning, the values are specified for every partition, so each partition has its own specified list of values
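A minimal sketch of list partitioning, assuming students are partitioned on prName with invented programme codes and invented partition names. Each partition owns an explicit set of values, and a row goes to the partition whose list contains its value.

```python
# Hypothetical list partitioning of students on their programme name (prName).
PARTITION_LISTS = {
    "p_computing": {"BSCS", "BSIT", "MCS"},
    "p_business": {"BBA", "MBA"},
}

def list_partition(pr_name: str) -> str:
    """Return the partition whose value list contains pr_name."""
    for partition, values in PARTITION_LISTS.items():
        if pr_name in values:
            return partition
    raise KeyError(f"{pr_name!r} is not in any partition's value list")

print(list_partition("BSIT"), list_partition("MBA"))
```

Here an unlisted value raises an error; depending on the product, a DBMS may instead reject the row or route it to a default partition.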




Vertical Partitioning

Vertical partitioning is done on the basis of attributes

The same table is split into different physical records depending on the nature of accesses

The primary key is repeated in all vertical partitions of a table so that the original table can be obtained again

Consider the Student relation

STD (stId, sName, sAdr, sPhone, cgpa, prName, school, mtMrks, mtSubs, clgName, intMarks, intSubs, dClg, bMarks, bSubs)




We can partition this relation vertically as follows:

STD (stId, sName, sAdr, sPhone, cgpa, prName)

STDACD (stId, school, mtMrks, mtSubs, clgName, intMarks, intSubs, dClg, bMarks, bSubs)
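The sqlite3 sketch below illustrates vertical partitioning with a trimmed version of the STD/STDACD split: only a few of the columns are kept, and the rows are invented. A query on frequently used attributes reads only STD, and because stId is repeated in both partitions, the original relation can be rebuilt with a join on the key.

```python
import sqlite3

# Vertical partitioning sketch; columns trimmed from the STD example, rows invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Frequently accessed attributes.
    CREATE TABLE STD    (stId INTEGER PRIMARY KEY, sName TEXT, cgpa REAL);
    -- Academic-history attributes; the primary key stId is repeated here.
    CREATE TABLE STDACD (stId INTEGER PRIMARY KEY REFERENCES STD(stId),
                         school TEXT, mtMrks INTEGER);
    INSERT INTO STD    VALUES (1, 'Ali', 3.4), (2, 'Sara', 3.8);
    INSERT INTO STDACD VALUES (1, 'City School', 780), (2, 'Model School', 845);
""")

# A common query touches only the narrow STD partition.
names = con.execute("SELECT sName FROM STD ORDER BY stId").fetchall()

# The original relation is rebuilt by joining the partitions on the repeated key.
full = con.execute("""
    SELECT s.stId, s.sName, s.cgpa, a.school, a.mtMrks
    FROM STD s JOIN STDACD a ON a.stId = s.stId
    ORDER BY s.stId""").fetchall()
print(full)
```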



Data Storage Concepts

Physical Storage Media

Storage media are classified according to the following characteristics:

Speed of access

Cost per unit of data

Reliability

RAID (Redundant Array of Inexpensive Disks)

Many disks that look like a single disk to the OS but have better performance and better reliability

RAID disk drives are used frequently on servers

In RAID, the data are distributed over the drives to allow parallel operations



Data Storage Concepts (Cont.)

Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit

Striping involves partitioning each drive's storage space into stripes, which may be as small as one sector (512 bytes) or as large as several megabytes

The type of application environment, I/O-intensive or data-intensive, determines whether large or small stripes should be used




RAID-0

Simple Striping

The virtual single disk is divided into strips of k sectors each

Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss




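(Figure: RAID-0 striping over four disks; disk 1 holds strips 1, 5, 9; disk 2 holds strips 2, 6, 10; disk 3 holds strips 3, 7, 11; disk 4 holds strips 4, 8, 12)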

Note: This example is a basic virtual drive in which each element depicted as a disk is a physical disk
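The round-robin placement of strips (strips 1, 5, 9 on the first disk, 2, 6, 10 on the second, and so on) can be written as an address calculation. This is a sketch under simple assumptions: 0-based numbering, equal-sized disks and strips of k sectors; the function name raid0_map is hypothetical.

```python
def raid0_map(lba: int, n_disks: int, k: int) -> tuple[int, int]:
    """Map a logical sector of the virtual disk to (disk index, sector on that disk).

    Strips of k sectors are dealt round-robin across n_disks: strip 0 goes
    to disk 0, strip 1 to disk 1, ..., strip n_disks back to disk 0.
    All numbers here are 0-based.
    """
    strip = lba // k                  # which strip the sector falls in
    disk = strip % n_disks            # strips rotate across the disks
    row = strip // n_disks            # strips already placed on that disk
    return disk, row * k + lba % k    # sector offset within that disk

print(raid0_map(8, n_disks=4, k=2))
```

With 4 disks and k = 2, logical sectors 0 to 7 land on all four disks, so a sequential read of that size can proceed on every disk in parallel; losing any one disk loses every strip mapped to it.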




RAID-1

RAID Level 1 provides redundancy by writing all data to two or more drives

The performance of a level 1 array tends to be faster on reads and slower on writes compared to a single drive, but if either drive fails, no data is lost

This level is commonly referred to as mirroring




(Figure: RAID-1 mirroring; identical copies of blocks 1, 2, 3 on each of three drives)
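A toy model of mirroring in Python, purely illustrative: each drive is a dictionary of blocks, and the Mirror class and its methods are hypothetical names. Every write goes to all working drives, while a read can be served by any surviving copy, so the data remain readable as long as one drive survives.

```python
class Mirror:
    """Toy RAID-1 array: every block is written to all working drives."""

    def __init__(self, n_drives: int = 2) -> None:
        # Each drive is modelled as a dict mapping block number -> data.
        self.drives = [dict() for _ in range(n_drives)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        # A write completes only after every working drive has a copy,
        # which is why writes are slower than on a single drive.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving copy can serve a read; a real controller can also
        # spread reads over the drives, which is why reads tend to be faster.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                return drive[block]
        raise OSError("all drives in the mirror have failed")

    def fail(self, i: int) -> None:
        self.failed.add(i)


m = Mirror(n_drives=3)
for block, data in [(1, b"A"), (2, b"B"), (3, b"C")]:
    m.write(block, data)
m.fail(0)
m.fail(1)              # two of the three drives fail
survivor = m.read(2)   # still readable from the remaining copy
print(survivor)
```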




RAID-2, RAID-3

For reliability, an error-correcting code is used: a Hamming code in RAID-2 and a simple parity check code in RAID-3

The check (parity) bits are stored on separate disks from the data

RAID-4

RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive

The performance of a level 4 array is very good for reads (the same as level 0)

Writes, however, require that parity data be updated each time
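The parity used by RAID-3 and RAID-4 is a byte-wise XOR across the data blocks of a stripe. The sketch below, with invented data on three data drives, shows the three operations the text relies on: computing parity, rebuilding a lost block from the survivors, and the small-write update (new parity = old parity XOR old data XOR new data), which is why every write also has to touch the parity drive. The helper name xor_blocks is hypothetical.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe over three data drives plus a dedicated parity drive (invented data).
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(*data)

# Reconstruction: a lost data block is the XOR of the survivors and the parity.
lost = 1
rebuilt = xor_blocks(*(d for i, d in enumerate(data) if i != lost), parity)

# Small write: parity is recomputed from old parity, old data and new data,
# so the single parity drive is read and written on every write.
new_block = b"\xaa\xbb"
new_parity = xor_blocks(parity, data[lost], new_block)
data[lost] = new_block
print(parity.hex(), rebuilt.hex(), new_parity.hex())
```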




RAID-5

RAID Level 5 is similar to level 4, but distributes parity among the drives

This can speed up small writes in multiprocessing systems, since no single parity disk becomes a bottleneck
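One way to see how RAID-5 avoids a single parity drive is to print a stripe layout. In this sketch parity moves one disk to the left on each stripe; this is one common placement chosen for illustration, and real controllers may rotate parity differently. The function raid5_layout is a hypothetical helper.

```python
def raid5_layout(n_disks: int, n_stripes: int) -> list[list[str]]:
    """Return, per stripe, which disk holds parity ('P') or data ('D<n>').

    Parity rotates one disk to the left on each stripe (illustrative only).
    """
    layout = []
    block = 0
    for stripe in range(n_stripes):
        parity_disk = (n_disks - 1 - stripe) % n_disks
        row = []
        for disk in range(n_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        layout.append(row)
    return layout


layout = raid5_layout(n_disks=4, n_stripes=4)
for row in layout:
    print(" ".join(f"{cell:>3}" for cell in row))
```

Each disk holds parity for one of the four stripes, so parity updates for different stripes can go to different disks instead of queuing on one.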

RAID-0 is the fastest and most efficient array type but offers no fault tolerance

RAID-1 is the array of choice for performance-critical, fault-tolerant environments

RAID-2 is seldom used today since ECC is embedded in almost all modern disk drives




RAID-3 can be used in data-intensive or single-user environments that access long sequential records, to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped

RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations

RAID-5 is the best choice in multi-user environments that are not write-performance sensitive. However, at least three, and more typically five, drives are required for RAID-5 arrays