Physical Database Design
DeSiaMore Powered by DeSiaMore 1
Lecture Objectives
Overview of the physical database design process
Volume and usage analysis
Designing fields
Designing physical records and denormalization

What is Physical Database Design
Physical database design involves taking the results of the logical design process and fine-tuning them against the usage, performance, and storage requirements of particular applications.
Logical database design is about implementation independence.
Physical database design is about implementation dependence.

Introduction
The purpose of physical database design is to translate the logical description of data into the technical specifications for storing and retrieving data.
The goal is to create a design for storing data that will provide adequate performance and ensure database integrity, security, and recoverability.

Inputs to Physical Design
Normalized relations
Attribute definitions
Estimates of data processing volume
Descriptions of where and when data are entered, retrieved, deleted, and updated
Response time expectations/requirements
Requirements for data security, backup, recovery, retention, and integrity
Characteristics of the DBMS to be used

What is Physical Database Design
The following activities are part of physical database design:
Volume and usage analysis
Integrity analysis
Control and security analysis
Data distribution analysis

Volume Analysis
Volume analysis is the first step in moving from logical to physical design.
It aims to estimate the likely number of instances of each entity.
These estimates indicate how many instances will be stored in the system on average.

Volume Analysis
The table below summarises sizing estimates for the student database.
Volume Analysis
Data volumes reflect the number of records in each table.
Access frequencies reflect the number of table record accesses per unit of time.
Note which attributes are used in table accesses (to aid the design of table indexes).

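As a rough illustration of how volume estimates feed into sizing, the sketch below multiplies an estimated number of instances by an assumed average record size. The entity names and every figure are hypothetical placeholders chosen for the example; they are not the values from the student database sizing table, and the estimate ignores index and page overhead.

```python
# Hypothetical sizing estimates: entity -> (expected instances, average record size in bytes).
# All names and numbers are illustrative assumptions.
ESTIMATES = {
    "Student": (5_000, 200),
    "Course": (300, 150),
    "Enrolment": (40_000, 40),
}


def estimated_bytes(instances: int, record_bytes: int) -> int:
    """Raw storage estimate for one entity (no index or page overhead)."""
    return instances * record_bytes


def total_estimated_bytes(estimates: dict) -> int:
    """Sum the raw storage estimates over all entities."""
    return sum(estimated_bytes(n, size) for n, size in estimates.values())


if __name__ == "__main__":
    for entity, (n, size) in ESTIMATES.items():
        print(f"{entity}: {n} instances x {size} B = {estimated_bytes(n, size)} B")
    print("Total:", total_estimated_bytes(ESTIMATES), "B")
```

Access frequencies could be recorded in the same structure (for example as accesses per hour) to show which tables deserve the most tuning effort.
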
Usage Analysis
Usage analysis requires that we identify the major transactions required of the database system.
The transactions considered here consist of a series of insertions, updates, retrievals, or deletions, or a mixture of all four.

A Sample of Transactions
Below are simple transactions common to a college database:
Register new students
Add new courses
Assign a lecturer to a course

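One way to picture such a transaction is as a group of statements that must succeed or fail together. The sketch below uses Python's built-in sqlite3 module; the table and column names (student, course, enrolment) and the sample values are assumptions made for the example, not a schema given in the lecture.

```python
import sqlite3


def make_college_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical college schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE course (course_code TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE enrolment (
            student_id  INTEGER NOT NULL REFERENCES student(student_id),
            course_code TEXT    NOT NULL REFERENCES course(course_code),
            PRIMARY KEY (student_id, course_code)
        );
    """)
    return conn


def add_course(conn: sqlite3.Connection, code: str, title: str) -> None:
    """Transaction: add a new course."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO course (course_code, title) VALUES (?, ?)", (code, title))


def register_student(conn: sqlite3.Connection, student_id: int, name: str, course_code: str) -> None:
    """Transaction: insert a student and the first enrolment as one atomic unit."""
    with conn:
        conn.execute("INSERT INTO student (student_id, name) VALUES (?, ?)", (student_id, name))
        conn.execute(
            "INSERT INTO enrolment (student_id, course_code) VALUES (?, ?)",
            (student_id, course_code),
        )
```

If the enrolment insert fails (for instance because the course does not exist), the student insert is rolled back as well, so the database never holds half a registration.
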
Group Exercise
Given a particular supermarket database design, identify the various transactions that can be performed on it. The logical design of the database consists of product, customer, and supplier tables.

Physical Design Decisions
Specify the data type for each attribute from the logical data model
Specify physical records by grouping attributes from the logical data model
Specify the file organization technique to use for physical storage of data records
Specify indexes to optimize data retrieval
Specify query optimization strategies

Designing Fields
Field: the smallest unit of data in a database
Field design:
  Choosing the data type
  Coding, compression, encryption
  Controlling data integrity

Choosing Data Types
CHAR: fixed-length character
VARCHAR2: variable-length character (memo)
LONG: long variable-length character data (in Oracle, LONG is a character type, not a numeric one)
NUMBER: positive/negative number
INTEGER: positive/negative whole number
DATE: actual date
BLOB: binary large object (good for graphics, sound clips, etc.)

Designing Fields
Choosing the field data type:
  Select from available types such as text, memo, number, date/time, currency, etc.
Seek to:
  Minimize storage space, e.g., Integer vs. Floating Point
  Represent all possible values, e.g., Floating Point vs. Integer
  Improve data integrity, e.g., Yes/No (more on next slide)
  Support all data manipulations, e.g., Date/Time

Designing Fields
Controlling data integrity:
  Default value, e.g., value “FL” for the State field
  Range control, e.g., value “<=100” for the Test_Score field
  Null value control, e.g., prohibit leaving the Date_of_Birth field blank
  Referential integrity, e.g., restrict valid values for the Part_No field in the Order table to the contents of this field in the Part table

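The four integrity controls can be expressed declaratively in most SQL DBMSs. The sketch below uses SQLite through Python's sqlite3 module; the table names person, part, and orders are assumptions for the example (the slide names only the fields), and other DBMSs may use slightly different syntax.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create hypothetical tables that declare each integrity control."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE person (
            person_id     INTEGER PRIMARY KEY,
            state         TEXT NOT NULL DEFAULT 'FL',         -- default value
            test_score    INTEGER CHECK (test_score <= 100),  -- range control
            date_of_birth TEXT NOT NULL                       -- null value control
        );
        CREATE TABLE part (part_no INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            order_no INTEGER NOT NULL,
            part_no  INTEGER NOT NULL REFERENCES part(part_no),  -- referential integrity
            PRIMARY KEY (order_no, part_no)
        );
    """)
    return conn


def rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

Declaring the rules in the schema means every application that writes to the tables is subject to the same checks.
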
Designing Fields
Fixed-length fields:
  Make it easy to locate a specific record in a file and/or a specific field in that record
  Each field has its maximum length specified, and unused space in any given field is padded with spaces (text) or leading zeros (numeric)
Variable-length fields:
  When the need arises for a variable-length field (e.g., a memo field), the field can be stored separately from the rest of the record, with a pointer used to locate it when needed

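To see why fixed-length fields make records easy to locate, consider the sketch below, which uses Python's struct module. The record layout (a 4-byte integer id, a 20-byte name, a 2-byte state code) is an assumption invented for the example. Because every record has the same size, record number i starts at byte offset i times the record size.

```python
import io
import struct

# Hypothetical layout: 4-byte little-endian integer id, 20-byte name, 2-byte state code.
RECORD = struct.Struct("<i20s2s")


def pack_record(rec_id: int, name: str, state: str) -> bytes:
    """Encode one record; text fields are padded with spaces to their fixed width."""
    return RECORD.pack(
        rec_id,
        name.encode("ascii").ljust(20),
        state.encode("ascii").ljust(2),
    )


def read_record(f, index: int):
    """Jump straight to record number `index` (0-based): offset = index * record size."""
    f.seek(index * RECORD.size)
    rec_id, name, state = RECORD.unpack(f.read(RECORD.size))
    return rec_id, name.decode("ascii").rstrip(), state.decode("ascii").rstrip()


if __name__ == "__main__":
    data = b"".join(pack_record(i, n, s) for i, n, s in [(1, "Ann", "FL"), (2, "Bob", "NY")])
    print(read_record(io.BytesIO(data), 1))
```

No scan of earlier records is needed to reach a given record, which is the property the slide describes; the cost is the padding stored in every short field.
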
Physical Records
Physical record: “A group of fields stored in adjacent memory locations and retrieved together as a unit.”
Page: “The amount of data read or written in one secondary memory (disk) input or output operation.”
Blocking factor: “The number of physical records per page.”

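The blocking factor follows directly from the page size and the record size. The sketch below assumes that records are never split across pages and that the page carries no header overhead; both are simplifying assumptions, and the 4096-byte page and 200-byte record in the usage example are illustrative figures only.

```python
def blocking_factor(page_bytes: int, record_bytes: int) -> int:
    """Number of whole physical records that fit in one page."""
    if record_bytes <= 0 or record_bytes > page_bytes:
        raise ValueError("record size must be positive and no larger than the page")
    return page_bytes // record_bytes


def pages_needed(num_records: int, page_bytes: int, record_bytes: int) -> int:
    """Pages required to hold num_records, rounding the last partial page up."""
    bf = blocking_factor(page_bytes, record_bytes)
    return (num_records + bf - 1) // bf


if __name__ == "__main__":
    print(blocking_factor(4096, 200))      # whole records per page
    print(pages_needed(10_000, 4096, 200)) # pages for the whole file
```

A higher blocking factor means fewer page reads for the same number of records, which is why record size is a physical design decision.
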
Database Access Model
The goal in structuring physical records is to minimize performance bottlenecks resulting from disk accesses (accessing data on disk is slow compared with main memory).

Optimization Decisions
Denormalization
Partitioning
Selection of file organization
Creation of indexes

Denormalisation
The main problem with a fully normalised database is that it has many tables.
To answer useful queries, these tables have to be recombined through expensive join operations.
Updates frequently have to be performed across more than one table.
One obvious way of improving retrieval or update performance is to step back from a fully normalised design and introduce some controlled redundancy.

Definition
“The process of transforming normalized relations into unnormalized physical record specifications [for the purpose of improving overall database performance].”
Alternatively: denormalization is a technique for moving from higher to lower normal forms of database modeling in order to speed up database access.
Denormalization may be applied when deriving a physical data model from the logical model.

Example
Four examples of strict violations of normalization are shown in the schema below:
ORDER (Order No, Customer No, Customer Name, Customer Address, Order Date)
ORDER LINE (Order No, Line No, Customer No, Customer Name, Customer Address, Product Code, Unit Count, Unit Price, Total Price, Required By Date)

Example
From the schema above:
It can be assumed that Customer Name and Customer Address have been copied from a Customer table with primary key Customer No.
Customer No has been copied from the Order table to the Order Line table.
It can be assumed that Unit Price has been copied from a Product table with primary key Product Code.
Total Price can be calculated by multiplying Unit Price by Unit Count.

Example: Benefits
Changes such as these are intended to offer performance benefits for some transactions.
For example, a query on the Order Line table that also requires the Customer No does not also have to access the Order table.
However, there is a downside: each such additional column must be carefully controlled.
  It should not be directly updatable by users.
  It must be updated automatically by the application (e.g., via a DBMS trigger).

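A minimal sketch of trigger-maintained redundancy, using SQLite through Python's sqlite3 module, is given below. It covers only one of the copied columns (Customer No in the Order Line table). The table, column, and trigger names are assumptions for the example, and trigger syntax differs between DBMSs. The sketch does not stop users from updating the copy directly; that would need permissions or a further trigger.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    """Order / Order Line with a redundant customer_no copy kept in sync by triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            order_no    INTEGER PRIMARY KEY,
            customer_no INTEGER NOT NULL
        );
        CREATE TABLE order_line (
            order_no    INTEGER NOT NULL,
            line_no     INTEGER NOT NULL,
            customer_no INTEGER,            -- controlled redundancy: copied from orders
            PRIMARY KEY (order_no, line_no)
        );
        -- Fill in the copy whenever a new order line is inserted.
        CREATE TRIGGER order_line_copy_customer AFTER INSERT ON order_line
        BEGIN
            UPDATE order_line
               SET customer_no = (SELECT o.customer_no FROM orders AS o
                                   WHERE o.order_no = NEW.order_no)
             WHERE order_no = NEW.order_no AND line_no = NEW.line_no;
        END;
        -- Propagate later changes of the source column to every copy.
        CREATE TRIGGER orders_sync_customer AFTER UPDATE OF customer_no ON orders
        BEGIN
            UPDATE order_line SET customer_no = NEW.customer_no
             WHERE order_no = NEW.order_no;
        END;
    """)
    return conn


def customer_of_line(conn: sqlite3.Connection, order_no: int, line_no: int) -> int:
    """Read Customer No from the Order Line table alone, without joining to orders."""
    row = conn.execute(
        "SELECT customer_no FROM order_line WHERE order_no = ? AND line_no = ?",
        (order_no, line_no),
    ).fetchone()
    return row[0]
```

The read path touches a single table, which is the benefit; the two triggers are the extra write-time work that pays for it.
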
Partitioning
Horizontal partitioning: distributing the rows of a table into two or more separate files
  e.g., a Customer table is partitioned into four separate files, one for each geographical region
Vertical partitioning: distributing the columns of a table into two or more separate files
  e.g., an Employee table is partitioned into a public file (name, office, extension, etc.) and a private file (salary, health history, etc.)
  Note: the primary key is repeated in each file

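The two kinds of partitioning can be sketched on in-memory rows. In the example below each row is a Python dictionary and each "file" is a list of rows; the function names and sample column names are assumptions for illustration, and a real DBMS would do this with its own partitioning features.

```python
def horizontal_partition(rows, key):
    """Split rows into separate groups according to the value of one column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def vertical_partition(rows, pk, column_groups):
    """Split columns into separate groups; the primary key is repeated in each group."""
    return {
        name: [{pk: row[pk], **{col: row[col] for col in cols}} for row in rows]
        for name, cols in column_groups.items()
    }


if __name__ == "__main__":
    customers = [
        {"id": 1, "region": "North"},
        {"id": 2, "region": "South"},
        {"id": 3, "region": "North"},
    ]
    print(horizontal_partition(customers, "region"))

    employees = [{"emp_id": 1, "name": "A", "office": "B12", "salary": 1000}]
    groups = {"public": ["name", "office"], "private": ["salary"]}
    print(vertical_partition(employees, "emp_id", groups))
```

Reassembling a full employee row requires matching the public and private files on the repeated primary key, which is the cross-partition cost.
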
Partitioning
Advantages of partitioning:
  Records used together are grouped together
  Each partition can be optimized for performance
  Security and recovery
  Partitions stored on different disks: less contention
  Parallel processing capability
Disadvantages of partitioning:
  Slower retrievals when a query spans partitions
  Complexity for application programmers
  Anomalies and extra storage space requirements due to duplication of data across partitions

Physical Files
Physical file: a file as stored on disk
Constructs to link two pieces of data:
  Sequential storage
  Pointers
File organization: how the files are arranged on the disk
Access method: how the data can be retrieved based on the file organization
  Relative: data accessed as an offset from the most recently referenced point in secondary memory
  Direct: data accessed as a result of a calculation to generate the beginning address of a record

File Organizations
“A technique for physically arranging the records of a file on secondary storage devices.”
Goals in selecting a file organization (trade-offs exist, of course):
  Fast data retrieval
  High throughput for input and maintenance
  Efficient use of storage space
  Protection from failures or data loss
  Minimal need for reorganization
  Accommodation for growth
  Security from unauthorized use

File Organizations
Sequential
Indexed
  Indexed sequential
  Indexed nonsequential

Sequential File Organization
Records of the file are stored in sequence by the primary key field values
Sequential Retrieval
Consider a file of 10,000 records, each occupying 1 page
Queries that require processing all records will require 10,000 accesses
  e.g., find all items of type 'E'
Many disk accesses are wasted if few records meet the condition
However, very effective if most or all records will be accessed (e.g., payroll)

Indexed File Organization
The index concept is like the index in a book
Indexed-sequential file organization: the records are stored sequentially by primary key values, and there is an index built on the primary key field (and possibly indexes built on other fields as well)

Indexing
An index is a table or file that is used to determine the location of rows in another file that satisfy some condition

Querying with an Index
Read the index into memory
Search the index to find records meeting the condition
Access only those records containing the required data
Disk accesses are substantially reduced when the query involves few records

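The difference between a sequential scan and an indexed lookup can be sketched by counting page accesses. The simulation below reuses the 10,000-record, one-record-per-page file from the sequential retrieval example. Marking every hundredth record as type 'E' is an assumption made so that few records match, and the cost of reading the index itself is ignored, as if it were already in memory.

```python
def build_file(n_pages: int = 10_000):
    """One record per page; each record is (key, item_type). Every 100th record is type 'E'."""
    return [(i, "E" if i % 100 == 0 else "A") for i in range(n_pages)]


def scan_for_type(pages, wanted):
    """Sequential retrieval: every page is read, whether or not it matches."""
    accesses = 0
    hits = []
    for record in pages:
        accesses += 1
        if record[1] == wanted:
            hits.append(record)
    return hits, accesses


def build_index(pages):
    """Index on item_type: maps each type to the page numbers that hold it."""
    index = {}
    for page_no, (_, item_type) in enumerate(pages):
        index.setdefault(item_type, []).append(page_no)
    return index


def index_lookup(pages, index, wanted):
    """Indexed retrieval: only the pages listed in the index are read."""
    page_numbers = index.get(wanted, [])
    hits = [pages[p] for p in page_numbers]
    return hits, len(page_numbers)


if __name__ == "__main__":
    pages = build_file()
    index = build_index(pages)
    print("scan accesses: ", scan_for_type(pages, "E")[1])
    print("index accesses:", index_lookup(pages, index, "E")[1])
```

When most records match (type 'A' here), the indexed lookup reads almost as many pages as the scan, which is why sequential access remains effective for jobs that touch most of the file.
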
Maintaining an Index
Adding a record requires at least two disk accesses:
  Update the file
  Update the index
Trade-off:
  Faster queries
  Slower maintenance (additions, deletions, and updates of records)
Thus, more static databases benefit more overall

Rules of Thumb for Using Indexes
1. Indexes are most useful on larger tables
2. Index the primary key of each table (may be automatic, as in Access)
3. Indexes are useful on search fields (WHERE)
4. Indexes are also useful on fields used for sorting (ORDER BY) and categorizing (GROUP BY)
5. It is most useful to index a field when there are many different values for that field
6. Find out the limits placed on indexing by your DBMS (Access allows 32 indexes per table, and no index may contain more than 10 fields)
7. Depending on the DBMS, null values may not be referenced from an index (thus, rows with a null value in the indexed field may not be found by a search using the index)

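Rule 3 can be checked on a real DBMS by asking for the query plan before and after creating an index on the search field. The sketch below uses SQLite through Python's sqlite3 module; the item table, its columns, and the index name idx_item_type are assumptions for the example, and the exact wording of the plan text varies between SQLite versions, so the code only looks for the index name in it.

```python
import sqlite3

# Hypothetical search on a field with many distinct values (rule 5).
QUERY = "SELECT name FROM item WHERE item_type = 'T7'"


def make_item_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Create an in-memory table with 200 distinct item_type values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, item_type TEXT, name TEXT)")
    rows = [(i, f"T{i % 200}", f"item-{i}") for i in range(n_rows)]
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan description (the last column of EXPLAIN QUERY PLAN)."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


if __name__ == "__main__":
    conn = make_item_db()
    print("before:", query_plan(conn, QUERY))
    conn.execute("CREATE INDEX idx_item_type ON item (item_type)")
    print("after: ", query_plan(conn, QUERY))
```

Before the index exists the plan describes a scan of the whole table; afterwards it names the index, showing that the WHERE field is now served by an indexed search.
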
Group Exercise
Consider a college database consisting of three tables: Student, Lecture, and Course. Denormalize the tables to improve the performance of the following query:
List all students who take the Database Development course taught by Bajuna.

Next Topic
Client/Server and Middleware