Chapter 5

36
© 2011 Pearson Education, Inc. Publishing as © 2011 Pearson Education, Inc. Publishing as Prentice Hall Prentice Hall 1 Chapter 5: Chapter 5: Physical Database Physical Database Design and Performance Design and Performance Modern Database Management Modern Database Management 10 10 th th Edition Edition Jeffrey A. Hoffer, V. Ramesh, Jeffrey A. Hoffer, V. Ramesh, Heikki Topi Heikki Topi

description

DBMS Ch5

Transcript of Chapter 5

Page 1: Chapter 5

© 2011 Pearson Education, Inc.  Publishing as © 2011 Pearson Education, Inc.  Publishing as Prentice HallPrentice Hall 11

Chapter 5:Chapter 5:Physical Database Design Physical Database Design

and Performanceand PerformanceModern Database Modern Database

ManagementManagement1010thth Edition Edition

Jeffrey A. Hoffer, V. Ramesh, Jeffrey A. Hoffer, V. Ramesh, Heikki TopiHeikki Topi

Page 2: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 22

ObjectivesObjectives Define termsDefine terms Describe the physical database design Describe the physical database design

processprocess Choose storage formats for attributesChoose storage formats for attributes Select appropriate file organizationsSelect appropriate file organizations Describe three types of file organizationDescribe three types of file organization Describe indexes and their appropriate useDescribe indexes and their appropriate use Translate a database model into efficient Translate a database model into efficient

structuresstructures Know when and how to use denormalizationKnow when and how to use denormalization

Page 3: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 33

Physical Database DesignPhysical Database Design Purpose–translate the logical Purpose–translate the logical

description of data into the description of data into the technical technical specificationsspecifications for storing and for storing and retrieving dataretrieving data

Goal–create a design for storing data Goal–create a design for storing data that will provide that will provide adequate adequate performanceperformance and insure and insure database database integrityintegrity, , securitysecurity, and , and recoverabilityrecoverability

Page 4: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 44

Physical Design ProcessPhysical Design Process

Normalized relations

Volume estimates

Attribute definitions

Response time expectations

Data security needs

Backup/recovery needs

Integrity expectations

DBMS technology used

Inputs

Attribute data types

Physical record descriptions (doesn’t always match logical design)

File organizations

Indexes and database architectures

Query optimization

Leads to

Decisions

Page 5: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 55

Figure 5-1 Composite usage map (Pine Valley Furniture Company)

Page 6: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 66

Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)

Data volumes

Page 7: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall77

Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)

Access Frequencies (per hour)

Page 8: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 88

Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)

Usage analysis:14,000 purchased parts accessed per hour 8000 quotations accessed from these 140 purchased part accesses 7000 suppliers accessed from these 8000 quotation accesses

Page 9: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 99

Figure 5-1 Composite usage map (Pine Valley Furniture Company) (cont.)

Usage analysis:7500 suppliers accessed per hour 4000 quotations accessed from these 7500 supplier accesses 4000 purchased parts accessed from these 4000 quotation accesses

Page 10: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1010

Designing FieldsDesigning Fields Field: smallest unit of data in Field: smallest unit of data in

databasedatabase Field design Field design

Choosing data typeChoosing data type Coding, compression, encryptionCoding, compression, encryption Controlling data integrityControlling data integrity

Page 11: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1111

Choosing Data TypesChoosing Data Types CHAR–fixed-length characterCHAR–fixed-length character VARCHAR2–variable-length character (memo)VARCHAR2–variable-length character (memo) LONG–large numberLONG–large number NUMBER–positive/negative numberNUMBER–positive/negative number INEGER–positive/negative whole numberINEGER–positive/negative whole number DATE–actual dateDATE–actual date BLOB–binary large object (good for graphics, BLOB–binary large object (good for graphics,

sound clips, etc.)sound clips, etc.)

Page 12: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1212

Figure 5-2 Example of a code look-up table(Pine Valley Furniture Company)

Code saves space, but costs an additional lookup to obtain actual value

Page 13: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1313

Field Data IntegrityField Data Integrity Default value–assumed value if no Default value–assumed value if no

explicit valueexplicit value Range control–allowable value limitations Range control–allowable value limitations

(constraints or validation rules)(constraints or validation rules) Null value control–allowing or prohibiting Null value control–allowing or prohibiting

empty fieldsempty fields Referential integrity–range control (and Referential integrity–range control (and

null value allowances) for foreign-key to null value allowances) for foreign-key to primary-key match-upsprimary-key match-ups

Sarbanes-Oxley Act (SOX) legislates importance of financial data integrity

Page 14: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1414

Handling Missing DataHandling Missing Data Substitute an estimate of the missing Substitute an estimate of the missing

value (e.g., using a formula)value (e.g., using a formula) Construct a report listing missing valuesConstruct a report listing missing values In programs, ignore missing data unless In programs, ignore missing data unless

the value is significant (sensitivity the value is significant (sensitivity testing)testing)

Triggers can be used to perform these operations

Page 15: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1515

Physical RecordsPhysical Records Physical Record: A group of fields Physical Record: A group of fields

stored in adjacent memory locations stored in adjacent memory locations and retrieved together as a unitand retrieved together as a unit

Page: The amount of data read or Page: The amount of data read or written in one I/O operationwritten in one I/O operation

Blocking Factor: The number of Blocking Factor: The number of physical records per pagephysical records per page

Page 16: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1616

DenormalizationDenormalization Transforming Transforming normalizednormalized relations into relations into non-non-

normalizednormalized physical record specifications physical record specifications Benefits:Benefits:

Can improve performance (speed) by reducing number of table Can improve performance (speed) by reducing number of table lookups (i.e. lookups (i.e. reduce number of necessary join queriesreduce number of necessary join queries))

Costs (due to data duplication)Costs (due to data duplication) Wasted storage spaceWasted storage space Data integrity/consistency threatsData integrity/consistency threats

Common denormalization opportunitiesCommon denormalization opportunities One-to-one relationship (Fig. 5-3)One-to-one relationship (Fig. 5-3) Many-to-many relationship with non-key attributes (associative Many-to-many relationship with non-key attributes (associative

entity) (Fig. 5-4)entity) (Fig. 5-4) Reference data (1:N relationship where 1-side has data not used Reference data (1:N relationship where 1-side has data not used

in any other relationship) (Fig. 5-5)in any other relationship) (Fig. 5-5)

Page 17: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1717

Figure 5-3 A possible denormalization situation: two entities with one-to-one relationship

Page 18: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1818

Figure 5-4 A possible denormalization situation: a many-to-many relationship with nonkey attributes

Extra table access required

Null description possible

Page 19: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 1919

Figure 5-5A possible denormalization situation:reference data

Extra table access required

Data duplication

Page 20: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2020

PartitioningPartitioning Horizontal Partitioning: Distributing the rows of Horizontal Partitioning: Distributing the rows of

a table into several separate filesa table into several separate files Useful for situations where different users need Useful for situations where different users need

access to different rowsaccess to different rows Three types: Key Range Partitioning, Hash Three types: Key Range Partitioning, Hash

Partitioning, or Composite PartitioningPartitioning, or Composite Partitioning Vertical Partitioning: Distributing the columns Vertical Partitioning: Distributing the columns

of a table into several separate relationsof a table into several separate relations Useful for situations where different users need Useful for situations where different users need

access to different columnsaccess to different columns The primary key must be repeated in each fileThe primary key must be repeated in each file

Combinations of Horizontal and VerticalCombinations of Horizontal and Vertical

Partitions often correspond with User Schemas (user views)

Page 21: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2121

Partitioning (cont.)Partitioning (cont.) Advantages of Partitioning:Advantages of Partitioning:

Efficiency: Records used together are grouped togetherEfficiency: Records used together are grouped together Local optimization: Each partition can be optimized for Local optimization: Each partition can be optimized for

performanceperformance Security: data not relevant to users are segregatedSecurity: data not relevant to users are segregated Recovery and uptime: smaller files take less time to back Recovery and uptime: smaller files take less time to back

upup Load balancing: Partitions stored on different disks, reduces Load balancing: Partitions stored on different disks, reduces

contentioncontention Disadvantages of Partitioning:Disadvantages of Partitioning:

Inconsistent access speed: Slow retrievals across partitionsInconsistent access speed: Slow retrievals across partitions Complexity: Non-transparent partitioningComplexity: Non-transparent partitioning Extra space or update time: Duplicate data; access from Extra space or update time: Duplicate data; access from

multiple partitionsmultiple partitions

Page 22: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2222

Oracle 11g Horizontal Oracle 11g Horizontal Partitioning MethodsPartitioning Methods

Range partitioningRange partitioning Partitions defined by range of field valuesPartitions defined by range of field values Could result in unbalanced distribution of rowsCould result in unbalanced distribution of rows Like-valued fields share partitionsLike-valued fields share partitions

Hash partitioningHash partitioning Partitions defined via hash functionsPartitions defined via hash functions Will guarantee balanced distribution of rowsWill guarantee balanced distribution of rows Partition could contain widely varying valued fieldsPartition could contain widely varying valued fields

List partitioningList partitioning Based on predefined lists of values for the partitioning keyBased on predefined lists of values for the partitioning key

Composite partitioningComposite partitioning Combination of the other approachesCombination of the other approaches

Page 23: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2323

Designing Physical FilesDesigning Physical Files Physical File: Physical File:

A named portion of secondary memory allocated A named portion of secondary memory allocated for the purpose of storing physical recordsfor the purpose of storing physical records

Tablespace–named set of disk storage elements Tablespace–named set of disk storage elements in which physical files for database tables can be in which physical files for database tables can be storedstored

Extent–contiguous section of disk spaceExtent–contiguous section of disk space Constructs to link two pieces of data:Constructs to link two pieces of data:

Sequential storageSequential storage Pointers–field of data that can be used to locate Pointers–field of data that can be used to locate

related fields or recordsrelated fields or records

Page 24: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2424

Figure 5-6 DBMS terminology in an Oracle 11g environment

Page 25: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2525

File OrganizationsFile Organizations Technique for physically arranging records of a file Technique for physically arranging records of a file

on secondary storageon secondary storage Factors for selecting file organization:Factors for selecting file organization:

Fast data retrieval and throughputFast data retrieval and throughput Efficient storage space utilizationEfficient storage space utilization Protection from failure and data lossProtection from failure and data loss Minimizing need for reorganizationMinimizing need for reorganization Accommodating growthAccommodating growth Security from unauthorized useSecurity from unauthorized use

Types of file organizationsTypes of file organizations SequentialSequential IndexedIndexed HashedHashed

Page 26: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2626

Figure 5-7a Sequential file organization

If not sortedAverage time to find desired record = n/2

1

2

n

Records of the file are stored in sequence by the primary key field values

If sorted – every insert or delete requires resort

Page 27: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2727

Indexed File OrganizationsIndexed File Organizations Indexed File Organization: the storage of Indexed File Organization: the storage of

records either sequentially or nonsequentially records either sequentially or nonsequentially with an index that allows software to locate with an index that allows software to locate individual recordsindividual records

Index: a table or other data structure used to Index: a table or other data structure used to determine in a file the location of records determine in a file the location of records that satisfy some conditionthat satisfy some condition

Primary keys are automatically indexedPrimary keys are automatically indexed Other fields or combinations of fields can also Other fields or combinations of fields can also

be indexed; these are called secondary keys be indexed; these are called secondary keys (or nonunique keys)(or nonunique keys)

Page 28: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2828

Figure 5-7b Indexed file organization

uses a tree searchAverage time to find desired record = depth of the tree

Page 29: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 2929

Figure 5-7cHashed file organization

Hash algorithmUsually uses division-remainder to determine record position. Records with same position are grouped in lists.

Page 30: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3030

Figure 6-8 Join Indexes–speeds up join operations

a) Join index for common non-key columns

a) Join index for matching foreign key (FK) and primary key (PK)

Page 31: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3131

Page 32: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3232

Clustering FilesClustering Files In some relational DBMSs, related records In some relational DBMSs, related records

from different tables can be stored together from different tables can be stored together in the same disk areain the same disk area

Useful for improving performance of join Useful for improving performance of join operationsoperations

Primary key records of the main table are Primary key records of the main table are stored adjacent to associated foreign key stored adjacent to associated foreign key records of the dependent tablerecords of the dependent table

e.g. Oracle has a CREATE CLUSTER e.g. Oracle has a CREATE CLUSTER commandcommand

Page 33: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3333

Rules for Using IndexesRules for Using Indexes1.1. Use on larger tablesUse on larger tables2.2. Index the primary key of each tableIndex the primary key of each table3.3. Index search fields (fields frequently Index search fields (fields frequently

in WHERE clause)in WHERE clause)4.4. Fields in SQL ORDER BY and GROUP Fields in SQL ORDER BY and GROUP

BY commandsBY commands5.5. When there are >100 values but When there are >100 values but

not when there are <30 valuesnot when there are <30 values

Page 34: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3434

Rules for Using Indexes Rules for Using Indexes (cont.)(cont.)

6.6. Avoid use of indexes for fields with long Avoid use of indexes for fields with long values; perhaps compress values firstvalues; perhaps compress values first

7.7. If key to index is used to determine location If key to index is used to determine location of record, use surrogate (like sequence nbr) of record, use surrogate (like sequence nbr) to allow even spread in storage areato allow even spread in storage area

8.8. DBMS may have limit on number of indexes DBMS may have limit on number of indexes per table and number of bytes per indexed per table and number of bytes per indexed field(s)field(s)

9.9. Be careful of indexing attributes with null Be careful of indexing attributes with null values; many DBMSs will not recognize null values; many DBMSs will not recognize null values in an index searchvalues in an index search

Page 35: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3535

Query OptimizationQuery Optimization Parallel query processing–possible when Parallel query processing–possible when

working in multiprocessor systemsworking in multiprocessor systems Overriding automatic query optimization–Overriding automatic query optimization–

allows for query writers to preempt the allows for query writers to preempt the automated optimizationautomated optimization

Picking data block size–factors to consider Picking data block size–factors to consider include:include: Block contention, random and sequential row Block contention, random and sequential row

access speed, row sizeaccess speed, row size Balancing I/O across disk controllersBalancing I/O across disk controllers

Page 36: Chapter 5

Chapter 5 © 2011 Pearson Education, Inc.  Publishing as Prentice Hall© 2011 Pearson Education, Inc.  Publishing as Prentice Hall 3636

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic,

mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

Copyright © 2011 Pearson Education, Inc.  Publishing as Prentice Copyright © 2011 Pearson Education, Inc.  Publishing as Prentice HallHall