Physical Database Design CIT 381 - alternate keys - named constraints - indexes.


Constraints

We have seen primary key constraints and NOT NULL constraints. We can also give each constraint a name:

CREATE TABLE Student (
    STUD_NUM     INTEGER,
    STUD_FNAME   CHAR(10),
    STUD_LNAME   CHAR(20) CONSTRAINT stud_ln NOT NULL,
    STUD_ADDRESS CHAR(30),
    STUD_DEPT_ID INTEGER,
    CONSTRAINT stud_pk PRIMARY KEY (STUD_NUM)
);
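Here is a minimal sketch of this table in use. It assumes SQLite, driven from Python's standard `sqlite3` module. SQLite is only a stand-in here, not the Informix system the slides mention. NOT NULL is written as a column constraint, which is where standard SQL puts it. The code checks that both named constraints reject bad rows. The sample names and addresses are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        STUD_NUM     INTEGER,
        STUD_FNAME   CHAR(10),
        STUD_LNAME   CHAR(20) CONSTRAINT stud_ln NOT NULL,
        STUD_ADDRESS CHAR(30),
        STUD_DEPT_ID INTEGER,
        CONSTRAINT stud_pk PRIMARY KEY (STUD_NUM)
    )""")
conn.execute("INSERT INTO Student VALUES (1, 'Ada', 'Lovelace', '12 Main St', 10)")

def violates(sql: str) -> bool:
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_rejected = violates("INSERT INTO Student VALUES (2, 'Alan', NULL, '', 10)")    # stud_ln
dup_rejected = violates("INSERT INTO Student VALUES (1, 'Alan', 'Turing', '', 10)")  # stud_pk
print(null_rejected, dup_rejected)  # True True
```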

Why name constraints? For easier control:

ALTER TABLE Student DROP CONSTRAINT stud_ln;

- easy to remove a constraint without rebuilding the table

SET CONSTRAINTS stud_pk DEFERRED;

- this says: do not enforce the constraint until the transaction commits (Informix)
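SQLite cannot drop a named constraint with ALTER TABLE, and it cannot defer PRIMARY KEY checks. This sketch therefore shows the same deferral idea with a foreign key declared `DEFERRABLE INITIALLY DEFERRED`. It assumes SQLite through Python's `sqlite3`, not the Informix statement above. A Student row is inserted before its Department row. The check runs at COMMIT, not at the INSERT, so the transaction succeeds only if the parent row exists by then. The helper name `commits` is hypothetical.

```python
import sqlite3

def commits(insert_parent: bool) -> bool:
    """Insert a Student before its Department; report whether COMMIT succeeds."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE Department (DEPT_ID INTEGER PRIMARY KEY)")
    conn.execute("""
        CREATE TABLE Student (
            STUD_NUM     INTEGER PRIMARY KEY,
            STUD_DEPT_ID INTEGER,
            CONSTRAINT stud_fk1 FOREIGN KEY (STUD_DEPT_ID)
                REFERENCES Department (DEPT_ID)
                DEFERRABLE INITIALLY DEFERRED
        )""")
    try:
        conn.execute("BEGIN")
        # Department 10 does not exist yet; the deferred check lets this through for now
        conn.execute("INSERT INTO Student VALUES (1, 10)")
        if insert_parent:
            conn.execute("INSERT INTO Department VALUES (10)")
        try:
            conn.execute("COMMIT")  # deferred constraints are checked here
            return True
        except sqlite3.IntegrityError:
            if conn.in_transaction:  # a failed COMMIT can leave the transaction open
                conn.execute("ROLLBACK")
            return False
    finally:
        conn.close()

print(commits(insert_parent=True))   # True
print(commits(insert_parent=False))  # False
```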

UNIQUE

A way to specify alternate keys. Let's add such a constraint to the Student table: say the student name (first and last name together) forms another candidate key.

ALTER TABLE Student ADD CONSTRAINT stud_name_key UNIQUE (STUD_FNAME, STUD_LNAME);

Also: Foreign Key Constraint

ALTER TABLE Student ADD CONSTRAINT stud_fk1 FOREIGN KEY (STUD_DEPT_ID) REFERENCES Department (DEPT_ID);

Of course, these constraints can also be declared when the table is created (or added in the Relationship View of Access).

Naming the constraint is optional.
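This sketch declares both constraints when the table is created. It assumes SQLite through Python's `sqlite3`. SQLite cannot add a constraint with ALTER TABLE, so declaring it up front is the natural route there. The PRIMARY KEY and NOT NULL constraints are left unnamed, while stud_name_key and stud_fk1 keep their names. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.execute("CREATE TABLE Department (DEPT_ID INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Student (
        STUD_NUM     INTEGER PRIMARY KEY,   -- unnamed constraint
        STUD_FNAME   CHAR(10),
        STUD_LNAME   CHAR(20) NOT NULL,     -- unnamed constraint
        STUD_DEPT_ID INTEGER,
        CONSTRAINT stud_name_key UNIQUE (STUD_FNAME, STUD_LNAME),
        CONSTRAINT stud_fk1 FOREIGN KEY (STUD_DEPT_ID)
            REFERENCES Department (DEPT_ID)
    )""")
conn.execute("INSERT INTO Department VALUES (10)")
conn.execute("INSERT INTO Student VALUES (1, 'Ada', 'Lovelace', 10)")

def rejected(sql: str) -> bool:
    """Return True if a constraint rejects the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

same_name = rejected("INSERT INTO Student VALUES (2, 'Ada', 'Lovelace', 10)")  # stud_name_key
bad_dept = rejected("INSERT INTO Student VALUES (3, 'Alan', 'Turing', 99)")    # stud_fk1
ok_row = rejected("INSERT INTO Student VALUES (4, 'Ada', 'Byron', 10)")        # passes both
print(same_name, bad_dept, ok_row)  # True True False
```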

Physical Design

There are four main aspects to physical design:

- ER model to relational model mapping
- Denormalization
- Indexing
- Physical storage issues (such as fragmentation)

Relational Mapping

Here we convert entity-relationship diagrams to relations (tables):

- Entities become tables
- Relationships become foreign keys, except …
  - many-to-many (non-specific) relationships become tables
- Data types get set, depending on the chosen DBMS (MySQL, Oracle, Access, etc.)
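Here is the many-to-many rule as a sketch. It assumes a hypothetical Course entity (not in the slides) and SQLite via Python's `sqlite3`. A student takes many courses and a course has many students. The relationship therefore becomes its own table, Enrollment, whose primary key is the pair of foreign keys. The table and column names beyond Student are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Student (STUD_NUM INTEGER PRIMARY KEY, STUD_LNAME CHAR(20));
    CREATE TABLE Course  (CRS_CODE CHAR(8) PRIMARY KEY, CRS_TITLE CHAR(40));
    -- The many-to-many relationship becomes a table of foreign-key pairs
    CREATE TABLE Enrollment (
        STUD_NUM INTEGER REFERENCES Student (STUD_NUM),
        CRS_CODE CHAR(8) REFERENCES Course (CRS_CODE),
        PRIMARY KEY (STUD_NUM, CRS_CODE)
    );
    INSERT INTO Student VALUES (1, 'Lovelace'), (2, 'Turing');
    INSERT INTO Course VALUES ('DB101', 'Databases'), ('NET201', 'Networks');
    INSERT INTO Enrollment VALUES (1, 'DB101'), (2, 'DB101'), (2, 'NET201');
""")

# Who takes DB101? Navigating the relationship means joining through Enrollment
rows = conn.execute("""
    SELECT s.STUD_LNAME
    FROM Student s JOIN Enrollment e ON e.STUD_NUM = s.STUD_NUM
    WHERE e.CRS_CODE = 'DB101'
    ORDER BY s.STUD_LNAME""").fetchall()
print(rows)  # [('Lovelace',), ('Turing',)]
```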

Denormalization

From the ER Studio user guide:

Denormalization is an unavoidable part of designing databases. No matter how elegant a logical design can appear on paper, it often breaks down in practice because of the complex or expensive queries required to make it work. Sometimes, the best remedy for the performance problems is to depart from the blueprint, the logical design. Indeed, denormalization is perhaps the most important reason for separating logical and physical designs - you need not compromise your blueprint while still addressing real-world performance problems.
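Here is a small sketch of the trade-off the guide describes. It assumes SQLite via Python's `sqlite3`, plus a hypothetical DEPT_NAME column and Student_Denorm table. The normalized design needs a join to list students with their department names. The denormalized copy stores the name in every student row, so the read needs no join, but renaming a department must now touch many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each department name is stored once
    CREATE TABLE Department (DEPT_ID INTEGER PRIMARY KEY, DEPT_NAME CHAR(30));
    CREATE TABLE Student (STUD_NUM INTEGER PRIMARY KEY, STUD_LNAME CHAR(20),
                          STUD_DEPT_ID INTEGER REFERENCES Department (DEPT_ID));
    INSERT INTO Department VALUES (10, 'Computing'), (20, 'Mathematics');
    INSERT INTO Student VALUES (1, 'Lovelace', 20), (2, 'Turing', 10), (3, 'Hopper', 10);

    -- Denormalized (hypothetical): the name is copied into each student row
    CREATE TABLE Student_Denorm AS
        SELECT s.STUD_NUM, s.STUD_LNAME, d.DEPT_NAME
        FROM Student s JOIN Department d ON d.DEPT_ID = s.STUD_DEPT_ID;
""")

joined = conn.execute("""
    SELECT s.STUD_LNAME, d.DEPT_NAME
    FROM Student s JOIN Department d ON d.DEPT_ID = s.STUD_DEPT_ID
    ORDER BY s.STUD_LNAME""").fetchall()
no_join = conn.execute(
    "SELECT STUD_LNAME, DEPT_NAME FROM Student_Denorm ORDER BY STUD_LNAME").fetchall()

# The price: a rename now updates one row per student instead of one row in total
renamed = conn.execute("UPDATE Student_Denorm SET DEPT_NAME = 'Computer Science' "
                       "WHERE DEPT_NAME = 'Computing'").rowcount
print(joined == no_join, renamed)  # True 2
```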

Indexing

An index is a data structure associated with a table, allowing faster look-up access to that table.

- Usually it is a B-tree
- Others: hash table (common), R-tree (not common)
- Note: in DB-speak, the plural of index is indexes, not the usual indices.
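This toy sketch makes "a data structure associated with a table" concrete in plain Python. It illustrates the idea only; it is not how a DBMS stores indexes. The table is a list of rows. A dict plays the role of a hash index. A sorted list of (key, row position) pairs, searched with `bisect`, stands in for the ordered structure a B-tree provides. The hash index answers only equality lookups, while the ordered one also answers range queries. The rows and function names are made up.

```python
from bisect import bisect_left, bisect_right

# A tiny "table": each row is (STUD_NUM, STUD_LNAME); list positions act as row ids
table = [(105, "Turing"), (101, "Lovelace"), (110, "Hopper"), (103, "Codd")]

# Hash index: key -> row position (equality lookups only)
hash_idx = {row[0]: pos for pos, row in enumerate(table)}

# Ordered index: sorted (key, row position) pairs, like the leaf level of a B-tree
ordered_idx = sorted((row[0], pos) for pos, row in enumerate(table))
keys = [k for k, _ in ordered_idx]

def point_lookup(num):
    """Find a row by key without scanning the table."""
    pos = hash_idx.get(num)
    return None if pos is None else table[pos]

def range_lookup(lo, hi):
    """Return rows with lo <= key <= hi using binary search on the ordered index."""
    start, end = bisect_left(keys, lo), bisect_right(keys, hi)
    return [table[pos] for _, pos in ordered_idx[start:end]]

print(point_lookup(110))       # (110, 'Hopper')
print(range_lookup(101, 105))  # [(101, 'Lovelace'), (103, 'Codd'), (105, 'Turing')]
```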

Creating an index

CREATE INDEX stud_idx1 ON Student (STUD_NUM);

This creates an index on the primary key. Usually the DBMS does this automatically when the primary key is declared.

If you expect queries to read that field in descending order, consider instead

CREATE INDEX stud_idx1 ON Student (STUD_NUM DESC);

Secondary Indexes

If we expect many queries on the student last name:

CREATE INDEX stud_idx2 ON Student (STUD_LNAME);
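You can check whether a secondary index is actually used by asking the optimizer for its plan. Here is a minimal sketch, assuming SQLite through Python's `sqlite3`. `EXPLAIN QUERY PLAN` is SQLite's command, and other systems have their own EXPLAIN variants. The plan wording differs between SQLite versions, so the code only looks for the index name and the leading keyword. SEARCH means the index narrows the lookup; SCAN means every row is read. The `plan` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student (
    STUD_NUM INTEGER PRIMARY KEY, STUD_FNAME CHAR(10), STUD_LNAME CHAR(20),
    STUD_ADDRESS CHAR(30), STUD_DEPT_ID INTEGER)""")
conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")

def plan(sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

by_lname = plan("SELECT * FROM Student WHERE STUD_LNAME = 'Codd'")          # indexed column
by_address = plan("SELECT * FROM Student WHERE STUD_ADDRESS = '12 Main St'")  # no index
print(by_lname)
print(by_address)
```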

… or if we have many queries on the (lastName, firstName) pair

CREATE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME);
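Column order matters in a composite index. The entries are sorted by last name first and by first name within each last name. Such an index helps queries that fix the last name, but generally not queries on the first name alone. Some systems can sometimes do a "skip scan"; SQLite, for example, does so only after ANALYZE has gathered statistics. This sketch assumes SQLite via Python's `sqlite3` with no statistics gathered, and checks only the SEARCH/SCAN keyword and the index name, because plan wording varies by version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Student (
    STUD_NUM INTEGER PRIMARY KEY, STUD_FNAME CHAR(10), STUD_LNAME CHAR(20),
    STUD_ADDRESS CHAR(30), STUD_DEPT_ID INTEGER)""")
conn.execute("CREATE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME)")

def plan(sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

both = plan("SELECT * FROM Student WHERE STUD_LNAME = 'Codd' AND STUD_FNAME = 'Ted'")
last_only = plan("SELECT * FROM Student WHERE STUD_LNAME = 'Codd'")  # leftmost column
first_only = plan("SELECT * FROM Student WHERE STUD_FNAME = 'Ted'")  # second column only
for p in (both, last_only, first_only):
    print(p)
```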

If we did not have the UNIQUE constraint, we could have enforced it through the index:

CREATE UNIQUE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME);
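Here is a minimal sketch, assuming SQLite via Python's `sqlite3`, of a unique index enforcing the alternate key on a table that has no UNIQUE constraint. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (STUD_NUM INTEGER PRIMARY KEY, "
             "STUD_FNAME CHAR(10), STUD_LNAME CHAR(20))")
# No UNIQUE constraint on the table: the unique index alone enforces the alternate key
conn.execute("CREATE UNIQUE INDEX stud_idx3 ON Student (STUD_LNAME, STUD_FNAME)")
conn.execute("INSERT INTO Student VALUES (1, 'Ada', 'Lovelace')")
conn.execute("INSERT INTO Student VALUES (2, 'Ada', 'Byron')")  # different pair: allowed

try:
    conn.execute("INSERT INTO Student VALUES (3, 'Ada', 'Lovelace')")  # same pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```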

B-Trees

The most common indexing structure, organized as a balanced tree:

- each node is sized to fit one disk block
- hence smaller search keys increase fan-out (more keys per node, so fewer levels to read)

[Figure: example B-tree. Root node keys: 13, 17, 24, 30. Leaf entries: 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*]
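The sketch below rebuilds the example tree in plain Python and searches it the way a B-tree does. It assumes the leaves are split at the root keys: keys below 13 in the first leaf, 13 to 16 in the second, and so on. The search binary-searches the keys in the root to pick one child, then looks only in that leaf. Two hypothetical helpers do the arithmetic behind "smaller keys increase fan-out". They assume a node holds as many (key, pointer) pairs as fit in one block, that nodes are full, and that per-node overhead can be ignored.

```python
from bisect import bisect_right

# Two-level tree from the figure (leaf grouping assumed; * marks a data entry)
ROOT_KEYS = [13, 17, 24, 30]
LEAVES = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]

def search(key: int) -> bool:
    """Child i holds keys k with ROOT_KEYS[i-1] <= k < ROOT_KEYS[i]."""
    child = bisect_right(ROOT_KEYS, key)  # which subtree to follow
    return key in LEAVES[child]           # only one leaf (one block) is examined

def fanout(block_bytes: int, key_bytes: int, ptr_bytes: int) -> int:
    """How many (key, pointer) entries fit in one block-sized node."""
    return block_bytes // (key_bytes + ptr_bytes)

def levels(n_keys: int, f: int) -> int:
    """Smallest h with f**h >= n_keys: roughly the node reads per lookup."""
    h, reach = 1, f
    while reach < n_keys:
        h, reach = h + 1, reach * f
    return h

print(search(22), search(23))  # True False
small, large = fanout(8192, 8, 8), fanout(8192, 56, 8)  # 8-byte vs 56-byte keys
print(small, large)            # 512 128
print(levels(100_000_000, small), levels(100_000_000, large))  # 3 4
```

With the smaller key, a 100-million-entry index needs one fewer level, which means one fewer disk read per lookup.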

Use of Indexes

• Speed up many sorts of queries

• Assist in computing join operations

• Used in sorting a table (for ORDER BY or GROUP BY)

• Downsides: table updates become slower
  - an insertion into a table requires inserting the search key into each of the table's indexes

• Indexes can use a lot of space, sometimes more than the table itself
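This sketch shows two of these points in SQLite via Python's `sqlite3`, using 20,000 made-up rows. The first point is sorting. Without an index, the plan for an ORDER BY includes a temporary B-tree used for sorting; with an index on the sort column, that step disappears. The second point is space. Creating the index adds pages to the database. The write slowdown is not measured here, because timings depend on the machine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (STUD_NUM INTEGER PRIMARY KEY, STUD_LNAME CHAR(20))")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 ((i, f"name{i % 997:05d}-{i}") for i in range(20_000)))  # made-up data
conn.commit()

def plan(sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN rows into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

def pages() -> int:
    """Number of pages the database currently occupies."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

query = "SELECT STUD_LNAME FROM Student ORDER BY STUD_LNAME"
plan_before, pages_before = plan(query), pages()
conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")
plan_after, pages_after = plan(query), pages()

print("TEMP B-TREE" in plan_before, "TEMP B-TREE" in plan_after)  # True False
print(pages_after > pages_before)  # True
```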

From ER Studio user guide

“One purpose of indexes is to improve performance by providing a more efficient mechanism for locating data. Indexes work like a card catalog in a library: instead of searching every shelf for a book, you can find a reference to the book in the card catalog, which directs you to the book’s specific location. Logical indexes store pointers to data so that a search of all of the underlying data is not necessary. Indexes are one of the most important mechanisms for improving query performance.”

“However, injudiciously using indexes can negatively affect performance. You must determine the optimal number of indexes to place on a table, the index type and their placement in order to maximize query efficiency.”

Number of Indexes (from guide)

“While indexes can improve read (query) performance, they slow write (insert, update, and delete) performance. This is because the indexes themselves are modified whenever the data is modified. As a result, you must be judicious in the use of indexes. If you know that a table is subject to a high level of insert, update and delete activity, you should limit the number of indexes placed on the table. Conversely, if a table is basically static, like most lookup tables, then a high number of indexes should not impair overall performance.”

Index Type (from guide)

“Generally, there are two types of queries: point queries, which return a narrow data set, and range queries, which return a larger data set. For those databases that support them, clustered indexes are better suited to satisfying range queries, or a set of index columns that have a relatively low cardinality. Non-clustered indexes are well suited to satisfying point queries.”

Bulk Loading

To insert a large amount of data into a table:

1. Drop all indexes
2. Sort the data to be inserted
3. Insert the data (sorting helps disk blocks line up)
4. Rebuild the indexes

Rebuilding an index from scratch is often faster than inserting keys into it one by one.
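This sketch of the four steps assumes SQLite via Python's `sqlite3`, a single secondary index, and 50,000 made-up rows that arrive in random order. It demonstrates the procedure only and does not time it. How much the sort helps depends on the storage engine. The `bulk_load` function name is hypothetical.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (STUD_NUM INTEGER PRIMARY KEY, STUD_LNAME CHAR(20))")
conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")

# Made-up rows arriving in random order
rows = [(n, f"name{n}") for n in range(1, 50_001)]
random.shuffle(rows)

def bulk_load(conn, rows):
    """Drop the index, sort, insert, then rebuild the index."""
    with conn:  # commits when the block finishes
        conn.execute("DROP INDEX stud_idx2")                            # 1. drop indexes
        ordered = sorted(rows)                                          # 2. sort by key
        conn.executemany("INSERT INTO Student VALUES (?, ?)", ordered)  # 3. insert
        conn.execute("CREATE INDEX stud_idx2 ON Student (STUD_LNAME)")  # 4. rebuild

bulk_load(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Student'")]
print(count, indexes)  # 50000 ['stud_idx2']
```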

Fragmentation

Split the contents of the table …

- into separate locations on disk
- onto several disks

Problem: disk I/O is slow.

Two types:

- vertical fragmentation: some columns here, some there
- horizontal fragmentation: some rows here, some there
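The two kinds of fragmentation can be pictured with plain Python lists standing in for fragments stored in different places. This is a toy sketch: real systems do this through partitioning features whose syntax varies by DBMS. The split predicate and the column grouping are hypothetical. Each vertical fragment keeps the key so the rows can be joined back together. The horizontal fragments are recombined by union.

```python
# Made-up Student rows: (STUD_NUM, STUD_LNAME, STUD_ADDRESS, STUD_DEPT_ID)
students = [
    (1, "Lovelace", "12 Main St", 20),
    (2, "Turing", "3 Park Rd", 10),
    (3, "Hopper", "9 Bay Ave", 10),
]

# Horizontal fragmentation: some rows here, some there (split by a predicate)
frag_dept10 = [r for r in students if r[3] == 10]
frag_other = [r for r in students if r[3] != 10]

# Vertical fragmentation: some columns here, some there; each fragment keeps the key
frag_names = [(r[0], r[1], r[3]) for r in students]  # frequently read columns
frag_addresses = [(r[0], r[2]) for r in students]    # rarely read columns

# Reconstruction: union of horizontal fragments, join of vertical fragments on the key
from_rows = sorted(frag_dept10 + frag_other)
addr_by_key = dict(frag_addresses)
from_cols = sorted((num, name, addr_by_key[num], dept) for num, name, dept in frag_names)
print(from_rows == students, from_cols == students)  # True True
```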

Physical Placement

Put frequently joined tables on separate hard drives, so that their I/O can proceed in parallel.

Alternatively, very frequently joined tables should be merged (denormalized).

Note: about 80% of CPU cycles are spent performing joins.

From ER Studio guide

Two key concerns of every database administrator are free space management and data fragmentation. If you do not properly plan for the volume and growth of your tables and indexes, these two administrative issues could severely impact system availability and performance. Therefore, when designing your physical model, you should consider the initial extent size and logical partition size.
