Database design

Database design basics

Anoop K.


A properly designed database provides us with access to up-to-date, accurate information.

Data is stored in the database in tables, which may or may not be related to one another.

Each row of a table is also called a record, and each column is also called a field.

A record is a meaningful and consistent way to combine information about an entity.

A field is a single item of information.

For example, in a Products table, each row or record holds information about one product. Each column or field holds some type of information about that product, such as its name or price.
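The record and field vocabulary can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column names (product_id, name, price) and the sample products are assumptions, not taken from the slides.

```python
import sqlite3

# In-memory database; table contents and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows reading a field by its column name

conn.execute(
    "CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO Products (product_id, name, price) VALUES (?, ?, ?)",
    [(1, "Chai", 18.0), (2, "Olive oil", 21.5)],
)

# Each row is one record (one product); each column is one field of that record.
records = conn.execute("SELECT * FROM Products ORDER BY product_id").fetchall()
first = records[0]
field_names = first.keys()

print(field_names)
print(first["name"], first["price"])
```

Here `records` holds one entry per product, and `field_names` lists the fields every record shares.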

A good database design is one that:

Divides our information into subject-based tables to reduce redundant data

Provides Access with the information it requires to join the information in the tables together as needed

Helps support and ensure the accuracy and integrity of our information

Accommodates our data processing and reporting needs

The design process

The design process consists of the following steps:

Determine the purpose of the database

Find and organize the information required

Divide the information into tables

Turn information items into columns

Specify primary keys

Set up the table relationships

Refine the design

Apply the normalization rules

Determining the purpose of the database

Write down the following on paper:

What the database is intended for: its purpose and the kind of data that will be stored in it

How we expect to use it: how many users will connect to the database regularly or simultaneously, and what kind of application it will serve

Who will use it: how each kind of user of the application will use the database

How big the database can get

Finding and organizing the required information

Identify the details of each item to be stored in the database.

The details can be a rough draft at first; they need not be perfect or fully defined.

Consider the types of reports or mailings we might want to produce from the database.

Each piece of information should be broken down into its smallest useful parts. For example, to make the last name readily available, break a name into two parts: First Name and Last Name.

Think about the questions we might want the database to answer. For instance: How many sales of our featured product did we close last month? Where do our best customers live? Who is the supplier for our best-selling product?

Dividing the information into tables

To divide the information into tables, choose the major entities, or subjects. For example, after finding and organizing information for a product sales database, the preliminary list might look like this:

We can continue to refine this list until we have a design that works well.

When we first review the preliminary list of items, we might be tempted to place them all in a single table instead of the four shown in the last illustration. Consider the following table:

In this case, each row contains information about both the product and its supplier. Because we can have many products from the same supplier, the supplier name and address information has to be repeated many times. This wastes disk space.

Recording the supplier information only once in a separate Suppliers table, and then linking that table to the Products table, is a much better solution.

A second problem with this design comes about when we need to modify information about the supplier. For example, suppose we need to change a supplier's address. Because it appears in many places, we might accidentally change the address in one place but forget to change it in the others. Recording the supplier's address in only one place solves the problem.

Always try to record each fact just once. If you find yourself repeating the same information in more than one place, such as the address for a particular supplier, place that information in a separate table.

A third problem arises when a supplier, say Coho Winery, supplies only one product, and we want to delete that product but retain the supplier's name and address. In a single table, deleting the product row also deletes the only record of the supplier.

Deleting a product record should delete only the facts about the product, not the facts about the supplier. For this, we must split the one table into two: one table for product information, and another table for supplier information.

Once we have chosen the subject that is represented by a table, columns in that table should store facts only about the subject.

For instance, the product table should store facts only about products. Because the supplier address is a fact about the supplier, and not a fact about the product, it belongs in the supplier table.
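A minimal sketch of the two-table split, assuming SQLite through Python's sqlite3 module. The column names, product names, and addresses are invented for illustration; only the supplier name Coho Winery comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Supplier facts live in one place only.
    CREATE TABLE Suppliers (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    -- Product rows carry only product facts plus a link to the supplier.
    CREATE TABLE Products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES Suppliers(supplier_id)
    );
    INSERT INTO Suppliers VALUES (1, 'Coho Winery', '12 Vine Street');
    INSERT INTO Products VALUES (10, 'Red wine', 1), (11, 'White wine', 1);
""")

# The address is stored once, so a single UPDATE changes it for every product.
conn.execute("UPDATE Suppliers SET address = '99 Harbor Road' WHERE supplier_id = 1")
addresses = conn.execute("""
    SELECT DISTINCT s.address
    FROM Products AS p
    JOIN Suppliers AS s ON s.supplier_id = p.supplier_id
""").fetchall()

# Deleting the products removes product facts only; the supplier survives.
conn.execute("DELETE FROM Products")
suppliers_left = conn.execute("SELECT name FROM Suppliers").fetchall()

print(addresses, suppliers_left)
```

With the split in place, both problems described above disappear: there is no second copy of the address to forget, and the supplier record does not depend on any product row.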

Turning information items into columns

To determine the columns in a table, decide what information we need to track about the subject recorded in the table.

For example, the address column in a Customers table contains customers' addresses. Each record contains data about one customer, and the address field contains the address for that customer.

The following list shows a few tips for determining columns.

Don’t include calculated data - The calculations can be done when the data is retrieved.

Store information in its smallest logical parts - If we combine more than one kind of information in a field, it is difficult to retrieve individual facts later.
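The first tip can be illustrated with a short sketch, assuming SQLite through Python's sqlite3 module. The OrderLines table and its columns are hypothetical; the point is that the line total is computed in the query rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No line_total column: it can always be derived from quantity and unit_price.
conn.execute("""
    CREATE TABLE OrderLines (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price REAL    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO OrderLines VALUES (?, ?, ?, ?)",
    [(1, 10, 3, 2.5), (1, 11, 2, 4.0)],
)

# The calculation happens when the data is retrieved.
line_totals = conn.execute("""
    SELECT product_id, quantity * unit_price AS line_total
    FROM OrderLines
    ORDER BY product_id
""").fetchall()
order_total = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM OrderLines WHERE order_id = 1"
).fetchone()[0]

print(line_totals, order_total)
```

Because the total is never stored, it cannot fall out of step with the quantity or price it is derived from.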

Specifying primary keys

Each table should include a column or set of columns that uniquely identifies each row stored in the table.

In database terminology, this information is called the primary key of the table.

For a column to be identified as the primary key, its values must always be different for each record (we cannot have duplicate values in a primary key).

A primary key must always have a value. If a column's value can become unassigned or unknown (a missing value) at some point, it can't be used as a component in a primary key.

Often, an arbitrary unique number is used as the primary key. For example, you might assign each order a unique order number. The order number's only purpose is to identify an order. Once assigned, it never changes.

If you don't have in mind a column or set of columns that might make a good primary key, consider using a column that has the AutoNumber data type.

A combination of two or more columns can also be set as the primary key, provided the combination can never be duplicated (or you want the database to prevent it from being duplicated).
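The primary key rules above can be tried out in a small sketch, assuming SQLite through Python's sqlite3 module. SQLite has no AutoNumber type; its INTEGER PRIMARY KEY column, which receives a unique number when none is supplied, is used here as a rough analogue. The table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-column key: SQLite assigns a unique number when none is given.
    CREATE TABLE Orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL
    );
    -- Composite key: the (order_id, product_id) pair must be unique.
    CREATE TABLE OrderDetails (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")

first_id = conn.execute(
    "INSERT INTO Orders (order_date) VALUES ('2024-01-05')"
).lastrowid
second_id = conn.execute(
    "INSERT INTO Orders (order_date) VALUES ('2024-01-06')"
).lastrowid

conn.execute("INSERT INTO OrderDetails VALUES (?, 10, 2)", (first_id,))
try:
    # Same order and same product again: the key combination already exists.
    conn.execute("INSERT INTO OrderDetails VALUES (?, 10, 5)", (first_id,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first_id, second_id, duplicate_rejected)
```

The database itself rejects the duplicate pair, so uniqueness does not rely on application code remembering to check.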

Creating the table relationships

In a relational database, we divide our information into separate, subject-based tables. We then use table relationships to bring the information together as needed.

Creating a one-to-many relationship

Consider this example: the Suppliers and Products tables in the product orders database. A supplier can supply any number of products. It follows that for any supplier represented in the Suppliers table, there can be many products represented in the Products table. The relationship between the Suppliers table and the Products table is, therefore, a one-to-many relationship.

To represent a one-to-many relationship in the database design, take the primary key on the "one" side of the relationship and add it as an additional column or columns to the table on the "many" side of the relationship.

In the previous example, the Supplier ID column from the Suppliers table is added to the Products table. The Supplier ID column in the Products table is called a foreign key: a column whose values refer to another table's primary key.
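A minimal sketch of the Suppliers to Products one-to-many relationship, assuming SQLite through Python's sqlite3 module. The column names and sample rows are illustrative. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite

conn.execute(
    "CREATE TABLE Suppliers (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE Products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Foreign key: the primary key of the "one" side, stored on the "many" side.
        supplier_id INTEGER NOT NULL REFERENCES Suppliers(supplier_id)
    )
""")
conn.execute("INSERT INTO Suppliers VALUES (1, 'Coho Winery')")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(10, "Red wine", 1), (11, "White wine", 1)],
)

# One supplier, many products.
products_of_supplier = conn.execute(
    "SELECT name FROM Products WHERE supplier_id = 1 ORDER BY product_id"
).fetchall()

try:
    # Supplier 999 does not exist, so this product would be an orphan.
    conn.execute("INSERT INTO Products VALUES (12, 'Cider', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(products_of_supplier, orphan_rejected)
```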

Creating a many-to-many relationship

Consider the relationship between the Products table and Orders table.

A single order can include more than one product. On the other hand, a single product can appear on many orders. Therefore, for each record in the Orders table, there can be many records in the Products table. And for each record in the Products table, there can be many records in the Orders table.

In this situation, keeping the order_id in the Products table or the product_id in the Orders table (as in a one-to-many relationship) is an inefficient design: each order or product would need a separate row for every pairing, which leads to unwanted data redundancy.

The solution is to create a third table, often called a junction table, that breaks down the many-to-many relationship into two one-to-many relationships. We insert the primary key from each of the two tables into the third table. As a result, the third table records each occurrence or instance of the relationship.
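The junction table idea can be sketched as follows, assuming SQLite through Python's sqlite3 module. The junction table name OrderDetails, the quantity column, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders   (order_id   INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
    CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name       TEXT NOT NULL);
    -- Junction table: one row per occurrence of a product on an order.
    CREATE TABLE OrderDetails (
        order_id   INTEGER NOT NULL REFERENCES Orders(order_id),
        product_id INTEGER NOT NULL REFERENCES Products(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO Orders   VALUES (1, '2024-01-05'), (2, '2024-01-06');
    INSERT INTO Products VALUES (10, 'Chai'), (11, 'Olive oil');
    INSERT INTO OrderDetails VALUES (1, 10, 2), (1, 11, 1), (2, 10, 4);
""")

# One order, many products.
products_in_order_1 = [row[0] for row in conn.execute("""
    SELECT p.name
    FROM OrderDetails AS d
    JOIN Products AS p ON p.product_id = d.product_id
    WHERE d.order_id = 1
    ORDER BY p.product_id
""")]

# One product, many orders.
orders_with_chai = [row[0] for row in conn.execute(
    "SELECT order_id FROM OrderDetails WHERE product_id = 10 ORDER BY order_id"
)]

print(products_in_order_1, orders_with_chai)
```

Each side of the many-to-many relationship is now an ordinary one-to-many relationship with the junction table, and neither Orders nor Products repeats any of its own facts.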

Creating a one-to-one relationship

For instance, suppose you need to record some special supplementary product information that you will need rarely or that only applies to a few products. Because you don't need the information often, and because storing the information in the Products table would result in empty space for every product to which it doesn’t apply, you place it in a separate table.

When we do identify such a relationship, both tables must share a common field.

When a one-to-one or one-to-many relationship exists, the tables involved need to share a common column or columns. When a many-to-many relationship exists, a third table is needed to represent the relationship.
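For the one-to-one case, one possible layout is to let the supplementary table use the same primary key as Products. The sketch below assumes SQLite through Python's sqlite3 module; the ProductExtras table and its storage_notes column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- The shared column is both this table's primary key and a reference to
    -- Products, so each product can have at most one row of extra information.
    CREATE TABLE ProductExtras (
        product_id    INTEGER PRIMARY KEY REFERENCES Products(product_id),
        storage_notes TEXT NOT NULL
    );
    INSERT INTO Products VALUES (10, 'Chai'), (11, 'Red wine');
    INSERT INTO ProductExtras VALUES (11, 'Store below 15 C');
""")

# LEFT JOIN keeps products that have no supplementary row.
rows = conn.execute("""
    SELECT p.name, e.storage_notes
    FROM Products AS p
    LEFT JOIN ProductExtras AS e ON e.product_id = p.product_id
    ORDER BY p.product_id
""").fetchall()

print(rows)
```

Products without supplementary information simply have no row in ProductExtras, so the Products table carries no empty columns for them.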

Refining the design

Once we have the tables, fields, and relationships we need, we should create and populate our tables with sample data and try working with the information: creating queries, adding new records, and so on.

Doing this helps highlight potential problems.

See if we can use the database to get the answers we want. Create rough drafts of our forms and reports and see if they show the data we expect. Look for unnecessary duplication of data and, if any is found, alter the design to eliminate it.

Below is a checklist for refining the database:

Did you forget any columns?

Are any columns unnecessary because they can be calculated from existing fields?

Are you repeatedly entering duplicate information in one of your tables?

Do you have tables with many fields, a limited number of records, and many empty fields in individual records?

Has each information item been broken into its smallest useful parts?

Does each column contain a fact about the table's subject?

Are all relationships between tables represented, either by common fields or by a third table?

Applying the normalization rules

We can apply the data normalization rules (sometimes just called normalization rules) as the next step in our design. We use these rules to see if our tables are structured correctly.

We apply the rules in succession, at each step ensuring that our design arrives at one of what are known as the "normal forms."

Five normal forms are widely accepted — the first normal form through the fifth normal form. We will examine the first three, because they are all that is required for the majority of database designs.

First normal form

First normal form states that at every row and column intersection in the table there exists a single value, never a list of values. For example, you cannot have a field named Price in which you place more than one price. If you think of each intersection of rows and columns as a cell, each cell can hold only one value.
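As a rough illustration of first normal form, the sketch below starts from a record whose Price cell holds a list and stores one price per row instead. It assumes SQLite through Python's sqlite3 module; the semicolon-separated input and the ProductPrices table are invented for the example.

```python
import sqlite3

# Not in first normal form: a single cell holds a list of prices.
unnormalized = [(10, "Chai", "18.0;16.5")]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ProductPrices (
        product_id INTEGER NOT NULL,
        price      REAL    NOT NULL,
        PRIMARY KEY (product_id, price)
    )
""")

# One value per cell: each price in the list becomes its own row.
for product_id, _name, price_list in unnormalized:
    for price in price_list.split(";"):
        conn.execute(
            "INSERT INTO ProductPrices VALUES (?, ?)", (product_id, float(price))
        )

prices = conn.execute(
    "SELECT price FROM ProductPrices WHERE product_id = 10 ORDER BY price"
).fetchall()
cheapest = conn.execute(
    "SELECT MIN(price) FROM ProductPrices WHERE product_id = 10"
).fetchone()[0]

print(prices, cheapest)
```

Once each cell holds a single value, ordinary queries such as MIN work directly, with no need to parse text inside a field.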

Second normal form

Second normal form requires that each non-key column be fully dependent on the entire primary key, not on just part of the key. This rule applies when we have a primary key that consists of more than one column.

For example, suppose you have a table containing the following columns, where Order ID and Product ID form the primary key:

Order ID (primary key)

Product ID (primary key)

Product Name

This design violates second normal form, because Product Name is dependent on Product ID, but not on Order ID, so it is not dependent on the entire primary key. You must remove Product Name from the table. It belongs in a different table (Products).
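The second normal form fix described above can be sketched as a small decomposition, assuming SQLite through Python's sqlite3 module. The sample rows and the table name OrderDetails are assumptions; the columns follow the example in the text.

```python
import sqlite3

# Flat rows of (Order ID, Product ID, Product Name): the name repeats per order.
flat = [(1, 10, "Chai"), (1, 11, "Olive oil"), (2, 10, "Chai")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Product Name depends only on Product ID, so it moves to Products.
    CREATE TABLE Products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE OrderDetails (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES Products(product_id),
        PRIMARY KEY (order_id, product_id)
    );
""")

for order_id, product_id, product_name in flat:
    # INSERT OR IGNORE keeps a single row per product, however often it is ordered.
    conn.execute(
        "INSERT OR IGNORE INTO Products VALUES (?, ?)", (product_id, product_name)
    )
    conn.execute("INSERT INTO OrderDetails VALUES (?, ?)", (order_id, product_id))

product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

# A join rebuilds the original rows, so no information was lost.
rebuilt = conn.execute("""
    SELECT d.order_id, d.product_id, p.product_name
    FROM OrderDetails AS d
    JOIN Products AS p ON p.product_id = d.product_id
    ORDER BY d.order_id, d.product_id
""").fetchall()

print(product_count, rebuilt)
```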

Third normal form

Third normal form requires not only that every non-key column be dependent on the entire primary key, but also that non-key columns be independent of each other.

Another way of saying this is that each non-key column must be dependent on the primary key and nothing but the primary key. For example, suppose we have a table containing the following columns:

ProductID (primary key)

Name

SRP

Discount

Assume that Discount depends on the suggested retail price (SRP). This table violates third normal form because a non-key column, Discount, depends on another non-key column, SRP. Column independence means that you should be able to change any non-key column without affecting any other column. If you change a value in the SRP field, the Discount would change accordingly, thus violating that rule. In this case Discount should be moved to another table that is keyed on SRP.
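A minimal sketch of that last step, assuming SQLite through Python's sqlite3 module. The Discounts table name, the whole-number SRP values, and the percentage discounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Discount is a fact about the SRP, so it lives in a table keyed on SRP.
    CREATE TABLE Discounts (
        srp      INTEGER PRIMARY KEY,
        discount INTEGER NOT NULL
    );
    CREATE TABLE Products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT    NOT NULL,
        srp        INTEGER NOT NULL REFERENCES Discounts(srp)
    );
    INSERT INTO Discounts VALUES (100, 5), (200, 15);
    INSERT INTO Products  VALUES (10, 'Chai', 100);
""")


def discount_for(product_id):
    """Look up a product's discount through its SRP."""
    row = conn.execute("""
        SELECT d.discount
        FROM Products AS p
        JOIN Discounts AS d ON d.srp = p.srp
        WHERE p.product_id = ?
    """, (product_id,)).fetchone()
    return row[0]


before = discount_for(10)
# Changing the SRP touches one column; no other column in Products needs editing.
conn.execute("UPDATE Products SET srp = 200 WHERE product_id = 10")
after = discount_for(10)

print(before, after)
```

After the split, every non-key column left in Products (name, srp) can be changed without any other column in that table becoming wrong.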
