Post on 13-Dec-2015

Data Modeling Overview

By: Dave Wentzel

What we will accomplish

Review of DBMS
Issues related to DBMS
Entity Relationship Modeling
– Process flow
– Model types
– Component definition
Selecting entities and attributes
Defining relationships
Defining cardinality
Selecting primary keys
Review of recursive relationships, weak entities, and ternary relationships
Participation constraints
Erwin notation
NULL issues
The physical model
Generalization / specialization
Transaction processing
Normalization rules
History issues

What is data?

Data
– Raw facts. Can be described, observed, and measured.
Information
– Data organized in a form that is useful for decision making. The meaning behind the data.
– New thing not previously observed that is created based on the data.
Knowledge
– Information that is used for decision making.

What is a Database?

Collection of interrelated data
Data which can be visualized in a table format
Contains relationships between data
Can be of any size and varying complexity
Can be maintained manually or by computer

Data Base Management System (DBMS)

Collection of programs (software) that allows users to create and maintain a database

Supports data:
– Definition - specification of data types, structures, and constraints
– Construction - storing of the data itself
– Manipulation - updating & querying of the data

Defines itself. Contains a catalog which describes its data.

Components of a DBMS

Catalog
– Maintains information about the data in the database
– Considered data about data (metadata)
Databases
– Collection of related tables
Tables
– Rows and columns containing data

Issues in DBMS

Data independence
Query optimization
– Improve efficiency
– Faster responses
Transaction management
– Sequence of operations that are treated as a unit
– Once 1st step is completed, 2nd step must also be completed, otherwise 1st step is aborted (ROLLBACK mechanism)

Example: Transferring Bank Funds
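A minimal sketch of the bank-transfer case, assuming Python's built-in sqlite3 module and a hypothetical account table (the table, column names, and balances are illustrative and not from the slides). The destination is credited first; if the debit then violates the CHECK constraint, the credit is rolled back as well, so the two steps behave as one unit.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move `amount` between two accounts as a single unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Step 1: credit the destination account.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            # Step 2: debit the source account; the CHECK constraint
            # rejects an overdraft, which aborts the whole transfer.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

Calling transfer(conn, 1, 2, 500) with these balances fails on the second step, and the first step's credit is undone along with it.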

Issues in DBMS continued

Transaction management
– Concurrency
– Recovery
Controlled redundancy
– Goal of database design is to minimize redundancy (duplicate data)
Integrity constraints
– Includes business rules and data rules

Issues in DBMS continued

Security and privacy
– Protect against unauthorized access
Data / database administration
– Involves managing people, data, performance, security, etc.

Entity Relationship Modeling

[Figure: example entities Person, Account, Transaction, and Employee]

Data Model

Tool for describing data, its relationships, semantics, and integrity constraints

Provides for data abstraction
Hides details of data storage

Why use an ER Model?

Easy to use for modeling DB design
Succinct representation of database layout
Good communication tool among project team members
Most CASE tools support ER modeling
Implementation independent

Categories of Data Models

Logical model
– Conceptual data model
– High level model
– Closest view user has of the data
Physical model
– Low level model
– Defines how data is stored

Steps in Database Design

[Figure: database design process flow. Labels shown: Mini World; Requirements Collection and Analysis; Functional Requirements; Database Requirements; Functional Analysis; High Level Transaction Requirement; Logical Model (DBMS independent); Data Model Mapping; Physical Design (DBMS specific); Internal Schema; Application Program Design; Transaction Implementation; Application Programs; API]

ER Modeling composed of

Entity (table)
Attribute (field)
Relationship
– Binary Relationships
– Cardinality of relationships

What is an entity?

Conceptual definition
– Distinguishable object that exists
Operational definition
– Business object that has properties we are interested in storing
Physical definition
– Set of related data forming a table composed of attributes (fields)

Entities

Primary THINGS of a business about which users need to record data

Objects about which the business is interested in tracking information

When an ER Diagram is translated into a relational model, the entities become the tables.

Selecting Entities

Nouns are candidate entities
Possible classes of entities:
– People who carry out some function (employees, students, customers)
– Places (cities, offices, routes)
– Things which are tangible physical objects (equipment, products, buildings)
– Organizations (teams, suppliers, departments)

Selecting Entities Continued

Events which occur at a given date/time or have steps (employee promotions, project phases, account payments)

Concepts which are intangible ideas used to keep track of business activities (projects, accounts, complaints)

Questions to ask...

What things do we need to keep data about?
What things are essential to the organization?
What things do we talk about in the organization?
What questions do we have that reports can help answer?
What information should the reports contain?

Naming entities

Use a SINGULAR noun
Meaningful but intuitive
Avoid names which may be misinterpreted within the problem domain
Follow organizational / industry trends
Do not try to rename entities within an organization
Avoid abused names such as Task, Form, Operation, Schedule...

Is it an entity to worry about?

Decide if an entity is relevant to your problem domain by determining if it has attributes you need to track

If it does not have attributes you need to track, it is NOT a valid entity for your problem

Is it really an entity?

Can you define attributes for it? An attribute is a piece of information that we are interested in tracking about an entity. It is a property of an entity.

In general, if two objects differ by one attribute, they are separate entities.

Does it participate in a relationship? Two entities that are related somehow interact with one another.

Attributes

Properties of an object (entity)
Each attribute has a data type (char, int, datetime)
Each attribute in an RDBMS (relational database management system) has only one value at a time (atomic)

Categories of Attributes

Descriptive
– Property of the entity that helps describe the entity
Identifying (key attributes)
– Property of the entity that helps uniquely identify the entity
– Normally short
– If one does not exist it MUST be created
– If creating a key, use a numeric/integer data type

Types of Attributes

Atomic
– Indivisible value
– Most desired state
Composite
– Can be divided into smaller parts
– Need to convert into atomic

Types of Attributes Continued

Multi-valued
– Multiple instances of an attribute
– Normally create another entity
Derived
– Can be determined by the value of another attribute or attributes
– In most cases, do NOT store derived attributes
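One way to picture the four attribute types, sketched with Python's sqlite3 module. The person and person_phone tables, their columns, and the sample values are hypothetical. A composite name is split into atomic columns, a multi-valued phone attribute becomes its own entity, and age is derived from the stored birth date rather than stored itself.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,   -- composite "name" split into atomic parts
        last_name  TEXT NOT NULL,
        birth_date TEXT NOT NULL    -- stored; age is derived from it
    );
    CREATE TABLE person_phone (     -- multi-valued attribute as its own entity
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        phone     TEXT NOT NULL,
        PRIMARY KEY (person_id, phone)
    );
""")
conn.execute("INSERT INTO person VALUES (1, 'John', 'Smith', '1980-06-15')")
conn.executemany(
    "INSERT INTO person_phone VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)


def age_on(birth_date_iso, today):
    """Derive age in whole years instead of storing it as a column."""
    born = date.fromisoformat(birth_date_iso)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)
```

Computing age on demand avoids a stored value that silently goes stale, which is the usual reason for not storing derived attributes.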

Naming Attributes

Use a noun, adjective, or adverb
Name should be unique database wide
Use attribute names consistently
Use singular names
Define a naming convention for the organization

Rules for Entity Analysis

Every noun is a candidate for an entity
Every entity should be relevant to the problem
If an object has only one property of importance, then it should be considered an attribute of another entity
If an object has only one data instance (1 row) then do not model as an entity
If an object needs a unique identifier then model it as an entity

Relationships

Way entities interact with one another
An association between two or more entities
Depicts business interactions between entities
They DO NOT represent business flow

Relationships Continued

Number of entities associated through a relationship defines its degree (unary, binary, ternary, n-ary)

Cardinality defines the maximum number of entities that can participate in the relationship

How to Identify a Relationship

Ask what is the action or verb used to describe how one entity interacts with another

Three types of relations to consider:
– Existence (Employee HAS Children)
– Functional (Professor TEACHES Course)
– Event (Customer PLACES Order)

Ignore verbs not important to the organization

More on Relationships

Relationships and cardinality constraints represent business rules

When naming a relationship use an active verb in the present tense

Relationships are read bi-directionally

Example notes:
Together the customer and account tables form a schema - structure / layout of a logical database design
Note the attributes. Order DOES NOT MATTER but convention puts primary key first.
No duplicates for attributes.
No duplicate tuples (rows)
Relationship - same attribute name (or different attribute name with same meaning) in 2 tables.

Cardinality Constraints

Express the MAXIMUM number of entities that can be associated with another entity via a relationship

Also known as mapping constraints
Types:
– 1:1 (one to one)
– 1:N (one to many)
– N:M (many to many)

The Key to It All

Identifiers...

Attribute(s) which uniquely identify a record

An entity may have multiple identifiers
Every entity MUST have at least one
Can be made up of more than one attribute

Candidate vs. Primary Keys

Both are identifiers
Candidate keys are all the identifiers from which you can choose which uniquely identify the record

Primary key is the one candidate key which is selected to always uniquely identify the record

Selecting the Primary Key

In general we create a primary key however...

Choose the attribute most widely used in the query

Select the shorter data type
If one does not exist, must create one
Select a MINIMUM key if using compound attributes (not recommended)

Key Requirements and Preferences

Known at all times
Can NOT be null
Should not be changed
Shorter is better
Numeric / integer is better
Avoid keys containing letters O, I, Z, S - can be confused with numbers
If key includes time, it should be in 24hr format
Avoid carrying meaning

With this all said...

It is difficult to come up with a primary key based on real attributes which will not change over time (phone numbers, SSN, addresses, driver’s license numbers…)

In most cases it is best to create the primary key

In SQL Server, you can use an identity column, which generates a sequential number
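A small sketch of a created (surrogate) key. It uses SQLite's INTEGER PRIMARY KEY AUTOINCREMENT through Python's sqlite3 module as a stand-in for a SQL Server identity column, which is an assumption made so the example runs anywhere. The customer table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no meaning
        ssn         TEXT UNIQUE,                        -- real-world identifier
        phone       TEXT                                -- may change over time
    )
""")


def add_customer(conn, ssn, phone):
    """Insert a customer and return the generated key."""
    cur = conn.execute(
        "INSERT INTO customer (ssn, phone) VALUES (?, ?)", (ssn, phone)
    )
    return cur.lastrowid


first_id = add_customer(conn, "123456789", "555-0100")
second_id = add_customer(conn, "987654321", "555-0101")

# A real attribute can change without touching the key other tables refer to.
conn.execute("UPDATE customer SET phone = '555-0199' WHERE customer_id = ?",
             (first_id,))
```

Because the generated number carries no meaning, changing the phone number or correcting an SSN never forces a change to the primary key.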

Primary Keys and Relationships

In a 1:1 relationship, the primary key of either one of the entities must migrate to the other entity

In a 1:N, the primary key of the 1 side must migrate to the entity on the N side

In an M:N, the keys of both entities are used to identify a new entity which resolves the M:N into two 1:N relationships
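The M:N rule can be sketched as follows, using Python's sqlite3 module. The student, course, and enrollment tables are hypothetical examples. The new enrollment entity is identified by the keys of both parents, giving two 1:N relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- New entity resolving the M:N: its key is made of both migrated keys.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Raj');
    INSERT INTO course  VALUES (10, 'Databases'), (11, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 11), (2, 10);
""")


def courses_for(conn, student_id):
    """Titles of the courses one student is enrolled in."""
    rows = conn.execute(
        "SELECT c.title FROM enrollment AS e"
        " JOIN course AS c ON c.course_id = e.course_id"
        " WHERE e.student_id = ? ORDER BY c.title",
        (student_id,),
    )
    return [title for (title,) in rows]


def students_in(conn, course_id):
    """Names of the students enrolled in one course."""
    rows = conn.execute(
        "SELECT s.name FROM enrollment AS e"
        " JOIN student AS s ON s.student_id = e.student_id"
        " WHERE e.course_id = ? ORDER BY s.name",
        (course_id,),
    )
    return [name for (name,) in rows]
```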

Foreign Key

When a key migrates to another entity it is called a Foreign Key

A foreign key CAN BE null if it is not part of an entity’s primary key

If the FK value is NOT null, then that value MUST exist in the table in which it is the primary key. This is called Referential Integrity (RI)
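A sketch of referential integrity using Python's sqlite3 module. The customer and account tables and their columns are assumptions made for illustration. The foreign key is not part of the account's primary key, so it may be NULL, but any non-NULL value must exist in the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)  -- nullable FK
    );
    INSERT INTO customer VALUES (1, 'Bob');
""")


def try_insert_account(conn, account_id, customer_id):
    """Return True if the row is accepted, False if RI rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO account VALUES (?, ?)", (account_id, customer_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an account for customer 1 is accepted, an account with a NULL customer is accepted, and an account for the non-existent customer 99 is rejected. Declaring the column NOT NULL would turn the optional relationship into a mandatory one.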

Recursive Relationships

An entity having a relationship with itself
Same entity participates more than once in a relationship type in different roles
Same cardinality examples exist in recursive relationships
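A common illustration of a recursive relationship is an employee who manages other employees. The sketch below uses Python's sqlite3 module, and the employee table, manager_id column, and names are hypothetical. The same entity appears twice in the join, once in each role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- FK back to the same table: the "manager" role of employee.
        manager_id  INTEGER REFERENCES employee(employee_id)
    );
    INSERT INTO employee VALUES (1, 'Mary', NULL), (2, 'Bob', 1), (3, 'Jack', 1);
""")


def direct_reports(conn, manager_name):
    """Names of employees whose manager has the given name."""
    rows = conn.execute(
        "SELECT e.name FROM employee AS e"
        " JOIN employee AS m ON e.manager_id = m.employee_id"
        " WHERE m.name = ? ORDER BY e.name",
        (manager_name,),
    )
    return [name for (name,) in rows]
```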

Weak Entity Type

Entity that does not have a key attribute of its own

Identified by its relationship with another entity
Created for multi-valued attributes and time dependent attributes
Weak entity has EXISTENCE dependence on the parent. Only exists if the owner entity exists.

Primary Keys of Weak Entities

Can use the primary key of the owner entity along with a qualifier such as sequence number or date/time

Can create a surrogate key but make sure you migrate the key of the parent
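The first option, owner key plus a sequence number, might look like the sketch below, written with Python's sqlite3 module. The employee and dependent tables and the seq_no column are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Weak entity: no key of its own, identified through its owner.
    CREATE TABLE dependent (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        seq_no      INTEGER NOT NULL,      -- qualifier within one owner
        name        TEXT NOT NULL,
        PRIMARY KEY (employee_id, seq_no)
    );
    INSERT INTO employee VALUES (1, 'Bob'), (2, 'Kelly');
""")


def add_dependent(conn, employee_id, name):
    """Insert a dependent and return its (owner key, sequence number) key."""
    with conn:
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq_no), 0) + 1 FROM dependent"
            " WHERE employee_id = ?",
            (employee_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO dependent VALUES (?, ?, ?)",
            (employee_id, next_seq, name),
        )
    return (employee_id, next_seq)
```

The sequence restarts at 1 for each owner, so the number only identifies a dependent in combination with the migrated employee key.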

Ternary Relationship

Relationship between 3 entities
Differs from 3 binary relationships
States that all three entities occur at the same time
Must be converted to binary relationships

Creating Binary Relationships from a Ternary Relationship

Participation Constraints

Specifies whether the existence of an entity depends on its being related to another entity via a relationship

Notes the minimum cardinality
Total participation (mandatory)
Partial participation (optional)

Identifying Participation Constraints

Can entity A exist without entity B?
– If no, A has total participation in the relationship
– If yes, entity A has partial participation in the relationship

Identifying Relationships In Erwin

An identifying relationship is a relationship between two tables in which an instance of a child table is identified through its association with a parent table, which means the child table is dependent on the parent table for its identity, and cannot exist without it. In an identifying relationship, one instance of the parent table is related to multiple instances of the child.

Non-Identifying Relationship In Erwin

A non-identifying relationship is a relationship between two tables in which an instance of the child table is not identified through its association with a parent table, which means the child table is not dependent on the parent table for its identity, and can exist without it. In a non-identifying relationship, one instance of the parent table is related to multiple instances of the child.

Optional Non-Identifying

In an optional non-identifying relationship, the columns that are migrated into the non-key area of the child table are not required in the child table. This means that nulls are allowed in the foreign key. ERwin draws an optional non-identifying relationship differently depending on the notation for your diagram.

Mandatory Non-Identifying

In a mandatory non-identifying relationship, the columns that are migrated into the non-key area of the child table are required in the child table. This means that the foreign key cannot be null.

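Erwin Notation

[Figure: Erwin notation table with a Cardinality Description column (one to 0, 1, or M) and columns for Identifying and Non-Identifying relationships, the latter split into Nulls and No Nulls]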

To Null or Not to Null….

NULL means no value
Two types of null values
– Unknown
– None (does not exist or not applicable)

Null Examples

Employee

e#  name   salary  spouse
1   Bob    10,000  Mary
2   Jack   20,000  Kate
3   Mary   30,000  NULL
4   Kelly  NULL    John

Questions:

• How many people make more than 15K?

• What is the average salary?

• Is Mary married?
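To see why these questions are awkward, the sketch below loads the Employee rows into SQLite through Python's sqlite3 module and asks the first two. The column name e_no stands in for e#, and the rest of the schema is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " e_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER, spouse TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", 10000, "Mary"),
        (2, "Jack", 20000, "Kate"),
        (3, "Mary", 30000, None),   # spouse unknown, or no spouse?
        (4, "Kelly", None, "John"), # salary unknown
    ],
)

# Rows with a NULL salary fail the comparison, so Kelly is never counted.
(over_15k,) = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary > 15000"
).fetchone()

# AVG skips NULLs: it averages three salaries, not four.
(avg_salary,) = conn.execute("SELECT AVG(salary) FROM employee").fetchone()

# The only honest answer about Mary is that the value is missing.
(mary_spouse,) = conn.execute(
    "SELECT spouse FROM employee WHERE name = 'Mary'"
).fetchone()
```

Here over_15k is 2 and avg_salary is 20000.0, both of which silently ignore Kelly. mary_spouse is None, which cannot distinguish "not married" from "spouse not recorded".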

Problems with NULL

Null values are ambiguous
More programming is required to deal with NULL values
Try to use UNKNOWN or NONE if applicable

Getting Physical…

Converting the logical data model into the physical data model

Things to do when converting

Identify data type
– Is it a string (character field) or a number?
– Use of varchar() or char()?
– Dates are dates not strings
Identify data length
– Consider growth over time and maximum size requirements
Identify value constraints (valid ranges, values, etc.)

Things to do when converting

Follow proper naming conventions
Determine indexes
Consider combining 1:1 relationship entities
Roll-up generalization / specialization hierarchies
Add organizational attributes if any

Indexes

Index is a physical access structure
Makes queries more efficient
Things to consider when creating
– Create an index for each primary key (PK)
– Create an index for each foreign key (FK)
– Create an index for each alternate key (AK) which will be used in queries
– Try to minimize number of indexes (update overhead)

Specialization / Generalization

Inheritance / Abstraction
Subclasses / Superclasses
Two processes resulting in the same model
Specialization is top-down approach. Can a high level entity be broken down?
Generalization is bottom-up approach. Can entities be combined at a higher level?

Example

Notes on Generalization/Specialization

Key of subclass is always key of superclass
Subclasses can participate in their own relationships
Participation in a subclass can either be inclusive or exclusive
Exclusive subclasses should be defined by a type
Multiple inheritance not allowed in most modeling tools
When converting to physical could combine into one entity
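A sketch of a superclass with two exclusive subclasses, using Python's sqlite3 module. The employee, salaried_employee, and hourly_employee tables are hypothetical. Each subclass reuses the superclass key as its own key, and a type column on the superclass records which subclass applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE employee (                       -- superclass
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        employee_type TEXT NOT NULL               -- discriminator for exclusive subclasses
            CHECK (employee_type IN ('SALARIED', 'HOURLY'))
    );
    CREATE TABLE salaried_employee (              -- subclass: key = superclass key
        employee_id   INTEGER PRIMARY KEY REFERENCES employee(employee_id),
        annual_salary INTEGER NOT NULL
    );
    CREATE TABLE hourly_employee (                -- subclass: key = superclass key
        employee_id   INTEGER PRIMARY KEY REFERENCES employee(employee_id),
        hourly_rate   REAL NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Bob', 'SALARIED'), (2, 'Kelly', 'HOURLY');
    INSERT INTO salaried_employee VALUES (1, 10000);
    INSERT INTO hourly_employee VALUES (2, 12.5);
""")


def employee_details(conn, employee_id):
    """Return (name, type, annual_salary, hourly_rate) for one employee."""
    return conn.execute(
        "SELECT e.name, e.employee_type, s.annual_salary, h.hourly_rate"
        " FROM employee AS e"
        " LEFT JOIN salaried_employee AS s ON s.employee_id = e.employee_id"
        " LEFT JOIN hourly_employee AS h ON h.employee_id = e.employee_id"
        " WHERE e.employee_id = ?",
        (employee_id,),
    ).fetchone()
```

Rolling the hierarchy up into one physical table would instead place annual_salary and hourly_rate as nullable columns on employee, trading the joins for NULLs.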

Database Operations

CRUD
– Create (Insert)
– Read
– Update (Modify)
– Delete
Transactions cannot violate any integrity constraints
Several may be grouped into a transaction
May propagate to maintain integrity constraints

If update violations occur

Cancel the operation (Restrict)
Perform additional updates / deletes so the violation is corrected (Cascade)
Execute a user specified operation to correct (Trigger)
Perform the operation but inform the user
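The Restrict and Cascade options can be compared side by side in a sketch like this one, written with Python's sqlite3 module. The customer, account, and phone tables are hypothetical. Deleting a customer who still has an account is cancelled, while deleting a customer with phone rows removes those rows too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    CREATE TABLE account (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON DELETE RESTRICT
    );
    CREATE TABLE phone (
        phone_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON DELETE CASCADE
    );

    INSERT INTO customer VALUES (1, 'Bob'), (2, 'Kate');
    INSERT INTO account  VALUES (10, 1);   -- Bob has an account
    INSERT INTO phone    VALUES (100, 2);  -- Kate has only a phone row
""")


def delete_customer(conn, customer_id):
    """Return True if the delete went through, False if it was cancelled."""
    try:
        with conn:
            conn.execute(
                "DELETE FROM customer WHERE customer_id = ?", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def row_count(conn, table):
    """Count rows in one of the three known tables."""
    if table not in ("customer", "account", "phone"):
        raise ValueError(table)
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```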

Normalization - What’s normal...

Normalization

Process to design a highly desirable relational schema using functional dependencies

Guidelines for relational database design which
– Minimize redundancy
– Avoid potential inconsistency
– Help predict data behavior problems
– Avoid update anomalies

Update Anomalies

Insert extra values
Add redundant records
Delete records not intended
Change a fact more than once, possibly in multiple tables
Miss changing a fact which is repeated multiple times

Normal Forms

First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form

[Figure: number of tables and joins across the normal forms]

First Normal Form

A relation is in 1NF if it contains only scalar (atomic) values
– One value for an attribute
– No repeating groups
– No composite attributes
– No multi-valued attributes
To convert to 1NF
– Create 1 table for each repeating group by adding the PK of the original table
– Remove the repeating group from the original table

Example of Non-1NF w/ Conversion

Non-1NF

Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  {Bellaire, Sugarland, Houston}
Administration  4        987654321  Stafford, Voorhees
Headquarters    1        888665555  Houston

1NF (note redundancy)

Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  Bellaire
Research        5        333445555  Sugarland
Research        5        333445555  Houston
Administration  4        987654321  Stafford
Administration  4        987654321  Voorhees
Headquarters    1        888665555  Houston

Example of Non-1NF

EmployeeProject - NON-1NF

SSN        Ename            Pnumber  Hours
123456789  Smith, John      1        32.5
                            2        7.5
666885555  Narayan, Ramesh  3        40
453223344  English, Joyce   1        20
                            2        20

Conversion

SSN        Ename
123456789  Smith, John
666885555  Narayan, Ramesh
453223344  English, Joyce

SSN        Pnumber  Hours
123456789  1        32.5
123456789  2        7.5
666885555  3        40
453223344  1        20
453223344  2        20
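The conversion above can be sketched in plain Python: the repeating group (project number and hours) moves to its own table and carries the SSN of the original row. The dictionary layout and the to_1nf name are illustrative assumptions; the data values come from the EmployeeProject example.

```python
# Non-1NF input: each employee row holds a repeating group of projects.
nested = [
    {"ssn": "123456789", "ename": "Smith, John",
     "projects": [(1, 32.5), (2, 7.5)]},
    {"ssn": "666885555", "ename": "Narayan, Ramesh",
     "projects": [(3, 40.0)]},
    {"ssn": "453223344", "ename": "English, Joyce",
     "projects": [(1, 20.0), (2, 20.0)]},
]


def to_1nf(rows):
    """Split nested rows into two flat tables that share the SSN key."""
    # Original table with the repeating group removed.
    employees = [(row["ssn"], row["ename"]) for row in rows]
    # One row per (employee, project), keyed by the original PK plus Pnumber.
    works_on = [
        (row["ssn"], pnumber, hours)
        for row in rows
        for pnumber, hours in row["projects"]
    ]
    return employees, works_on


employees, works_on = to_1nf(nested)
```

Every value in both result tables is now atomic, and each employee name is stored once instead of once per project.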

Second Normal Form

All attributes in the relation have a functional dependency on the complete PK

Each non-key attribute is uniquely defined by all components of the primary key

Example of Non-2NF w/ Conversion

EmployeeProject
SSN  Pnumber  Hours  Ename  Pname  Plocation

FD1: {SSN, Pnumber} → Hours
FD2: SSN → Ename
FD3: Pnumber → {Pname, Plocation}

Conversion to 2NF

EP1: SSN, Pnumber, Hours
EP2: SSN, Ename
EP3: Pnumber, Pname, Plocation
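The EP1 to EP3 split implies three functional dependencies: Hours depends on the full key (SSN, Pnumber), Ename depends on SSN alone, and Pname and Plocation depend on Pnumber alone. A small checker, sketched below in Python, can test such dependencies against sample rows. The fd_holds function is a hypothetical helper, and the Pname and Plocation values are made up for illustration; the other values follow the earlier EmployeeProject example.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the attributes in `lhs` determine those in `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Sample EmployeeProject rows (Pname / Plocation values are illustrative).
rows = [
    {"SSN": "123456789", "Pnumber": 1, "Hours": 32.5,
     "Ename": "Smith, John", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"SSN": "123456789", "Pnumber": 2, "Hours": 7.5,
     "Ename": "Smith, John", "Pname": "ProductY", "Plocation": "Sugarland"},
    {"SSN": "666885555", "Pnumber": 3, "Hours": 40.0,
     "Ename": "Narayan, Ramesh", "Pname": "ProductZ", "Plocation": "Houston"},
    {"SSN": "453223344", "Pnumber": 1, "Hours": 20.0,
     "Ename": "English, Joyce", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"SSN": "453223344", "Pnumber": 2, "Hours": 20.0,
     "Ename": "English, Joyce", "Pname": "ProductY", "Plocation": "Sugarland"},
]
```

Ename and Pname each depend on only part of the key, which is what breaks 2NF. Hours is the only non-key attribute that needs the whole key, so it alone stays with (SSN, Pnumber). A check on sample rows can only disprove a dependency; whether it really holds is a business rule.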

Third Normal Form

Every non-key attribute (one that does not participate in the primary key) is mutually independent of the other non-key attributes

Every non-key attribute is irreducibly dependent on the primary key

Example of Non-3NF w/ Conversion

Lots: PropertyID#, CountyName, Lot#, Area, Price, TaxRate

2NF

Lots1: PropertyID#, CountyName, Lot#, Area, Price
Lots2: CountyName, TaxRate

3NF

Lots1A: PropertyID#, CountyName, Lot#, Area
Lots1B: Area, Price

Maintaining History

Maintaining History can serve one of two purposes:
– Tracking changes in the entity over time
– Tracking record history in order to maintain inactive records over time and maintain RI
Tracking changes in an entity over time is very difficult and requires significant storage
Tracking inactive records is our standard here and provides value to the end user

Examples of History…