DWH Lecture Slides: Weeks 7 & 8
Dr. Abdul Basit Siddiqui

The need for ER modeling?

Problems with early COBOLian data processing systems.

Data redundancies

From flat file to Table, each entity ultimately becomes a Table in the physical schema.

Simple O(n²) join to work with tables
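As a rough illustration of the O(n²) join mentioned above, here is a minimal nested-loop join sketch in Python. The row layout (`store_no`, `zone`, `receipt_no`) and the sample values are hypothetical, chosen only to show that every row of one table is compared with every row of the other.

```python
def nested_loop_join(left, right, key):
    """Join two lists of dict rows by comparing every pair of rows.

    With n rows on each side this performs n * n comparisons.
    """
    joined = []
    comparisons = 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                # Merge the two matching rows into one output row.
                joined.append({**l_row, **r_row})
    return joined, comparisons


# Hypothetical sample data (not from the slides).
stores = [
    {"store_no": 1, "zone": "North"},
    {"store_no": 2, "zone": "South"},
]
receipts = [
    {"receipt_no": 100, "store_no": 1},
    {"receipt_no": 101, "store_no": 2},
    {"receipt_no": 102, "store_no": 1},
]

rows, comparisons = nested_loop_join(receipts, stores, "store_no")
print(comparisons, len(rows))
```

Real DBMS engines usually have faster join strategies available (for example index-based or hash-based joins); the sketch only shows the simplest one.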

Why has ER modeling been so successful?

Coupled with normalization, it drives all the redundancy out of the database.

Change (or add or delete) the data at just one point.

Can be used with indexing for very fast access.

Resulted in success of OLTP systems.

Need for DM: Un-answered Qs

Let's have a look at a typical ER data model first.

Some observations: all tables look alike; as a consequence, it is difficult to identify:

Which table is more important ?

Which is the largest?

Which tables contain numerical measurements of the business?

Which tables contain nearly static descriptive attributes?

Need for DM: Complexity of Representation

Many topologies for the same ER diagram, all appearing different.

Very hard to visualize and remember.

A large number of possible connections between any two (or more) tables.

[Figure: two different layouts of the same ER diagram, each connecting the same twelve tables (numbered 1 to 12).]
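To give a feel for how quickly the number of possible connections grows, the sketch below counts the distinct pairs of tables in a model. It only counts pairs (not longer join paths), and the table count of 12 is an illustrative assumption.

```python
from math import comb


def pairwise_connections(num_tables):
    """Number of distinct table pairs that could be directly connected."""
    return comb(num_tables, 2)


# Illustrative assumption: an ER model with 12 tables.
pairs_for_12 = pairwise_connections(12)
print(pairs_for_12)
```

Since the count is n(n-1)/2, doubling the number of tables roughly quadruples the number of candidate connections.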

Need for DM: The Paradox

The Paradox: Trying to make information accessible using tables resulted in an inability to query them!

ER and normalization result in a large number of tables, which are:

Hard to understand by the users (DB programmers)

Hard to navigate optimally by DBMS software

Real value of ER is in using tables individually or in pairs

Too complex for queries that span multiple tables with a large number of records

ER vs. DM

ER

Constituted to optimize OLTP performance.

Models the micro relationships among data elements.

A wild variability of the structure of ER models.

Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical.

DM

Constituted to optimize DSS query performance.

Models the macro relationships among data elements with an overall deterministic strategy.

All dimensions serve as equal entry points to the fact table.

Changes in users' querying habits can be accommodated by automatic SQL generators.

How to simplify an ER data model?

Two general methods:

De-Normalization

Dimensional Modeling (DM)

What is DM?

A simpler logical model optimized for decision support.

Inherently dimensional in nature, with a single central fact table and a set of smaller dimensional tables.

Multi-part key for the fact table.

Dimensional tables with a single-part PK.

Keys are usually system generated.
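The key structure described above can be sketched with Python's built-in `sqlite3` module. All table and column names here (`product_dim`, `time_dim`, `sales_fact`, `product_key`, and so on) are hypothetical; the sketch assumes SQLite's behaviour of generating a value for an `INTEGER PRIMARY KEY` column when none is supplied, which stands in for a system-generated key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: single-part, system-generated primary key.
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    item_no     TEXT,
    category    TEXT
);
CREATE TABLE time_dim (
    time_key  INTEGER PRIMARY KEY,
    sale_date TEXT
);
-- Fact table: multi-part key made of the dimension keys.
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    time_key    INTEGER NOT NULL REFERENCES time_dim(time_key),
    sale_rs     REAL,
    PRIMARY KEY (product_key, time_key)
);
""")

# The keys are not supplied; SQLite generates them.
product_key = conn.execute(
    "INSERT INTO product_dim (item_no, category) VALUES (?, ?)",
    ("B-001", "Fiction"),
).lastrowid
time_key = conn.execute(
    "INSERT INTO time_dim (sale_date) VALUES (?)", ("2024-01-15",)
).lastrowid

conn.execute(
    "INSERT INTO sales_fact VALUES (?, ?, ?)", (product_key, time_key, 500.0)
)
fact_rows = conn.execute("SELECT * FROM sales_fact").fetchall()
print(product_key, time_key, fact_rows)
```

Inserting a second fact row with the same `(product_key, time_key)` combination would be rejected, because the combination of dimension keys identifies a fact row.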

What is DM?

Results in a star-like structure, called a star schema or star join.

All relationships mandatory M-1.

Single path between any two levels.

Supports ROLAP operations.

Dimensions have Hierarchies

Items
    Books
        Fiction
        Text
            Medical
            Engg
    Clothes
        Men
        Women

Analysts tend to look at the data through a dimension at a particular “level” in the hierarchy.
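Looking at data at a particular level of a dimension hierarchy amounts to aggregating along a prefix of each item's path. The following is a minimal sketch using the item hierarchy above; the sales amounts and the helper name `roll_up` are hypothetical.

```python
from collections import defaultdict

# Each row: (path through the item hierarchy, sale amount).
# The amounts are made-up illustration values.
sales = [
    (("Books", "Fiction"), 120.0),
    (("Books", "Text", "Medical"), 300.0),
    (("Books", "Text", "Engg"), 250.0),
    (("Clothes", "Men"), 400.0),
    (("Clothes", "Women"), 450.0),
]


def roll_up(rows, level):
    """Sum amounts grouped by the first `level` steps of each path."""
    totals = defaultdict(float)
    for path, amount in rows:
        totals[path[:level]] += amount
    return dict(totals)


by_top_level = roll_up(sales, 1)     # Books vs. Clothes
by_second_level = roll_up(sales, 2)  # Fiction, Text, Men, Women
print(by_top_level)
print(by_second_level)
```

Moving from `level=2` to `level=1` corresponds to a roll-up; the reverse direction is a drill-down.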

The two Schemas

Star and Snow-flake

“Simplified” 3NF (Retail)

[Figure: ER diagram of a retail schema in 3NF; the tables are linked by M-1 relationships.]

Tables named in the diagram: year, quarter, month, week, sale_header, store, sale_detail, item_x_cat, item_x_splir, cat_x_dept, zone, district, division.

Column groups shown in the diagram:

RECEIPT #, STORE #, DATE, ...

ITEM #, RECEIPT #, ..., $

STORE #, STREET, ZONE, ...

ZONE, CITY

CITY, DISTRICT

DISTRICT, DIVISION

DIVISION, PROVINCE

DATE, WEEK

WEEK, MONTH

MONTH, QTR

YEAR, QTR

ITEM #, CATEGORY

ITEM #, SUPPLIER

DEPT, CATEGORY

Vastly Simplified Star Schema

Fact Table: RECEIPT#, STORE#, DATE, ITEM#, ..., Sale Rs. (facts)

Product Dim: ITEM#, CATEGORY, DEPT, SUPPLIER

Geography Dim: STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE

Time Dim: DATE, WEEK, MONTH, QUARTER, YEAR

Each dimension table is related to the fact table 1-to-M.
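A star schema of this shape can be queried with one join per dimension. The sketch below builds a small version with Python's built-in `sqlite3` module; the snake_case table and column names are assumed renderings of the labels on the slide, and all data values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, supplier TEXT);
CREATE TABLE geography_dim (
    store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT,
    district TEXT, division TEXT, province TEXT);
CREATE TABLE time_dim (
    sale_date TEXT PRIMARY KEY, week INTEGER, month INTEGER,
    quarter INTEGER, year INTEGER);
CREATE TABLE sales_fact (
    receipt_no INTEGER,
    store_no   INTEGER REFERENCES geography_dim(store_no),
    sale_date  TEXT    REFERENCES time_dim(sale_date),
    item_no    INTEGER REFERENCES product_dim(item_no),
    sale_rs    REAL,
    PRIMARY KEY (receipt_no, store_no, sale_date, item_no));

-- Made-up sample rows.
INSERT INTO product_dim VALUES (1, 'Fiction', 'Books', 'S1');
INSERT INTO product_dim VALUES (2, 'Men', 'Clothes', 'S2');
INSERT INTO geography_dim VALUES (10, 'Z1', 'Lahore', 'Lahore', 'Lahore', 'Punjab');
INSERT INTO geography_dim VALUES (20, 'Z2', 'Karachi', 'Karachi', 'Karachi', 'Sindh');
INSERT INTO time_dim VALUES ('2024-01-15', 3, 1, 1, 2024);
INSERT INTO time_dim VALUES ('2024-04-10', 15, 4, 2, 2024);
INSERT INTO sales_fact VALUES (100, 10, '2024-01-15', 1, 500.0);
INSERT INTO sales_fact VALUES (100, 10, '2024-01-15', 2, 1500.0);
INSERT INTO sales_fact VALUES (101, 20, '2024-04-10', 1, 700.0);
INSERT INTO sales_fact VALUES (102, 10, '2024-04-10', 1, 300.0);
""")

# Star join: the fact table joined once to each dimension,
# then grouped at the chosen level of each hierarchy.
result = conn.execute("""
    SELECT g.province, t.quarter, p.dept, SUM(f.sale_rs)
    FROM sales_fact f
    JOIN geography_dim g ON g.store_no  = f.store_no
    JOIN time_dim      t ON t.sale_date = f.sale_date
    JOIN product_dim   p ON p.item_no   = f.item_no
    GROUP BY g.province, t.quarter, p.dept
    ORDER BY g.province, t.quarter, p.dept
""").fetchall()
for row in result:
    print(row)
```

Changing the grouping level (for example `g.city` instead of `g.province`) needs no extra joins, because each hierarchy lives in a single dimension table.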

The Benefit of Simplicity

Beauty lies in close correspondence with the business, evident even to business users.

Features of Star Schema

Dimensional hierarchies are collapsed into a single table for each dimension. Loss of Information?

A single fact table created with a single header from the detail records, resulting in:

A vastly simplified physical data model!

Fewer tables (thousands of tables in some ERP systems).

Fewer joins resulting in high performance.

Some requirement of additional space.
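The trade-off between fewer joins and additional space can be seen by collapsing a small normalized hierarchy into one dimension table. This is a sketch with Python's built-in `sqlite3` module; the three-level store/zone/city hierarchy and its values are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized hierarchy: each level in its own table.
CREATE TABLE city  (city TEXT PRIMARY KEY, district TEXT);
CREATE TABLE zone  (zone TEXT PRIMARY KEY, city TEXT REFERENCES city(city));
CREATE TABLE store (store_no INTEGER PRIMARY KEY, zone TEXT REFERENCES zone(zone));

INSERT INTO city  VALUES ('Lahore', 'Lahore District');
INSERT INTO zone  VALUES ('Z1', 'Lahore');
INSERT INTO zone  VALUES ('Z2', 'Lahore');
INSERT INTO store VALUES (1, 'Z1');
INSERT INTO store VALUES (2, 'Z1');
INSERT INTO store VALUES (3, 'Z2');

-- Collapse the hierarchy into a single dimension table.
CREATE TABLE geography_dim AS
SELECT s.store_no, s.zone, z.city, c.district
FROM store s
JOIN zone z ON z.zone = s.zone
JOIN city c ON c.city = z.city;
""")

# After collapsing, a store's district is available without any join ...
dim_rows = conn.execute(
    "SELECT * FROM geography_dim ORDER BY store_no"
).fetchall()

# ... but the district value is now stored once per store instead of once.
district_copies = conn.execute(
    "SELECT COUNT(*) FROM geography_dim WHERE district = 'Lahore District'"
).fetchone()[0]
print(dim_rows)
print(district_copies)
```

No information from the hierarchy is lost in this example; the levels are still present as columns, only repeated on every row.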