DWH Spring 2011 Lecture Slides Week6&7


8/6/2019 DWH Spring 2011 Lecture Slides Week6&7

http://slidepdf.com/reader/full/dwh-spring-2011-lecture-slides-week67 1/18

Dr. Abdul Basit Siddiqui

FUIEMS

(Lecture Slides Weeks # 6&7)


The need for ER modeling?

• Problems with early COBOLian data processing systems.

• Data redundancies.

• From flat file to table: each entity ultimately becomes a table in the physical schema.

• Simple O(n^2) join to work with tables.
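
The O(n^2) figure corresponds to the simplest way of joining two tables: comparing every row of one with every row of the other. A minimal sketch of such a nested-loop join, assuming small in-memory tables held as lists of dicts (the table names, column names, and rows are invented for illustration and are not from the slides):

```python
def nested_loop_join(left, right, key):
    """Naive join: compare every left row with every right row."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                # Merge the two matching rows into one output row.
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical tables, for illustration only.
customers = [{"cust_id": 1, "name": "Ali"}, {"cust_id": 2, "name": "Sara"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 1},
]

rows = nested_loop_join(orders, customers, "cust_id")
```

With n rows in each table the inner comparison runs n x n times, which is why indexing (mentioned on the next slide) matters for fast access.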

Data Warehousing - Spring 2011, FUIEMS


Why has ER modeling been so successful?

• Coupled with normalization, it drives all the redundancy out of the database.

• Change (or add or delete) the data at just one point.

• Can be used with indexing for very fast access.

• Resulted in the success of OLTP systems.


Need for DM: Unanswered Qs

• Let's have a look at a typical ER data model first.

• Some observations:

  • All tables look alike; as a consequence it is difficult to identify:

    • Which table is more important?

    • Which is the largest?

    • Which tables contain numerical measurements of the business?

    • Which tables contain nearly static descriptive attributes?


Need for DM: Complexity of Representation

• Many topologies for the same ER diagram, all appearing different.

• Very hard to visualize and remember.

• A large number of possible connections between any two (or more) tables.

[Figure: two different layouts of the same ER diagram, each with tables numbered 1 to 12]


Need for DM: The Paradox

• The Paradox: trying to make information accessible using tables resulted in an inability to query them!

• ER and normalization result in a large number of tables which are:

  • Hard to understand by the users (DB programmers)

  • Hard to navigate optimally by DBMS software

• Real value of ER is in using tables individually or in pairs.

• Too complex for queries that span multiple tables with a large number of records.


ER vs. DM

ER

• Constituted to optimize OLTP performance.

• Models the micro relationships among data elements.

• A wild variability of the structure of ER models.

• Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical.

DM

• Constituted to optimize DSS query performance.

• Models the macro relationships among data elements with an overall deterministic strategy.

• All dimensions serve as equal entry points to the fact table.

• Changes in users' querying habits can be accommodated by automatic SQL generators.
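
Because every dimension joins to the fact table in the same way, a query generator only needs to know which dimension attributes the user asked for. The following is a rough sketch of that idea, not the API of any real tool; the function `build_star_query` and all table and column names are hypothetical:

```python
def build_star_query(fact, measure, dims, group_by):
    """Build a star-join SQL string.

    dims     -- maps each dimension table to the key column it shares with the fact table
    group_by -- list of (dimension_table, attribute) pairs chosen by the user
    """
    select_cols = ", ".join(f"{d}.{a}" for d, a in group_by)
    wanted = {d for d, _ in group_by}
    # Join only the dimensions the query actually uses; every join has the same shape.
    joins = " ".join(
        f"JOIN {d} ON {fact}.{k} = {d}.{k}" for d, k in dims.items() if d in wanted
    )
    return (
        f"SELECT {select_cols}, SUM({fact}.{measure}) AS total "
        f"FROM {fact} {joins} GROUP BY {select_cols}"
    )


# Hypothetical star schema description.
dims = {"product_dim": "item_no", "time_dim": "sale_date", "geography_dim": "store_no"}

by_year = build_star_query("sales_fact", "sale_rs", dims, [("time_dim", "year")])
by_category = build_star_query("sales_fact", "sale_rs", dims, [("product_dim", "category")])
```

Switching the question from "sales by year" to "sales by category" changes only the arguments, not the structure of the generated SQL, which is the symmetry the DM column describes.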


How to simplify an ER data model?

• Two general methods:

  • De-Normalization

  • Dimensional Modeling (DM)


What is DM?

• A simpler logical model optimized for decision support.

• Inherently dimensional in nature, with a single central fact table and a set of smaller dimensional tables.

• Multi-part key for the fact table.

• Dimensional tables with a single-part PK.

• Keys are usually system generated.


What is DM?

• Results in a star-like structure, called star schema or star join.

• All relationships mandatory M-1.

• Single path between any two levels.

• Supports ROLAP operations.


Dimensions have Hierarchies

[Figure: hierarchy of the Items dimension]

Items
  Books
    Fiction
    Text
      Medical
      Engg
  Clothes
    Men
    Women

Analysts tend to look at the data through a dimension at a particular "level" in the hierarchy.
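
Looking at data "at a particular level" amounts to rolling leaf-level figures up to the chosen ancestor. A small sketch of that roll-up over the Items hierarchy, assuming Medical and Engg sit under Text (the figure's layout is not fully recoverable) and using made-up sales amounts:

```python
from collections import defaultdict

# Child -> parent links for the Items hierarchy.
# Placing Medical and Engg under Text is an assumption.
PARENT = {
    "Books": "Items", "Clothes": "Items",
    "Fiction": "Books", "Text": "Books",
    "Men": "Clothes", "Women": "Clothes",
    "Medical": "Text", "Engg": "Text",
}


def ancestor_at_depth(node, depth):
    """Return the ancestor of `node` that sits `depth` levels below the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()  # root first, e.g. Items, Books, Text, Medical
    # A node shallower than `depth` is reported as itself.
    return path[min(depth, len(path) - 1)]


def roll_up(sales, depth):
    """Sum leaf-level amounts per ancestor at the requested depth."""
    totals = defaultdict(int)
    for leaf, amount in sales:
        totals[ancestor_at_depth(leaf, depth)] += amount
    return dict(totals)


# Hypothetical sales recorded at the lowest level of the hierarchy.
sales = [("Medical", 100), ("Engg", 50), ("Fiction", 30), ("Men", 70), ("Women", 40)]

by_level1 = roll_up(sales, 1)  # Books vs. Clothes
by_level2 = roll_up(sales, 2)  # Fiction, Text, Men, Women
```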


The two Schemas

• Star

• Snow-flake


Simplified 3NF (Retail)

[Figure: ER diagram of a simplified 3NF retail schema; relationship ends are marked M]

Relation labels in the diagram: year, quarter, month, week, sale_header, sale_detail, store, zone, district, division, item_x_cat, item_x_splir, cat_x_dept.

Attribute groups shown in the boxes: YEAR, QTR; MONTH, QTR; WEEK, MONTH; DATE, WEEK; RECEIPT, STORE, DATE; ITEM, RECEIPT; ITEM, CATEGORY; ITEM, SUPPLIER; DEPT, CATEGORY; STORE, STREET, ZONE; ZONE, CITY; CITY, DISTRICT; DISTRICT, DIVISION; DIVISION, PROVINCE.


Vastly Simplified Star Schema

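[Figure: star schema for the retail example; each dimension table is on the 1 side and the fact table on the M side of its relationship]

Fact Table: RECEIPT#, STORE#, DATE, ITEM#, Sale Rs., ... (further facts)

Product Dim: ITEM#, CATEGORY, DEPT, SUPPLIER

Geography Dim: STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE

Time Dim: DATE, WEEK, MONTH, QUARTER, YEAR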
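
As an illustration, the star schema on this slide could be set up and queried roughly as follows using Python's built-in sqlite3 module. The table names, column names, data types, and sample rows are assumptions made for the sketch; only the attribute lists come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, supplier TEXT);
CREATE TABLE geography_dim (
    store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT,
    district TEXT, division TEXT, province TEXT);
CREATE TABLE time_dim (
    sale_date TEXT PRIMARY KEY, week INTEGER, month INTEGER,
    quarter INTEGER, year INTEGER);
-- Fact table: multi-part key made of the dimension keys plus the receipt number.
CREATE TABLE sales_fact (
    receipt_no INTEGER,
    store_no   INTEGER REFERENCES geography_dim(store_no),
    sale_date  TEXT    REFERENCES time_dim(sale_date),
    item_no    INTEGER REFERENCES product_dim(item_no),
    sale_rs    REAL,
    PRIMARY KEY (receipt_no, store_no, sale_date, item_no));
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(1, "Fiction", "Books", "S1"), (2, "Men", "Clothes", "S2")])
conn.execute("INSERT INTO geography_dim VALUES (10, 'Z1', 'Lahore', 'Lahore', 'Lahore', 'Punjab')")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [("2011-03-01", 9, 3, 1, 2011), ("2011-04-05", 14, 4, 2, 2011)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [(100, 10, "2011-03-01", 1, 500.0),
                  (100, 10, "2011-03-01", 2, 1200.0),
                  (101, 10, "2011-04-05", 1, 300.0)])

# Sales by department and quarter: one join per dimension used, nothing deeper.
rows = conn.execute("""
    SELECT p.dept, t.quarter, SUM(f.sale_rs)
    FROM sales_fact f
    JOIN product_dim p ON f.item_no = p.item_no
    JOIN time_dim t ON f.sale_date = t.sale_date
    GROUP BY p.dept, t.quarter
    ORDER BY p.dept, t.quarter
""").fetchall()
```

The same question against the 3NF schema would have to walk item, category, and department tables on one side and date, week, month, and quarter tables on the other.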


The Benefit of Simplicity

Beauty lies in close correspondence with the business, evident even to business users.


Features of Star Schema

• Dimensional hierarchies are collapsed into a single table for each dimension. Loss of information?

• A single fact table created with a single header from the detail records, resulting in:

  • A vastly simplified physical data model!

  • Fewer tables (thousands of tables in some ERP systems).

  • Fewer joins resulting in high performance.

  • Some requirement of additional space.


Quantifying space requirement

Quantifying use of additional space using star schema

There are about 10 million mobile phone users in Pakistan. Say the top company has half of them = 5 million.

Number of days in 1 year = 365

Number of calls recorded each day = 250 million (assumed)

Maximum number of records in fact table = 250 million x 365 ≈ 91 billion rows

Assuming a relatively small header size = 128 bytes

Fact table storage used = 91 billion x 128 bytes ≈ 11 terabytes

Average length of city name = 8 characters ≈ 8 bytes

Total number of cities with telephone access = 170 (fits in a 1-byte code)

Space used for city name in fact table using star = 8 bytes x 0.091 trillion rows = 0.728 TB

Space used for city code using snow-flake = 1 byte x 0.091 trillion rows = 0.091 TB

Additional space used ≈ 0.637 terabytes, i.e. about 5.8% of the 11 TB fact table
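
The arithmetic on this slide can be checked with a few lines of Python. The sketch assumes 250 million calls per day, which is the value needed to reach the slide's 91 billion rows, and decimal terabytes (10^12 bytes); the variable names are illustrative.

```python
CALLS_PER_DAY = 250_000_000   # assumption: the value implied by the 91 billion row total
DAYS_PER_YEAR = 365
ROW_BYTES = 128               # "relatively small header size" from the slide
CITY_NAME_BYTES = 8           # average city name stored in the star version
CITY_CODE_BYTES = 1           # 170 cities fit in a one-byte code
TB = 10**12                   # decimal terabyte

rows = CALLS_PER_DAY * DAYS_PER_YEAR           # rows in the fact table for one year
fact_tb = rows * ROW_BYTES / TB                # fact table size
star_tb = rows * CITY_NAME_BYTES / TB          # city names held per fact row
snowflake_tb = rows * CITY_CODE_BYTES / TB     # one-byte city codes instead
extra_tb = star_tb - snowflake_tb              # additional space in the star version
extra_pct = 100 * extra_tb / fact_tb           # share of the fact table

print(f"{rows:,} rows, fact table {fact_tb:.2f} TB")
print(f"extra space {extra_tb:.3f} TB = {extra_pct:.1f}% of the fact table")
```

Without rounding, the fact table comes to about 11.7 TB and the overhead to 7/128, roughly 5.5%; the slide's 5.8% results from dividing 0.637 TB by the rounded 11 TB.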
