Data Warehousing (Kimball, Ch.5-12) Dr. Vairam Arunachalam School of Accountancy, MU.

Transcript of Data Warehousing (Kimball, Ch.5-12) Dr. Vairam Arunachalam School of Accountancy, MU.

Page 1:

Data Warehousing (Kimball, Ch.5-12)

Dr. Vairam Arunachalam

School of Accountancy, MU

Page 2:

Sep. 23, 1999 Dr. Vairam Arunachalam 2

Agenda

Value Chain
“Clean” construction of DDW
Financial Services
Subscription Businesses
Insurance
Factless fact tables
Decision Points in DDW construction

Page 3:

Value Chain

Concept: integrated view of value-adding components of business process

Example on Demand side:
  – Finished Goods Inventory
  – Manufacturing Shipments to Distribution Center
  – Distribution Center Inventory
  – Distribution Center Shipments to Retail Stores
  – Retail Store Inventory
  – Retail Store Sales

Page 4:

Value Chain (contd.)

Example on Supply side:
  – Purchase Orders
  – Receiving
  – (Raw) Materials Inventory
  – Process Control
  – BOM
  – Finished Goods Inventory
  – Manufacturing Plans

Page 5:

Value Chain (contd.)

Issues related to integration of value chain information (i.e., drill-across):
  – Shared dimensions
  – Differences in physical dimension tables
  – Common dimension tables as a solution

Design Principle:
  – All constraints on dimensional attributes must evaluate to exactly the same set of dimensional entities from one db to another in the value chain
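The design principle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents, the names product_dim, dc_shipments and retail_sales, and the helper functions are all hypothetical, not taken from Kimball's figures. Because both fact tables share one product dimension, the constraint brand = "Acme" selects exactly the same products in each step of the value chain, so the two totals are comparable.

```python
# Hypothetical common (shared) product dimension used by both fact tables.
product_dim = {
    1: {"sku": "A-100", "brand": "Acme"},
    2: {"sku": "A-200", "brand": "Acme"},
    3: {"sku": "B-100", "brand": "Bolt"},
}

# Two fact tables from different value chain steps (made-up numbers).
dc_shipments = [(1, 40), (2, 25), (3, 10)]  # (product_key, units shipped to stores)
retail_sales = [(1, 30), (2, 20), (3, 12)]  # (product_key, units sold)


def total_by_brand(fact_rows, brand):
    """Apply one dimensional constraint, then sum the fact."""
    keys = {k for k, attrs in product_dim.items() if attrs["brand"] == brand}
    return sum(units for k, units in fact_rows if k in keys)


def drill_across(brand):
    """Answer the same constrained question against each fact table."""
    return {
        "shipped": total_by_brand(dc_shipments, brand),
        "sold": total_by_brand(retail_sales, brand),
    }


acme = drill_across("Acme")
```

If each fact table kept its own private copy of the product dimension, the set of keys matching "Acme" could differ between them, and the shipped and sold figures would no longer describe the same products.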

Page 6:

Value Chain (contd.)

Dimensions with reduced detail (e.g., manufacturing lot nos. versus SKUs)

Derived dimensions supporting aggregates (e.g., construction of derived roll-up product dimension and fact tables)
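A minimal sketch of a derived roll-up dimension and its aggregate fact table follows. The data, the choice of brand as the roll-up level, and the function name build_brand_rollup are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical base-level product dimension and sales fact table.
base_product_dim = {
    1: {"sku": "A-100", "brand": "Acme"},
    2: {"sku": "A-200", "brand": "Acme"},
    3: {"sku": "B-100", "brand": "Bolt"},
}
base_sales_fact = [  # (date_key, product_key, dollars)
    (19990901, 1, 10.0),
    (19990901, 2, 5.0),
    (19990901, 3, 7.0),
    (19990902, 1, 4.0),
]


def build_brand_rollup(product_dim, sales_fact):
    """Derive a brand-level dimension and the matching aggregate fact table."""
    brand_keys = {}
    for attrs in product_dim.values():
        # Assign a new surrogate key the first time each brand is seen.
        brand_keys.setdefault(attrs["brand"], len(brand_keys) + 1)
    brand_dim = {key: {"brand": brand} for brand, key in brand_keys.items()}

    totals = defaultdict(float)
    for date_key, product_key, dollars in sales_fact:
        brand_key = brand_keys[product_dim[product_key]["brand"]]
        totals[(date_key, brand_key)] += dollars
    agg_fact = sorted((d, b, amt) for (d, b), amt in totals.items())
    return brand_dim, agg_fact


brand_dim, brand_sales_fact = build_brand_rollup(base_product_dim, base_sales_fact)
```

The derived dimension carries only attributes that still make sense at the brand level, which keeps it consistent with the base dimension it was built from.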

Page 7:

“Clean” construction of DDW

Design principle:
  – A master file, usually the source of unique identification, must be maintained on a regular basis. This needs QA on the p-key and other fields.

Snowflaking: the good (remember normalization?) and the bad (issue of browsing performance) -- Fig.6.2

Demographic minidimensions -- Fig.6.3
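As a rough illustration of a demographic minidimension: banded demographic attributes are split off into a small dimension with one row per combination of bands, and each fact row carries that dimension's key. The band boundaries, attribute names and helper functions below are hypothetical and are not taken from Fig.6.3.

```python
from itertools import product

# Hypothetical bands; real designs choose these with the business users.
AGE_BANDS = ["under 30", "30-49", "50+"]
INCOME_BANDS = ["low", "mid", "high"]

# One minidimension row for every combination of bands.
demographics_dim = {
    key: {"age_band": age, "income_band": income}
    for key, (age, income) in enumerate(product(AGE_BANDS, INCOME_BANDS), start=1)
}
_key_by_bands = {
    (row["age_band"], row["income_band"]): key
    for key, row in demographics_dim.items()
}


def age_band(age):
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


def income_band(income):
    if income < 30000:
        return "low"
    if income < 80000:
        return "mid"
    return "high"


def demographics_key(age, income):
    """Look up the minidimension key to store on a fact row."""
    return _key_by_bands[(age_band(age), income_band(income))]
```

When a customer moves to a new band, later fact rows simply point at a different minidimension row, and the large customer dimension does not need a new record.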

Page 8:

“Clean” DDW (contd.)

Slowly changing dimensions (implications, pro and con):
  – Type 1 (Overwriting old values; losing ability to track history)
  – Type 2 (Creating an additional dimension record; segmenting history)
  – Type 3 (Creating new fields with new attribute values within original dimension record, while keeping original attribute values; describing history both backward and forward)
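The three responses to a slowly changing dimension can be compared side by side in a small sketch. The customer dimension, the attribute names and the "current_" field prefix are assumptions for illustration; a production design would also manage effective dates and surrogate key generation.

```python
import copy


def scd_type1(dim, key, attr, new_value):
    """Type 1: overwrite in place; the old value is lost."""
    dim[key][attr] = new_value
    return key


def scd_type2(dim, key, attr, new_value):
    """Type 2: add a new dimension record under a new surrogate key."""
    new_key = max(dim) + 1
    row = dict(dim[key])
    row[attr] = new_value
    dim[new_key] = row
    return new_key


def scd_type3(dim, key, attr, new_value):
    """Type 3: add a new field for the new value; keep the original value."""
    dim[key]["current_" + attr] = new_value
    return key


# Hypothetical customer who moves from one city to another.
base = {1: {"customer_id": "C1", "city": "Columbia"}}

t1 = copy.deepcopy(base)
scd_type1(t1, 1, "city", "St. Louis")

t2 = copy.deepcopy(base)
new_key = scd_type2(t2, 1, "city", "St. Louis")

t3 = copy.deepcopy(base)
scd_type3(t3, 1, "city", "St. Louis")
```

With Type 2, fact rows loaded after the change use the new key, so history is segmented at the point of change. With Type 3, all fact rows can be viewed under either the original or the current value.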

Page 9:

Financial Services

Core fact tables: Household data warehouse (Fig.7.1)

Dirty dimensions
Semiadditive account balances
Heterogeneous products (Fig.7.3):
  – Design principles:
      – create a core fact table and core dimension tables for crossing product types, and a custom fact table and custom dimension tables for querying a single product type
      – primary core facts duplicated in custom fact tables
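The core and custom split for heterogeneous products might look like the following sketch. The account rows, the checking-specific facts (checks_written, atm_withdrawals) and the function name are hypothetical; the point is only that each custom fact row repeats the core facts next to the facts that exist for one product type.

```python
# Hypothetical core fact table: facts common to every product type.
core_fact = [
    {"month": "1999-09", "account": "chk-1", "product_type": "checking", "balance": 300.0},
    {"month": "1999-09", "account": "sav-1", "product_type": "savings", "balance": 500.0},
]

# Hypothetical facts that only make sense for checking accounts.
checking_only_facts = {
    ("1999-09", "chk-1"): {"checks_written": 12, "atm_withdrawals": 4},
}


def build_custom_fact(core_rows, product_type, type_specific_facts):
    """Build one custom fact table: core facts duplicated plus type-specific facts."""
    custom_rows = []
    for row in core_rows:
        if row["product_type"] != product_type:
            continue
        merged = dict(row)  # core facts are copied, not referenced
        merged.update(type_specific_facts[(row["month"], row["account"])])
        custom_rows.append(merged)
    return custom_rows


checking_fact = build_custom_fact(core_fact, "checking", checking_only_facts)
```

Queries that cross product types read core_fact; queries about checking accounts read checking_fact alone and do not need to join back to the core table.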

Page 10:

Subscription Businesses

Accounting concept underlying payments in advance (i.e., deferred revenues)

Design principle:
  – Combine transaction-grained fact table with a monthly snapshot-grained fact table in order to get at transaction frequency/timing and earned income in a given period

Cable TV sales transaction and sales monthly snapshot databases (Figs.8.1 & 8.2)
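A small sketch of how the two grains complement each other. It assumes, purely for illustration, that service starts in the month of payment and that revenue is earned in equal monthly amounts; the payment rows, month numbering and field names are made up and do not come from Figs.8.1 and 8.2.

```python
# Hypothetical transaction-grained fact rows: one row per advance payment.
payments = [
    {"customer": "c1", "paid_month": 1, "months_covered": 3, "amount": 90.0},
    {"customer": "c2", "paid_month": 2, "months_covered": 2, "amount": 50.0},
]


def monthly_snapshot(transactions, last_month):
    """Build one snapshot row per month from the payment transactions."""
    snapshot = {
        m: {"payments": 0, "cash_received": 0.0, "earned": 0.0}
        for m in range(1, last_month + 1)
    }
    for t in transactions:
        if t["paid_month"] in snapshot:
            snapshot[t["paid_month"]]["payments"] += 1
            snapshot[t["paid_month"]]["cash_received"] += t["amount"]
        # Assumed straight-line recognition over the months covered.
        per_month = t["amount"] / t["months_covered"]
        for m in range(t["paid_month"], t["paid_month"] + t["months_covered"]):
            if m in snapshot:
                snapshot[m]["earned"] += per_month
    return snapshot


snap = monthly_snapshot(payments, last_month=4)
deferred_after_month_1 = snap[1]["cash_received"] - snap[1]["earned"]
```

The transaction rows answer when and how often customers pay; the snapshot rows answer how much income was earned in a period, which cash receipts alone would misstate.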

Page 11:

Insurance

Good illustration of several important concepts:
  – business process
  – grain, dimensions (including degenerate and dirty dimensions)
  – core & custom dimension and fact tables
  – transaction & snapshot schemas
  – heterogeneous products
  – slowly changing dimensions
  – minidimensions

Page 12:

Insurance

Initial policy transaction and snapshot schemas (Figs.9.1 and 9.3) and claims transaction and snapshot schemas (Figs.9.2 and 9.4)

Page 13:

Factless Fact Tables

Concept: no measured facts (still useful)
Types:
  – event tracking (e.g., which hospital procedures were performed most extensively?)
  – coverage (e.g., which customers did not purchase any products?)

Hospital patient procedure schema (Fig.10.2)
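Both kinds of factless fact table can be sketched with plain tuples of dimension keys. The rows and key values below are hypothetical and much simpler than the schema in Fig.10.2; the sketch only shows that counting rows and comparing key sets answer the two example questions even though no numeric fact is stored.

```python
from collections import Counter

# Event tracking: one row per procedure performed, keys only, no measures.
procedure_events = [  # (date_key, patient_key, procedure_key)
    (19990901, "p1", "xray"),
    (19990901, "p2", "xray"),
    (19990902, "p1", "mri"),
]


def most_common_procedure(events):
    """Count rows per procedure; the count itself is the answer."""
    counts = Counter(procedure for _date, _patient, procedure in events)
    return counts.most_common(1)[0]


# Coverage: one row per (customer, product) pairing that could have sold.
coverage = [("c1", "A"), ("c2", "A"), ("c3", "B")]
sales = [("c1", "A", 2)]  # (customer, product, units) from a sales fact table


def customers_without_purchase(coverage_rows, sales_rows):
    """Customers present in coverage but absent from the sales facts."""
    covered = {customer for customer, _product in coverage_rows}
    bought = {customer for customer, _product, _units in sales_rows}
    return sorted(covered - bought)


top_procedure = most_common_procedure(procedure_events)
non_buyers = customers_without_purchase(coverage, sales)
```

The coverage table is needed because a sales fact table records only what happened; it cannot, on its own, list the customers for whom nothing happened.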

Page 14:

Decision Points in DDW construction

1. Processes -> fact table identification

2. Grain of fact table

3. Dimensions of fact table

4. Facts

5. Dimension attributes

6. Slowly changing dimensions

7. Aggregations, heterogeneity, minidimensions, queries

8. Historical duration of db

9. Timeframe for data extraction/loading into DW