Data Warehousing (Kimball, Ch.5-12) Dr. Vairam Arunachalam School of Accountancy, MU.
Data Warehousing (Kimball, Ch.5-12)
Dr. Vairam Arunachalam
School of Accountancy, MU
Sep. 23, 1999 Dr. Vairam Arunachalam 2
Agenda
Value Chain
“Clean” construction of DDW
Financial Services
Subscription Businesses
Insurance
Factless fact tables
Decision Points in DDW construction
Value Chain
Concept: integrated view of value-adding components of business process
Example on Demand side:
– Finished Goods Inventory
– Manufacturing Shipments to Distribution Center
– Distribution Center Inventory
– Distribution Center Shipments to Retail Stores
– Retail Store Inventory
– Retail Store Sales
Value Chain (contd.)
Example on Supply side:
– Purchase Orders
– Receiving
– (Raw) Materials Inventory
– Process Control
– BOM
– Finished Goods Inventory
– Manufacturing Plans
Value Chain (contd.)
Issues related to integration of value chain information (i.e., drill-across):
– Shared dimensions
– Differences in physical dimension tables
– Common dimension tables as a solution
Design Principle:
– All constraints on dimensional attributes must evaluate to exactly the same set of dimensional entities from one db to another in the value chain
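The drill-across idea above can be sketched in a few lines of Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions: the table names (product_dim, shipment_fact, sales_fact), the columns, and the numbers are hypothetical and do not come from Kimball's figures. Two fact tables from different points in the value chain share one common product dimension, so the same constraint (grouping by brand) selects the same products in both, and the two result sets can be lined up side by side.

```python
import sqlite3

# Hypothetical schema: one shared (common) product dimension, two fact tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE shipment_fact (product_key INTEGER, units_shipped INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Zenith');
INSERT INTO shipment_fact VALUES (1, 100), (2, 50), (3, 70);
INSERT INTO sales_fact VALUES (1, 80), (2, 40), (3, 65);
""")


def total_by_brand(fact_table, measure):
    """Sum one measure per brand; names come from the fixed set above."""
    sql = (
        f"SELECT d.brand, SUM(f.{measure}) "
        f"FROM {fact_table} f JOIN product_dim d "
        f"ON d.product_key = f.product_key GROUP BY d.brand"
    )
    return dict(conn.execute(sql).fetchall())


# Query each fact table separately, then merge on the shared brand attribute.
shipped = total_by_brand("shipment_fact", "units_shipped")
sold = total_by_brand("sales_fact", "units_sold")
drill_across = {
    brand: (shipped.get(brand, 0), sold.get(brand, 0))
    for brand in sorted(set(shipped) | set(sold))
}
print(drill_across)
```

If each fact table had its own physical product table with different brand values, the merge step would silently compare different sets of products, which is the problem the design principle guards against.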
Value Chain (contd.)
Dimensions with reduced detail (e.g., manufacturing lot nos. versus SKUs)
Derived dimensions supporting aggregates (e.g., construction of derived roll-up product dimension and fact tables)
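As a rough illustration of a derived roll-up dimension, the following Python sketch builds a brand-level dimension and a matching aggregate fact table from SKU-level data. All names and values (product_dim, sales_fact, brand_key, the SKUs) are hypothetical, and this is one possible way to do the roll-up, not the procedure from the book.

```python
from collections import defaultdict

# Hypothetical SKU-level product dimension and sales fact rows.
product_dim = [
    {"product_key": 1, "sku": "A-1", "brand": "Acme"},
    {"product_key": 2, "sku": "A-2", "brand": "Acme"},
    {"product_key": 3, "sku": "Z-1", "brand": "Zenith"},
]
sales_fact = [(1, 80), (2, 40), (3, 65)]  # (product_key, units_sold)

# Derived roll-up dimension: one row per brand, with its own key.
brands = sorted({p["brand"] for p in product_dim})
brand_dim = [{"brand_key": i + 1, "brand": b} for i, b in enumerate(brands)]

# Map each SKU-level key to the key of its brand in the derived dimension.
brand_key_of = {row["brand"]: row["brand_key"] for row in brand_dim}
sku_to_brand_key = {
    p["product_key"]: brand_key_of[p["brand"]] for p in product_dim
}

# Aggregate fact table keyed by the derived dimension.
totals = defaultdict(int)
for product_key, units in sales_fact:
    totals[sku_to_brand_key[product_key]] += units
brand_sales_fact = sorted(totals.items())  # (brand_key, units_sold)
print(brand_dim, brand_sales_fact)
```

Because the brand attribute in the derived dimension is taken directly from the base dimension, a brand-level query gives the same answer against either table.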
“Clean” construction of DDW
Design principle:
– A master file, usually the source of unique identification, must be maintained on a regular basis. This needs QA on the p-key and other fields.
Snowflaking: the good (remember normalization?) and the bad (issue of browsing performance) -- Fig.6.2
Demographic minidimensions -- Fig.6.3
“Clean” DDW (contd.)
Slowly changing dimensions (implications, pro and con):
– Type 1 (Overwriting old values; losing ability to track history)
– Type 2 (Creating an additional dimension record; segmenting history)
– Type 3 (Creating new fields with new attribute values within original dimension record, while keeping original attribute values; describing history both backward and forward)
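The three responses to a slowly changing dimension can be compared on a single customer record. The Python sketch below is a minimal illustration, assuming a hypothetical customer dimension with a surrogate key (customer_key), a natural key (customer_id), and an is_current flag; the flag and the current_ field prefix are conventions chosen here for the example, not something the slides specify.

```python
import copy


def scd_type1(rows, natural_key, attr, new_value):
    """Type 1: overwrite the old value; the previous value is lost."""
    out = copy.deepcopy(rows)
    for row in out:
        if row["customer_id"] == natural_key:
            row[attr] = new_value
    return out


def scd_type2(rows, natural_key, attr, new_value):
    """Type 2: add a new dimension record with a new surrogate key."""
    out = copy.deepcopy(rows)
    next_key = max(r["customer_key"] for r in out) + 1
    for row in out:
        if row["customer_id"] == natural_key and row["is_current"]:
            row["is_current"] = False  # old record now describes past history
            new_row = dict(row, customer_key=next_key, is_current=True)
            new_row[attr] = new_value
            out.append(new_row)
            break
    return out


def scd_type3(rows, natural_key, attr, new_value):
    """Type 3: keep the original value and add a field for the new one."""
    out = copy.deepcopy(rows)
    for row in out:
        if row["customer_id"] == natural_key:
            row["current_" + attr] = new_value
    return out


dim = [{"customer_key": 1, "customer_id": "C100",
        "city": "Columbia", "is_current": True}]

t1 = scd_type1(dim, "C100", "city", "St. Louis")
t2 = scd_type2(dim, "C100", "city", "St. Louis")
t3 = scd_type3(dim, "C100", "city", "St. Louis")
print(t1, t2, t3, sep="\n")
```

With Type 2, fact rows loaded before the change keep pointing at customer_key 1 and later rows at customer_key 2, which is how history gets segmented. With Type 3, a single record carries both values, so facts can be reported under either the original or the new attribute.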
Financial Services
Core fact tables: Household data warehouse (Fig.7.1)
Dirty dimensions
Semiadditive account balances
Heterogeneous products (Fig.7.3):
– Design principles:
    – create a core fact and core dimension tables for crossing types, and a custom fact and custom dimension tables for querying
    – primary core facts duplicated in custom fact tables
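The semiadditive nature of account balances is easy to show with a small example. In the Python sketch below (hypothetical accounts and amounts), balances add up meaningfully across accounts within one month, but adding them across months does not yield a meaningful figure; across time, an average is used instead.

```python
from collections import defaultdict

# Hypothetical month-end balance snapshot rows: (month, account, balance).
balances = [
    ("1999-07", "A", 100.0),
    ("1999-07", "B", 200.0),
    ("1999-08", "A", 150.0),
    ("1999-08", "B", 250.0),
]


def total_by_month(rows):
    """Additive across accounts: total balance held in each month."""
    totals = defaultdict(float)
    for month, _account, balance in rows:
        totals[month] += balance
    return dict(totals)


def average_monthly_total(rows):
    """Across time, average the monthly totals instead of summing them."""
    totals = total_by_month(rows)
    return sum(totals.values()) / len(totals)


print(total_by_month(balances))
print(average_monthly_total(balances))
# Summing every row (700.0 here) would count the same money twice.
```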
Subscription Businesses
Accounting concept underlying payments in advance (i.e., deferred revenues)
Design principle:
– Combine transaction-grained fact table with a monthly snapshot-grained fact table in order to get at transaction frequency/timing and earned income in a given period
Cable TV sales transaction and sales monthly snapshot databases (Figs.8.1 & 8.2)
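To see why the two grains complement each other, the Python sketch below derives a monthly snapshot from advance-payment transactions. It assumes straight-line revenue recognition over the months covered, and the field names (start_month, months_covered, earned_revenue, deferred_revenue) are hypothetical; the actual cable TV schemas in the book are richer than this.

```python
# Hypothetical transaction-grained rows: one row per advance payment.
transactions = [
    {"customer": "C1", "start_month": 1, "amount": 120.0, "months_covered": 12},
    {"customer": "C2", "start_month": 3, "amount": 30.0, "months_covered": 3},
]


def build_monthly_snapshot(txns):
    """One row per customer per covered month (straight-line assumption)."""
    snapshot = []
    for t in txns:
        earned_per_month = t["amount"] / t["months_covered"]
        for i in range(t["months_covered"]):
            snapshot.append({
                "customer": t["customer"],
                "month": t["start_month"] + i,
                "earned_revenue": earned_per_month,
                # Portion of the payment not yet earned at month end.
                "deferred_revenue": t["amount"] - earned_per_month * (i + 1),
            })
    return snapshot


snapshot = build_monthly_snapshot(transactions)


def earned_in(month):
    """Earned income for a period comes from the snapshot grain."""
    return sum(r["earned_revenue"] for r in snapshot if r["month"] == month)


def cash_received_in(month):
    """Payment timing comes from the transaction grain."""
    return sum(t["amount"] for t in transactions if t["start_month"] == month)


print(earned_in(3), cash_received_in(3))
```

In month 3 the transaction table shows 30.0 of cash received, while the snapshot shows 20.0 earned, which is the gap that deferred revenue accounts for.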
Insurance
Good illustration of several important concepts:
– business process
– grain, dimensions (including degenerate and dirty dimensions)
– core & custom dimension and fact tables
– transaction & snapshot schemas
– heterogeneous products
– slowly changing dimensions
– minidimensions
Insurance
Initial policy transaction and snapshot schemas (Figs.9.1 and 9.3) and claims transaction and snapshot schemas (Figs.9.2 and 9.4)
Factless Fact Tables
Concept: no measured facts (still useful)
Types:
– event tracking (e.g., which hospital procedures were performed most extensively?)
– coverage (e.g., which customers did not purchase any products?)
Hospital patient procedure schema (Fig.10.2)
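Both kinds of factless fact table can be queried by counting or comparing rows, since there is no measure to sum. The Python sketch below uses hypothetical data: an event table of (date, patient, procedure) rows, and a coverage table of (customer, product) pairs that could have resulted in a purchase. It is a simplified stand-in for the schemas in the book.

```python
from collections import Counter

# Event tracking: each row records only that an event happened.
procedure_events = [
    ("1999-09-01", "P1", "x-ray"),
    ("1999-09-01", "P2", "x-ray"),
    ("1999-09-02", "P1", "appendectomy"),
]
procedure_counts = Counter(proc for _date, _patient, proc in procedure_events)
most_performed = procedure_counts.most_common(1)[0]

# Coverage: rows list what could have happened; comparing them with the
# rows for what did happen answers the "did not" question.
coverage = {("C1", "cable"), ("C2", "cable"), ("C3", "cable")}
purchases = {("C1", "cable"), ("C3", "cable")}
covered_customers = {customer for customer, _product in coverage}
buying_customers = {customer for customer, _product in purchases}
non_buyers = sorted(covered_customers - buying_customers)

print(most_performed, non_buyers)
```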
Decision Points in DDW construction
1. Processes -> fact table identification
2. Grain of fact table
3. Dimensions of fact table
4. Facts
5. Dimension attributes
6. Slowly changing dimensions
7. Aggregations, heterogeneity, minidimensions, queries
8. Historical duration of db
9. Timeframe for data extraction/loading into DW