Creating the Dimensional Model

37
4 Creating the Dimensional Model

description

Creating the Dimensional Model. Data Warehouse Database Design Objectives. Easy to understand Optimum performance Adaptable. Data Warehouse Data Type. Description. Fact (detail, atomic, raw) base data. Business measures. Dimension data. Query drivers. Derived fact data. - PowerPoint PPT Presentation

Transcript of Creating the Dimensional Model

Page 1: Creating the Dimensional Model

4Creating the Dimensional Model

Page 2: Creating the Dimensional Model

Data Warehouse Database Design Objectives

• Easy to understand

• Optimum performance

• Adaptable

Page 3: Creating the Dimensional Model

Data Warehouse Data Types

Data Warehouse Data Type DescriptionFact (detail, atomic, raw) base data Business measuresDimension data Query driversDerived fact data Calculated dataSummary (aggregate) data Pre-calculated dataMetadata Warehouse map

Page 4: Creating the Dimensional Model

Designing the Dimensional Model

Phase III: Defining the dimensional model

Page 5: Creating the Dimensional Model

Star Dimensional Modeling

History(Dimension

table)

Customer(Dimension

table)

Product(Dimension

table)

Channel(Dimension

table)

Item_nbrItem_descQuantityDiscnt_priceUnit_priceOrder_amount…

(Fact table)

Order

Page 6: Creating the Dimensional Model

Advantages of Using a Star Dimensional Model

• Supports multidimensional analysis

• Design improves performance

• Optimizers yield better execution plans

• Parallels end-user perceptions

• Provides an extensible design

• Broadens the choices for data access tools

Page 7: Creating the Dimensional Model

Fact Tables

• Contain numerical metrics of the business

• Can hold large volumes of data

• Can grow quickly

• Can contain base, derived, and summarized data

• Are typically additive

• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Page 8: Creating the Dimensional Model

Factless Fact Tables

• Event tracking

• Coverage

Page 9: Creating the Dimensional Model

More on Factless Fact Tables

Emp_FKSal_FKAge_FKEd_FKGrade_FK

Grade dimensionGrade_PK

Education dimensionEd_PK

Employee dimensionEmp_PK

Salary dimensionSal_PK

Age dimensionAge_PK

PK = Primary Key & FK = Foreign Key

Page 10: Creating the Dimensional Model

Identify Base and Derived Measures

• Identify candidate facts

• Remove duplicate facts

• Discover and document the underlying calculations

• Cross reference base facts

• Obtain final derived fact approval

Page 11: Creating the Dimensional Model

Base and Derived Data

Payroll table

Derived dataBase data

Emp_FK Month_FK Salary Comm Comp101 05 1,000 0 1,000102 05 1,500 100 1,600103 05 1,000 200 1,200104 05 1,500 1,000 2,500

Page 12: Creating the Dimensional Model

Translating Business Measures into a Fact Table

Business measures

Facts

Business MeasuresNumber of ItemsAmountCostProfit

FactNumber of ItemsItem Amount

Item CostProfit

BaseBaseBaseDerived

Page 13: Creating the Dimensional Model

Fact Table Measures

Nonadditive:Cannot be added

along any dimension

Semiadditive: Added along some

dimensions

Additive: Added across all

dimensions

Page 14: Creating the Dimensional Model

Dimension Tables

• Contain descriptors of the business

• Contain relatively static data

• Contain textual and discrete data

• Are usually smaller than fact tables

• Are joined to a fact table through a foreign key reference

Page 15: Creating the Dimensional Model

Translating Business Dimensions into Dimension Tables

Dimension TablesBusiness Dimensions

Customer

Product

Store

Day

Source System Information

Page 16: Creating the Dimensional Model

Translating a Product Business Dimension into a

Product Dimension Table

Product_statusList_price

Business Dimension for Product

Product_Id (Natural Key)

Product_desc

Supplier_Id

Product_name

Category

Promotion_CodeWarehouse_location

Product_typeProduct_code

Catalog_Id

Business DimensionProductProductCategorySupplier

TypeWarehousePromotion

Catalog

Page 17: Creating the Dimensional Model

Date Dimension

• Should contain the attributes required by all fact

tables.

• Can be categorized into 4 groups

• Date formats

• Calendar date attributes : day, month name,

year

• Fiscal attributes : fiscal week, fiscal period

• Indicator columns : Boolean values such as it

is a national holiday

Page 18: Creating the Dimensional Model

Slowly Changing Dimensions

Where Product_key is a calculated number stored within the database

Business Dimension for Product

Product_Id (Natural Key)

Product_desc

Supplier_Id

Product_name

Category

Product_PK (surrogate key)

Product_typeProduct_code

Catalog_Id

Product_statusList_price

Valid_from_dateValid_to_date

Promotion_codeWarehouse_location

Business DimensionProductProductCategorySupplier

TypeWarehousePromotion

Catalog

Page 19: Creating the Dimensional Model

Types of Database Keys

• Primary keys (PKs)

• Foreign keys (FKs)

• Composite keys

• Surrogate keys

Page 20: Creating the Dimensional Model

Using Surrogate Keys

Advantages of surrogate keys include:

• Control over data

• Reduced fact table size

Avoid using the following as data warehouse keys:

• OLTP Production (natural) keys

• Smart keys (embedded meaning)

Product key: 38972

Surrogate key

Page 21: Creating the Dimensional Model

Surrogate Keys Example

Emp_FK Salesperson_ID Salesperson_Name Manager_ID Emp_Change_Date

1 1 Smith 200 030199

2 2 Jones 300 050599

3 3 Harvey 300 060599

22 1 Smith 400 061001

Surrogate Keys

Prod_FK Prod_ID Prod_Name Prod_Grouping Brand_Code Prod_Change_Date

1 073258 Coffee Hot YUBN 032200

2 073258 Coffee Hot MAXH 110100

3 011172 Pop Cold SCHW 061001

4 011173 Tea Hot RRSE 061001

Page 22: Creating the Dimensional Model

Adding a Surrogate Key

Channel, product, promotion, and time surrogate keys are added.

These can be used to build aggregate tables.

Business Dimensionfor Sales

Sales_channel_FK(surrogate key)

Sales_promotion_FK (surrogate key)

Units

Sales_product_FK (surrogate_key)

Cost

Discount

Margin

Sales_amt

Sales_time

Page 23: Creating the Dimensional Model

Bracketed Dimensions

• Enhance performance and analytical capabilities

• Create groups of values for attributes with many unique values, such as income ranges and age brackets

• Minimize the need for full table scans by pre-aggregating data

Page 24: Creating the Dimensional Model

Bracketing Dimensions

Customer_PKBracket_FK

Bracket_PK

Customer_PKBracket_FK

Bracket dimension

Customer dimension

Income fact

Bracket_PK Income (10Ks) Marital Status Gender Age

1 60-90 Single Male <21

2 60-90 Single Male 21-35

3 60-90 Single Male 35-55

4 60-90 Single Male >55

5 60-90 Single Female <21

6 60-90 Single Female 21-35

Page 25: Creating the Dimensional Model

Models for Hierarchical Data

• Analytical activities using hierarchies are supported through different models:

– Business

– Multiple

– Multiple time

• Hierarchical data is stored in dimension tables.

• Dimensions can contain one or more hierarchies.

Page 26: Creating the Dimensional Model

Identifying Analytical Hierarchies

Store dimension

Store IDStore DescLocationSizeTypeDistrict IDDistrict DescRegion IDRegion Desc

Business hierarchies describe organizational structure and logical parent-child relationships within the data.

Region

District

Store

Organization hierarchy

Page 27: Creating the Dimensional Model

Multiple Hierarchies

Store IDStore DescLocationSizeTypeDistrict ID District DescRegion IDRegion DescCity IDCity DescCounty IDCounty DescState IDState Desc

Region

City

Store

Organization hierarchy

Store dimension

State

District

Store

Geography hierarchy

County

Page 28: Creating the Dimensional Model

Multiple Time Hierarchies

Fiscal year

Fiscal quarter

Fiscal month

Fiscal time hierarchy

Fiscal week

Calendar year

Calendar quarter

Calendar month

Calendar time hierarchy

Calendar week

Page 29: Creating the Dimensional Model

Store 5Store 1 Store 2

Region 2

District 2 District 4

Drilling Up and Drilling Down

Store 4

Group

Market Hierarchy

Region 1

District 1

Store 6Store 3

District 3

Page 30: Creating the Dimensional Model

Region

District

Drilling Across

Stores > 20,000 sq. ft.

Group

Market hierarchy

Region

District

Store Store City

City

City hierarchy

Page 31: Creating the Dimensional Model

Documenting the Granularity of Dimensions

• Is an important design consideration

• Determines the level of detail

• Is determined by business needs

Low-level grain (Transaction-level data)

High-level grain (Summary data)

Page 32: Creating the Dimensional Model

Defining Time Granularity

Fiscal Time Hierarchy

Current dimension grain

Fiscal Year

Fiscal Quarter

Fiscal Month

Fiscal Week

Day Future dimension grain

Page 33: Creating the Dimensional Model

History_PK

. . . .

A Star Dimensional Model

Denormalized Model

Customer

HistoryOrder

History_FKCustomer_FKProduct_FKChannel_FK

Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt…

Product

Channel

Channel_PK

. . . .

Customer_PK

. . . .

Product_PK

. . . .

Page 34: Creating the Dimensional Model

Star Dimensional Model Characteristics

• The model is easy for users to understand.

• Primary keys represent a dimension.

• Nonforeign key columns are values.

• Facts are usually highly normalized.

• Dimensions are completely denormalized.

• Fast response to queries is provided.

• Performance is improved by reducing table joins.

• End users can express complex queries.

• Support is provided by many front-end tools.

Page 35: Creating the Dimensional Model

Snowflake Model

Web

History_PK

. . . .

Customer

HistoryOrder

History_FKCustomer_FKProduct_FKChannel_FK

Item_nbrItem_descQuantityDiscnt_priceUnit-priceOrder_amt…

Product

Channel

Channel_PK

Web_PKChannel_desc

Customer_PK

. . . .

Product_PK

. . . .

Web_PK

Web_url

Page 36: Creating the Dimensional Model

Constellation Configuration

Atomic fact

Page 37: Creating the Dimensional Model

Updating the Meta Data

• Dimensions and attributes detail (Primary key, attribute definition, and so on)

• Facts and measures detail (Measure description, additivity, and so on)

• Data source definitions (Business owner, platform, description, and so on)

• Source to target data mappings (Data type, length, target column description, and so on)