SQL Saturday 46 Raleigh Sep 2010 - Dimensional Modeling

Blog: www.Rafael-Salas.com Email: [email protected] @RafSalas


About Rafael

DW/BI professional: 12 years

SQL Server MVP: 4 years

Architect/Consultant at Quaero, CSG Systems

Lives in Charlotte, NC


Quaero is Hiring! DB Engineer

5+ years of database support experience

Expertise in SQL Server 2005 and 2008 database environments is a must

Expertise in ETL, including SSIS packages, stored procedures, and T-SQL

Ability to work directly and effectively with clients

Experience working in complex production database environments

Experience implementing data hygiene and customer matching routines is a plus

Excellent written and verbal communication skills

Experience with scripting languages and XML is a plus

[email protected]


Agenda

The Stage: Kimball's Data Warehouse Lifecycle overview

Dimensional Modeling Basics

Dimensional Design Process: 4 steps

More About Dimension Tables

More About Fact Tables


What To Expect?

Introduction to dimensional modeling concepts, terminology, and design guidelines

Not an advanced dimensional modeling class

No demos, but lots of slides

Questions welcome at any time


The Stage: Kimball's DW Lifecycle

The Kimball DW Lifecycle is one of the most popular data warehousing methodologies

First Lifecycle book published in 1996, latest in 2010

The dimensional model, or "star schema", is today's dominant "theme" in the BI field


Kimball DW Lifecycle Fundamentals

Enterprise data warehouse framework

Business-driven

Iterative approach

Dimensional model for data delivery

DB model intuitive to end users

Fast query performance


Dimensional Modeling in the DW Lifecycle


Dimensional Modeling

Logical model design technique

DB structures intuitive to end users

Fast query performance

Divides the world into:

Facts

Dimensions

Also known as “Star Schema”


Reviewing Star Schema Benefits

Transforms normalized data into a simpler model

Delivers high-performance queries

SQL Server offers Star Join Query Optimization

Uses mature modeling techniques that are widely supported by many BI tools

Requires low maintenance as the data warehouse design evolves


Introducing the Star Schema


Facts

A measurement of a business event

Numeric values

Additive, semi-additive, or non-additive

Normalized data structures
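The additivity distinction matters as soon as facts are aggregated. The minimal Python sketch below uses hypothetical month-end account balances (a typical semi-additive fact) and hypothetical deposit amounts (a fully additive fact); all names and values are made up for illustration. Balances can be summed across accounts within one month, but across months an average or the period-end value is used instead. Non-additive facts, such as ratios, are usually recomputed from their additive components rather than summed.

```python
# Hypothetical month-end balance facts: (account, month, balance).
# Balance is semi-additive: it can be summed across accounts,
# but not across months.
balances = [
    ("Checking", "2008-01", 100.0),
    ("Savings", "2008-01", 500.0),
    ("Checking", "2008-02", 150.0),
    ("Savings", "2008-02", 500.0),
]

# Hypothetical deposit facts: (account, month, amount). Fully additive.
deposits = [
    ("Checking", "2008-01", 100.0),
    ("Checking", "2008-02", 50.0),
]


def total_balance_for_month(month):
    """Valid for a semi-additive fact: sum across accounts within one month."""
    return sum(b for _, m, b in balances if m == month)


def average_balance_for_account(account):
    """Across time, average the balances instead of summing them."""
    values = [b for a, _, b in balances if a == account]
    return sum(values) / len(values)


def total_deposits():
    """Valid for an additive fact: sum across every dimension, including time."""
    return sum(amount for _, _, amount in deposits)


print(total_balance_for_month("2008-02"))       # 650.0
print(average_balance_for_account("Checking"))  # 125.0
print(total_deposits())                         # 150.0
```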


Fact Table Anatomy

Dimension keys (FKs)

Facts
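A fact table therefore holds little more than dimension foreign keys and numeric facts. Here is a minimal star schema sketch using Python's built-in sqlite3 module; the table and column names (dim_date, dim_product, fact_sales) are hypothetical. It also shows the typical star query shape: join the fact table to its dimensions, constrain and label by dimension attributes, and aggregate the facts.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,   -- e.g. 20081011
    calendar_date TEXT,
    month_name    TEXT
);
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,   -- surrogate key
    product_name  TEXT,
    category      TEXT
);
CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date(date_key),       -- dimension FK
    product_key   INTEGER REFERENCES dim_product(product_key), -- dimension FK
    quantity      INTEGER,                                     -- fact
    sales_amount  REAL                                         -- fact
);
INSERT INTO dim_date VALUES (20081011, '2008-10-11', 'October');
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO fact_sales VALUES (20081011, 1, 3, 30.0), (20081011, 2, 1, 25.0);
""")

# Star query: dimension attributes label and constrain, facts are aggregated.
rows = conn.execute("""
    SELECT d.month_name, p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.month_name, p.category
""").fetchall()
print(rows)  # [('October', 'Hardware', 55.0)]
```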


Dimensions

Context of the facts

Descriptive attributes

Who, what, where, when, how…

Query constraining and result-set labeling

Denormalized data structures

e.g., Geography, Customer, Time, Product


Dimension Denormalization

Denormalization of Customer
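A minimal Python sketch of denormalization, assuming a hypothetical normalized source in which customers reference cities and cities reference states: the ETL flattens that chain into one wide customer dimension row, so queries need no extra joins at run time. All field names are illustrative.

```python
# Hypothetical normalized (3NF) source tables.
states = {"NC": {"state_name": "North Carolina", "region": "Southeast"}}
cities = {10: {"city_name": "Charlotte", "state_code": "NC"}}
customers = [
    {"customer_code": "YFG-FDS", "first_name": "Jane",
     "last_name": "Ross", "city_id": 10},
]


def build_customer_dimension(customers, cities, states):
    """Flatten customer -> city -> state into one wide row per customer."""
    rows = []
    for surrogate_key, c in enumerate(customers, start=1):
        city = cities[c["city_id"]]
        state = states[city["state_code"]]
        rows.append({
            "customer_key": surrogate_key,  # surrogate key assigned by the ETL
            "customer_code": c["customer_code"],
            "first_name": c["first_name"],
            "last_name": c["last_name"],
            # Geography attributes are copied into the row, not joined at query time.
            "city": city["city_name"],
            "state": state["state_name"],
            "region": state["region"],
        })
    return rows


dim_customer = build_customer_dimension(customers, cities, states)
print(dim_customer[0]["city"], dim_customer[0]["state"])  # Charlotte North Carolina
```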


Before You Start Modeling

DW Bus Matrix

DW High level architecture
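A bus matrix is essentially a grid of business processes (rows) against dimensions (columns). The Python sketch below models one with hypothetical processes and dimensions, prints the grid, and lists the dimensions used by more than one process, which are the candidates to build once as conformed dimensions.

```python
# Hypothetical bus matrix: business process -> dimensions it uses.
bus_matrix = {
    "Orders": {"Date", "Customer", "Product", "Promotion"},
    "Shipments": {"Date", "Customer", "Product", "Warehouse"},
    "Inventory": {"Date", "Product", "Warehouse"},
}


def print_matrix(matrix):
    """Print an X wherever a process uses a dimension."""
    dims = sorted(set().union(*matrix.values()))
    print("Process".ljust(11) + "".join(d.ljust(11) for d in dims))
    for process, used in matrix.items():
        marks = "".join(("X" if d in used else "").ljust(11) for d in dims)
        print(process.ljust(11) + marks)


def shared_dimensions(matrix):
    """Dimensions used by more than one process: candidates for conforming."""
    counts = {}
    for used in matrix.values():
        for d in used:
            counts[d] = counts.get(d, 0) + 1
    return sorted(d for d, n in counts.items() if n > 1)


print_matrix(bus_matrix)
print(shared_dimensions(bus_matrix))  # ['Customer', 'Date', 'Product', 'Warehouse']
```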


Dimensional Design Process: 4 steps

Business Requirements

• Bus Matrix

Data Reality

• Initial Data Profiling

Step 1: Choose the business process

Step 2: Declare the grain

Step 3: Identify Dimensions

Step 4: Identify Facts


High Level Dimensional Model

[Figure: two stars, with a legend marking facts and dimensions.
GL Journal Line / GL Transaction Detail star. Grain = one row per General Ledger journal line. Elements: Applied Date, Recorded Date, P and L Unit, Vendor, Client, GL Account Number.
GL Balance star. Grain = one row per GL account per budget period. Elements: Period Ending Date, P and L Unit, Vendor, Client, GL Account Number, GL Main Account.]


Detailed Dimensional Model


More About Dimensions

Surrogate Keys

Conformed Dimensions

Slowly Changing Dimensions (SCD)

Role-Playing Dimensions

Date Dimension


Surrogate keys

"A meaningless key, ideally an integer, used as the primary key of a dimension"

Better query performance

Makes row versioning easier

No risk of key collisions in a multi-source DW

Avoids the overhead of using transactional (natural) keys

Flexibility when inserting predefined rows (e.g., an "Unknown" member)
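A minimal Python sketch of surrogate key assignment during ETL. The class name SurrogateKeyMap and the -1 "Unknown" key are illustrative assumptions, not a specific tool's API. Keying the lookup on (source system, natural key) is one way to avoid collisions when two sources reuse the same code; matching the same real-world customer across sources would be a separate data-integration step.

```python
class SurrogateKeyMap:
    """Hypothetical helper: map (source system, natural key) to integer surrogate keys."""

    UNKNOWN_KEY = -1  # predefined row for missing or unknown members

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def get_or_assign(self, source_system, natural_key):
        """Return the existing surrogate key, or assign the next integer."""
        if natural_key is None:
            return self.UNKNOWN_KEY
        lookup = (source_system, natural_key)  # source is part of the lookup
        if lookup not in self._keys:
            self._keys[lookup] = self._next_key
            self._next_key += 1
        return self._keys[lookup]


keys = SurrogateKeyMap()
print(keys.get_or_assign("CRM", "YFG-FDS"))      # 1
print(keys.get_or_assign("Billing", "YFG-FDS"))  # 2 (same code, different source)
print(keys.get_or_assign("CRM", "YFG-FDS"))      # 1 (stable across loads)
print(keys.get_or_assign("CRM", None))           # -1
```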


Conformed Dimensions

Shared dimensions across the enterprise

Deliver a consistent interpretation for all business processes involved

Allow drilling across fact tables

ETL work is done only once

[Figure: the GL Journal Line / GL Transaction Detail star and the GL Balance star, both using the P and L Unit, Vendor, Client, and GL Account Number dimensions.]
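Drilling across means querying each fact table separately at a common level of a conformed dimension and then merging the results on that dimension. A minimal Python sketch, assuming two hypothetical fact tables (journal line amounts and period-end balances) that share a conformed client key:

```python
from collections import defaultdict

# Hypothetical fact rows keyed by a conformed client dimension key.
journal_lines = [(1, 100.0), (1, 50.0), (2, 75.0)]  # (client_key, amount)
balances = [(1, 400.0), (2, 300.0)]                 # (client_key, period-end balance)


def aggregate(rows):
    """Sum the fact per conformed key."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def drill_across(left, right):
    """Aggregate each fact table separately, then merge on the conformed key."""
    a, b = aggregate(left), aggregate(right)
    return {k: (a.get(k, 0.0), b.get(k, 0.0)) for k in sorted(a.keys() | b.keys())}


print(drill_across(journal_lines, balances))
# {1: (150.0, 400.0), 2: (75.0, 300.0)}
```

Aggregating each fact table before merging avoids the double counting that a direct join of two fact tables at different grains would produce.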


Slowly Changing Dimensions (SCD)

How should dimensions respond to data changes?

Common types:

SCD Type 1

SCD Type 2

SCD Type 3

SCD Type 6


Slowly Changing Dimensions (SCD) Type 1

Overwrite the previous value

Best when tracking history is not required

1 row per natural key

Simplest approach for handling data changes

If the row does not exist, insert it; otherwise, update it

SQL Server 2008 T-SQL MERGE

SSIS SCD Transformation


Slowly Changing Dimensions (SCD) Type 1

Customer Dimension

Customer Key | Customer Code | Customer First Name | Customer Last Name | ETL Insert Date | ETL Update Date
12345 | YFG-FDS | Jane | Ross | 02/24/2008 |

Last name changes

Customer Key | Customer Code | Customer First Name | Customer Last Name | ETL Insert Date | ETL Update Date
12345 | YFG-FDS | Jane | Smith | 02/24/2008 | 09/09/2008

Existing row is updated!
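The Type 1 logic (if the row does not exist, insert it; otherwise, overwrite it) can be sketched in Python as below. The dictionary layout and the next_key callable are hypothetical stand-ins; in SQL Server 2008 the same logic would typically be written with a T-SQL MERGE statement or the SSIS SCD transformation. The usage replays the Jane Ross example above.

```python
from datetime import date


def scd1_upsert(dimension, incoming, load_date, next_key):
    """SCD Type 1 sketch: insert if the natural key is new, else overwrite in place.

    `dimension` maps natural key (customer code) -> row dict;
    `next_key` is a hypothetical callable returning a new surrogate key.
    """
    code = incoming["customer_code"]
    row = dimension.get(code)
    if row is None:
        dimension[code] = {**incoming, "customer_key": next_key(),
                           "etl_insert_date": load_date, "etl_update_date": None}
    elif any(row[col] != val for col, val in incoming.items()):
        row.update(incoming)  # overwrite: the old value is lost
        row["etl_update_date"] = load_date


dim = {}
keys = iter([12345])
scd1_upsert(dim, {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Ross"},
            date(2008, 2, 24), lambda: next(keys))
scd1_upsert(dim, {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Smith"},
            date(2008, 9, 9), lambda: next(keys))
print(dim["YFG-FDS"])  # still one row, key 12345, last name now Smith
```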


Slowly Changing Dimensions (SCD) Type 2

Insert a new row

Best for tracking changes in attribute values over time

Use effective dates to represent row lifespan

If the row does not exist, insert it; otherwise, expire the current version and insert a new one.


Slowly Changing Dimensions (SCD) Type 2

Customer Dimension

Customer Key | Customer Code | Customer First Name | Customer Last Name | Start Date | End Date | Current Row
12345 | YFG-FDS | Jane | Ross | 02/24/2008 | 12/31/2099 | Y

Last name change

Customer Key | Customer Code | Customer First Name | Customer Last Name | Start Date | End Date | Current Row
12345 | YFG-FDS | Jane | Ross | 02/24/2008 | 09/08/2008 | N
67843 | YFG-FDS | Jane | Smith | 09/09/2008 | 12/31/2099 | Y

Existing row is expired!

A new row is inserted!
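A minimal Python sketch of the Type 2 logic, replaying the Jane Ross example above. The row layout, the 12/31/2099 "end of time" date (taken from the example table), and the next_key callable are illustrative assumptions; in SQL Server 2008 this would typically be done with the SSIS SCD transformation or T-SQL MERGE.

```python
from datetime import date, timedelta

END_OF_TIME = date(2099, 12, 31)  # open-ended End Date, as in the example table


def scd2_apply(rows, incoming, effective, next_key):
    """SCD Type 2 sketch: insert if new; if attributes changed, expire and insert.

    `rows` is a list of row dicts; `next_key` is a hypothetical callable
    returning a new surrogate key for each new version.
    """
    code = incoming["customer_code"]
    current = next((r for r in rows
                    if r["customer_code"] == code and r["current_row"] == "Y"), None)
    if current is not None:
        if all(current[c] == v for c, v in incoming.items()):
            return  # nothing changed: keep the current version
        current["end_date"] = effective - timedelta(days=1)  # expire old version
        current["current_row"] = "N"
    rows.append({**incoming, "customer_key": next_key(),
                 "start_date": effective, "end_date": END_OF_TIME, "current_row": "Y"})


rows = []
keys = iter([12345, 67843])
scd2_apply(rows, {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Ross"},
           date(2008, 2, 24), lambda: next(keys))
scd2_apply(rows, {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Smith"},
           date(2008, 9, 9), lambda: next(keys))
for r in rows:
    print(r["customer_key"], r["last_name"], r["start_date"], r["end_date"], r["current_row"])
# 12345 Ross 2008-02-24 2008-09-08 N
# 67843 Smith 2008-09-09 2099-12-31 Y
```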


Role-Playing Dimensions

Same physical dimension plays distinct logical roles in a fact table

Implemented through views or query aliases

Date Dimension playing 4 roles
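A minimal sketch of role-playing through views, using Python's built-in sqlite3 module. The table, view, and column names are hypothetical, and only two roles (order date and ship date) are shown rather than four; query aliases on the single dim_date table would work the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
INSERT INTO dim_date VALUES (20080215, '2008-02-15'), (20080219, '2008-02-19');

-- One physical date dimension exposed as two logical roles via views.
CREATE VIEW dim_order_date AS
    SELECT date_key AS order_date_key, calendar_date AS order_date FROM dim_date;
CREATE VIEW dim_ship_date AS
    SELECT date_key AS ship_date_key, calendar_date AS ship_date FROM dim_date;

CREATE TABLE fact_orders (order_date_key INTEGER, ship_date_key INTEGER, amount REAL);
INSERT INTO fact_orders VALUES (20080215, 20080219, 99.0);
""")

# Each role is joined on its own foreign key column.
row = conn.execute("""
    SELECT o.order_date, s.ship_date, f.amount
    FROM fact_orders AS f
    JOIN dim_order_date AS o ON o.order_date_key = f.order_date_key
    JOIN dim_ship_date  AS s ON s.ship_date_key  = f.ship_date_key
""").fetchone()
print(row)  # ('2008-02-15', '2008-02-19', 99.0)
```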


Date Dimension

Grain should not be finer than daily

Hour: 8,760 rows per year

Minute: 525,600 rows per year

Second: 31,536,000 rows per year (far too many)

Surrogate key rule exception: an intelligent key is recommended (an integer in YYYYMMDD form, e.g., 20081011)

Time of day, if required, belongs in the fact table (in most cases)
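A minimal Python sketch that generates a daily-grain date dimension with the intelligent YYYYMMDD integer key; the attribute names are illustrative assumptions, and a real date dimension usually carries many more attributes (fiscal periods, holidays, and so on). The last line reproduces the per-year row counts for finer grains (non-leap year).

```python
from datetime import date, timedelta


def date_key(d):
    """Intelligent integer key in YYYYMMDD form, e.g. 2008-10-11 -> 20081011."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_dimension(year):
    """One row per day (daily grain) with a few typical descriptive attributes."""
    d, rows = date(year, 1, 1), []
    while d.year == year:
        rows.append({
            "date_key": date_key(d),
            "calendar_date": d.isoformat(),
            "day_of_week": d.strftime("%A"),
            "month_name": d.strftime("%B"),
            "quarter": (d.month - 1) // 3 + 1,
        })
        d += timedelta(days=1)
    return rows


dim_date = build_date_dimension(2008)
print(len(dim_date))                 # 366 (2008 is a leap year)
print(date_key(date(2008, 10, 11)))  # 20081011
print(365 * 24, 365 * 24 * 60, 365 * 24 * 60 * 60)  # 8760 525600 31536000
```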


More about Facts

Three types of fact tables:

Transaction

Periodic snapshot

Accumulating snapshot


Transaction Fact Tables

Records events at a point in time

Represents transaction activity

The most common type of fact table

Only inserts (in most cases)

Stores facts at the most atomic level possible


Periodic Snapshot Fact Tables

"Snapshots" taken on a regular basis, regardless of activity

Stores 1 row per time period

Complements transaction fact tables

Only inserts (in most cases)
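A minimal Python sketch of deriving a periodic snapshot (a monthly account balance) from transaction facts. The data is hypothetical; note that a snapshot row is still produced for 2008-02 even though there was no activity that month, which is what distinguishes the snapshot from the transaction table.

```python
from collections import defaultdict

# Hypothetical transaction facts: (account, month, amount). No activity in 2008-02.
transactions = [
    ("Checking", "2008-01", 100.0),
    ("Checking", "2008-03", 40.0),
]
months = ["2008-01", "2008-02", "2008-03"]  # snapshot periods, in order


def monthly_balance_snapshot(transactions, months):
    """One row per account per month, emitted even when there was no activity."""
    activity = defaultdict(float)
    for account, month, amount in transactions:
        activity[(account, month)] += amount
    accounts = sorted({t[0] for t in transactions})
    snapshot = []
    for account in accounts:
        balance = 0.0
        for month in months:
            balance += activity.get((account, month), 0.0)  # running balance
            snapshot.append((account, month, balance))
    return snapshot


for row in monthly_balance_snapshot(transactions, months):
    print(row)
# ('Checking', '2008-01', 100.0)
# ('Checking', '2008-02', 100.0)
# ('Checking', '2008-03', 140.0)
```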


Accumulating Snapshot Fact Tables

Captures activity for processes with a defined beginning and end

1 row per event lifetime

Fact row is updated at each milestone

Least frequently used fact table type


Accumulating Snapshot Fact Tables

T1 (Insert)
Appl. Key | Start Date | Complete Date | Transm. Date | Process Date
1 | 20080215 | -1 | -1 | -1

T2 (Update)
Appl. Key | Start Date | Complete Date | Transm. Date | Process Date
1 | 20080215 | 20080217 | 20080217 | -1

T3 (Update)
Appl. Key | Start Date | Complete Date | Transm. Date | Process Date
1 | 20080215 | 20080217 | 20080217 | 20080219
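A minimal Python sketch replaying the T1/T2/T3 example above: one row is inserted when the process starts and then updated in place as milestones are reached. It assumes "Appl. Key" is an application key, "Transm. Date" a transmission date, and -1 a placeholder date key meaning "milestone not reached yet"; the function names are hypothetical.

```python
UNKNOWN_DATE_KEY = -1  # assumed meaning: milestone not reached yet
MILESTONES = ("complete_date", "transmission_date", "process_date")


def insert_application(table, app_key, start_date_key):
    """T1: insert one row per application, with all later milestones unknown."""
    table[app_key] = {"start_date": start_date_key,
                      **{m: UNKNOWN_DATE_KEY for m in MILESTONES}}


def record_milestone(table, app_key, milestone, date_key):
    """T2, T3, ...: update the same fact row as each milestone is reached."""
    if milestone not in MILESTONES:
        raise ValueError(f"unknown milestone: {milestone}")
    table[app_key][milestone] = date_key


snapshot = {}
insert_application(snapshot, 1, 20080215)                     # T1
record_milestone(snapshot, 1, "complete_date", 20080217)      # T2
record_milestone(snapshot, 1, "transmission_date", 20080217)  # T2
record_milestone(snapshot, 1, "process_date", 20080219)       # T3
print(snapshot[1])
# {'start_date': 20080215, 'complete_date': 20080217, 'transmission_date': 20080217, 'process_date': 20080219}
```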


Dimensional Modeling Myths

It only fits as a departmental solution

It has limited extensibility potential

It only provides aggregated data

It only supports many-to-one relationships

It is a waste of disk space


Risks

High profile: success (and failure!) is visible to management

Business driven: hard for technologists

Technology focus: "Let's build it and users will come"

Dashboards are not a good starting point

Data quality and integration

Complexity: tackling too much at once


SQL Server and Dimensional Modeling

SSAS

SSIS

SCD Transformation: ETL

Relational Engine

T-SQL MERGE: ETL

Star join optimization: query performance


Want to learn more?

Kimball Method:

The Data Warehouse Lifecycle Toolkit, 2nd edition, 2008

Advanced dimensional modeling techniques:

The Data Warehouse Toolkit, 2nd edition, 2002

SQL Server 2008 BI/DW:

www.microsoft.com/bi/
