Techniques, Tips, Mistakes & Lessons Learnt · web/@… · 2015-06-05

Data Warehouse Design @ UOW

Techniques, Tips, Mistakes & Lessons Learnt

Agenda

Background of the Performance Indicators Team

Techniques & Technical Design Process

Mistakes & Lessons Learnt

Background

The Performance Indicators Project (PIP) was formed in October 2006

Vision:

A transformed University of Wollongong that gives all decision-makers access to accurate, relevant and shared information, in a quick and secure manner, that allows them to plan, monitor, analyse & manage the performance of the university

Techniques & Technical Design Process

Technical Design Process

Techniques adopted at UOW

Technical Design Process

Review Business Requirements

Technical Design Process

Review Business Requirements

Gain Access to source systems

Research source system

Create Source ER Diagram

Technical Design Process

Review Business Requirements

Gain Access to source systems

Research source system

Create Source ER Diagram

Create Logical Dimensional Model

Logical Dimensional Model

Research Publications Data Mart

Technical Design Process

Review Business Requirements

Gain Access to source systems

Research source system

Document Source ER Diagram

Document Logical Dimensional Model

Document Physical Dimensional Model

Technical Design Process

Review Business Requirements

Gain Access to source systems

Research source system

Document Source ER Diagram

Document Logical Dimensional Model

Document Physical Dimensional Model

Document Cube Model

Document Cube Model

Technical Design Process

Review Business Requirements

Gain Access to source systems

Research source system

Document Source ER Diagram

Document Logical Dimensional Model

Document Physical Dimensional Model

Document Cube Model

Create Metrics Dictionary

Metrics Dictionary

Technical Design Process

Review Business Requirements

Gain Access to source systems

Research source system

Document Source ER Diagram

Document Logical Dimensional Model

Document Physical Dimensional Model

Document Transformer Model

Create Metrics Dictionary

Create Business Glossary

Business Glossary

Start Development!!!

Now that all the paperwork is done, we can finally start development

Techniques Adopted at UOW

Dimensions
• Conformed Dimensions
• Outrigger tables

Facts
• Types of Fact tables
• Grain
• Bridge Tables

Dimensions

Represent characteristics of an object
• Surrogate Key
• Business Keys
• Descriptors
• Hierarchies & Rollups
• Other Attributes

Student Key (PK)

Student Number (BK)

Student Surname

Student First Name
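As a minimal sketch of the Student dimension listed above, the following uses Python's built-in sqlite3 module with an in-memory database. The table name, column names and sample students are hypothetical; only the split between a surrogate key (PK) and a business key (BK) comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student_dim (
        student_key        INTEGER PRIMARY KEY,   -- surrogate key (PK), generated in the warehouse
        student_number     TEXT NOT NULL UNIQUE,  -- business key (BK) from the source system
        student_surname    TEXT,                  -- descriptor
        student_first_name TEXT                   -- descriptor
    )
    """
)

# Hypothetical sample rows; the surrogate key is assigned by the warehouse, not the source.
conn.executemany(
    "INSERT INTO student_dim (student_number, student_surname, student_first_name) "
    "VALUES (?, ?, ?)",
    [("S1001", "Nguyen", "Anh"), ("S1002", "Smith", "Jo")],
)


def lookup_student_key(student_number):
    """Resolve a source-system business key to the warehouse surrogate key."""
    row = conn.execute(
        "SELECT student_key FROM student_dim WHERE student_number = ?",
        (student_number,),
    ).fetchone()
    return row[0] if row else None


print(lookup_student_key("S1002"))
```

Fact rows would store `student_key` rather than `student_number`, so a change to the source system's numbering only touches the dimension.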

Conformed Dimensions

One copy of a dimension shared across subject areas

Organisational Structure Dimension

Finance Facts

Research Facts

Staff Facts

Student Facts
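The sharing shown in the diagram can be sketched as follows, assuming a single organisational structure table referenced by two of the fact tables. All table names, columns and figures are hypothetical; the point is only that both subject areas roll up through the same dimension rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE org_structure_dim (org_key INTEGER PRIMARY KEY, faculty TEXT);
    CREATE TABLE finance_fact (org_key INTEGER REFERENCES org_structure_dim, amount REAL);
    CREATE TABLE student_fact (org_key INTEGER REFERENCES org_structure_dim, eftsl REAL);

    INSERT INTO org_structure_dim VALUES (1, 'Education'), (2, 'Science');
    INSERT INTO finance_fact VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO student_fact VALUES (1, 0.5), (2, 1.0), (2, 0.25);
    """
)

# Aggregate each fact table separately, then line the results up on the shared dimension.
rows = conn.execute(
    """
    SELECT d.faculty,
           (SELECT SUM(f.amount) FROM finance_fact f WHERE f.org_key = d.org_key),
           (SELECT SUM(s.eftsl)  FROM student_fact s WHERE s.org_key = d.org_key)
    FROM org_structure_dim d
    ORDER BY d.org_key
    """
).fetchall()

for faculty, total_amount, total_eftsl in rows:
    print(faculty, total_amount, total_eftsl)
```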

Conformed Dimensions

Challenge
• Multiple source systems using different keys to represent the same thing

Solution
• Wear the pain – Mapping files
• Business buy in – Sell the advantages
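A mapping file of the kind mentioned above might look like the sketch below. The source system names, source keys and the choice of conformed key are all hypothetical; the slides do not show the actual UOW mapping files.

```python
# Hypothetical mapping "file": (source system, source key) -> conformed business key.
ORG_UNIT_MAPPING = {
    ("finance", "CC-EDU"): "WFACEDU100",
    ("hr", "17"): "WFACEDU100",
    ("research", "EDUC"): "WFACEDU100",
}


def conform_org_key(source_system, source_key):
    """Translate a source-specific key to the conformed key, or None if unmapped."""
    return ORG_UNIT_MAPPING.get((source_system, source_key))


def unmapped(records):
    """List the (source system, source key) pairs that still need a mapping entry."""
    return sorted({r for r in records if conform_org_key(*r) is None})


print(conform_org_key("hr", "17"))
print(unmapped([("hr", "17"), ("hr", "99")]))
```

Reporting the unmapped pairs on every load is one way to keep "wearing the pain" manageable: the gaps surface before they reach a report.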

How to Avoid Snowflaking Dimensions

At times there will be a logical relationship between dimensions that may cause a star schema to snowflake

At UOW we had the example of the relation between Organisational Structure Dimension and Cost Centre Dimension

The Solution - Outrigger Tables

Cost Centre Dimension

Organisation Structure Dimension

Financial Facts
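The diagram itself did not survive in this transcript, so the sketch below is one possible reading rather than the UOW design: it assumes the Cost Centre Dimension carries the key of its organisational unit, which makes the Organisation Structure Dimension the outrigger. All table names, columns and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE org_structure_dim (org_key INTEGER PRIMARY KEY, faculty TEXT);

    -- Assumption: the cost centre dimension references the organisation structure
    -- dimension directly, so the fact table only needs the cost centre key.
    CREATE TABLE cost_centre_dim (
        cost_centre_key  INTEGER PRIMARY KEY,
        cost_centre_code TEXT,
        org_key          INTEGER REFERENCES org_structure_dim
    );
    CREATE TABLE financial_fact (
        cost_centre_key INTEGER REFERENCES cost_centre_dim,
        amount          REAL
    );

    INSERT INTO org_structure_dim VALUES (1, 'Education'), (2, 'Science');
    INSERT INTO cost_centre_dim VALUES (10, 'CC10', 1), (11, 'CC11', 1), (12, 'CC12', 2);
    INSERT INTO financial_fact VALUES (10, 100.0), (11, 25.0), (12, 40.0);
    """
)

# Roll financial facts up to faculty by walking fact -> cost centre -> outrigger.
totals = conn.execute(
    """
    SELECT o.faculty, SUM(f.amount)
    FROM financial_fact f
    JOIN cost_centre_dim c ON c.cost_centre_key = f.cost_centre_key
    JOIN org_structure_dim o ON o.org_key = c.org_key
    GROUP BY o.faculty
    ORDER BY o.faculty
    """
).fetchall()
print(totals)
```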

Fact Tables

Fact tables (in star schemas) typically hold a set of surrogate keys joining back to dimensions, together with numerical data representing some type of measurement

Date (FK)

Student key (FK)

Subject key (FK)

EFTSL
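The fact table outlined above could be sketched as follows. The table names, subject codes and EFTSL values are hypothetical, and the date is stored as plain text only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_dim (student_key INTEGER PRIMARY KEY, student_surname TEXT);
    CREATE TABLE subject_dim (subject_key INTEGER PRIMARY KEY, subject_code TEXT);

    -- The fact table holds only foreign keys plus the numeric measure.
    CREATE TABLE enrolment_fact (
        date_key    TEXT,
        student_key INTEGER REFERENCES student_dim,
        subject_key INTEGER REFERENCES subject_dim,
        eftsl       REAL
    );

    INSERT INTO student_dim VALUES (1, 'Nguyen'), (2, 'Smith');
    INSERT INTO subject_dim VALUES (10, 'SUBJ101'), (11, 'SUBJ102');
    INSERT INTO enrolment_fact VALUES
        ('2008-03-01', 1, 10, 0.125),
        ('2008-03-01', 1, 11, 0.125),
        ('2008-03-01', 2, 10, 0.25);
    """
)

# Sum the measure after joining back to a dimension for a readable label.
eftsl_by_student = conn.execute(
    """
    SELECT s.student_surname, SUM(f.eftsl)
    FROM enrolment_fact f
    JOIN student_dim s ON s.student_key = f.student_key
    GROUP BY s.student_surname
    ORDER BY s.student_surname
    """
).fetchall()
print(eftsl_by_student)
```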

Types of Fact tables

Transaction
• Number of publications

Periodic Snapshot
• Monthly FTE

Accumulating Snapshot
• Not currently used at UOW

Grain of Fact tables

• Need to clearly identify the level of detail within the fact table
• One grain per fact table at the lowest level of detail for flexibility

The problem with grain?

Scenario
• The lowest atomic level of detail needs to be broken up even further

Example
• A single publication can be broken down even further to author percentages for a publication

Methods to Resolve Grain

• Create a bridging table
• Ratio the Fact

Bridging Tables

Publications Fact
• Date
• Publication Key
• Organisational Key
• Number of Publications

Bridge Table
• Publication Key
• Author Key
• Author Percentage

Author Dimension
• Author Key
• Author Name
• Author DOB
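A minimal sketch of the bridging approach, assuming the three tables listed above: the fact stays at publication grain and the author split is applied at query time through the bridge. The keys and percentages reuse the sample values from this deck; the table names, column names and author names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author_dim (author_key INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE publications_fact (
        date_key               TEXT,
        publication_key        INTEGER,
        organisational_key     TEXT,
        number_of_publications INTEGER
    );
    CREATE TABLE publication_author_bridge (
        publication_key   INTEGER,
        author_key        INTEGER REFERENCES author_dim,
        author_percentage INTEGER
    );

    INSERT INTO author_dim VALUES (103456, 'Author A'), (9421, 'Author B'), (89632, 'Author C');
    INSERT INTO publications_fact VALUES ('2008-01-01', 128, 'WFACEDU100', 1);
    INSERT INTO publication_author_bridge VALUES (128, 103456, 30), (128, 9421, 50), (128, 89632, 20);
    """
)

# Weight the publication count by each author's share when reporting by author.
credit_by_author = conn.execute(
    """
    SELECT a.author_name,
           SUM(f.number_of_publications * b.author_percentage / 100.0)
    FROM publications_fact f
    JOIN publication_author_bridge b ON b.publication_key = f.publication_key
    JOIN author_dim a ON a.author_key = b.author_key
    GROUP BY a.author_name
    ORDER BY a.author_name
    """
).fetchall()

# Queries that ignore the bridge still count each publication exactly once.
publication_total = conn.execute(
    "SELECT SUM(number_of_publications) FROM publications_fact"
).fetchone()[0]
print(credit_by_author, publication_total)
```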

Ratio Facts

Publications Fact
• Date
• Publication Key
• Organisational Key
• Author Key
• Author Percentage
• Number of Publications

Author Dimension
• Author Key
• Author Name
• Author DOB

How Does it Change the Grain?

Date      Publication key  Organisation key  Number of Publications
01/01/08  128              WFACEDU100        1

Date      Publication key  Organisation key  Author key  Author Percent  Number of Publications
01/01/08  128              WFACEDU100        103456      30              .3
01/01/08  128              WFACEDU100        9421        50              .5
01/01/08  128              WFACEDU100        89632       20              .2
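The change of grain shown above can be sketched as a small transformation applied at load time. The function name and dictionary layout are hypothetical; the input values are the ones from the sample rows.

```python
def ratio_fact(fact_row, author_percentages):
    """Split one publication-grain row into one row per author.

    Each output row carries the author's share of the original measure, so the
    shares still sum to the original number of publications.
    """
    if sum(author_percentages.values()) != 100:
        raise ValueError("author percentages must sum to 100")
    return [
        {
            **fact_row,
            "author_key": author_key,
            "author_percent": percent,
            "number_of_publications": fact_row["number_of_publications"] * percent / 100,
        }
        for author_key, percent in author_percentages.items()
    ]


publication = {
    "date": "01/01/08",
    "publication_key": 128,
    "organisation_key": "WFACEDU100",
    "number_of_publications": 1,
}
author_rows = ratio_fact(publication, {103456: 30, 9421: 50, 89632: 20})
for row in author_rows:
    print(row)
```

Compared with a bridge table, the split is done once during loading, at the cost of a finer-grained and larger fact table.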

UOW Data Warehousing …The Mistakes

Date Dimension

An important dimension to get correct – it will be used in every data mart in the data warehouse

Almost everything we want to measure or record will have a date associated with it

What is The Date Dimension?

The Date Dimension is a database table in the data warehouse that allows you to “Roll Up” Facts via a Date Hierarchy

Simplified Date Dimension

Date Dimension (1..1) to Publications Fact (1..n)

Publications Fact

Date        Publication Skey  Staff Skey  Org Unit Skey  Publication Count  Dest Points Count
01/01/2008  1                 3           9              1                  0
01/01/2008  2                 3           9              1                  0
01/01/2008  3                 4           9              1                  1

Date Dimension Usage

The Problem

Publications Fact

Date        Publication Skey  Staff Skey  Org Unit Skey  Publication Count  Dest Points Count
01/01/2008  1                 3           9              1                  0
01/01/2008  2                 3           9              1                  0
?           3                 4           9              1                  1

• Not all fact data always has a valid date
• Sometimes we want to report on facts that have no date
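The effect of a missing date can be sketched as below, assuming the fact table joins to the date dimension on the calendar date itself. The table names and rows are hypothetical; the undated publication silently drops out of any report that rolls up through the date hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE date_dim (calendar_date TEXT PRIMARY KEY, year TEXT);
    CREATE TABLE publications_fact (
        calendar_date     TEXT,     -- joins directly on the date value
        publication_skey  INTEGER,
        publication_count INTEGER
    );

    INSERT INTO date_dim VALUES ('2008-01-01', '2008');
    -- Hypothetical facts: the third publication has no known date.
    INSERT INTO publications_fact VALUES
        ('2008-01-01', 1, 1),
        ('2008-01-01', 2, 1),
        (NULL, 3, 1);
    """
)

total_in_fact = conn.execute(
    "SELECT SUM(publication_count) FROM publications_fact"
).fetchone()[0]

# The join cannot match a NULL date, so the undated row is lost from the roll-up.
total_by_year = conn.execute(
    """
    SELECT d.year, SUM(f.publication_count)
    FROM publications_fact f
    JOIN date_dim d ON d.calendar_date = f.calendar_date
    GROUP BY d.year
    """
).fetchall()
print(total_in_fact, total_by_year)
```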

UOW’s Mistake

We designed our date dimension and fact tables with a DATE data type as the joining key

Date Dimension (1..1) to Publications Fact (1..n)

UOW’s Mistake

We have no way of representing unknown dates within our date hierarchy

Since our date dimension must join via a valid date, and we must report on records with no valid date, we are forced to pick a date with business logic to represent these situations

For example, 01/01/1950 is the date we have used for Unknown Dates

The Solution

Design the dimension with the flexibility to allow non-date records for representing the exceptions

Every dimension should be able to represent:
• Unknown
• Not Yet Determined
• Not Applicable

The Solution

Date Dimension (1..1) to Publications Fact (1..n)

Date Dimension

Date Skey  Date      Year                Quarter             Month
INTEGER    INTEGER   VARCHAR2            VARCHAR2            VARCHAR2
1          -1        Unknown             Unknown             Unknown
2          -2        Not Yet Determined  Not Yet Determined  Not Yet Determined
3          -3        Not Applicable      Not Applicable      Not Applicable
4          20080101  2008                Quarter 1           January
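A minimal sketch of this design, using the dimension rows from the table above: the fact table joins on an integer surrogate key, and the exception rows live in the dimension like any other date. The table names, column names and fact rows are hypothetical, and SQLite's TEXT type stands in for VARCHAR2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE date_dim (
        date_skey INTEGER PRIMARY KEY,  -- surrogate key used for the join
        date_id   INTEGER,              -- YYYYMMDD, or a negative code for exceptions
        year      TEXT,
        quarter   TEXT,
        month     TEXT
    );
    CREATE TABLE publications_fact (
        date_skey         INTEGER NOT NULL REFERENCES date_dim,
        publication_skey  INTEGER,
        publication_count INTEGER
    );

    INSERT INTO date_dim VALUES
        (1, -1, 'Unknown', 'Unknown', 'Unknown'),
        (2, -2, 'Not Yet Determined', 'Not Yet Determined', 'Not Yet Determined'),
        (3, -3, 'Not Applicable', 'Not Applicable', 'Not Applicable'),
        (4, 20080101, '2008', 'Quarter 1', 'January');

    -- Hypothetical facts: publication 3 has no date, so it points at the Unknown row.
    INSERT INTO publications_fact VALUES (4, 1, 1), (4, 2, 1), (1, 3, 1);
    """
)

# Every fact row now survives the join, and undated facts roll up under 'Unknown'.
count_by_year = conn.execute(
    """
    SELECT d.year, SUM(f.publication_count)
    FROM publications_fact f
    JOIN date_dim d ON d.date_skey = f.date_skey
    GROUP BY d.year
    ORDER BY MIN(d.date_skey)
    """
).fetchall()
print(count_by_year)
```

No placeholder calendar date is needed, so a real event on that date can never be confused with an unknown one.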

Lesson Learnt

Design the dimension with the flexibility to represent the exceptions

Every dimension should be able to represent:
• Unknown
• Not Yet Determined
• Not Applicable

Issues With Hierarchies

Many issues to resolve in creating conformed dimensions with clean hierarchies
• Unbalanced Hierarchies
• Hierarchies with infinite possible levels
• Fact data at different levels of hierarchy
• Fact data which could be at any level of hierarchy

The Problem

Different source systems don’t always implement the same hierarchy exactly the same way

The Problem

Fact Data at different levels of a hierarchy

Publications Fact Data

RFCD Dimension

The Problem

Fact data that could be at any level of a hierarchy

RFCD Dimension

UOW’s Mistake

Designed RFCD Dimension for Publications data mart without considering how it could be used in the future, and how different systems could implement this hierarchy

Forced to totally redesign and reimplement the dimension in order to be conformed across new data marts

The Solution

RFCD Skey  Division Code  Division Desc  Discipline Code  Discipline Desc  Subject Code  Subject Desc
1          270000         Bio Sciences   270200           Genetics         270201        Gene Expression

• “Dense” Balance the hierarchy

RFCD Skey  Division Code  Division Desc  Discipline Code  Discipline Desc       Subject Code  Subject Desc
1          L1_270000      Bio Sciences   L2_270200        Genetics              270201        Gene Expression
3          L1_270000      Bio Sciences   L2_270200        Genetics              270200        Unknown Genetics
2          L1_270000      Bio Sciences   L2_270000        Unknown Bio Sciences  270000        Unknown Bio Sciences
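The padding of higher-level codes down to the lowest level can be sketched as below. It assumes, from the sample rows only, that a six-digit RFCD code is built as two digits each for division, discipline and subject, and that missing lower levels are filled with an "Unknown" description. The function name and dictionary keys are hypothetical.

```python
# Descriptions taken from the sample rows above.
DESCRIPTIONS = {
    "270000": "Bio Sciences",
    "270200": "Genetics",
    "270201": "Gene Expression",
}


def balance_rfcd(code):
    """Build a full division/discipline/subject row for a code at any level."""
    division_code = code[:2] + "0000"    # assumption: first two digits identify the division
    discipline_code = code[:4] + "00"    # assumption: first four digits identify the discipline
    division_desc = DESCRIPTIONS[division_code]

    if code == division_code:
        # Fact recorded at division level: pad both lower levels.
        discipline_desc = subject_desc = "Unknown " + division_desc
    elif code == discipline_code:
        # Fact recorded at discipline level: pad the subject level only.
        discipline_desc = DESCRIPTIONS[discipline_code]
        subject_desc = "Unknown " + discipline_desc
    else:
        discipline_desc = DESCRIPTIONS[discipline_code]
        subject_desc = DESCRIPTIONS[code]

    return {
        "division_code": "L1_" + division_code,
        "division_desc": division_desc,
        "discipline_code": "L2_" + discipline_code,
        "discipline_desc": discipline_desc,
        "subject_code": code,
        "subject_desc": subject_desc,
    }


for rfcd_code in ("270201", "270200", "270000"):
    print(balance_rfcd(rfcd_code))
```

Because every row has all three levels populated, facts recorded at any level can join to the same dimension and still roll up cleanly.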

The Solution

Facts joining at different levels of a hierarchy

Publications Fact Data

RFCD Dimension

The Solution

Facts joining at any level of a hierarchy

RFCD Dimension

Lesson Learnt

Don’t design a dimension without considering how it might be used by other data marts in the future

Know your data - anticipate issues with systems implementing common hierarchies in different ways

Questions?

Any Questions?

Contact information:

Brad Dixon – [email protected] Thomas – [email protected]
