Agenda
Introduction
DWH Definitions
DWH Architecture
DWH Design Process
Types of Fact Tables
Types of Dimensions
Types of Data Marts
Introduction
Information is a very powerful asset that can provide significant benefits to any organization.
Organizations have vast amounts of data, but it is difficult to access and use.
Data is in many formats, exists on different platforms, and resides in different file and database structures.
To make well-informed decisions, an organization has to write hundreds of programs to extract, prepare, and integrate data for analysis and reporting.
Instead, you can implement a data warehouse and a BI system to get more insight from the data you own.
BI tools help you extract, transform, load, and integrate heterogeneous data sources into your DWH easily and efficiently.
Data Warehouse Definitions
Bill Inmon Definition
Data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.
The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from it. In the data warehouse, information is stored in third normal form (3NF).
Ralph Kimball Definition
A data warehouse is a copy of transaction data specifically structured for query and analysis.
The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in a dimensional model.
Bill Inmon's Definition Explained
Subject-Oriented: A data warehouse is used to analyze a particular subject area.
Integrated: A data warehouse integrates data from multiple data sources.
Time-Variant: Historical data is kept in a data warehouse.
Non-volatile: Once data is in the data warehouse, it will not change. Historical data in a data warehouse should never be altered.
Data Warehouse vs. Database
Database | Data Warehouse
Application-oriented | Subject-oriented
Entity-Relationship diagram | Star/snowflake schema
Thousands of rows | Millions/billions of rows
MB to GB | Tens of GB to TBs
Transaction throughput | Response time
Detailed | Summarized
Operational processing | Informational processing
Data Warehouse Stages
Stage 1: Offline Operational Databases
Stage 2: Offline Data Warehouse
Stage 3: Real Time Data Warehouse
Stage 4: Integrated Data Warehouse
Data Warehouse Design Process
Identify the subject area (SA) of interest
Identify the dimensions of this SA
Identify the key performance indicators (KPIs) and measures of this SA
Data Warehouse Modeling Concepts
Dimension: A category of information.
○ Time dimension, Product dimension.
Attribute: A unique level within a dimension.
○ Month is an attribute in the Time dimension.
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension.
○ Year → Quarter → Month → Day.
Fact Table: A table that contains the measures of interest, along with the foreign keys (FKs) of the dimension tables connected to the fact.
○ Sales amount would be such a measure.
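The Year → Quarter → Month → Day hierarchy can be illustrated by rolling a day-level measure up to coarser levels. The following is a minimal sketch; the sample sales amounts and the helper names `quarter_of` and `roll_up` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day-level sales amounts (illustrative data only).
daily_sales = {
    date(2023, 1, 15): 100.0,
    date(2023, 2, 10): 50.0,
    date(2023, 4, 5): 70.0,
    date(2024, 1, 1): 30.0,
}


def quarter_of(d):
    """Return the quarter (1-4) that a date falls in."""
    return (d.month - 1) // 3 + 1


def roll_up(sales, level):
    """Sum day-level amounts up to the requested hierarchy level."""
    key_for_level = {
        "year": lambda d: (d.year,),
        "quarter": lambda d: (d.year, quarter_of(d)),
        "month": lambda d: (d.year, d.month),
        "day": lambda d: (d.year, d.month, d.day),
    }
    totals = defaultdict(float)
    for day, amount in sales.items():
        totals[key_for_level[level](day)] += amount
    return dict(totals)


by_quarter = roll_up(daily_sales, "quarter")
by_year = roll_up(daily_sales, "year")
```

Each level of the hierarchy is just a coarser grouping key over the same day-level facts, which is why a query tool can move up and down the hierarchy without storing separate data per level.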
Data Warehouse Data Models - RK Star Schema
In a star schema design, the fact table sits in the middle and is connected to the dimension lookup tables like a star.
Each dimension is represented as a single table.
[Figure: a central Fact Table connected to the Time, Customer, Product, and Store dimensions]
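A star schema with these four dimensions might be sketched as follows, using Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions made for the example and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (hypothetical columns).
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store    (store_key INTEGER PRIMARY KEY, city TEXT);

-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    sales_amount REAL
);

INSERT INTO dim_time     VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO dim_customer VALUES (1, 'Alice');
INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store    VALUES (1, 'Cairo');
INSERT INTO fact_sales   VALUES (1, 1, 1, 1, 10.0), (2, 1, 1, 1, 15.0), (1, 1, 2, 1, 7.5);
""")

# Each dimension is exactly one join away from the fact table.
sales_by_year_and_product = conn.execute("""
    SELECT t.year, p.name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_time    AS t ON t.time_key = f.time_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY t.year, p.name
    ORDER BY t.year, p.name
""").fetchall()
```

Because every dimension is a single table, an analytical query needs at most one join per dimension it groups or filters by.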
Data Warehouse Data Models - RK Snowflake Schema
The snowflake schema is the normalized version of the star schema.
Example: a Time dimension that consists of 2 different hierarchies:
1. Year → Month → Day
2. Week → Day
[Figure: a central Fact Table connected to the Time, Customer, Product, and Store dimensions]
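The Time dimension with its two hierarchies could be normalized along these lines. This is a sketch using Python's built-in `sqlite3` module; the table and column names and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hierarchy 1: Year -> Month -> Day. Hierarchy 2: Week -> Day.
CREATE TABLE dim_year  (year_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_month (month_key INTEGER PRIMARY KEY, month INTEGER,
                        year_key INTEGER REFERENCES dim_year(year_key));
CREATE TABLE dim_week  (week_key INTEGER PRIMARY KEY, week_of_year INTEGER);
CREATE TABLE dim_day   (day_key INTEGER PRIMARY KEY, day_date TEXT,
                        month_key INTEGER REFERENCES dim_month(month_key),
                        week_key  INTEGER REFERENCES dim_week(week_key));
CREATE TABLE fact_sales (day_key INTEGER REFERENCES dim_day(day_key),
                         sales_amount REAL);

INSERT INTO dim_year   VALUES (1, 2023);
INSERT INTO dim_month  VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO dim_week   VALUES (1, 5);
INSERT INTO dim_day    VALUES (1, '2023-01-31', 1, 1), (2, '2023-02-01', 2, 1);
INSERT INTO fact_sales VALUES (1, 40.0), (2, 60.0);
""")

# Rolling up by month follows the Year -> Month -> Day branch.
sales_by_month = conn.execute("""
    SELECT m.month, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_day   AS d ON d.day_key = f.day_key
    JOIN dim_month AS m ON m.month_key = d.month_key
    GROUP BY m.month ORDER BY m.month
""").fetchall()

# Rolling up by week follows the Week -> Day branch.
sales_by_week = conn.execute("""
    SELECT w.week_of_year, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_day  AS d ON d.day_key = f.day_key
    JOIN dim_week AS w ON w.week_key = d.week_key
    GROUP BY w.week_of_year ORDER BY w.week_of_year
""").fetchall()
```

The trade-off against the star schema is visible in the queries: normalization removes repeated month and week values from the day table, but each roll-up now needs an extra join.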
Types of Measures
Additive: Facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Facts that cannot be summed up for any of the dimensions present in the fact table.
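The three kinds of measures can be contrasted with a small sketch. The examples chosen here (deposits as additive, account balances as semi-additive, profit margin as non-additive) are common textbook illustrations rather than examples from the slides, and all figures are made up.

```python
# Hypothetical daily account snapshots.
snapshots = [
    {"day": "Mon", "account": "A", "deposit": 20.0, "balance": 100.0},
    {"day": "Mon", "account": "B", "deposit": 0.0, "balance": 200.0},
    {"day": "Tue", "account": "A", "deposit": 20.0, "balance": 120.0},
    {"day": "Tue", "account": "B", "deposit": 0.0, "balance": 200.0},
]

# Additive: deposits can be summed across every dimension (day and account).
total_deposits = sum(row["deposit"] for row in snapshots)

# Semi-additive: balances can be summed across accounts for one day...
tuesday_total_balance = sum(
    row["balance"] for row in snapshots if row["day"] == "Tue"
)
# ...but summing one account's balance across days gives a meaningless number.
account_a_summed_over_days = sum(
    row["balance"] for row in snapshots if row["account"] == "A"
)

# Non-additive: a ratio such as profit margin cannot be summed at all.
sales = [
    {"revenue": 100.0, "profit": 10.0},
    {"revenue": 300.0, "profit": 90.0},
]
summed_margin = sum(s["profit"] / s["revenue"] for s in sales)  # not meaningful
# The correct overall margin is recomputed from additive components.
overall_margin = sum(s["profit"] for s in sales) / sum(s["revenue"] for s in sales)
```

A practical consequence is that non-additive measures are usually derived at query time from additive components stored in the fact table.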
Types of Fact Table
Cumulative: Describes what has happened over a period of time.
Snapshot: Describes the state of things at a particular instant in time.
Factless Fact Table: A fact table that does not have any measures.
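A factless fact table still answers questions, because the presence of a row records that an event happened. The sketch below uses class attendance as a hypothetical example with Python's built-in `sqlite3` module; the table names and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_class   (class_key INTEGER PRIMARY KEY, title TEXT);

-- Only foreign keys and a date: no numeric measure column.
CREATE TABLE fact_attendance (
    day         TEXT,
    student_key INTEGER REFERENCES dim_student(student_key),
    class_key   INTEGER REFERENCES dim_class(class_key)
);

INSERT INTO dim_student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dim_class   VALUES (1, 'Math'), (2, 'Physics');
INSERT INTO fact_attendance VALUES
    ('2023-01-09', 1, 1),
    ('2023-01-09', 2, 1),
    ('2023-01-10', 1, 2);
""")

# Analysis is done by counting rows rather than summing a measure.
attendance_per_class = conn.execute("""
    SELECT c.title, COUNT(*)
    FROM fact_attendance AS f
    JOIN dim_class AS c ON c.class_key = f.class_key
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()
```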
Types of Dimensions
Conformed Dimension: A dimension that has exactly the same meaning and content when referred to from different fact tables.
Junk Dimension: A single table with a combination of different and unrelated attributes, used to avoid having a large number of foreign keys in the fact table.
Role Playing Dimension
Slowly Changing Dimensions: Apply to cases where an attribute of a dimension record varies over time.
Rapidly Changing Dimensions: A dimension attribute that changes frequently is a rapidly changing attribute.
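One common way to handle a slowly changing dimension is to keep every version of a record with validity dates, an approach usually called Type 2. The slides do not name a specific technique, so the sketch below is an assumption: the function `apply_scd2`, the column names, and the sample customer are all hypothetical.

```python
from datetime import date


def apply_scd2(rows, customer_id, new_city, change_date):
    """Close the current version of a customer row and append a new one."""
    current = next(
        (r for r in rows
         if r["customer_id"] == customer_id and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["city"] == new_city:
            return rows  # attribute unchanged, nothing to record
        current["valid_to"] = change_date  # expire the old version
    rows.append({
        "surrogate_key": max((r["surrogate_key"] for r in rows), default=0) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,  # None marks the current version
    })
    return rows


dim_customer = [{
    "surrogate_key": 1,
    "customer_id": "C1",
    "city": "Cairo",
    "valid_from": date(2020, 1, 1),
    "valid_to": None,
}]

apply_scd2(dim_customer, "C1", "Alexandria", date(2023, 6, 1))
current_rows = [r for r in dim_customer if r["valid_to"] is None]
```

Keeping the old row means facts recorded before the change still join to the attribute values that were true at the time, which fits the non-volatile property of a data warehouse.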
Types of Loading Dimension Tables
Conventional (Slow): All the constraints and keys are validated against the data before it is loaded; this way, data integrity is maintained.
Direct (Fast): All the constraints and keys are disabled before the data is loaded. Once the data is loaded, it is validated against all the constraints and keys. If data is found to be invalid or dirty, it is not included in the index and all future processes skip this data.
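The difference between the two loading styles comes down to when validation runs. The following pure-Python sketch only models that ordering; it is not how any particular database implements bulk loading, and the validation rule, function names, and sample rows are assumptions.

```python
def is_valid(row, seen_keys):
    """Hypothetical constraint: non-empty name and a key not seen before."""
    return bool(row.get("name")) and row["key"] not in seen_keys


def conventional_load(rows):
    """Validate every row before it is loaded into the table."""
    table, seen_keys = [], set()
    for row in rows:
        if is_valid(row, seen_keys):
            table.append(row)
            seen_keys.add(row["key"])
    return table


def direct_load(rows):
    """Load everything first, then validate and index only the clean rows."""
    table = list(rows)  # constraints disabled: every row is stored
    index, rejected = {}, []
    for row in table:  # validation happens after the load
        if is_valid(row, index):
            index[row["key"]] = row
        else:
            rejected.append(row)  # left out of the index, skipped later
    return table, index, rejected


incoming = [
    {"key": 1, "name": "Widget"},
    {"key": 1, "name": "Duplicate key"},
    {"key": 2, "name": ""},
    {"key": 3, "name": "Gadget"},
]

conventional_table = conventional_load(incoming)
direct_table, direct_index, direct_rejected = direct_load(incoming)
```

In the conventional path, bad rows never reach the table. In the direct path they are stored but stay outside the index, so downstream processes that read through the index never see them.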
Data Mart Types
Independent Data Mart: Created from operational systems and has a separate physical data store.
Logical Data Mart: Exists as a subset of the data warehouse; built over the data warehouse logically.
Dependent Data Mart: Created from a data warehouse into a separate physical data store; built over the data warehouse physically.
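The logical and dependent variants can be contrasted with a short sketch using Python's built-in `sqlite3` module. Here a view stands in for the logical data mart and a second database for the dependent one; the table names, the region filter, and the sample rows are assumptions.

```python
import sqlite3

# The data warehouse, with a logical data mart defined as a view over it.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE fact_sales (region TEXT, sales_amount REAL);
INSERT INTO fact_sales VALUES ('North', 10.0), ('South', 20.0), ('North', 5.0);

CREATE VIEW mart_north_logical AS
    SELECT region, sales_amount FROM fact_sales WHERE region = 'North';
""")

# A dependent data mart: a separate physical store filled from the warehouse.
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE sales_north (region TEXT, sales_amount REAL)")
north_rows = warehouse.execute(
    "SELECT region, sales_amount FROM fact_sales WHERE region = 'North'"
).fetchall()
mart.executemany("INSERT INTO sales_north VALUES (?, ?)", north_rows)
mart.commit()

# New data arrives in the warehouse after the mart was populated.
warehouse.execute("INSERT INTO fact_sales VALUES ('North', 1.0)")
warehouse.commit()

logical_total = warehouse.execute(
    "SELECT SUM(sales_amount) FROM mart_north_logical"
).fetchone()[0]
dependent_total = mart.execute(
    "SELECT SUM(sales_amount) FROM sales_north"
).fetchone()[0]
```

The logical mart sees the new row immediately because it is only a query over the warehouse, while the dependent mart keeps its own copy and stays unchanged until it is refreshed. An independent data mart would look like the second database but would be filled directly from operational systems instead of from the warehouse.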
Data Modeling: Conceptual, Logical, and Physical Data Models
Conceptual Model: Identifies the highest-level relationships between the different entities.
Logical Model: Describes the data in as much detail as possible, without regard to how it will be physically implemented in the database.
Physical Model: Represents how the model will be built in the database.