Posted on 29-Mar-2015
Principles Of Dimensional Modeling
Design Requirements
Design of the DW must directly reflect the way the managers look at the business
Should capture the measurements of importance along with the parameters by which these measurements are viewed
It must facilitate data analysis, i.e., answering business questions
What is Dimensional Modeling (DM)?
DM is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access.
Can be implemented using a relational or a multidimensional DBMS, with some restrictions.
It is different from ER modeling.
Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
This characteristic "star-like" structure is often called a star schema.
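As a minimal sketch of this structure, the snippet below builds a tiny star schema in an in-memory SQLite database using Python's standard sqlite3 module. The table and column names (sales_fact, date_dim, product_dim, store_dim, units_sold) and the sample rows are hypothetical, chosen only to show a fact table whose multipart primary key is made of the dimension keys, and a typical "star join" query.

```python
import sqlite3

# Hypothetical retail star schema: the fact table's primary key is the
# combination of the single-part keys of three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact (
    date_key    INTEGER REFERENCES date_dim(date_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    units_sold  INTEGER,
    PRIMARY KEY (date_key, product_key, store_key)  -- multipart key
);
INSERT INTO date_dim VALUES (1, '2015-01'), (2, '2015-02');
INSERT INTO product_dim VALUES (10, 'Dairy'), (11, 'Bakery');
INSERT INTO store_dim VALUES (100, 'East'), (101, 'West');
INSERT INTO sales_fact VALUES
    (1, 10, 100, 5), (1, 11, 100, 3), (2, 10, 101, 7), (2, 11, 101, 2);
""")

# Star join: group by dimension attributes, aggregate the numeric fact.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.units_sold)
FROM sales_fact f
JOIN date_dim d    ON f.date_key = d.date_key
JOIN product_dim p ON f.product_key = p.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)
# [('2015-01', 'Bakery', 3), ('2015-01', 'Dairy', 5),
#  ('2015-02', 'Bakery', 2), ('2015-02', 'Dairy', 7)]
```

Every query follows the same shape: join the fact table to the dimensions it needs, constrain or group by dimension attributes, and aggregate the facts.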
ER Modeling
A logical design technique that seeks to eliminate data redundancy
Illuminates the microscopic relationships among data elements
Perfect for OLTP systems
Responsible for the success of transaction processing in relational databases
Problems with ER Model
ER models are NOT suitable for DWs:
End users cannot understand or remember an ER model
Many DWs have failed because of overly complex ER designs
Not optimized for complex, ad hoc queries
Data retrieval becomes difficult due to normalization
Browsing becomes difficult
ER vs Dimensional Modeling
ER models are designed to:
Remove redundant data (normalization)
Facilitate retrieval of individual records having certain critical identifiers
Thereby optimize OLTP performance
Dimensional model supports the reporting and analytical needs of a data warehouse system.
Comparison between the ER & Dimensional Model
Dimensional Model | ER Model
Supports ad hoc querying for business analyses and complex analyses | Supports OLTP
The data model is multidimensional | The data model is two-dimensional
It is asymmetric | It is symmetric
Permits redundancy | Removes redundancy
It is extensible; applications do not change when the model is extended | If the model is modified, applications are modified
It can be designed independently of expected query patterns | It is variable in structure and very vulnerable to changes in users' querying habits
Easy to understand | Hard for people to visualize
Models the business practically | Models the micro relationships among data elements
Dimension Modeling Concepts
Design goals: user understandability, query performance, resilience to change
Components of DM: fact tables and dimension tables
Inside Dimension table
Dimension table key
Large number of attributes
Textual attributes
Attributes not directly related to one another
Flattened table, not normalized
Ability to drill down/roll up
Multiple hierarchies
Relatively few records (compared with the fact table)
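Because a dimension table is flattened, each row already carries every level of its hierarchy, so rolling up or drilling down is just a change of the grouping attribute. The sketch below assumes a hypothetical product dimension with a product, subcategory, category hierarchy and a few made-up fact rows; it is an illustration, not a prescribed implementation.

```python
from collections import defaultdict

# Hypothetical flattened product dimension: every row holds all levels of
# the hierarchy (product -> subcategory -> category), so no extra joins.
product_dim = {
    1: {"product": "Skim Milk", "subcategory": "Milk", "category": "Dairy"},
    2: {"product": "Whole Milk", "subcategory": "Milk", "category": "Dairy"},
    3: {"product": "Cheddar", "subcategory": "Cheese", "category": "Dairy"},
    4: {"product": "Bagel", "subcategory": "Bread", "category": "Bakery"},
}

# Hypothetical fact rows: (product_key, units_sold)
sales = [(1, 10), (2, 4), (3, 6), (4, 8), (1, 2)]

def roll_up(level):
    """Aggregate units_sold to the requested hierarchy level."""
    totals = defaultdict(int)
    for product_key, units in sales:
        totals[product_dim[product_key][level]] += units
    return dict(totals)

print(roll_up("category"))     # {'Dairy': 22, 'Bakery': 8}
print(roll_up("subcategory"))  # {'Milk': 16, 'Cheese': 6, 'Bread': 8}
```

Moving from "category" to "subcategory" is a drill down; the reverse is a roll up. A second hierarchy (for example, by brand) would simply be more columns in the same flattened row.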
Inside Fact Table
Concatenated fact table key
Grain (level of detail) of the data identified
Fully additive measures: can be summed across all dimensions
Semi-additive measures: can be summed across only some dimensions
Large number of records
Few attributes
Sparsity of data
Degenerate dimensions
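The difference between fully additive and semi-additive facts shows up as soon as you aggregate. The sketch below assumes a hypothetical month-end account balance snapshot: a balance can be summed across accounts, but summing it across months would double count money, so across time an average (or the latest value) is used instead. A fully additive fact, such as a deposit amount per transaction, could be summed over every dimension.

```python
# Hypothetical month-end balance snapshot: (month, account, balance).
# Balance is semi-additive: additive across accounts, not across time.
snapshot = [
    ("2015-01", "A", 100), ("2015-01", "B", 50),
    ("2015-02", "A", 120), ("2015-02", "B", 30),
]

def total_balance(month):
    # Summing over the account dimension is meaningful.
    return sum(bal for m, _, bal in snapshot if m == month)

def average_balance(account):
    # Over the time dimension, average instead of summing.
    values = [bal for _, acct, bal in snapshot if acct == account]
    return sum(values) / len(values)

print(total_balance("2015-02"))  # 150
print(average_balance("A"))      # 110.0
```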
Factless Fact Table
Some fact tables have no measured facts
They are useful to describe events and coverage: the tables record that something has or has not happened
Often used to represent many-to-many relationships
The only thing they contain is a concatenated key
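A classic factless fact table records student attendance: each row says only that a student attended a course on a date, which also captures the many-to-many relationship between students and courses. The sketch below uses hypothetical table names and data in SQLite. "Has happened" questions are answered by counting rows; the "has not happened" query here simplifies by treating every student in the dimension as expected that day, whereas a real design would typically use a separate coverage table for that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_dim (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course_dim  (course_key INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, day TEXT);
-- Factless fact table: only the concatenated key, no measures.
CREATE TABLE attendance_fact (
    date_key INTEGER, student_key INTEGER, course_key INTEGER,
    PRIMARY KEY (date_key, student_key, course_key)
);
INSERT INTO student_dim VALUES (1, 'Ann'), (2, 'Raj'), (3, 'Li');
INSERT INTO course_dim VALUES (10, 'Databases');
INSERT INTO date_dim VALUES (100, '2015-03-02'), (101, '2015-03-03');
INSERT INTO attendance_fact VALUES (100, 1, 10), (100, 2, 10), (101, 1, 10);
""")

# "Has happened": count the rows.
counts = conn.execute("""
SELECT s.name, COUNT(*) FROM attendance_fact f
JOIN student_dim s ON f.student_key = s.student_key
GROUP BY s.name ORDER BY s.name
""").fetchall()

# "Has not happened" (simplified): students with no row on date 100.
absent = conn.execute("""
SELECT s.name FROM student_dim s
WHERE NOT EXISTS (
    SELECT 1 FROM attendance_fact f
    WHERE f.student_key = s.student_key AND f.date_key = 100)
ORDER BY s.name
""").fetchall()

print(counts)  # [('Ann', 2), ('Raj', 1)]
print(absent)  # [('Li',)]
```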
Star Schema keys
Primary keys
Surrogate keys
Foreign keys
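Surrogate keys are system-generated integers that replace source-system (natural) keys as the dimension's primary key; the same integers then appear as foreign keys in the fact table. The class below is a hypothetical helper, not a standard API, sketching the basic lookup-or-assign step an ETL process performs.

```python
class SurrogateKeyMap:
    """Hypothetical helper: maps natural keys (e.g. customer IDs from a
    source system) to warehouse surrogate keys."""

    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or assign the next integer.
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]

customers = SurrogateKeyMap()
print(customers.key_for("CUST-0042"))  # 1
print(customers.key_for("CUST-0007"))  # 2
print(customers.key_for("CUST-0042"))  # 1 (same natural key, same surrogate)
```

Keeping the warehouse keys independent of the source keys is also what makes Type 2 slowly changing dimensions possible, since one natural key can then map to several dimension rows over time.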
Modeling Design Process
1. Identify the business process: the source of “measurements”
2. Identify the grain: what does one row in the fact table represent or mean?
3. Identify the dimensions: descriptive context, true to the grain
4. Identify the facts: numeric additive measurements, true to the grain
Step 1 - Identify the Business Process
This is a business activity typically tied to a source system.
Not to be confused with a business department or function. An Orders dimensional model should support the activities of both Sales and Marketing.
“If we establish departmentally bound dimensional models, we’ll inevitably duplicate data with different labels and terminology.”
Step 2 - Identify the Grain
The level of detail associated with the fact table measurements.
A critical step that must precede steps 3 and 4. Preferably, the grain should be at the most atomic level possible.
“How do you describe a single row in the fact table?”
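Once the grain is declared, it can be checked mechanically: no two fact rows may share the same combination of grain columns. The sketch below assumes a hypothetical grain of one row per product per store per day and reports any key that appears more than once.

```python
from collections import Counter

# Hypothetical declared grain: one row per product per store per day.
GRAIN = ("date_key", "product_key", "store_key")

fact_rows = [
    {"date_key": 1, "product_key": 10, "store_key": 100, "units": 5},
    {"date_key": 1, "product_key": 11, "store_key": 100, "units": 3},
    {"date_key": 1, "product_key": 10, "store_key": 100, "units": 2},  # violates the grain
]

def grain_violations(rows, grain):
    """Return grain-key tuples that occur in more than one fact row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]

print(grain_violations(fact_rows, GRAIN))  # [(1, 10, 100)]
```

A violation means either the data must be aggregated to the declared grain or the grain itself was declared too coarse.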
Step 3 - Identify the Dimensions
The list of all the discrete, text-like attributes that emanate from the fact table.
They are the “by” words used to describe the requirements.
Each dimension can be thought of as an analytical “entry point” to the facts.
“How do business people describe the data that results from the business process?”
Step 4 - Identify the Facts
Facts must be true to the grain defined in step 2.
Typical facts are numeric additive figures.
Facts that belong to a different grain belong in a separate fact table.
Facts are determined by answering the question, “What are we measuring?”
Percentages and ratios, such as gross margin, are non-additive. The numerator and denominator should be stored in the fact table.
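Why store the numerator and denominator instead of the ratio? Averaging per-row ratios ignores how large each row is. The sketch below uses two hypothetical order lines and computes gross margin the correct way (sum revenue and cost first, then divide) alongside the naive average of row-level margins.

```python
# Hypothetical order-line facts: store the additive components, not the ratio.
order_lines = [
    {"revenue": 100.0, "cost": 90.0},    # 10% margin
    {"revenue": 1000.0, "cost": 500.0},  # 50% margin
]

def gross_margin(rows):
    # Aggregate numerator and denominator first, then divide.
    revenue = sum(r["revenue"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return (revenue - cost) / revenue

# Wrong: averaging row-level ratios weights a small order like a large one.
naive = sum((r["revenue"] - r["cost"]) / r["revenue"] for r in order_lines) / len(order_lines)

print(round(gross_margin(order_lines), 4))  # 0.4636
print(round(naive, 4))                      # 0.3
```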
Advantages of star schema
Easy for users to understand
Optimizes navigation
Most suitable for query processing
DM: Advanced Topics
Slowly changing dimensions
Type 1 changes: correction of errors
Used when the old value of the attribute has no significance or can be discarded
Easy and fast
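A Type 1 change is just an in-place overwrite. The sketch below assumes a hypothetical customer dimension held as a dictionary keyed by surrogate key; correcting a misspelled name replaces the old value, and no history is kept.

```python
# Hypothetical customer dimension keyed by surrogate key.
customer_dim = {
    1: {"customer_id": "C042", "name": "Jon Smith", "city": "Dallas"},
}

def scd_type1_update(dim, surrogate_key, attribute, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    dim[surrogate_key][attribute] = new_value

# Correcting an error: the old spelling has no analytical value.
scd_type1_update(customer_dim, 1, "name", "John Smith")
print(customer_dim[1]["name"])  # John Smith
```

Existing fact rows keep pointing to the same surrogate key, so all history is restated under the corrected value.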
Type 2 changes: preservation of history
Partitions history so that fact tables properly reflect the original values
Requires the use of surrogate keys
Causes table growth due to additional history rows
Users must be aware of the added complexity
Effective dates are used only secondarily; fact rows join cleanly on the surrogate key
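A Type 2 change expires the current dimension row and appends a new row with a new surrogate key. The sketch below assumes a hypothetical customer dimension with effective date, expiry date, and current-flag columns. Facts loaded before the change keep joining to the old key and facts loaded afterwards use the new one, which is how the fact table "reflects original values".

```python
from datetime import date

# Hypothetical Type 2 customer dimension: each version of a customer is a
# separate row with its own surrogate key; dates and the flag are bookkeeping.
customer_dim = [
    {"key": 1, "customer_id": "C042", "city": "Dallas",
     "effective": date(2010, 1, 1), "expiry": None, "current": True},
]

def scd_type2_update(dim, customer_id, attribute, new_value, change_date):
    """Expire the current row and append a new version with a new surrogate key."""
    current = next(r for r in dim if r["customer_id"] == customer_id and r["current"])
    current["expiry"] = change_date
    current["current"] = False
    new_row = dict(current, key=max(r["key"] for r in dim) + 1,
                   effective=change_date, expiry=None, current=True)
    new_row[attribute] = new_value
    dim.append(new_row)
    return new_row["key"]

new_key = scd_type2_update(customer_dim, "C042", "city", "Austin", date(2015, 3, 29))
print(new_key)                            # 2
print([r["city"] for r in customer_dim])  # ['Dallas', 'Austin']
```

Using max() + 1 for the next key is a simplification for the sketch; a real load would take keys from a sequence or key map.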
Type 3 changes: tentative (soft) revisions
An additional attribute is used to capture the change
Used less frequently than Type 1 or Type 2
Relates to tentative changes in the source systems
Used to compare performance under the old and new values
Ability to track forward and backward
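A Type 3 change keeps the prior value in an extra column instead of adding a row. The sketch assumes a hypothetical sales-rep dimension with current_region and previous_region columns, so reports can group by either the new or the old territory.

```python
# Hypothetical Type 3 sales-rep dimension with a "previous" column.
rep_dim = {
    7: {"rep": "Ana", "current_region": "East", "previous_region": None},
}

def scd_type3_update(dim, key, new_region):
    """Type 3: shift the current value into the 'previous' column."""
    row = dim[key]
    row["previous_region"] = row["current_region"]
    row["current_region"] = new_region

scd_type3_update(rep_dim, 7, "West")
print(rep_dim[7])
# {'rep': 'Ana', 'current_region': 'West', 'previous_region': 'East'}
```

Only one prior value survives: a second change would overwrite "East", which is why Type 3 suits tentative or one-step comparisons rather than full history.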
Large dimensions
Rapidly changing dimensions
Snowflake Schema
Snowflaking is a method of normalizing the dimension tables in a star schema
Advantages: small savings in storage space; normalized structures are easier to update and maintain
Disadvantages: the schema is less intuitive and end users are put off by the complexity; browsing through the contents is difficult; query performance degrades because of additional joins
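The extra join is easy to see in a query. In the sketch below (hypothetical tables and data in SQLite), the product dimension is snowflaked so that category lives in its own table; rolling sales up to category now needs two joins, whereas in a star schema category would simply be a column of product_dim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category normalized into its own table.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, product TEXT,
                           category_key INTEGER REFERENCES category_dim(category_key));
CREATE TABLE sales_fact   (product_key INTEGER, units_sold INTEGER);
INSERT INTO category_dim VALUES (1, 'Dairy'), (2, 'Bakery');
INSERT INTO product_dim VALUES (10, 'Milk', 1), (11, 'Cheese', 1), (12, 'Bagel', 2);
INSERT INTO sales_fact VALUES (10, 5), (11, 3), (12, 4);
""")

# Fact -> product -> category: one join more than the equivalent star query.
rows = conn.execute("""
SELECT c.category, SUM(f.units_sold)
FROM sales_fact f
JOIN product_dim p  ON f.product_key = p.product_key
JOIN category_dim c ON p.category_key = c.category_key
GROUP BY c.category ORDER BY c.category
""").fetchall()
print(rows)  # [('Bakery', 4), ('Dairy', 8)]
```

The category name is stored once instead of on every product row (the storage saving), at the cost of the additional join on every category-level query.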
Figure: Star Schema
Figure: Flattened Star (CSE 5331/7331, F'07)
Figure: Normalized Star
Figure: Snowflake Schema
Snowflake Schema vs Star Schema
Criterion | Snowflake Schema | Star Schema
Joins | Higher number of joins | Fewer joins
Ease of use | More complex queries, hence harder to understand | Simpler queries, hence easier to understand
Query performance | More foreign keys, hence longer query execution time | Fewer foreign keys, hence shorter query execution time
Ease of maintenance/change | No redundancy, hence easier to maintain and change | Has redundant data, hence harder to maintain and change
Type of data warehouse | Good for small data warehouses/data marts | Good for large data warehouses
Dimension tables | May have more than one table for each dimension | Contains only a single table for each dimension
Dimension table normalization | Third normal form (3NF) | Denormalized (second normal form, 2NF)
Fact Constellation schema
It is shaped like a constellation of stars.
For each star schema or snowflake schema, it is possible to construct a fact constellation schema.
This schema is more complex than the star or snowflake architecture because it contains multiple fact tables, which allows dimension tables to be shared among many fact tables.
The solution is very flexible; however, it may be hard to manage and support.
The main disadvantage of the fact constellation schema is a more complicated design, because many variants of aggregation must be considered.
Different fact tables are explicitly assigned to the dimensions that are relevant for the given facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level.
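In a fact constellation, two fact tables share a dimension, and questions spanning both are answered by "drilling across": aggregate each fact table separately to the shared dimension, then join the results. Joining the two fact tables directly would multiply rows and inflate the totals. The sketch uses hypothetical sales and shipment fact tables that share a product dimension in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension shared by two fact tables.
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE sales_fact    (product_key INTEGER, units_sold INTEGER);
CREATE TABLE shipment_fact (product_key INTEGER, units_shipped INTEGER);
INSERT INTO product_dim VALUES (1, 'Milk'), (2, 'Bagel');
INSERT INTO sales_fact VALUES (1, 5), (1, 2), (2, 4);
INSERT INTO shipment_fact VALUES (1, 10), (2, 3), (2, 3);
""")

# Drill across: summarize each fact table per product, then join on the
# shared dimension so neither table's rows are duplicated.
rows = conn.execute("""
WITH s AS (SELECT product_key, SUM(units_sold) AS sold
           FROM sales_fact GROUP BY product_key),
     h AS (SELECT product_key, SUM(units_shipped) AS shipped
           FROM shipment_fact GROUP BY product_key)
SELECT p.product, s.sold, h.shipped
FROM product_dim p
JOIN s ON s.product_key = p.product_key
JOIN h ON h.product_key = p.product_key
ORDER BY p.product
""").fetchall()
print(rows)  # [('Bagel', 4, 6), ('Milk', 7, 10)]
```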
Figure: Dimensional Model Star Schema
Figure: Snowflake Schema in Dimensional Modeling
Figure: Fact Constellation Schema