01 Presentation DWH


Transcript of 01 Presentation DWH

Page 1: 01 Presentation DWH

Lnt Infotech Use Only

An Introduction to Data Warehousing

Page 2: 01 Presentation DWH

Objectives

• Data Warehouse Overview
• Data Warehouse, OLTP & ODS
• Data Warehouse Architecture
• Data Models in Data Warehousing
• Slowly changing dimensions
• Surrogate Keys

Page 3: 01 Presentation DWH

A producer wants to know….

• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
• What product promotions have the biggest impact on revenue?
• What is the most effective distribution channel?

Page 4: 01 Presentation DWH

Data, data everywhere, yet ...

• I can’t find the data I need
  – data is scattered over the network
  – many versions, subtle differences
• I can’t get the data I need
  – need an expert to get the data
• I can’t understand the data I found
  – available data poorly documented
• I can’t use the data I found
  – results are unexpected
  – data needs to be transformed from one form to other

Page 5: 01 Presentation DWH


What are the users saying...

• Data should be integrated across the enterprise

• Summary data has a real value to the organization

• Historical data holds the key to understanding data over time

• What-if capabilities are required

Page 6: 01 Presentation DWH

Data Warehouse

• A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making

-- Bill Inmon, Building the Data Warehouse 1996

Page 7: 01 Presentation DWH

What is Data Warehousing

A process of transforming data into information and making it available to users in a timely enough manner to make a difference

[Forrester Research, April 1996]

(Diagram: Data -> Information)

Page 8: 01 Presentation DWH

Need for Data Warehousing

• Better business intelligence for end-users
• Reduction in time to locate, access, and analyze information
• Consolidation of disparate information sources
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less-responsive decision support systems
• Reduction in demand on IS to generate reports

Page 9: 01 Presentation DWH


Evolution of Data Warehousing

1960 - 1985 : MIS Era

• Unfriendly

• Slow

• Dependent on IS programmers

• Inflexible

• Analysis limited to defined reports

Focus on Reporting

Page 10: 01 Presentation DWH


Evolution of Data Warehousing

1985 - 1990 : Querying Era

• Ad hoc, unstructured access to corporate data

• SQL as interface not scalable

• Cannot handle complex analysis

Focus on Online Querying

Page 11: 01 Presentation DWH


Evolution of Data Warehousing

1990 - 20xx : Analysis Era

• Trend Analysis

• What If ?

• Moving Averages

• Cross Dimensional Comparisons

• Statistical profiles

• Automated pattern and rule discovery

Focus on Online Analysis

Page 12: 01 Presentation DWH


Data Warehousing Concepts and Terms

Some terms that are of great importance in understanding data warehousing concepts are:

Operational Data: The data that is used to run a business. This data is what is typically stored, retrieved and updated by an Online Transaction Processing (OLTP) system. Operational data is usually stored in a relational database, but may be stored in legacy, hierarchical or flat file formats as well.

Informational Data: Data stored in a format that makes analysis much easier. Analysis can take the form of decision support (queries), report generation, executive information systems, and more in-depth statistical analysis. Informational data is created from the wealth of operational data that exists in the business, and it is what makes up a data warehouse.

Page 13: 01 Presentation DWH

OLTP Systems Vs Data Warehouse

Remember: between OLTP and data warehouse systems

• users are different
• data content is different
• data structures are different
• hardware is different

Understanding The Differences Is The Key

Page 14: 01 Presentation DWH

Capacity Planning

(Chart: processing power plotted against time of day)

Processing Load Peaks During the Beginning and End of Day

Page 15: 01 Presentation DWH

OLTP Vs Data Warehouse

Characteristic | OLTP | Data Warehouse
Orientation | Transaction | Analysis
Data Content | Current values | Summarized, archived, derived; usually historical values
Data Structure | Optimized for transactions; highly normalized | Optimized for complex queries; often de-normalized

Page 16: 01 Presentation DWH

OLTP Vs Data Warehouse

Characteristic | OLTP | Data Warehouse
Data Access | Record at a time | Data set at a time
Access Frequency | Read/Update/Delete | Read / Aggregate
Concurrent Users | Many | Few
Data Stability | Dynamic | Static until refreshed
Data Organization | By application | By subject
Usage | Predictable, repetitive | Ad hoc, heuristic
Support | Day-to-day operations | Managerial needs
Response time | Few seconds | Several seconds to minutes

Page 17: 01 Presentation DWH

Do we need a separate database ?

• OLTP and data warehousing require two very differently configured systems
• Isolation of Production System from Business Intelligence System
• Significant and highly variable resource demands of the data warehouse
• Cost of disk space no longer a concern
• Production systems not designed for query processing

Page 18: 01 Presentation DWH

Why Separate Data Warehouse?

Performance

• Special data organization, access methods, and implementation methods are needed to support the multidimensional views and operations typical of OLAP.
• Complex OLAP queries would degrade performance for operational transactions.
• Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis.

Page 19: 01 Presentation DWH

Why Separate Data Warehouse?

Function

• Missing data: decision support requires historical data, which operational DBs do not typically maintain.
• Data consolidation: decision support (DS) requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs and external sources.
• Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.
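The data quality point can be illustrated with a minimal sketch. The source names (`billing_rows`, `crm_rows`), the gender codes and the `GENDER_MAP` table are hypothetical, chosen only to show how inconsistent representations might be reconciled to a single warehouse standard.

```python
# Hypothetical example: two source systems encode the same attribute differently.
GENDER_MAP = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "2": "F",
}

def reconcile_gender(raw_value):
    """Map a source-specific gender code to the warehouse standard ('M', 'F' or 'U')."""
    key = str(raw_value).strip().lower()
    return GENDER_MAP.get(key, "U")  # 'U' = unknown, to be flagged for review

billing_rows = [{"cust": 101, "gender": "male"}, {"cust": 102, "gender": "F"}]
crm_rows = [{"cust": 103, "gender": 1}, {"cust": 104, "gender": "x"}]

# Every row, whatever its source, ends up with the same representation.
cleaned = [
    {"cust": row["cust"], "gender": reconcile_gender(row["gender"])}
    for row in billing_rows + crm_rows
]
print(cleaned)
```

In practice such mappings would be maintained per source system and per attribute; the single dictionary here is only a simplification.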

Page 20: 01 Presentation DWH

Operational Data Store - Definition

(Diagram: three regions labelled A, B and C relating Operational systems, the ODS, and the DSS / Data Warehouse)

Page 21: 01 Presentation DWH

Operational Data Store - Definition

A subject-oriented, integrated, volatile, current-valued data store containing only corporate detailed data.

• Data is stored only for the current period. Old data is either archived or moved to the data warehouse.
• Example request: can I see the credit report from Accounts, sales from Marketing and the open order report from Order Entry for this customer?
• Identical queries may give different results at different times. Supports analysis requiring current data.
• Data from multiple sources is integrated for a subject.

Page 22: 01 Presentation DWH

Operational Data Store

• The ODS applies only to the world of operational systems.
• The ODS contains current valued and near current valued data.
• The ODS contains almost exclusively detail data.
• The ODS requires a full function, update, record oriented environment.

Page 23: 01 Presentation DWH

Different kinds of Information Needs

• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of tuberculosis increased in the last 5 years in the Southern region?

Page 24: 01 Presentation DWH

OLTP Vs ODS Vs DWH

Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data structure | Detailed | Detailed and lightly summarized | Detailed and summarized
Data organization | Functional | Subject-oriented | Subject-oriented
Type of data | Homogeneous | Homogeneous | Vast supply of very heterogeneous data

Page 25: 01 Presentation DWH

OLTP Vs ODS Vs DWH

Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data update | Field by field | Field by field | Controlled batch
Database size | Moderate | Moderate | Large to very large
Development methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise

Page 26: 01 Presentation DWH

Logical Transformation of operational data

Figure 3. Reasons for moving data outside the operational systems

• Different performance requirements
• Combine data from multiple applications
• Data is mostly non-volatile
• Data saved for a long time period

Order processing
• 2 second response time
• Last 6 months orders

Product Price/inventory
• 10 second response time
• Last 10 price changes
• Last 20 inventory transactions

Marketing
• 30 second response time
• Last 2 years programs

Data Warehouse
• Last 5 years data
• Response time 2 seconds to 60 minutes
• Data is not modified

Page 27: 01 Presentation DWH

Logical Transformation of operational data

Figure 5. Data warehouse entities align with the business structure

• No data model restrictions of the source application
• Data warehouse model has business entities

(Diagram: the Order processing, Product Price/inventory and Marketing applications, holding entities such as Customer orders, Product price, Available Inventory, Customer Profile, Marketing programs, Product Inventory and Product Price changes, feed a Data Warehouse organized into Customers, Products, Product Inventory, Product Price and Orders.)

Page 28: 01 Presentation DWH

Logical Transformation of operational data

Figure 6. Transformation of the operational state information

• Operational state information is not carried to the data warehouse
• Data is transferred to the data warehouse after all state changes

(Diagram: an order in the Order Processing System moves through the states Open, Backorder, Shipped and Closed; daily closed orders are sent to the Data Warehouse and stored as Orders (Closed). Inventory moves up and down; a weekly inventory snapshot is stored in the Data Warehouse as Inventory snapshot 1, Inventory snapshot 2.)
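The two rules on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function names `extract_closed_orders` and `take_inventory_snapshot`, and the dates are hypothetical and not part of the original deck.

```python
from datetime import date

# Hypothetical operational order records; status follows the order life cycle.
orders = [
    {"order_id": 1, "status": "Open"},
    {"order_id": 2, "status": "Shipped"},
    {"order_id": 3, "status": "Closed"},
    {"order_id": 4, "status": "Closed"},
]

def extract_closed_orders(operational_orders):
    """Keep only orders whose state changes are complete; the state itself is not carried over."""
    return [{"order_id": o["order_id"]}
            for o in operational_orders if o["status"] == "Closed"]

def take_inventory_snapshot(inventory_levels, snapshot_date):
    """Copy the current inventory levels into a dated snapshot that is never modified."""
    return {"snapshot_date": snapshot_date, "levels": dict(inventory_levels)}

warehouse_orders = extract_closed_orders(orders)

inventory = {"widget": 40}
snapshot_1 = take_inventory_snapshot(inventory, date(2024, 1, 7))
inventory["widget"] = 25  # operational inventory keeps moving up and down
snapshot_2 = take_inventory_snapshot(inventory, date(2024, 1, 14))
```

Because each snapshot copies the levels, later operational updates do not alter history already stored in the warehouse.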

Page 29: 01 Presentation DWH

Advantages of Data Warehouse

• Time saving: the warehouse enables employees to shift their time from collecting information to analyzing it, which helps the company make better business decisions.
• Efficiency: a DW provides, in one central repository, all the metrics necessary to support decision making through queries and reports.
• Complete documentation: a typical DW objective is to store all the information, including history.

Page 30: 01 Presentation DWH

Advantages of Data Warehouse

• Data integration: the primary goal of every DW is to integrate data because:
  a) This is a primary deficiency in current decision support systems.
  b) Data content in one file is at a different level of granularity than that in another file.
  c) The same data in one file is updated at a different time period than that in another file.

Page 31: 01 Presentation DWH

Limitation of Data Warehouse

• High cost of building and on-going maintenance ($ 3 - 5 million).
• Complexity: a DW has to integrate all the data and transaction system databases, and hence requires more time to design and build (an average DW requires approx. 3 years to implement).
• The answer to these limitations is Data Marts

Page 32: 01 Presentation DWH

Data Marts

• Subject or Application Oriented Business View of Warehouse
  – Quick Solution to a specific Business Problem
  – Finance, Marketing, Sales etc.
  – Smaller amount of data used for Analytic Processing

A Logical Subset of The Complete Data Warehouse

Page 33: 01 Presentation DWH


Data Marts

Marketing Data Mart Finance Data Mart Sales Data Mart

Current Level of Detail

( Data Warehouse)

Page 34: 01 Presentation DWH


Data Mart Appeal

What is the appeal of the Data Mart?

Why do departments find it convenient to do their decision support processing in their own data mart?

What is wrong with the data warehouse as a basis for standard decision support making?

There are several factors leading to the popularity of the data mart.

Page 35: 01 Presentation DWH

Data Mart Appeal

As data warehouses grow:

• The competition to get inside the data warehouse grows fierce. More and more departmental decision support processing is done inside the data warehouse, to the point where resource consumption becomes a real problem.
• Data becomes harder to customize.
• The cost of doing processing in the data warehouse increases as the volume of data increases.

The department can build the data mart on its own budget, thereby making all the technological decisions it wants.

Page 36: 01 Presentation DWH

Summary of Data Mart Appeal

• While the DW was designed to manage the bulk supply of data from its suppliers (i.e. operational systems), and to handle the organization and storage of this data, the “retail stores” or “Data Marts” could be focused on packaging and presenting selections of data to end-users, often to meet specialized needs.

Page 37: 01 Presentation DWH

Data Warehouse and Data Mart

Characteristic | Data Warehouse | Data Marts
Scope | Application neutral; centralized, shared; enterprise | Specific application requirement; department; business process oriented
Data Perspective | Historical detailed data; some summary | Detailed (some history); summarized
Subjects | Multiple subject areas | Single partial subject; multiple partial subjects

Page 38: 01 Presentation DWH

Data Warehouse and Data Mart

Characteristic | Data Warehouse | Data Marts
Data Sources | Many; operational/external data | Few; operational, external data
Implement Time Frame | 9-18 months for first stage; multiple stage implementation | 4-12 months
Characteristics | Flexible, extensible; durable/strategic; data orientation | Restrictive, non-extensible; short life/tactical; project orientation

Page 39: 01 Presentation DWH

Data Warehouses or Data Marts

For companies interested in changing their corporate cultures or integrating separate departments, an enterprise-wide approach makes sense.

Companies that want a quick solution to a specific business problem are better served by a standalone data mart.

Some companies opt to build a warehouse incrementally, data mart by data mart.

A Logical Subset of The Complete Data Warehouse

Page 40: 01 Presentation DWH

Warehouse or Mart First ?

Data Warehouse First | Data Mart First
Expensive | Relatively cheap
Large development cycle | Delivered in < 6 months
Change management is difficult | Easy to manage change
Difficult to obtain continuous corporate support | Can lead to independent and incompatible marts
Technical challenges in building large databases | Cleansing, transformation, modeling techniques may be incompatible

Page 41: 01 Presentation DWH

Data Warehousing Model

(Diagram: Operational Data, Distributed data and External market data flow through ETL into the Data Warehouse and Data Marts, which are accessed by Data Mining, DSS Tools and OLAP Tools.)

Page 42: 01 Presentation DWH

Typical Data Warehouse Architecture

(Diagram: Operational Systems/Data pass through Data Preparation (Select, Extract, Transform, Integrate, Maintain) into the Data Warehouse with its Metadata and the Data Marts; Middleware/API connects them to the access tools: EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers and Data Mining.)

Multi-tiered Data Warehouse without ODS

Page 43: 01 Presentation DWH

Typical Data Warehouse Architecture

(Diagram: two Data Preparation stages, one with the steps Select, Extract, Transform, Integrate, Maintain and one with the steps Select, Extract, Transform, Load, connect Operational Systems/Data, the ODS with its Metadata, the Data Warehouse with its Metadata, and the Data Marts.)

Multi-tiered Data Warehouse with ODS

Page 44: 01 Presentation DWH


Application of Data Warehousing

• OLAP

• Data Mining

Page 45: 01 Presentation DWH

Commonly used Terms in OLAP

Measure: A numeric value that describes some aspect of the business.

Dimension: A category of information that describes the measure, for example the time dimension.

Attribute: A unique level within a dimension. For example, Month is an attribute within the time dimension.

Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the time dimension is

Year -> Quarter -> Month -> Day
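As a small illustration of these terms, the sketch below derives the attribute value at each level of the time hierarchy from a calendar date. The function name `time_attributes` and the calendar-quarter rule (January to March is quarter 1) are assumptions made for the example.

```python
from datetime import date

def time_attributes(d):
    """Return the attribute value at each level of the Year > Quarter > Month > Day hierarchy."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # calendar quarters assumed
        "month": d.month,
        "day": d.day,
    }

attrs = time_attributes(date(1995, 2, 14))
print(attrs)
```

A measure such as sales amount can then be summed at any of these levels, which is what makes the hierarchy useful for roll-up and drill-down.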

Page 46: 01 Presentation DWH

OLAP : Online Analytical Processing

OLAP is a common use of a data warehouse that involves real-time access to and analysis of multi-dimensional data such as sales information.

The term OLAP was coined as a counterpart to OLTP (Online Transaction Processing). Key characteristics of OLAP include:

• Large data volumes
• Drill down along many dimensions
• Dynamic viewing and analysis of the data from a wide variety of perspectives and through complex formulae

Page 47: 01 Presentation DWH


Online Analytical Processing

OLAP EXAMPLE:

An example OLAP database may be comprised of sales data which has been aggregated by region, product type, and sales channel. A typical OLAP query might access a multi-year sales database in order to find all product sales in each region for each product type.

After reviewing the results, an analyst might further refine the query to find sales volume for each sales channel within region/product classifications.

As a last step the analyst might want to perform year-to-year or quarter-to-quarter comparison for each sales channel. This whole process must be carried out on-line with rapid response time so that the analysis process is undisturbed.
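The three steps of this example can be sketched with plain Python aggregation. The rows, region names and amounts are hypothetical, and a real OLAP server would answer such queries from pre-aggregated structures rather than by scanning rows.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, region, product_type, channel, amount)
sales = [
    (1995, "East", "Fruit", "Retail", 100),
    (1995, "East", "Fruit", "Online", 50),
    (1995, "West", "Dairy", "Retail", 70),
    (1996, "East", "Fruit", "Retail", 120),
]

def aggregate(rows, key_fn):
    """Sum the amount of each row under the grouping key returned by key_fn."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[4]
    return dict(totals)

# Step 1: product sales in each region for each product type.
by_region_product = aggregate(sales, lambda r: (r[1], r[2]))
# Step 2: refine to each sales channel within region/product classifications.
by_channel = aggregate(sales, lambda r: (r[1], r[2], r[3]))
# Step 3: year-to-year comparison for each sales channel.
by_year_channel = aggregate(sales, lambda r: (r[0], r[3]))
```

Each step only changes the grouping key, which mirrors how an analyst refines a query interactively.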

Page 48: 01 Presentation DWH

Introduction to Cubes

(Figure: a Sales cube with three dimensions: Time (Q1, Q2, Q3, Q4), Product (Grapes, Apples, Melons, Cherries, Pears) and Location (Atlanta, Denver, Detroit). Each cell holds a Sales value.)

Page 49: 01 Presentation DWH

Online Analytical Processing

OLAP database servers support common analytical operations including “slicing and dicing”, drill-down and consolidation.

Slicing and dicing: Slicing and dicing refers to the ability to look at the database from different viewpoints. One slice of the sales database might show all sales of a product type within a region. Another slice might show all sales by sales channel within each product type. Slicing and dicing is often performed along a time axis in order to analyze trends and find patterns.

Consolidation: Involves the aggregation of data, such as simple roll-ups; for example, sales offices can be rolled up to districts and districts rolled up to regions.

Drill-down: OLAP database servers can also go in the reverse direction and automatically display the detail data that makes up consolidated data. This is called drill-down. Consolidation and drill-down are inherent properties of OLAP servers.
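The three operations can be sketched on a tiny cube held in a dictionary. The cell values are hypothetical, and the function names `slice_cube`, `roll_up_by_product` and `drill_down` are illustrative rather than any OLAP server's API.

```python
# Hypothetical cube cells keyed by (quarter, product, location); values are sales.
cube = {
    ("Q1", "Apples", "Atlanta"): 10,
    ("Q1", "Apples", "Denver"): 20,
    ("Q1", "Pears", "Atlanta"): 5,
    ("Q2", "Apples", "Atlanta"): 15,
}

def slice_cube(cells, quarter):
    """Slice: fix one value of the time dimension and keep the other dimensions."""
    return {key[1:]: value for key, value in cells.items() if key[0] == quarter}

def roll_up_by_product(cells):
    """Consolidation: aggregate away the time and location dimensions."""
    totals = {}
    for (_, product, _), value in cells.items():
        totals[product] = totals.get(product, 0) + value
    return totals

def drill_down(cells, product):
    """Drill-down: show the detail cells behind one consolidated product total."""
    return {key: value for key, value in cells.items() if key[1] == product}

q1 = slice_cube(cube, "Q1")
totals = roll_up_by_product(cube)
detail = drill_down(cube, "Apples")
```

Drill-down and consolidation are inverses here: the detail cells for a product always sum to its consolidated total.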

Page 50: 01 Presentation DWH

Data Mining

Data Mining is also called “Knowledge Discovery in Databases (KDD)”.

Data Mining also refers to “using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in the areas such as decision support, prediction, forecasting and estimation. The data is often voluminous, but as it stands of low value as no direct use can be made of it; it is the hidden information in the data that is useful.”

Page 51: 01 Presentation DWH

Applications of Data Mining

Data mining has varied fields of application, some of which are listed below:

RETAIL/ MARKETING
• Identify buying patterns from customers
• Find associations among customer demographic characteristics
• Predict response to mailing campaigns

BANKING
• Detect patterns of fraudulent credit card use
• Identify loyal customers
• Determine credit card spending by customer groups
• Find hidden correlations between different financial indicators

Page 52: 01 Presentation DWH


Who uses Data Warehouse

• Managers use sales data to improve forecasting & planning for brands, product lines & business areas.

• Retail purchasing managers use DW to track fast-moving lines & ensure an adequate supply of high demand products.

• Financial analysts use warehouses to manage currency & exchange exposures, oversee cash flow & monitor capital expenditures.

Page 53: 01 Presentation DWH


Questions

Page 54: 01 Presentation DWH

Introduction to Data Modeling

Page 55: 01 Presentation DWH

Objectives

• At the end of this lesson, you will know :
  – Data Modeling for Data Warehouse
  – What are dimensions and facts
  – Star Schema and Snowflake Schemas
  – Factless Tables
  – Some modeling tools

Page 56: 01 Presentation DWH

Data Modeling for Data Warehouse

• How to structure the data in your data warehouse?
• Process that produces abstract data models for one or more database components of the data warehouse
• Modeling for Warehouse is different from that for Operational database
  – Dimensional Modeling, Star Schema Modeling or Fact/Dimension Modeling

Page 57: 01 Presentation DWH

Modeling Techniques

• Entity-Relationship Modeling
  – Traditional modeling technique
  – Technique of choice for OLTP
  – Suited for corporate data warehouse
• Dimensional Modeling
  – Analyzing business measures in the specific business context
  – Helps visualize very abstract business questions
  – End users can easily understand and navigate the data structure

Page 58: 01 Presentation DWH


Entity-Relationship Modeling - Basic Concepts

• The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements.

• The highest art form of ER modeling is to remove all redundancy in the data.

• Creates databases that cannot be queried!

Page 59: 01 Presentation DWH

An Order Processing ER Model

(Diagram: ER model with Order Header and Order Details linked by foreign keys (FK) to Customer Table, Item Table and Salesrep table, with further tables for City, Sales District, Sales Region, Sales Country, Product Brand and Product Category.)

Page 60: 01 Presentation DWH

Entity-Relationship Modeling - Basic Concepts

• Entity
  – Object that can be observed and classified by its properties and characteristics
  – Business definition with a clear boundary
  – Characterized by a noun
  – Example
    • Product
    • Employee

Page 61: 01 Presentation DWH

Entity-Relationship Modeling - Basic Concepts

• Relationship
  – Relationship between entities - structural interaction and association
  – Described by a verb
  – Cardinality
    • 1-1
    • 1-M
    • M-M
  – Example : Books belong to Printed Media

Page 62: 01 Presentation DWH

Entity-Relationship Modeling - Basic Concepts

• Attributes
  – Characteristics and properties of entities
  – Example :
    • Book Id, Description, book category are attributes of entity “Book”
  – Attribute name should be unique and self-explanatory
  – Primary Key, Foreign Key, Constraints are defined on Attributes

Page 63: 01 Presentation DWH

Entity-Relationship Modeling – Why Not ?

• End users cannot understand or remember an ER model.
• No graphical user interface (GUI) takes a general ER model and makes it usable by end users.
• Software cannot usefully query a general ER model.
• Use of the ER modeling technique defeats the basic allure of data warehousing, namely intuitive and high-performance retrieval of data.

Page 64: 01 Presentation DWH


Dimensional Modeling - Basic Concepts

• Represents the data in a standard, intuitive framework that allows for high-performance access;

• Schema designed to process large, complex, adhoc and data intensive queries.

• No concern for concurrency, locking and insert/update/delete performance

• Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.

• This characteristic "star-like" structure is often called a star join.

Page 65: 01 Presentation DWH


Star Schema Architecture

Page 66: 01 Presentation DWH


Star Schema Example

Page 67: 01 Presentation DWH


Star Schema with Sample Data

Page 68: 01 Presentation DWH

Star Schema Architecture

Sales Fact: time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost
Time Dimension: time_key, day_of_week, month, quarter, year, holiday_flag
Product Dimension: product_key, description, brand, category
Store Dimension: store_key, store_name, address, floor_plan_type
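The schema on this slide can be sketched as table definitions. The example below uses Python's built-in `sqlite3` module with an in-memory database; the column names come from the slide, while the table names (`time_dim`, `product_dim`, `store_dim`, `sales_fact`) and the column types are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day_of_week TEXT, month INTEGER,
    quarter INTEGER, year INTEGER, holiday_flag INTEGER
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT, category TEXT
);
CREATE TABLE store_dim (
    store_key INTEGER PRIMARY KEY, store_name TEXT, address TEXT, floor_plan_type TEXT
);
-- The fact table's multipart key is made of one foreign key per dimension.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL, units_sold INTEGER, dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key)
);
""")

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Each dimension has a single-part key, and the fact table combines those keys into its own primary key, which is the shape the following slides describe.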

Page 69: 01 Presentation DWH


Star Schema Architecture

The previous example shows a STAR Schema

The reason for this name is that your query takes on the shape of a star.

The Fact table is the body of the star and the dimension tables are the points of the star.

In the star schema design, a single object (the fact table) sits in the middle and is radially connected to the other surrounding tables(dimension tables) and looks like a STAR.

Page 70: 01 Presentation DWH

Star Schema Architecture

FACT TABLES

• The fact table is where the numerical measurements of the business are stored.
• It typically represents a business transaction or event that can be used in analyzing a business process.
• Sparse
• Access control to sensitive information is maintained in fact tables.
• These tables can be very large, with as many as several billion rows.

Page 71: 01 Presentation DWH

Star Schema Architecture

Dimension Tables

• The dimension tables are where the textual descriptions of the dimensions of the business are stored.
• Dimension tables are designed especially for selection and grouping.
• There is no access control on these tables; all users can view this information.
• These tables are much smaller than the fact tables and may contain, for example, 10,000 rows of data.

Page 72: 01 Presentation DWH

Star Schema Architecture

• Dimension Tables
  – Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
  – Dimension tables most often contain descriptive textual information.
  – They determine the contextual background for facts.
  – Examples :
    • Time
    • Location/Region
    • Customers

Page 73: 01 Presentation DWH

Star Schema Architecture

• The database consists of a single fact table and a single table for each dimension.
• Each tuple in the fact table consists of a pointer (foreign key) to each of the dimension tables.
• Each dimension table consists of columns that correspond to attributes of the dimension.

Page 74: 01 Presentation DWH

Star Schema Architecture

• A key role for dimension table attributes is to serve as the source of constraints in a query or to serve as row headers in the user’s answer set.
• For example, a typical answer set returned from a query looks like this:

Brand | Dollar Sales | Unit Sales
Axon | 780 | 263
Framis | 1044 | 509
Widget | 213 | 444
Zapper | 95 | 39

Page 75: 01 Presentation DWH

Star Schema Architecture

• This query finds all the product brands (collections of individual products) that were sold in the first quarter of 1995 and presents the total dollar sales as well as the number of units sold. Attributes of both the product and time dimensions are used: the product dimension provides the row headers (product brands) and the time dimension provides the constraint (first quarter of 1995).
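A query of this shape can be sketched against a cut-down star schema using Python's built-in `sqlite3` module. The table names and the individual fact rows are hypothetical; they are chosen so that two of the brand totals match the figures quoted in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact (time_key INTEGER, product_key INTEGER,
                         dollars_sold REAL, units_sold INTEGER);
INSERT INTO time_dim VALUES (1, 1, 1995), (2, 2, 1995);
INSERT INTO product_dim VALUES (10, 'Axon'), (11, 'Framis');
INSERT INTO sales_fact VALUES (1, 10, 500, 200), (1, 10, 280, 63),
                              (1, 11, 1044, 509), (2, 10, 999, 1);
""")

rows = conn.execute("""
    SELECT p.brand, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN time_dim AS t ON t.time_key = f.time_key
    WHERE t.year = 1995 AND t.quarter = 1   -- constraint from the time dimension
    GROUP BY p.brand                         -- row header from the product dimension
    ORDER BY p.brand
""").fetchall()
print(rows)
```

The second-quarter fact row is excluded by the time constraint, so only first-quarter sales contribute to each brand total.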

Page 76: 01 Presentation DWH

Components of a Star Schema

Fact table
Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey (dimensional keys, together forming the multipart key); RequiredDate... (measures)

Dimension tables
Employee_Dim: EmployeeKey, EmployeeID...
Time_Dim: TimeKey, TheDate...
Product_Dim: ProductKey, ProductID...
Customer_Dim: CustomerKey, CustomerID...
Shipper_Dim: ShipperKey, ShipperID...

Page 77: 01 Presentation DWH

Fact Table & Dimension Tables

• Fact Tables: numerical measurements of the business are stored in fact tables.
• Dimensional Tables: dimensions are attributes about facts.

Page 78: 01 Presentation DWH

Dimension Hierarchies

• For each dimension, the set of associated attributes can be structured as a hierarchy

(Diagram: store -> sType; store -> city -> region; customer -> city -> state -> country)

Page 79: 01 Presentation DWH

Dimension Hierarchies

store
storeId | cityId | tId | mgr
s5 | sfo | t1 | joe
s7 | sfo | t2 | fred
s9 | la | t1 | nancy

city
cityId | pop | regId
sfo | 1M | north
la | 5M | south

region
regId | name
north | cold region
south | warm region

sType
tId | size | location
t1 | small | downtown
t2 | large | suburbs
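Walking this hierarchy can be sketched with plain dictionaries. The store, city and region rows below are the sample rows as read from the slide; the per-store sales figures and the helper `region_of` are hypothetical additions for the example.

```python
# Sample rows from the store, city and region tables, keyed by their ids.
store = {"s5": {"cityId": "sfo", "tId": "t1", "mgr": "joe"},
         "s7": {"cityId": "sfo", "tId": "t2", "mgr": "fred"},
         "s9": {"cityId": "la", "tId": "t1", "mgr": "nancy"}}
city = {"sfo": {"pop": "1M", "regId": "north"},
        "la": {"pop": "5M", "regId": "south"}}
region = {"north": "cold region", "south": "warm region"}

def region_of(store_id):
    """Walk the store -> city -> region hierarchy for one store."""
    city_id = store[store_id]["cityId"]
    return city[city_id]["regId"]

# Hypothetical per-store sales, rolled up to the region level.
store_sales = {"s5": 100, "s7": 50, "s9": 30}
sales_by_region = {}
for store_id, amount in store_sales.items():
    reg = region_of(store_id)
    sales_by_region[reg] = sales_by_region.get(reg, 0) + amount
```

The same walk could stop at the city level instead, which is how one hierarchy supports several levels of aggregation.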

Page 80: 01 Presentation DWH

Snowflake Schema

Snowflake schema: A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables

Page 81: 01 Presentation DWH

Snowflake Schema

• Dimension tables are normalized by decomposing at the attribute level
• Each dimension has one key for each level of the dimension’s hierarchy
• Good performance when queries involve aggregation
• Complicated maintenance and metadata, explosion in number of tables
• Makes user representation more complex and intricate

Page 82: 01 Presentation DWH

Snowflake schema - Example

(Diagram: a fact table (FactTable) connected to four dimension tables (DimTable))

Page 83: 01 Presentation DWH

Using a Snowflake Schema

Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate...
Product_Dim: ProductKey, Product Name, Product Size, Product Brand ID
Product_Brand_ID: Product Brand, Product Category ID
Product_Category_ID: Product Category, Product Category ID
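The effect of snowflaking the product dimension can be sketched with Python's built-in `sqlite3` module. The table and column names below are simplified stand-ins for those on the slide, and the sample product, brand and category are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE product_brand (brand_id INTEGER PRIMARY KEY, brand TEXT,
                            category_id INTEGER REFERENCES product_category(category_id));
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT,
                          product_size TEXT,
                          brand_id INTEGER REFERENCES product_brand(brand_id));
INSERT INTO product_category VALUES (1, 'Snacks');
INSERT INTO product_brand VALUES (7, 'Crunchy', 1);
INSERT INTO product_dim VALUES (100, 'Crunchy Chips', '200g', 7);
""")

# Reaching the category from a product takes two joins in the snowflake form.
row = conn.execute("""
    SELECT d.product_name, b.brand, c.category
    FROM product_dim AS d
    JOIN product_brand AS b ON b.brand_id = d.brand_id
    JOIN product_category AS c ON c.category_id = b.category_id
    WHERE d.product_key = 100
""").fetchone()
print(row)
```

In a plain star schema the brand and category would be columns of the product dimension itself, so the same lookup would need no extra joins; that trade-off is what the snowflake slides describe.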

Page 84: 01 Presentation DWH

Conformed Dimensions

• Dimension that means the same thing with every possible fact table that it can be joined with
• Conformed dimensions are most essential
  – For the Bus Architecture
  – Integrated function of the Data Warehouse
• Some common dimensions are :
  – Customer
  – Product
  – Location
  – Time

Page 85: 01 Presentation DWH

Surrogate Keys

• Tables (facts and dimensions) should use Data Warehouse generated surrogate keys rather than production keys
  – Production keys sometimes get reused
  – In case of mergers/acquisitions, protects you from different key formats
  – Production systems may change their systems to generalize key definitions
  – Using surrogate keys will be faster
  – Can handle Slowly Changing Dimensions well
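One way to picture surrogate key assignment is a lookup that hands out warehouse-owned integers. The class `SurrogateKeyGenerator` and the source system names are hypothetical; a real warehouse would typically keep this mapping in a table and use a database sequence.

```python
import itertools

class SurrogateKeyGenerator:
    """Assign warehouse-owned integer keys to (source system, production key) pairs."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._lookup = {}

    def key_for(self, source_system, production_key):
        natural = (source_system, production_key)
        if natural not in self._lookup:
            # First sighting of this production key: allocate a new surrogate.
            self._lookup[natural] = next(self._next_key)
        return self._lookup[natural]

keys = SurrogateKeyGenerator()
a = keys.key_for("erp", "CUST-001")  # first sighting gets a new key
b = keys.key_for("crm", 1001)        # a different key format shares the same key space
c = keys.key_for("erp", "CUST-001")  # repeat lookup returns the same key
```

Because the warehouse owns the integers, differing or changing production key formats do not affect the fact tables that reference them.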

Page 86: 01 Presentation DWH

Slowly Changing Dimensions

Certain kinds of dimension attribute changes need to be handled differently in Data Warehouse

• Type I - Overwrite
  – e.g. name correction, description changes
• Type II - Partition History
  – Packing change, customer movement
  – Create a new dimension record with new surrogate key
• Type III - Organizational changes
  – Sales force reorganization
  – Show sales broken down by new and old organizations
  – Need to create an old and a new field
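The three types can be contrasted in a short sketch operating on dictionary rows. The field names, the `start_date`/`end_date` columns used to delimit Type II history, and the sample customer are assumptions made for illustration; the slide itself only specifies the overwrite, new-record and old/new-field behaviours.

```python
from datetime import date

def apply_type1(row, attribute, new_value):
    """Type I: overwrite in place; no history is kept."""
    row[attribute] = new_value

def apply_type2(rows, current_row, attribute, new_value, next_key, change_date):
    """Type II: close the current row and add a new row with a new surrogate key."""
    current_row["end_date"] = change_date
    new_row = dict(current_row, surrogate_key=next_key,
                   start_date=change_date, end_date=None)
    new_row[attribute] = new_value
    rows.append(new_row)
    return new_row

def apply_type3(row, attribute, new_value):
    """Type III: keep the previous value in an 'old_' field beside the new one."""
    row["old_" + attribute] = row[attribute]
    row[attribute] = new_value

customer = {"surrogate_key": 1, "customer_id": "C1", "name": "Jon Smth",
            "city": "Pune", "sales_org": "West",
            "start_date": date(2020, 1, 1), "end_date": None}
dimension = [customer]

apply_type1(customer, "name", "Jon Smith")        # name correction
apply_type3(customer, "sales_org", "South-West")  # sales force reorganization
moved = apply_type2(dimension, customer, "city", "Mumbai", 2, date(2024, 6, 1))
```

After the Type II change the dimension holds two rows for the same customer, so facts recorded before the move keep pointing at the old surrogate key.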

Page 87: 01 Presentation DWH

Factless Fact Tables

• For event tracking, e.g. attendance

(Diagram: a central fact table containing only the keys Date_Key, Student_Key, Course_Key, Teacher_Key and Facility_Key, joined to the Date, Student, Course, Teacher and Facility dimensions)
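A factless fact table is analyzed by counting rows rather than summing a measure. The sketch below uses the five keys from the slide as tuple positions; the key values themselves are hypothetical.

```python
from collections import Counter

# Each row records that an attendance event happened; there is no numeric measure.
# Row layout: (date_key, student_key, course_key, teacher_key, facility_key)
attendance_fact = [
    (20240101, 1, 10, 100, 5),
    (20240101, 2, 10, 100, 5),
    (20240102, 1, 10, 100, 5),
    (20240102, 1, 11, 101, 6),
]

# Attendance per course is simply the number of rows for that course key.
attendance_per_course = Counter(row[2] for row in attendance_fact)

# Distinct students seen in each facility.
students_per_facility = {}
for _, student, _, _, facility in attendance_fact:
    students_per_facility.setdefault(facility, set()).add(student)
```

The existence of a row is the fact, which is why the table needs no measure columns.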

Page 88: 01 Presentation DWH

Examples of Data Modeling Tools

• ERWIN
  – Supports Data Warehouse design as a modeling technique
• Powersoft WarehouseArchitect
  – Module of Power Designer specifically for DW Modeling
• Oracle Designer
  – Can be extended for Warehouse modeling
• Others like Infomodeler, Silverrun are also used

Page 89: 01 Presentation DWH


Questions