Datawarehousing Concepts | 7.0 9/7/2015 Datawarehousing Concepts.


Datawarehousing Concepts

© 2 04/19/23Datawarehousing Concepts | 7.0

Objectives

The participants will be able to:

Discuss the basic concepts of data warehousing

Explain the business need for a decision support system

Define data warehouse features such as KPI, fact, and dimension

Describe the architecture of a data warehouse

Describe the terms OLTP and OLAP and explain the difference between them

Describe an Entity Relationship Diagram with the help of an example

Describe the classical star schema

Explain different variations of the classical star schema


Topics

Business need for decision support system

Datawarehouse definitions

Features of Datawarehouse

Entity Relationship diagram

Classical Star Schema

Different variations of Classical Star Schema


Business need for decision support system

A decision support system needs to meet the following demands made by decision makers:

Immediate, single-point access to all relevant information, regardless of source.

Coverage of all business processes.

High quality of information, not only in terms of data content but also in terms of the ability to evaluate data flexibly.

High-quality decision-making support: the data warehouse must be developed and structured on the basis of the requirements of operative and strategic management.

Short implementation time with fewer resources: as well as being quick to implement, a data warehouse must enable simple and quick access to relevant data.


Datawarehouse Definitions

Data warehousing is a tool dedicated to the delivery of information that advances decision making, improves business practices, and empowers business users. It does this by:

Integrating data from multiple sources, internal and external.

Providing subject-oriented views of the business through current and historical data.

Providing a consistent data repository as a platform for analyzing information from different sources.


Datawarehouse Definitions

Data Extraction & Loading: gathering data from operational systems (ERP / legacy), cleansing data, aggregating the data

Data Warehouse: optimized for performance, storing historical data, building the schema (star schema)

The OLAP Cube: multi-dimensional modeling, front-end access tools
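
The gathering, cleansing, and aggregating steps listed under Data Extraction & Loading can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the sample rows, and the helper names `cleanse` and `aggregate` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from an operational (ERP / legacy) system.
raw_rows = [
    {"customer": " c100 ", "material": "M1111", "amount": "50000"},
    {"customer": "C100", "material": "M1111", "amount": "25000"},
    {"customer": "C200", "material": "M2222", "amount": None},  # incomplete record
    {"customer": "C200", "material": "M2222", "amount": "300"},
]


def cleanse(rows):
    """Drop incomplete rows and normalize key formatting and types."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append({
            "customer": row["customer"].strip().upper(),
            "material": row["material"].strip().upper(),
            "amount": int(row["amount"]),
        })
    return cleaned


def aggregate(rows):
    """Sum the amount per (customer, material) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["customer"], row["material"])] += row["amount"]
    return dict(totals)


warehouse_rows = aggregate(cleanse(raw_rows))
print(warehouse_rows)
```

Cleansing runs before aggregation so that differently formatted keys for the same customer are not summed separately.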


Facts and Dimension

Fact: The information that business users want to know

The performance measures of the business

Facts are numbers or percentages

Sales volume, sales quantity, etc. can be considered facts

Dimension: How the data needs to be viewed, for example by Sales Organization or Distribution Channel

A data model is based on: Business Objectives, Business Strategy


Key Performance Indicators (KPI)

Measure categories: Internal Process Measures, Innovation and Learning Measures, Customer Measures, Financial Measures

Example measures: % Sales of New Products, Customers Acquired, Customer Satisfaction, Market Share, ROI and ROA, Revenue Growth, Product Time to Market, Unit Manufacturing Cost, Days Supply to Inventory, New Product Introduction, Management Skills, Employee Turnover


Generic Data warehouse Architecture

Source Data: Invoicing Systems, Purchasing Systems, General Ledger, Ext. Data Sources, Other Int. Systems

Data Extraction, Integration and Cleansing Processes: Extract, Operational Data Store, Transformation (Translate, Attribute, Calculate, Derive, Synchronize, Summarize)

Data Warehouse and Data Marts: Purchasing, Marketing and Sales, Corporate Information, Product Line, Location, Summation, Functional Area (Segmented Data Subsets, Summarized Data)

Applications: Custom Developed Applications, Query Access Tools, Data Mining, Statistical Programs


Distinction between the operative and informative environments


OLTP Systems compared to OLAP Systems

Criterion | OLTP Systems | OLAP Systems
Target | Efficiency through automation of business processes | Generation of knowledge (competitive advantage)
Priorities | High availability, higher data volume | Simple use, flexible data access
View of data | Detailed | Frequently aggregated
Database operations | Add, change, delete (refresh) and read | Read
Typical data structures | Relational (flat tables, high normalization) | Multi-dimensional structures
Integration of data from various modules/applications | Minimal | Comprehensive


OLTP Systems compared to OLAP Systems…contd(1)

Criterion | OLTP Systems | OLAP Systems
Dataset | Dynamic, short lived (60-90 days) | Static; historical (2+ years)
Orientation | Application oriented | Subject oriented
Purpose | Day-to-day operations | Planning & knowledge based functions
Processing | Highly structured repetitive processing | Highly unstructured analytical processing
User base | Mostly operational community | Mostly managerial community
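
The contrast in database operations can be sketched with Python's built-in `sqlite3` module: OLTP-style work is many small add, change, and delete operations on individual rows, while OLAP-style work is a read that aggregates many rows. The `orders` table, its columns, and its rows are hypothetical and serve only as an illustration; a real OLAP system would normally query a separate, historical data store rather than the transactional table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount INTEGER)"
)

# OLTP style: small add / change / delete operations on individual rows,
# grouped in one transaction (committed on success, rolled back on error).
with conn:
    conn.execute("INSERT INTO orders VALUES (1, 'C100', 2003, 400)")
    conn.execute("INSERT INTO orders VALUES (2, 'C100', 2004, 500)")
    conn.execute("INSERT INTO orders VALUES (3, 'C200', 2004, 700)")
    conn.execute("INSERT INTO orders VALUES (4, 'C200', 2004, 100)")
    conn.execute("UPDATE orders SET amount = 550 WHERE order_id = 2")
    conn.execute("DELETE FROM orders WHERE order_id = 4")

# OLAP style: a read-only query that aggregates many rows at once.
totals_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(totals_by_year)
conn.close()
```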


OLAP, MOLAP, ROLAP, HOLAP

OLAP: On Line Analytical Processing

MOLAP: Multidimensional OLAP. A multidimensional database and an analytical engine, e.g. EssBase from Arbor Software

ROLAP: Relational OLAP. An analytical engine that front-ends a relational DB: data is stored in a relational DBMS and the engine builds multidimensional views of the data

HOLAP: Hybrid OLAP. A combination of relational OLAP and multidimensional OLAP


Entity Relationship Diagram


Building an Entity Relationship Diagram

Developing an ERD requires an understanding of the system and its components.

Consider a hospital: Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be assigned a single doctor, but in rare cases they will have two.

Healthcare assistants also attend to the patients; a number of these are associated with each ward.

Initially the system will be concerned solely with drug treatment. Each patient is required to take a variety of drugs a certain number of times per day and for varying lengths of time.

The system must record details concerning patient treatment and staff payment. Some staff are paid on a part-time basis, and doctors and care assistants work varying amounts of overtime at varying rates (subject to grade).

The system will also need to track which treatments are required for which patients and when, and it should be capable of calculating the cost of treatment per week for each patient (though it is currently unclear to what use this information will be put).
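
One possible reading of the hospital description, written as Python dataclasses, is sketched below. It is not the diagram from the slides: the attribute names, the `cost_per_dose` field, the one-ward-per-assistant rule, and the simplified weekly cost (only the first week of each prescription) are all assumptions made for illustration, and staff payment is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ward:
    name: str


@dataclass
class Doctor:
    name: str


@dataclass
class HealthcareAssistant:
    name: str
    ward: Ward  # assumption: each assistant belongs to exactly one ward


@dataclass
class Drug:
    name: str
    cost_per_dose: float  # assumed attribute, not given in the slides


@dataclass
class Prescription:
    drug: Drug
    doses_per_day: int
    duration_days: int


@dataclass
class Patient:
    name: str
    ward: Ward  # a patient is treated in a single ward
    doctors: List[Doctor] = field(default_factory=list)
    prescriptions: List[Prescription] = field(default_factory=list)

    def assign_doctor(self, doctor: Doctor) -> None:
        # Usually one doctor per patient, in rare cases two.
        if len(self.doctors) >= 2:
            raise ValueError("a patient can have at most two doctors")
        self.doctors.append(doctor)

    def first_week_drug_cost(self) -> float:
        # Simplification: cost of the first week of each prescription.
        return sum(
            p.drug.cost_per_dose * p.doses_per_day * min(p.duration_days, 7)
            for p in self.prescriptions
        )


ward = Ward("Ward A")
patient = Patient("Patient 1", ward)
patient.assign_doctor(Doctor("Doctor 1"))
patient.prescriptions.append(Prescription(Drug("Drug X", 2.0), 3, 10))
patient.prescriptions.append(Prescription(Drug("Drug Y", 0.5), 2, 5))
weekly_cost = patient.first_week_drug_cost()
print(weekly_cost)
```

Modeling `Prescription` as its own entity reflects that the link between a patient and a drug carries data of its own (frequency and duration), which is the usual reason for resolving a many-to-many relationship into a separate entity.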


Building an Entity Relationship Diagram…contd(1)


Classical Star Schema

Fact: Customer ID, Material ID, Time ID, Sales Volume, Sales Quantity

Customer dimension: Customer ID, Customer name, City, Region

Time dimension: Time ID, Month, Quarter, Year

Material dimension: Material ID, Material Name, Material Group


Dimension Tables

Customer Dimension Table

Customer id | Customer name | City | Region
C100 | David | London | North
C200 | Peter | Paris | West

Material Dimension Table

Material id | Material name | Material Group | …..
M1111 | Hard Disc | Hardware | …..
M2222 | Keyboard | Software | ….

Time Dimension Table

Time id | Month | Quarter | Year
07.01.2004 | 01.2004 | Q1/2004 | 2004
05.08.2004 | 08.2004 | Q3/2004 | 2004


Fact Table

Time id | Customer id | Material id | Sales Volume | Quantity
07.01.2004 | C100 | M1111 | 50,000 | 100
07.01.2004 | C100 | M2222 | 3,000 | 60
07.01.2004 | C200 | M1111 | 100,000 | 250
07.01.2004 | C200 | M2222 | 10,000 | 250
05.08.2004 | C100 | M1111 | 25,000 | 50
05.08.2004 | C200 | M2222 | 300 | 6
…. | …. | …. | …. | ….
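
The dimension and fact tables shown on these slides can be loaded into an in-memory SQLite database to see how a star-schema query joins the fact table to its dimensions. This is a minimal sketch: the table and column names are assumed snake_case renderings of the slide labels, and only the sample rows from the slides are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_id TEXT PRIMARY KEY, customer_name TEXT, city TEXT, region TEXT);
CREATE TABLE material_dim (
    material_id TEXT PRIMARY KEY, material_name TEXT, material_group TEXT);
CREATE TABLE time_dim (
    time_id TEXT PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE sales_fact (
    time_id TEXT REFERENCES time_dim(time_id),
    customer_id TEXT REFERENCES customer_dim(customer_id),
    material_id TEXT REFERENCES material_dim(material_id),
    sales_volume INTEGER,
    quantity INTEGER);
""")

conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?, ?)", [
    ("C100", "David", "London", "North"),
    ("C200", "Peter", "Paris", "West"),
])
conn.executemany("INSERT INTO material_dim VALUES (?, ?, ?)", [
    ("M1111", "Hard Disc", "Hardware"),
    ("M2222", "Keyboard", "Software"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    ("07.01.2004", "01.2004", "Q1/2004", 2004),
    ("05.08.2004", "08.2004", "Q3/2004", 2004),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("07.01.2004", "C100", "M1111", 50000, 100),
    ("07.01.2004", "C100", "M2222", 3000, 60),
    ("07.01.2004", "C200", "M1111", 100000, 250),
    ("07.01.2004", "C200", "M2222", 10000, 250),
    ("05.08.2004", "C100", "M1111", 25000, 50),
    ("05.08.2004", "C200", "M2222", 300, 6),
])

# View the facts by two dimensions: customer region and quarter.
rows = conn.execute("""
    SELECT c.region, t.quarter, SUM(f.sales_volume), SUM(f.quantity)
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_id = f.customer_id
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY c.region, t.quarter
    ORDER BY c.region, t.quarter
""").fetchall()
for row in rows:
    print(row)
conn.close()
```

Each dimension is reached with a single join from the fact table, which is the property that gives the star schema its name.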


Classical Star Schema

Customer Dimension Table

Customer id | Customer name
C100 | David
C200 | Peter

Material Dimension Table

Material id | Material name | …..
M1111 | Hard Disc | …..
M2222 | Keyboard | ….

Time Dimension Table

Time id | Month | ….
07.01.2004 | 01.2004 | ….
05.08.2004 | 08.2004 | ….

Fact Table

Time id | Customer id | Material id | Sales Volume | Quantity
07.01.2004 | C100 | M1111 | 50,000 | 100
07.01.2004 | C100 | M2222 | 3,000 | 60
…. | …. | …. | …. | ….


Multidimensional Analysis of Data


Multidimensional Analysis of Data Contd..


Snowflake Schema

Fact: Customer ID, Material ID, Time ID, Sales Volume, Sales Quantity

Material dimension: Material Name (Material ID), Material Group (Material ID)

Customer dimension: Customer Name (Customer ID), City (Customer ID), Region

Time dimension: Month (Time ID), Quarter, Year
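
A snowflake schema normalizes dimension tables, which removes repeated values but adds joins at query time. The sketch below shows this for the material dimension only, using one common normalization in which the material group moves into its own table. The table names, the `group_id` key, and its values `G1` and `G2` are hypothetical; the slide's diagram may split the tables differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE material_group (
    group_id TEXT PRIMARY KEY, group_name TEXT);
CREATE TABLE material_dim (
    material_id TEXT PRIMARY KEY, material_name TEXT,
    group_id TEXT REFERENCES material_group(group_id));
CREATE TABLE sales_fact (
    time_id TEXT, customer_id TEXT,
    material_id TEXT REFERENCES material_dim(material_id),
    sales_volume INTEGER, quantity INTEGER);
""")

# The group name is stored once per group instead of once per material.
conn.executemany("INSERT INTO material_group VALUES (?, ?)", [
    ("G1", "Hardware"),
    ("G2", "Software"),
])
conn.executemany("INSERT INTO material_dim VALUES (?, ?, ?)", [
    ("M1111", "Hard Disc", "G1"),
    ("M2222", "Keyboard", "G2"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("07.01.2004", "C100", "M1111", 50000, 100),
    ("07.01.2004", "C100", "M2222", 3000, 60),
    ("07.01.2004", "C200", "M1111", 100000, 250),
    ("07.01.2004", "C200", "M2222", 10000, 250),
    ("05.08.2004", "C100", "M1111", 25000, 50),
    ("05.08.2004", "C200", "M2222", 300, 6),
])

# Reaching the group name now takes two joins instead of one.
rows = conn.execute("""
    SELECT g.group_name, m.material_name, SUM(f.sales_volume)
    FROM sales_fact AS f
    JOIN material_dim AS m ON m.material_id = f.material_id
    JOIN material_group AS g ON g.group_id = m.group_id
    GROUP BY g.group_name, m.material_name
    ORDER BY g.group_name, m.material_name
""").fetchall()
for row in rows:
    print(row)
conn.close()
```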


Summary of Datawarehousing and Modeling

A data warehouse reflects a subject-oriented view of data suitable for analysis.

A data warehouse provides high-quality information to support decision making in an organization.

KPIs are a set of measures derived from strategies, goals, and objectives.

Facts are numeric measures; dimensions are the perspectives from which a fact is viewed.

A generic data warehouse architecture consists of source systems, extraction, transformation and loading, data storage, and analysis.

OLTP is best suited to transactional systems (insert/update/delete), whereas OLAP is best suited to analytical purposes (executing ad hoc queries).


Summary of Datawarehousing and Modeling…contd(1)

A classical star schema consists of a single fact table surrounded by large denormalized dimension tables.

Dimension tables are linked relationally to the fact table through foreign key to primary key relationships.

Multi-dimensional modeling represents a dimensional view of data suitable for analysis.

A snowflake schema is a variation of the star schema in which dimension tables are normalized to eliminate redundancy, at the cost of a larger number of table joins.