Data Warehousing – An Introductory Perspective DWCC BBSR DWCC BBSR.


Page 1: Data Warehousing – An Introductory Perspective DWCC BBSR DWCC BBSR.

Data Warehousing – An Introductory Perspective

DWCC BBSR

Page 2:

Agenda

• Why Data Warehouse
• Definition and Architecture
• Terminology

Page 3:

The Business Need

Business decisions are not made by rolling dice.

We don’t know what we don’t know.

I think... errrr, I guess so.

Page 4:

Current Business Environment

• Competitive
• Ever changing
• Chaotic
• Global
• Urgency to make decisions

Competitive advantage stems from well-informed decisions based on an understanding of:
• Your products
• Your customers' preferences
• The competition
• Your own company's strengths

Page 5:

The Value Pyramid

Each layer provides value en route to a targeted business outcome.

Pyramid layers, from top to base:
• Business Outcomes
• Actions
• Decisions
• Knowledge
• Information
• Data

Outcomes shown: increased revenue, increased productivity, reduced costs, competitive advantage.

Page 6:

Definitions

• A collection of integrated, subject-oriented databases designed to support the DSS (decision support system) function, where each unit of data is relevant at some moment of time (Inmon 1991)

• A copy of transaction data specifically structured for query and analysis (Kimball 1996)

• A data warehouse is NOT a specific technology. It is a series of processes, procedures and tools that help the enterprise understand more about itself, its products, its customers and the market it serves.

• It is NOT possible to purchase a data warehouse, but it is possible to build one.

Page 7:

FEATURES

• Non-volatile: Used mainly for reporting and independent of the transactional data; once loaded, data is not changed by day-to-day transactions.

• Subject orientation: All relevant data about a subject is stored together, e.g. Sales, Finance, Marketing, Customer data.

• Historical data: Can contain several years of data, depending on company requirements.


Page 8:

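Subject Orientation

[Figure: operational systems organized by application versus a data warehouse organized by subject]

Operational (applications): Auto, Health, Life, Casualty

Data warehouse (subjects): Customer, Policy, Premium, Claims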

Page 9:

Goals and Applications

Goals of a Data Warehouse:
• Provide reliable, high-performance access
• Consistent view of data: same query, same data. All users should be warned if a data load has not come in.
• Slice and dice capability
• Quality of data is a driver for business re-engineering.

Data Warehousing Applications:
• Customer profitability analysis
• Customer satisfaction and retention
• Buyer behavior
• Pricing, promotion analysis
• Market research
• Inventory optimization

Page 10:

OLTP v/s Data Warehouse

OLTP systems run the business; data warehouses tell you how to run the business.

Characteristic     OLTP                     Data Warehouse
Orientation        Transaction              Analysis
Data access        Record at a time         Set at a time
Updates            Frequent & unscheduled   Periodic & scheduled
Response time      Seconds required         Minutes acceptable
Concurrent users   Many                     Few
Availability       Guaranteed               As needed
Data structures    Highly normalized        Often de-normalized
Data nature        Current                  Historical

Page 11:

If:
• Most of your business needs are to report on data in a single transaction processing system
• All the historical data you need is in the system
• Data in the system is clean
• Your hardware can support reporting against the live system data
• The structure of the system data is relatively simple
• Your firm does not have much interest in end-user ad hoc query/report tools

then data warehousing may not be for your business!

Page 12:

Modeling Constructs

• Entity Relationship Diagram
• Star schema
• Snowflake schema

Within the implementation of a warehouse, several of these constructs may be integrated to form an optimal design

Page 13:

Entity Relationship Diagram

• Based on set theory and SQL

• Highly normalized

• Optimized for update and fast transaction turnaround

• Not suited for querying in a data warehouse environment

• Diagrams like these are very difficult for users to visualize and memorize.

Page 14:

Star Schema

A central fact table surrounded by a number of dimension tables.

Dimensions are the business entities by which calculations are done. Their attributes can be numeric or alphanumeric. Example: a Product table comprising brand name, category, packaging type and size.

Facts are numerical measurements of the business with respect to dimensions. They are numeric and, ideally, additive (summable across any combination of dimensions).

E.g. a sales fact table could contain time, product and store keys along with dollars sold, units sold and dollars cost.
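A minimal sketch of the sales star schema described above, using Python's built-in sqlite3 module. The table and column names (dim_product, fact_sales and so on) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: one key per dimension plus additive numeric measures.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    dollars_cost REAL
);

INSERT INTO dim_time    VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO dim_product VALUES (1, 'BrandA', 'Snacks'), (2, 'BrandB', 'Drinks');
INSERT INTO dim_store   VALUES (1, 'CityX'), (2, 'CityY');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 100.0, 10, 60.0),
    (1, 2, 1,  50.0,  5, 30.0),
    (2, 1, 2, 200.0, 20, 120.0);
""")

def sales_by_category(connection):
    """Sum the additive facts across one dimension attribute (product category)."""
    return connection.execute("""
        SELECT p.category, SUM(f.dollars_sold), SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_key = p.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

print(sales_by_category(conn))
```

Because the measures are additive, the same fact table can be summed by month or by store simply by joining a different dimension table.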

Page 15:

Snowflake Schema

A variant of the star schema in which the dimension tables are normalized into additional related tables.

Normalization helps to reduce redundancy in the dimension tables, but the extra joins affect query performance and user comprehension.
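A small sketch of what snowflaking looks like, again with sqlite3. Here the product dimension is split so that category lives in its own table; every name and row is a hypothetical example. Note the extra join needed to reach the category name.

```python
import sqlite3

# All table names, column names and rows are illustrative assumptions.
snow = sqlite3.connect(":memory:")
snow.executescript("""
-- Category is normalized out of the product dimension.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    brand        TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER
);

INSERT INTO dim_category VALUES (1, 'Snacks'), (2, 'Drinks');
INSERT INTO dim_product  VALUES (1, 'BrandA', 1), (2, 'BrandB', 1), (3, 'BrandC', 2);
INSERT INTO fact_sales   VALUES (1, 10), (2, 5), (3, 7);
""")

def units_by_category(connection):
    """Two joins are needed: fact -> product -> category."""
    return connection.execute("""
        SELECT c.category_name, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product  AS p ON f.product_key = p.product_key
        JOIN dim_category AS c ON p.category_key = c.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()

print(units_by_category(snow))
```

The category name is now stored once per category rather than once per product, at the cost of one more join in every query that needs it.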

Page 16:

DW Terminology

Granularity
• Granularity (or grain) defines the level of detail stored in the physical warehouse.
• Low granularity indicates a lot of detail, while high granularity indicates less detail.
• Example: A commercial airline is building a data warehouse. What will the granularity be?
   Choice A: Each record represents a flight
   Choice B: Each record represents a customer on a flight
• There is no correct answer. To a large extent, the granularity depends on the business users’ exploitation needs.
• However, you should be aware that the granularity of data affects:
   - Volumes of data
   - Data maintenance
   - Indexing
   - Level of data exploration
   - Query and reporting constraints
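The airline example can be sketched in a few lines of Python. The records and field names below are hypothetical. The sketch shows that flight-level records (Choice A) can always be derived from customer-level records (Choice B), but not the other way round.

```python
from collections import defaultdict

# Choice B grain: one record per customer on a flight (made-up sample data).
passenger_records = [
    {"flight": "F100", "customer": "C1", "fare": 120.0},
    {"flight": "F100", "customer": "C2", "fare": 80.0},
    {"flight": "F200", "customer": "C1", "fare": 150.0},
]

def roll_up_to_flight(records):
    """Aggregate customer-level rows into Choice A grain: one row per flight."""
    flights = defaultdict(lambda: {"passengers": 0, "revenue": 0.0})
    for rec in records:
        flights[rec["flight"]]["passengers"] += 1
        flights[rec["flight"]]["revenue"] += rec["fare"]
    return dict(flights)

flight_records = roll_up_to_flight(passenger_records)
print(flight_records)
```

After the roll-up there are fewer rows to store and index, but a question such as "which flights did customer C1 take?" can no longer be answered from the flight-level data alone.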

Page 17:

DW Terminology

Metadata
• At all levels of the data warehouse, information is required to support the maintenance and use of the data warehouse.
• Metadata is data about data.
• There are two views of metadata:
   Business: warehouse attributes and properties for use by business users
   Technical: describes the data flow from operational systems into the data warehouse

OLAP
• Online Analytical Processing
• Tool(s) for analytical reporting, including graphical capabilities.

Page 18:

DW Terminology

OLAP tools available for exploring the information built in a DW:

• Multi-dimensional On-line Analytical Processing (MOLAP): The data from the data warehouse is queried and dumped periodically onto a server on the local network, into a data store called a Multi-dimensional Database (MDDB) provided by the OLAP tool. This MDDB forms a data mart, which is then used for querying and reporting.

• Relational On-line Analytical Processing (ROLAP): Refers to the ability to conduct OLAP analysis directly against a relational warehouse without any constraints on the number of dimensions, database size, analytical complexity, or number and type of users.

• Hybrid On-line Analytical Processing (HOLAP): An environment with a combination of MOLAP and ROLAP data storage. Summarized information is typically stored in an MDDB and detailed data is stored in a relational environment.

Page 19:

Terminology

Data Mart: Contains data about a specific subject, e.g. Official data, Customer data, Campaign data.

Metadata: Data about data. Describes the data stored in the data warehouse.

Data Cubes: Central object of data containing information in a multidimensional structure.

Data Cleansing: Regular cleaning of data, i.e. detecting and correcting inaccurate, incomplete or inconsistent records.

ETL: Extraction, Transformation and Loading of data.

Data Mining: A mechanism which uses intelligent algorithms to discover patterns, clusters and models from data.

Page 20:

Stages

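[Figure: stages of a data warehouse environment]

Heterogeneous Source Systems (Operational, External, Legacy) -> Staging Area -> Data Warehouse -> Business Intelligence (Data Mining, Query & Reporting, OLAP)

Extraction, Transformation & Loading (ETL) carries data from the source systems into the data warehouse.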
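A toy sketch of the ETL stage, assuming two hypothetical sources with different layouts (a CSV export and a list of legacy records) and an in-memory SQLite table standing in for the warehouse. The cleansing rules (trim and title-case names, treat a missing amount as 0) are assumptions made for illustration only.

```python
import csv
import io
import sqlite3

# Hypothetical heterogeneous sources with different field names.
OPERATIONAL_CSV = "cust_id,name,amount\n1, alice ,10.5\n2,BOB,\n"
LEGACY_ROWS = [{"id": "3", "customer": "Carol", "amt": "7"}]

def extract():
    """Read raw rows from each source without changing them."""
    operational = list(csv.DictReader(io.StringIO(OPERATIONAL_CSV)))
    return operational, LEGACY_ROWS

def transform(operational, legacy):
    """Map both layouts to one (cust_id, name, amount) shape and cleanse values."""
    staged = []
    for row in operational:
        staged.append((int(row["cust_id"]), row["name"].strip().title(),
                       float(row["amount"] or 0)))
    for row in legacy:
        staged.append((int(row["id"]), row["customer"].strip().title(),
                       float(row["amt"] or 0)))
    return staged

def load(rows, connection):
    """Insert the staged rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (cust_id INTEGER, name TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY cust_id").fetchall())
```

In practice the staged rows would sit in a staging area between the transform and load steps; here the list returned by transform plays that role.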

Page 21:

A Typical Data Warehouse

[Figure: a typical data warehouse]
• Data Warehouse: Detailed Data, Summarized Data, Meta Data
• Data Marts (three shown)

Facilitates firing queries on detailed data.

Data marts contain data specific to a subject.

Page 22:

MOLAP/ROLAP/HOLAP

[Figure: MOLAP/ROLAP/HOLAP architecture. Labels: Data Warehouse (RDBMS); Custom Loader; periodic, manual data load; MDD database storage; OLAP Engine; SQL; Rows; Cubes; MDD Proprietary API; Query Tool by MDD Vendor]

Page 23:

OLAP Terminology

Drill Down: Analytical technique whereby the user navigates from the most summarized to the most detailed level.

[Figure: cube with dimensions Region, Month and Product; the hierarchy shown runs Region, State, District, Location]
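The technique defined on this slide is commonly called drill-down. A minimal sketch, assuming a hypothetical Region > State > District hierarchy and made-up unit counts: the same rows are totalled at successively finer levels.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, state, district, units).
SALES = [
    ("East", "StateA", "D1", 10),
    ("East", "StateA", "D2", 5),
    ("East", "StateB", "D3", 7),
    ("West", "StateC", "D4", 4),
]

LEVELS = ("region", "state", "district")

def aggregate(rows, depth):
    """Total units grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)

# Drill down: most summarized level first, most detailed level last.
for depth in range(1, len(LEVELS) + 1):
    print(LEVELS[depth - 1], aggregate(SALES, depth))
```

Each step down the hierarchy splits a total into its parts, so the figures at any level sum to the figure one level above.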

Page 24:

OLAP Terminology

Rotation or Dicing

[Figure: the Region, Month, Product cube shown before and after rotation, with the axes interchanged to Month, Product, Region]

Page 25:

OLAP Terminology

Slicing

[Figure: a slice taken from the cube with dimensions Region, Month and Product]
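Slicing, and the rotation from the previous slide, can be sketched on a tiny cube held as a Python dictionary keyed by (region, month, product). The cell values and the helper names slice_cube and rotate are hypothetical, not taken from any OLAP product.

```python
# A three-dimensional cube: (region, month, product) -> units sold (made-up data).
CUBE = {
    ("East", "Jan", "Tea"): 10,
    ("East", "Feb", "Tea"): 12,
    ("West", "Jan", "Tea"): 4,
    ("West", "Jan", "Coffee"): 6,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it, leaving a 2-D view."""
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cube.items() if key[axis] == value}

def rotate(cube, order):
    """Reorder the axes of the cube; the cell values themselves are unchanged."""
    return {tuple(key[i] for i in order): units for key, units in cube.items()}

january = slice_cube(CUBE, 1, "Jan")   # slice on Month = Jan
rotated = rotate(CUBE, (1, 2, 0))      # axes become (month, product, region)
print(january)
print(rotated)
```

A slice reduces the cube by one dimension, while a rotation only changes the order in which the dimensions are presented.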

Page 26:

Products and Vendors

• Data Warehouses: Oracle, Sybase, DB2
• OLAP tools: Oracle Express, Hyperion Essbase
• Data Mining: Oracle Darwin, IBM Intelligent Data Miner
• Querying & Reporting: Oracle Discoverer, Business Objects