Business Intelligence: Data Warehouses

Posted on 15-Jan-2015

A basic introduction to data warehouses, their uses, and their benefits.

Business Intelligence

Michael Lamont

lamont@post.harvard.edu

Platforms

Implementation of BI platform requires lots of important choices:

Type of platform

Software tools & technologies

IT usually takes lead on technology and platform decisions

Important for business managers to participate in decision making – they’ll actually be using the platform

Platforms

BI platforms capture raw operational data and convert it to useful info

Process used by a platform can be simple or complex

Data warehouse is most common BI platform

Data warehouses have several distinct components that work together

[Diagram: BI platform, showing data sources]

Operational Systems

Organizations usually have dozens of operational systems that support day-to-day transactions

Line-of-business apps:

Human resources

Enterprise Resource Planning

Supply chain

Point-Of-Sale

Operational Systems

Efficient at supporting transactional processes

Not so good for business analysis

Not really able to use data from multiple sources

[Diagram: BI platform, showing data sources and data warehouse]

Data Warehouse

Collective repository of data from a company’s operational systems

Data warehouse feeds data into series of subject-specific databases called data marts

Some “data warehouse” platforms are really just a collection of data marts

[Diagram: BI platform, showing data sources, data warehouse, and data marts (HR, Sales, Finance)]

Data Marts

Data marts are subject-specific

HR

Sales

Finance

Marketing

Etc

Definition of “subject” varies from company to company depending on needs

Data Marts

Examples of data marts in a single company:

Support Sales dept’s analysis of performance and margins

Let HR dept analyze headcount and absence trends

Data Sharing

Data warehouses shouldn’t be a collection of independent silos of data

Silos of data are what operational systems already give you

A good data warehouse makes it easy to normalize measures and dimensions

Ensures dimensions & measures have same meanings across company

Supports metrics calculations across data feeds

Data Sharing

Operational systems can’t calculate many useful metrics because they can’t integrate/share data

Calculating revenue per employee requires data from Sales and HR data silos

Easy to calculate these metrics in a data warehouse with shared data and dimensions

More shared data = more powerful analysis
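The revenue-per-employee case can be sketched in a few lines of Python. This is only an illustration: the region names, revenue figures, and headcounts are made up, and the two dictionaries stand in for data that would come from the Sales and HR systems. The point it shows is that the metric is only computable once both feeds share a dimension (here, region).

```python
# Hypothetical extracts from two separate operational systems.
sales_revenue = {"East": 1_200_000.0, "West": 900_000.0}  # from the Sales system
hr_headcount = {"East": 8, "West": 6}                     # from the HR system


def revenue_per_employee(revenue_by_region, headcount_by_region):
    """Combine Sales and HR figures on the shared 'region' dimension."""
    result = {}
    for region, revenue in revenue_by_region.items():
        headcount = headcount_by_region.get(region)
        if not headcount:
            continue  # skip regions missing from HR or with zero staff
        result[region] = revenue / headcount
    return result


metrics = revenue_per_employee(sales_revenue, hr_headcount)
for region, value in sorted(metrics.items()):
    print(f"{region}: {value:,.0f} revenue per employee")
```

If the two systems labeled regions differently, the lookup would silently find nothing, which is why shared dimension definitions matter.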

Data Integration

Integrating data into a common warehouse is hardest part of BI process

Each operational system creates mountains of data in incompatible formats

Extract, Transform, Load (ETL) processes load data from operational systems into data warehouse.
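A minimal sketch of an extract-transform-load step, assuming two hypothetical source systems (a point-of-sale feed and an ERP feed) with different field names, date formats, and units. The field names, sample rows, and the in-memory SQLite database standing in for the warehouse are all assumptions made for illustration, not part of any particular ETL product.

```python
import sqlite3
from datetime import datetime

# Hypothetical raw extracts: each source uses its own field names and formats.
pos_rows = [{"sale_date": "01/15/2015", "store": "Boston", "total": "19.99"}]
erp_rows = [{"posted": "2015-01-16", "site": "Boston", "amount_cents": 4550}]


def transform_pos(row):
    """Map a point-of-sale record onto the common (date, location, amount) shape."""
    date = datetime.strptime(row["sale_date"], "%m/%d/%Y").date().isoformat()
    return (date, row["store"], float(row["total"]))


def transform_erp(row):
    """Map an ERP record onto the same shape, converting cents to currency units."""
    return (row["posted"], row["site"], row["amount_cents"] / 100)


def load(conn, records):
    """Insert already-transformed records into the common sales table."""
    conn.executemany(
        "INSERT INTO sales (sale_date, location, amount) VALUES (?, ?, ?)", records
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the warehouse
conn.execute("CREATE TABLE sales (sale_date TEXT, location TEXT, amount REAL)")
load(conn, [transform_pos(r) for r in pos_rows])
load(conn, [transform_erp(r) for r in erp_rows])

loaded = conn.execute(
    "SELECT sale_date, location, amount FROM sales ORDER BY sale_date"
).fetchall()
print(loaded)
```

Each source gets its own transform function, while the load step only ever sees the common shape.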

[Diagram: BI platform, showing data sources, ETL processes, data warehouse, and data marts (HR, Sales, Finance)]

Data Integration

Business managers/analysts aren’t usually involved in technical details of ETL

But they do participate in defining business rules for how data is integrated

Data integration rules determined by:

Type of analysis to be performed

How well data supports requirements

Data Analysis

Analysis processes responsible for assembling charts, graphs, etc. and delivering them to business users

Software packages used for these tasks are called front-end tools

Harvest info from data warehouse

Present to users in visual formats

Data Analysis

More advanced analysis tools can be used to explain behavior or uncover hidden trends

Goal of analysis process is to help decision makers by giving them useful data

Reporting & Analysis

Piece of BI that business users are most familiar with

Primary purpose: put data in hands of business users

Reporting & analysis processes need to assemble data into formats that hold meaning for business users

Reporting & Analysis

Multidimensional analysis designed to make data understandable/useful to business users

Tabular grids excellent way to consolidate & present data

Also important to graphically chart data

Graphs and tables work together to give business users different perspectives on data

Graphics Example

Dept 1               Dept 2               Dept 3               Dept 4
Tenure  Sick Days    Tenure  Sick Days    Tenure  Sick Days    Tenure  Sick Days
10      8.04         10      9.14         10      7.46         8       6.58
8       6.95         8       8.14         8       6.77         8       5.76
13      7.58         13      8.74         13      12.74        8       7.71
9       8.81         9       8.77         9       7.11         8       8.84
11      8.33         11      9.26         11      7.81         8       8.47
14      9.96         14      8.1          14      8.84         8       7.04
6       7.24         6       6.13         6       6.08         8       5.25
4       4.26         4       3.1          4       5.39         19      12.5
12      10.84        12      9.13         12      8.15         8       5.56
7       4.82         7       7.26         7       6.42         8       7.91
5       5.68         5       4.74         5       5.763        8       6.89

Avg Tenure: 9 years

Avg Sick Days: 7.5

Graphics Example

[Figure: four charts of the data above, one each for Dept 1, Dept 2, Dept 3, and Dept 4]
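The averages quoted for the four departments can be checked directly from the table. The sketch below uses only the values listed in the example; the variable names are illustrative. It also prints the tenure range per department, one simple sign that data sets with the same averages can still differ.

```python
from statistics import mean

# Values copied from the four department tables in the example.
tenure_1_to_3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # same tenure column in Dept 1-3
departments = {
    "Dept 1": (tenure_1_to_3,
               [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dept 2": (tenure_1_to_3,
               [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "Dept 3": (tenure_1_to_3,
               [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.763]),
    "Dept 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
               [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}

# Average tenure and average sick days per department, rounded to one decimal.
summary = {
    name: (round(mean(tenure), 1), round(mean(sick_days), 1))
    for name, (tenure, sick_days) in departments.items()
}

# Tenure range per department: identical averages can hide different spreads.
tenure_range = {name: (min(t), max(t)) for name, (t, _) in departments.items()}

for name in departments:
    avg_tenure, avg_sick = summary[name]
    print(f"{name}: avg tenure {avg_tenure}, avg sick days {avg_sick}, "
          f"tenure range {tenure_range[name]}")
```

Every department comes out at an average tenure of 9 and an average of 7.5 sick days, so the summary figures alone cannot distinguish them.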

Business Users

Power Analysts

Information Consumers

Information Users

Business Users

Information Users

Require standard reports

Can be short or extensive

Usually contain charts and tables

Want consistent report formats

No need to “slice and dice” data

Static or very simple dynamic reports

Printed

MS Office document formats (PPT, XLS)

Business Users

Information Consumers

Want to perform dynamic data queries

Not experts in database design or query tools

Want to be able to pivot and nest data inside intuitive interface

Interactive ad hoc tools can provoke info users to cross the line into info consumer territory

Business Users

Power Analysts

Use the full analytical power of the system to do free-form ad hoc analysis

Know the details of database design and query tool software

Create reports for others

Smallest of the three groups of users

Front-End Tools

Present data from warehouse to business users as reports and interactive data views

Can be grouped into two categories:

Reporting tools

Data exploration

Front-End Tools

Reporting paradigm:

Excellent at producing tabular reports

Lots of mature and stable packages

Web interfaces for wide-scale deployment

Strong printing/scheduling capabilities

Multidimensional data exploration:

Excellent for dealing with OLAP cubes

Support interactive ad hoc analysis

Graphical charts and views

Front-End Tools

Competitive market space

Wide range of available features and functionality

Front-End Tools

Remember: features aren’t benefits

Advanced analysis features useful to power analysts, but not info users

Invest time to figure out broader BI objectives and needs of users

Select solution providers based on your objectives and needs

Data Warehouses

Primary task: support reporting & analysis

Warehouse design & content driven by business needs

Business people determine what info they need to make better decisions faster

IT implements warehouse to fit business needs

Data Warehouses

Business & IT need to be aligned on business requirements

Subject Oriented

Data warehouses organize data into subject-specific data marts

Data marts are NOT silos of data

Data marts gather data from multiple operational systems to support analyses

Ex: product line profitability

Data in the warehouse is shared by the data marts

Consistent Data

Warehouses provide consistent data by using the same dimensions and measures for all data

Consistent: data to be analyzed has same definitions across entire company

Achieving data consistency requires both integration and organizational decisions

Consistent Data

Data from multiple operational systems has to be integrated into one common data set for analysis

Problem: Different systems may have subtly different definitions of “discount”

Solution: Data warehouse integrates/transforms data based on consistent business rules
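As a hedged illustration of the “discount” problem, suppose one source system records a discount as a percentage of list price and another records it as a currency amount. Both the two source conventions and the chosen warehouse rule (store discount as a fraction of list price) are assumptions made up for this sketch.

```python
def discount_fraction_from_percent(percent_off):
    """Source A (hypothetical) stores discount as a percentage, e.g. 15 for 15%."""
    return percent_off / 100


def discount_fraction_from_amount(list_price, amount_off):
    """Source B (hypothetical) stores discount as a currency amount off list price."""
    if list_price <= 0:
        raise ValueError("list price must be positive to derive a discount fraction")
    return amount_off / list_price


# The same commercial deal, as recorded by the two systems.
from_source_a = discount_fraction_from_percent(15)
from_source_b = discount_fraction_from_amount(list_price=200.0, amount_off=30.0)

print(round(from_source_a, 4), round(from_source_b, 4))
```

Once both feeds pass through the same rule, discounts from either system can be compared or averaged in one measure.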

Consistent Data

Problem: Source data has different dimension structures

Solution: Warehouse defines uniform dimension designs

Consistent data requires standardized measure & dimension definitions

Everyone in company needs to “speak the same language” for dimensions & measures

Cleansed Data

Cleansed data – data that has been validated by business & structural rules

Storing cleansed data is a key priority for data warehouses

Data from operational systems is usually uncleansed “dirty data”

Types of Dirty Data

Missing

Information not entered into an order tracking system

Incorrect

One Walmart reporting it sold 50K razor blades in an hour

Data entry errors

Booston, MA

Subtle issues like double-counting

Cleansed Data

ETL processes use business rules to load valid data and cleanse/reject invalid data
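The kinds of dirty data listed above map naturally onto simple validation rules. The following is a sketch under stated assumptions: the record fields, the list of known cities, and the plausibility limit are all hypothetical, and a real ETL tool would express such rules in its own configuration.

```python
KNOWN_CITIES = {"Boston, MA", "Cambridge, MA"}  # hypothetical reference list
MAX_UNITS_PER_HOUR = 1_000                      # hypothetical plausibility limit


def validate(record):
    """Return a list of rule violations; an empty list means the record is valid."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    if record.get("city") not in KNOWN_CITIES:
        problems.append("unknown city")
    if record.get("units", 0) > MAX_UNITS_PER_HOUR:
        problems.append("implausible unit count")
    return problems


def cleanse(records):
    """Split records into valid and rejected, also rejecting repeated order IDs."""
    valid, rejected, seen_order_ids = [], [], set()
    for record in records:
        problems = validate(record)
        if record.get("order_id") in seen_order_ids:
            problems.append("duplicate order_id")  # guards against double-counting
        if problems:
            rejected.append((record, problems))
        else:
            seen_order_ids.add(record.get("order_id"))
            valid.append(record)
    return valid, rejected


orders = [
    {"order_id": 1, "customer_id": "C1", "city": "Boston, MA", "units": 3},
    {"order_id": 2, "customer_id": "", "city": "Boston, MA", "units": 1},        # missing
    {"order_id": 3, "customer_id": "C2", "city": "Booston, MA", "units": 2},     # typo
    {"order_id": 4, "customer_id": "C3", "city": "Boston, MA", "units": 50_000}, # incorrect
    {"order_id": 1, "customer_id": "C1", "city": "Boston, MA", "units": 3},      # duplicate
]
valid, rejected = cleanse(orders)
print(len(valid), "valid;", len(rejected), "rejected")
for record, problems in rejected:
    print(record["order_id"], problems)
```

This sketch only rejects bad records; a fuller version might also attempt corrections, such as mapping known misspellings to the right city.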

Historical Data

Warehouses let you analyze data over specific time periods

Provide users with “snapshots” of data from operational systems

Warehouse data is static, unlike data in operational systems

Warehouse data refreshed at regular time intervals

Historical Data

Data warehouses are non-volatile

Historical data lets analysts identify trends and exceptions

Ex: comparing year-over-year sales on a quarterly basis
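A year-over-year comparison by quarter might look like the sketch below. The sales figures are invented for illustration, and the dictionary keyed by (year, quarter) is just one simple way to hold historical snapshots.

```python
# Hypothetical quarterly sales totals keyed by (year, quarter).
quarterly_sales = {
    (2013, 1): 100.0, (2013, 2): 120.0, (2013, 3): 110.0, (2013, 4): 150.0,
    (2014, 1): 110.0, (2014, 2): 114.0, (2014, 3): 121.0, (2014, 4): 150.0,
}


def year_over_year(sales, year):
    """Return the fractional change per quarter versus the same quarter a year earlier."""
    changes = {}
    for quarter in (1, 2, 3, 4):
        current = sales.get((year, quarter))
        previous = sales.get((year - 1, quarter))
        if current is None or not previous:
            continue  # no comparison possible without both periods
        changes[quarter] = (current - previous) / previous
    return changes


yoy_2014 = year_over_year(quarterly_sales, 2014)
for quarter, change in yoy_2014.items():
    print(f"Q{quarter} 2014 vs Q{quarter} 2013: {change:+.1%}")
```

The comparison only works because the earlier year's figures are still available, which is what a non-volatile store provides.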

Fast Delivery of Data

Warehouse has to provide data to users quickly and efficiently

Database technology and structures need to be fast & efficient

Two types of databases in common usage:

OLAP (Online Analytical Processing)

RDBMS (Relational Database Management Systems)

OLAP Databases

Benefits of OLAP:

Native support of multidimensional analysis

Fast data retrieval

Pre-process data as much as possible

Ideal for fast retrieval of aggregated data

OLAP is usually a good candidate for data marts

OLAP Databases

Important recent developments:

Much easier to design OLAP databases

Acquisition costs are extremely low

SMBs can now use technology that was only available to large enterprises a few years ago

OLAP & Relational Databases

Relational databases often store underlying data supplied to OLAP database

RDBMS stores detailed data, OLAP stores summarized data views

Example: Sales data mart

Relational stores daily sales data

OLAP stores and manages summarized sales data by customer, product, region, etc.
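The split between detailed and summarized data can be approximated with a relational table and a grouped query. This is not an OLAP engine: the in-memory SQLite database, the table name, and the sample rows are assumptions for illustration, and the grouped result only mimics the kind of pre-aggregated view an OLAP database would store and manage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detailed layer: one row per daily sale, as a relational store would keep it.
conn.execute(
    "CREATE TABLE daily_sales (sale_date TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?, ?)",
    [
        ("2015-01-05", "East", "Widget", 100.0),
        ("2015-01-06", "East", "Widget", 150.0),
        ("2015-01-06", "East", "Gadget", 80.0),
        ("2015-01-06", "West", "Widget", 70.0),
    ],
)

# Summarized layer: totals by region and product, computed once from the detail.
sales_summary = conn.execute(
    """
    SELECT region, product, SUM(amount)
    FROM daily_sales
    GROUP BY region, product
    ORDER BY region, product
    """
).fetchall()

for region, product, total in sales_summary:
    print(region, product, total)
```

Queries against the summary touch three rows instead of every daily record, which is the retrieval advantage the summarized layer is meant to give.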

Relational Databases

Relational databases can host data marts without OLAP

Use their own set of dimensions & measures to support analysis

Requires sophisticated front-end tools that can quickly assemble relational data into multidimensional formats

Conclusions

Data warehouse architecture is a flexible, effective decision support platform

Warehouse helps organize and deliver data to decision makers

Brings BI to life through data marts, DB technology, ETL tools, and analysis tools

Helps business managers make better decisions faster
