Lecture 9: BIS4435 – Data Warehousing

Dr. Nawaz Khan, School of Computing Science

E-mail: n.x.khan@mdx.ac.uk

Data Warehousing: Reading Assignment

Reading Suggestion:
- Connolly, T.M., and Begg, C.E., Database Systems: A Practical Approach to Design, Implementation and Management, Addison-Wesley, 4th Edition, ISBN 0321210255 (Chapters 31-33)
- Global campus materials on OASIS: http://oasis.mdx.ac.uk/ (unit 9)

More Reading:
- Elmasri, R., and Navathe, S.B., Fundamentals of Database Systems, Addison-Wesley, 4th Edition, 2004, ISBN 0-321-12226-7 (Chapter 28)
- Berson, A., and Smith, S.J., Data Warehousing, Data Mining, and OLAP, McGraw-Hill, 1997, ISBN 0-07-006272-2 (Chapters 6, 7)

Data Warehousing

Outline
- Definition
- Compare with operational systems
- Architecture
- Design issues - star schema
- Relation with DM
- Summary

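Data Warehousing: Definition

What is a data warehouse?

A DW is an environment plus a set of facilities that bring scattered data together so that they can be queried.

[Figure: warehouse analogy. Finished products from plant 1, plant 2, …, plant i, …, plant n are brought together in a warehouse, from which they are queried and delivered.]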

Data Warehousing: Definition

What is a data warehouse?

[Figure: operational systems (Financial Department, Human Resource Department, …, R&D Department) feed the DW through data transformation; users reach the DW through access tools.]

Data Warehousing: Compare with operational systems

Operational DB systems focus on day-to-day business:
- data are structured around events
- run in an OLTP environment
- support a large number of transactions
- require quick response: small, focused databases

DW systems focus on business needs and requirements:
- data are organised around trends and patterns in events
- run in an off-line environment
- support complex queries, ad hoc and static reports
- based on historical data

Data Warehousing: Compare with operational systems

Feature     | Operational system                   | DW system
Size        | Small                                | Large: history of business
Performance | Speed is essential                   | Better information
Content     | Small work areas                     | Cross-functional subjects
Tools       | Restricted, standard reporting tools | Various flexible tools that transform/present data as intelligence

Data Warehousing: Compare with operational systems

Separation of an Operational and DW System
- Minimises the impact of reporting and complex query processing on operational systems
- Preserves operational data for re-use
- Manages data based on time, making historical data available to users
- Provides a data store that can be modified to conform to the way the user views the data
- Unifies data into one version
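The separation can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under assumptions: the `account` and `account_snapshot` tables, their columns and the load dates are all hypothetical. The operational table is updated in place, so it only holds the current balance; the warehouse load appends a dated snapshot, so history is preserved and can be queried without touching the operational table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational table (hypothetical): one row per account, overwritten by each transaction.
cur.execute("CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance REAL)")
# Warehouse table (hypothetical): one row per account per load date, never updated.
cur.execute("CREATE TABLE account_snapshot (snap_date TEXT, acc_id INTEGER, balance REAL)")

def load_snapshot(snap_date):
    """Copy the current operational state into the warehouse, stamped with a date."""
    cur.execute(
        "INSERT INTO account_snapshot SELECT ?, acc_id, balance FROM account",
        (snap_date,),
    )

# Day 1: operational inserts, then a warehouse load.
cur.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
load_snapshot("2024-01-01")

# Day 2: an OLTP update overwrites the old balance in place.
cur.execute("UPDATE account SET balance = balance + 25 WHERE acc_id = 1")
load_snapshot("2024-01-02")

# The operational table only knows the present...
current = cur.execute("SELECT balance FROM account WHERE acc_id = 1").fetchone()[0]
# ...while the warehouse keeps the history for time-based queries.
history = cur.execute(
    "SELECT snap_date, balance FROM account_snapshot WHERE acc_id = 1 ORDER BY snap_date"
).fetchall()
print(current)  # 125.0
print(history)  # [('2024-01-01', 100.0), ('2024-01-02', 125.0)]
```

Reporting queries run against `account_snapshot` only, which is the point of the separation: they neither slow down nor lose the history overwritten by the operational system.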

Data Warehousing: Compare with operational systems

The need for DW
- Consistent and quality data
- Cost reduction
- More timely data access
- Improved performance and productivity

Two distinct types of reporting are still required:
- Operational systems derive notification-style reports
- DW systems generate general information reports

Data Warehousing: Architecture

Overall Architecture
- Operational data and processing are separated from data warehouse processing
- The DW is a central information repository surrounded by a number of components, which together form the DW environment

[Figure: components of the DW environment: operational data, data transformation, metadata, data warehouse, data marts, access tools, information delivery system.]

Data Warehousing: Architecture

The DW
- The DW database is the cornerstone of the environment
- It is implemented on RDBMS technology
- It should support large data volumes, ad hoc queries and user views

Data Transformation
Significant effort goes into extracting data from operational systems and putting it, in a suitable format, into the DW system.

Functionality:
- Removing unwanted data from operational databases
- Converting to common data names and definitions
- Calculating summaries and derived data
- Establishing defaults for missing data
- Accommodating source data definition changes

Difficulties:
- Database heterogeneity
- Data heterogeneity
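Several of the transformation functions listed above can be illustrated with a small Python sketch. The two source layouts, their field names (`cust_no`, `CustomerID`, `amt`, `Amount`, `status`, `region`) and the default value `"UNKNOWN"` are hypothetical; in practice the name mappings and defaults would come from the metadata describing the real source systems.

```python
from collections import defaultdict

# Hypothetical extracts from two heterogeneous operational systems.
finance_rows = [
    {"cust_no": 1, "amt": 120.0, "region": "North", "status": "ok"},
    {"cust_no": 2, "amt": 80.0, "region": None, "status": "ok"},
    {"cust_no": 3, "amt": 999.0, "region": "South", "status": "test"},  # unwanted
]
sales_rows = [
    {"CustomerID": 1, "Amount": 30.0, "Region": "North"},
]

# Converting to common data names: each source field name -> DW field name.
NAME_MAP = {
    "cust_no": "customer_id", "CustomerID": "customer_id",
    "amt": "amount", "Amount": "amount",
    "region": "region", "Region": "region",
}
DEFAULT_REGION = "UNKNOWN"  # default used for missing data

def transform(rows):
    out = []
    for row in rows:
        if row.get("status") == "test":  # removing unwanted data
            continue
        clean = {NAME_MAP[k]: v for k, v in row.items() if k in NAME_MAP}
        if clean.get("region") is None:  # establishing defaults for missing data
            clean["region"] = DEFAULT_REGION
        out.append(clean)
    return out

def summarise(rows):
    """Calculating summaries: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

loaded = transform(finance_rows) + transform(sales_rows)
print(summarise(loaded))  # {'North': 150.0, 'UNKNOWN': 80.0}
```

The name map is where data heterogeneity shows up: every new source adds entries, and a change in a source definition means changing the map rather than the loading code.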

12 Rules for a Data Warehouse

1. Data warehouse and operational environments are separated
2. Data are integrated
3. Contains historical data
4. Represents snapshot data at a given point in time
5. Data are subject oriented
6. Data are read-only
7. Data warehouse life cycle is data driven
8. Contains summarised data
9. Involves read-only transactions
10. Involves data transformation
11. Metadata component is very critical
12. Ensures optimum use of data by end users

Data Warehousing: Architecture

Meta Data: data that describes the data warehouse
- Description of the data model
- Description of the database design
- Definition of the system managing the data items
- A map of the data location in the DW, including its origin, how it is transformed/aggregated, and where it went
- Specific database design definitions
- Data element definitions, including rules for derivations and summaries
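To make the "map of the data location" concrete, the sketch below records, for one warehouse data element, its origin, how it is transformed/aggregated, where it went, and its derivation rule. The `MetadataEntry` class, the `lineage` helper and every field value are hypothetical; real metadata repositories use their own, much richer models.

```python
from dataclasses import dataclass, field

@dataclass
class MetadataEntry:
    """One hypothetical metadata entry describing a DW data element."""
    element: str               # data element name in the DW
    definition: str            # business definition
    source: str                # origin in the operational systems
    transformation: str        # how it is transformed/aggregated
    target: str                # where it went in the DW
    derivation_rule: str = ""  # rule for derived/summary data, if any
    synonyms: list = field(default_factory=list)

catalogue = {
    "monthly_sales": MetadataEntry(
        element="monthly_sales",
        definition="Total sales value per branch per month",
        source="orders.order_line.amount (order processing system)",
        transformation="summed by branch and calendar month",
        target="sales_analysis.actual_sales",
        derivation_rule="SUM(amount) GROUP BY branch, month",
    ),
}

def lineage(name):
    """Return a one-line map of where a data element came from and went."""
    e = catalogue[name]
    return f"{e.source} -> {e.transformation} -> {e.target}"

print(lineage("monthly_sales"))
```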

Data Warehousing: Architecture

Meta Data (cont.)
Information Directory: metadata that helps users to access the DW interactively, understand its content and find data
- A gateway to the DW environment
- Supports easy distribution and replication of its content
- Searchable by business-oriented keywords
- Acts as a launch platform for user data access and analysis tools
- Supports information sharing
- Supports a variety of scheduling options
- Supports distribution of query results
- Provides an interface to other applications
- Supports end-user monitoring of the status of the DW environment
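Searching by business-oriented keywords can be sketched as a simple inverted index over directory entries. The entry names, descriptions and the tokenising rule are hypothetical; the point is only that users search in business terms rather than by table names.

```python
import re
from collections import defaultdict

# Hypothetical information-directory entries: DW object -> business description.
directory = {
    "sales_analysis": "Monthly sales forecast, budget and actual figures by store and product",
    "client_details": "Client company name, address and phone contact details",
    "store_details": "Store location, area and region details",
}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: business keyword -> set of DW objects whose description uses it.
index = defaultdict(set)
for obj, description in directory.items():
    for word in tokenize(description):
        index[word].add(obj)

def search(query):
    """Return DW objects whose descriptions contain every query keyword."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

print(search("sales forecast"))  # ['sales_analysis']
print(search("region"))          # ['store_details']
```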

Data Warehousing: Architecture

Access Tools
Query and reporting tools fall into two groups:
- Reporting tools
  - Production reporting tool: generates regular operational reports or supports high-volume batch jobs
  - Desktop report writer: designed for end users
- Managed query tool: a meta-layer between the end user and the database; provides point-and-click creation of SQL and formats the query results into easy-to-read reports or on-screen presentations
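A managed query tool's meta-layer can be pictured as a mapping from business terms to columns, from which SQL is generated when the user points and clicks. The sketch is hypothetical throughout: `BUSINESS_TERMS`, the `sales` table and `build_query` are illustrative names, not any product's API. Filter values are passed as sqlite3 placeholders rather than pasted into the SQL text.

```python
import sqlite3

# Hypothetical meta-layer: business terms the user sees -> physical columns.
BUSINESS_TERMS = {
    "Branch": "branch",
    "Month": "month",
    "Actual Sales": "SUM(actual_sales)",
}

def build_query(measures, group_by, filters=None):
    """Generate SQL from the user's selections (filters assumed to be on dimension terms)."""
    select = [BUSINESS_TERMS[g] for g in group_by] + [BUSINESS_TERMS[m] for m in measures]
    sql = f"SELECT {', '.join(select)} FROM sales"
    params = []
    if filters:
        conds = []
        for term, value in filters.items():
            conds.append(f"{BUSINESS_TERMS[term]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(conds)
    if group_by:
        sql += " GROUP BY " + ", ".join(BUSINESS_TERMS[g] for g in group_by)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, branch TEXT, actual_sales REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("199901", "ABC", 10.0), ("199901", "ABC", 5.0),
     ("199902", "ABC", 7.0), ("199901", "XYZ", 3.0)],
)

# The user "clicks" Actual Sales by Branch, filtered to month 199901.
sql, params = build_query(["Actual Sales"], ["Branch"], {"Month": "199901"})
print(sql)  # SELECT branch, SUM(actual_sales) FROM sales WHERE month = ? GROUP BY branch
rows = conn.execute(sql + " ORDER BY branch", params).fetchall()
print(rows)  # [('ABC', 15.0), ('XYZ', 3.0)]
```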

Data Warehousing: Architecture

Access Tools (cont.)
- Application development tools: graphical data access environment
- EIS (executive information system) tools: high-level summarisation
- OLAP: multidimensional DB
- Data mining tools
- Data visualisation tools: display complex relationships and patterns; techniques include 3-D imaging & sound, virtual reality

Data Warehousing: Architecture

Data Marts
- A data store that is subsidiary to a data warehouse of integrated data
- Created for the use of a dedicated group of users for a subject area
- It can be placed on the DW database; in most instances, however, the data mart is separated from the DW database and put on a separate database server
- Dependent data mart: its data content comes from the DW
- Independent data mart: an alternative to the DW
  - simple and inexpensive to build
  - inconsistent: each has its own assumptions
  - overlapping in data content, connectivity and management
  - scalability problems

Data Warehousing: Architecture

Data Marts (cont.)
Data integration issue (Ralph Kimball): for any two data marts, common dimensions must conform to the equality or roll-up rule.

[Figure: TimePeriod dimension, with levels month, week and day, common to the Sales and Products data marts.]
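One way to read the roll-up rule: if one data mart stores sales by day and another by month, the daily figures must aggregate exactly to the monthly ones through a shared time hierarchy. The sketch below checks that under assumptions: both marts, their figures and the `month_of` mapping are hypothetical, and only the day-to-month roll-up is shown.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily-grain data mart: (day, product) -> sales.
daily_mart = {
    (date(1999, 1, 4), "COLA"): 400.0,
    (date(1999, 1, 5), "COLA"): 600.0,
    (date(1999, 2, 1), "COLA"): 300.0,
}
# Hypothetical monthly-grain data mart: (month, product) -> sales.
monthly_mart = {
    ("1999-01", "COLA"): 1000.0,
    ("1999-02", "COLA"): 300.0,
}

def month_of(day):
    """Conformed time hierarchy: every day rolls up to exactly one month."""
    return f"{day.year:04d}-{day.month:02d}"

def roll_up(mart):
    totals = defaultdict(float)
    for (day, product), value in mart.items():
        totals[(month_of(day), product)] += value
    return dict(totals)

def conforms(fine, coarse):
    """True if the fine-grained mart rolls up exactly to the coarse-grained one."""
    return roll_up(fine) == coarse

print(conforms(daily_mart, monthly_mart))  # True
```

If the two marts had been built with different notions of "month" (for example, fiscal versus calendar), the check would fail, which is the kind of inconsistency the rule is meant to prevent.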

Data Warehousing: Architecture

Information Delivery System
- It distributes data from the warehouse to other DWs and to end-user products such as spreadsheets and local DBs (via the Internet)
- Delivery is based on time or on events
- Users who receive a report or an analytical view of the data are not aware of its location and maintenance

Data Warehousing: Design issues - star schema

Architecture Blueprint: mission, goals, objectives

Logical Architecture:
- Enterprise: mission, plan, process
- Data architecture (data)
- Application architecture (tools)
- Technology architecture (hardware, software & network)

Data Warehousing: Design issues - star schema

Star Schema
- It is used to model the data in a DW from the decision-makers' view of the business and operational aspects of the business
- It defines the join paths for accessing the facts of the business
- It allows users to filter, aggregate, drill down, and slice and dice the business facts

[Figure: a star schema with sales revenues at the centre and Time, Location, Age Group, Product and Other dimensions around it.]
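A star schema of this shape can be sketched in SQL, run here through Python's sqlite3 module. Table names, columns and rows are hypothetical and cover only two of the dimensions in the figure. The query follows the join paths from the central fact table to its dimensions, slices on one dimension (country) and aggregates along another (month).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE location_dim (loc_key  INTEGER PRIMARY KEY, region TEXT, country TEXT);
-- Fact table: the centre, holding a foreign key per dimension plus numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    loc_key  INTEGER REFERENCES location_dim(loc_key),
    sales_revenue REAL
);
INSERT INTO time_dim VALUES (1, '1999-01', 1999), (2, '1999-02', 1999);
INSERT INTO location_dim VALUES (1, 'Central', 'USA'), (2, 'North-east', 'USA'),
                                (3, 'North', 'France');
INSERT INTO sales_fact VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (1, 3, 40.0);
""")

# Slice on country = 'USA', then aggregate revenue by month.
query = """
SELECT t.month, SUM(f.sales_revenue)
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.loc_key  = l.loc_key
WHERE l.country = 'USA'
GROUP BY t.month
ORDER BY t.month
"""
print(conn.execute(query).fetchall())  # [('1999-01', 150.0), ('1999-02', 70.0)]
```

Each additional dimension in the figure (Age Group, Product, ...) would add one dimension table and one foreign-key column to `sales_fact`, keeping every join path a single hop from the centre.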

Data Warehousing: Design issues - star schema

Star Schema: 3 logical entities
- Measure entities: the centre of the star
- Dimension entities: the points of the star
- Category (detail) entities: extended from the points

Data Warehousing: Design issues - star schema

Measure entities: centre
- Focus of the users' query activity
- Factual information -> business intelligence
- Synonymous names are used: measures, analysis, indicators
- Quantitative data: numerical information
- Data contained in measure entities grows large over time

Month  | Branch | Product | Sales forecast | Sales actual | Variance
199901 | ABC    | COLA    | 200000         | 1900000      | -10000
199901 | XYZ    | COLA    | 150000         | 1550000      | 50000
199901 | PQR    | COLA    | 125000         | 1050000      | -20000
…

Data Warehousing: Design issues - star schema

Dimension entities: point
- Allow users to browse measurement data from different angles: time, location, product ...
- Minimise the rows of data retrieved from a measure entity by acting as filters

[Figure: Location dimension hierarchy with levels All Location, Country (Canada, France, Germany, USA), Area (Eastern Area, Western Area) and Region (North-east Region, Central Region, South-east Region).]
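Rolling up (or drilling down) along a dimension hierarchy amounts to re-aggregating the same facts at a different level. The sketch uses the Location levels from the figure (Country, Area, Region); the assignment of regions to areas and countries and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical Location dimension rows: region -> (area, country).
LOCATION = {
    "North-east Region": ("Eastern Area", "USA"),
    "South-east Region": ("Eastern Area", "USA"),
    "Central Region": ("Western Area", "USA"),
}
LEVEL_INDEX = {"Area": 0, "Country": 1}

# Hypothetical facts at the lowest level of this hierarchy (region).
sales_by_region = {
    "North-east Region": 120.0,
    "South-east Region": 80.0,
    "Central Region": 50.0,
}

def roll_up(facts, level):
    """Aggregate region-level facts to the given level of the Location hierarchy."""
    if level == "Region":
        return dict(facts)
    totals = defaultdict(float)
    for region, value in facts.items():
        totals[LOCATION[region][LEVEL_INDEX[level]]] += value
    return dict(totals)

print(roll_up(sales_by_region, "Country"))  # {'USA': 250.0}
print(roll_up(sales_by_region, "Area"))     # {'Eastern Area': 200.0, 'Western Area': 50.0}
```

Drilling down is the reverse navigation: from the USA total to its areas, then to the region-level facts that were stored in the first place.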

Data Warehousing: Design issues - star schema

Category (detail) entities: extended from point
- Provide detailed information about a category within a dimension
- Textual/qualitative information

[Figure: All Clients dimension with levels Region, State and Client; the Client level is extended by a CLIENT DETAILS entity with attributes CLIENT_KEY, COMPANY_NAME, ADDRESS, POST_CODE, COUNTRY, NAME, PHONE.]
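A category (detail) entity keeps the textual, qualitative attributes of a category apart from the dimension hierarchy, and queries reach it through the category's key. A minimal sketch, with attribute names taken from the CLIENT DETAILS figure and entirely hypothetical rows:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClientDetails:
    """Category (detail) entity extending the Client level of the Clients dimension."""
    client_key: int
    company_name: str
    address: str
    post_code: str
    country: str
    name: str
    phone: str

# Hypothetical detail rows, keyed by CLIENT_KEY.
CLIENT_DETAILS = {
    7: ClientDetails(7, "Acme Ltd", "1 High Street", "AB1 2CD", "UK", "J. Smith", "000 0000"),
}

# A hypothetical measure row that references the client by key only.
fact = {"client_key": 7, "month": "199901", "sales_actual": 1200.0}

# Follow the key from the measure row to the textual details.
detail = CLIENT_DETAILS[fact["client_key"]]
print(detail.company_name, detail.country)  # Acme Ltd UK
```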

Data Warehousing: Design issues - star schema

Forming Star Schema: a star schema can be formed from an information package, which is constructed during the data gathering process.

Dimension  | All Time Periods | All Locations    | All Products       | All Age Groups | All Econ. Classes | All Genders
Categories | Year (5)         | Country (20)     | Classification (8) | Age Group (8)  | Class (10)        | Gender (3)
           | Quarter (20)     | Area (80)        | Group (40)         |                |                   |
           | Month (60)       | Region (400)     | Product (200)      |                |                   |
           |                  | District (2,000) |                    |                |                   |
           |                  | Store (200,000)  |                    |                |                   |

Measures/Facts: Forecast Sales, Budget Sales, Actual Sales, Forecast Variance (calc.), Budget Variance (calc.)
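A quick calculation from the information package shows why measure entities grow large. If the measure entity is defined at the lowest category of each dimension (Month, Store, Product, Age Group, Class, Gender), the number of possible rows is the product of those cardinalities. This is an upper bound that assumes every combination occurs; real measure entities are usually much sparser.

```python
from math import prod

# Lowest-category cardinalities, taken from the information package.
lowest_levels = {
    "Month": 60,
    "Store": 200_000,
    "Product": 200,
    "Age Group": 8,
    "Class": 10,
    "Gender": 3,
}

max_rows = prod(lowest_levels.values())  # 60 * 200,000 * 200 * 8 * 10 * 3
print(f"{max_rows:,}")  # 576,000,000,000
```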

Data Warehousing: Design issues - star schema

Forming Star Schema: Define the measure entity
- The lowest category within each dimension, along with each of the measures/facts, defines the measure entity
- Give it a name that reflects the business purpose
- Put it in the centre of the star schema in a rectangular box

[Figure: the measure entity "Sales Analysis" in a rectangular box.]

Data Warehousing: Design issues - star schema

Forming Star Schema: Define the dimension entities
- Each column of an information package defines a dimension entity
- Place it on the periphery of the star in a diamond-shaped box
- Consider its relationship to the measure entity: "measures based on dimension"

[Figure: Sales Analysis at the centre, with the dimension entities Time, Location, Product, Age, Gender and E-class around it.]

Data Warehousing: Design issues - star schema

Forming Star Schema: Define the category entities
- Examine each individual cell of an information package to determine whether it qualifies as a category detail entity
- Category entities become extensions of dimension entities
- Add them to the star schema in a stop-sign box

[Figure: the information package shown earlier, with Store, Product and Customer marked as category entities that extend their dimensions.]

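Data Warehousing: Design issues - star schema

Forming Star Schema:

[Figure: the resulting star schema. Sales Analysis is at the centre, surrounded by the dimension entities Time, Product, Gender, Age, Econ. Class and Location; the category entities Product Details, Customer Details and Store Details extend their dimensions.]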
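The three steps (define the measure entity, the dimension entities, then the category entities) can be summarised as a small procedure over an information package. The sketch encodes the package as a Python dict. Deciding which cells qualify as category entities is a design judgement, so it is passed in explicitly. The `form_star_schema` function and its output layout are hypothetical, and for brevity each category entity here extends a single dimension.

```python
# Information package: dimension -> category levels, highest to lowest.
PACKAGE = {
    "Time": ["Year", "Quarter", "Month"],
    "Location": ["Country", "Area", "Region", "District", "Store"],
    "Product": ["Classification", "Group", "Product"],
    "Age": ["Age Group"],
    "Econ. Class": ["Class"],
    "Gender": ["Gender"],
}
MEASURES = ["Forecast Sales", "Budget Sales", "Actual Sales",
            "Forecast Variance", "Budget Variance"]

def form_star_schema(package, measures, name, category_cells):
    # Step 1: the lowest category of each dimension plus the measures
    # define the measure entity at the centre of the star.
    measure_entity = {
        "name": name,
        "keys": [levels[-1] for levels in package.values()],
        "measures": list(measures),
    }
    # Step 2: each column of the package becomes a dimension entity.
    dimension_entities = list(package)
    # Step 3: cells judged to need detail become category entities
    # that extend their dimension entity.
    category_entities = {f"{cell} Details": dim for cell, dim in category_cells.items()}
    return measure_entity, dimension_entities, category_entities

measure, dims, cats = form_star_schema(
    PACKAGE, MEASURES, "Sales Analysis",
    {"Store": "Location", "Product": "Product"},
)
print(measure["keys"])  # ['Month', 'Store', 'Product', 'Age Group', 'Class', 'Gender']
print(cats)             # {'Store Details': 'Location', 'Product Details': 'Product'}
```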

Data Warehouse Example: Operational Data

Data Warehouse Example: Star Schema

Data Warehouse Example: Summary Report

Data Warehousing: Relation with DM

- DW and DM (data mining) have the same purpose: decision support
- The DW assembles, formats and organises historical data to answer user queries as they are posed; the answers depend on the content of the DW
- The DW will not attempt to extract further information, nor will it predict trends and patterns from data
- DM extracts previously unknown and useful information, as well as predicting trends and patterns
- DM can be performed on a DW and/or a traditional DB

DM: next lecture