ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

Introduction to Data Warehouse

Olivia R. Liu Sheng, Ph.D., Emma Eccles Jones Presidential Chair of Business


Outline

• Why Data Warehouse? – Problems, causes and data warehouse solutions

• What Is a Data Warehouse? – Characteristics and components

• Current Practices of Data Warehousing


Why Data Warehouse?

• Knowledge management problems (drowning in data, starving for knowledge)

1. Can’t access data (easily).

E.g., data from different branches, years, functional areas, etc.

2. Give me only what’s important (knowledge).

E.g., regions and products that have upward sales trends over the last five years.

3. I need to reduce data to what’s important by slicing and dicing.

E.g., by branch, product, year, etc.


Why Data Warehouse?

4. Data inconsistency and poor data quality.

E.g., the 2001 PC sales figures for SLC reported by the CFO and by the SLC Account Manager do not match.

5. Need to improve the practice of making informed decisions.

E.g., did the VP for Marketing set the advertising budgets for branches in the SW region based on their sales performance over the last five years?

6. Querying the database is hard and slow.

E.g., the VP for Marketing, the CFO and the Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.
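Problem 3 asks for slicing and dicing. As a minimal sketch, assuming a handful of hypothetical in-memory sales records (the field names `branch`, `product`, `year` and `amount` and the helper functions are illustrative, not from the slides), a slice fixes one dimension to a single value, a dice restricts several dimensions to chosen sets of values, and the remaining rows are then totaled along whichever dimensions the user wants to see:

```python
from collections import defaultdict

# Hypothetical sales records; every field name here is illustrative only.
SALES = [
    {"branch": "SLC", "product": "PC", "year": 2000, "amount": 120.0},
    {"branch": "SLC", "product": "PC", "year": 2001, "amount": 150.0},
    {"branch": "SLC", "product": "Printer", "year": 2001, "amount": 40.0},
    {"branch": "Denver", "product": "PC", "year": 2001, "amount": 90.0},
    {"branch": "Denver", "product": "Printer", "year": 2000, "amount": 30.0},
]


def slice_rows(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]


def dice_rows(rows, **allowed):
    """Dice: keep rows whose dimensions all fall in the given sets of values."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def total_by(rows, *dimensions):
    """Sum the amount measure, grouped by the chosen dimensions."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dimensions)] += r["amount"]
    return dict(totals)


if __name__ == "__main__":
    # SLC sales by year.
    print(total_by(slice_rows(SALES, "branch", "SLC"), "year"))
    # PC sales in 2001, by branch.
    print(total_by(dice_rows(SALES, product={"PC"}, year={2001}), "branch"))
```

Running it prints `{(2000,): 120.0, (2001,): 190.0}` for the SLC slice and `{('SLC',): 150.0, ('Denver',): 90.0}` for the PC/2001 dice.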


Why Data Warehouse?

• ROI problems

7. Can I get more value out of my data?

Ans: Make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time.

8. Can I do this cost-effectively?

Options: federated (interoperable) databases vs. a data warehouse

9. Can I easily scale up or change how I get knowledge out of my data?

E.g., add more regions, functional areas or years to the sales performance analyses.


Causes for the Problems

Cause 1: Isolated databases distributed across an enterprise

[Figure: separate Sales, CRM and Inventory databases]

A root cause of problems 1, 4, 5, 6, 7, 8 and 9


Why Data Warehouse?

• Cause 1: Isolated databases distributed across an enterprise

[Figure: separate Sales, CRM and Inventory databases]

Ad hoc access solutions cannot alleviate the problems.


Why Data Warehouse?

• Cause 2: Historical data is archived in offline storage systems

[Figure: the Sales database, with its historical sales data moved to an offline archive]

Another root cause of problems 1, 4, 5, 6, 7, 8 and 9


Why Data Warehouse?

• Cause 2: Historical data is archived in offline storage systems

[Figure: the Sales database, with its historical sales data moved to an offline archive]

Ad hoc access to the archived data is slow and inconvenient.


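Cause 3: Metadata for Transaction DB Systems Is Not User Friendly

[Figure: ER diagram of a university database. Entities: Student (with Undergraduate and Graduate subtypes via IS-A), Course, Instructor and Dependent; relationships: Take and Has; attributes include SSN, Name, Address, Phone, Major, Minor, Rank, C-Name, C-No, Relation, Sex and Grade; cardinalities 1 and M.]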


Why Data Warehouse?

• Cause 4: Query and programming languages are even less user friendly. Consider requests such as:
  – DESB students’ academic grades and GPAs since the freshman year
  – Sales amount distribution by product category, customer state and year
  – Slicing and dicing
  – SQL statements???
  – Report/screen interface code???


Why Data Warehouse?

• Cause 5: Transaction databases are optimized (normalized) to process transactions, not to answer decision support queries
  – Poor query performance when joining the normalized tables
  – Heavy transaction processing workload


What Is a Data Warehouse?

Designed to solve problems associated with current database practices:

• Isolated, distributed databases

[Figure: data from the Sales, CRM and Inventory databases is extracted, replicated, integrated, cleansed and loaded into the data warehouse]
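The extract, replicate, integrate, cleanse and load step can be sketched in a few lines. This is a minimal, hypothetical example: the three in-memory lists stand in for the Sales, CRM and Inventory databases, and every field and function name is an assumption made for illustration rather than part of any particular ETL tool.

```python
# Hypothetical source extracts standing in for the Sales, CRM and Inventory databases.
SALES_DB = [
    {"cust_id": "C1", "sku": "P-100", "qty": 2, "amount": "2,400.00"},
    {"cust_id": "c2 ", "sku": "P-200", "qty": 1, "amount": "350.50"},
]
CRM_DB = [
    {"id": "C1", "name": "acme corp", "state": "ut"},
    {"id": "C2", "name": "Beta LLC", "state": "CO"},
]
INVENTORY_DB = [
    {"sku": "P-100", "category": "PC"},
    {"sku": "P-200", "category": "Printer"},
]


def extract():
    """Extract/replicate: copy the rows out of each isolated source."""
    return list(SALES_DB), list(CRM_DB), list(INVENTORY_DB)


def cleanse_sale(row):
    """Cleanse: normalize customer keys and parse amounts into numbers."""
    return {
        "cust_id": row["cust_id"].strip().upper(),
        "sku": row["sku"],
        "qty": int(row["qty"]),
        "amount": float(row["amount"].replace(",", "")),
    }


def integrate(sales, customers, products):
    """Integrate: attach customer and product attributes to each sale."""
    cust = {c["id"].strip().upper(): c for c in customers}
    prod = {p["sku"]: p for p in products}
    for s in map(cleanse_sale, sales):
        c = cust[s["cust_id"]]
        yield {
            **s,
            "customer": c["name"].title(),
            "state": c["state"].upper(),
            "category": prod[s["sku"]]["category"],
        }


def load(rows, warehouse):
    """Load: append the integrated rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse


if __name__ == "__main__":
    for row in load(integrate(*extract()), []):
        print(row)
```

The steps are kept as separate functions so each can be tested or replaced independently; a production pipeline would also log rejected rows instead of failing on an unknown key.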


Why Data Warehouse?

• Historical data is archived in offline storage systems

[Figure: the Sales database and the archived historical sales data both feed the data warehouse]

Integrate historical data with current data.


What Is a Data Warehouse?

• Causes 3, 4 and 5: Hard-to-understand metadata and query and programming languages; poor decision support query performance

• Solution: In a data warehouse, organize data in a subject-oriented rather than a process-oriented way, using dimensional modeling.


Dimensional Modeling (Star Schema)

Fact table: AcademicPerformance (Grade)

Dimension tables:
• Student (Name, UG/PG, Major)
• Instructor (Name, Rank)
• Course (Number, Title)
• Semester (Year, Length, Start date)
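To make the academic star schema concrete, here is a sketch using Python's built-in `sqlite3` module that answers one of the Cause 4 requests, students' GPAs by year. Assumptions: the table and column names and sample rows are hypothetical, Grade is stored as grade points on a 4.0 scale, and GPA is computed as an unweighted average (i.e., every course is assumed to carry the same credit hours).

```python
import sqlite3

# Hypothetical star schema: one fact table and four dimension tables.
SCHEMA = """
CREATE TABLE student    (student_key INTEGER PRIMARY KEY, name TEXT, ug_pg TEXT, major TEXT);
CREATE TABLE instructor (instructor_key INTEGER PRIMARY KEY, name TEXT, academic_rank TEXT);
CREATE TABLE course     (course_key INTEGER PRIMARY KEY, number TEXT, title TEXT);
CREATE TABLE semester   (semester_key INTEGER PRIMARY KEY, year INTEGER,
                         length_weeks INTEGER, start_date TEXT);
CREATE TABLE academic_performance (
    student_key    INTEGER REFERENCES student,
    instructor_key INTEGER REFERENCES instructor,
    course_key     INTEGER REFERENCES course,
    semester_key   INTEGER REFERENCES semester,
    grade_points   REAL
);
"""

# GPA per student per year: join the fact table to two dimensions only.
GPA_BY_YEAR = """
SELECT st.name, se.year, AVG(f.grade_points) AS gpa
FROM academic_performance AS f
JOIN student  AS st ON st.student_key  = f.student_key
JOIN semester AS se ON se.semester_key = f.semester_key
GROUP BY st.name, se.year
ORDER BY st.name, se.year
"""

# Cumulative GPA since the first semester on record.
CUMULATIVE_GPA = """
SELECT st.name, AVG(f.grade_points) AS gpa
FROM academic_performance AS f
JOIN student AS st ON st.student_key = f.student_key
GROUP BY st.name
ORDER BY st.name
"""


def build_academic(conn):
    """Create the schema and insert a few hypothetical rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                     [(1, "Ann", "UG", "Accounting"), (2, "Bo", "UG", "Information Systems")])
    conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", [(1, "Lee", "Professor")])
    conn.executemany("INSERT INTO course VALUES (?, ?, ?)",
                     [(1, "C101", "Course A"), (2, "C102", "Course B")])
    conn.executemany("INSERT INTO semester VALUES (?, ?, ?, ?)",
                     [(1, 2014, 15, "2014-01-07"), (2, 2015, 15, "2015-01-06")])
    conn.executemany("INSERT INTO academic_performance VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 4.0), (1, 1, 2, 2, 3.0),
        (2, 1, 1, 1, 3.0), (2, 1, 2, 2, 3.5),
    ])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_academic(conn)
    print(conn.execute(GPA_BY_YEAR).fetchall())
    print(conn.execute(CUMULATIVE_GPA).fetchall())
```

With the sample rows, the yearly query returns Ann 4.0 (2014) and 3.0 (2015), Bo 3.0 and 3.5, and the cumulative query returns 3.5 for Ann and 3.25 for Bo.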


Dimensional Modeling (Star Schema)

Fact table: Sales (Qty, Amt)

Dimension tables:
• Branch (Name, State, City)
• Product (Name, Category)
• Customer (Name, State, City)
• Time (Year, Quarter, Month)
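With this star schema, the Cause 4 request "sales amount distribution by product category, customer state and year" becomes one query that joins the fact table to three dimension tables and groups by their attributes. A minimal sketch with `sqlite3`; the table names, keys and sample rows are hypothetical, and the Time dimension is named `time_dim` to avoid confusion with SQLite's built-in `time()` function.

```python
import sqlite3

# Hypothetical star schema: the sales fact table plus four dimension tables.
SCHEMA = """
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT);
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE sales (
    branch_key   INTEGER REFERENCES branch,
    product_key  INTEGER REFERENCES product,
    customer_key INTEGER REFERENCES customer,
    time_key     INTEGER REFERENCES time_dim,
    qty INTEGER,
    amt REAL
);
"""

# Sales amount by product category, customer state and year.
QUERY = """
SELECT p.category, c.state, t.year, SUM(s.amt) AS total_amt
FROM sales AS s
JOIN product  AS p ON p.product_key  = s.product_key
JOIN customer AS c ON c.customer_key = s.customer_key
JOIN time_dim AS t ON t.time_key     = s.time_key
GROUP BY p.category, c.state, t.year
ORDER BY p.category, c.state, t.year
"""


def build_sales(conn):
    """Create the schema and insert a few hypothetical rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO branch VALUES (?, ?, ?, ?)",
                     [(1, "SLC Main", "UT", "Salt Lake City")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "Laptop X", "PC"), (2, "Laser Y", "Printer")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)",
                     [(1, "Acme", "UT", "Provo"), (2, "Beta", "CO", "Denver")])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)",
                     [(1, 2000, 4, 12), (2, 2001, 1, 3)])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 2, 2400.0),
        (1, 1, 1, 2, 1, 1300.0),
        (1, 1, 2, 2, 1, 1250.0),
        (1, 2, 2, 2, 3, 900.0),
    ])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_sales(conn)
    for row in conn.execute(QUERY):
        print(row)
```

Slicing (e.g., only 2001) or dicing (e.g., only PCs sold in UT) amounts to adding a `WHERE` clause on dimension attributes; the join pattern stays the same, which is what makes the star layout easier to query than a normalized transaction schema.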


One System for Multiple Uses

[Figure: a single database system, made up of the database, its metadata and the database management system (DBMS), serves application programs as well as interactive queries and transactions]


Two Worlds -> Two Systems

[Figure: in the operational world, operational applications run on OLTP databases; in the decision support (DSS) world, a data warehouse serves executive information systems, decision support systems (DSS) and reporting]


What Is a Data Warehouse?

• A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process.

1. Subject-oriented means the data warehouse focuses on the high-level entities of the business, such as sales, products and customers. This is in contrast to operational database systems, which deal with processes such as placing an order.


What Is a Data Warehouse?

2. Integrated means the data is integrated from distributed and historical data sources and stored in a consistent format.

3. Time-variant means each data item is associated with a point or period in time (e.g., a semester, fiscal year or pay period).

4. Non-volatile means the data does not change once it has been loaded into the warehouse.
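The time-variant and non-volatile properties can be illustrated with an append-only table. In this hypothetical sketch, every load is tagged with the period it describes, and a period that has already been loaded cannot be overwritten, so earlier snapshots remain available for historical analysis. It is deliberately strict; the class, method and field names are assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field


class NonVolatileError(Exception):
    """Raised when a caller tries to overwrite a period that is already loaded."""


@dataclass
class SnapshotTable:
    """Hypothetical append-only table.

    Time-variant: every row is tagged with the period it describes.
    Non-volatile: a loaded period is never updated or reloaded.
    """
    rows: list = field(default_factory=list)
    loaded_periods: set = field(default_factory=set)

    def load(self, period, records):
        """Append all records for a new period; refuse to touch an old one."""
        if period in self.loaded_periods:
            raise NonVolatileError(f"period {period!r} is already loaded")
        self.loaded_periods.add(period)
        self.rows.extend({"period": period, **r} for r in records)

    def as_of(self, period):
        """Return the rows recorded for one period."""
        return [r for r in self.rows if r["period"] == period]


if __name__ == "__main__":
    table = SnapshotTable()
    table.load("FY2014", [{"branch": "SLC", "amount": 100.0}])
    table.load("FY2015", [{"branch": "SLC", "amount": 130.0}])
    print(table.as_of("FY2014"))  # the FY2014 figure is still 100.0
```

Contrast this with an OLTP table, where the 2015 figure would typically overwrite the 2014 value in place.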


Characteristics of Data Warehouse

| Characteristic | OLAP Data Warehouse | OLTP DB |
|---|---|---|
| Purpose | Decision support | Transaction processing |
| Data model | Dimensional | Normalized relational |
| Time span | Historical and current data | Current data |
| Query processing | Scans a substantial subset of data | Scans a small set of data |
| Operation | Read-only | Read & update |


Data Warehouse and Data Mart

• Data warehouse – defined by its decision support purpose and its other characteristics
  – Other characteristics: subject-oriented, integrated, time-variant, non-volatile

• Data mart – a data warehouse for a more limited business scope (e.g., a single department)

• A data warehouse may be built from several data marts


Basic Elements of a Data Warehouse System

Flow: source systems -> extract -> data staging area -> populate, replicate, recover -> data warehouse presentation servers -> feed -> end user data access

• Source systems (legacy): relational databases, flat files, spreadsheets, ERP and legacy systems

• Data staging area
  – Storage: flat files (fastest); RDBMS; other
  – Processing: clean; prune; combine; remove duplicates; household; standardize; conform dimensions; store awaiting replication; archive; export to data marts
  – No user query services

• The data warehouse presentation servers
  – Data Mart #1: OLAP (ROLAP and/or MOLAP) query services; dimensional; subject-oriented; locally implemented; user-group driven; may store atomic data; may be frequently refreshed; conforms to the DW Bus
  – Data Mart #2, Data Mart #3
  – DW Bus: conformed dimensions and facts

• End user data access
  – Ad hoc query tools
  – Report writers
  – End user applications
  – Models: forecasting; scoring; allocating; data mining; other downstream systems; other parameters; special UI

• Uploads: cleaned dimensions and model results
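Some of the staging-area processing steps (standardize, remove duplicates, conform dimensions) can be sketched for a customer dimension that arrives from two source systems. Hypothetical assumptions: the field names, the small state-code lookup, and the choice to conform the dimension by assigning one integer surrogate key per distinct standardized customer, so that every data mart on the DW Bus can refer to the same keys.

```python
# Hypothetical lookup used to standardize state names to two-letter codes.
STATE_CODES = {"utah": "UT", "colorado": "CO"}

# Hypothetical customer extracts from two source systems.
SALES_CUSTOMERS = [
    {"name": "acme  corp", "city": "provo ", "state": "Utah"},
]
CRM_CUSTOMERS = [
    {"name": "Acme Corp", "city": "Provo", "state": "UT"},
    {"name": "beta llc", "city": "denver", "state": "colorado"},
]


def standardize(record):
    """Collapse whitespace, normalize case and map state names to codes."""
    state = record["state"].strip()
    return {
        "name": " ".join(record["name"].split()).title(),
        "city": record["city"].strip().title(),
        "state": STATE_CODES.get(state.lower(), state.upper()),
    }


def remove_duplicates(records):
    """Keep the first occurrence of each (name, city, state) combination."""
    seen = set()
    unique = []
    for r in records:
        key = (r["name"], r["city"], r["state"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def conform_dimension(records):
    """Assign one surrogate key per customer for all data marts to share."""
    return {key: r for key, r in enumerate(records, start=1)}


def stage(*sources):
    """Run the staging steps over all source extracts."""
    standardized = [standardize(r) for source in sources for r in source]
    return conform_dimension(remove_duplicates(standardized))


if __name__ == "__main__":
    for key, customer in stage(SALES_CUSTOMERS, CRM_CUSTOMERS).items():
        print(key, customer)
```

The two spellings of Acme collapse into one customer with key 1. Note that `str.title()` turns "llc" into "Llc"; real standardization rules are usually richer than this sketch.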


Current Practice of DW*

• The DW market was projected to reach $113.5 billion in 2002.

• Average DW development cost is $1.5 million and average maintenance cost is $0.5 million.

• DW development time ranges from 1 to 3 years.

* Source: H.J. Watson, “Current Practices in Data Warehousing”, I.S. Management, 2001


Current Practice of DW*

• Sponsorship for the DW project

| Sponsor | Percentage |
|---|---|
| VP of a business unit | 39.8 |
| CIO | 26.9 |
| Business unit manager | 16.7 |
| CEO | 11.1 |
| Other | 25.0 |

* Source: H.J. Watson, “Current Practices in Data Warehousing”, I.S. Management, 2001


Current Practice of DW*

• DW benefits
  – Less effort to produce better information
  – Better decisions
  – Improvement of business processes
  – Support for accomplishing strategic business objectives

• Return on investment and cost of ownership?

* Source: H.J. Watson, “Current Practices in Data Warehousing”, I.S. Management, 2001