1
ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Introduction to Data Warehouse
Olivia R. Liu Sheng, Ph.D., Emma Eccles Jones Presidential Chair of Business
2
Outline
• Why Data Warehouse? – Problems, causes and data warehouse solutions
• What is Data Warehouse? – Characteristics and components
• Current Practices of Data Warehouse
3
Why Data Warehouse?
• Knowledge Management Problems (drowning in data, starving for knowledge)
1. Can’t access data (easily).
   E.g., data from different branches, years, functional areas, etc.
2. Give me only what’s important (knowledge).
   E.g., regions and products that have shown upward sales trends over the last five years.
3. I need to reduce data to what’s important by slicing and dicing.
   E.g., by branch, product, year, etc.
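Slicing and dicing can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool from the course: the records, their field names and the helpers `slice_cube`, `dice_cube` and `total_by` are all hypothetical. A slice fixes one dimension to a single value, a dice keeps the sub-cube defined by allowed values on several dimensions, and `total_by` summarizes whatever remains.

```python
from collections import defaultdict

# Hypothetical sales records; every field name here is illustrative.
SALES = [
    {"branch": "SLC", "product": "PC", "year": 2000, "amount": 120.0},
    {"branch": "SLC", "product": "PC", "year": 2001, "amount": 150.0},
    {"branch": "SLC", "product": "Printer", "year": 2001, "amount": 40.0},
    {"branch": "Provo", "product": "PC", "year": 2001, "amount": 90.0},
    {"branch": "Provo", "product": "Printer", "year": 2000, "amount": 30.0},
]


def slice_cube(rows, dim, value):
    """Slice: fix one dimension (e.g. product) to a single value."""
    return [r for r in rows if r[dim] == value]


def dice_cube(rows, **allowed):
    """Dice: keep rows whose values lie in an allowed set on each named dimension."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def total_by(rows, *dims, measure="amount"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


# PC sales only, totaled by branch.
pc_by_branch = total_by(slice_cube(SALES, "product", "PC"), "branch")
# SLC sales in 2001, across all products.
slc_2001 = dice_cube(SALES, branch={"SLC"}, year={2001})
```

Passing several dimensions to `total_by` (for example `"branch", "year"`) gives a branch-by-year view of the kind the slide describes.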
4
Why Data Warehouse?
4. Data inconsistency and poor data quality.
   E.g., the 2001 PC sales amounts for SLC reported by the CFO and by the SLC Account Manager are not the same.
5. Need to improve the practice of making informed decisions.
   E.g., did the VP for Marketing set the advertising budgets for branches in the SW region based on their sales performance over the last five years?
6. Hard and slow to query the database?
   E.g., the VP for Marketing, the CFO and the Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.
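Problem 4 can be made concrete with a small reconciliation check. The sketch assumes each report is a mapping from (city, product, year) to a sales amount; the figures are made up, and the helper `find_inconsistencies` is hypothetical. It flags keys whose amounts differ between the two reports, or that appear in only one of them.

```python
# Made-up figures for illustration only: (city, product, year) -> sales amount.
cfo_report = {("SLC", "PC", 2001): 1_250_000, ("SLC", "Printer", 2001): 310_000}
manager_report = {("SLC", "PC", 2001): 1_180_000, ("SLC", "Printer", 2001): 310_000}


def find_inconsistencies(a, b, tolerance=0.0):
    """Return {key: (value_in_a, value_in_b)} for keys that disagree or are missing."""
    issues = {}
    for key in sorted(a.keys() | b.keys()):
        va, vb = a.get(key), b.get(key)
        # A missing value (None) or a gap larger than the tolerance is an inconsistency.
        if va is None or vb is None or abs(va - vb) > tolerance:
            issues[key] = (va, vb)
    return issues


conflicts = find_inconsistencies(cfo_report, manager_report)
```

Running a check like this once while loading shared data, rather than separately in each department, is one way to make sure everyone ends up reading the same figure.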
5
Why Data Warehouse?
• ROI Problems
7. Can I get more value out of my data?
   Ans: Make informed, effective decisions using knowledge extracted from integrated and consistent data covering a long period of time.
8. Can I do this cost-effectively?
   Options: federated (interoperable) databases vs. a data warehouse.
9. Can I easily scale up or change how I get knowledge out of my data?
   E.g., add more regions, functional areas or years to the sales performance analyses.
6
Causes of the Problems
Cause 1: Isolated databases distributed in an enterprise
(Figure: separate Sales, CRM and Inventory databases)
A root cause of problems 1, 4, 5, 6, 7, 8 and 9
7
Why Data Warehouse?
• Cause 1: Isolated databases distributed in an enterprise
(Figure: separate Sales, CRM and Inventory databases)
Ad hoc access solutions cannot alleviate the problems.
8
Why Data Warehouse?
• Cause 2: Historical data is archived in offline storage systems
(Figure: the current Sales database alongside an Archive of historical sales data)
Another root cause of problems 1, 4, 5, 6, 7, 8 and 9
9
Why Data Warehouse?
• Cause 2: Historical data is archived in offline storage systems
(Figure: the current Sales database alongside an Archive of historical sales data)
Ad hoc access to the archived data is slow and inconvenient.
10
Cause 3: Metadata for Transaction DB Systems Is Not User Friendly
(Figure: ER diagram of a university database with entities Student, Course, Instructor and Dependent; Undergraduate and Graduate are IS-A subtypes of Student; relationships Take and Has; attributes include SSN, Name, Address, Phone, Major, Minor, Rank, C-Name, C-No, Relation, Sex and Grade.)
12
Why Data Warehouse?
• Cause 4: Query and programming languages are even less user friendly
  – DESB students’ academic grades and GPAs since the freshman year
  – Sales amount distribution by product category, customer state and year
  – Slicing and dicing
  – SQL statements???
  – Report/screen interface code???
13
Why Data Warehouse?
• Cause 5: Transaction databases are optimized (normalized) to process transactions, not to answer decision support queries
  – Poor query performance when joining the normalized tables
  – Heavy transaction processing workload
14
What is Data Warehouse?
Designed to solve problems associated with current database practices:
• Isolated, distributed databases
(Figure: the Sales, CRM and Inventory databases feed the Data Warehouse through extract, replicate, integrate, cleanse & load steps)
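The extract, integrate, cleanse and load steps in the figure can be sketched as a tiny pipeline. Everything here is hypothetical: three in-memory "source systems" stand in for the Sales, CRM and Inventory databases, their fields and the state-code table are invented for illustration, and the replicate step is omitted.

```python
# Hypothetical extracts from three isolated source systems.
sales_db = [
    {"order_id": 1, "cust_id": "C1", "sku": "PC-01", "amount": 999.0},
    {"order_id": 2, "cust_id": "C2", "sku": "PR-07", "amount": 149.0},
    {"order_id": 2, "cust_id": "C2", "sku": "PR-07", "amount": 149.0},  # duplicate row
]
crm_db = {"C1": {"name": "Acme ", "state": "ut"}, "C2": {"name": "Beta", "state": "Utah"}}
inventory_db = {"PC-01": {"category": "Computers"}, "PR-07": {"category": "Printers"}}

STATE_CODES = {"ut": "UT", "utah": "UT"}  # assumed standardization table


def extract():
    """Copy the data out of each source system."""
    return list(sales_db), dict(crm_db), dict(inventory_db)


def integrate(sales, crm, inventory):
    """Join each sale with its customer (CRM) and product (Inventory) attributes."""
    return [{**s, **crm[s["cust_id"]], **inventory[s["sku"]]} for s in sales]


def cleanse(rows):
    """Trim names, standardize state codes and drop exact duplicates."""
    seen, clean = set(), []
    for r in rows:
        r = {**r, "name": r["name"].strip(),
             "state": STATE_CODES.get(r["state"].lower(), r["state"])}
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            clean.append(r)
    return clean


def load(rows, warehouse):
    """Append the cleansed rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(cleanse(integrate(*extract())), [])
```

Cleansing after integration, as here, lets one rule (such as the state-code mapping) fix values that came from different sources in different formats.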
15
Why Data Warehouse?
• Historical data is archived in offline storage systems
(Figure: current Sales data and the Archive of historical sales data both flow into the Data Warehouse)
Integrate historical data with current data.
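A minimal sketch of integrating archived and current data, assuming yearly sales totals stored as (branch, year, amount) tuples; the numbers and the helpers `integrate_history` and `year_over_year` are hypothetical. Once both sources sit in one table, a five-year trend question becomes a simple computation.

```python
# Hypothetical yearly sales totals per branch: (branch, year, amount).
archive = [("SLC", 1997, 800.0), ("SLC", 1998, 860.0), ("SLC", 1999, 910.0)]  # offline archive
current = [("SLC", 2000, 980.0), ("SLC", 2001, 1040.0)]  # current Sales database


def integrate_history(archived, live):
    """Merge archived and current rows into one table keyed by (branch, year)."""
    table = {}
    for branch, year, amount in archived + live:
        table[(branch, year)] = amount  # live rows come last, so they win on overlap
    return table


def year_over_year(table, branch, years):
    """Change in sales from each year to the next for one branch."""
    values = [table[(branch, y)] for y in years]
    return [later - earlier for earlier, later in zip(values, values[1:])]


history = integrate_history(archive, current)
changes = year_over_year(history, "SLC", range(1997, 2002))
```

An all-positive list of changes marks an upward trend, which is the kind of five-year question raised in problem 2.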
16
What is Data Warehouse?
• Causes 3, 4 and 5: hard-to-understand metadata and query/programming languages; poor decision support query performance
• Solution: in the data warehouse, organize data in a subject-oriented rather than a process-oriented way (dimensional modeling).
17
Dimensional Modeling (Star Schema)
Fact table (center of the star): AcademicPerformance (Grade)
Dimension tables: Student (Name, UG/PG, Major); Instructor (Name, Rank); Course (Number, Title); Semester (Year, Length, Start date)
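The academic star schema can be sketched in SQLite through Python's built-in `sqlite3` module. Table and column names are assumptions that follow the figure (`academic_rank`, `level` and `length_weeks` stand for Rank, UG/PG and Length), grades are assumed to be stored as grade points, and all rows are made up. The query answers Cause 4's first example, a student's GPA by year, with one join per dimension it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical column names).
CREATE TABLE student    (student_key INTEGER PRIMARY KEY, name TEXT, level TEXT, major TEXT);
CREATE TABLE instructor (instructor_key INTEGER PRIMARY KEY, name TEXT, academic_rank TEXT);
CREATE TABLE course     (course_key INTEGER PRIMARY KEY, course_no TEXT, title TEXT);
CREATE TABLE semester   (semester_key INTEGER PRIMARY KEY, year INTEGER,
                         length_weeks INTEGER, start_date TEXT);
-- Fact table: one grade per student, course, instructor and semester.
CREATE TABLE academic_performance (
    student_key    INTEGER REFERENCES student,
    instructor_key INTEGER REFERENCES instructor,
    course_key     INTEGER REFERENCES course,
    semester_key   INTEGER REFERENCES semester,
    grade_points   REAL
);
INSERT INTO student    VALUES (1, 'Ann', 'UG', 'Accounting'), (2, 'Bo', 'PG', 'IS');
INSERT INTO instructor VALUES (1, 'Lee', 'Professor');
INSERT INTO course     VALUES (1, 'C-101', 'Intro'), (2, 'C-102', 'Advanced');
INSERT INTO semester   VALUES (1, 2001, 15, '2001-01-08'), (2, 2002, 15, '2002-01-07');
INSERT INTO academic_performance VALUES
    (1, 1, 1, 1, 4.0), (1, 1, 2, 1, 3.0), (1, 1, 2, 2, 3.5), (2, 1, 1, 2, 3.7);
""")

# GPA per student per year: join the fact table to two dimensions, then average.
gpa_rows = conn.execute("""
    SELECT s.name, t.year, AVG(f.grade_points)
    FROM academic_performance AS f
    JOIN student  AS s ON s.student_key  = f.student_key
    JOIN semester AS t ON t.semester_key = f.semester_key
    GROUP BY s.name, t.year
    ORDER BY s.name, t.year
""").fetchall()
```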
18
Dimensional Modeling (Star Schema)
Fact table (center of the star): Sales (Qty, Amt)
Dimension tables: Branch (Name, State, City); Product (Name, Category); Customer (Name, State, City); Time (Year, Quarter, Month)
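The sales star schema can likewise be sketched in SQLite via Python's `sqlite3` module. Table and column names are assumptions based on the figure (the Time dimension is called `time_dim` here), and the rows are made up. The query is Cause 4's second example, sales amount distribution by product category, customer state and year: one join per dimension, then a GROUP BY on dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT);
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
-- Fact table: foreign keys to each dimension plus the Qty and Amt measures.
CREATE TABLE sales (
    branch_key INTEGER, product_key INTEGER, customer_key INTEGER, time_key INTEGER,
    qty INTEGER, amt REAL
);
INSERT INTO branch   VALUES (1, 'SLC Main', 'UT', 'Salt Lake City');
INSERT INTO product  VALUES (1, 'Desktop', 'PC'), (2, 'Laser', 'Printer');
INSERT INTO customer VALUES (1, 'Acme', 'UT', 'Ogden'), (2, 'Beta', 'ID', 'Boise');
INSERT INTO time_dim VALUES (1, 2001, 1, 2), (2, 2001, 3, 8), (3, 2002, 1, 1);
INSERT INTO sales VALUES
    (1, 1, 1, 1, 2, 2000.0), (1, 1, 2, 2, 1, 1000.0),
    (1, 2, 1, 2, 3, 600.0),  (1, 1, 1, 3, 1, 950.0);
""")

# Sales amount by product category, customer state and year.
rows = conn.execute("""
    SELECT p.category, c.state, t.year, SUM(s.amt) AS total_amt
    FROM sales s
    JOIN product  p ON p.product_key  = s.product_key
    JOIN customer c ON c.customer_key = s.customer_key
    JOIN time_dim t ON t.time_key     = s.time_key
    GROUP BY p.category, c.state, t.year
    ORDER BY p.category, c.state, t.year
""").fetchall()
```

Adding `WHERE t.year = 2001` before the GROUP BY would slice the result down to a single year.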
19
One System for Multiple Uses
(Figure: a single database system, in which the database management system (DBMS) manages the database and its metadata and serves both application programs and interactive queries/transactions)
20
Two Worlds -> Two Systems
(Figure: the operational world, where operational applications run against OLTP DBs, versus the DSS world, where a data warehouse serves an executive information system, a decision support system (DSS) and reporting)
21
What is Data Warehouse?
• A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision-making process.
• 1. Subject-oriented means the data warehouse focuses on the high-level entities of the business, such as sales, products and customers. This is in contrast to transaction database systems, which deal with processes such as placing an order.
22
What is Data Warehouse?
2. Integrated means the data is integrated from distributed and historical data sources and stored in a consistent format.
3. Time-variant means each piece of data is associated with a point or period in time (e.g., semester, fiscal year or pay period).
4. Non-volatile means the data does not change once it gets into the warehouse.
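Time variance and non-volatility together suggest an append-only table in which every row carries its period. The sketch below is a hypothetical in-memory stand-in (the `SalesFact` and `WarehouseTable` names and fields are invented): a frozen dataclass refuses in-place attribute changes, and loading a new period adds rows instead of overwriting the old ones, whereas an OLTP database would typically update a current value in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified in place
class SalesFact:
    period: str      # time element carried by every row, e.g. a fiscal year
    branch: str
    amount: float


class WarehouseTable:
    """Append-only table: new periods are added, loaded rows are never updated."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        self._rows.extend(rows)

    def as_of(self, period):
        return [r for r in self._rows if r.period == period]

    def history(self, branch):
        return [(r.period, r.amount) for r in self._rows if r.branch == branch]


table = WarehouseTable()
table.load([SalesFact("FY2000", "SLC", 910.0), SalesFact("FY2000", "Provo", 400.0)])
table.load([SalesFact("FY2001", "SLC", 980.0)])  # FY2000 rows stay untouched
```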
23
Characteristics of Data Warehouse
OLAP data warehouse vs. OLTP DB
• Purpose: decision support vs. transaction processing
• Data model: dimensional vs. normalized relational
• Time span: historical and current data vs. current data
• Query processing: scans a substantial subset of data vs. scans a small set of data
• Operation: read-only vs. read & update
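The contrast in query processing and operation can be illustrated with SQLite via Python's `sqlite3` module. In practice the two workloads live in separate systems; a single hypothetical `orders` table with made-up rows is used here only to contrast the access patterns: an OLTP-style update touches one row by key, while an OLAP-style aggregate reads every row without changing any.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
# 1,000 hypothetical orders, alternating between two regions.
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [("SW" if i % 2 == 0 else "NE", 1997 + i % 5, 10.0) for i in range(1000)],
)

# OLTP-style work: read and update a small set of data (one order, found by key).
updated = conn.execute(
    "UPDATE orders SET amount = amount + 5 WHERE order_id = ?", (42,)
).rowcount

# OLAP-style work: a read-only scan over a substantial subset (here, all rows).
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)
```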
24
Data Warehouse and Data Mart
• Data warehouse – defined by its decision support purpose and other characteristics
  – Other characteristics: subject-oriented, integrated
• Data mart – a data warehouse with a more limited business scope (e.g., a department)
• A data warehouse may be built from several data marts.
25
Basic Elements of a Data Warehouse System
• Source systems (legacy): relational, flat files, spreadsheets, ERP, legacy
  – extract -> Data Staging Area
• Data Staging Area
  – Storage: flat files (fastest); RDBMS; other
  – Processing: clean; prune; combine; remove duplicates; household; standardize; conform dimensions; store awaiting replication; archive; export to data marts
  – No user query services
  – populate, replicate, recover -> data marts
• The Data Warehouse Presentation Servers: Data Marts #1, #2 and #3, linked by the DW Bus (conformed dimensions and facts)
  – Data Mart #1: OLAP (ROLAP and/or MOLAP) query services; dimensional; subject-oriented; locally implemented; user-group driven; may store atomic data; may be frequently refreshed; conforms to the DW Bus
  – feed -> End User Data Access
• End User Data Access: ad hoc query tools; report writers; end user applications; models (forecasting; scoring; allocating; data mining; other downstream systems; other parameters; special UI)
• Also shown in the figure: uploaded cleaned dimensions; uploaded model results
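Several staging-area steps (standardize, remove duplicates, conform dimensions) meet in the idea of a conformed dimension on the DW Bus. The sketch below is hypothetical throughout: two invented sources format customer numbers and names differently, `build_conformed_dim` assigns one surrogate key per distinct customer, and two toy data marts store facts against those shared keys so their facts can be combined.

```python
# Hypothetical customer records from two source systems with different formats.
sales_src = [{"cust_no": "001", "name": "ACME Corp"}, {"cust_no": "002", "name": "Beta Ltd"}]
crm_src = [{"cust_no": "1", "name": "Acme Corp."}, {"cust_no": "003", "name": "Gamma Inc"}]


def standardize(rec):
    """Pad the natural key and normalize the name so one customer matches across sources."""
    return {"cust_no": rec["cust_no"].zfill(3), "name": rec["name"].rstrip(".").title()}


def build_conformed_dim(*sources):
    """Build one customer dimension with surrogate keys for all data marts to share."""
    dim, key_of = [], {}
    for source in sources:
        for rec in map(standardize, source):
            if rec["cust_no"] in key_of:  # remove duplicates across sources
                continue
            key_of[rec["cust_no"]] = len(dim) + 1  # next surrogate key
            dim.append({"customer_key": key_of[rec["cust_no"]], **rec})
    return dim, key_of


customer_dim, key_of = build_conformed_dim(sales_src, crm_src)

# Two data marts store facts against the same surrogate keys (hypothetical facts) ...
sales_mart = [{"customer_key": key_of["001"], "amt": 500.0}]
service_mart = [{"customer_key": key_of["001"], "calls": 3}]

# ... so their facts can be combined on customer_key.
combined = {}
for fact in sales_mart + service_mart:
    row = combined.setdefault(fact["customer_key"], {})
    row.update({k: v for k, v in fact.items() if k != "customer_key"})
```

Because both marts reuse the same `customer_key` values, combining their facts needs no per-mart matching logic; without a conformed dimension, each mart would have to reconcile customer identities on its own.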
26
Current Practice of DW*
• The DW market value was projected to reach $113.5 billion in 2002.
• Average DW development cost is $1.5 million, and average maintenance cost is $0.5 million.
• DW development time ranges from 1 to 3 years.
* Source: H.J. Watson, “Current Practices in Data Warehousing”, I.S. Management, 2001
27
Current Practice of DW*
• Sponsorship for the DW project (sponsor: percentage)
  – VP of a business unit: 39.8%
  – CIO: 26.9%
  – Business unit manager: 16.7%
  – CEO: 11.1%
  – Other: 25.0%
* Source: H.J. Watson, “Current Practices in Data Warehousing”, I.S. Management, 2001
28
Current Practice of DW*
• DW Benefits
  – Less effort to produce better information
  – Better decisions
  – Improvement of business processes
  – Support for accomplishing strategic business objectives
• Return on Investment and Cost of Ownership?
* Source: H.J. Watson, “Current Practices in Data Warehousing”, I.S. Management, 2001