8/3/2019 Data-Warehousing [Compatibility Mode] - Copy
1/38
Introduction to Data Warehousing
The Importance of Data Warehousing
Provide a single version of the truth
Improve decision making
Support key corporate initiatives such as performance management, B2C and B2B e-commerce, and customer relationship management
Estimated to be a $113.5 billion market in 2002 for systems, software, services, and in-house expenditures (Palo Alto Management Group)
A Simple Definition
A data warehouse is a collection of data created to support decision-making applications.
Data Warehouse Characteristics
Subject oriented -- data are organized around sales, products, etc.
Integrated -- data are integrated to provide a comprehensive view
Time variant -- historical data are maintained
Nonvolatile -- data are not updated by users
Another Definition
Data warehousing is the entire process of data extraction, transformation, and loading of data to the warehouse and the access of the data by end users and applications.
Data Mart
A data mart stores data for a limited number of
subject areas, such as marketing and sales data. It is
used to support specific applications.
An independent data mart is created directly from
source systems.
A dependent data mart is populated from a data
warehouse.
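The dependent case can be illustrated with a minimal sketch, assuming a hypothetical warehouse table named warehouse_facts and a sales_mart table derived from it. The table names, columns, and sample rows are invented for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical warehouse table covering several subject areas.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (subject_area TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "East", 120.0),
        ("sales", "West", 80.0),
        ("marketing", "East", 45.0),
        ("finance", "West", 300.0),
    ],
)

# Dependent data mart: populated from the warehouse, limited to one subject area.
conn.execute(
    "CREATE TABLE sales_mart AS "
    "SELECT region, amount FROM warehouse_facts WHERE subject_area = 'sales'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

An independent data mart would instead run its own extraction against the source systems, which is why the two approaches can drift apart unless an integration plan keeps them consistent.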
Operational Data Store
An operational data store consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data.
Its purpose is to provide integrated data for operational purposes. It has add, change, and delete functionality.
It may be created to avoid a full-blown ERP implementation.
[Figure: Data warehouse architecture, shown as four stages from left to right.
Data Sources: transaction data (production, marketing, HR, finance, and accounting systems; IBM, IMS, VSAM, Oracle, Sybase), other internal data (ERP, SAP, clickstream, Informix, Web data), and external data (demographic, Harte-Hanks).
ETL Software: extract, clean/scrub, transform, and load, with a staging area and an operational data store (Ascential, Sagent, SAS, Firstlogic, Informatica).
Data Stores: data warehouse, meta data, and data marts for finance, marketing, and sales (Teradata, IBM, Essbase, Microsoft).
Data Analysis Tools and Applications: SQL; queries, reporting, DSS/EIS, and data mining; Web browser (Cognos, SAS, MicroStrategy, Siebel, Business Objects).
Users: analysts, managers, executives, operational personnel, and customers/suppliers.]
Two Data Warehousing Strategies
Enterprise-wide warehouse, top down, the Inmon methodology
Data mart, bottom up, the Kimball methodology
When properly executed, both result in an enterprise-wide data warehouse, but with different architectures
The Data Mart Strategy
The most common approach
Begins with a single mart and architected marts are added over time for more subject areas
Relatively inexpensive and easy to implement
Can be used as a proof of concept for data warehousing
Can perpetuate the silos of information problem
Can postpone difficult decisions and activities
Requires an overall integration plan
The Enterprise-wide Strategy
A comprehensive warehouse is built initially
An initial dependent data mart is built using a subset of the data in the warehouse
Additional data marts are built using subsets of the data in the warehouse
Like all complex projects, it is expensive, time consuming, and prone to failure
When successful, it results in an integrated, scalable warehouse
Data Sources and Types
Primarily from legacy, operational systems
Almost exclusively numerical data at the present time
External data may be included, often purchased from third-party sources
Technology exists for storing unstructured data, and this is expected to become more important over time
Extraction, Transformation, and Loading (ETL) Processes
The plumbing work of data warehousing
Data are moved from source to target databases
A very costly, time-consuming part of data warehousing
Data Extraction
Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data)
Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of dirty data in the source systems)
Increasingly performed by specialized ETL software
Sample ETL Tools
DataStage from Ascential Software
SAS System from SAS Institute
PowerMart/PowerCenter from Informatica
Sagent Solution from Sagent Software
Hummingbird Genio Suite from Hummingbird Communications
Reasons for Dirty Data
Dummy Values
Absence of Data
Multipurpose Fields
Cryptic Data
Contradicting Data
Inappropriate Use of Address Lines
Violation of Business Rules
Reused Primary Keys, Non-Unique Identifiers
Data Integration Problems
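Three of these causes (dummy values, absence of data, and non-unique identifiers) are easy to detect mechanically. The sketch below is illustrative only: the record layout, the field names, and the set of placeholder codes in DUMMY_VALUES are assumptions, not part of the original material.

```python
from collections import Counter

# Hypothetical customer records; field names and values are illustrative.
records = [
    {"id": 1, "name": "Ann Lee", "ssn": "999-99-9999", "phone": "555-0101"},
    {"id": 2, "name": "Bob Roy", "ssn": "123-45-6789", "phone": ""},
    {"id": 2, "name": "Cy Dunn", "ssn": "987-65-4321", "phone": "555-0102"},
]

DUMMY_VALUES = {"999-99-9999", "000-00-0000"}  # assumed placeholder codes


def find_problems(rows):
    """Return (record id, problem) pairs for three of the listed causes."""
    problems = []
    id_counts = Counter(row["id"] for row in rows)
    for row in rows:
        if row["ssn"] in DUMMY_VALUES:
            problems.append((row["id"], "dummy value"))
        if not row["phone"].strip():
            problems.append((row["id"], "absence of data"))
        if id_counts[row["id"]] > 1:
            problems.append((row["id"], "non-unique identifier"))
    return problems


problems = find_problems(records)
for record_id, problem in problems:
    print(record_id, problem)
```

Causes such as contradicting data or violated business rules usually need rules specific to the organization, so they are not covered here.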
Data Cleansing
Source systems contain dirty data that must be cleansed
ETL software contains rudimentary data cleansing capabilities
Specialized data cleansing software is often used. Important for performing name and address correction and householding functions
Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
Data Staging
Often used as an interim step between data extraction and later steps
Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
There is usually no end user access to the staging file
An operational data store may be used for data staging
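The cutoff behavior can be sketched in a few lines. This is a simplified illustration under stated assumptions: the staging area is modeled as an in-memory list, the source names and timestamps are invented, and load_at_cutoff is a hypothetical helper rather than a function from any ETL product.

```python
from datetime import datetime

# Hypothetical staging area: records arrive from sources at different times.
staging = [
    {"source": "orders_flat_file", "arrived": datetime(2002, 3, 1, 21, 15), "amount": 10.0},
    {"source": "billing_ftp", "arrived": datetime(2002, 3, 1, 23, 40), "amount": 25.0},
    {"source": "orders_flat_file", "arrived": datetime(2002, 3, 2, 0, 30), "amount": 5.0},
]
warehouse = []


def load_at_cutoff(staging_rows, target, cutoff):
    """Move rows that arrived before the cutoff into the target; keep the rest staged."""
    ready = [r for r in staging_rows if r["arrived"] < cutoff]
    remaining = [r for r in staging_rows if r["arrived"] >= cutoff]
    target.extend({"source": r["source"], "amount": r["amount"]} for r in ready)
    return remaining


staging = load_at_cutoff(staging, warehouse, cutoff=datetime(2002, 3, 2, 0, 0))
print(len(warehouse), "rows loaded;", len(staging), "rows still staged")
```

Rows that arrive after the cutoff simply wait in staging for the next load cycle.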
Data Transformation
Transforms the data in accordance with the business rules and standards that have been established
Examples include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
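Each of the transformation types listed above can be shown on a small, made-up data set. The following is a minimal sketch, assuming hypothetical field names (cust, gender, date, qty, price) and an invented code table; real business rules would come from the organization's own standards.

```python
# Hypothetical source rows; field names and the code table are illustrative.
source_rows = [
    {"cust": "Lee, Ann", "gender": "F", "date": "03/01/2002", "qty": 2, "price": 5.0},
    {"cust": "Lee, Ann", "gender": "F", "date": "03/01/2002", "qty": 2, "price": 5.0},
    {"cust": "Roy, Bob", "gender": "M", "date": "03/02/2002", "qty": 1, "price": 8.0},
]
GENDER_CODES = {"F": "Female", "M": "Male"}  # replacement of codes


def transform(row):
    """Apply field splitting, code replacement, a format change, and a derived value."""
    last, first = [part.strip() for part in row["cust"].split(",")]  # splitting a field
    month, day, year = row["date"].split("/")
    return {
        "first_name": first,
        "last_name": last,
        "gender": GENDER_CODES[row["gender"]],
        "date": f"{year}-{month}-{day}",  # format change: MM/DD/YYYY to YYYY-MM-DD
        "revenue": row["qty"] * row["price"],  # derived value
    }


# Deduplication: drop exact repeats while preserving order.
unique_rows = []
for row in source_rows:
    if row not in unique_rows:
        unique_rows.append(row)

transformed = [transform(row) for row in unique_rows]
total_revenue = sum(row["revenue"] for row in transformed)  # aggregate
print(transformed, total_revenue)
```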
Data Loading
Data are physically moved to the data warehouse
The loading takes place within a load window
The trend is to near real-time updates of the data warehouse as the warehouse is increasingly used for operational applications
Meta Data
Data about data
Needed by both information technology personnel and users
IT personnel need to know data sources and targets; database, table, and column names; refresh schedules; data usage measures; etc.
Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information; etc.
Database Vendors
High end (i.e., terabyte plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases
On-line Analytical Processing (OLAP)
A set of functionality that facilitates multidimensional analysis
Allows users to analyze data in ways that are natural to them
Comes in many varieties -- ROLAP, MOLAP, DOLAP, etc.
ROLAP
Relational OLAP
Uses an RDBMS to implement an OLAP environment
Typically involves a star schema to provide the multidimensional capabilities
OLAP tool manipulates RDBMS star schema data
Called "slowlap" by MOLAP vendors
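The kind of SQL a ROLAP tool might issue against a star schema can be sketched with SQLite from the Python standard library. The tables (sales_fact, product_dim, store_dim) and their rows are hypothetical, and a real ROLAP product would generate such queries itself rather than have them written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales_fact (product_id INTEGER, store_id INTEGER, units INTEGER);
    INSERT INTO product_dim VALUES (1, 'Toys'), (2, 'Books');
    INSERT INTO store_dim VALUES (10, 'East'), (20, 'West');
    INSERT INTO sales_fact VALUES (1, 10, 5), (1, 20, 3), (2, 10, 4), (2, 20, 6);
    """
)

# Two-dimensional view: total units by category and region.
by_category_region = conn.execute(
    """
    SELECT p.category, s.region, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_id = f.product_id
    JOIN store_dim AS s ON s.store_id = f.store_id
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
    """
).fetchall()

# Roll-up: drop the region dimension to get totals per category.
by_category = conn.execute(
    """
    SELECT p.category, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

print(by_category_region)
print(by_category)
```

Because every view is computed with joins and aggregation at query time, response time depends on the relational engine, which is the basis of the "slowlap" jibe.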
MOLAP
Multidimensional OLAP
Uses a MDDBS (e.g., Essbase) to store and access data
Usually requires proprietary (non-SQL) data access tools
Provides exceptionally fast response times
Star Schema
Creates non-normalized data structures
Easier for users to understand
Optimized for OLAP
Uses fact (facts or measures in the business) and dimension (establishes the context of the facts) tables
OLAP Tools
Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
Typically available as a fat or thin (i.e., browser) client
In a web environment, the browser communicates with a web server, which talks to an application server, which connects to backend databases
The application server provides query, reporting, and OLAP analysis functionality over the web
Java applets or downloaded components augment the thin client
A broadcast server may be used to schedule, run, publish, and broadcast reports, alerts, and responses over the LAN, email, or personal digital assistant.
Star Schema
[Figure: Star schema for medical claims. A central Claim fact table is surrounded by five dimension tables. Key columns are marked with #.]
Claim (fact table): # Physician ID, # Patient ID, # Service Code, # Payer ID, # Claim Number, # Line Item Number, # Claim Date, Date of Services, Amount of Charge, Unit of Services
Service: # Service Code, Service Description, # Category Code
Time Periods: # Claim Date, Year, Month, Quarter, Week
Payer: # Payer ID, Name, Address, Phone Number, EDI Number
Patient: # Patient ID, Patient Name, Address, Age, Sex, Insurance ID
Physician: # Physician ID, Physician Name, Specialty ID, Credential ID
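Part of the claims star schema pictured above can be expressed as table definitions. This is a sketch only: it keeps three of the five dimensions (Payer, Service, Time Periods) for brevity, the column types and sample rows are assumptions, and the lowercase snake_case names are an adaptation of the labels in the diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: establish the context of the facts.
    CREATE TABLE payer (payer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE service (service_code TEXT PRIMARY KEY, service_description TEXT);
    CREATE TABLE time_periods (claim_date TEXT PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: one row per claim line item, keyed to the dimensions.
    CREATE TABLE claim (
        claim_number INTEGER,
        line_item_number INTEGER,
        payer_id INTEGER REFERENCES payer (payer_id),
        service_code TEXT REFERENCES service (service_code),
        claim_date TEXT REFERENCES time_periods (claim_date),
        amount_of_charge REAL,
        unit_of_services INTEGER,
        PRIMARY KEY (claim_number, line_item_number)
    );

    INSERT INTO payer VALUES (1, 'Payer A'), (2, 'Payer B');
    INSERT INTO service VALUES ('S1', 'Office visit'), ('S2', 'X-ray');
    INSERT INTO time_periods VALUES ('2002-03-01', 2002, 3), ('2002-04-01', 2002, 4);
    INSERT INTO claim VALUES
        (100, 1, 1, 'S1', '2002-03-01', 80.0, 1),
        (100, 2, 1, 'S2', '2002-03-01', 120.0, 1),
        (101, 1, 2, 'S1', '2002-04-01', 80.0, 1);
    """
)

# Total charges by payer and month: the fact table joined to two dimensions.
charges_by_payer_month = conn.execute(
    """
    SELECT p.name, t.month, SUM(c.amount_of_charge)
    FROM claim AS c
    JOIN payer AS p ON p.payer_id = c.payer_id
    JOIN time_periods AS t ON t.claim_date = c.claim_date
    GROUP BY p.name, t.month
    ORDER BY p.name, t.month
    """
).fetchall()
print(charges_by_payer_month)
```

The Patient and Physician dimensions would be added the same way, each joined to the fact table through its key column.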
Dimension Table Examples
Retail -- store name, zip code, product name, product category, day of week
Telecommunications -- call origin, call destination
Banking -- customer name, account number, branch, account officer
Insurance -- policy type, insured party
Fact Table Examples
Retail -- number of units sold, sales amount
Telecommunications -- length of call in minutes, average number of calls
Banking -- average monthly balance
Insurance -- claims amount
Warehouse Users
Analysts
Managers
Executives
Operational personnel
Customers and suppliers
Warehouse Tools and Applications
SQL queries
Managed query environments
Structured and ad hoc reports
DSS/EIS
Portals
Data mining
Packaged applications
Custom-built applications
Owens & Minor
Owens & Minor -- data warehousing has supported integration along the supply chain. Winner of the 1999 TDWI Leadership Award
The nation's leading distributor of name-brand medical and surgical supplies
Has transformed its business model by integrating supply chain management, e-business, data warehousing, and Internet technologies
As part of this initiative, WISDOM (WebIntelligence Supporting Decisions from Owens & Minor) has been especially valuable
[Figure: Owens & Minor supply chain. Two panels, PRODUCT and INFORMATION, each show the same participants: Raw Materials Suppliers, Manufacturer, Owens & Minor, Provider, and Patient. Annotations: + 1,400 manufacturers; + 4,000 acute care facilities.]
WISDOM
A Web-based decision support system that provides information to OM's employees, suppliers, and customers
Accesses data from a data warehouse that maintains supplier and customer transaction data
Sold to trading partners as a value-added product
WISDOM II provides data about the transactions that suppliers and customers have with all of their trading partners
Sample Applications
Supports reporting and queries for internal personnel
Supports an EIS for senior management
Suppliers can determine their market share in specific hospitals
Hospitals can identify which products are being bought off contract
WISDOM II extends data warehousing to trading partners through an outsourcing arrangement
Questions