Designing the Data Warehouse - Part 1
Transcript of Designing the Data Warehouse - Part 1
-
8/12/2019 Designing the Data Warehouse - Part 1
1/45
Designing the data warehouse / data marts
Methodologies and Techniques
-
Basic principles
-
Life cycle of the DW
Operational databases feed the warehouse database:
First-time load
Refresh (repeated)
Purge or archive
-
Oracle Warehouse Components
Any Source, Any Data, Any Access
Sources: operational data, external data
Data: relational / multidimensional, text, image, spatial, audio, video, Web
Oracle Medi`
Access: relational tools, OLAP tools, applications / Web
-
Oracle Intelligence Tools
IS develops users' views: Oracle Reports (current)
Business users: Oracle Discoverer (tactical)
Analysts: Oracle Express (strategic)
-
Oracle Data Mart Suite
Data modeling: Oracle Data Mart Designer
Data management: Oracle Enterprise Manager
Data extraction: Oracle Data Mart Builder
Data access & analysis: Discoverer & Oracle Reports
OLTP engines: OLTP databases
Warehousing engines: data mart database (Oracle8), SQL*PLUS
-
Big Bang Approach: Advantages and Disadvantages
Advantages:
Warehouse built as part of a major project (e.g. BPR)
Having a big picture of the data warehouse before starting the data warehousing project
Disadvantages:
Involves a high risk and takes a longer time
Runs the risk of needing to change requirements
-
Incremental Approach to Warehouse Development
Multiple iterations
Shorter implementations
Validation of each phase
Phases: Strategy, Definition, Analysis, Design, Build, Production
-
Benefits of an Incremental Approach
Delivers a strategic data warehouse solution through incremental development efforts
Provides extensible, scalable architecture
Quickly provides business benefits and ensures a much earlier return on investment
Allows a data warehouse to be built based on a subject or application area at a time
Allows the construction of an integrated data mart environment
-
Data Mart
A subset of a data warehouse that supports the requirements of a particular department or business function.
Characteristics include:
Does not normally contain detailed operational data, unlike a data warehouse.
May contain certain levels of aggregation.
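The two characteristics above (a departmental subset, held at an aggregated level) can be sketched in Python. This is only an illustration: the row layout, the sample values, and the name build_data_mart are assumptions, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical detailed warehouse rows: (department, product, month, amount).
WAREHOUSE_ROWS = [
    ("Sales", "Product A", "Jan 99", 100.0),
    ("Sales", "Product A", "Jan 99", 50.0),
    ("Sales", "Product B", "Feb 99", 75.0),
    ("Finance", "Product A", "Jan 99", 20.0),
]


def build_data_mart(rows, department):
    """Keep one department's rows and aggregate them by (product, month)."""
    mart = defaultdict(float)
    for dept, product, month, amount in rows:
        if dept == department:                # subset: one department only
            mart[(product, month)] += amount  # aggregation: row detail is dropped
    return dict(mart)


sales_mart = build_data_mart(WAREHOUSE_ROWS, "Sales")
```

The mart keeps one total per product and month, so the individual warehouse rows are no longer visible to its users.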
-
Dependent Data Mart
Sources: operational systems, flat files, external data
Data warehouse: Marketing, Sales, Finance, Human Resources
Data marts: Marketing, Sales, Finance
-
Independent Data Mart
Sources: operational systems, flat files, external data
Data mart: Sales or Marketing
-
Reasons for Creating a Data Mart
To give users more flexible access to the data they need to analyse most often.
To provide data in a form that matches the collective view of a group of users.
To improve end-user response time.
Potential users of a data mart are clearly defined and can be targeted for support.
-
Reasons for Creating a Data Mart
To provide appropriately structured data as dictated by the requirements of the end-user access tools.
Building a data mart is simpler compared with establishing a corporate data warehouse.
The cost of implementing data marts is far less than that required to establish a data warehouse.
-
Data Mart Issues
Data mart functionality
Data mart size
Data mart load performance
User access to data in multiple data marts
Data mart Internet / intranet access
Data mart administration
Data mart installation
-
Example of a DW tool: OLAP
Rotate and drill down to successive levels of detail.
Create and examine calculated data interactively on large volumes of data.
Determine comparative or relative differences.
Perform exception and trend analysis.
Perform advanced analytical functions, for example forecasting, modeling, and regression analysis.
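Drill-down and rotation, the first operations listed above, can be illustrated with a small Python sketch. The sample cells and the helper names roll_up and rotate are hypothetical; a real OLAP tool exposes these operations through its own interface.

```python
from collections import defaultdict

# Hypothetical sales cells: (region, store, product, units).
CELLS = [
    ("North", "Store 1", "Product A", 10),
    ("North", "Store 1", "Product B", 5),
    ("North", "Store 2", "Product A", 7),
    ("South", "Store 3", "Product A", 4),
]
FIELDS = ("region", "store", "product")


def roll_up(cells, by):
    """Total units grouped by the named fields; more fields means more detail."""
    idx = [FIELDS.index(name) for name in by]
    totals = defaultdict(int)
    for row in cells:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def rotate(pivot):
    """Swap the two axes of a {(row, col): value} pivot."""
    return {(col, row): value for (row, col), value in pivot.items()}


by_region = roll_up(CELLS, ["region"])            # top level
by_store = roll_up(CELLS, ["region", "store"])    # drill down one level
region_by_product = roll_up(CELLS, ["region", "product"])
product_by_region = rotate(region_by_product)     # same numbers, axes swapped
```

Drilling down only adds a grouping field, and rotating only swaps the axes; neither changes the underlying totals.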
-
Original OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
-
Original OLAP Rules
6. Multiuser support
7. Unrestricted cross-dimensional operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and aggregation levels
-
Relational Database Model
       Attribute 1  Attribute 2  Attribute 3  Attribute 4
       Name         Age          Gender       Emp No.
Row 1  Anderson     31           F            1001
Row 2  Green        42           M            1007
Row 3  Lee          22           M            1010
Row 4  Ramos        32           F            1020
The table above illustrates the employee relation.
-
Multidimensional Database Model
The data is found at the intersection of dimensions.
FINANCE: Store, GL_Line, Time
SALES: Store, Product, Time, Customer
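One way to picture "data at the intersection of dimensions" is a mapping keyed by one member from each dimension. The sketch below uses three of the SALES dimensions named above (Store, Product, Time); the member names, values, and helper functions are made up for illustration.

```python
# Hypothetical SALES cube with three dimensions: (store, product, time).
sales_cube = {
    ("Store 1", "Product A", "Jan 99"): 120,
    ("Store 1", "Product B", "Jan 99"): 80,
    ("Store 2", "Product A", "Jan 99"): 60,
    ("Store 2", "Product A", "Feb 99"): 90,
}


def cell(cube, store, product, time):
    """Return the value at one intersection, or None if the cell is empty."""
    return cube.get((store, product, time))


def slice_cube(cube, store=None, product=None, time=None):
    """Fix some dimensions and return every cell that matches them."""
    wanted = (store, product, time)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


jan_product_a = slice_cube(sales_cube, product="Product A", time="Jan 99")
```

Only populated intersections are stored, which is why an empty cell returns None instead of zero.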
-
Two dimensions
-
Three dimensions
-
Specialised Multidimensional Tool
Benefits:
Quick access to very large volumes of data
Extensive and comprehensive libraries of complex functions for analysis
Strong modeling and forecasting capabilities
Can access multidimensional and relational database structures
Caters for calculated fields
Disadvantages:
Difficulty of changing model
Lack of support for very large volumes of data
May require significant processing power
-
MOLAP Server
The application layer stores data in a multidimensional structure
The presentation layer provides the multidimensional view
Efficient storage and processing
Complexity hidden from the user
Analysis using preaggregated summaries and precalculated measures
Diagram: warehouse, MOLAP engine (application layer), DSS client
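The idea of answering queries from preaggregated summaries can be sketched as follows. This is a toy illustration under stated assumptions (a two-dimensional cube, an ALL marker, and a hypothetical preaggregate function), not a description of how any particular MOLAP engine stores its data.

```python
from collections import defaultdict
from itertools import product as cartesian

ALL = "*"  # marker meaning "summed over every member of this dimension"


def preaggregate(cells):
    """Precompute totals for every combination of detailed and ALL members."""
    summaries = defaultdict(int)
    for key, value in cells.items():
        # Each dimension either keeps its member or collapses to ALL.
        for rolled in cartesian(*[(member, ALL) for member in key]):
            summaries[rolled] += value
    return dict(summaries)


# Hypothetical detailed cells keyed by (store, product).
cells = {
    ("Store 1", "Product A"): 10,
    ("Store 1", "Product B"): 5,
    ("Store 2", "Product A"): 7,
}
summaries = preaggregate(cells)
store_1_total = summaries[("Store 1", ALL)]  # a lookup, not a scan
```

The cost moves from query time to load time: every total is computed once up front, and each additional dimension doubles the number of summary keys per detailed cell.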
-
ROLAP
Diagram: Express user, Express Server with cache, warehouse; query and data flow through the data cache or a live fetch
Also Hybrid (HOLAP)
-
Choosing a Reporting Architecture
Business needs
Potential for growth
Interface
Enterprise architecture
Network architecture
Speed of access
Openness
Chart: MOLAP and ROLAP plotted by query performance (OK to good) against analysis (simple to complex)
-
Data Acquisition
Identify, extract, transform, and transport source data
Consider internal and external data
Perform gap analysis between source data and target database objects
Plan move of data between sources and target
Define first-time load and refresh strategy
Define tool requirements
Build, test, and execute data acquisition modules
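The extract, transform, and load steps, together with the difference between a first-time load and a refresh, can be sketched in Python. Everything here is an assumption made for illustration: the source row layout, the "updated" timestamp used as a change marker, and the function names.

```python
from datetime import date


def extract(source_rows, since=None):
    """Pull all rows on a first-time load, or only rows changed after `since`."""
    return [r for r in source_rows if since is None or r["updated"] > since]


def transform(row):
    """Map a source row to the target layout (tidy text, convert cents)."""
    return {
        "order_id": row["id"],
        "region": row["region"].strip().title(),
        "sales_dollars": row["amount_cents"] / 100,
    }


def load(target, rows):
    """Insert or overwrite transformed rows in the target, keyed by order_id."""
    for row in rows:
        target[row["order_id"]] = row


def run(source_rows, target, since=None):
    """Run one acquisition cycle and return the new high-water mark."""
    load(target, [transform(r) for r in extract(source_rows, since)])
    return max((r["updated"] for r in source_rows), default=since)


source = [
    {"id": 1, "region": " north ", "amount_cents": 1_000_000, "updated": date(1999, 1, 5)},
    {"id": 2, "region": "south", "amount_cents": 1_200_000, "updated": date(1999, 2, 3)},
]
warehouse = {}
watermark = run(source, warehouse)                   # first-time load: every row
source.append({"id": 3, "region": "west", "amount_cents": 1_500_000, "updated": date(1999, 3, 1)})
watermark = run(source, warehouse, since=watermark)  # refresh: changed rows only
```

A timestamp high-water mark is only one possible refresh strategy; sources without a reliable change marker need a different approach, such as comparing snapshots.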
-
Modeling
Warehouses differ from operational structures:
Analytical requirements
Subject orientation
Data must map to subject-oriented information:
Identify business subjects
Define relationships between subjects
Name the attributes of each subject
Modeling is iterative
Modeling tools are available
-
Modeling the Data Warehouse
1. Defining the business model
2. Creating the dimensional model
3. Modeling summaries
4. Creating the physical model
Diagram: select a business process (1), then steps 2 and 3, then the physical model (4)
-
Identifying Business Rules
Product
Type    Monitor  Status
PC      15 inch  New
Server  17 inch  Rebuilt
        19 inch  Custom
        None
Location
Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles
Store
Store > District > Region
Time
Month > Quarter > Year
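Rules like these usually end up as banding functions and hierarchy lookups. The sketch below is a hedged illustration: the slide's distance bands overlap at exactly 1 and 5 miles, so the code assumes a boundary value falls in the lower band, and the store, district, and region names are invented.

```python
def proximity_band(miles):
    """Assign a distance to one of the three bands named on the slide.

    Assumption: boundary values (1 and 5 miles) belong to the lower band.
    """
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:
        return "0 - 1 miles"
    if miles <= 5:
        return "1 - 5 miles"
    return "> 5 miles"


# Hypothetical lookup tables for the Store > District > Region hierarchy.
STORE_TO_DISTRICT = {"Store 1": "District X", "Store 2": "District X", "Store 3": "District Y"}
DISTRICT_TO_REGION = {"District X": "Region East", "District Y": "Region West"}


def store_path(store):
    """Return (store, district, region) for one store."""
    district = STORE_TO_DISTRICT[store]
    return (store, district, DISTRICT_TO_REGION[district])


def time_path(year, month):
    """Return (month, quarter, year), assuming calendar quarters."""
    return (month, (month - 1) // 3 + 1, year)
```

Writing the rules down this way exposes decisions the slide leaves open, such as which band a distance of exactly 1 mile belongs to.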
-
Creating the Dimensional Model
Identify fact tables
Translate business measures into fact tables
Analyze source system information for additional measures
Identify base and derived measures
Document additivity of measures
Identify dimension tables
Link fact tables to the dimension tables
Create views for users
-
Dimension Tables
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference
Diagram: Facts (units, price) joined to Product, Channel, Customer, and Time
-
Fact Tables
Fact tables have the following characteristics:
Contain numeric measures (metrics) of the business
May contain summarized (aggregated) data
May contain date-stamped data
Are typically additive
Have a key value that is typically a concatenated key composed of the primary keys of the dimensions
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
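"Typically additive" is worth a concrete check, because not every numeric column can be summed. The sketch below uses the units and price measures from the slides with invented values: units and revenue add up across rows, while a sum of unit prices has no business meaning and an average price has to be derived from additive totals.

```python
# Hypothetical fact rows: (product, units, unit_price).
facts = [
    ("Product A", 2, 10.0),
    ("Product A", 3, 20.0),
]

total_units = sum(units for _, units, _ in facts)                # additive base measure
total_revenue = sum(units * price for _, units, price in facts)  # additive derived measure
naive_price_sum = sum(price for _, _, price in facts)            # not a meaningful figure
average_price = total_revenue / total_units                      # derived from additive totals
```

Here the naive sum of prices (30.0) matches neither the revenue (80.0) nor the average price paid per unit (16.0), which is the kind of behaviour that documenting additivity is meant to catch.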
-
Dimensional Model (Star Schema)
Fact table: Facts (units, price)
Dimension tables: Product, Channel, Customer, Time
-
Star Schema Model
Central fact table
Radiating dimensions
Denormalized model
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id
Product Table: Product_id, Product_desc
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
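A cut-down version of this star schema can be tried out with Python's built-in sqlite3 module. The sketch keeps only the store and time dimensions; the table names store_dim and time_dim, the key formats, and all data values are assumptions for illustration, and SQLite stands in for whatever database the warehouse actually runs on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE time_dim  (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);
CREATE TABLE sales_fact (
    store_id      INTEGER REFERENCES store_dim(store_id),
    day_id        INTEGER REFERENCES time_dim(day_id),
    sales_dollars REAL,
    sales_units   INTEGER,
    PRIMARY KEY (store_id, day_id)  -- concatenated key built from the dimension keys
);
INSERT INTO store_dim VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO time_dim  VALUES (19990104, 199901, 1999), (19990201, 199902, 1999);
INSERT INTO sales_fact VALUES
    (1, 19990104, 100.0, 4),
    (2, 19990104, 250.0, 9),
    (3, 19990104,  80.0, 2),
    (1, 19990201, 120.0, 5);
""")

# One join per dimension, then group by the dimension attributes of interest.
rows = conn.execute("""
    SELECT s.district_id, t.month_id, SUM(f.sales_dollars), SUM(f.sales_units)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_id = f.store_id
    JOIN time_dim  AS t ON t.day_id   = f.day_id
    GROUP BY s.district_id, t.month_id
    ORDER BY s.district_id, t.month_id
""").fetchall()
```

Every query against a star schema has this shape: the fact table in the middle, one join to each dimension that is needed, and an aggregate over the measures.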
-
Star Schema Model
Easy for users to understand
Fast response to queries
Simple metadata
Supported by many front end tools
Less robust to change
Slower to build
Does not support history
-
Snowflake Schema Model
Time Table: Week_id, Period_id, Year_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Product Table: Product_id, Product_desc
Item Table: Item_id, Item_desc, Dept_id
Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
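The practical effect of snowflaking is an extra join for each normalized level. The sketch below uses the Store and District tables from the slide with Python's sqlite3 module; the data values and descriptions are invented, and the other tables are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store (
    store_id    INTEGER PRIMARY KEY,
    store_desc  TEXT,
    district_id INTEGER REFERENCES district(district_id)
);
CREATE TABLE sales_fact (
    item_id       INTEGER,
    store_id      INTEGER REFERENCES store(store_id),
    sales_dollars REAL,
    sales_units   INTEGER
);
INSERT INTO district VALUES (10, 'Downtown'), (20, 'Suburbs');
INSERT INTO store VALUES (1, 'Store 1', 10), (2, 'Store 2', 10), (3, 'Store 3', 20);
INSERT INTO sales_fact VALUES (100, 1, 100.0, 4), (100, 2, 250.0, 9), (101, 3, 80.0, 2);
""")

# Reaching the district description takes two joins: fact to store, store to district.
sales_by_district = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN store    AS s ON s.store_id    = f.store_id
    JOIN district AS d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()
```

In a denormalized star schema the district description would sit on the store row itself, saving the second join at the cost of repeating the description for every store in the district.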
-
Using Summary Data
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Is distilled from source systems and precalculated summaries
Usually exists in summary fact tables
Phase 3: Modeling summaries
-
Designing Summary Tables
Columns: Units, Sales(), Store
Rows: Product A Total, Product B Total, Product C Total
Summary functions: Average, Maximum, Total, Percentage
-
Summary Tables Example
SALES FACTS
Sales   Region  Month
10,000  North   Jan 99
12,000  South   Feb 99
11,000  North   Jan 99
15,000  West    Mar 99
18,000  South   Feb 99
20,000  North   Jan 99
10,000  East    Jan 99
2,000   West    Mar 99
SALES BY MONTH/REGION
Month   Region  Tot_Sales$
Jan 99  North   41,000
Jan 99  East    10,000
Feb 99  South   30,000
Mar 99  West    17,000
SALES BY MONTH
Month   Tot_Sales
Jan 99  51,000
Feb 99  30,000
Mar 99  17,000
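The two summary tables can be recomputed from the eight SALES FACTS rows with a few lines of Python. The function names are illustrative. Summing the rows as listed gives 30,000 for South in Feb 99 (12,000 + 18,000), and the monthly total follows from that.

```python
from collections import defaultdict

# The eight SALES FACTS rows from the slide: (sales, region, month).
SALES_FACTS = [
    (10_000, "North", "Jan 99"),
    (12_000, "South", "Feb 99"),
    (11_000, "North", "Jan 99"),
    (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"),
    (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"),
    (2_000, "West", "Mar 99"),
]


def sales_by_month_region(facts):
    """First-level summary: one total per (month, region)."""
    totals = defaultdict(int)
    for sales, region, month in facts:
        totals[(month, region)] += sales
    return dict(totals)


def sales_by_month(month_region_totals):
    """Second-level summary, built from the first summary instead of the facts."""
    totals = defaultdict(int)
    for (month, _region), sales in month_region_totals.items():
        totals[month] += sales
    return dict(totals)


by_month_region = sales_by_month_region(SALES_FACTS)
by_month = sales_by_month(by_month_region)
```

Building the monthly summary from the month/region summary rather than from the facts mirrors the point that summaries can be distilled from other precalculated summaries.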
-
Summary Management in Oracle8i
Diagram: Sales and Sales summary tables with Product, Region (City, State), and Time dimensions
Summary advisor: summary usage, space requirements, summary recommendations
-
The Time Dimension
How and where should it be stored?
Diagram: Time dimension linked to Sales fact
Time is critical to the data warehouse.
A consistent representation of time is required for extensibility.
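One common answer to "how and where" is a dedicated time dimension table with one row per day, generated once and shared by every fact table. The sketch below is an assumption-laden illustration: the column names follow the Day_id, Month_id, and Year_id pattern used earlier in the slides, the YYYYMMDD key format is a hypothetical choice, and calendar quarters are assumed.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),  # hypothetical key format
            "date": day,
            "month_id": day.year * 100 + day.month,
            "quarter": (day.month - 1) // 3 + 1,    # calendar quarters assumed
            "year_id": day.year,
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
```

Because every fact table joins to the same rows, month, quarter, and year are defined in exactly one place, which is what keeps the representation of time consistent as new subject areas are added.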