Post on 22-Oct-2014
Data Warehouse Development Approaches
Fundamental Questions
Before deciding to build a data warehouse for your organization, you need to ask the following basic and fundamental questions and address the relevant issues:
◦ Top-down or bottom-up approach?
◦ Enterprise-wide or departmental?
◦ Which first: data warehouse or data mart?
◦ Build pilot or go with a full-fledged implementation?
◦ Dependent or independent data marts?
Data Warehouse Development Approaches
Data warehouse development approaches
◦ Inmon Model: EDW approach
◦ Kimball Model: Data mart approach
Which model is better?
◦ There is no one-size-fits-all strategy to data warehousing
◦ One alternative is the hosted warehouse
General Data Warehouse Development Approaches
“Big bang” approach
Incremental approach:
◦ Top-down incremental approach
◦ Bottom-up incremental approach
ISQS 6339, Data Mgmt & BI, Zhangxi Lin 4
“Big Bang” Approach
Analyze enterprise requirements
Build enterprise data warehouse
Report in subsets or store in data marts
Incremental Approach to Warehouse Development
Multiple iterations
Shorter implementations
Validation of each phase
Figure: Increment 1 passes through the phases Strategy, Definition, Analysis, Design, Build, and Production; the process is iterative.
Top-Down Approach
Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas
Complete a model of selected subject area
Map to available data
Perform a source system analysis
Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework
Top-Down
The advantages of this approach are:
◦ A truly corporate effort, an enterprise view of data
◦ Inherently architected, not a union of disparate data marts
◦ Single, central storage of data about the content
◦ Centralized rules and control
◦ May see quick results if implemented with iterations
The disadvantages are:
◦ Takes longer to build even with an iterative method
◦ High exposure to the risk of failure
◦ Needs a high level of cross-functional skills
◦ High outlay without proof of concept
Bottom-Up Approach
Define the scope and coverage of the data warehouse and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement base technical architecture and establish metadata, extraction, and load processes as required by increment
Create and populate the initial subject areas within the overall warehouse framework
Bottom-Up
The advantages of this approach are:
◦ Faster and easier implementation of manageable pieces
◦ Favorable return on investment and proof of concept
◦ Less risk of failure
◦ Inherently incremental; can schedule important data marts first
◦ Allows project team to learn and grow
The disadvantages are:
◦ Each data mart has its own narrow view of data
◦ Spreads redundant data across every data mart
◦ Perpetuates inconsistent and irreconcilable data
◦ Proliferates unmanageable interfaces
Dimensional Modeling Process
High-level dimensional model design
◦ Choosing the business process
◦ Declaring the grain
◦ Choosing dimensions
◦ Identifying the facts
Detailed dimensional model development
Dimensional model review and validation
◦ IS
◦ Core users
◦ Business community
Final design iteration
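The four high-level design steps can be illustrated with a small star schema. This is only a sketch: the retail sales example, every table and column name, and the use of Python's built-in sqlite3 module are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical example, not from the slides.
# Step 1, business process: retail sales.
# Step 2, grain: one fact row per product per store per day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 3, dimensions: date, product, and store.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT);

-- Step 4, facts: numeric measures recorded at the declared grain.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    quantity     INTEGER,
    sales_amount REAL
);
""")

# List the tables that make up the star: three dimensions around one fact table.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Writing the grain down before choosing dimensions and facts makes it easier to check that every column in the fact table is true at that level of detail.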
Supplemental Slides: Data Warehouse Design Phases
Defining the Business Requirements
The concept of business dimensions is fundamental to the requirements definition for a data warehouse.
Information package
Your primary goal in the requirements definition phase is to compile information packages.
Once you have firmed up the information packages, you’ll be able to proceed to the other phases.
Essentially, information packages enable you to:
◦ Define the common subject areas
◦ Design key business metrics
◦ Decide how data must be presented
◦ Determine how users will aggregate or roll up
◦ Decide the data quantity for user analysis or query
◦ Decide how data will be accessed
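One way to picture an information package is as a small data structure that names a subject area, its business dimensions with their hierarchy levels, and its key metrics. The sketch below is an assumption-laden illustration: the class InformationPackage, its fields, and the sales example are hypothetical and are not defined in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationPackage:
    """Hypothetical container for one information package."""
    subject: str
    # Dimension name -> hierarchy levels, listed from most summarized to most detailed.
    dimensions: Dict[str, List[str]] = field(default_factory=dict)
    # Key business metrics users want to analyze.
    metrics: List[str] = field(default_factory=list)

    def rollup_path(self, dimension: str) -> List[str]:
        """Return the levels in roll-up order, from most detailed to most summarized."""
        return list(reversed(self.dimensions[dimension]))


package = InformationPackage(
    subject="Sales analysis",
    dimensions={
        "Time": ["Year", "Quarter", "Month", "Date"],
        "Location": ["Country", "State", "City"],
        "Product": ["Category", "Product"],
    },
    metrics=["Units sold", "Sales amount"],
)

print(package.rollup_path("Time"))
```

The hierarchy lists capture how users will aggregate or roll up, and the metrics list captures the key business metrics.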
Supplemental Slides: The Others
Snowflake Schema Model
◦ Direct use by some tools
◦ More flexible to change
◦ Provides for speedier data loading
◦ Can become large and unmanageable
◦ Degrades query performance
◦ More complex metadata
Example hierarchy: Country, State, County, City
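The Country, State, County, City hierarchy can be sketched as a chain of normalized tables, which is what snowflaking a geography dimension typically looks like. The table names, column names, and sample rows below are made up for illustration, and Python's built-in sqlite3 module is used only as a convenient stand-in for a warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each level of the hierarchy is its own table and points to its parent level.
CREATE TABLE country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE state   (state_id   INTEGER PRIMARY KEY, state_name  TEXT,
                      country_id INTEGER REFERENCES country(country_id));
CREATE TABLE county  (county_id  INTEGER PRIMARY KEY, county_name TEXT,
                      state_id   INTEGER REFERENCES state(state_id));
CREATE TABLE city    (city_id    INTEGER PRIMARY KEY, city_name   TEXT,
                      county_id  INTEGER REFERENCES county(county_id));

-- Illustrative sample rows.
INSERT INTO country VALUES (1, 'USA');
INSERT INTO state   VALUES (10, 'Texas', 1);
INSERT INTO county  VALUES (100, 'Lubbock County', 10);
INSERT INTO city    VALUES (1000, 'Lubbock', 100);
""")

# Resolving a city to its country takes three joins in the snowflaked layout.
row = conn.execute("""
SELECT ci.city_name, co.county_name, s.state_name, c.country_name
FROM city ci
JOIN county  co ON co.county_id = ci.county_id
JOIN state   s  ON s.state_id   = co.state_id
JOIN country c  ON c.country_id = s.country_id
""").fetchone()
print(row)
```

The extra joins are the trade-off behind the points on query performance and metadata complexity: each level is stored once, but queries must walk the chain.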
Degenerate Dimensions
order_number and order_line in the fact table
For example, you may be looking for the average number of products per order. To calculate it, you have to relate the products to the order number.
Attributes such as order_number and order_line in the example are called degenerate dimensions and these are kept as attributes of the fact table.
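The average number of products per order can be computed directly from the fact table by grouping on the degenerate dimension. In this sketch, order_number and order_line come from the slide, while the table name fact_order_line, the other columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order_line (
    order_number INTEGER,  -- degenerate dimension: no separate dimension table
    order_line   INTEGER,  -- degenerate dimension
    product_key  INTEGER,  -- would point to a product dimension (omitted here)
    quantity     INTEGER
);
-- Illustrative rows: order 5001 has three products, order 5002 has one.
INSERT INTO fact_order_line VALUES
    (5001, 1, 11, 2),
    (5001, 2, 12, 1),
    (5001, 3, 13, 4),
    (5002, 1, 11, 1);
""")

# Count distinct products per order, then average those counts across orders.
avg_products = conn.execute("""
SELECT AVG(product_count)
FROM (
    SELECT order_number, COUNT(DISTINCT product_key) AS product_count
    FROM fact_order_line
    GROUP BY order_number
)
""").fetchone()[0]
print(avg_products)  # 2.0
```

Without order_number in the fact table, the rows of one order could not be grouped together, which is why the attribute is kept even though it has no dimension table of its own.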
Storage and Performance Considerations
Database sizing
Data partitioning
Indexing
Star query optimization
Database Sizing - Test Load Sampling
Analyze a representative sample of the data chosen using proven statistical methods.
Ensure that the sample reflects:
◦ Test loads for different periods
◦ Day-to-day operations
◦ Seasonal data and worst-case scenarios
◦ Indexes and summaries
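A very simple version of sizing from a sample is to measure row sizes in the test load, take their mean, and scale up to the expected row count. The sketch below assumes exactly that; the function estimate_table_bytes, the overhead_factor multiplier for indexes and summaries, and all numbers are hypothetical, and a real sizing exercise would use a more careful statistical method.

```python
import statistics


def estimate_table_bytes(sample_row_bytes, expected_rows, overhead_factor=1.0):
    """Project a table's size from the mean row size of a sample.

    overhead_factor is a hypothetical multiplier that stands in for the
    extra space taken by indexes and summaries.
    """
    mean_row_bytes = statistics.mean(sample_row_bytes)
    return mean_row_bytes * expected_rows * overhead_factor


# Made-up row sizes in bytes, as if measured from test loads of different periods.
sample = [120, 130, 110, 140]

estimate = estimate_table_bytes(sample, expected_rows=1_000_000, overhead_factor=1.5)
print(estimate)  # 187500000.0
```

If the sample leaves out seasonal peaks or worst-case rows, the mean row size is biased low and so is the projection.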
Data Partitioning
Breaking up of data into separate physical units that can be handled independently
Types of data partitioning
◦ Horizontal partitioning
◦ Vertical partitioning
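The difference between the two types can be shown on a handful of in-memory rows: horizontal partitioning splits by row, vertical partitioning splits by column. The rows, column names, and the choice of year as the partitioning key are assumptions made for this sketch.

```python
from collections import defaultdict

# Made-up sales rows; the column names are illustrative only.
rows = [
    {"sale_id": 1, "year": 2013, "amount": 50.0, "note": "gift wrap"},
    {"sale_id": 2, "year": 2014, "amount": 75.0, "note": ""},
    {"sale_id": 3, "year": 2014, "amount": 20.0, "note": "rush"},
]

# Horizontal partitioning: each partition holds whole rows, split here by year.
by_year = defaultdict(list)
for row in rows:
    by_year[row["year"]].append(row)

# Vertical partitioning: each partition holds a subset of the columns.
# Both keep the key (sale_id) so the pieces can be joined back together.
frequent_columns = [
    {"sale_id": r["sale_id"], "year": r["year"], "amount": r["amount"]} for r in rows
]
rare_columns = [{"sale_id": r["sale_id"], "note": r["note"]} for r in rows]

print(sorted(by_year))   # [2013, 2014]
print(rare_columns[0])   # {'sale_id': 1, 'note': 'gift wrap'}
```

Each partition can then be loaded, backed up, or scanned on its own, which is the sense in which the units are handled independently.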
Indexing
Indexing is used for the following reasons:
◦ It can yield large cost savings by greatly improving performance and scalability.
◦ It can replace a full table scan with a quick read of the index, followed by a read of only those disk blocks that contain the rows needed.
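The switch from a full table scan to an index lookup can be observed in a query plan. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in for a warehouse database; the sales table, its columns, and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Generated illustrative rows: 1000 sales spread over 100 customers.
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT amount FROM sales WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable steps of SQLite's query plan for sql."""
    # The last column of each EXPLAIN QUERY PLAN row describes one step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # no index on customer_id yet: a scan of the table
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = plan(query)   # the plan now searches using idx_sales_customer

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table, and the second reports a search that uses the new index, matching the second point above.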
Parallelism
Figure: parallel execution servers working on partitions P1, P2, and P3 of the Sales table and the Customers table.
Using Summary Data
Designing summary tables offers the following benefits:
◦ Provides fast access to precomputed data
◦ Reduces use of I/O, CPU, and memory
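A summary table can be sketched as the stored result of an aggregate query, so that later reports read a few precomputed rows instead of re-aggregating the detail. The fact_sales table, the summary_sales_by_month table, and the sample rows below are assumptions for illustration, with SQLite via Python's sqlite3 module standing in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (sale_date TEXT, product_key INTEGER, sales_amount REAL);
-- Illustrative detail rows.
INSERT INTO fact_sales VALUES
    ('2014-01-05', 1, 10.0),
    ('2014-01-20', 1, 15.0),
    ('2014-02-03', 1, 30.0);

-- Summary table: sales precomputed at the month level, one row per month and product.
CREATE TABLE summary_sales_by_month AS
SELECT substr(sale_date, 1, 7) AS month,
       product_key,
       SUM(sales_amount)       AS total_amount
FROM fact_sales
GROUP BY substr(sale_date, 1, 7), product_key;
""")

# Monthly reports now read the small summary table instead of the detail rows.
monthly = conn.execute(
    "SELECT month, total_amount FROM summary_sales_by_month ORDER BY month"
).fetchall()
print(monthly)  # [('2014-01', 25.0), ('2014-02', 30.0)]
```

The summary must be refreshed whenever new detail rows are loaded, which is the cost paid for the faster reads.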