Designing A Data Warehouse With Sql 2008
-
Upload
thomduclos -
Category
Documents
-
view
2.021 -
download
6
description
Transcript of Designing A Data Warehouse With Sql 2008
© 2005-2009 Kimball Group. All rights reserved. Page 1
DESIGNING A DATA
WAREHOUSE WITH SQL
SERVER 2008
Joy Mundy, [email protected]
Introductions and Background
� Presenter
� Joy Mundy, Kimball Group
� Kimball Group
� The authors of the Data Warehouse Toolkit series of books, including the Microsoft Data Warehouse Toolkit
� Kimball University DW / BI courses
� DW / BI strategic consulting
2
© 2005-2009 Kimball Group. All rights reserved. Page 2
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
3
Some Possible Reasons
A. The CIO told us to
B. It sounds like fun
C. It’s a great opportunity for us to add significant value to the business
D. We’re not building a DW/BI system, just an Executive Dashboard
Which one of these is “Right”?
4
© 2005-2009 Kimball Group. All rights reserved. Page 3
Answer: C.
� It is a great opportunity to add business value
� It is also a great opportunity to:
� Work with senior management
� Advance your career
� Play with fun, new technology
� However, there are a few risks…
5
DW / BI System Risks
� High profile� Success (and failure) is visible to senior management
� Business driven – can be hard for technologists
� Technology focus is rarely successful � “Build it and they will come” doesn't work
� Dashboards are appropriate for mature DW/BI systems, but are not a starting point
� Data quality and integration are hard problems, even if the technology works well
� The project is complex and politically challenging� Follow a proven approach
6
© 2005-2009 Kimball Group. All rights reserved. Page 4
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
7
Architecture Principles
� Business requirements determine architecture
� Listen to business requirements and translate them
into functional components
� This means your DW/BI system architecture will not
be the same as your neighbor’s
� Do not build major DW/BI components because you
are supposed to
8
© 2005-2009 Kimball Group. All rights reserved. Page 5
Architectural Approaches
� Build reports directly from the transaction systems
� Standalone marts
� Normalized data warehouse feeding downstream marts
� Kimball dimensional data warehouse
9
Others
ESRT
DIM
Sources•HR•Projects•Siebel•Skills Dtb•CustSat files•Sales•SAP•FeedWrx•Business lists•Many others
Sales Mart
Sales+
KPI View
CSAT old
PCD
CFR
RoBCSAT+
EMR Capacity Planning
Standalone Marts
Pros•Marts reflect business requirements•Get business value this year
Cons•Multiple extracts of the same data•Multiple transforms•Inconsistent versions of the same data•10th mart takes as long to build as first
© 2005-2009 Kimball Group. All rights reserved. Page 6
Others
ESRT
DIM
Sources•HR•Projects•Siebel•Skills Dtb•CustSat files•Sales•SAP•FeedWrx•Business lists•Many others
Sales+
KPI View
CSAT+
Capacity Planning
Normalized DW and Downstream Marts
Enterprise Data Warehouse
(not dimensional)•Integrated•Historical•Design reflects source systems
RoB
Pros•Data extracted and consolidated only once•Marts reflect business requirements
Cons•Takes too long to build a new mart•Too many business rules between EDW & marts; we still get inconsistencies•EDW is by (and for) IT, using its language and structures•Marts are for the business
Mart B
Sources•HR•Projects•Siebel•Skills Dtb•CustSat files•Sales•SAP•FeedWrx•Business lists•Many others
Mart A
Kimball Dimensional Data Warehouse
Kimball-style Dimensional
Enterprise Data Warehouse
•Integrated & historical•Design reflects analytic requirements•Built incrementally•Contains the most detailed data possible•Fact data hooks together via shared (conformed) dimensions•Presentation area is relational or OLAP•OLAP is recommended for Msft platform•(Still need relational DW)
Pros•Data extracted and consolidated only once•DW design meets business requirements•Data is structured to support easy analytic use with good perf•Data and terms are consistent•Once data is in the DW, building new KPIs or BI applications is much easier
Cons•Takes longer to get biz value than simply throwing together a mart
User applications
•Most “marts” become views into the enterprise system•Ad hoc use is supported and encouraged
We may supplement the main DW/BI system with a handful of custom BI apps that meet specific needs. These are the exception.
© 2005-2009 Kimball Group. All rights reserved. Page 7
Summary of Architectures
Approach Trxn system
burden
Ease of use Time to market
Report directly from trxn systems
Very high Very poor Poor
Departmental marts Moderate Good until you need something new. Navigation challenges
“90 days”, no economies of scale
Normalized DW + marts
Low DW = poorMarts = good until you need something newNavigation challenges
Huge up-front investment. Marts are“60 days”
Kimball dimensional DW
Low Very good Large up-front investment. Excellent economies of scale.
13
Sou
rce
Sys
tem
sS
ourc
e S
yste
ms
Bus
ines
s/E
xtra
ct
Bus
ines
s/E
xtra
ct
Rul
esR
ules
Bus
ines
s/E
xtra
ct
Bus
ines
s/E
xtra
ct
Rul
esR
ules RDBMSRDBMS
OLAPOLAP
MetadataMetadataMetadataMetadataB
usin
ess
Use
rsB
usin
ess
Use
rs
Dat
a Q
ualit
yD
ata
Qua
lity
Dim
ensi
onal
izat
ion
Dim
ensi
onal
izat
ion
The Microsoft DW/BI Technical Architecture
••SharePointSharePoint••Report BuilderReport Builder••Performance Performance PointPoint
© 2005-2009 Kimball Group. All rights reserved. Page 8
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
15
Kimball Method Basic Principles
� Business driven
� Iterative Lifecycle
� Dimensional model for data delivery
� Enterprise data framework
� Bus Matrix
� Conformed dimensions
� Full solution from extracts to business value
16
© 2005-2009 Kimball Group. All rights reserved. Page 9
The Kimball DW/BI Lifecycle
Key Concepts:- Business centric- Full solution- Enterprise aware
- Dimensional delivery- Iterative process- Incremental growth
TechnicalArchitecture
Design
BIApplicationSpecification
BIApplicationSpecification
BIApplicationDevelopment
ProjectPlanning
BusinessRequire-
mentsDefinition
BusinessRequire-
mentsDefinition
Maintenance
Project ManagementProject Management
PhysicalDesign
ETL Design &Development
DimensionalModeling
DimensionalModeling
Deployment
Growth
ProductSelection &Installation
17
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
18
© 2005-2009 Kimball Group. All rights reserved. Page 10
Business Requirements (1)
� Interview key people across the org
� Ask “What do you do?” not “What do you want?”
� It is our job to design the solution, not theirs
� Look for common analytic themes
� Better promotion response rate
� Improve sales performance
� Break themes down into business processes that generate needed data
� Promotions Responses Orders
19
Business Requirements (2)
� Design the data warehouse Enterprise Bus Matrix
� Prioritize themes with senior management
� Summarize finding in a Requirements Document
� Identify and recruit good business sponsor(s)
� Visionary
� Influential
� Reasonable
20
© 2005-2009 Kimball Group. All rights reserved. Page 11
Profile the Data
� Early and often
� Does the data exist to support the required analysis?
� Where are the problems affecting ETL design
� Primary keys
� Referential integrity
� NULL values
� Junk values
� The dreaded “Notes” field
� SSIS 2008 has useful data profiling functionality
21
Requirements Prioritization Based on Value and Feasibility
Low
High
Bu
sin
ess
Val
ue
/ Im
pac
t
HighLow Feasibility
Orders
CallTracking
OrdersForecast
Returns
ExchangeRates
ManufacturingCosts
ProductProfitability
CustomerProfitability
Shipping
Key Concepts:� Created in a meeting with Senior Mgmt.� Boxes come from Bus. Requirements
� Relative value is a business decision� Relative feasibility needs IT input
Promotions
© 2005-2009 Kimball Group. All rights reserved. Page 12
Enterprise Bus Matrix
Adventure Works <-- Conformed Dimensions -->Data Warehouse Bus Matrix
Business Process Bus
ines
s P
rior
ity
Dat
e (O
rder
, Sta
rt, S
hip)
Pro
duct
Pro
mot
ion
End
Cus
tom
er
Em
ploy
ee
Res
elle
r
Pag
e
Inte
rnet
Reg
iste
red
Use
r
Par
t
Ven
dor
Shi
pper
Pro
blem
Orders Forecasting 2 x x x x xReseller Orders 1 x x x x xInternet Orders 1 x x x x x xPurchasing x x x x x x xParts Inventory x x x x xManufacturing 6 x x xFinished Goods Inv. x x xShipping 3 x x x x x x xReturns 5 x x x x x xCustomer Calls 4 x x x x x x x xWeb Support 4 x x x x x x x x
Key Concepts:Key Concepts:� The high level DW/BI data architecture� Rows = Business Processes� Columns = Conformed Dimensions� DW/BI system implemented row by row based on business priority
23
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
24
© 2005-2009 Kimball Group. All rights reserved. Page 13
Microsoft Technology for the DW Back Room
� Integration Services is a competitive ETL tool� Great performance, solid toolbox
� Relational Database is strong BI platform
� Key BI-related features, including partitioning, compression, and star join optimization
� Analysis Services is OLAP market leader� Dimensional design is flexible
� More scalable and manageable
� Data Mining – strong mining platform, leverages AS for speed; good integration
25
Relational vs. OLAP (Why OLAP?)
� Relational strengths
� Data management
� Flexibility
� OLAP strengths
� Analytic language
� Ad hoc query performance
� Metadata layer
� Security, especially for ad hoc queries
26
© 2005-2009 Kimball Group. All rights reserved. Page 14
Microsoft Technology for the DW/BI Front Room
� Reporting Services� Good enterprise platform
� Programmer-oriented report designer
� Limited ad hoc query
� Data presentation� Office, SharePoint, [ProClarity]
� Integrated development (VS) and management environments
� Scale – technology can scale to multi-TBs� Plan to spend more time and $, including on significant consulting
expertise.
� Real-time features
27
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
28
© 2005-2009 Kimball Group. All rights reserved. Page 15
The Dimensional Model (the Target)
� Based on top business priority data area
� Fact table = measurement of business events
� Dimension tables = objects that participate in business events (Customer, Product, Date, …)
� Surrogate keys (meaningless integer)
� Slowly changing dimensions
� Type 1 = Overwrite old values with new
� Type 2 = Add a new row when values change
� Identify data quality issues now
29
Relational Dimensional Model
Product KeyProduct KeyCustomer KeyCustomer KeyDate KeyDate Key… other keys… other keys
Sales AmountSales AmountSales QuantitySales Quantity… other … other measuresmeasures
Sales FactSales Fact
ProductProductDateDate
CustomerCustomerOther dims…Other dims…
© 2005-2009 Kimball Group. All rights reserved. Page 16
Surrogate Keys
� Dimension PKs should be surrogate (meaningless) keys
� Managed by the DW
� Usually an integer type
� Usually populated via IDENTITY keyword in dimension table definition
� Why?� Small (int) keys are vital for performance
� The source system will re-use keys. They swear they won’t. But they will.
� Enables dimension attribute change tracking
31
Surrogate Keys and ETL
� Dimensions
� Carry source system key(s) as non-key attributes in the dimension
� New rows automatically get a new surrogate key
� Facts
� Fact table usually does not contain source system keys
� Final step of fact processing is to exchange the source system keys for DW surrogate keys
� Lookup to dimension tables based on source key, returning surrogate key
32
© 2005-2009 Kimball Group. All rights reserved. Page 17
Conformed Dimensions
� One master dimension table that all fact tables subscribe to
� Get agreement organization-wide on:
� What the dimensions are called
� Which hierarchies you have
� Similar-but-different attributes and hierarchies have different names
� Which attributes are managed by restating history and which by tracking history
� Create two sets of attributes if you need it both ways
� Why?
� Single version of the truth
� Flexibility of basic design
33
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
34
© 2005-2009 Kimball Group. All rights reserved. Page 18
The Need for BI Applications
� Approximately 10% of your user population will learn how to build ad hoc queries
� They must learn the tool AND the data
� This means you must build applications to provide access to the other 90%
� Structured
� Flexible (parameters, pick lists, formats)
� Well organized
35
BI Application Steps
1. BI Application design and specs
� Right after business requirements
� Template, mock-ups, specs, navigation framework
2. BI Application development
� Can’t start until data and tools are available
� Pull out your specs and get to work
� Best to do this as part of Beta testing
36
© 2005-2009 Kimball Group. All rights reserved. Page 19
Standard Reports
� We recommend going live with a modest number of reports (8-12)
� Enlist business users in creating and QA-ing reports
� Users don’t know what they want until you show them something
� Lots of reports are “theme and variations” –parameterize them!
� Build a BI portal to host the reports
� Brand it with the DW/BI logo
� Add useful info about operations, contents, and help
37
Advanced BI Applications
� Planning and forecasting applications� You need a decent history of fairly accurate data before you can
plan / forecast
� Planning and forecasting activities are highly analytic, with a little bit of writeback
� Heavy emphasis on “what-if”
� Data mining
� Collection of statistical techniques to identify trends and correlations
� Requires detailed (atomic) data
� Can be the most valuable thing you do with your DW/BI system
� Advanced BI apps are not Phase 1 projects
38
© 2005-2009 Kimball Group. All rights reserved. Page 20
Agenda
� The “Right” reason to build a DW/BI system
� The Kimball Architecture
� The Kimball Method and Lifecycle
� Business Requirements
� Technology Track
� Data Track
� BI Applications
� Operations, Maintenance, and Growth
39
Deployment, Maintenance, and Growth
� Deployment has two major components� Software and data availability (dev, test, prod)
� User preparedness (training, documentation, and support)
� Maintenance
� Monitor usage and performance
� Anticipate problems
� Growth� Iterate back through the Lifecycle with the next priority business process
40
© 2005-2009 Kimball Group. All rights reserved. Page 21
Session Summary
� The DW/BI system can be high value, but it is definitely high risk
� Reduce risk by using an approach based on
� business requirements
� a flexible data architecture
� delivering the full solution
� Microsoft SQL Server 2008 provides the full technology stack for DW/BI systems
� SQL Server 2008 is well suited for the Kimball Method
41
Next Steps
� Learn about your business
� Strategies, challenges, opportunities, terms
� Industry, competition, trends
� Learn the Kimball Method
� Learn about adding business value
� Learn the Lifecycle approach
� Learn the Microsoft SQL Server 2008 DW/BI toolset
� Get started!
� Do a high level requirements definition and prioritization
42
© 2005-2009 Kimball Group. All rights reserved. Page 22
For More Information…
� Kimball University� Next 4-day Microsoft class on 3/31 (Chicago). Stockholm in May.
� Other classes in modeling, lifecycle, and ETL throughout the year
� Websites� www.kimballgroup.com: articles and design tips
� forum.kimballgroup.com: the Kimball Forum
� Kimball Books� The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren
Thornthwaite with Ralph Kimball, Wiley, 2006 (the Microsoft book)
� The Data Warehouse Toolkit 2nd Edition, Ralph Kimball and Margy Ross, Wiley, 2002 (the modeling book)
� The Data Warehouse Lifecycle Toolkit 2nd Edition, Kimball, Ross and Thornthwaite, Wiley, 2008 (how to build a DW)
� The Data Warehouse ETL Toolkit, Kimball and Caserta, Wiley, 2004 (ETL theory and practice)
43