Agile Data Warehousing for the Enterprise
Agile Data Warehousing for the Enterprise A Guide for Solution Architects and Project Leaders
Ralph Hughes, MA, PMP, CSM
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
ELSEVIER Morgan Kaufmann is an imprint of Elsevier
Full Contents
List of Figures xvii
List of Tables xxiii
Abbreviations xxv
Foreword xxvii
Acknowledgments xxix
1. Solving Enterprise Data Warehousing's "Fundamental Problem"
The Agile Solution in a Nutshell 1 Five Legs to Stand Upon 3 The Agile EDW Alternative is Ready to Deploy 5 Defining a Baseline Method for Agile EDW 5 Plenty of Motivation to "Go Agile" 7 Structure of the Presentation Ahead 7
Part I Summaries of Generic Agile Development Methods
2. Primer on Agile Development Methods
Defining "Agile" 13 Agile Manifesto Values and Principles 19 Scrum in a Nutshell 20
User Stories 21 Scrum's Five-Step Delivery Iteration 23
Contributions from Extreme Programming 26 XP Values and Principles 27
3. Introduction to Alternative Iterative Methods
Lean Software Development 31 Lean Origins 31 Lean Methods as a Long-Term Destination 32 Lean Principles and Tools 33
Kanban 41 Quick Sketch of the Kanban Method 41 Visualizing and Maintaining Continuous
Flow 43
Evidence-Based Service Levels Comparing Kanban to Scrum
The Hybrid "Scrumban" Approach Rational Unified Process
RUP Overview Why Not RUP for DW/BI?
Part I References
Part II Review of Fast EDW Coding and Risk Mitigation
4. Essential DW/BI Background and Definitions
5. Recap of Agile DW/BI Coding Practices
Iterative Coding Alone Significantly Improves BI Projects Yet Data Integration Remains
a Challenge
44 45 47 49 49 52
55
Primary Source for DW/BI Standards Defining Enterprise Data Warehousing
Basic Business Terms Data and Information Terms Information Services Terms Software Engineering Terms Basic Architectural Concepts
System Architecture Data Architecture Reference Architecture Enterprise Architecture
Architectural Frameworks Zachman Enterprise Architectural
Framework DAMA Functional Framework Hammergren DW Planning Matrix
Additional Data Warehousing Concepts Traditional Project Management Terms
60 61 63 65 66 67 70 70 71 74 75 76
76 76 77 79 82
85
85
New Roles for DW/BI Projects Project Architect Data Modeler Systems Analyst System Tester Proxy Product Owner Scrum Master Including the New Roles on the Team's
Whale Chart 80/20 Specifications Developer Stories
DW/BI User Stories Hide Much of the Data Integration Work
Developer Stories Make DW/BI Work More Manageable
Developer Stories Require a Deeper Understanding of Value
Current Estimates Adding Techniques from Kanban
Pipelined Delivery Work-in-Progress Limits for Developers Iteration -1 and 0 Two-Pass Testing
Evidence-Based Service Level Agreements Proof that Agile DW/BI Works
Investigating Project Cost Impacts in More Detail
Some Myths Prove True
86 87 88 88 89 89 90
90 90 92
92
93
94 95 97 98
100 100 101 102 104
106 107
6. Eliminating Risk Through Nested Iterations
EDW Programs Slip into "231 Swamps" 109 231 Swamps Derive from a Command
and Control Strategy 110 Agile's Fundamental Risk Mitigation
Technique 111 Agile's General Risk Mitigation Strategy 111 Eliminating Miscommunication with
Multiplexed Engineering Phases 113 Agile EDW's Extended Risk Mitigation
Techniques 114 Three Types of Risk Threaten EDW
Programs 114 Mitigating the Risk of Application
Coding Concept Errors 116 Mitigating the Risk of Solution Concept
Errors 116 Mitigating the Risk of Business Concept
Errors 119
Part II References 121
Part III Agile EDW Requirements Management
7. Balancing between Two Extremes
Building the Case for Effective Requirements Management 126
Developers Often Neglect Requirements Work 128
Motivating Teams to Take Requirements Seriously 128
Easy to Overinvest in Requirements Management 130 "Requirements Management" Formally
Defined 130 Traditional Projects Employ a Big Spec
Up Front 130 Requirements are Inherently Diverse 132 Business Process Reengineering Can
Add to the Complexity 135 Reasons Not to Overinvest in Requirement
Work 136 Precision at the Expense of Accuracy 137 Business Partners are Adverse to Traditional
Requirements Gathering Efforts 138 Traditional Requirements Management
Fails More than it Succeeds 139 The Greatest Failure is Losing Business
Opportunity 139 Agile's Approach Centers on Balance 141
Agile Objectives for Requirements Management 141
Knowing when a Backlog is "Good Enough" 143
Enable Regular "Current Estimates" 144 Keeping the Requirements Management
Process Agile 144 Two Intersecting Requirements
Management Value Chains 144 Salient Differences between GRM
and ERM 147 Business Analysts Implicit in Two Project
Lead Roles 149
8. Redefining the Epic Stack to Enable Value Accounting
Toward a Robust Epic Decomposition Framework 151 Defining the Backlog Hierarchy's
Structure 151
Aligning the Epic Stack to the Company's Hierarchy 152
Clearly Defining Each Level within the Epic Stack 154
Testing Whether Stories are Good Enough 156 Clarifying Everything with Value Accounting 159
The Basics of Value Accounting 160 Value Accounting Makes Developers
More Effective 161 Value Accounting Mitigates Project Risk 162
Allocating Value Throughout an Epic Tree 163 Identifying the Value of a Project 163 Allocating Value to Epics 164 Allocating Value to Themes and User
Stories 164 Value Buildups by Environment Provide
Motivation and Clarity 165
9. Artifacts for the Generic Requirements Value Chain
Beware of Requirements Churn User Modeling/Personas End Users' Hierarchy of Needs
Benefits Offered by the BI Hierarchy of Needs
Mind Maps and Fishbone Diagrams Vision Boxes Vision Statements Product Roadmaps
169 170 171
173 174 176 176 178
10. Artifacts for the Enterprise Requirements Value Chain
The Generic Value Chain Can Overlook Crucial Requirements 181
ERM as a Flexible RM Approach 183 Focusing on Enterprise Aspects of Project
Requirements 184 Functionality Dimension 184 Polarity Dimension 185 Orientation Dimension 185 Streamlined ERM Templates 186
Uncovering Project Goals with Sponsor's Concept Briefing 186 Justification Type 187 Customer Experience Impacts 188 Functional Area Impacts Assessments 188 Value of the Program 188 Program Success Metrics 189
11.
Identifying Project Objectives with Stakeholder's Requests Business System Challenges Current Manual Solution Desired Business Solution
189 189 189 190
Volume Requirements and End-User Census 190 Dependent Systems 190
Sketching the Solution with a Vision Document 191 Solutions Statements 191 Features and Benefits List 191 Context Diagram 194 Target Business Model 196 High-Level Architectural Diagram 197 Nonfunctional Requirements 197
Segmenting the Project with Subrelease Overview 198 Subrelease Identifier 200 Subrelease Scope 200 Business Process Supported 202 Technical Description 207 Nonfunctional Requirements 208
Providing Developer Guidance with Module Use Cases 209 Goal 209 Standard Flow of Events 209 Alternative Flow of Events 210 Special Requirements 212 Source-to-Target Mappings as
Supplemental Specifications 212 Nonfunctional Requirements as
Supplemental Specifications 212
Intersecting Value Chains for a Stereoscopic Project Definition
Intersecting the Two Value Chains 215 Agile EDW's Version of Requirements
Traceability 215 Addressing Nonfunctional
Requirements 217 The Proper Problem Domain for
Agile EDW 217 Agile EDW Supports Broader
Architectural Activities 219 Supporting the Organization's
Software Release Cycle 221 Phases Borrowed from Rational Unified
Process 221 Iterations -1 and 0 Fit into the Inception
Phase 221
Arriving at a Predevelopment Project Estimate
Managing the Predevelopment Estimate Completing the Release Cycle
Techniques for the Elaboration Phase Choosing Developer Stories for the
Elaboration Phase Proving Out Architectures Using a
"Steel Thread" Prioritizing Project Backlogs Managing Incremental Precision
A Framework for Visualizing Progressive Requirements
The Freezer, Fridge, Counter Metaphor Effort Levels by Team Roles
Visualizing Requirements Management Demands with Effort Curves
Allocating Time for Nonfunctional Requirements
Conquering Complex Business Rules with an Embedded Method Add the Data Cowboy Role Special Skills and Tools for the Data
Cowboy Modified Data Mining Method Can Help Placing Business Rules Discovery and
Analysis into the Effort Curves Interfacing with Project Governance Not Returning to a Waterfall Approach
Part III References
Part IV Agile EDW Data Engineering
12. Traditional Data Modeling Paradigms and Their Discontents
EDW at a Crossroads Reviewing the Reference Architecture Standard Normal Forms Lead to Complex
Integration Layers Conformed Dimensions Lead to Complex
Presentation Layers A Peek at the Agile Alternatives
Models, Architectures, and Paradigms Data Architecture Data Model Data Modeling Paradigm
Normalization Basics Designing Databases to Eliminate
Update Anomalies
223 225 226 226
226
227 228 229
230 230 232
232
234
235 235
236 236
238 239 242
245 13.
Example: One Table from First to Fifth Normal Form 262
Generalization Basics 271 Advantages and Disadvantages of
Generalization 271 Example: Generalizing a Sales Table
for the Party Entity 274 The Standard Approach and its Data
Modeling Paradigms 279 The Traditional Integration Layer as a
Challenged Concept 281 Involves an Expensive Hidden Layer 281 Results are Difficult to Understand 282 Entails High Maintenance Conversion
Costs 283 "Straight-To-Star" as a Controversial
Alternative 286 Four Change Cases for Appraising a Data
Modeling Paradigm 286 Change Case 1: Correcting Fourth Normal
Form Errors 287 Change Case 2: Generalizing to the Party
Model 287 Change Case 3: New Trigger Attribute
for a Slowly Changing Dimension 289 Change Case 4: Changing a Fact Table's
Grain 290
Surface Solutions Using Data Virtualization and Big Data
249 249
251
253 255 257 257 258 259 260
260
Leveraging Shadow IT Example of a Five-Step Collaborative
Effort Lessons from the Case History
Faster Value Delivery with Data Virtualization
Defining Data Virtualization The Basic Use Case DVS Performance Features The Economics of Virtual Solutions DVS Surface Solutions and Progressive
Deployment Comparing DVS Surface Solutions
to the Previous Example Data Virtualization's Value Proposition EDW's Reference Architecture
Becomes Dynamic An Agile Role for Big Data
Introducing Big Data Technologies The Need for Big Data Technology The Promise of Schema-On-Read An Introduction to Hadoop
294
294 296
296 297 297 299 300
302
304 305
306 308 308 309 310 311
Notable Contrasts between SQL and MapReduce
Making MapReduce Look Like SQL with Hive
Big Data Is Not Just Hive Using Big Data to Enhance EDW Agility
14. Agile Integration Layers with Hyper Normalization
Hyper Normalization Hinges on "Ensemble Modeling" Several Varieties of Hyper Normalization
Exist Hyper Normalized Data Modeling Concepts
Business Key Entities Linking Entities Attribute Entities Lightly Integrated, Persistent Staging Area Ensemble Modeling Components
Allow Light Integration and Agility An Insert-Only Paradigm Swedish Variation: Anchor Modeling
Reusable ETL Modules Accelerate New Development One ETL Pattern Needed Per Hyper
Normalized Table Type Parameter-Driven ETL Module Prototypes Calling the Reusable ETL Modules Self-Validating Reusable ETL Modules Estimate of Comparative Development
Efforts Common Data Retrieval Challenges
and Their Solutions HNF Aids the Leading Edge of the
Integration Layer Only Retrieving Data from an HNF Repository
Doubly Difficult Solution 0: Focus on Presentation Layer
Objects Solution 1: Dummy Attribute Records Solution 2: Current Record Indicators Solution 3: Point-in-Time Tables Solution 4: Table Pruning Solution 5: Bridging Tables Solution 6: Retrieval Query Writers Clearing an Architectural Review
Re-Architecting the EDW for Hyper Normalization The Simple Vault Style The Enhanced Vault Style The Source Vault Style The Raw Vault Style Blending Styles to Achieve Agility
314
317 324 325
Enabling Evolution of Existing EDW Components Change Case 1: Splitting Out Entities Change Case 2: Upgrading to a
Party Model HNF-Powered Agile Solutions Evidence of Success
Online Financial Services The Free University
366 366
367 368 371 372 372
329
330 331 333 334 335 337
339 342 343
344
345 346 348 350
352
352
353
354
356 356 356 356 358 359 360 361
361 362 363 364 364 365
15. Fully Agile EDW with Hyper Generalization
Hyper Generalization Involves a Mix of Modeling Strategies 375 Extreme Generalization 377 Adding Time-Oriented Object
Classification 380 Managing Things and Links with an
Associative Data Model 381 Storing Attributes as Name-Value Pairs 384 Storing Transaction Data in a Lightly
Dimensionalized Format 385 Managing Hyper Generalized Data in
HGF Requires an Automation Tool 386 HGF Enables Model-Driven Development
and Fast Deliveries 387 Eliminating Most Logical and Physical
Data Modeling 387 Controlling the EDW Design from a
Business Model Diagram 387 Driving Design Changes Using a Business
Model 389 Loading Data into the Hyper Generalized
Integration Layer 390 Loading the Dimensional Objects 390 Loading the Transactional Objects 391
Retrieving Information from a Hyper Generalized EDW 392 HGF Systems Maintain a Performance
Sublayer 392 Performance Layer Objects Enable
Business-Intelligible Data Retrieval 393 Model-Driven Evolution and Fast
Adaptation 395 Impact of Model Changes on Existing
Data 395 Hyper Generalization Tools Facilitate
Data Conversions 396 Supporting Derived Elements 397
Value-Added Loops 397 Model-Driven Master Data
Components 398 Addressing Performance Concerns 402
Demonstrating Agility Through Four Change Cases 403 Change Case 1: Upgrading Attributes to
Entities 403 Change Case 2: Consolidating Entities
into the Party Model 406 Change Case 3: New Trigger for a Slowly
Changing Dimension 409 Change Case 4: Increasing the Grain
of a Fact Table 410 Recap of Change Case Findings 413
HGF-Powered Agile Solutions 414 Easier Backfills for Surface Solutions 415
Evidence of Success 416 Case History 1: Model-Driven
Development in Pharmaceuticals 416 Case History 2: Hyper Generalized Data
Warehousing in Specialty Retail 417
Part IV References 421
Part V Agile EDW Quality Management Planning
16. Why We Test and What Tests to Run
Why Test? Testing Keeps Agile Teams from Cutting
Corners Testing Keeps Root Cause Analysis
Manageable Testing Integrates Teamwork Across the
Pipeline Testing Leads to Better Requirements Testing Makes Real Progress Visible to
Everyone An Agile Approach to Quality Assurance
Striving for Balance Keeping Quality Assurance "Agile" Extending Test-Led Development Far
Above Unit Testing "What to Test?" Answered with Top-Down
Planning The Six Dimensions of DW/BI Testing Preliminary Definitions Dimension 1: Planning Dimension 2: System Dimension 3: Functional Dimension 4: Polarity
426
426
427
428 428
428 429 429 430
432
433 433 435 436 437 439 439
Dimension 5: Time Frame 440 Dimension 6: Point-of-View 440
A 2 X 2 Planning Matrix for Top-Down Test Selection 441 A Framework for Assessing a QA Plan's
Coverage 441 Linking Test Planning to Requirements
and Risk Management 443 "What to Test?" Answered Bottom-Up 444
Data Warehousing Testing Techniques 444 Traditional Application Testing
Techniques 446 Agile-Specific Test Techniques 449 An Easy-to-Follow Test Technique
Matrix for Low-Level Validations 451 Reusable Test Widgets 452 Test Cases Roll Forward Along the
System Dimension 453 Testing for Convergence 453
17. Designating Who, When, and Where
18.
Who Shall Write the Tests? A Framework for Understanding Who
Must Do What When Should Teammates Perform
Their QA Duties? Quality Activities Within an Iteration
Cycle Quality Duties at the End of a Release
Cycle Where Should Teammates Perform
Their QA Duties? Distributing Test Activities Across
Environments Distributing Test Techniques Across
Environments Key Quality Responsibilities by Team Role
Guiding the Team to Self-Organized Quality Planning
Suggested Quality Duties by Role The Overarching Duties of the System
Tester Certifying the User Demo's Data
How Many Testers are Needed?
457
458
463
464
466
468
468
469 470
470 471
473 474 475
Deciding How to Execute the Test Cases
Good Agile Quality Plans Involve Numerous Test Executions 477
Alternatives to Sufficient Testing Unattractive 480
Facing Up to Test Automation 481 Step 1: Update the Top-Down Plan 482 Step 2: Start Building the Parameter-Driven
Widgets 482 Step 3: Plan Out the Test Data Sets 482
Identifying How Many Data Sets are Required 484
Planning to Create Dozens of Data Sets 485 Planning Storage for Dozens of
Data Sets 487 Planning also for Expected Results 487
Step 4: Implement the Engine, Whether Manual or Automated 487
Defining Test Scenarios 489 Step 5: Define the Project's Set of Testing
Aspects 489 Step 6: Build and Populate the Test Data
Repository 490 Step 7: Quantify the Testing Objectives 491 Step 8: Begin Creating Test Cases 493 Step 9: Start Up the Engine 493 Step 10: Visualize Project Progress with
Quality Assurance 494 Tests Implemented by Environment 494 Connect Top-Down and Bottom-Up
Quality Planning 496 Defects Over Time 496 Current Iteration Burndown Chart 496
Step 11: Document the Team's Success 497
Part V References 499
Part VI Integrating the Pieces of the Agile EDW Method
19. The Agile EDW Subrelease Cycle
Making the Release Cycle a Repeatable Process 503
Traditional Notions of Data Governance 504 A Life Cycle for Data Governance 505 Data Governance Actions for the EDW
Team 508 Machine-Assisted Data Governance
for the Subrelease Cycle 509 The Agile EDW Subrelease Value Cycle 510
The Fast Requirements Portion of the Subrelease Cycle 511
The Fast Delivery Portion of the Subrelease Cycle 512
Centering the Value Cycle on Data Governance and Quality 514 Deepening the Support for Data
Governance 514 Achieving World-Class Quality
Assurance 515 Guiding the Agile EDW Transition 515
The DW/BI Customer's Bill of Rights 516 Toward an Agile EDW Manifesto 518
Part VI References 521
Index 523