Data Vault Overview
Transcript of Data Vault Overview
1
Data Vault Model & Methodology
© Dan Linstedt, 2011-2012, all rights reserved
2
Agenda
• Introduction – why are you here?
• What is a Data Vault? Where does it come from?
• Star Schema, 3NF, and Data Vault pros and cons AS AN EDW solution
• When is a Data Vault a good fit?
  o Benefits of Data Vault Modeling & Methodology
• <BREAK>
• When to NOT use a Data Vault
• Fundamental Paradigm Shift
• Business Keys & Business Processes
• Technical Review
• Query Performance (PIT & Bridge)
• What wasn’t covered in this presentation…
3
A bit about me…
• Author, Inventor, Speaker – and part time photographer…
• 25+ years in the IT industry
• Worked in DoD, US Gov’t, Fortune 50, and so on…
• Find out more about the Data Vault:
  o http://www.youtube.com/LearnDataVault
  o http://LearnDataVault.com
• Full profile on http://www.LinkedIn.com/dlinstedt
4
Why Are YOU Here?
• Your Expectations?
• Your Questions?
• Your Background?
• Areas of Interest?
• Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
5
What is it? Where did it come from?
Defining the Data Vault Space
6
Data Vault Time Line (axis: 1960, 1970, 1980, 1990, 2000)
• Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
• E.F. Codd invented relational modeling
• Chris Date and Hugh Darwen maintained and refined modeling
• Early 70’s: Bill Inmon began discussing Data Warehousing
• Mid 70’s: AC Nielsen popularized Dimension & Fact terms
• 1976: Dr Peter Chen created E-R Diagramming
• Mid 80’s: Bill Inmon popularizes Data Warehousing
• Mid – Late 80’s: Dr Kimball popularizes Star Schema
• Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
• 1990: Dan Linstedt begins R&D on Data Vault Modeling
• 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
7
Data Vault Modeling…
Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
8
What IS a Data Vault? (Business Definition)
• Data Vault Model
  o Detail oriented
  o Historical traceability
  o Uniquely linked set of normalized tables
  o Supports one or more functional areas of business
• Data Vault Methodology
  – CMMI, Project Plan
  – Risk, Governance, Versioning
  – Peer Reviews, Release Cycles
  – Repeatable, Consistent, Optimized
  – Complete with Best Practices for BI/DW
[Diagram: Business Keys span / cross lines of business. Functional areas shown: Sales, Procurement, Delivery, Contracts, Finance, Planning, Operations.]
9
The Data Vault Model
• The Data Vault model is a data modeling approach
  …so it fits into the family of modeling approaches: 3rd Normal Form, Data Vault, Star Schema
• While 3rd Normal Form is optimal for Operational Systems
  …and Star Schema is optimal for OLAP Delivery / Data Marts
  …the Data Vault is optimal for the Data Warehouse (EDW)
10
Supply Chain Analogy
[Diagram: Source Systems → Data Vault (EDW) → Data Marts]
11
What Does One Look Like?
[Diagram: three Hubs (Customer, Product, Order), each surrounded by Satellites (Sat), joined through a central Link that has its own Satellite. The Link records a history of the interaction.]
Elements:
• Hub = List of Unique Business Keys
• Link = List of Relationships, Associations
• Satellites = Descriptive Data
12
Colorized Perspective…
[Diagram: 3rd NF & Star Schema compared with the Data Vault, where Business Keys (HUB), Associations (LINK), and Details (Satellites) are held apart (separation).]
The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites).
(Colors Concept Originated By: Hans Hultgren)
13
Star Schemas, 3NF, Data Vault: Pros & Cons
Defining the Data Vault Space
• Why NOT use Star Schemas as an EDW?
• Why NOT use 3NF as an EDW?
• Why NOT use Data Vault as a Data Delivery Model?
14
Star Schema Pros/Cons as an EDW
PROS
• Good for multi-dimensional analysis
• Subject oriented answers
• Excellent for aggregation points
• Rapid development / deployment
• Great for some historical storage
CONS
• Not cross-business functional
• Use of junk / helper tables
• Trouble with VLDW
• Unable to provide integrated enterprise information
• Can’t handle ODS or exploration warehouse requirements
• Trouble with data explosion in near-real-time environments
• Trouble with updates to type 2 dimension primary keys
• Trouble with late arriving data in dimensions to support real-time arriving transactions
• Not granular enough information to support real-time data integration
15
3NF Pros/Cons as an EDW
PROS
• Many to many linkages
• Handle lots of information
• Tightly integrated information
• Highly structured
• Conducive to near-real time loads
• Relatively easy to extend
CONS
• Time driven PK issues
• Parent-child complexities
• Cascading change impacts
• Difficult to load
• Not conducive to BI tools
• Not conducive to drill-down
• Difficult to architect for an enterprise
• Not conducive to spiral/scope controlled implementation
• Physical design usually doesn’t follow business processes
16
Data Vault Pros/Cons as an EDW
PROS
• Supports near-real time and batch feeds
• Supports functional business linking
• Extensible / flexible
• Provides rapid build / delivery of star schemas
• Supports VLDB / VLDW
• Designed for EDW
• Supports data mining and AI
• Provides granular detail
• Incrementally built
CONS
• Not conducive to OLAP processing
• Requires business analysis to be firm
• Introduces many join operations
17
Analogy: The Porsche, the SUV and the Big Rig
• Which would you use to win a race?
• Which would you use to move a house?
• Would you adapt the truck, enter a race against Porsches, and expect to win?
18
A Quick Look at Methodology Issues
Business Rule Processing, Lack of Agility, and Future proofing your new solution
19
EDW Architecture: Generation 1
[Diagram: Sales, Finance, and Contracts feed Staging (EDW) in batch and then Star Schemas, which together form the Enterprise BI Solution. Labels: Complex Business Rules + Dependencies; Complex Business Rules #2; Staging + History; Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts.]
• Quality routines
• Cross-system dependencies
• Source data filtering
• In-process data manipulation
• High risk of incorrect data aggregation
• Larger system = increased impact
• Often re-engineered at the SOURCE
• History can be destroyed (completely re-computed)
20
#1 Cause of BI Initiative Failure
Re-Engineering For Every Change!
Anyone?
Let’s take a look at one example…
21
Re-Engineering
[Diagram: Current Sources (Sales, Finance) pass through a Data Flow (Mapping) made of Source, Join, and Business Rules steps. Other labels: Customer, Customer Transactions. Adding the ** NEW SYSTEM ** Customer Purchases causes IMPACT!!]
22
Federated Star Schema Inhibiting Agility
[Chart: Effort & Cost (Low to High) against Time, from Start to the point where the Maintenance Cycle Begins. Data Mart 1, Data Mart 2, and Data Mart 3 appear along the curve.]
Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
RESULT: Business builds their own Data Marts!
The main drivers are the maintenance cost and the re-engineering of the existing system that each new “federated/conformed” effort requires. This increases delivery time, difficulty, and maintenance costs.
23
EDW Architecture: Generation 2
[Diagram: Sales, Finance, Contracts, and Unstructured Data feed Staging and the EDW (Data Vault) in batch and, through SOA, in real time. Complex Business Rules sit between the EDW and the Star Schemas, Error Marts, and Report Collections. The whole forms the Enterprise BI Solution.]
The business rules are moved closer to the business, improving IT reaction time, reducing cost, and minimizing impacts to the enterprise data warehouse (EDW).
FUNDAMENTAL GOALS
• Repeatable
• Consistent
• Fault-tolerant
• Supports phased release
• Scalable
• Auditable
24
NO Re-Engineering
[Diagram: Current Sources (Sales, Finance) are loaded through Stage Copy steps into the Data Vault (Hub Customer, Hub Acct, Hub Product, Link Transaction). The ** NEW SYSTEM ** Customer Purchases arrives through its own Stage Copy. Other labels: Customer, Customer Transactions, IMPACT!!]
NO IMPACT!!! NO RE-ENGINEERING!
25
Progressive Agility and Responsiveness of IT
[Chart: Effort & Cost (Low to High) against Time. Milestones: Start, Initial DV Build Out, Foundational Base Built, Maintenance Cycle Begins, New Functional Areas Added.]
Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces the complexity of the existing architecture.
26
What’s Wrong With the OLD METHODOLOGY?
Using Star Schemas as your Data Warehouse leads to…
27
Dimensionitis
• DimensionItis: an incurable disease. The symptom is the creation of new dimensions because the cost and time to conform existing dimensions with new attributes rise beyond the business’s ability to pay…
Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department…
What can it hurt?
28
Deformed Dimensions
• Deformity: the URGE to continue “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes. The result: a deformed dimension and a HUGE re-engineering cost / nightmare.
Re-Engineering the Load Processes EACH TIME!
[Diagram: each Business Change produces a new version of the dimension and its load: V1 Complex Load (90 days, $125k), V2 Complex Load (120 days, $200k), V3 Complex Load (180 days, $275k).]
Business Wants a Change!
Business said: Just add that to the existing Dimension, it will be easy, right?
29
Silo Building / IT Non-Agility
• Business Says: Take the dimension you have, copy it, and change it… This should be cheap and easy, right?
[Diagram: the First Star holds a customer dimension (Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Phone, Customer_Type) and a fact table (Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT). Three further copies of the same customer dimension follow, one per department.]
• SALES: We built our own because IT costs too much…
• FINANCE: We built our own because IT took too long…
• MARKETING: We built our own because we needed customized dimension data…
Business Change To Modify Existing Star = 180 days, $275k
30
Why is Data Vault a Good Fit?
31
What are the top business obstacles in your data warehouse today?
32
Poor Agility
Inconsistent Answer Sets
Needs Accountability
Demands Auditability
Desires IT Transparency
Are you feeling Pinned Down?
33
What are the top technology obstacles in your data warehouse today?
34
Complex Systems
Real-Time Data Arrival
Unimaginable Data Growth
Master Data Alignment
Bad Data Quality
Late Delivery/Over Budget
Are your systems CRUMBLING?
35
Existing Solutions have led you down a painful path…
[Image: Yugo, “World’s Worst Car”]
36
Projects Cancelled & Restarted
Re-engineering required to absorb new systems
Complexity drives maintenance cost sky high
Disparate Silo Solutions provide inaccurate answers!
Severe lack of Accountability
37
There must be a better way…
There IS a better way!
How can you overcome
these obstacles?
38
It’s Called the
Data Vault Model
and Methodology
39
What is it?
It’s a simple, easy-to-use plan to build your valuable Data Warehouse!
40
Uncomplicated Design
Simple Build-out
Rapid Adaptability
Understandable Standards
Effortless Scalability
Painless Auditability
Pursue Your Goals!
What’s the Value?
41
Why Bother With Something New?
Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'
42
What Are the Issues?
This is NOT what you want happening to your project!
THE GAP!!
43
What Are the Foundational Keys?
Flexibility
Scalability
Productivity
44
Key: Flexibility
Enabling rapid change on a massive scale without downstream impacts!
45
Key: Scalability
Providing no foreseeable barrier to increased size and scope
People, Process, & Architecture!
46
Key: Productivity
Enabling low complexity systems with high value output at a rapid pace
47
< BREAK TIME >
48
How does it work?
Bringing the Data Vault to Your Project
49
Key: Flexibility
Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
No Re-Engineering!
50
Case In Point:
The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days: that is ALL systems, ALL DATA!
51
Key: Scalability in Architecture
Scaling is easy; it’s based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks
52
Case In Point:
Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!
53
Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault methodology, you can scale your team when desired, at different points in the project!
54
Case In Point: (Dutch Tax Authority)
Scalability made it possible to add ETL developers for each new source system and to reassign them once that system was completely loaded into the Data Vault.
55
Key: Productivity
Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, managing and optimizing processes
56
Case in Point:
Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.
These individuals generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW data Model
• 75% of the star schema data model
57
The Competing Bid?
The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system)
Our total cost? $30k and 2 weeks!
58
Results?
Changing the direction of the river takes less effort than stopping the flow of water
59
When NOT to use the Data Vault Model & Methodology
60
When NOT to Use the Data Vault
• You have:
  o A small set of point solution requirements
  o A very short time-frame for delivery
  o To use the data one time, then throw it away
  o A single source system, single source application
  o A single business analyst in the entire company
• You do NOT have:
  o Audit requirements forcing you to keep history
  o Multiple data center consolidation efforts
  o Near-real-time to worry about
  o Massive batch data to integrate
  o External data feeds outside your control
  o Requirements to do trend analysis of all your data
  o Pain – that forces you to reengineer every time you ask for a change to your current data warehousing systems
61
Fundamental Paradigm Shift
Exploring differences in the architecture, implementation, and process design.
62
It’s Not Just a Data Model…
SUCCESS!
Model Methodology
63
Different From ANYTHING ELSE!
• The Business Rules go after the Data Warehouse!
• Data is interpreted on the way OUT!
• Hold on… We do distinguish between HARD and SOFT business rules…
OK, now tell me WHY this is important?
64
EDW: The Old Way of Loading
Corporate Fraud Accountability Title XI consists of seven sections. Section 1101 recommends a name for this title as “Corporate Fraud Accountability Act of 2002”. It identifies corporate fraud and records tampering as criminal offenses and joins those offenses to specific penalties. It also revises sentencing guidelines and strengthens their penalties. This enables the SEC to temporarily freeze large or unusual payments.
[Diagram: Source 1, Source 2, and Source 3 flow through Staging into the HR Mart, Sales Mart, and Finance Mart; along the way, Business Rules Change Data!]
Are changes to data ON THE WAY IN to the EDW equivalent to records tampering?
65
EDW: The New Compliant Way
1. Implement a Raw Data Vault Data Warehouse
2. Move the business rules “downstream”
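A minimal sketch of the two steps above, in Python. The raw keys are borrowed from the HUB_CUST_ACCT example later in this deck; the normalization rule (`soft_rule_normalize`) and the idea that these three keys belong to one customer are assumptions made only for illustration, not rules stated in the slides.

```python
# Step 1: raw rows are stored exactly as each source delivered them.
raw_hub_cust_acct = [
    {"cust_acct": "ABC123", "record_src": "SALES"},
    {"cust_acct": "ABC-123", "record_src": "SALES"},
    {"cust_acct": "*ABC-123", "record_src": "FINANCE"},
]


def soft_rule_normalize(cust_acct):
    """Hypothetical soft business rule: keep letters and digits, upper-cased."""
    return "".join(ch for ch in cust_acct if ch.isalnum()).upper()


def customer_mart(raw_rows):
    """Step 2: interpret the data on the way OUT, grouping raw keys per conformed key."""
    mart = {}
    for row in raw_rows:
        conformed = soft_rule_normalize(row["cust_acct"])
        mart.setdefault(conformed, []).append(row["cust_acct"])
    return mart


mart = customer_mart(raw_hub_cust_acct)
```

Because the rule runs downstream, changing it only changes the mart; the raw rows in the warehouse stay as loaded and remain auditable.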
66
Business Keys & Business Processes
67
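Business Keys & Business Processes
[Diagram: over Time, the business process runs across Customer Contact, Sales, Procurement, Planning, Manufacturing, Delivery, Contracts, and Finance toward $$ Revenue. The business key appears as SLS123 under Sales and as *P123MFG under Procurement. The hand-off between them is an Excel Spreadsheet and a Manual Process: NO VISIBILITY!]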
68
Technical Review
Hub, Link, Satellite - Definitions
69
HUB Data Examples

HUB_CUST_ACCT
SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
1    ABC123     10-14-2000  SALES
2    ABC-123    10-14-2000  SALES
3    *ABC-123   10-14-2000  FINANCE
4    123,ABCD   10-15-2000  CONTRACTS
5    PEF-2956   10-16-2000  CONTRACTS

HUB_PART_NUMBER
SQN  PART_NUM   LOAD_DTS    RECORD_SRC
1    MFG-25862  10-14-2000  MANUFACT
2    MFG*25266  10-14-2000  MANUFACT
3    *P25862    10-14-2000  PLANNING
4    MFG_25862  10-15-2000  DELIVERY
5    CN*25266   10-16-2000  DELIVERY

Hub Structure
SEQUENCE
<BUSINESS KEY>     } Unique Index
{LAST SEEN DATE}   } Optional
<LOAD DATE>
<RECORD SOURCE>
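The Hub structure above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `load_hub` helper is hypothetical, the optional {LAST SEEN DATE} is left out, dates are written in ISO form, and "keep the first arrival of each business key" is an assumed loading policy, not one spelled out on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_cust_acct (
        sqn        INTEGER PRIMARY KEY,   -- SEQUENCE (surrogate key)
        cust_acct  TEXT NOT NULL UNIQUE,  -- <BUSINESS KEY>, unique index
        load_dts   TEXT NOT NULL,         -- <LOAD DATE>
        record_src TEXT NOT NULL          -- <RECORD SOURCE>
    )
""")


def load_hub(rows):
    """Insert each business key once; keys already in the Hub are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_cust_acct (cust_acct, load_dts, record_src) "
        "VALUES (?, ?, ?)",
        rows,
    )


load_hub([
    ("ABC123", "2000-10-14", "SALES"),
    ("ABC-123", "2000-10-14", "SALES"),
    ("*ABC-123", "2000-10-14", "FINANCE"),
])
# A later feed repeats a known key; the Hub keeps the first arrival.
load_hub([
    ("ABC123", "2000-10-15", "CONTRACTS"),
    ("PEF-2956", "2000-10-16", "CONTRACTS"),
])

hub_rows = conn.execute(
    "SELECT sqn, cust_acct, load_dts, record_src FROM hub_cust_acct ORDER BY sqn"
).fetchall()
```

Note that "ABC123", "ABC-123", and "*ABC-123" are stored as three distinct keys: the Hub records what the sources sent and leaves any matching to later rules.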
70
Link Structures

Link_Product_Supplier
LPS_SQN
PRODUCT_SQN
SUPPLIER_SQN
LPS_LOAD_DTS
LPS_REC_SOURCE
LPS_ENCR_KEY

Link_Customer_Account_Employee
LCAE_SQN
CUSTOMER_SQN
ACCOUNT_SQN
EMPLOYEE_SQN
LCAE_LOAD_DTS
LCAE_REC_SOURCE

Link Structure
SEQUENCE
<HUB KEY SQN 1>    } Unique Index
<HUB KEY SQN 2>    } Unique Index
<HUB KEY SQN N>    } Unique Index
{LAST SEEN DATE}   } Optional
{CONFIDENCE}       } Optional
{STRENGTH}         } Optional
<LOAD DATE>
<RECORD SOURCE>
(Additional diagram label: Dynamic Link)
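As a rough sketch of the Link structure above, again using `sqlite3`: the unique index spans the Hub key sequences, so each association is recorded once. The `load_link` helper and the sample sequence numbers are hypothetical, and LPS_ENCR_KEY and the optional fields are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE link_product_supplier (
        lps_sqn        INTEGER PRIMARY KEY,  -- SEQUENCE
        product_sqn    INTEGER NOT NULL,     -- <HUB KEY SQN 1>
        supplier_sqn   INTEGER NOT NULL,     -- <HUB KEY SQN 2>
        lps_load_dts   TEXT NOT NULL,        -- <LOAD DATE>
        lps_rec_source TEXT NOT NULL,        -- <RECORD SOURCE>
        UNIQUE (product_sqn, supplier_sqn)   -- unique index over the Hub keys
    )
""")


def load_link(pairs, load_dts, rec_source):
    """Record each (product, supplier) association once."""
    conn.executemany(
        "INSERT OR IGNORE INTO link_product_supplier "
        "(product_sqn, supplier_sqn, lps_load_dts, lps_rec_source) "
        "VALUES (?, ?, ?, ?)",
        [(p, s, load_dts, rec_source) for p, s in pairs],
    )


load_link([(1, 10), (1, 11)], "2000-10-14", "MANUFACT")
load_link([(1, 10), (2, 10)], "2000-10-15", "PLANNING")  # (1, 10) is already known

link_rows = conn.execute(
    "SELECT product_sqn, supplier_sqn, lps_rec_source "
    "FROM link_product_supplier ORDER BY lps_sqn"
).fetchall()
```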
71
Satellites Split By Source System

SAT_SALES_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
Name
Phone Number
Best time of day to reach
Do Not Call Flag

SAT_FINANCE_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
First Name
Last Name
Guardian Full Name
Co-Signer Full Name
Phone Number
Address
City
State/Province
Zip Code

SAT_CONTRACTS_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
Contact Name
Contact Email
Contact Phone Number

Satellite Structure
PARENT SEQUENCE    } Primary Key
LOAD DATE          } Primary Key
<LOAD-END-DATE>
<RECORD-SOURCE>
{user defined descriptive data}
{or temporal based timelines}
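One way the Satellite structure above might be loaded is sketched below with `sqlite3`. The slide gives only the columns; inserting a new row only when the descriptive data changes, and setting <LOAD-END-DATE> on the superseded row, are assumptions of this sketch. The `load_satellite` helper, the trimmed column list, and the sample values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sat_sales_cust (
        parent_sqn   INTEGER NOT NULL,  -- PARENT SEQUENCE (the Hub's sequence)
        load_dts     TEXT NOT NULL,     -- LOAD DATE
        load_end_dts TEXT,              -- <LOAD-END-DATE>, NULL while current
        record_src   TEXT NOT NULL,     -- <RECORD-SOURCE>
        name         TEXT,
        phone        TEXT,
        PRIMARY KEY (parent_sqn, load_dts)
    )
""")


def load_satellite(parent_sqn, load_dts, record_src, name, phone):
    """Append a new version only when the descriptive data changed."""
    current = conn.execute(
        "SELECT load_dts, name, phone FROM sat_sales_cust "
        "WHERE parent_sqn = ? AND load_end_dts IS NULL",
        (parent_sqn,),
    ).fetchone()
    if current is not None:
        if (current[1], current[2]) == (name, phone):
            return False  # nothing changed, keep the current row
        # End-date the version that is being superseded.
        conn.execute(
            "UPDATE sat_sales_cust SET load_end_dts = ? "
            "WHERE parent_sqn = ? AND load_dts = ?",
            (load_dts, parent_sqn, current[0]),
        )
    conn.execute(
        "INSERT INTO sat_sales_cust "
        "(parent_sqn, load_dts, load_end_dts, record_src, name, phone) "
        "VALUES (?, ?, NULL, ?, ?, ?)",
        (parent_sqn, load_dts, record_src, name, phone),
    )
    return True


results = [
    load_satellite(1, "2000-10-14", "SALES", "Dan L", "999-555-1212"),
    load_satellite(1, "2000-10-15", "SALES", "Dan L", "999-555-1212"),  # unchanged
    load_satellite(1, "2000-11-01", "SALES", "Dan Linedt", "999-555-1212"),
]
sat_rows = conn.execute(
    "SELECT load_dts, load_end_dts, name FROM sat_sales_cust ORDER BY load_dts"
).fetchall()
```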
72
Why do we build Links this way?
History Teaches Us…
If we model for ONE relationship in the EDW, we BREAK the others!
73
[Diagram: the Portfolio-to-Customer relationship changes over time.
  10 years ago: Portfolio M : 1 Customer (marked X)
  Today: Portfolio 1 : M Customer, modeled as Hub Portfolio 1 : M Hub Customer
  5 years from now: Portfolio M : M Customer (marked X)]
The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!
This situation forces re-engineering of the model, load routines, and queries!
74
History Teaches Us…
If we model with a LINK table, we can handle ALL the requirements!
[Diagram: Hub Portfolio 1 : M LNK Cust-Port M : 1 Hub Customer. The same Link covers all three cases:
  10 years ago: Portfolio M : 1 Customer
  Today: Portfolio 1 : M Customer
  5 years from now: Portfolio M : M Customer]
This design is flexible, handles past, present, and future relationship changes with NO RE-ENGINEERING!
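To illustrate why a Link table absorbs these changes, the sketch below stores (portfolio, customer) pairs in one list standing in for LNK Cust-Port and derives the cardinality from the rows themselves. The sequence numbers and the `cardinality` helper are hypothetical; the point is only that all three eras fit the same structure.

```python
from collections import defaultdict


def cardinality(link_rows):
    """Return 'Portfolio-side:Customer-side' cardinality found in the Link rows."""
    portfolios_of = defaultdict(set)  # customer_sqn -> portfolios linked to it
    customers_of = defaultdict(set)   # portfolio_sqn -> customers linked to it
    for portfolio_sqn, customer_sqn in link_rows:
        portfolios_of[customer_sqn].add(portfolio_sqn)
        customers_of[portfolio_sqn].add(customer_sqn)
    portfolio_side = "M" if any(len(p) > 1 for p in portfolios_of.values()) else "1"
    customer_side = "M" if any(len(c) > 1 for c in customers_of.values()) else "1"
    return f"{portfolio_side}:{customer_side}"


# Hypothetical (portfolio_sqn, customer_sqn) pairs for each era
ten_years_ago = [(100, 1), (101, 1)]            # many portfolios, one customer
today = [(102, 2), (102, 3)]                    # one portfolio, many customers
five_years_on = [(103, 4), (104, 4), (103, 5)]  # many to many

# All three eras live in the same Link table; no column or key changes.
lnk_cust_port = ten_years_ago + today + five_years_on
```

The cardinality becomes a property of the data rather than of the schema, which is why loading history does not break the model.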
Applying the Data Vault to Global DW2.0
[Diagram: several Data Vault models, each built from Hubs, Links, and Satellites, shown side by side with Links between them. Labels: Base EDW Created in Corporate; Financials in USA; Manufacturing EDW in China; Planning in Brazil.]
75
76
Extreme Data Vault Partitioning
[Diagram: Hub Customer, Lnk Cust-Order, Hub Order, Sat Customer, and two Sat Order tables, each on its own DASD – Raid 0+1 device.]
Each table receives its own I/O channel and its own Raid 0+1 disk.
Query Performance
Point-in-time and Bridge Tables, overcoming query issues
78
Purpose Of PIT & Bridge
• To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.
• Together, these two allow “direct table match” as well as table elimination in queries.
• These tables are not necessary for the entire model; only when:
  o Massive amounts of data are found
  o Large numbers of Satellites surround a Hub or Link
  o A large query across multiple Hubs & Links is necessary
  o Real-time data is flowing in, uninterrupted
• What are they?
  o Snapshot tables – specifically built for query speed
79
PIT Table Architecture
[Diagram: Hub Customer, Hub Order, and Hub Product joined by Link Line Item, which has a Satellite Line Item. Two of the Hubs each carry Sat 1 to Sat 4 plus a PIT Sat; the third carries Sat 1 and Sat 2.]

Satellite: Point In Time
PARENT SEQUENCE    } Primary Key
LOAD DATE          } Primary Key
{Satellite 1 Load Date}
{Satellite 2 Load Date}
{Satellite 3 Load Date}
{…}
{Satellite N Load Date}
80
PIT Table Example

SAT_CUST_CONTACT_NAME
SQN  LOAD_DTS    NAME
1    10-14-2000  Dan L
1    11-01-2000  Dan Linedt
1    12-31-2000  Dan Linstedt

SAT_CUST_CONTACT_CELL
SQN  LOAD_DTS    CELL
1    10-14-2000  999-555-1212
1    10-15-2000  999-111-1234
1    10-16-2000  999-252-2834
1    10-17-2000  999.257-2837
1    10-18-2000  999-273-5555

SAT_CUST_CONTACT_ADDR
SQN  LOAD_DTS    ADDR
1    08-01-2000  26 Prospect
1    09-29-2000  26 Prosp St.
1    12-17-2000  28 November
1    01-01-2001  26 Prospect St

PIT table (LOAD_DTS is the Snapshot Date)
SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
1    08-01-2000  NULL           NULL           08-01-2000
1    09-01-2000  NULL           NULL           08-01-2000
1    10-01-2000  NULL           NULL           09-29-2000
1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
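The PIT rows in this example follow one rule: for each snapshot date, take the most recent load date in each Satellite that is on or before the snapshot. The sketch below reproduces that rule using the Satellite load dates from the example, rewritten in ISO form so that string order matches date order. The helper name `latest_on_or_before` is an assumption of this sketch.

```python
from bisect import bisect_right

# Satellite load dates for customer SQN 1, sorted ascending (ISO format)
sat_name_ldts = ["2000-10-14", "2000-11-01", "2000-12-31"]
sat_cell_ldts = ["2000-10-14", "2000-10-15", "2000-10-16", "2000-10-17", "2000-10-18"]
sat_addr_ldts = ["2000-08-01", "2000-09-29", "2000-12-17", "2001-01-01"]


def latest_on_or_before(load_dates, snapshot):
    """Most recent load date <= snapshot, or None if nothing was loaded yet."""
    i = bisect_right(load_dates, snapshot)
    return load_dates[i - 1] if i else None


snapshots = ["2000-08-01", "2000-09-01", "2000-10-01",
             "2000-11-01", "2000-12-01", "2001-01-01"]

# One PIT row per snapshot: (SQN, LOAD_DTS, SAT_NAME_LDTS, SAT_CELL_LDTS, SAT_ADDR_LDTS)
pit_rows = [
    (1, snap,
     latest_on_or_before(sat_name_ldts, snap),
     latest_on_or_before(sat_cell_ldts, snap),
     latest_on_or_before(sat_addr_ldts, snap))
    for snap in snapshots
]
```

A query for a given snapshot can then join each Satellite on an exact (SQN, load date) match instead of scanning date ranges, which is the “direct table match” the PIT table is built for.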
81
Bridge Table Architecture
[Diagram: Hub Seller, Hub Product, and Hub Parts connected by two Links, each with a Satellite; Sat 1 to Sat 4 hang off the Hubs. A Bridge table spans the Hubs and Links.]

Satellite: Bridge (diagram annotation: Primary Key)
UNIQUE SEQUENCE
LOAD DATE
{Hub 1 Sequence #}
{Hub 2 Sequence #}
{Hub 3 Sequence #}
{Link 1 Sequence #}
{Link 2 Sequence #}
{…}
{Link N Sequence #}
{Hub 1 Business Key}
{Hub 2 Business Key}
{…}
{Hub N Business Key}
82
Bridge Table Data Example

Bridge Table: Seller by Product by Part (LOAD_DTS is the Snapshot Date)
SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
2    09-01-2000  16        CO*24    2654      DEF-847-0L  324       MN*5-2
3    10-01-2000  16        CO*24    82374     PPA-252-2A  9938      DD*2*3
4    11-01-2000  24        AZ*25    25222     UIF-525-88  7         UF*9*0
5    12-01-2000  99        NM*5     81        DAN-347-7F  16        KI*9-2
6    01-01-2001  99        NM*5     81        DAN-347-7F  24        DL*0-5
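A Bridge table like this one could be produced by running the multi-table join once and storing the result. The `sqlite3` sketch below uses the first two rows of the example; the table names, the two Link tables, and the single snapshot date are assumptions made for illustration, since the slides do not show how the Bridge is populated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_seller  (sell_sqn INTEGER PRIMARY KEY, sell_id  TEXT UNIQUE);
    CREATE TABLE hub_product (prod_sqn INTEGER PRIMARY KEY, prod_num TEXT UNIQUE);
    CREATE TABLE hub_part    (part_sqn INTEGER PRIMARY KEY, part_num TEXT UNIQUE);
    CREATE TABLE lnk_seller_product (sell_sqn INTEGER, prod_sqn INTEGER);
    CREATE TABLE lnk_product_part   (prod_sqn INTEGER, part_sqn INTEGER);

    INSERT INTO hub_seller  VALUES (15, 'NY*1'), (16, 'CO*24');
    INSERT INTO hub_product VALUES (2756, 'ABC-123-9K'), (2654, 'DEF-847-0L');
    INSERT INTO hub_part    VALUES (525, 'JK*2*4'), (324, 'MN*5-2');
    INSERT INTO lnk_seller_product VALUES (15, 2756), (16, 2654);
    INSERT INTO lnk_product_part   VALUES (2756, 525), (2654, 324);

    -- Snapshot the five-table join once so later queries read a single table.
    CREATE TABLE bridge_seller_product_part AS
    SELECT '2000-08-01' AS load_dts,
           s.sell_sqn, s.sell_id,
           p.prod_sqn, p.prod_num,
           t.part_sqn, t.part_num
    FROM lnk_seller_product sp
    JOIN lnk_product_part pp ON pp.prod_sqn = sp.prod_sqn
    JOIN hub_seller  s ON s.sell_sqn = sp.sell_sqn
    JOIN hub_product p ON p.prod_sqn = sp.prod_sqn
    JOIN hub_part    t ON t.part_sqn = pp.part_sqn;
""")

bridge_rows = conn.execute(
    "SELECT sell_id, prod_num, part_num "
    "FROM bridge_seller_product_part ORDER BY sell_sqn"
).fetchall()
```

Queries that need seller, product, and part keys together can then read the Bridge directly, which is how it reduces the number of joins at query time.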
83
What WASN’T Covered
• ETL Automation
• ETL Implementation
• SQL Query Logic
• Balanced MPP design
• Data Vault Modeling on Appliances
• Deep Dive on Structures (Hubs, Links, Satellites)
• What happens when you break the rules?
• Project management, Risk management & mitigation, methodology & approach
• Automation: Automated DV modeling, Automated ETL production
• Change Management
• Temporal Data Modeling Concerns
… And so on…
84
Conclusions
85
Who’s Using It?
86
The Experts Say…
“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.”
Bill Inmon
“The Data Vault is foundationally strong and exceptionally scalable architecture.”
Stephen Brobst
“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....”
Doug Laney
87
More Notables…
“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.”
Howard Dresner
“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..”
Scott Ambler
88
Where To Learn More
• The Technical Modeling Book: http://LearnDataVault.com
• The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions
• Contact me:
  o http://DanLinstedt.com - web site
  o [email protected] - email
• World wide User Group (Free): http://dvusergroup.com