Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
![Page 1: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/1.jpg)
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Kent Graziano, Data Warrior LLC
Keith Hoyle, McKesson Specialty Health
![Page 2: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/2.jpg)
Agenda
Bios
Quick Survey
Why Virtualize?
Virtualizing with DV 2.0
Virtualizing with our hybrid architecture
The Secret Transform Table
Does it work?
Copyright 2015 Data Warrior LLC
![Page 3: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/3.jpg)
Bio (Kent)
Data Vault Master, Certified DVDM (1.0), CDVP2
Authorized Data Vault 2.0 Bootcamp Instructor
Oracle ACE Director (BI/DW)
Blogger: The Data Warrior
Data Architecture and Data Warehouse Specialist
● 30+ years in IT
● 20+ years of data warehousing experience
Member: Boulder BI Brain Trust (BBBT)
Author, Co-Author
Past-President of ODTUG and RMOUG
![Page 4: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/4.jpg)
Bio (Keith)
Sr. Manager, Enterprise Data Architecture (McKesson Specialty Health)
25+ years in IT
8+ years in Genetic Engineering / Biochemistry in Pharmaceutical industry
Completed multiple successful EDW efforts with large companies (Dell, HP, AMD, Aflac, Amgen, GlaxoSmithKline, etc.)
Consulted through large firms catering to big pharma / biotech / medical industry
![Page 5: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/5.jpg)
Audience Survey
How long in Data Warehousing or BI?
Have you heard of Data Vault?
● DV 1.0?
● DV 2.0?
● Ever built anything using DV model?
![Page 6: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/6.jpg)
Why Virtualize?
Support Agile project approach
● Shorter iterations
● Faster time to market
Eliminates ETL bottleneck
● Specs
● Coding
● Testing (QA)
Replace with simple database views
![Page 7: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/7.jpg)
Basic Data Vault Example
![Page 8: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/8.jpg)
Where does Data Vault fit?
Data Vault goes here
![Page 9: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/9.jpg)
Virtualizing with pure Data Vault
Type 1 SCD: simple
● Join Hub & Sats
● Use a PIT table to avoid the MAX(LOAD_DTS) subqueries
Type 2 SCD: a little harder
● See my post: How to Build a Virtual Type 2 Slowly Changing Dimension
● Need a historicized PIT table with surrogate key
Type 2 SCD with DV 2.0
● Same but use MD5 Key on PIT table
● Build with Hub BK + Sat1 LOAD_DTS + Sat2 LOAD_DTS + …
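A minimal sketch of the Type 1 pattern above, using Python's built-in sqlite3 module in place of the warehouse database. The table and column names (hub_customer, sat_customer, pit_customer, sat_load_dts) and the sample rows are hypothetical and not taken from the slides. The PIT table stores the current Sat LOAD_DTS per Hub key, so the view needs only equi-joins and no MAX(LOAD_DTS) subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Hub, Sat, and PIT tables (names are illustrative only)
CREATE TABLE hub_customer (hub_key TEXT PRIMARY KEY, customer_bk TEXT);
CREATE TABLE sat_customer (hub_key TEXT, load_dts TEXT, name TEXT,
                           PRIMARY KEY (hub_key, load_dts));
CREATE TABLE pit_customer (hub_key TEXT PRIMARY KEY, sat_load_dts TEXT);

INSERT INTO hub_customer VALUES ('h1', 'CUST-001');
INSERT INTO sat_customer VALUES ('h1', '2015-01-01', 'Acme'),
                                ('h1', '2015-03-01', 'Acme Corp');
-- The PIT row points at the most recent Sat row for the Hub key
INSERT INTO pit_customer VALUES ('h1', '2015-03-01');

-- Virtual Type 1 dimension: Hub joined to Sat through the PIT table
CREATE VIEW dim_customer_t1 AS
SELECT h.hub_key AS customer_key, h.customer_bk, s.name
FROM hub_customer h
JOIN pit_customer p ON p.hub_key = h.hub_key
JOIN sat_customer s ON s.hub_key = p.hub_key
                   AND s.load_dts = p.sat_load_dts;
""")

type1_rows = conn.execute(
    "SELECT customer_bk, name FROM dim_customer_t1"
).fetchall()
print(type1_rows)
```

Only the Sat row that the PIT table points to appears in the view, which is the current-value behaviour a Type 1 dimension needs.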
![Page 10: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/10.jpg)
Virtualizing with Gepetto
Almost the same as DV
● A Data Vault hybrid
● Added join to the KM (key map) tables
Gepetto does not split stage tables into multiple Sats
● No PIT table needed
Views do a UNION ALL to include multiple sources
● Each source is a different stage table tied to the same KM
● KM table serves as PIT table to align them
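The UNION ALL idea above can be sketched as follows, again with Python's sqlite3 module standing in for the warehouse. This is an illustration under stated assumptions, not the presenters' view code: the stage table stg_kdw_drug, the column names, and the sample rows are hypothetical. Each branch of the view reads one stage table, and the shared key map (km_med) assigns both sources the same dimension key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical key map: one business key, two source rows
CREATE TABLE km_med (d_med_key TEXT, cdc_key TEXT, rec_src TEXT);
-- Two hypothetical stage tables, one per source system
CREATE TABLE stg_g2_medication (cdc_key TEXT, load_dts TEXT, med_name TEXT);
CREATE TABLE stg_kdw_drug (cdc_key TEXT, load_dts TEXT, drug_nm TEXT);

INSERT INTO km_med VALUES ('M1', 'g2-17', 'G2'), ('M1', 'kdw-9', 'KDW');
INSERT INTO stg_g2_medication VALUES ('g2-17', '2015-01-01', 'Aspirin 81 mg');
INSERT INTO stg_kdw_drug VALUES ('kdw-9', '2015-02-01', 'ASPIRIN 81MG');

-- One branch per source; the key map aligns them on d_med_key
CREATE VIEW dim_med AS
SELECT km.d_med_key, km.rec_src, s.load_dts, s.med_name AS med_name
FROM km_med km
JOIN stg_g2_medication s ON s.cdc_key = km.cdc_key
WHERE km.rec_src = 'G2'
UNION ALL
SELECT km.d_med_key, km.rec_src, s.load_dts, s.drug_nm AS med_name
FROM km_med km
JOIN stg_kdw_drug s ON s.cdc_key = km.cdc_key
WHERE km.rec_src = 'KDW';
""")

union_rows = conn.execute(
    "SELECT d_med_key, rec_src, med_name FROM dim_med ORDER BY rec_src"
).fetchall()
print(union_rows)
```

Both source rows come back under the same d_med_key, which is what lets a single view present several systems as one dimension.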
![Page 11: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/11.jpg)
Gepetto Schema Architecture
[Diagram of the schema layers; labels recovered from the slide:]
Source(s) of Record
Stage schemas: HI Stage, COMN Stage, FIN Stage
<Full copies of source data structures with additional plumbing fields to facilitate capturing subsequent data changes over time>
COMN Validation (DQ)
COMN Integration
<Enterprise business key model with key mapping pointers to COMN_STG data>
Presentation schemas: FIN Presentation, HI Presentation, COMN Presentation
BOBJ / BI / Reporting
EDW V2
System and subject-area labels: FIN, HI, CLIN, G2, MU, HI, KDW, CI SAS Routines, EDW V1, FDW / PMS, KDW Lite, Lynx, SFDC, MKTG
Flow annotations: Δ CDC, Insert only, 1X, Σ
![Page 12: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/12.jpg)
Gepetto Virtualization Architecture
[Diagram with three layers: Stage, Integrate, Presentation. Tables and columns recovered from the slide:]
Stage
● KDW_ORG: …, PRIM_KEY, CDC_KEY
● G2_PRACTICE: …, PRIM_KEY, CDC_KEY
● LYNX_PRCTC: PM_PRCTC_KEY, …, PRIM_KEY, CDC_KEY
● KDW_PAT_VISIT: <Patient Record ID fields>, <Provider ID fields>, <Practice ID fields>, <Location ID fields>, <Visit Date fields>, …, PRIM_KEY, CDC_KEY
Integrate
● DATA_XFRM: <SRC: System, Table, Field, Value fields>, <TGT: System, Table, Field, Value fields>
● D_PAT_REC: D_PAT_REC_KEY, …
● D_PRVDR: D_PRVDR_KEY, …
● D_LOC: D_LOC_KEY, …
● D_LOC_GRP: D_LOC_GRP_KEY, …
● R_VST: R_VST_KEY, D_PAT_REC_KEY, D_PRVDR_KEY, D_LOC_GRP_KEY, D_LOC_KEY, D_CLNDR_KEY
● KM_LOC_GRP: D_LOC_GRP_KEY, CDC_KEY
● KM_VST: R_VST_KEY, CDC_KEY
Presentation
● DIM_PAT_REC: SCD2_PAT_REC_KEY, SCD1_PAT_REC_KEY, D_PRSN_KEY, …
● DIM_PRVDR: SCD2_PRVDR_KEY, SCD1_PRVDR_KEY, …
● DIM_PRCTC_HIER: SCD2_PRCTC_HIER_KEY, SCD1_PRCTC_HIER_KEY, D_LOC_KEY, …
● FACT_VST: SCD2_VST_KEY, SCD1_VST_KEY, D_PAT_REC_KEY, D_PRVDR_KEY, D_PRCTC_KEY, D_LOC_KEY, D_CLNDR_KEY
The CDC_KEY field in STG also goes into the CDC_KEY in INTG. Joins to other STG table(s) complete the R_x_KEY and D_x_KEY fields in INTG.
1) Logical views can be used to initially vet reports, aggregations, etc. where possible (i.e. most dimensions, primitive facts, some aggregate facts, etc.)
2) Materialized views can be used to vet the scaling of the solution
3) ETL processes will be used to productionize the vetted solution
4) STG data is transformed using joins to the DATA_XFRM table in INTG
5) Data is scrubbed with standard SQL functions (e.g. INITCAP, TRIM, removing special characters, etc.)
![Page 13: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/13.jpg)
MD5 Keys
Concatenate source data fields and hash to create MD5 keys
MD5 Key Types
● PRIM_KEY (STG):
  ● All source fields (in table order) + LOAD_DTS
  ● Uniquely identifies all records within the DW
  ● Can serve as an SCD-2 key in virtual Dims / Facts
● CDC_KEY (STG / INTG):
  ● Source field(s) (in table order) used by the SOR to identify data rows uniquely for change data capture purposes
● CDC_ATTR (STG):
  ● All non-CDC_KEY source fields (in table order), used to track changes for change data capture purposes
● NAT_KEY (STG):
  ● Source field(s) (in table order) from a single SOR table used to logically identify data rows uniquely
● [D_XXX_KEY / R_XXX_KEY] BUS_KEY (INTG):
  ● Source field(s) (in table order) used to logically identify data rows uniquely (joins may be required)
  ● Can serve as an SCD-1 key in virtual Dims / Facts
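As a rough illustration of the key types above, the following Python sketch hashes concatenated fields with hashlib. The slides do not say how fields are delimited or normalized, so the "|" separator, the trimming, the upper-case hex output, the helper name md5_key, and the sample medication rows are all assumptions made for this example.

```python
import hashlib


def md5_key(*fields, sep="|"):
    """Hash fields (in table order) into one MD5 hex key.

    The separator and the strip() normalization are assumptions;
    None is treated as an empty string.
    """
    normalized = ["" if f is None else str(f).strip() for f in fields]
    joined = sep.join(normalized)
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


# Two versions of one hypothetical source row: (med_id, med_name, strength)
row_v1 = ("42", "Aspirin", "81 mg")
row_v2 = ("42", "Aspirin", "325 mg")
load_v1 = "2015-01-01 00:00:00"
load_v2 = "2015-02-01 00:00:00"

# CDC_KEY: only the field(s) the source uses to identify the row
cdc_key_v1 = md5_key(row_v1[0])
cdc_key_v2 = md5_key(row_v2[0])

# CDC_ATTR: every non-key field, so a differing hash signals a change
cdc_attr_v1 = md5_key(*row_v1[1:])
cdc_attr_v2 = md5_key(*row_v2[1:])

# PRIM_KEY: all source fields plus LOAD_DTS, unique per stored version
prim_key_v1 = md5_key(*row_v1, load_v1)
prim_key_v2 = md5_key(*row_v2, load_v2)

print(cdc_key_v1 == cdc_key_v2)    # same source row
print(cdc_attr_v1 != cdc_attr_v2)  # attributes changed
print(prim_key_v1 != prim_key_v2)  # distinct versions
```

A separator is used so that adjacent fields such as ("ab", "c") and ("a", "bc") do not concatenate to the same string and collide.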
![Page 14: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/14.jpg)
Presentation Layer – (Stage / Integration Joins)
COMN_INTG contains Business Keys in Domains and linkages between Domains in Relationships
D_xxx_KEY and R_xxx_KEY fields in COMN_INTG are populated with hashed business keys also contained in KM_xxx tables in COMN_INTG
Domains and Relationships are joined to KeyMaps and COMN_STG tables to create different COMN_PRSNTN elements (3-NF or Star Schema style) and optimized as needed:
● Small/Simple: Logical views (faster time to market, less performance)
● Medium: Materialized views
● Large/Complex: ETL loaded/tuned tables (slower time to market, more performance)
![Page 15: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/15.jpg)
Hybrid Type 1-2 Dims
Need to support both SCD1 and SCD2 queries
Could build two sets of views
We built 1 view that has two keys
● SCD1_<Hub>_KEY
● SCD2_<Hub>_KEY
SCD1 Key = Hub/Domain PK (MD5)
SCD2 Key = PRIM_KEY from Stage (MD5)
● Gepetto stage table is a Type 2 table already
● Includes all columns + LOAD_DTS
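A small sketch of the two-key view described above, run through Python's sqlite3 module. The table stg_medication, the readable key values (used here instead of real MD5 hashes), and the sample rows are hypothetical. The domain key is exposed as the SCD1 key and the stage PRIM_KEY as the SCD2 key, so one view can answer both kinds of query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical domain, key map, and Type 2 style stage table
CREATE TABLE d_med (d_med_key TEXT PRIMARY KEY);
CREATE TABLE km_med (d_med_key TEXT, cdc_key TEXT);
CREATE TABLE stg_medication (prim_key TEXT PRIMARY KEY, cdc_key TEXT,
                             load_dts TEXT, med_name TEXT);

INSERT INTO d_med VALUES ('HUB-1');
INSERT INTO km_med VALUES ('HUB-1', 'CDC-1');
INSERT INTO stg_medication VALUES
  ('ROW-A', 'CDC-1', '2015-01-01', 'Aspirin'),
  ('ROW-B', 'CDC-1', '2015-04-01', 'Aspirin EC');

-- One view, two keys: SCD1 = domain key, SCD2 = stage PRIM_KEY
CREATE VIEW dim2_med AS
SELECT d.d_med_key  AS scd1_med_key,
       stg.prim_key AS scd2_med_key,
       stg.load_dts,
       stg.med_name
FROM d_med d
JOIN km_med km ON km.d_med_key = d.d_med_key
JOIN stg_medication stg ON stg.cdc_key = km.cdc_key;
""")

hybrid_rows = conn.execute(
    "SELECT scd1_med_key, scd2_med_key, med_name "
    "FROM dim2_med ORDER BY load_dts"
).fetchall()
print(hybrid_rows)
```

Every version of the medication shares one SCD1 key while each version carries its own SCD2 key, so facts can join on whichever key matches the history they need.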
![Page 16: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/16.jpg)
Example View Mapping: DIM2_MED
![Page 17: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/17.jpg)
Example Join Code: DIM2_MED
FROM COMN_INTG.D_MED D
  INNER JOIN COMN_INTG.KM_MED KM
    ON D.D_MED_KEY = KM.D_MED_KEY
   AND KM.EXPR_DTS IS NULL
  INNER JOIN COMN_STG.G2_MEDICATION STG
    ON KM.CDC_KEY = STG.CDC_KEY
   AND KM.REC_SRC = STG.REC_SRC
   AND KM.REC_SRC_TBL = STG.REC_SRC_TBL
   AND KM.LOAD_DTS <= STG.LOAD_DTS
   AND (KM.EXPR_DTS IS NULL OR KM.EXPR_DTS >= STG.EXPR_DTS)
![Page 18: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/18.jpg)
Type 1 Rows – Current Values
Use the SCD1_KEY columns
Use virtual CURR_FLG or EXPR_DTS
CASE WHEN LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) IS NULL
     THEN 'Y' ELSE 'N' END CURR_FLG,
LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) EXPR_DTS
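The LEAD expressions above can be tried on a toy insert-only stage table. This sketch uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or later, which is when window functions were added; the table stg, the view stg_versions, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical insert-only stage table: one row per captured version
CREATE TABLE stg (cdc_key TEXT, load_dts TEXT, status TEXT);
INSERT INTO stg VALUES
  ('K1', '2015-01-01', 'NEW'),
  ('K1', '2015-02-01', 'ACTIVE'),
  ('K1', '2015-03-01', 'CLOSED'),
  ('K2', '2015-01-15', 'NEW');

-- CURR_FLG and EXPR_DTS are derived at query time, never stored
CREATE VIEW stg_versions AS
SELECT cdc_key, load_dts, status,
       CASE WHEN LEAD(load_dts) OVER (PARTITION BY cdc_key
                                      ORDER BY load_dts) IS NULL
            THEN 'Y' ELSE 'N' END AS curr_flg,
       LEAD(load_dts) OVER (PARTITION BY cdc_key
                            ORDER BY load_dts) AS expr_dts
FROM stg;
""")

all_versions = conn.execute(
    "SELECT cdc_key, load_dts, curr_flg, expr_dts "
    "FROM stg_versions ORDER BY cdc_key, load_dts"
).fetchall()

# Type 1 style query: keep only the current version of each key
current_rows = conn.execute(
    "SELECT cdc_key, status FROM stg_versions "
    "WHERE curr_flg = 'Y' ORDER BY cdc_key"
).fetchall()

for row in all_versions:
    print(row)
print(current_rows)
```

Because the flag and the expiry date are computed from the next row's LOAD_DTS, the stage table never needs an update pass when a new version arrives.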
![Page 19: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/19.jpg)
The Secret Transform Table
DATA_XFRM
● In Integration layer
● Data driven translation table
● Allows “light” transformations via joins/views
● Embedded in Virtual Dimension code
![Page 20: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/20.jpg)
Transform Table Design
![Page 21: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/21.jpg)
Example Xfrm Data
![Page 22: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/22.jpg)
Example Translation Join
LEFT OUTER JOIN comn_intg.data_xfrm ct
  ON ct.SRC_SCHEMA_NM = 'COMN_STG'
 AND ct.SRC_TABLE_NM = 'KDW_ORG'
 AND ct.SRC_FIELD_NM = 'currentsts'
 AND ct.TGT_SCHEMA_NM = 'COMN_PRSNTN'
 AND UPPER(ct.TGT_TABLE_NM) IN ('DIM2_PRCTC_HIER')
 AND ct.TGT_FIELD_NM = 'cntrct_typ_cd'
 AND ct.SRC_VALUE_NM = UPPER(od.currentsts);
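To show the translation join end to end, here is a small runnable sketch using Python's sqlite3 module. The column TGT_VALUE_NM, the status values and codes, the org_id column, and the sample rows are assumptions for illustration; the slides show only the join conditions, not the table contents. Schema prefixes are dropped because the in-memory database has a single schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Data-driven translation table; tgt_value_nm is an assumed column name
CREATE TABLE data_xfrm (
  src_schema_nm TEXT, src_table_nm TEXT, src_field_nm TEXT, src_value_nm TEXT,
  tgt_schema_nm TEXT, tgt_table_nm TEXT, tgt_field_nm TEXT, tgt_value_nm TEXT);
-- Hypothetical stage table with a free-text status field
CREATE TABLE kdw_org (org_id INTEGER, currentsts TEXT);

INSERT INTO data_xfrm VALUES
  ('COMN_STG', 'KDW_ORG', 'currentsts', 'ACTIVE',
   'COMN_PRSNTN', 'DIM2_PRCTC_HIER', 'cntrct_typ_cd', 'A'),
  ('COMN_STG', 'KDW_ORG', 'currentsts', 'TERMINATED',
   'COMN_PRSNTN', 'DIM2_PRCTC_HIER', 'cntrct_typ_cd', 'T');
INSERT INTO kdw_org VALUES (1, 'Active'), (2, 'terminated'), (3, 'Pending');
""")

xfrm_rows = conn.execute("""
SELECT od.org_id, ct.tgt_value_nm AS cntrct_typ_cd
FROM kdw_org od
LEFT OUTER JOIN data_xfrm ct
  ON ct.src_schema_nm = 'COMN_STG'
 AND ct.src_table_nm = 'KDW_ORG'
 AND ct.src_field_nm = 'currentsts'
 AND ct.tgt_schema_nm = 'COMN_PRSNTN'
 AND UPPER(ct.tgt_table_nm) IN ('DIM2_PRCTC_HIER')
 AND ct.tgt_field_nm = 'cntrct_typ_cd'
 AND ct.src_value_nm = UPPER(od.currentsts)
ORDER BY od.org_id
""").fetchall()
print(xfrm_rows)
```

Keeping every filter in the ON clause rather than in a WHERE clause preserves stage rows that have no mapping; they come back with a NULL target value instead of dropping out of the dimension.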
![Page 23: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/23.jpg)
Does it work?
Yes!
Some views work great, others are slow
● Usually with huge volumes
Mitigation:
● Materialized views
● Increased parallelism on base tables
Best option: Oracle Exadata
● Implementing 11g SuperCluster
● Initial results: 10x performance improvement
![Page 24: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/24.jpg)
Conclusion: Benefits of Virtualization
We can now rapidly demonstrate the contents of a Type 2 dim prior to ETL programming
By using PIT tables we don’t need the Load End DTS on the Sats, so the Sats become insert-only as well (simpler loads, no update pass required)
Another by-product is that the Sat is now also Hadoop compliant (again, insert-only)
Since the nullable Load End DTS is not needed, you can now more easily partition the Sat table by Hub Id and Load DTS.
![Page 25: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/25.jpg)
Old Way vs New Way: Cowpath vs Highway
Which way will you follow?
Sign up for WWDVC 2016 at wwdvc.com
![Page 26: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/26.jpg)
![Page 27: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/27.jpg)
Super Charge Your Data Warehouse
Available on Amazon.com
Soft Cover or Kindle Format
Now also available in PDF at LearnDataVault.com
![Page 28: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/28.jpg)
New DV 2.0 Book Coming Soon
Available for pre-order on Amazon:
http://www.amazon.com/Building-Scalable-Data-Warehouse-Vault/dp/0128025107/
![Page 29: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/29.jpg)
Contact Information
Kent Graziano
Data Warrior
[email protected]
Twitter @KentGraziano
Visit my blog at http://kentgraziano.com
![Page 30: Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions](https://reader035.fdocuments.us/reader035/viewer/2022070522/58eed05e1a28aba2368b45ef/html5/thumbnails/30.jpg)
Contact Information
Keith Hoyle
Sr. Mgr., Enterprise Data Architecture
McKesson Specialty
[email protected]
Visit my blog at http://khoyle001.wordpress.com