Automating DWH Patterns Through Metadata
![Page 1: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/1.jpg)
![Page 2: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/2.jpg)
Automating Data Warehouse Patterns Through Metadata
Davide Mauri, [email protected]
![Page 3: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/3.jpg)
Davide Mauri
20 years of experience on the SQL Server platform
– Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence, Data Warehouse, Big Data & Analytics
Microsoft SQL Server MVP
President of UGISS (Italian SQL Server UG)
Mentor @ SolidQ
– Regular Speaker @ SQL Server events
– Projects, Consulting, Mentoring & Training
Find me here:
– Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
– Twitter: @mauridb
![Page 4: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/4.jpg)
Building a DWH in 2013
Is still an (almost) manual process
A *lot* of repetitive low-value work
No (or very few) standard tools available
![Page 5: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/5.jpg)
How it should be
Semi-automatic process
– “develop by intent”
Define the mapping logic from a semantic perspective
– Source to Dimensions / Measures
• (Metadata anyone?)
Design the model and let the tool build it for you

CREATE DIMENSION Customer
FROM SourceCustomerTable
MAP USING CustomerMetadata

ALTER DIMENSION Customer
ADD ATTRIBUTE LoyaltyLevel
AS TYPE 1

CREATE FACT Orders
FROM SourceOrdersTable
MAP USING OrdersMetadata

ALTER FACT Orders
ADD DIMENSION Customer
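As a rough illustration of “develop by intent”, the Python sketch below turns a small metadata description into a CREATE TABLE statement. It is not the hypothetical syntax shown on the slide: the metadata layout, the `build_dimension_ddl` helper and the `dim.` / `id_` / `scd_checksum` names are all assumptions made for this example.

```python
def build_dimension_ddl(name, metadata):
    """Return a T-SQL-style CREATE TABLE statement for one dimension.

    `metadata` maps column name -> (sql_type, role), where role is
    "business_key", "scd1" or "scd2" (a hypothetical layout).
    """
    lines = [f"    id_{name.lower()} int IDENTITY(1,1) PRIMARY KEY"]
    for column, (sql_type, _role) in metadata.items():
        lines.append(f"    {column} {sql_type} NOT NULL")
    # Technical columns are added by rule, not by hand.
    lines.append("    scd_checksum binary(16) NOT NULL")
    lines.append("    last_update datetime2 NOT NULL")
    body = ",\n".join(lines)
    return f"CREATE TABLE dim.{name} (\n{body}\n);"


# Hypothetical metadata for the Customer dimension.
customer_metadata = {
    "customer_code": ("varchar(20)", "business_key"),
    "customer_name": ("nvarchar(100)", "scd1"),
    "loyalty_level": ("varchar(10)", "scd1"),
}

ddl = build_dimension_ddl("Customer", customer_metadata)
print(ddl)
```

The point of the sketch is that adding an attribute becomes a one-line metadata change, while the generator keeps the technical columns consistent across every dimension.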
![Page 6: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/6.jpg)
The perfect BI process & architecture
AGILE BI
Iterative!
![Page 7: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/7.jpg)
DWH PROCESSES
Is automation possible?
![Page 8: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/8.jpg)
Invest in Automation?
Faster development
– Reduce Costs
– Embrace Changes
Fewer bugs
Increase solution quality and make it consistent throughout the whole product
![Page 9: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/9.jpg)
Automation Pre-Requisites
Split the process into two separate types of processes
– What can be automated
– What can NOT be automated
Create and impose a set of rules that define
– How to solve common technical problems
– How to implement the identified solutions
![Page 10: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/10.jpg)
No Monkey Work!
Let the people think and let the machines do the «monkey» work.
![Page 11: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/11.jpg)
Design Pattern
“A general reusable solution to a commonly occurring problem within a given context”
![Page 12: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/12.jpg)
Design Pattern
Generic ETL Pattern
– Partition Load
– Incremental/Differential Load
Generic BI Design Pattern
– Slowly Changing Dimension
• SCD1, SCD2, etc.
– Fact Table
• Transactional, Snapshot, Temporal Snapshot
![Page 13: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/13.jpg)
Design Pattern
Specific SQL Server Patterns
– Change Data Capture
– Change Tracking
– Partition Load
– SSIS Parallelism
![Page 14: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/14.jpg)
Engineering the DWH
“Software Engineering allows and requires the formalization of the software building and maintenance process.”
![Page 15: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/15.jpg)
Sample Rules
• Always add a «last_update» column
• Always log Inserted/Updated/Deleted rows to the log.load_info table
• Use MD5 (binary(16)) for checksums
• Use views to expose data
– Dimension & Fact views MUST use the same column names for lookup columns
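To make the checksum rule concrete, here is a minimal Python sketch of a row checksum that fits a binary(16) column. The `row_checksum` helper, the `|` separator and the handling of NULLs are assumptions made for this example, not rules from the deck. A digest computed this way only matches one computed inside the database if both sides hash exactly the same bytes.

```python
import hashlib


def row_checksum(row, columns):
    """MD5 digest (16 bytes) over the given columns of a row dict."""
    # A separator avoids ("ab", "c") colliding with ("a", "bc");
    # None is rendered as an empty string in this sketch.
    parts = ["" if row[c] is None else str(row[c]) for c in columns]
    payload = "|".join(parts).encode("utf-8")
    return hashlib.md5(payload).digest()


checksum = row_checksum(
    {"customer_name": "Contoso", "loyalty_level": "Gold"},
    ["customer_name", "loyalty_level"],
)
print(len(checksum))  # MD5 digests are always 16 bytes long
```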
![Page 16: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/16.jpg)
Engineering the DWH
There are two intrinsic processes hidden in the development of a BI solution that must be allowed (or forced) to emerge.
![Page 17: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/17.jpg)
Business Process
Data manipulation, transformation, enrichment & cleansing logic
Specific to every customer. Almost impossible to automate
![Page 18: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/18.jpg)
Technical Process
Application of data extraction and loading techniques
Recurring (pattern) in any solution
Highly Automatable
![Page 19: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/19.jpg)
High-Level Vision
[Diagram with boxes OLTP, STG and DWH connected by three ETL steps, labeled Technical Process, Business Process and Technical Process]
![Page 20: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/20.jpg)
ETL Phases
«E» and «L» must be
– Simple, Easy and Straightforward
– Completely Automated
– Completely Reusable
«E» and «L» have ZERO value in a BI Solution
– Should be done in the most economical way
![Page 21: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/21.jpg)
PATTERN
Well-known solutions to common problems
![Page 22: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/22.jpg)
Source Full Load «E»
![Page 23: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/23.jpg)
Source Incremental Load «E»
In this scenario, “ID” is an IDENTITY/SEQUENCE, probably a PK.
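A minimal sketch of this pattern, using Python's built-in sqlite3 module as a stand-in for the real source and staging databases. The table names `src_orders` and `stg_orders` and the `incremental_extract` helper are hypothetical. Because the ID only ever grows, the highest ID already staged acts as the watermark.

```python
import sqlite3


def incremental_extract(conn):
    """Copy source rows whose id is above the highest id already staged."""
    # Assumes ids are positive, so 0 is a safe starting watermark.
    last_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM stg_orders"
    ).fetchone()[0]
    cursor = conn.execute(
        "INSERT INTO stg_orders (id, amount) "
        "SELECT id, amount FROM src_orders WHERE id > ?",
        (last_id,),
    )
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE stg_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

first_run = incremental_extract(conn)   # copies ids 1 and 2
conn.execute("INSERT INTO src_orders VALUES (3, 30.0)")
second_run = incremental_extract(conn)  # copies only id 3
```

This only picks up new rows. Updates and deletes of already extracted rows go unnoticed, which is why the differential patterns that follow exist.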
![Page 24: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/24.jpg)
Source Differential Load/1 «E»
In this scenario the source table doesn’t offer any specific way to understand what’s changed
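When the source gives no hint about what changed, one common approach is to compare the whole source against what was loaded previously, using a per-row checksum. The Python sketch below is an assumption about how that comparison could look, not necessarily the approach drawn on the slide; the `checksum` and `diff_load` helpers are hypothetical.

```python
import hashlib


def checksum(row):
    """MD5 over all non-key values of a row (a dict without the key)."""
    payload = "|".join(str(row[c]) for c in sorted(row)).encode("utf-8")
    return hashlib.md5(payload).digest()


def diff_load(source_rows, staged_checksums):
    """Classify source rows as inserted/updated and find deleted keys.

    source_rows: {business_key: row dict}
    staged_checksums: {business_key: checksum from the previous load}
    """
    inserted, updated = [], []
    for key, row in source_rows.items():
        if key not in staged_checksums:
            inserted.append(key)
        elif staged_checksums[key] != checksum(row):
            updated.append(key)
    deleted = [k for k in staged_checksums if k not in source_rows]
    return inserted, updated, deleted


previous = {"A": checksum({"city": "Rome"}), "B": checksum({"city": "Oslo"})}
current = {"A": {"city": "Milan"}, "C": {"city": "Stockholm"}}
inserted, updated, deleted = diff_load(current, previous)
```

The cost is a full read of the source on every run, which is the price of having no change indicator.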
![Page 25: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/25.jpg)
Source Differential Load/2 «E»
In this scenario the source table has a timestamp-like column
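A sketch of the timestamp-based variant, again with sqlite3 as a stand-in. The `modified_at` column (ISO-8601 text, so that string comparison follows time order), the table names and the `differential_extract` helper are assumptions for this example. The watermark returned by one run is the input of the next.

```python
import sqlite3


def differential_extract(conn, last_loaded):
    """Upsert into staging every source row modified after `last_loaded`.

    Returns the new watermark to store for the next run.
    """
    rows = conn.execute(
        "SELECT id, name, modified_at FROM src_customer WHERE modified_at > ?",
        (last_loaded,),
    ).fetchall()
    conn.executemany(
        "INSERT OR REPLACE INTO stg_customer (id, name, modified_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return max((r[2] for r in rows), default=last_loaded)


conn = sqlite3.connect(":memory:")
for table in ("src_customer", "stg_customer"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, modified_at TEXT)"
    )
conn.executemany(
    "INSERT INTO src_customer VALUES (?, ?, ?)",
    [(1, "Ann", "2013-01-01T10:00:00"), (2, "Bob", "2013-01-02T09:00:00")],
)

watermark = differential_extract(conn, "")  # first run loads everything
conn.execute(
    "UPDATE src_customer SET name = 'Anna', modified_at = '2013-01-03T08:00:00' "
    "WHERE id = 1"
)
watermark = differential_extract(conn, watermark)  # picks up only the update
```

Unlike the ID-based incremental load, this catches updates as well as inserts, but rows deleted at the source still leave no trace.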
![Page 26: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/26.jpg)
Source Differential Load «E»
• SQL Server 2012 features that can help with incremental/differential load
– Change Data Capture
• Natively supported in SSIS 2012
• http://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/
– Change Tracking
• Underused feature in BI… not as rich as CDC but MUCH simpler and easier
![Page 27: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/27.jpg)
SCD 1 & SCD 2 «L»
Start
• Lookup Dimension Id and MD5 Checksum from Business Key
• Calculate MD5 Checksum of Non-SCD-Key Columns
• Dimension Id is Null?
– Yes: Insert new members into DWH
– No: Checksums are different?
• Yes: Store into temp table
• Merge data from temp table to DWH
End
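The decision part of this flow can be sketched in a few lines of Python. The in-memory `dimension` lookup, the `md5_of` and `classify_members` helpers and the column names are assumptions made for illustration. The temp table and the final merge are deliberately left out.

```python
import hashlib


def md5_of(row, columns):
    """MD5 digest over the tracked (non-key) columns of a row."""
    payload = "|".join(str(row[c]) for c in columns).encode("utf-8")
    return hashlib.md5(payload).digest()


def classify_members(source_rows, dimension, tracked_columns):
    """Split incoming rows into new members and changed members.

    dimension: {business_key: (dimension_id, stored_checksum)}
    """
    new_members, changed = [], []
    for row in source_rows:
        current = md5_of(row, tracked_columns)
        dimension_id, stored = dimension.get(row["business_key"], (None, None))
        if dimension_id is None:
            new_members.append(row)              # insert straight into the DWH
        elif stored != current:
            changed.append((dimension_id, row))  # stage for the later merge
        # identical checksum: nothing to do
    return new_members, changed


columns = ["name", "city"]
dimension = {"C1": (1, md5_of({"name": "Ann", "city": "Rome"}, columns))}
source = [
    {"business_key": "C1", "name": "Ann", "city": "Milan"},
    {"business_key": "C2", "name": "Bob", "city": "Oslo"},
]
new_members, changed = classify_members(source, dimension, columns)
```

Comparing one checksum instead of every attribute keeps the logic identical for all dimensions, which is what makes the pattern easy to generate from metadata.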
![Page 28: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/28.jpg)
SCD 2 Special Note «L»
• Merge => UPDATE Interval + INSERT New Row
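The “UPDATE interval + INSERT new row” step could look like the following sqlite3 sketch. The `dim_customer` layout, the `valid_from` / `valid_to` columns, the open-ended date marker and the `apply_scd2_change` helper are all assumptions made for this example.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed marker for the current version of a member


def apply_scd2_change(conn, business_key, new_city, change_date):
    """Close the current version of a member and insert the new one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? "
            "WHERE business_key = ? AND valid_to = ?",
            (change_date, business_key, OPEN_END),
        )
        conn.execute(
            "INSERT INTO dim_customer (business_key, city, valid_from, valid_to) "
            "VALUES (?, ?, ?, ?)",
            (business_key, new_city, change_date, OPEN_END),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, business_key TEXT, "
    "city TEXT, valid_from TEXT, valid_to TEXT)"
)
conn.execute(
    "INSERT INTO dim_customer (business_key, city, valid_from, valid_to) "
    "VALUES ('C1', 'Rome', '2012-01-01', ?)",
    (OPEN_END,),
)

apply_scd2_change(conn, "C1", "Stockholm", "2013-11-04")
history = conn.execute(
    "SELECT city, valid_from, valid_to FROM dim_customer ORDER BY valid_from"
).fetchall()
```

The sketch treats intervals as half-open (a version is valid from `valid_from` up to, but not including, `valid_to`), so the old and new rows share the change date without overlapping.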
![Page 29: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/29.jpg)
FACT TABLE LOAD «L»
![Page 30: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/30.jpg)
Partition Load «E» «L»
![Page 31: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/31.jpg)
Parallel Load «E» «L»
• Logically split the work into several steps
– E.g.: Load/Process one customer at a time
• Create a «queue» table that stores information for each step
– Step 1 -> Load Customer «A»
– Step 2 -> Load Customer «B»
• Create a Package that
1. Picks the first step not already picked up
2. Does the work
3. Goes back to step 1
• Call the Package «n» times simultaneously
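The queue-driven loop above can be simulated in Python with threads standing in for the «n» simultaneous package executions. The in-memory `queue_table`, the status values and the `package` function are assumptions for this sketch. With a real queue table the “pick” step would have to be made atomic by the database rather than by a Python lock.

```python
import threading

# Stand-in for the «queue» table: one row per step.
queue_table = [
    {"step": 1, "customer": "A", "status": "pending"},
    {"step": 2, "customer": "B", "status": "pending"},
    {"step": 3, "customer": "C", "status": "pending"},
]
queue_lock = threading.Lock()
loaded = []


def pick_next_step():
    """Atomically claim the first step nobody has picked up yet."""
    with queue_lock:
        for row in queue_table:
            if row["status"] == "pending":
                row["status"] = "picked"
                return row
    return None


def package():
    """Stand-in for the package: pick a step, do the work, repeat."""
    while True:
        row = pick_next_step()
        if row is None:
            return  # queue is empty, this worker is done
        loaded.append(row["customer"])  # the real load would happen here
        row["status"] = "done"


# Call the "package" n = 2 times simultaneously.
workers = [threading.Thread(target=package) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Because each worker claims a step before working on it, no customer is loaded twice, and adding workers needs no change to the package itself.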
![Page 32: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/32.jpg)
Other SSIS Specific Patterns
• Range Lookup
– Not natively supported
– Matt Masson has the answer in his blog
• http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
![Page 33: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/33.jpg)
METADATA
A key ingredient in automation
![Page 34: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/34.jpg)
Metadata
Provide context information
– Which columns are used to build/feed a Dimension?
– Which columns are Business Keys?
– Which table is the Fact Table?
– How are Fact and Dimension connected?
• Which columns are used?
![Page 35: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/35.jpg)
How to manage Metadata?
• Naming Convention
• Extended Properties
• Specific, Ad Hoc Database or Tables
• Other (XML, File, etc.)
![Page 36: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/36.jpg)
Naming Convention
• The easiest and cheapest
– No additional (hidden) costs
– No need to be maintained
– Never out-of-sync
– No documentation needed
• Actually, it IS PART of the documentation
– Imposes a Standard
• Very limited in terms of flexibility and usage
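As an illustration of metadata carried by names alone, the Python sketch below derives column roles from prefixes. The prefixes (`bk_`, `scd1_`, `scd2_`, `id_`) and the `describe_columns` helper are a made-up convention for this example, not one prescribed by the deck.

```python
def describe_columns(columns):
    """Derive column roles from name prefixes (hypothetical convention)."""
    prefixes = {
        "bk_": "business_key",
        "scd1_": "scd1",
        "scd2_": "scd2",
        "id_": "lookup",
    }
    roles = {}
    for column in columns:
        for prefix, role in prefixes.items():
            if column.startswith(prefix):
                roles[column] = role
                break
        else:
            # No known prefix: the convention has nothing to say.
            roles[column] = "unknown"
    return roles


roles = describe_columns(["bk_customer_code", "scd1_name", "scd2_city", "notes"])
```

The “unknown” case shows the limit mentioned above: anything the prefix cannot express needs another metadata store.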
![Page 37: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/37.jpg)
Extended Properties
Support most metadata needs
No additional software needed
Very verbose usage
– Development of a wrapper to make usage simpler is feasible and encouraged
![Page 38: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/38.jpg)
Metadata Objects
Dedicated Ad-Hoc Database and Tables
As flexible as you need
Maintenance overhead to keep metadata in-sync with data
– Development of an automatic check procedure is needed
– DMVs can help a lot here
![Page 39: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/39.jpg)
External Metadata Objects
Really expensive to keep them in-sync
– A tool is needed, otherwise too much manual work
Does not give any specific benefits with respect to Ad-Hoc Database/Tables
![Page 40: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/40.jpg)
DEMO
![Page 41: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/41.jpg)
AUTOMATION
Let’s make it possible!
![Page 42: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/42.jpg)
Automation Scenarios
• Run-Time: «Auto-Configuring» Packages
– Really hard to customize packages
– SSIS limitations must be managed
• E.g.: Data Flow cannot be changed at runtime
• On-the-fly creation of packages may be needed
• Design-Time: Package Generators / Package Templates
– Easy to customize created packages
![Page 43: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/43.jpg)
Automation Solutions
• Specific Tools/Frameworks
– BIML / MIST
• SQL Server Platform
– SQL, PowerShell, .NET
– SMO, AMO
![Page 44: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/44.jpg)
Package Generators
Required Assemblies
– Microsoft.SqlServer.ManagedDTS
– Microsoft.SqlServer.DTSRuntimeWrap
– Microsoft.SqlServer.DTSPipelineWrap
Path: C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies
![Page 45: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/45.jpg)
DEMO
![Page 46: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/46.jpg)
Useful Resources
• «STOCK» Tasks:
– http://msdn.microsoft.com/en-us/library/ms135956.aspx
• How to set Task properties at runtime:
– http://technet.microsoft.com/en-us/library/microsoft.sqlserver.dts.runtime.executables.add.aspx
![Page 47: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/47.jpg)
BIML – BI Markup Language
• Developed by Varigence
– http://www.varigence.com
– http://bimlscript.com/
– MIST: BIML Full-Featured IDE
• Free via BIDS Helper
– Support “limited” to SSIS package generation
– http://bidshelper.codeplex.com
![Page 48: Automating DWH Patterns Through Metadata](https://reader033.fdocuments.us/reader033/viewer/2022061121/546f06b1af7959aa568b4ff9/html5/thumbnails/48.jpg)
THANK YOU!
• For attending this session and PASS SQLRally Nordic 2013, Stockholm