METADATA BASED DYNAMIC ETLS - DeveloperMarch (developermarch.com/developersummit/2015/report/...)

METADATA BASED DYNAMIC ETLS

April 2015

AGENDA

Background

Reports Generator – ETL Architecture

Implementing using SSIS

Problem of the Unknowns

How to solve the problem?

Capturing the meta-data

Using the meta-data

Generating ETLs dynamically

Challenges

Mitigating using BIML

Final Architecture

Lessons Learnt

Background

Reports Generator for Clearing Connectivity Standards

• Is a product

• To simplify integration with data systems

• That accepts Clearing Data and…

• Generates Clearing Connectivity Standard Reports

© COPYRIGHT 2015 SAPIENT CORPORATION | CONFIDENTIAL

Reports Generator – High Level Architecture

[Diagram: Raw Data A, Raw Data B, . . . Raw Data X, Raw Data Y -> Clearing Reports Generator -> Std. Report 1 . . . Std. Report n]

[Diagram: Raw Data Sources (Schema A, Schema B, Schema C) -> Reports Generator (Extract, Transform, Load) -> Std. Reports (Schema X, Schema Y, Schema Z). The source schemas and the report schemas are both marked "Schema known during design".]

Reports Generator ETL – SSIS implementation for 1 Report

• Raw Data Source - A simple source schema from which an ETL package extracts the data

• Transform - Transformation on the extracted data

• Standard Report - A destination schema to load the data
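
As a rough illustration of what a single point-to-point ETL looks like when both schemas are known at design time, here is a minimal sketch in Python using an in-memory SQLite database. It is not SSIS and not the presenters' implementation; the table and column names (raw_trades, std_report, trade_id, currency) are hypothetical and chosen only for the example.

```python
import sqlite3


def run_point_to_point_etl(conn: sqlite3.Connection) -> int:
    """Hard-coded ETL for one report: every table and column is fixed at design time."""
    # Extract: read rows from the (hypothetical) raw data source table.
    rows = conn.execute("SELECT trade_id, currency FROM raw_trades").fetchall()

    # Transform: normalise currency codes to upper case.
    transformed = [(trade_id, currency.upper()) for trade_id, currency in rows]

    # Load: write the rows into the (hypothetical) standard report table.
    conn.executemany(
        "INSERT INTO std_report (trade_id, currency) VALUES (?, ?)", transformed
    )
    conn.commit()
    return len(transformed)


def demo() -> list:
    """Build a tiny source and destination, run the ETL, return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_trades (trade_id INTEGER, currency TEXT)")
    conn.execute("CREATE TABLE std_report (trade_id INTEGER, currency TEXT)")
    conn.executemany(
        "INSERT INTO raw_trades VALUES (?, ?)", [(1, "usd"), (2, "eur")]
    )
    run_point_to_point_etl(conn)
    return conn.execute(
        "SELECT trade_id, currency FROM std_report ORDER BY trade_id"
    ).fetchall()
```

Because the SQL text is fixed in the code, a different source or report schema needs a different function, which is the limitation the following slides address.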

Building a Product - The problem of the unknowns

• ABC corporation: Raw Data Source (Schema A) -> Report Generator ETLs (Point to Point) -> Standard Report (Schema X)

• XYZ corporation: Raw Data Source (Schema B) -> Report Generator ETLs (Point to Point) -> Standard Report (Schema Y)

• A new prospect…: Source (Schema ?) -> Report Generator ETLs (????) -> Destination (Schema ?)

Takes time to build

How to solve the problem?

1. Capture the required meta-data

• Data about data

• Source & Destination Schema Info

• Tables, columns, Data type, expressions, etc.

2. Use the meta-data at run-time …

• A Bespoke solution

• A Do-All ETL package

• Dynamic ETLs

• Other options…

1. Capturing the meta-data

Meta-data store

• Source/Destination connection information

• Schema Information: Source Schema, Destination Schema

• Transformation Information: Expressions, Aggregations
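
To make the contents of the meta-data store more concrete, the sketch below models them as Python dataclasses. The slide only names the categories (connection information, schema information, transformation information); every class name, field name and sample value here is a hypothetical choice for illustration, not the structure used in the product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConnectionInfo:
    name: str
    connection_string: str


@dataclass
class ColumnInfo:
    name: str
    data_type: str


@dataclass
class TableSchema:
    table: str
    columns: List[ColumnInfo]


@dataclass
class ColumnMapping:
    destination_column: str
    expression: str                    # source column name or an expression over source columns
    aggregation: Optional[str] = None  # e.g. "SUM"; None means no aggregation


@dataclass
class ReportMetadata:
    source_connection: ConnectionInfo
    destination_connection: ConnectionInfo
    source_schema: TableSchema
    destination_schema: TableSchema
    mappings: List[ColumnMapping] = field(default_factory=list)


def example_metadata() -> ReportMetadata:
    """One hypothetical report: sum raw notionals into a standard report column."""
    return ReportMetadata(
        source_connection=ConnectionInfo("RawSource", "<source connection string>"),
        destination_connection=ConnectionInfo("Reports", "<destination connection string>"),
        source_schema=TableSchema(
            "raw_trades",
            [ColumnInfo("trade_id", "int"), ColumnInfo("notional", "decimal")],
        ),
        destination_schema=TableSchema(
            "std_report", [ColumnInfo("total_notional", "decimal")]
        ),
        mappings=[ColumnMapping("total_notional", "notional", aggregation="SUM")],
    )
```

In practice such records would more likely live in database tables than in code; the dataclasses only show which pieces of information have to be captured per report.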

2. Use the meta-data at run-time – Dynamic ETLs

[Diagram: Meta-data Store -> ETL Generator (Read Metadata, Validate Metadata, Generate ETL dynamically) -> ETL (SSIS) Packages]
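
The three generator steps named on the slide (read, validate, generate) can be sketched as follows. This is a simplified stand-in under stated assumptions: the metadata is assumed to arrive as JSON with hypothetical keys (source, destination, mappings, from, to), and the "generated ETL" is a single INSERT ... SELECT statement rather than an SSIS package.

```python
import json
from typing import Dict, List


def read_metadata(json_text: str) -> Dict:
    """Read step: parse metadata from JSON (the JSON layout is an assumption)."""
    return json.loads(json_text)


def validate_metadata(meta: Dict) -> List[str]:
    """Validate step: return a list of problems; an empty list means valid."""
    errors = []
    source_cols = set(meta["source"]["columns"])
    dest_cols = set(meta["destination"]["columns"])
    mapped = set()
    for mapping in meta["mappings"]:
        if mapping["from"] not in source_cols:
            errors.append(f"unknown source column: {mapping['from']}")
        if mapping["to"] not in dest_cols:
            errors.append(f"unknown destination column: {mapping['to']}")
        mapped.add(mapping["to"])
    # Every destination column must be fed by some mapping.
    for col in sorted(dest_cols - mapped):
        errors.append(f"destination column not mapped: {col}")
    return errors


def generate_etl(meta: Dict) -> str:
    """Generate step: emit one SQL statement as a stand-in for an ETL package."""
    errors = validate_metadata(meta)
    if errors:
        raise ValueError("; ".join(errors))
    dest_cols = ", ".join(m["to"] for m in meta["mappings"])
    src_cols = ", ".join(m["from"] for m in meta["mappings"])
    return (
        f"INSERT INTO {meta['destination']['table']} ({dest_cols}) "
        f"SELECT {src_cols} FROM {meta['source']['table']}"
    )
```

Validating before generating means a bad mapping is reported against the metadata, where it can be fixed, instead of surfacing later as a failure inside a generated package.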

Generating Dynamic ETLs in SSIS – Challenges/Risks

Each SSIS ETL Package …

• is a highly complex XML

• may vary for every version of SSIS

• not easy to maintain

Mitigating the risks - Simpler language for ETL

[Side-by-side code comparison: "About 660 lines of XML" <- OR -> ? "This is the whole package!"]

• The Business Intelligence Markup Language (BIML)

• An XML dialect for ETL

• Simple and easy to maintain

• Can target multiple SSIS versions

• Powerful, i.e. has inline C# support

• Reusable using templates

More About BIML...

• Created by Varigence

• Varigence focuses mainly on BI development

• Supports multiple SSIS versions

• Well supported by multiple IDEs/tools, e.g. Mist and Vivid

• More about BIML at http://bimlscript.com/

• More about Varigence at https://varigence.com/

Tying it back - Dynamic ETLs using BIML

[Diagram: Metadata Store -> ETL Generator -> ETL Packages]

ETL Generator steps shown in the diagram:

• Read Metadata

• Validate Metadata

• Load BIML templates

• Load common templates

• Transform template to BIML

• Generate ETL dynamically
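
The "Transform template to BIML" step can be illustrated with a minimal sketch that fills a text template from metadata and checks that the result is well-formed XML. The element and attribute names in the template follow Biml as commonly documented (Package, Dataflow, OleDbSource, DirectInput, OleDbDestination, ExternalTableOutput) but should be verified against the Biml language reference. The metadata keys, the template itself and the use of Python's string.Template are assumptions for illustration, not the templating mechanism used in the product.

```python
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

BIML_NS = "http://schemas.varigence.com/biml.xsd"

# Hypothetical package template; the $placeholders are filled from metadata.
PACKAGE_TEMPLATE = Template("""\
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name="$package_name" ConstraintMode="Linear">
      <Tasks>
        <Dataflow Name="Load $destination_table">
          <Transformations>
            <OleDbSource Name="Source" ConnectionName="$source_connection">
              <DirectInput>$select_statement</DirectInput>
            </OleDbSource>
            <OleDbDestination Name="Destination" ConnectionName="$destination_connection">
              <ExternalTableOutput Table="$destination_table" />
            </OleDbDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>
""")


def xml_safe(value: str) -> str:
    """Escape a metadata value so it is safe in XML text and quoted attributes."""
    return escape(value, {'"': "&quot;"})


def render_biml(meta: dict) -> str:
    """Fill the template from metadata and verify the result parses as XML."""
    select = f"SELECT {', '.join(meta['columns'])} FROM {meta['source_table']}"
    biml = PACKAGE_TEMPLATE.substitute(
        package_name=xml_safe(meta["package_name"]),
        source_connection=xml_safe(meta["source_connection"]),
        destination_connection=xml_safe(meta["destination_connection"]),
        destination_table=xml_safe(meta["destination_table"]),
        select_statement=xml_safe(select),
    )
    ET.fromstring(biml)  # raises ParseError if the output is not well-formed
    return biml
```

The generated text would then be handed to a Biml compiler to produce the actual SSIS packages; that step is outside this sketch.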

Reports Generator – Final Architecture

[Diagram: Raw Data A, Raw Data B, . . . Raw Data X, Raw Data Y -> Clearing Reports Generator -> Std. Report 1 . . . Std. Report n. Within the generator: Metadata Store -> ETL Generator (Read Metadata, Validate Metadata, Load BIML templates, Load common templates, Transform template to BIML, Generate ETL dynamically) -> ETL Packages]

Lessons learnt

A star schema on the RHS introduces maximum complexity. Use a flat schema

Tracing the input from the output is difficult

ETL as a technology is not for row-by-row processing

Use the abstraction layer wisely

Improperly used BIML templates lead to maintainability issues

Keep things simple - don’t write business logic in BIML

Maintain version changes across deployments

Will it work?

RECAP

Background

Reports Generator – ETL Architecture

Implementing using SSIS

Problem of the Unknowns

How to solve the problem?

Capturing the meta-data

Using the meta-data

Generating ETLs dynamically

Challenges

Mitigating using BIML

Final Architecture

Lessons Learnt

QUESTIONS

THANK YOU

APPENDIX

Appendix

• Transformation service architecture diagram

Transformation service