Post on 20-May-2015
ETL with WSO2 Enterprise Middleware Platform
Prabath Abeysekara - Associate Technical Lead
Outline
● A Classic Use Case
● What’s ETL and How It Is Interpreted In The Modern World?
● Why ETL?
● Challenges In Implementing ETL Solutions
● Why Traditional Standalone ETL Products Are Considered Dead In The Modern World?
● What Factors Should Be Considered When Implementing ETL While Re-Architecting A System?
Outline contd..
● Impact Of Tooling
● Reference Architecture
○ How to build an “efficient, robust, scalable, auditable, performing and maintainable” ETL solution with WSO2 EMP?
● Demo - Data Mapping With WSO2 Developer Studio
● Summary
● Q&A
A Classic Use Case - Financial Sector
[Diagram: RDBMS, XML/Web Services and flat files feed an ETL process that loads an Enterprise Data Warehouse, which serves Financial Reporting, Revenue Predictions, and other Analytics & BI fronts]
What’s ETL? - Traditional Interpretation
● Extract
● Transform
● Load
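The three traditional stages can be sketched as three small functions chained together. This is a minimal illustration in plain Python, not WSO2 code: the CSV content, the column names, the `transactions` table, and the in-memory SQLite database standing in for the data warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the column names are illustrative only.
SOURCE_CSV = """account_id,amount,currency
A-1,100.50,usd
A-2,20.00,eur
"""

def extract(text):
    """Extract: read raw rows from a flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and normalize values for the target schema."""
    return [
        (row["account_id"], float(row["amount"]), row["currency"].upper())
        for row in rows
    ]

def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(account_id TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(conn):
    """Run the three stages in order."""
    load(transform(extract(SOURCE_CSV)), conn)

warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
run_etl(warehouse)
```

Keeping each stage as a separate function makes it possible to test or replace one stage without touching the others.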
What’s ETL? - Modern Interpretation
● Extract
● Monitor
● Profile/Audit
● Analyze
● Cleanse
● Transform
● Load
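Two of the additional stages, Profile/Audit and Cleanse, can be sketched as follows. This is only an illustration under stated assumptions: the field names, the sample rows, and the rule that a row is rejected when a required field is empty are all hypothetical and not taken from any WSO2 product.

```python
def profile(rows, required_fields):
    """Profile/Audit: count missing values per required field."""
    missing = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            if not (row.get(field) or "").strip():
                missing[field] += 1
    return {"row_count": len(rows), "missing": missing}

def cleanse(rows, required_fields):
    """Cleanse: trim whitespace and split rows into clean and rejected."""
    clean, rejected = [], []
    for row in rows:
        trimmed = {key: (value or "").strip() for key, value in row.items()}
        if all(trimmed.get(field) for field in required_fields):
            clean.append(trimmed)
        else:
            rejected.append(trimmed)
    return clean, rejected

# Hypothetical extracted rows.
raw_rows = [
    {"customer_id": " C-1 ", "email": "a@example.com"},
    {"customer_id": "C-2", "email": ""},
    {"customer_id": None, "email": "c@example.com"},
]
REQUIRED = ["customer_id", "email"]

report = profile(raw_rows, REQUIRED)
clean_rows, rejected_rows = cleanse(raw_rows, REQUIRED)
```

The profile report can feed monitoring dashboards, while the rejected rows can be kept aside for audit instead of being silently dropped.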
Why ETL?
● Generally, to build and maintain data repositories that hold a “single version of the truth”, drawn from the multiple heterogeneous data sources scattered across an organization or a business domain.
● Business users can then use that data for:
○ Predictive analysis
○ Revenue predictions and comparisons
○ Monitoring the overall growth of an organization
○ Business policies
○ Strategic decisions
Challenges
● Data definition establishment
● Need for expert knowledge
● Scalability and Performance
● Business user acceptance and seamless support for a wide range of business use cases
● Maintenance, Data Archival
● Real-time or Near Real-time data synchronization
Why Standalone ETL Products Are Dead?
● Modern-day organizations are evolving faster than ever before.
● The tendency to adopt architecture patterns such as SOA, to reduce IT costs and keep business processes flexible, is rapidly increasing.
● Organizations are more focused on “Connected businesses”.
● Thus, it’s very likely that an organization already has an IT infrastructure in place.
Why Standalone ETL Products Are Dead? contd..
● Adopting a standalone ETL product? Possible, but worthwhile?
● Generally less support for open standards. Extension points? Connectors? More custom code!
● Usually rely on proprietary data integration patterns, inducing high maintenance costs.
● Additional licensing costs and the need for separate expert/operational assistance, again inducing high maintenance costs.
Why Standalone ETL Products Are Dead? contd..
● Tendency to use in-house re-usable business components leveraging the benefits of SOA
● Lower operational costs
● Scalability is a main focus nowadays.
● Having a similar process implemented enables horizontal scalability at different layers as the need arises.
Re-Architecting A System’s DIL?
● Data Integration is always cumbersome
● Need for ensuring policy compliance of data at its target containers (usually Enterprise Data Warehouses, Central MDM repositories, etc.)
● Flexibility
● Ensuring acceptable Performance
● What about Reliability?
Re-Architecting A System’s DIL? contd..
● How to deal with the freshness of data?
● When to synchronize?
● Need for tuning the system to meet various SLAs
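One common way to handle data freshness is incremental synchronization with a "high-water mark": each run copies only the rows changed since the previous run. The sketch below is an assumption-laden illustration, not a WSO2 feature: the `customers` table, its `updated_at` column, and the two in-memory SQLite databases are all hypothetical. How often `sync_increment` is called (the "when to synchronize" question) would be driven by the SLA.

```python
import sqlite3

def sync_increment(source, target, last_watermark):
    """Copy rows changed after last_watermark; return the new watermark."""
    # ISO-8601 timestamps stored as text compare correctly as strings.
    changed = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # INSERT OR REPLACE keeps the copy idempotent when a row is re-sent.
    target.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) "
        "VALUES (?, ?, ?)",
        changed,
    )
    target.commit()
    return changed[-1][2] if changed else last_watermark

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute(
        "CREATE TABLE customers "
        "(id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )

source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "2015-05-01T10:00:00"), (2, "Bob", "2015-05-02T09:30:00")],
)
source.commit()

# First run copies everything; the second run copies only the updated row.
watermark = sync_increment(source, target, "")
source.execute(
    "UPDATE customers SET name = 'Robert', "
    "updated_at = '2015-05-03T08:00:00' WHERE id = 2"
)
source.commit()
watermark = sync_increment(source, target, watermark)
```

A shorter interval between runs gives fresher data at the cost of more load on the source, which is the trade-off to tune against each SLA.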
Impact Of Tooling
[Diagram: XSLT, custom code, scripts]
Impact Of Tooling contd..
● Numerous ETL solutions fail because of the lack of tooling.
● Developers/Solution composers are left with manual coding of XSLT, Custom mappers, etc.
● Not scalable!
● Often requires a powerful, flexible tooling platform, particularly as the system grows and matures.
Reference Architecture
Reference Architecture - Big Picture
[Diagram components: ESB, BAM, DSS, DS, MB, Enterprise DW]
Reference Architecture - Reliable extraction
[Diagram components: ESB, DSS, DS, MB, Scheduled Tasks]
Reference Architecture - Validate & Transform
[Diagram: in the ESB, the WSO2 Data Mapper maps an input data model (Data Model X) to an output data model (Data Model Y)]
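The idea of validating a record and mapping it from one data model to another can be sketched with a declarative field map. This is plain Python for illustration only and does not reflect the WSO2 Data Mapper's actual configuration format: the field names, the converters, and the `FIELD_MAP` structure are all hypothetical.

```python
# Hypothetical field mapping: output field -> (input field, converter).
FIELD_MAP = {
    "customerId": ("cust_id", str),
    "fullName": ("name", str.strip),
    "balance": ("bal", float),
}

def validate(record, required):
    """Return the required input fields that are absent from the record."""
    return [field for field in required if field not in record]

def map_record(record):
    """Map one input-model record to the output model, validating first."""
    required = [source for source, _ in FIELD_MAP.values()]
    missing = validate(record, required)
    if missing:
        raise ValueError(f"missing input fields: {missing}")
    return {
        target: convert(record[source])
        for target, (source, convert) in FIELD_MAP.items()
    }

# Hypothetical input record in "Data Model X".
model_x = {"cust_id": 42, "name": "  Ann Lee ", "bal": "1050.75"}
model_y = map_record(model_x)
```

Keeping the mapping as data rather than hand-written code is the same motivation behind graphical mapping tools: the mapping can change without rewriting the transformation logic.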
Reference Architecture - Auditing
[Diagram components: ESB, BAM, Data Quality Reports/Dashboards, Data Policy Compliance Reports/Dashboards]
Reference Architecture - Reliable Loading
[Diagram components: ESB, MB, DSS, Enterprise DW]
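Placing a message broker in front of the warehouse suggests a store-and-forward approach: records wait in a queue and are removed only once they have been written or set aside. The sketch below illustrates that general pattern under stated assumptions. It uses an in-process `deque` rather than a real broker, and `FlakyWarehouse`, `drain`, and the retry limit are hypothetical names and values, not WSO2 APIs.

```python
from collections import deque

class FlakyWarehouse:
    """Hypothetical target that fails its first `failures` write attempts."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = []

    def write(self, row):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("warehouse unavailable")
        self.rows.append(row)

def drain(queue, warehouse, max_attempts=3):
    """Deliver queued rows in order, retrying each failed write.

    A row is removed from the queue only after it has been written
    or, once retries are exhausted, moved to the dead-letter list.
    """
    dead_letters = []
    while queue:
        row = queue[0]  # peek without removing
        for _ in range(max_attempts):
            try:
                warehouse.write(row)
                break
            except ConnectionError:
                continue
        else:
            dead_letters.append(row)  # retries exhausted
        queue.popleft()
    return dead_letters

pending = deque([{"id": 1}, {"id": 2}])
warehouse = FlakyWarehouse(failures=2)
dead = drain(pending, warehouse)
```

Rows that end up in the dead-letter list are not lost, so they can be inspected and replayed later.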
Tooling - Smooks Editor
Tooling - WSO2 Data Mapper
Demo
● Building a transformation between two simple data models using the Smooks Editor shipped with WSO2 Developer Studio.
Summary
● ETL plays a pivotal role in any business organization.
● Implementing a proper ETL process within an organization often requires a lot of effort.
● Standalone ETL solutions can be costly.
● Re-architecting data models is made easy with WSO2 Enterprise Middleware Platform.
References
[1] How to use the Smooks Editor shipped with WSO2 Developer Studio
http://wso2.com/library/tutorials/2011/06/perform-data-mapping-smooks-editor-wso2-carbon-studio/
Q&A