Etl Report
-
Upload
jurguen-zambrano -
Category
Documents
-
view
229 -
download
0
Transcript of Etl Report
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 1/40
Evaluating
ETL and DataIntegration
Platforms
Wayne Eckerson and Colin White
R E P O R T S E R I E S
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 2/40
Research Sponsors
Business Objects
DataMirror Corporation
Hummingbird Ltd
Informatica Corporation
Evaluating ETL and Data Integration Platforms
Acknowledgements
TD W I w ould like to thank m any people w ho contributed to this report. First, w e appreciate
the m any users w ho responded to our survey, especially those w ho responded to our requests
for phone interview s. Second, w e’d like to thank Steve Tracy and Pieter M im no w ho review ed
draft m anuscripts and provided feedback, as w ell as our report sponsors w ho review ed outlines,
survey questions, and report drafts. Finally, w e w ould like to recognize TD W I’s production
team : D enelle H anlon, Theresa Johnston, M arie M cFarland, and D onna Padian.
About TDWIThe D ata W arehousing Institute (TD W I), a division of 101com m unications LLC, is the prem ier
provider of in-depth, high-quality education and training in the business intelligence and data
w arehousing industry. TD W I supports a w orldw ide m em bership program , quarterly educational
conferences, regional sem inars, onsite courses, leadership aw ards program s, num erous print
and online publications, and a public and private (M em bers-only) W eb site.
This special report is the property of The D ata W arehousing Institute (TD W I) and is m ade available to a restricted
num ber of clients only upon these term s and conditions. TD W I reserves all rights herein. Reproduction ordisclosure in w hole or in part to parties other than the TD W I client, w ho is the original subscriber to this report,is perm itted only w ith the w ritten perm ission and express consent of TD W I. This report shall be treated at alltim es as a confidential and proprietary docum ent for internal use only. The inform ation contained in the report isbelieved to be reliable but cannot be guaranteed to be correct or com plete.
For m ore inform ation about this report or its sponsors, and to view the archived report W ebinar, please visit:w w w .dw -institute.com /etlreport/.
© 2003 by 101com m unications LLC . All rights reserved. Printed in the U nited States. The D ata W arehousingInstitute is a tradem ark of 101com m unications LLC. O ther product and com pany nam es m entioned herein m aybe tradem arks and/or registered tradem arks of their respective com panies. The D ata W arehousing Institute is adivision of 101com m unications LLC based in Chatsw orth, CA .
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 3/40
Scope, Methodology, and Demographics . . . . . . . . . . .3
Executive Summary: The Role of ETL in BI . . . . . . . . . .4
ETL in Flux . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4
The Evolution of ETL . . . . . . . . . . . . . . . . . . . . . . . . . . . .5
Framework and Components . . . . . . . . . . . . . . . . . . .5
A Quick History . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6
Code Generation Tools . . . . . . . . . . . . . . . . . .6
Engine-Based Tools . . . . . . . . . . . . . . . . . . . . . . .6
Code Generators versus Engines . . . . . . . . . . .7
Database-Centric ETL . . . . . . . . . . . . . . . . . . . . .8
Data Integration Platforms . . . . . . . . . . . . . . . . . . . . .9
ETL versus EAI . . . . . . . . . . . . . . . . . . . . . . . . . .9
ETL Trends and Requirements . . . . . . . . . . . . . . . . . . .10
Large Volumes of Data . . . . . . . . . . . . . . . . . . .10
Diverse Data Sources . . . . . . . . . . . . . . . . . .10Shrinking Batch Windows . . . . . . . . . . . . . . . .11
Operational Decision Making . . . . . . . . . . . . .11
Data Quality Add-Ons . . . . . . . . . . . . . . . . . . . .12
M eta Da ta M anag emen t . . . . . . . . . . . . . . . . . . . . . . . .13
Packaged Solutions . . . . . . . . . . . . . . . . . . . . . . . . .14
Enterprise Infrastructure . . . . . . . . . . . . . . . . .14
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Build or Buy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15
Why Buy? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
Maintaining Custom Code . . . . . . . . . . . . . . . .16
Why Build? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
Build and Buy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17
User Satisfac tion w ith ETL Tools . . . . . . . . . . . . . . .18
Challenges in Deploying ETL . . . . . . . . . . . . . . . . . .19
Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .21
THE DATA WAREHOUSING INSTITUTE www.dw-institute.com 1
Table of Contents
Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . .21
Buy If You … . . . . . . . . . . . . . . . . . . . . . . . . . . .21
Build If You … . . . . . . . . . . . . . . . . . . . . . . . . .22
Data Integration Platforms . . . . . . . . . . . . . . . . . . . . . .22
Platform Characteristics . . . . . . . . . . . . . . . . . .23
Data Integration Characteristics . . . . . . . . . . .23
High Performance and Scalability . . . . . . . . . . . . . .23
Built-In Data Cleansing and Profiling . . . . . . . . . . . .23
Complex, Reusable Transforma tions . . . . . . . . . . . . .24
Reliable Operations and Robust Administr ation . . . .25
Diverse Source a nd Target Systems . . . . . . . . . . . . .25
Update and Capture Utilities . . . . . . . . . . . . . . . . . . .26
Global M eta Data M anagement . . . . . . . . . . . . . . . .27
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28
ETL Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . .28 Available Resources . . . . . . . . . . . . . . . . . . . . .29
Vendor Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Overall Product Considerations . . . . . . . . . . . . . . . .30
Design Features . . . . . . . . . . . . . . . . . . . . . . . . . . . .30
M eta Data Management Features . . . . . . . . . . . . . .31
Transfor mation Features . . . . . . . . . . . . . . . . . . . . .31
Data Quality Features . . . . . . . . . . . . . . . . . . . . . . . .32
Performance Features . . . . . . . . . . . . . . . . . . . . . . .32
Extrac t and Capture Features . . . . . . . . . . . . . . . . . .33
Load and Update Features . . . . . . . . . . . . . . . . . . . .33
Operate and Administer Component . . . . . . . . . . . . .34
Integrated Product Suites . . . . . . . . . . . . . . . . . . . .34
Company Services and Pricing . . . . . . . . . . . . . . . . .35
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .36
by Wayne W. Eckerson and Colin White
Evaluating ETL and DataIntegration Platforms
REPORT SERIES
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 4/40
2 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
WAYNE ECKERSONis director of research for The D ata W arehousing Institute (TD W I), the
leading provider of high-quality, in-depth education and research services to data w arehousing
and business intelligence professionals w orldw ide. Eckerson oversees TD W I’s M em ber publi-
cations and research services.
Eckerson has w ritten and spoken on data w arehousing and business intelligence since 1994.
H e has published in-depth reports and articles about data quality, data w arehousing, custom er
relationship m anagem ent, online analytical processing (O LA P), W eb-based analytical tools,
analytic applications, and portals, am ong other topics. In addition, Eckerson has delivered pre-
sentations at industry conferences, user group m eetings, and vendor sem inars. H e has also
consulted w ith m any vendor and user firm s.
Prior to joining TD W I, Eckerson w as a senior consultant at the Patricia Seybold G roup, and direc-
tor of the G roup’s Business Intelligence & D ata W arehouse Service, w hich he launched in 1996.
COLIN WHITEis president of Intelligent Business Strategies. W hite is w ell know n for his in-
depth know ledge of leading-edge business intelligence, enterprise portal, database and W eb
technologies, and how they can be integrated into an IT infrastructure for building and sup-
porting the intelligent business.
W ith m ore than 32 years of IT experience, W hite has consulted for dozens of com panies
throughout the w orld and is a frequent speaker at leading IT events. H e is a faculty m em ber
at TD W I and currently serves on the advisory board of TD W I’s Business Intelligence Strategies
program . H e also chairs a portals and W eb Services conference.
W hite has co-authored several books, and has w ritten num erous articles on business intelli-
gence, portals, database, and W eb technology for leading IT trade journals. H e w rites a regularcolum n for DM Review m agazine entitled “Intelligent Business Strategies.”Prior to becom ing
an analyst and consultant, W hite w orked at IBM and Am dahl.
About the Authors
ETL- Extract, transform, and load (ETL) tools play a critical part increating data w arehouses, w hich form the bedrock of business intell i-gence (see below ). ETL tools sit at the int ersection of myriad sourceand target systems and act as a funnel to pull toget her and blend
heterogeneous data into a consistent format and m eaning andpopulate data warehouses.
Business Intelligence- TDWI uses the term "business intell igence"
or BI as an umbrella term t hat encompasses ETL, data w arehouses,reporting and analysis tools, and analytic applicati ons. BI projects turndata into information, knowledge, and plans that drive profitablebusiness decisions.
TDWI Definitions
The TDWI Report Series is designed
to educate technical and businessprofessionals about crit ical issues inBI. TDWI’s in-depth reports offer
objective, vendor-neutral researchconsist ing of interviews w ithindustry experts and a survey of BIprofessionals worldwide. TDWI in-depth reports are sponsored byvendors who collectively wish to
evangelize a discipline w ithin BI oran emerging approach or technology.
About the TDWI Report Series
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 5/40
26%
5%
Corporate IT professional (63%)
Systems integrator or external consultant (26%)
Business sponsor or business user (5%)
63%
6%
Other (6%)
Less than $10 million (15%)
$10 billion +(15%)
$1 billion to $10 billion (28%)
$100 million to $1 billion (25%)
$10 million to $100 million (18%)
15%15%
28%
25%
18%
USA (62%)
Scandinavian countries (5%)
India (4%)
Canada (6%)
18%
62%
6%
4%
5%3%2%
United Kingdom (3%)
Other (18%)
Mexico (2%)
9%8%
10%
7%
5%
7%
13%13%
6%
5%3%
14%
Federal government
Education
TelecommunicationsHealthcare
Insurance
State/local governmentOther industries
Consulting/professional services
Retail/wholesale/distributionManufacturing (non-computer)
Financial services
0 3 6 9 12 15
Software/Internet
Demographics
Position
Company Revenues
Industry
6%
Corporate (51%)
Business Unit/division (21%)
Department/functional group (21%)
51%
21%
21%
Does not apply (6%)
OrganizationStructure
Report Scope.This report examines
the current and future state of ETL
tools. It describes how business
requirements are fueling the creation
of a new generation of products called
data integration platforms. It examines
the pros and cons of building versus
buying ETL functionality and assesses
the challenges and success factors
involved wi th im plementing ETL tools.
I t f in ishes by providing a l ist of
evaluation criteria to assist
organizations in selecting ETLproducts or validating current ones.
Methodology.The research for this
report was conducted by interviewing
industry experts, including consultants,
industry analysts, and IT professionals,
who have implemented ETL tools.
The research is also based on a survey
of 1000+ business intelligence
professionals that TDWI conducted
in November 2002.
Survey Methodology.TDWI
received 1,051 responses to thesurvey. Of these, TDWI qualified 741
respondents who had both deployed
ETL functionality and were either IT
professionals or consultants at end-
user organizations. Branching logic
accounts for the variation in the
number of respondents to each
question. For example, respondents
w ho “built” ETL programs w ere
asked a few dif ferent questions from
those who “ bought” vendor ETL
tools. M ult i-choice questions and
rounding techniques account fortotals that don’t equal 100 percent.
Survey Demographics.Mo st
respondents were corporate IT
professionals who work at large U.S.
companies in a range of industries.
(See the ill ustrations on this page
for breakouts.)
Scope and Methodology
THE DATA WAREHOUSING INSTITUTE www.dw-institute.com 3
Scope, Methodology, and Demographics
Country
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 6/40
4 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Executive Summary: The Role of ETL in Business Intelligence
ETL is the heart and soul of business intelligence (BI). ETL processes bring together and com -
bine data from m ultiple source system s into a data w arehouse, enabling all users to w ork off a
single, integrated set of data—a single version of the truth. The result is an organization that
no longer spins its w heels collecting data or arguing about w hose data is correct, but one that
uses inform ation as a key process enabler and com petitive w eapon.
In these organizations, BI system s are no longer ni ce to have , but essential to success. These
system s are no longer stand-alone and separate from operational processing—they are inte-
grated w ith overall business processes. As a result, an effective BI environm ent based on inte-
grated data enables users to m ake strategic, tactical, and operational decisions that drive the
business on a daily basis.
Why ETL is Hard. According to m ost practitioners, ETL design and developm ent w ork con-
sum es 60 to 80 percent of an entire B I project. W ith such an inordinate am ount of resources
tied up in ETL w ork, it behooves BI team s to optim ize this layer of their BI environm ent.
ETL is so tim e consum ing because it involves the unenviable task of re-integrating the enter-
prise’s data from scratch. O ver the span of m any years, organizations have allow ed their busi-
ness processes to dis-integrate into dozens or hundreds of local processes, each m anaged by a
single fiefdom (e.g., departm ents, business units, divisions) w ith its ow n system s, data, and
view of the w orld.
W ith the goal of achieving a single version of truth, business executives are appointing BI
team s to re-integrate w hat has taken years or decades to undo. Equipped w ith ETL and m od-
eling tools, BI team s are now expected to sw oop in like conquering heroes and rescue the
organization from inform ation chaos. O bviously, the challenges and risks are daunting.
Mitigating Risk. ETL tools are perhaps the m ost critical instrum ents in a B I team ’s toolbox.
W hether built or bought, a good ET L tool in the hands of an experienced ET L developer can
speed deploym ent, m inim ize the im pact of system s changes and new user requirem ents, and
m itigate project risk. A w eak ETL tool in the hands of an untrained developer can w reak
havoc on BI project schedules and budgets.
ETL in FluxG iven the dem ands placed on ET L and the m ore prom inent role that BI is playing in corpora-
tions, it is no w onder that this technology is now in a state of flux.
More Complete Solutions. O rganizations are now pushing ETL vendors to deliver m ore com -
plete B I “solutions.”Prim arily, this m eans handling additional back-end data m anagem ent and
processing responsibilities, such as providing data profiling, data cleansing, and enterprise
m eta data m anagem ent utilities. A grow ing num ber of users also w ant BI vendors to deliver acom plete solution that spans both back-end data m anagem ent functions and front-end report-
ing and analysis applications.
Better Throughput and Scalability. They also w ant ETL tools to increase throughput and per-
form ance to handle exploding volum es of data and shrinking batch w indow s. Rather than
refresh the entire data w arehouse from scratch, they w ant ETL tools to capture and update
changes that have occurred in source system s since the last load.
ETL: The Heart andSoul of BI
ETL Work Consumes 60to 80 Percent of all
BI Projects
ETL Tools Are the MostImportant in a BI
Toolbox
Users Seek Integrated Toolsets
ETL Tools MustProcess More Data inLess Time
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 7/40
THE DATA WAREHOUSING INSTITUTE www.dw-institute.com 5
The Evolu tio n of ETL
More Sources, Greater Complexity, Better Administration.ETL tools also need to handle a
w ider variety of source system data, including W eb, XM L, and packaged applications. To inte-
grate these diverse data sets, ETL tools m ust also handle m ore com plex m appings and trans-
form ations, and offer enhanced adm inistration to im prove reliability and speed deploym ents.
“Near-Real-Time” Data.Finally, ETL tools need to feed data w arehouses m ore quickly w ithm ore up-to-date inform ation. This is because batch processing w indow s are shrinking and
business users w ant integrated data delivered on a tim elier basis (i.e., the previous day, hour,
or m inute) so they can m ake critical operational decisions w ithout delay.
Clearly, the m arket for ETL tools is changing and expanding. In response to user requirem ents,
ETL vendors are transform ing their products from single-purpose ETL products into m ulti-pur-
pose data in tegration platforms . BI professionals need help understanding these m arket and
technology changes as w ell as how to leverage the convergence of ETL w ith new technologies
to optim ize their BI architectures and ensure a healthy return on their ETL investm ents.
The Evolution of ETL
Framew ork and Components
Before w e exam ine the future of ETL, w e w ill define the m ajor com ponents in an ETL fram ew ork.
ETL stands for extract, tran sform , and load . That is, ETL program s periodically extract data
from source system s,transform the data into a com m on form at, and then load the data into
the target data store, usually a data w arehouse.
As an acronym , how ever, ETL only tells part of the story. ETL tools also com m only move or
transport data betw een sources and targets,document how data elem ents change as they
m ove betw een source and target (i.e., m eta data),exchange this m eta data w ith other applica-
tions as needed, and administer all run-tim e processes and operations (e.g., scheduling, error
m anagem ent, audit logs, and statistics). A m ore accurate acronym m ight be EM TD LEA!
The diagram on page 5 depicts the m ajor com ponents involved in ETL processing. The follow -
ing bullets describe each com ponent in m ore detail:
Users Want TimelierData
Do We Need a NewAcronym?
Illustration S1. The diagram shows the core components of an ETL product. Courtesy of Intelligent Business Strategies.
Extract
Administration& operations
services
Transport services
Load
Meta dataimport/export
Databases& files
Source adapters
Transform
Runtimemeta dataservices
Target adapters
Designmanager
Legacyapplications
Databases & files
Meta datarepository
ETL Processing Framework
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 8/40
6 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
•Design manager: Provides a graphical m apping environm ent that lets developers
define source-to-target m appings, transform ations, process flow s, and jobs. The designs
are stored in a m eta data repository.
•Meta data management: Provides a repository to define, docum ent, and m anage infor-
m ation (i.e., m eta data) about the ETL design and runtim e processes. The repository
m akes m eta data available to the ETL engine at run tim e and other applications
•Extract: Extracts source data using adapters, such as O D BC, native SQ L form ats, or flat file
extractors. These adapters consult m eta data to determ ine w hich data to extract and how .
•Transform: ETL tools provide a library of transform ation objects that let developers transform
source data into target data structures and create sum m ary tables to im prove perform ance.
•Load: ETL tools use target data adapters, such as SQ L or native bulk loaders, to insert
or m odify data in target databases or files.
•Transport services: ETL tools use netw ork and file protocols (e.g., FTP) to m ove data
betw een source and target system s and in-m em ory protocols (e.g., data caches) to
m ove data betw een ETL run-tim e com ponents.
• Administration and operation: ETL utilities let adm inistrators schedule, run, and
m onitor ETL jobs as w ell as log all events, m anage errors, recover from failures, andreconcile outputs w ith source system s.
The com ponents above com e w ith m ost vendor-supplied ET L tools, but they can also be
built by data w arehouse developers. A m ajority of com panies have a m ix of packaged and
hom egrow n ETL applications. In som e cases, they use different tools in different projects, or
they use custom code to augm ent the functionality of an ETL product. The pros and cons of
building and buying ETL com ponents are discussed later in the report. (See “Build or Buy?”
on page 15.)
A Quick History
Code Generation Tools
In the early 1990s, m ost organizations developed custom code to extract and transform datafrom operational system s and load it into data w arehouses. In the m id-1990s, vendors recog-
nized an opportunity and began shipping ETL tools designed to reduce or elim inate the labor-
intensive process of w riting custom ETL program s.
Early vendor ETL tools provided a graphical design environm ent that generated third-genera-
tion language (3G L) program s, such as a CO BO L. Although these early code-generation tools
sim plified ETL developm ent w ork, they did little to autom ate the runtim e environm ent or
lessen code m aintenance and change control w ork. O ften, adm inistrators had to m anually dis-
tribute and m anage com piled code, schedule and run jobs, or copy and transport files.
Engine-Based ToolsTo autom ate m ore of the ETL process, vendors began delivering “engine-based”products in
the m id to late 1990s that em ployed p roprietary scripting languages running w ithin an ETL
or D BM S server. These ETL engines use language interpreters to process ETL w orkflow s at
runtim e. The ETL w orkflow s defined by developers in the graphical environm ent are stored
in a m eta data repository, w hich the engine reads at runtim e to determ ine how to process
incom ing data.
Although this interpretive approach m ore tightly unifies design and execution environm ents, it
doesn’t necessarily elim inate all custom coding and m aintenance. To handle com plex or
unique requirem ents, developers often resort to coding custom routines and exits that the tool
ETL Engines InterpretDesign Rules atRuntime
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 9/40
THE DATA WAREHOUSING INSTITUTE www.dw-institute.com 7
The Evolu tio n of ETL
accesses at runtim e. These user-developed routines and exits increase com plexity and m ainte-
nance, and therefore should be used sparingly.
All Processing Occurs in the Engine.Another significant characteristic of an engine-based
approach is that all processing takes place in the engine, not on source system s (although hybrid
architectures exist). The engine typically runs on a W indow s or UN IX m achine and establishesdirect connections to source system s. If the source system is non-relational, adm inistrators use a
third-party gatew ay to establish a direct connection or create a flat file to feed the ETL engine.
Som e engines sup port parallel processing, w hich enables them to process ETL w orkflow s in
parallel across m ultiple hardw are processors. O thers require adm inistrators to m anually define
a parallel processing fram ew ork in advance (using partitions, for exam ple)—a m uch less
dynam ic environm ent.
Code Generators versus EnginesThere is still a debate about w hether ETL engines or code generators, w hich have im proved
since their debut in the m id-1990s, offer the best functionality and perform ance.
Benefits of Code Generators.“The benefit of code generators is that they can handle m orecom plex processing than their engine-based counterparts,”says Steve Tracy, assistant director
of inform ation delivery and application strategy at H artford Life Insurance in Sim sbury, CT.
“M any engine-based tools typically w ant to read a record, then w rite a record, w hich lim its
their ability to efficiently handle certain types of transform ations.”
Consequently, code generators also elim inate the need for developers to m aintain user-w ritten
routines and exits to handle com plex transform ations in ETL w orkflow s. This avoids creating
“blind spots”in the tools and their m eta data.
In addition, code generators produce com piled code to run on various platform s. Com piled
code is not only fast, it also enables organizations to distribute processing across m ultiple plat-
form s to optim ize perform ance.
“Because the [code generation] product ran on m ultiple platform s, w e did not have to buy a
huge server,”says K evin Light, m anaging consultant of business intelligence solutions at ED S
Canada. “Consequently, our client’s overall capital outlay [for a code generation product] w as
less than half w hat it w ould cost them to deploy a com parable engine-based product.”
Benefits of Engines. Engines, on the other hand, concentrate all processing in a single server.
Although this can becom e a bottleneck, adm inistrators can optim ally configure the engine
platform to deliver high perform ance. Also, since all processing occurs on one m achine,
adm inistrators can m ore easily m onitor perform ance and quickly upgrade capacity if needed
to m eet system level agreem ents.
This approach also rem oves political obstacles that arise w hen ETL adm inistrators try to dis-
tribute transform ation code to run on various source system s. By offloading transform ationprocessing from source system s, ETL engines interfere less w ith critical runtim e operations.
Enhanced Graphical Development.In addition, m ost engine-based ETL products offer visual
developm ent environm ents that m ake them easier to use. These graphical w orkspaces enable
developers to create ETL w orkflow s com prised of m ultiple ETL objects for defining data m ap-
pings and transform ations. (See Illustration S2.) Although code generators also offer visual
developm ent environm ents, they often aren’t as easy to use as engine-based tools, lengthening
overall developm ent tim es, according to m any practitioners.
ETL Engines CentralizeProcessing in a High-Performance Server
Code GeneratorsEliminate the Needto Maintain CustomRoutines and Exits
Visual DevelopmentEnvironmentsHelp ManageComplex J obs
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 10/40
8 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Database-Centric ETLToday, several D BM S vendors em bed ETL capabilities in their D BM S products (as w ell as
O LA P and data m ining capabilities). Since these vendors offer ETL capabilities at little or no
extra charge, organizations are seriously exploring this option because it prom ises to reduce
costs and sim plify their BI environm ents.
In short, users are now asking, “W hy should w e purchase a third-party ETL tool w hen w e can
get ETL capabilities for free from our database vendor of choice? W hat are the additional ben-
efits of buying a third-party ETL tool?”
To address these questions, organizations first need to understand the level and type of ETL
processing that database vendors support. Today, there are three basic groups:
Cooperative ETL. H ere, third-party ETL tools can leverage com m on database functionality to
perform certain types of ETL processing. For exam ple, third-party ETL tools can leverage stored
procedures and enhanced SQ L to perform transform ations and aggregations in the database
w here appropriate. This enables third-party ETL tools to optim ize perform ance by exploiting
the optim ization, parallel processing, and scalability features of the D BM S. It also im proves
recoverability since stored procedures are m aintained in a com m on recoverable data store.
Complementary ETL. Som e database vendors now offer ETL functions that m irror features
offered by independent ET L vendors. For exam ple, m aterialized view s create sum m ary tables
that can be autom atically updated w hen detailed data in one or m ore associated tables
change. Sum m ary tables can speed up query perform ance by several orders of m agnitude. In
addition, som e D BM S vendors can issue SQ L statem ents that interact w ith W eb Services-basedapplications or m essaging queues, w hich are useful w hen building and m aintaining near-real-
tim e data w arehouses.
Competitive ETL. M ost database vendors offer graphical developm ent tools that exploit the ETL
capabilities of their database products. These tools provide m any of the features offered by
third-party ETL tools, but at zero cost, or for a license fee that is a fraction of the p rice of
independent tools. At present, how ever, database-centric ETL solutions vary considerably in
quality and functionality. Third-party ETL tools still offer im portant advantages, such as the
Illustration S2.Many ETL tools provide a graphical interface for mapping sources to targets and managing complex workflows.
ETL Visual Mapping Interface
“Why Purchasean ETL Tool WhenOur DatabaseBundles One?”
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 11/40
THE DATA WAREHOUSING INSTITUTE www.dw-institute.com 9
The Evolu tio n of ETL
range of data sources supported, transform ation pow er, and adm inistration. Today, database-
centric ETL products are useful for building departm ental data m arts, but this w ill change over
tim e as D BM S vendors enhance their ETL capabilities.
In sum m ary, database vendors currently offer ETL capabilities that both enhance and com pete
w ith independent ET L tools. W e expect database vendors to continue to enhance their ET Lcapabilities and com pete aggressively to increase their share of the ETL m arket.
Data Integration Platforms
D uring the past several years, business requirem ents for BI projects have expanded dram ati-
cally, placing new dem ands on ETL tools. These requirem ents are outlined in the follow ing
section (See “ETL Trends and Requi rements ”). Consequently, ETL vendors have begun to dra-
m atically change their products to m eet these requirem ents. The result is a new generation
ETL tool that TD W I calls a da ta in teg ra t ion p la t fo r m .
As a platform, this em erging generation of ETL tools provides greater perform ance, through-
put, and scalability to process larger volum es of data at higher speeds. To deal w ith shrinking
batch w indow s, these platform s also load data w arehouses m ore quickly and reliably, usingchange data capture techniques, continuous processing, and im proved runtim e operations.
The platform s also provide expanded functionality, especially in the areas of data quality,
transform ation pow er, and adm inistration.
As a data integration hub, these products connect to a broader array of databases, system s,
and applications as w ell as other integration hubs. They capture and process data in batch or
real tim e using either a hub-and-spoke or peer-to-peer inform ation delivery architecture. They
coordinate and exchange m eta data am ong heterogeneous system s to deliver a highly integrat-
ed environm ent that is easy to use and adapts w ell to change.
Solution Sets. U ltim ately, data integration platform s are designed to m eet a larger share of an
organization’s BI requirem ents. To do this, som e ETL vendors are extending their product
lines horizontally , adding data quality tools and near-real-tim e data capture facilities to pro-
vide a com plete data m anagem ent solution. O thers are extending vertically , adding analytical
tools and applications to provide a com plete B I solution.
ETL versus EAITo be clear, ETL tools are just one of several technologies vying to becom e the preem inent
enterprise data integration platform . O ne of the m ost im portant technologies is enterprise
application integration (EAI) softw are, such as BEA System s, Inc.’s W ebLogic Platform and
Tibco Softw are Inc.’s ActiveEnterprise.
EAI softw are enables developers to create real-tim e, event-driven interfaces am ong disparate
transaction system s. For exam ple, during the Internet boom , com panies flocked to EAI tools to
connect e-com m erce system s w ith back-end inventory and shipping system s to ensure thatW eb sites accurately reflected product availability and delivery tim es.
Although EAI softw are is event-driven and supports transaction-style processing, it does not
generally have the sam e transform ation pow er as an ETL tool. O n the other hand, m ost ETL
tools do not support real-tim e data processing. To overcom e these lim itations, som e ETL and
EAI vendors are now partnering to provide the best of both approaches. EAI softw are cap-
tures data and application events in real tim e and passes them to the ETL tools, w hich trans-
form the data and loads it into the B I environm ent.
Database Vendors Both
Enhance and Competewith ETL Vendors
Higher Performance,Complete Solution
ETL Vendors AreExpanding Vertically
and Horizontally
Some ETL and EAIVendors ArePartnering...
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 12/40
10 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Already, som e ETL vendors have begun incorporating EAI functionality into their core engines.
These tools contain adapters to operational applications and operate in an “alw ays aw ake”m ode
so they can receive and process data and events generated by the applications in real tim e.
Future Reality. It w on’t be long before all ETL and EAI vendors follow suit, integrating both
ETL and EAI capabilities w ithin a single product. The w inners in this race w ill have a substan-tial advantage in vying for delivery of a com plete, robust data integration platform .
ETL Trends and Requirements
As m entioned earlier, business requirem ents are fueling the evolution of ETL into data integra-
tion platform s. Business users are dem anding access to m ore tim ely, detailed, and relevant
inform ation to inform critical business decisions and drive fast-m oving business processes.
Large Volumes of DataM ore business users today w ant access to m ore granular data (e.g., transactions) and m ore
years of history across m ore subject areas than ever before. N ot surprisingly, data w arehouses
are exploding in size and scope. Terabyte data w arehouses—a rarity several years ago—are
now fairly com m onplace. The largest data w arehouses exceed 50 terabytes of raw data, and
the size of these data w arehouses is putting enorm ous pressure on ETL adm inistrators to
speed up ETL processing.
O ur survey indicates that the percentage of organizations loading m ore than 500 gigabytes
w ill triple in 18 m onths, w hile those loading less than 1 gigabyte w ill decrease by a third.
(See Illustration 1.)
Diverse Data SourcesO ne reason for increased data volum es is that business users w ant the data w arehouse to cull data
from a w ider variety of system s. According to our survey, on average, organizations now extract
data from 12 distinct data sources. This average w ill inexorably increase over tim e as organizationsexpand their data w arehouses to support m ore subject areas and groups in the organization.
Although alm ost all com panies use ETL to extract data from relational databases, flat files, and
legacy system s, a significant percentage now w ant to extract data from application packages,
such as SAP R/3 (39 percent), XM L files (15 percent), W eb-based data sources (15 percent),
and EAI softw are (12 percent). (See Illustration 2.)
“Spreadmarts.” In addition, m any respondents also w rote in the “other”category that they w ant
ETL tools to extract data from M icrosoft Excel and Access files. In m any com panies, critical data
ETL and EAI WillConverge
More History, MoreDetail, More SubjectAreas
Illustration 1. The average data load will increase significantly in the next 18 months. Based on 756 respondents.
TODAY 18 MONTHS
Less than 1GB 59% 40%
1GB to 500GB 38% 50%
500GB+ 3% 10%
Eighteen-Month Plans for Data Load Volumes
On Average, DataWarehouses Cull Data
from 12 DistinctSources
Excel SpreadsheetsAre a Critical DataSource
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 13/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 11
ETL Trends & Requirements
is locked up in these personal data stores, w hich are controlled by executives or m anagers run-
ning different divisions or departm ents. Each group populates these data stores from different
sources at different tim es using different rules and definitions. As a result, these spreadsheets
becom e surrogate independent data m arts—or “spreadm arts”as TD W I calls them . The prolifer-
ation of spreadm arts precipitates organizational feuds (“dueling spreadsheets”) that can paralyze
an organization and drive a CEO to the brink of insanity.
Shrinking Batch WindowsW ith the expansion in data volum es, m any organizations are finding it virtually im possible to
load a data w arehouse in a single batch w indow . There are no longer enough hours at night
or on the w eekend to finish loading data w arehouses containing hundreds of gigabytes or
terabytes of data.
24x7 Requirements. Com pounding the problem , batch w indow s are shrinking or becom ing
non-existent. This is especially true in large organizations w hose data w arehouses and opera-
tional system s span regions or continents and m ust be available around the clock. There is no
longer any “system s dow ntim e”to execute tim e consum ing extracts and loads.
Variable Update Cycles.Finally, the grow ing num ber and diversity of sources that feed the
data w arehouse m ake it difficult to coordinate a single batch load. Each source system oper-
ates on a different schedule and each contains data w ith different levels of “freshness”or rele-
vancy for different groups of users. For exam ple, retail purchasing m anagers find little value in
sales data if they can’t view it w ithin one day, if not im m ediately. Accountants, how ever, m ay
w ant to exam ine general ledger data at the end of the m onth.
Operational Decision MakingAnother reason that ETL tools m ust support continuous extract/load processing is that users w ant
tim elier data. D ata w arehouses traditionally support strategic or tactical decision m aking based on
historical data com piled over several m onths or years. Typical strategic or tactical questions are:
•“What a re our year -over-year trend s?”
•“How close are we to meeting thi s mon th’s revenue goals?”
•“If we reduce pri ces, what impact wi ll i t have on our sales and inventory for the next
thr ee months?”
Illustration 2. Types of data sources that ETL programs process. Multi -choice question, based on 755 respondents.
39%
15%
65%
15%
4%
12%
81%
89%
15%
Replication or change data capture utilities
Web
Other
EAI/messaging software
Relational databases
Packaged applications
Mainframe/legacy systems
XML
Flat files
0 20 40 60 80 100
Data Sources
More Data, Less Time
Data Has Different“Expiration Dates”
Users Want TimelierData
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 14/40
12 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Near-real-timeLoads Will Triplein 18 Months
Users Want ETLVendors to Offer
Data Quality Tools
Operational BI. H ow ever, business users increasingly w ant to m ake operational decisions
based on yesterday’s or today’s data. They are now asking:
•“Based on yesterday’s sales, which products shou ld we shi p to which stores at what pri ces
to max imi ze today’s revenues?”
•“What is the most profitable way to reroute packages off a truck that broke down this mornin g?”
•“What produ cts and di scounts should I offer to the customer that I’m cu rr entl y speaking
wi th on the phone?”
According to our survey, m ore than tw o-thirds of organizations already load their data w are-
houses on a nightly basis. (See Illustration 3.) H ow ever, in the next 18 m onths, the num ber of
organizations that w ill load their data w arehouses m ultiple tim es a day w ill double, and the
num ber that w ill load data w arehouses in near real tim e w ill triple!
To m eet operational decision-m aking requirem ents, ETL processing w ill need to support tw o types
of processing: (1) large batch ETL jobs that involve extracting, transform ing, and loading a signifi-
cant am ount of data, and (2) continuous processing that captures data and application eventsfrom source system s and loads them into data w arehouses in rapid intervals or near real tim e.
M ost organizations w ill need to m erge these tw o types of processing, w hich is com plicated
because of the num erous interdependencies that adm inistrators m ust m anage to ensure consis-
tency of inform ation in the data w arehouse and reports.
Data Quality Add-OnsM any organizations w ant to purchase com plete solutions, not technology or tools for building
infrastructure and applications.
Add-on Products. W hen asked w hich add-on products from ETL vendors are “very im portant,”
users expressed significant interest in data cleansing and profiling tools. (See Illustration 4.)
This is not surprising since m any B I projects fail because the IT departm ent or (m ore thanlikely) an outside consultancy underestim ates the quality of source data.
Code, Load, and Explode. A com m on scenario is the “code, load, and explode”phenom enon.
This is w here ETL developers code the extracts and transform ations, then start processing only
to discover an unacceptably large num ber of errors due to unanticipated values in the source
data files. They fix the errors, rerun the ETL process, only to find m ore errors, and so on. This
ugly scenario repeats itself until the project deadlines and budgets becom e im periled and
angry business sponsors halt the project.
Illustration 3. Based on 754 respondents.
TODAY IN 18 M ONTHS
M onthly 32% 27%
W eekly 34% 29%Daily/ night ly 69% 65%
M ult iple t imes per day 15% 30%
Near real t ime 6% 19%
Data Warehouse Load Frequency
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 15/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 13
ETL Trends & Requirements
“In m y experience, 10 percent [of the tim e and cost associated w ith ETL] is learning the ETL tool the
first tim e out and m aking it conform to your w ill,”said John M urphy, an IT professional in a new s-
group posting. “N inety percent is the interm inable iterative process of learning just how little you
know about the source data, how dirty that data really is, how difficult it is to establish a cleansingpolicy that results in certifiably correct data on the target that m atches the source, and how long
it takes to re-w ork all the assum ptions you m ade at the outset w hile you w ere still an innocent.”
The problem w ith current ETL tools is that they assume the data they receive is clean and
consistent. They can’t assess the consistency or accuracy of source data and they can’t handle
specialized cleansing routines, such as nam e and address scrubbing. To m eet these needs, ETL
vendors are partnering w ith or acquiring data quality vendors to integrate specialized data
cleansing routines w ithin ETL w orkflow s.
Meta Data Management
As the num ber and variety of source and target system s proliferate, BI m anagers are seeking
autom ated m ethods to docum ent and m anage the interdependencies am ong data elem ents inthe disparate system s. In short, they w ant a global m eta data m anagem ent solution.
As the hub of a BI project, an ETL tool is w ell positioned to docum ent, m anage, and coordi-
nate inform ation about data w ithin data m odeling tools, source system s, data w arehouses, data
m arts, analytical tools, portals, and even other ETL tools and repositories. M any users w ould
like to see ETL tools play a m ore pivotal role in m anaging global m eta data, although others
think ETL tools w ill alw ays be inherently unfit to play the role of global traffic cop.
“Certainly, ETL vendors should store m eta data for its ow n dom ain, but if they start trying to
becom e a global m eta data repository for all BI tools, they w ill have trouble,”says Tracy from
H artford Life. “I’d prefer they focus on im proving their perform ance, m anageability, im pact
analysis reports, and ability to support com plex logic.”
Enforce Data Consistency. N evertheless, large com panies w ant a global m eta data repository to
m anage and distribute “standard”data definitions, rules, and other elem ents w ithin and
throughout a netw ork of interconnected BI environm ents. This global repository could help
ensure data consistency and a “single version of the truth”throughout an organization.
Although ETL vendors often pitch their global m eta data m anagem ent capabilities, m ost tools
today fall short of user expectations. (And even single vendor all-in-one B I suites fall short.)
W hen asked in an op en-ended question, “W hat w ould m ake your ETL tool better?”m any sur-
vey respondents w rote “better m eta data m anagem ent”or “enhanced m eta data integration.”
Ill ustration 4.A significant percentage of users want ETL tools to offer data cleansing and profiling capabil ities. Based on
740 respondents.
10%
9%
21%
24%
43%
Packaged analytic reports
Data cleansing tools
Packaged data models
Analytic tools
Data profiling tools
0 10 20 30 40 50
Rating the Importance of Add-On Products
Lack of Knowledgeabout Source DataCripples Projects
ETL Tools Assumethe Data Is Clean
Users Want ETL Toolsto Play a Pivotal Rolein Meta Data
Most ETL Tools FallShort of Expectations
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 16/40
14 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Packaged Solutions Can
Speed Deployment
“I don’t know if ETL tools should be used for a [BI] m eta data repository, but no one else is
stepping up to the plate,”says ED S C anada’s Light. Light says the gap betw een expectation
and reality som etim es puts him as a consultant in a tough position. “M any clients think the
ETL tools can do it all, and w e [as consultants] end up as the m eat in a sandw ich trying to
force a tool to do som ething it can’t.”
Packaged Solutions
Besides add-on products, an increasing num ber of organizations are looking to buy com pre-
hensive, off-the-shelf packages that provide near instant insight to their data. For instance,
m ore than a third of respondents (39 percent) said they are m ore attracted to vendors that
offer ETL as part of a com plete application or analytic suite. (See Illustration 5.)
K err-M cG ee C orporation, an energy and chem icals firm in O klahom a City, purchased a pack-
aged solution to support its O racle Financials application. “W e had valid num bers w ithin aw eek, w hich is incredible,”says B rian M orris, a data w arehousing specialist at the firm . “W e
don’t have a lot of tim e so anything that is pre-built is helpful.”
A packaged approach excels, as in Kerr-M cG ee’s case, w hen there is a single source, w hich is
a packaged application itself w ith a docum ented data m odel and A PI. The package K err-
M cG ee used also supplied analytic reports, w hich the firm did not use since it already had
reports defined in another reporting tool.
Some Skepticism. H ow ever, there is skepticism am ong som e users about ETL vendors w ho w ant
to deliver end-to-end BI packages. “W e w ant ETL vendors to focus on their core com petency, not
their brand,”says Cynthia C onnolly, vice president of application developm ent at Alliance Capital
M anagem ent. “W e w ant them to enhance their ETL products, not branch off into new areas.”
Enterprise InfrastructureMulti-Faceted Usage. Finally, organizations also w ant to broaden their use of ETL to support
additional applications besides B I. (See Illustration 6.) In essence, they w ant to standardize on
a single infrastructure product to support all their enterprise data integration requirem ents.
A m ajority of organizations already use ETL tools for BI, data m igration, and conversion proj-
ects. But, a grow ing num ber w ill use ETL tools to support application integration and m aster
reference data projects in the next 18 m onths.
Illustration 5. Analytic suites. Based on 747 respondents.
Yes (39%)
No (43%)
Not sure (18%)
39%
43%
18%
Analytic Suites
“ Does a vendor’s ability to offer ETL as part of a c omplete analytic suite make
the vendor and its produc ts more attrac tive to your team?”
Organizations Want aStandard DataIntegrationInfrastructure
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 17/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 15
Bui ld or Buy?
SUMMARYAs data w arehouses am ass m ore data from m ore sources, BI team s need ETL tools to process
data m ore quickly, efficiently, and reliably. They also need ETL tools to assum e a larger role in
coordinating the flow of data am ong source and target system s and m anaging data consistency
and integrity. Finally, m any w ant ETL vendors to provide m ore com plete solutions, either to
m eet an organization’s enterprise infrastructure needs or deliver an end-to-end BI solution.
Build or Buy?
First Decision. As BI team s exam ine their data integration needs, the first decision they need to
m ake is w hether to build or buy an ETL tool. D espite the ever-im proving functionality offered
by vendor ETL tools, there are still m any organizations that believe it is better to hand w rite
ETL program s than use off-the-shelf ETL softw are. These com panies point to the high cost ofm any ETL tools and the abundance of program m ers on their staff to justify their decision.
Strong Market to “Buy.” N onetheless, the vast m ajority of BI team s (82 percent) have pur-
chased a vendor ETL tool, according to our survey. O verall, 45 percent use vendor tools
exclusively, w hile 37 percent use a com bination of tools and custom ETL code. O nly 18 per-
cent exclusively w rite ETL program s from scratch. (See Illustration 7.)
Illustration 6. Besides BI, organizations want to use ETL products for a variety of data integration needs. Based on 740 responses.
40%
45%
81%
87%Data warehousing/business intelligence
Reference data integration(e.g., customer hubs)
Application integration
Data migration/conversion
0 20 40 60 80 100
Today
In 18 months
27%
29%
10%
11%
Current and Planned Uses of ETL
Illustration 7. The vast majori ty of organizations building data warehouses have purchased an ETL tool from a vendor. Based
on 761 responses.
Build only (18%)
Buy only (45%)
Build & buy (37%)
18%
45%
37%
Build or Buy?
Most Companies BuyETL Tools
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 18/40
16 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
W hy Buy?
Maintaining CustomCodeThe prim ary reason that organizations purchase vendor ETL tools is to m inim ize the tim e and
cost of developing and m aintaining proprietary code.
“W e need to expand the scale and scope of our data w arehouse and w e can’t do it by hand-
cranking code,”says Richard Bow les, data w arehouse developer at Safew ay Stores plc in
London. Safew ay is currently evaluating vendor ETL tools in preparation for the expansion of
its data w arehouse.
The problem w ith Safew ay’s custom CO BO L program s, according to Bow les, is that it takes
too m uch tim e to test each program w henever a change is m ade to a source system or target
data w arehouse. First, you need to analyze w hich program s are affected by a change, then
check business rules, rew rite the code, update the C O BO L copybooks, and finally test and
debug the revised program s, he says.
“Custom code equals high m aintenance. W e w ant our team focused on building new routines
to expand the data w arehouse, not preoccupied w ith m aintaining existing ones,”Bow les says.
Tools that Speed Deployment Save Money.M any B I team s are confident that vendor ETL tools
can better help them deliver projects on tim e and w ithin budget. These m anagers ranked
“speed of deploym ent”as the num ber one reason they decided to purchase an ETL tool,
according to our survey. (See Table 1.)
K en K irchner, a data w arehousing m anager at W erner Enterprises, Inc., in O m aha, N E, is using
an ETL tool to replace a team of outside consultants that botched the firm ’s first attem pt at
building a data w arehouse.
“O ur consultants radically underestim ated w hat needed to be done. It took them 15 m onths to
code 130 stored procedures,”says Kirchner. “A [vendor ETL] tool w ill reduce the num ber of
people, tim e, and m oney w e need to deliver our data w arehouse.”
ED S C anada’s Light says in m any situations, clients have older ETL environm ents w here the
business rules are encapsulated in CO BO L or other 3G Ls, m aking it im possible to do im pact
analysis. M oving to an engine-based tool allow s the business rules to be m aintained in a sin-
gle location w hich facilitates m aintenance.”
Table 1. Ranking of reasons for buying an ETL tool.
To deploy ETL more quickly
It had all the functionality we need
Help us manage meta data
Build a data integration infrastructure
1
2
3
4
0 1000 2000 3000 4000 5000
Ranking Score
Reasons for Buying an ETL Tool - Ranking
Custom Code IsLaborious to Modify
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 19/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 17
Bui ld or Buy?
W hy Build?
Cheaper and Faster. H ow ever, som e organizations believe it is cheap er and quicker to code
ET L program s than use a vendor ETL tool. (See Table 2.)
“Custom softw are is w hat saves us tim e and m oney, even in m aintenance, because it’s gearedto our business,”says M ike G albraith, director of G lobal Business System s D evelopm ent at
Tyco Electronics in H arrisburg, PA .
The key for ET L program s—or any softw are develop m ent project—is to w rite good code,
G albraith says. “W e use softw are engineering techniques. W e w rite our code from specifica-
tions and a m eta data m odel. Everything is self docum enting and that has saved us a lot.”
H ubert G oodm an, director of business intelligence at Cum m ins, Inc., a leading m aker of diesel
engines, says the cost to m aintain his group’s custom ETL code is less than the annual m ainte-
nance fee and training costs that another departm ent is paying to use a vendor ETL tool.
G oodm an’s team uses object-oriented techniques to w rite efficient, m aintainable PL/SQ L code.
To keep costs dow n and speed turnaround tim es, he outsources ETL code developm ent over-
seas to program m ers in India. (Som e practitioners say it is im practical to outsource G U I-based
ETL developm ent because of the interactive nature of such tools.)
The Hard Job of ETL.Several BI m anagers also said vendor ETL tools don’t address the really
challenging aspects of m igrating source data into a data w arehouse, specifically how to identi-
fy and clean dirty data, build interfaces to legacy system s, and deliver near-real-tim e data. For
these m anagers, the m ath doesn’t add up—w hy spend $200,000 or m ore to purchase an ETL
tool that only handles the “easy”w ork of m apping and transform ing data?
Build and Buy
Slow Migration to ETL Packages. D espite these argum ents, there is a slow and steady m igration
aw ay from w riting custom ETL program s. M ore than a quarter (26 percent) of organizations that
use custom code for ETL w ill soon replace it w ith a vendor ETL tool. Another 23 percent are
planning to augm ent their custom code w ith a vendor ETL tool. (See Illustration 8.)
W hen asked w hy they are sw itching, m any B I m anagers said that “vendor ETL tools didn’t
exist”w hen they first im plem ented their data w arehouse. O thers said they initially couldn’t
afford a vendor ETL tool or didn’t have tim e to investigate new tools or train team m em bers.
Typically, these team s leave existing hand code in p lace and use the ETL package to support
a new project or source system . O ver tim e, they displace custom code w ith the vendor ETL
Table 2. Ranking of reasons for buying an ETL tool. Based on respondents who code ETL exclusively.
Less expensive
Our culture encourages in-house development
Build a data integration infrastructure
Deploy more quickly
0 500 1000 1500 2000
1
2
3
4
Ranking Score
Reasons for Coding ETL Programs - Ranking
Cummins OutsourcesETL Coding Overseasto Speed Development
Firms Are GraduallyReplacing HomeGrown Code
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 20/40
18 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
tool as their team gains experience w ith the tool and the tool proves it can deliver adequate
perform ance and functionality.
Mix and Match. Som e team s, how ever, don’t plan to abandon custom code. Their strategy is
to use w hatever approach m akes sense in each situation. These team s often use custom code
to handle processes that vendor ETL tools don’t readily support out of the box.
For exam ple, Preetam Basil, a Priceline.com data w arehousing architect, points out that
Priceline.com uses a vendor ETL tool for the bulk of its ETL processing, but it has developed
U N IX and SQ L code to extract and scrub data from its volum inous W eb logs.
User Satisfaction w ith ETL Tools
O verall, m ost BI team s are generally pleased w ith their ETL tools. According to our survey, 28
percent of data w arehousing team s are “very satisfied”w ith their vendor ETL tool, and 51 per-
cent are “m ostly satisfied.”Vendor ETL tools get a slightly higher satisfaction rating than hand-
coded ETL program s. (See Illustration 9.)
A m ajority of organizations (52 percent) plan to upgrade to the next version of their ETL tool
w ithin 18 m onths, a sign that the team is com m itted to the product. Another 22 percent w ill
m aintain the product “as is”during the period. A sm all fraction plan to replace their vendor ETL
tool w ith another vendor ETL tool (7 percent) or custom code (2 percent). (See Illustration 10.)
Illustration 8. Almost half (49 percent) of respondents plan to replace or augment their custom code with a vendor ETL tool.
Based on 415 respondents.
Continue to enhance the custom code (43%)
Replace it with a vendor ETL tool (26%)
Augment it with a vendor ETL tool (23%)43%
26%
23%
7%2%
Outsource it to a third party (2%)
Other (7%)
Eighteen-Month Plans for Your ETL Custom Code?
Users Are “MostlySatisfied” withETL Tools
Strategy: Custom Code
Where Appropriate
Illustration 9. Vendor ETL tools have a higher satisfaction rating than custom ETL programs for organizations that use either vendor ETL
tools or custom code exclusively. Based on 341 and 134 responses, respectively.
Very Satisfied
Mostly Satisfied
Somewhat Satisfied
46%
24%
12%
18%
Not Very Satisfied
51%
16%
5%
28%
0 10 20 30 40 50 60
Vendor ETL tool
Custom code
Satisfaction Rating: Vendor Tools versus Custom Code
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 21/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 19
Bui ld or Buy?
Very few organizations have abandoned vendor ETL tools. M ost have purchased one tool and
continue to use it in a production environm ent. (See Illustrations 11a and 11b.) In other
w ords, there is very little ETL shelfw are.
Chall enges in Deploying ETL
D espite this rosy picture, BI team s encounter num erous challenges w hen deploying vendor
ETL tools. The top tw o challenges are “ensuring adequate data quality”and “understanding
source data,”according to our survey. (See Table 3.) Although these problem s are not neces-
sarily a function of the ETL tool or code, they do represent an opportunity for ETL vendors to
expand the capabilities of their toolset to m eet these needs.
Complex Transformations. The next m ost significant challenge is designing com plex trans-
form ations and m appings. “The goal of [an ETL] tool is to m inim ize the am ount of code
you have to w rite outside of the tool to handle transform ations,”says A lliance Capital
M anagem ent’s Connolly.
U nfortunately, m ost ETL tools fall short in this area. “W e w ish there w as m ore flexibility in
w riting transform ations,”says B rian K oscinski, ETL project lead at Am erican Eagle O utfitters in
W arrendale, PA. “W e often need to drop a file into a tem porary area and use our ow n code to
sort the data the w ay w e w ant it before w e can load it into our data w arehouse.”
Illustration 11a. Most BI teams only use one ETL tool. Based on 617 responses. Illustration 11b. Very few ETL tools go unused. Based on 617 responses.
ETL Products in Production
Number of ETL Tools
0 (6%)
1 (60%)
2 (25%)
60%
25%
6% 6%3%
3 (6%)
4+(3%)
0 (81%)
1 (15%)
2 (4%)
81%
15%4%1%
3 (1%)
ETL Shelfware
“ How many vendor ETL tools has your teampurcha sed that it does not use?”
Illustration 10.Most teams are committed to their current vendor ETL tool. Based on 622 responses.
Upgrade to the next or latest version (52%)
Maintain as is (22%)
Augment it with custom code or functions (11%)
52%
22%
11%
3%7%
Replace it with a tool from another vendor (7%)
Augment it with a tool from another vendor (3%)
2%4%
Replace it with hand-written code (2%)
Outsource it to a third party (0%)
Other (4%)
Eighteen-Month Plans for Your Vendor ETL Tool
Data Quality Issues Top all Challenges
Goal: Minimize CodeWritten Outside theETL Tool
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 22/40
20 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
According to our survey, only a sm all percentage of team s (11 percent) are currently extend-
ing vendor ETL tools w ith custom code. The rest are using a “reasonable am ount”of custom
code or “none at all”to augm ent their vendor ETL tools.
Skilled ETL Developers.Another significant challenge, articulated by m any survey respondents and
users interview ed for this report, is finding and training skilled ETL program m ers. M any say the
skills of ETL developers and fam iliarity w ith the tools can seriously affect the outcom e of a project.
Priceline.com ’s Basil agrees. “Initially, w e didn’t have ETL developers w ho w ere know ledge-
able about data w arehousing, database design, or ETL p rocesses. As a result, they built ineffi-
cient processes that took forever to load.”
ED S C anada’s Light says it’s im perative to hire experienced ETL developers at the outset of a
project, no m atter w hat the cost, to establish developm ent standards. O therw ise, the “prob-
lem s encountered dow n the line w ill be m uch larger and harder to grapple w ith.”O thers add
that it’s w ise to hire at least tw o ETL developers in case one leaves in m idstream .
M ost BI m anagers are frustrated by the tim e it takes developers to learn a new tool (about three
m onths) and then becom e proficient w ith it (12 or m ore m onths.) Although the skill and experi-
ence of a developer counts heavily here, a good ETL developm ent environm ent—along w ith
high-quality training and support offered by the vendor—can accelerate the learning curve.
The Temptation to Code. D evelopers often com plain that it takes m uch longer to learn the ETL
tool and m anipulate transform ation objects than sim ply coding the transform ations from
scratch. And m any give in to tem ptation. But perseverance pays off. D evelopers are five to
six tim es m ore productive once they learn an ET L tool com pared to using a com m and-line
interface, according to Pieter M im no.
Table 3. Data quality issues are the top two challenges facing ETL developers.
Ensuring adequate data quality
Creating complex mappings
Creating complex transformations
Understanding source data
Ensuring adequate performance
Providing access to meta data
Finding skilled ETL programmers
Collecting and maintaining meta data
Ensuring adequate scalability
Loading data in near real time
Ensuring adequate reliability
Integrating with third-party tools or applications
Creating and integrating user defined functions
Integrating with load utilities
Debugging programs
0 1000 2000 3000 4000 5000 6000 7000 8000
1
11
10
9
8
7
6
5
4
3
2
15
1413
12
Ranking Score
Ranking of ETL Challenges
Developers Take 12 orMore Months to BeProficient with anETL Tool
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 23/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 21
Bui ld or Buy?
PricingSticker Shock. Perhaps the biggest hurdle w ith vendor ETL tools is their price tag. M any data
w arehousing team s experience sticker shock w hen vendor ETL salespeople quote license fees.
M any team s w eigh the perceived value of a tool against its list price—w hich often exceeds
$200,000 for m any tools—and w alk aw ay from the deal.
“For $200,000 you can buy at least 4,000 m an hours of developm ent and the tool’s ongoing m ain-
tenance fee pays for one-half of a head count,”says data w arehousing consultant D arrell Piatt.
The price of ETL tools is a “difficult pill to sw allow ”for com panies that don’t realize that a
data w arehouse is different from a transaction system , says Bryan LaPlante, senior consultant
w ith Pragm atek, a M inneapolis consultancy that w orks w ith m id-m arket com panies. These
com panies don’t realize that an ETL tool m akes it easier to keep up w ith the continuous flow
of changes that are part and parcel of every data w arehousing program , he says.
But even com panies that understand the im portance of ETL tools find it difficult to open their
checkbooks. “Although ETL tools are m uch m ore functional than they w ere five years ago,
m ost are still not w orth buying at list price,”says Piatt.
Spread the Wealth.To justify the steep costs, m any team s sell the tool internally as an infrastruc-
ture com ponent that can support m ultiple projects. “The initial purchase is expensive, but it is
easier to justify w hen you use it for m ultiple projects,”says H eath H atchett, group m anager of
business intelligence engineering at Intuit in M ountain View , CA. O thers negotiate w ith vendors
until they get the price they w ant. “It’s a buyers’m arket right now ,”says Safew ay’s Bow les.
G iven the current econom y and the increasing m aturity of the ETL m arket, the list prices for ETL
products are starting to fall. The dow nw ard pressure on prices w ill continue for the next few
years as softw are giants bundle robust ETL program s into core data m anagem ent program s,
som etim es at no extra cost, and BI vendors package ETL tools inside of com prehensive B I suites.
In addition, a few start-ups now offer sm all-footprint products at extrem ely affordable prices.
Recommendations
Every B I project team has different levels and types of resources to m eet different business
requirem ents. Thus, there is no right or w rong decision about w hether to build or buy ETL capa-
bilities. But here are som e general recom m endations to guide your decision based on our research.
Buy If You …• Want to Minimize Hand Coding. ET L tools enable developers to create ETL w ork-
flow s and objects using a graphical w orkbench that m inim izes hand coding and the
problem s that it can create.
• Want to Speed Project Deployment. In the hands of experienced ETL program m ers,
an ETL tool can boost productivity and deploym ent tim e significantly. (H ow ever, be
careful, these productivity gains don’t happen until ETL program m ers have w orked w itha tool for at least a year.)
• Want to Maintain ETL Rules Transparently. Rather than bury ETL rules w ithin code,
ETL tools store target-to-source m appings and transform ations in a m eta data repository
accessible via the tool’s G U I and/or an A PI. This reduces dow nstream m aintenance
costs and insulates the team w hen key ETL program m ers leave.
•Have Unique Requirements the Tool Supports. If the tool supports your unique
requirem ents—such as providing adapters to a packaged application your com pany just
purchased—it m ay be w orthw hile to invest in a vendor ETL tool.
Hard to J ustify thePrice Compared toUsing Programmers
Downward PricingPressures Now Exist
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 24/40
22 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
•Need to Standardize Your BI Architecture. Purchasing a scalable ETL tool w ith suffi-
cient functionality is a good w ay to enable your organization to establish a standard
infrastructure for BI and other data m anagem ent projects. Reusing the ETL tool in m ulti-
ple projects helps to justify purchasing the tool.
Build If You …•Have Skilled Programmers Available. Building ETL program s m akes sense w hen
your IT departm ent has skilled program m ers available w ho understand BI and are not
bogged dow n delivering other projects.
•Practice Sound Software Engineering Techniques. The key to coding ETL is w riting
efficient code that is designed from a com m on m eta m odel and rigorously docum ented.
•Have a Solid Relationship with a Consulting Firm. Consultancies that you’ve w orked
w ith successfully in the past and understand data w arehousing can efficiently w rite and
docum ent ETL program s. (Just m ake sure they teach you the code before they leave!)
•Have Unique Functionality. If you have a significant num ber of com plex data sources
or transform ations that ETL tools can’t support out of the box, then coding ETL pro-
gram s is a good option.
•Have Existing Code. If you can reuse existing, high-quality code in a new project,
then it m ay be w iser to continue to w rite code.
Data Integration Platforms
As m entioned earlier, m any vendors are extending their ETL products to m eet new user
requirem ents. The result is a new generation of ETL products that TD W I calls data in tegration
platforms . These products extend ETL tools w ith a variety of new capabilities, including data
cleansing, data profiling, advanced data capture, increm ental updates, and a host of new
source and target system s. (See Illustration S3.)
W e can divide the m ain features of a data integration platform into platform characteristics and
data integration characteristics. N o ETL product today delivers all the features outlined below .
Illustration S3.Data integration platforms extend the capabilities of ETL tools.
Extract
Administration& operations
services
Transport services
Load
Meta dataimport/export
Databases& files
Source adapters
Transform
Runtimemeta dataservices
Target adapters
Designmanager
Legacyapplications
Applicationintegration software
GlobalMeta datarepository
Applicationpackages
Applicationintegrationsoftware
Web &XML data
Eventdata
Databases& files XML
Adapter
developmentkit
Capture
Update
Clean
Profile
Data Integration Platforms
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 25/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 23
Data Integration Platforms
H ow ever, m ost ETL vendors are m oving quickly to deliver full-fledged data integration platform s.
PlatformCharacteristics:•H igh perform ance and scalability
•Built-in data cleansing and profiling
•Com plex, reusable transform ations•Reliable operations and robust adm inistration
Data Integration Characteristics:•D iverse source and target system s
•U pdate and capture facilities
•N ear-real-tim e processing
•G lobal m eta data m anagem ent
High Performance and Scalability
Parallelization.D ata integration platform s offer exceedingly high throughput and scalability
due to parallel processing facilities that exploit high-perform ance com puting platform s.
The parallel engine processes w orkflow stream s in parallel using m ultiple operating system
threads, m ulti-pass SQ L, and an in-m em ory data cache to hold tem porary data w ithout storing
it to disk. W here dependencies exist am ong or w ithin stream s, the engine uses pipelining to
op tim ize throughput.
The high-perform ance com puting platform supports clusters of servers running m ultiple CPU s
and load balancing softw are to optim ize perform ance under large, sustained loads.
Benchmarks.In the m id-1990s, ETL engines achieved a m axim um throughput of 10 to 15 giga-
bytes of data per hour, w hich w as sufficient to m eet the needs of m ost BI projects at the tim e.
Today, how ever, ETL engines now boast throughput of betw een 100 and 150 gigabytes an
hour or m ore, thanks to steady im provem ents in m em ory m anagem ent, parallelization, distrib-
uted processing, and high-perform ance servers.1
Built-In Data Cleansing and Profiling
To avoid the “code, load, and explode”syndrom e m entioned earlier, good data integration
platform s provide built-in support for data cleansing and profiling functionality.
Data Profiling Tools. D ata profiling tools provide an exhaustive inventory of source data.
(Although m any com panies use SQ L to sam ple source data value, this is akin to sighting the
tip of an iceberg.) D ata profiling tools identify the range of values and form ats of every field
as w ell as all colum n, row , and table dependencies. The tool spits out reports that serve as
the Rosetta Stone of your source files, helping you translate betw een hopelessly out-of-date
copy books and w hat’s really in your source system s.
Data Cleansing Tools.D ata cleansing tools validate and correct business rules in source data. M ost
data cleansing tools apply specialized functions for scrubbing nam e and address data: (1) parsing
and standardizing the form at of nam es and addresses, (2) verifying those nam es/addresses against
a third-party database, (3) m atching nam es to rem ove duplicates, and (4) householding nam es to
consolidate m ailings to a single address.
1 There are m any variables that affect throughput—the com plexity of transform ations, the num ber of sourcesthat m ust be integrated, the com plexity of target data m odels, the capabilities of the target database, and soon. Your throughput rate m ay vary considerably from these rates.
Parallel Engines andHigh-Performance
Computing Platforms
Today’s ETL EnginesCan Process150GB/Hour
Profiling Tools: theRosetta Stone toSource Systems
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 26/40
24 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Som e data cleansing tools now can also apply m any of the above functions to specific types of
non-nam e-and-address data, such as products.
D ata integration platform s integrate data cleansing routines w ithin a visual ETL w orkflow . (See
Illustration S4.) At runtim e, the ETL tool issues calls to the data cleansing tool to perform its
validation, scrubbing, or m atching routines.
Ideally, data integration platform s exchange m eta data w ith profiling and cleansing tools. For
instance, a data profiling tool can share errors and form ats w ith an ETL tool so developers can
design m appings and transform ations to correct the dirty data. Also, a data cleansing tool can
share inform ation about m atched or householded custom er records so the ETL tool can com -
bine them into a single record before loading the data w arehouse.
Complex, Reusable Transformations Extensible, Reusable Objects. D ata integration platform s provide a robust set of transform ation
objects that developers can extend and com bine to create new objects and reuse them in vari-
ous ETL w orkflow s. Ideally, developers should be able to deploy a custom object that dynam -
ically adapts to its environm ent, m inim izing the tim e developers spend updating m ultiple
instances of the object.
“M ost ETL tools today lim it reuse because they are context sensitive,”says Steve Tracy at H artford
Life Insurance in Sim sbury, CT. “They let you copy and paste an object to support 37 instance of a
source system , but you have to configure each object separately for each. This becom es painful if
there is a change in the source system and you have to reconfigure all 37 instances of the object.”
External Code. Although ETL tools have gained a raft of built-in and extensible transform ationobjects, developers alw ays run into situations w here they need to w rite custom code to sup-
port com plex or unique transform ations.
To address this pain point, data integration platform s provide internal scripting languages that
enable developers to w rite custom code w ithout leaving the tool. But if a developer is m ore
com fortable coding in C, C++, or another language, the tool should also be able to call exter-
nal routines from w ithin an ETL w orkflow and m anage those routines as if they w ere internal
objects. In the sam e w ay, the tool should be able to call third-party schedulers, security sys-
tem s, m eta data repositories, and analytical tools.
Illustration S4 . Data integration platforms integrate data cleansing routines into an ETL graphical workflow.
Integrating Cleansing and ETL
Calls to External CodeShould Be Transparent
Most “Reusable” ETLObjects Are ContextSensitive
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 27/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 25
Data Integration Platforms
Reliable Operations and Robust Administration
N o m atter how m uch horsepow er a system possesses, if its runtim e environm ent is unstable,
overall perform ance suffers. In our survey, 86 percent of respondents rated “reliability”as a
“very im portant”ETL feature. (See Illustration 13) This w as the highest rating of any ETL fea-
ture respondents w ere asked to evaluate.
Robust Schedulers.D ata integration platform s provide m uch m ore robust operational and
adm inistrative features than the current generation of ETL tools. For exam ple, they provide
robust schedulers that orchestrate the flow of inter-dependent jobs (both ETL and non-ETL)
and run in a lights-out environm ent. O r they interface w ith a third-party scheduling tool so the
organization can m anage both ETL and non-ET L jobs w ithin the sam e environm ent in a highly
granular fashion.
Conditional Execution.Job processing is m ore reliable because data integration platform s are
“sm art”enough to perform conditional execution based on thresholds, content, or inter-
process dependencies. This reduces the am ount of coding needed to prevent such outages,
especially as BI environm ents becom e m ore com plex.
Debugging and Error Recovery.In addition, data integration platform s provide robust debug-
ging and error recovery capabilities that m inim ize how m uch code developers have to w rite to
recover from errors in the design or runtim e environm ents. These tools also deliver reports
and diagnostics that are easy to understand and actionable. Instead of logging w hat happened
and w hen, users w ant the tools to say why it happened and recommend fixes.
Enterprise Consoles. Surprisingly, only 18 percent of com panies said a single ETL operations
console and graphical m onitoring w as “very im portant.”(See Illustration 15) This indicates that
m ost organizations do not yet need to m anage m ultiple ETL servers. This w ill change as ETL
usage grow s and organizations decide to standardize the scheduling and m anagem ent of ET L
processes across the enterprise.
W e suspect m ost organizations w ill use in-house enterprise system s m anagem ent tools, such
as Com puter Associates’U nicenter, to m anage distributed ETL system s.
Diverse Source and Target Systems
Intelligent Adapters. The m ark of a good data integration platform is the num ber and diversity
of source and target system s it supports. Besides supporting relational databases and flat
files—w hich m ost ETL tools support today—data integration platform s provide adapters to
intelligently connect to a variety of com plex and unique data stores.
Application Interfaces and Data Types. For exam ple, operational application packages, such as
SA P’s R/3, contain unique data types, such as pooled and clustered tables, as w ell as several
different application and data interfaces. D ata integration platform s offer native support for
these interfaces and can intelligently handle unique data types.
Web Services and XML.D ata integration platform s also provide am ple support to the W eb
w orld. W e expect XM L and W eb Services to constitute an ever greater portion of the process-
ing that ETL tools perform . XM L is becom ing a standard m eans of interchanging both data and
m eta data. W eb Services (w hich use X M L) is becom ing the standard m ethod for enabling het-
erogeneous applications to interact over the Internet. N ot surprising, alm ost half (47 percent)
of survey respondents said W eb Services w ere an “im portant”or “fairly im portant”feature for
ETL tools to possess. It’s likely that W eb Services and XM L w ill becom e a de facto m ethod for
Users Want GreaterReliability aboveAll Else
Web Services andXML Are FundamentalServices for ETL
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 28/40
26 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
stream ing X M L data from operational system s to ETL tools and from data w arehouses to
reporting and analytical tools and applications.
Desktop Data Stores. Also, data integration platform s support pervasive desktop applications,
such as M icrosoft Excel and Access. By connecting to these sources via XM L and W eb
Services, data integration platform s enable organizations to halt the grow th of “spreadm arts”and the organizational infighting they create.
Adapter Development Kits. Finally, data integration platform s provide adapter developm ent kits that
let developers create custom adapters to link to unique or uncom m on data sources or applications.
Update and Capture Utilities
To m eet business requirem ents for m ore tim ely data and negotiate shrinking batch w indow s,
data integration platform s support a variety of data capture and loading techniques.
Incremental Updates. First data integration platform s update data w arehouses increm entally
instead of rebuilding or “refreshing”them (i.e., dim ensional tables) from scratch every tim e.
M ore than one-third of our survey respondents (34 percent) said increm ental update is a “very
im portant”feature in ETL tools. (See Illustration 14.)
Change Data Capture. To increm entally update a data w arehouse, ETL tools need to detect and
dow nload only the changes that occurred in the source system since the last load, a process
called “change data capture.”(Som etim es, the source system has a facility that identifies
changes since the last extract and feeds them to the ETL tool.) The ETL tool then uses SQ L
update, insert, and delete com m ands to update the target file w ith the changed data.
This process is m ore efficient than “bulk refreshes”w here the ETL tool com pletely rebuilds all
dim ension tables from scratch and appends new transactions to the fact table in a data w are-
house. M ore than a third (38 percent) of our respondents said that change data capture w as a
very im portant feature in ETL tools. (See Illustration 14.)
More Frequent Loads. Another w ay ETL tools can m inim ize load tim es is to extract and load
data at m ore frequent intervals—every hour or less—executing m any sm all batch runs
throughout the day instead of a single large one at night or on the w eekends.
For exam ple, Intuit Inc. uses a vendor ETL tool to continuously extract data every couple of
m inutes from an O racle 11i source system and load it into the data w arehouse, according to
Intuit’s H atchett. H ow ever, for its large nightly loads, it uses hardw are clusters on its ETL serv-
er and load balancing to m eet its ever-shrinking batch w indow s.
O ne w ay to accom m odate continuous loads w ithout interfering w ith user queries, is for
adm inistrators to create m irror im ages of tables or data partitions. The ETL tool then loads the
m irror im age in the background w hile end users query the “active”table or partition in the
foreground. O nce the load is com pleted, adm inistrators sw itch pointers from the original parti-
tion to the new ly updated one.2
Bulk Loading. D ata integration platform s leverage native bulk load utilities to “block slam ”data
into target databases rather than sim ply using SQ L inserts and updates. D ata integration plat-
form s can update target tables by selectively applying SQ L w ithin a bulk load operation,
som ething that tools today can’t support.
26 THEDATA WAREHOUSING INSTITUTE www.dw-institute.com
2 Adm inistrators m ust m ake sure that the ETL tool or database does not up date sum m ary tables in the targetdatabase until a “quiet period”—usually at night—and users m ust be educated about data’s volatility duringthe period w hen it’s being continuously updated.
A Remedy toSpreadmarts?
Full Refresh versusIncremental Updating
Organizations AreLoading Data MoreFrequently
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 29/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 27
Data Integration Platforms
Near-real-time Data Capture.Finally, organizations seek ETL tools that capture and load data
into the data w arehouse in near real tim e. This “trickle feed”process is needed because busi-
ness users w ant integrated data delivered on a tim elier basis (i.e., the previous day, hour, or
m inute—so they can m ake critical tactical and operational decisions w ithout delay).
Priceline.com , for exam ple, uses m iddlew are to capture source data changes into a stagingarea and then use a vendor ETL tool to load it into the data w arehouse. It has built a custom
program to feed m ultiple sessions in parallel to its vendor ETL tool, but is planning to
upgrade a new version that supports parallel processing soon.
EAI Software. As m entioned earlier in this report, m ost ETL tools today partner w ith EAI tools
to deliver near-real-tim e capabilities. The EAI tools typically use application-level interfaces to
peel off new transactions or events as soon as they are generated by the source application.
They then deliver these events to any application that needs them either im m ediately (near real
tim e), at predefined intervals (scheduled), or w hen the target application asks for them (publish
and subscribe).
Bi-directional, Peer-to-Peer Interfaces.Although EAI tools are associated w ith delivering real-
tim e data, they support a variety of delivery m ethods. Also, in contrast to ETL tools, their
application interfaces are typically bidirectional and peer-to-peer in nature, instead of unidirec-
tional and hub-and-spoke. EAI softw are flow s data and events back and forth am ong applica-
tions, each alternatively serving as sources or targets.
U nlike m ost of today’s ETL tools, data integration platform s w ill blend the best of both ETL
and EAI capabilities in a single toolset. Few organizations w ant to support dual data integra-
tion platform s and w ould prefer to source ETL and EAI capabilities from a single vendor.
The upshot is that data integration platform s w ill process data in batch or near real tim e
depending on business requirem ents. They w ill also sup port unidirectional and bidirectional
flow s of data and events am ong disparate applications and data stores using a hub-and-spoke
or peer-to-peer architecture.
Global M eta Data M anagement
To optim ize and coordinate the flow of data am ong applications and data stores, data inte-
gration platform s offer global m eta data m anagem ent services. The tools autom atically docu-
m ent, exchange, and synchronize m eta data am ong various applications and data stores in a
BI environm ent.
Meta Data Repository and Interchange. To do this, the tools m aintain a global repository of
m eta data. The repository stores m eta data about ETL m appings and transform ations. It m ay
also contain relevant m eta data from up stream or dow nstream system s, such as data m odeling
and analytical tools.
Common Warehouse Metamodel. M ore im portantly, the tools exchange and synchronize m eta
data w ith other tools via a robust m eta data interchange interface, such as the O bject
M anagem ent G roup’s Com m on W arehouse M etam odel (CW M ), w hich is based on X M L.
According to our survey, 48 percent of respondents rated “openness”and 41 percent rated
“m eta data interface”as “very im portant”design features. (See Illustration 12) Although som e
ETL vendors recently announced support for CW M and although the specifications are consid-
ered robust, it’s still too early to tell w hether these standards w ill gain substantial m om entum
to facilitate w idespread m eta data integration.
Trickle Feeding Datain Near Real Time
Data IntegrationPlatforms Merge theBest of ETL and EAI
Data IntegrationPlatforms SynchronizeMeta Data
Openness and MetaData Interfaces AreImportant Features
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 30/40
28 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Impact Analysis Reports. D ata integration platform s also generate im pact analysis reports that
analyze the dependencies am ong elem ents used by all system s in a B I environm ent. The
reports identify objects that m ay be affected by a change, such as adding a colum n, revising a
calculation, or changing a data type or attribute.
Such reports “reduce the am ount of tim e and effort required to [m anage changes] and reducethe need for custom w ork,”w rote one survey respondent. M oreover, w hen approp riate, the
tools autom atically update each other w ith such changes w ithout IT intervention. (See
Illustration S5 for a sam ple im pact analysis report.) Thirty-three percent of respondents said
im pact analysis rep orts are a “very im portant”feature.
Data Lineage. In addition, data integration platform s offer “data lineage”reports that describe or
depict the origins of a data elem ent or report and the business rules and transform ations that
w ere applied to it as it m oved from a source system to the data w arehouse or a data m art.
D ata lineage reports help business users understand the nature of the data they are analyzing.
SUMMARY D ata integration platform s represent the next generation of ETL tools. These tools extend cur-
rent ETL capabilities w ith enhanced perform ance, transform ation pow er, reliability, adm inistra-
tion, and m eta data integration. The platform s also support a w ider array of source and target
system s and provide utilities to capture and load data m ore quickly and efficiently, including
supporting near-real-tim e data capture.
ETL Evaluation Criteria
Guidelines for Selecting ETL Products. It m ay take several years for m ajor ETL tools to evolve
into data integration platform s and support all the features described above. In the m eantim e,
you m ay need to select an ETL tool or reevaluate the one you are using. If so, there are
m any things to consider. The follow ing list is a detailed—but by no m eans com prehensive—
list of evaluation criteria.
Illustration S5.ETL tools should quickly generate reports that clearly outl ine dependencies among data elements in a BI envi-
ronment and highlight the impact of changes in source systems or downstream applications.
Impact Analysis Report
Impact Analysis ReportsSave Time and Money
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 31/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 29
ETL Evalu atio n Crit eri a
It’s im portant to rem em ber that not all the criteria below m ay be applicable or im portant to
your environm ent. The key is to first understand your needs and requirem ents and then
build criteria from there. The list that follow s can help trigger ideas for im portant features
that you m ay need.
W e’ve tried to provide descriptive attributes to qualify the requirem ents so you can understandthe extent to w hich a tool m eets the criteria. If asked, all vendors w ill say their tool can per-
form certain functions. It’s better to phrase questions in a w ay that lets you ascertain their true
level of support.
Available ResourcesETL Matrix.The evaluation list below is a sum m ary version of a m atrix docum ent that TD W I
M em bers can access at the follow ing U RL: w w w .dw -institute.com /etlm atrix/. Another set of cri-
teria is available at: w w w .clickstream dataw arehousing.com /Clickstream ETLCriteria.htm .
ETL Product Guide.If you w ould like to view a com plete list of ETL products, go to TD W I’s
Marketplace Onl in e (w w w .dw -institute.com /m arketplace) and click on the “D ata
Integration”category, and then select the “D W M apping and Transform ation”subcategory,
and related subcategories.
Courses on ETL. Finally, TD W I also offers several courses on ETL, including TD W IData
Acqui siti on, TDWI Data Cleansin g, and Evaluati ng and Selecting ETL and Data Cleansin g
Tools . See our latest conference and sem inar catalogs for m ore details (w w w .dw -
institute.com /education/).
Vendor Attributes
Before exam ining tools, you need to evaluate the com pany selling the ETL tool. You w ant to
find out (1) w hether the vendor is financially stable and w on’t disappear, and (2) how com -
m itted the vendor is to its ETL toolset and the ETL m arket. Asking these questions can quickly
shorten your list of vendors and the tim e spent analyzing products!
For m ore details, ask questions in the follow ing areas:
•M ission and Focus
o Prim ary focus of com pany
o Percent of revenues and profits from ETL
o ETL m arket and product strategy
•M aturity and Scale
o Years in business. Public or private?
o N um ber of em ployees? Salespeople? D istributors? VA RS?
•Financial H ealth
o Y-Y trends in revenues, profits, stock price
o C ash reserves; D ebt•Financial Indicators
o License-to-service revenues
o Percent of revenues spent on R& D
•Relationships
o N um ber of distributors, O EM s, VA Rs
Focus on YourRequirements
How Stable Is theVendor?
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 32/40
30 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Overal l Product Considerations
Before diving in and evaluating an ETL tool’s features and functions, m ake sure you have the
right m indset. The first principle of tool selection is to question and verify everything. “It’s
never as easy as the vendor says,”w rites a B I professional from a m ajor electronics retailer.
“Verify everything they say dow n to the last turn of the screw before the sale is com plete and
the softw are license agreem ent is signed.”
Platforms.After adopting a skeptical attitude, step back and see if the product m eets your
overall needs in key areas. Topping the list is w hether the product runs on platform s that you
have in house and w ill w ork w ith your existing tools and utilities (e.g., security, schedulers,
analytical tools, data m odeling tools, source system s, etc.).
Full Lifecycle Support.Second, find out w hether the product supports all ETL processes and
provides full lifecycle support for the developm ent, testing, and production of ETL program s.
You don’t w ant a partial solution. Also, m ake sure it supports the preferred developm ent style
of your ETL developers, w hether that’s procedural, object-oriented, or G U I-driven.
Requisite Performance and Functionality.Finally, m ake sure the tool provides adequate p er-form ance and supports the transform ations you need. The only w ay to validate this is to have
vendors perform a proof of concept (PO C) w ith your data. Although a PO C m ay take several
days to set up and run, it is the only w ay to really determ ine w hether the tool w ill m eet your
business and technical requirem ents and is w orth buying. You should also never pay the ven-
dor to perform a PO C or agree to purchase the product if they com plete the PO C successfully.
“It’s really im portant to take these tools for a test drive w ith your data,”says ED S C anada’s
Light. M im no adds, “A proof of concept exposes how different the tools really are—this is
som ething you’ll never ascertain from evaluations on paper only.”
Design Features
Ease of Use. M ost BI team s w ould like vendors to m ake ETL tools easier to learn and use. Agraphical developm ent environm ent can enhance the usability of an ETL product and acceler-
ate the steep learning curve. Alm ost all survey respondents (84 percent) said a visual m apping
interface, for exam ple, w as either a “very im portant”or “fairly im portant”design feature.
Illustration 12. Ease of use is rated as a “very important” design feature. Based on 746 respondents.
48%
45%
51%
40%
41%
59%
70%
34%
0 10 20 30 40 50 60 70 80
Visual mapping interface
Single logical meta data store
Meta data interfaces
Ease of use
Openness
Debugging
Meta data reports (e.g., impact analysis)
Transformation language and power
Ratings of Importance of Design, Transformation, andMeta Data Features
You Don’t Want aPartial Solution
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 33/40
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 31
ETL Evalu atio n Crit eri a
G ood ETL tools save developers tim e by letting them com bine and reuse objects, tasks, and
processes in various w orkflow s. W orkflow s are visual and interactive, letting developers focus
on one step at a tim e. W izards and cut-and-paste features also speed developm ent, as w ell as
interactive debuggers that don’t require developers to w rite code.
M ore specifically, ask w hether the ETL tool provides:
•An integrated, graphical developm ent environm ent
•Interactive w orkflow diagram s to depict com plex flow s of jobs, processes, tasks, and data
•A graphical drag-and-drop interface for m apping source and target data
•The ability to com bine m ultiple jobs or tasks into a “container”object that can be visu-
ally depicted and opened up in a w orkflow diagram
•Reuse of containers and custom objects in m ultiple w orkflow s and projects
•A procedural or object-oriented developm ent environm ent
•A scripting language to w rite custom transform ation objects and routines
•Exits to call external code or routines on the sam e or different platform s
•Version control w ith check-in and check-out
•An audit trail that tracks changes to every process, job, task, and object
M eta Data M anagement Features
As the hub of a data w arehousing or data integration effort, an ETL tool is w ell positioned to
capture and m anage m eta data from a m ultiplicity of tools. Today, m ost ETL tools autom atical-
ly docum ent inform ation about their develop m ent and runtim e processes in a m eta data
repository (e.g., relational tables or a proprietary engine) w hich can be accessed via query
tools, a docum ented API, or a W eb brow ser.
Today, the best ETL tools im port and export technical and business m eta data w ith targeted
up stream and dow nstream system s in the B I environm ent. They also generate im pact analysis
and data lineage reports across these tools. M ost should be w orking on w ays to autom ate the
exchange of m eta data using CW M and W eb Services standards and keep heterogeneous BIsystem s in synch.
For m ore details on m eta data, ask w hether an ETL product provides:
•Self-docum enting m eta data for all design and runtim e processes
•Support for technical, business, and operational m eta data
•A rich repository for storing m eta data
•A hierarchical repository that can m anage and synchronize m ultiple local or globally
distributed m eta data repositories
•The ability to reverse-engineer source m eta data, including flat files and spreadsheets
•A robust interface to interchange m eta data w ith third-party tools
•Robust im pact analysis reports
•D ata lineage reports that show the origin and derivation of an object
Transformation Features
Extensible, Reusable, Context-Independent Objects. The best ETL tools provide a rich library of
transform ation objects that can be extended and com bined to create new , context-independ-
ent, reusable objects. The tools also provide an internal scripting language and transparent
support for calls to external routines.
Evaluate the Realityversus Promise of
a Vendor’s MetaData Strategy
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 34/40
32 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
To learn about transform ation features, ask w hether the tool provides:
•A rich library of base transform ation objects
•The ability to extend base objects
•Context-independent objects or containers
•Record- and set-level processing logic and transform s•G eneration of surrogate keys in m em ory in a single pass
•Support for m ultiple types of transform ation and m apping functions
•The ability to create tem porary files, if needed for com plex transform ations
•The ability to perform recursive processing or loops
Data Quali ty Features
The best ETL tools provide data p rofiling facilities, and can spaw n data cleansing routines
from w ithin ETL w orkflow s. To get details on data profiling and cleansing facilities, ask
w hether the ETL product:
•Provides internal or third-party data profiling and cleansing tools
•Integrates cleansing tasks into visual w orkflow s and diagram s•Enables profiling, cleansing and ETL tools to exchange data and m eta data
•Autom atically generates rules for ETL tools to build m appings that detect and fix data defects
•Profiles form ats, dependencies, and values of source data files
•Parses and standardizes records and fields to a com m on form at
•Verifies record and field contents against external reference data
•M atches and consolidates file records
Performance Features
Perform ance features received som e of the highest ratings in our survey. (See Illustration 13.)
As the volum e of data increases, batch w indow s shrink, and users require m ore tim ely data,
team s are looking to ETL tools to m ake up the difference.
The best ETL tools offer a parallel processing engine that runs on a high-perform ance com put-
ing server. It leverages threads, clusters, load balancing, and other facilities to speed through-
put and ensure adequate scalability.
Illustration 13. Reliabi lity and performance/throughput were rated as “very important” features by more than 80 percent of
750 survey respondents.
70%
56%
70%
22%
34%
81%
86%
Parallel processing
Real-time data capture
Incremental update
Reliability
Availability
Scalability
Performance and throughput
0 20 40 60 80 100
Important Performance Features: Rating
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 35/40
To learn m ore about perform ance features, ask w hether the product provides:
•A flexible, high-perform ance w ay to process and transform source data
•Intelligent m anagem ent of aggregate/sum m ary tables
o B undled w ith p roduct or extra priced option
•H igh-speed processing using m ultiple concurrent w orkflow s, m ultiple hardw are proces-sors, and system s threads and clustered servers
•In-m em ory cache to avoid creating interm ediate tem porary files
•Linear perform ance im provem ent w hen adding processor and m em ory
Extract and Capture Features
Leading ETL tools provide native support for a w ide range of source databases, a top require-
m ent am ong our survey respondents. (See Illustration 14.) The low est com m on denom inator is
support for O D BC/JD BC, but m ake sure the tool can directly access the sources you support,
preferably at no additional cost and w ithout the use of a third-party gatew ay.
U sers also em phasized the im portance of extracting and integrating data from m ultiple source
system s. “The ability to use an ETL tool to extract data from three sources and com bine them
together is w orth the effort com pared to w riting the routine in som ething like PL/SQ L,”says
Am erican Eagle’s K oscinski.
To understand extract capabilities, ask w hether the tool provides:
•The ability to schedule extracts by tim e, interval, or event.
•Robust adapters that extract data from various sources
•A developm ent fram ew ork for creating custom data adapters
•A robust set of rules for selecting source data•Selection, m apping, and m erging of records from m ultiple source files
•A facility to capture only changes in source files.
Load and Update Features
To understand load and up date features, ask w hether the product can:
•Load data into m ultiple types of target system s
•Load data into heterogeneous target system s in parallel
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 33
ETL Evalu atio n Crit eri a
Illustration 14. Users want an ETL tool to support a wide range of sources. Based on 745 respondents who rated the above items as “very important.”
34%
33%
36%
12%
22%
38%
54%
Support for native load utilities
Web services support
Real-time data capture
Breadth of sources
Incremental update
Breadth of targets supported
Change data capture
0 10 20 30 40 50 60
Extract and Load Features: User Ratings
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 36/40
•Support data partitions on the target
•Both refresh and append data in the target database
•Increm entally update target data
•Support high-speed load utilities via an API
•Turn off referential integrity and drop indexes, if desired
•Take a snapshot of data after a load for recovery purposes, if desired
•Autom atically generate D D L to create tables
Operate and Administer Component
The ultim ate test of an ETL product is how w ell it runs the program s that ETL developers design,
and how w ell it recovers from errors. Thus, m onitoring and m anaging the ETL runtim e environ-
m ent is a critical requirem ent. O ur survey respondents rated “error reporting and recovery”espe-
cially high (79 percent) w ith “scheduling”not far behind (66 percent). (See Illustration 15.)
For detailed inform ation on adm inistrative features, ask w hether the product provides:
•A visual console for m anaging and m onitoring ETL processes
•A robust, graphical scheduler to define job schedules and dependencies
•The ability to validate jobs before running them .
•A com m and line or application interface to control and run jobs
•Robust graphical m onitoring that displays in near real tim e:
•The ability to restart from checkpoints w ithout losing data
•M API-com pliant notification of errors and job statistics
•Built-in logic for defining and m anaging rejected records
•Robust set of easy-to-read, useful adm inistrative reports•A detailed job or audit log that records all activity and events in the job:
•Rigorous security controls, including links to LD AP and other directories
Integrated Product Suites
M any vendors today bundle ETL tools w ithin a business intelligence or data m anagem ent
suite. Their goal is to ensure a suite—or com plete, integrated solution—w ill provide m ore
value and be m ore attractive to m ore organizations.
34 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
Illustration 15. Error reporting leads all administration features, according to 745 respondents who rated the above items as “very important.”
40%
18%
51%
18%
66%
79%
Single operations console
Graphical monitoring
Error reporting and recovery
Single logical meta data store
Debugging
Scheduling
0 10 20 30 40 50 60 70 80
Rating of Administration Features
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 37/40
To drill into add-on products, ask:
•Is the package geared to a specific source application or is it m ore generic?
•H ow integrated is the ETL tool w ith other tools in the suite?
o D o they have a com m on look and feel?
o D o they share m eta data in an autom ated, synchronized fashion?o D o they share a com m on install and m anagem ent system ?
Company Services and Pric ing
As m entioned earlier in this report, m any user organizations find the price of ETL tools too
high. Alm ost three-quarters of survey respondents (71 percent) consider pricing am ong the top
three factors affecting their decision to buy a vendor ETL tool. (See Illustration 16.)
Bundles versus Components.U sers pay close attention to vendor packaging options. Vendors
that offer all-in-one packages at high list prices can often price them selves out of contention.
Vendors that break their packages into com ponents that enable users to buy only w hat theyneed are m ore attractive to users w ho w ant to m inim ize their initial cost outlays. H ow ever,
runtim e charges can ruin the equation if a vendor charges the custom er each tim e a com po-
nent runs on a different platform .
M ore specifically, ask about:
•Pricing for a four CPU system on W indow s and U N IX
•Annual m aintenance fees
•Level of support, training, consulting, and docum entation provided
•Pricing for add-on softw are and interfaces required to run the product
•Annual user conferences
•Cost of m oving from a single server to m ultiple servers
•U ser group available
•O nline discussion groups
THEDATA WAREHOUSING INSTITUTE www.dw-institute.com 35
ETL Evalu atio n Crit eri a
Illustration 16. Pricing is a significant factor affecting purchasing decisions. Based on 621 respondents.
A lot. It's a top factor. (10%)
Significantly. Among the top 3 factors. (61%)
Somewhat. An important but not critical factor. (26%)
61%
26%
4% 10%
Not much. It is a requirement for doing business. (4%)
How Does Pricing Affect Your ETL Purchasing Decision?
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 38/40
Conclusion
ETL tools are the traffic cop for business intelligence applications. They control the flow of
data betw een m yriad source system s and BI applications. As BI environm ents expand and
grow m ore com plex, ETL tools need to change to keep pace.
Next Generation Functionality. Today, organizations need to m ove m ore data from m ore sources
m ore quickly into a variety of distributed BI applications. To m anage this data flow , ETL tools
need to evolve from batch-oriented, single-threaded processing that extracts and loads data in
bulk to continuous, parallel processes that capture data in near real tim e. They also need to pro-
vide enhanced adm inistration and ensure reliable operations and high availability.
As the traffic cop of the B I environm ent, they need to connect to a w ider variety of system s and
data stores (notably XM L, W eb Services-based data sources, and spreadsheets) and coordinate
the exchange of inform ation am ong these system s via a global m eta data m anagem ent system .
To sim plify and speed the developm ent of com plex designs, the tools need a visual w ork envi-
ronm ent that allow s developers to create and reuse custom transform ation objects.
Finally, ETL tools need to deliver a larger part of the B I solution by integrating data cleansing
and profiling capabilities, EAI functionality, toolkits for building custom adapters, and possibly
reporting and analysis tools and analytic applications.
This is a huge list of new functionality. The sum total represents a next-generation ETL prod-
uct—or a data integration platform . It w ill take ETL vendors several years to extend their
products to support these areas, but m any are w ell on their w ay.
Focus on Your Business Requirements.The key is to understand your current and future ETL
processing needs and then identify the product features and functions that support those
needs. W ith this know ledge, you can then identify one or m ore products that can adequately
m eet your ETL and BI needs.
36 THE DATA WAREHOUSING INSTITUTE www.dw-institute.com
Evaluating ETL and Data Integration Platforms
ETL Tools Need toKeep Pace
It Will take ETLVendors Several
Years to Deliver DataIntegration Platforms
Conclusion
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 39/40
Business Objects
3030 Orchard Parkway
San Jose, CA 95134
408.953.6000Web: www.businessobjects.com
Business Objects is the world's leading provider of business
intelligence (BI) solutions.Business intelligence lets organi-
zations access,analyze,and share information internally
with employees and externally with customers,suppliers,
and partners.It helps organizations improve operationalefficiency,build profitable customer relationships,and
develop differentiated product offerings.
The company's products include data integration tools, theindustry's leading integrated business intelligence platform,
and a suite of enterprise analytic applications.Business
Objects is the first to offer a complete BI solution that iscomposed of best-of-breed components,giving organiza-
tions the means to deploy end-to-end BI to the enterprise,from data extraction to analytic applications.
Business Objects has more than 17,000 customers in over
80 countries.The company's stock is publicly traded underthe ticker symbols NASDAQ:BOBJ and Euronext Paris
(Euroclear code 12074). It is included in the SBF 120 and IT
CAC 50 French stock market indexes.
DataMirror
3100 Steeles Avenue East,
Suite 1100
Markham, Ontario L3R 8T3
Canada
905.415.0310
Fax: 905.415.0340
Email: [email protected]
Web: www.datamirror.com
DataMirror (Nasdaq:DMCX;TSX:DMC),a leading
provider of enterprise application integration and resiliency
solutions,gives companies the power to manage,monitor,and protect their corporate data in real time.DataMirror’s
comprehensive family of LiveBusiness™ solutions enablescustomers to easily and cost effectively capture,transform,
and flow data throughout the enterprise.DataMirror
unlocks the experience of now™ by providing the instantdata access,integration,and availability companies require
today across all computers in their business.
1,700 companies have gone live with DataMirror software,including Debenhams,Energis,GMAC Commercial
Mortgage,the London Stock Exchange,OshKosh B’Gosh,
Priority Health,Tiffany & Co., and Union Pacific Railroad.
Hummingbird Ltd.
1 Sparks Avenue
Toronto, Ontario M2H 2W1
CanadaEmail: [email protected]
Web: www.hummingbird.com
Headquartered in Toronto, Canada,Hummingbird Ltd.
(NASDAQ:HUMC,TSE:HUM), is a global enterprise
software company employing 1,300 people in nearly 40
offices around the world. Hummingbird’s revolutionaryHummingbird Enterprise™,an integrated information and
knowledge management solution suite,manages the entire
lifecycle of information and knowledge assets.
Hummingbird Enterprise creates a 360-degree view of allenterprise content,both structured and unstructured data,
with a portfolio of products that are both modular and
interoperable.Today, five million users rely on
Hummingbird to connect,manage,access,publish,and
search their enterprise content. For more information,please visit:http://www.hummingbird.com.
Informatica Corporation
Headquarters
2100 Seaport Boulevard
Redwood City, CA 94063
650.385.5000
Toll-free U.S.: 800.653.3871
Fax: 650.385.5500
Informatica offers the industry's only integrated business
analytics suite, including the industry-leading enterprise data
integration platform,a suite of packaged and domain-specificdata warehouses,the only Internet-based business intelli-
gence solution,and market-leading analytic applications.The
Market Leading Data Integration Solution comprise of
Informatica PowerCenter®,PowerMart®,and InformaticaPowerConnect(tm) products,the industry's leading data
integration platform helps companies fully leverage and
move data from virtually any corporate system into data
warehouses,operational data stores,staging areas,or other
analytical environment.Offering real-time performance,scalability,and extensibility,the Informatica data integration
platform can handle the challenging and unique analytic
requirements of even the largest enterprises.
SPONSORS
7/26/2019 Etl Report
http://slidepdf.com/reader/full/etl-report 40/40
The Data Warehousing Institute5200 Southcenter Blvd., Suite 250
Seattle, WA 98188
Local: 206.246.5059
Membership
As the data warehousing and business intelligence field continues to evolve and develop,
it is necessary for information technology professionals to connect and interact with one
another. TDWI provides these professionals with the opportunity to learn from each
other, network, share ideas, and respond as a collective whole to the challenges and
opportunities in the data warehousing and BI industry.
Through Membership with TDWI, these professionals make positive contributions to the
industry and advance their professional development. TDWI Members benefit through
increased knowledge of all the hottest trends in data warehousing and BI, which makes
TDWI Members some of the most valuable professionals in the industry. TDWI Members
are able to avoid common pitfalls, quickly learn data warehousing and BI fundamentals,
and network with peers and industry experts to give their projects and companies a
competitive edge in deploying data warehousing and BI solutions.
TDWI Membership includes more than 4,000 Members who are data warehousing
and information technology (IT) professionals from Fortune 1000 corporations,
consulting organizations, and governments in 45 countries. Benefits to Members
from TDWI include:
• Quarterly Business Intelligence Journal
• Biw eekly TDWI FlashPoint electronic bulletin
• Quarterly TDWI M ember Newsletter
• Annual Data W arehousing Salar y, Roles, and Responsibilities Report
• Quarterly Ten M istakes to Avoid series
• TDWI Best Practices Awar ds summaries
• Semiannual W hat Works: Best Practices in Business Intelligence and
Data Warehousing corporate case study compendium
• TDWI’s M arketplace Online comprehensive product and service guide
• Annual technology poster
• Periodic research report summaries
• Special discounts on all conferences and seminars
• Fifteen-percent discount on all publications and merchandise
Membership with TDWI is available to all data warehousing, BI, and IT professionals
for an annual fee of $245 ($295 outside the U.S.). TDWI also offers a Corporate
Membership for organizations that register 5 or more individuals as TDWI Members.