Bloor_Informatica_PowerCenter_8
8/14/2019 Bloor_Informatica_PowerCenter_8
1/26
Informatica PowerCenter 8
Philip Howard
an evaluation from
PowerCenter 8
Bloor Research 2006 Page
Informatica PowerCenter 8
Fast facts
Informatica PowerCenter consists of the traditional data movement tools for which Informatica is well known together with a number of options, including data quality, real-time processing, data federation (though this is not yet completely integrated) and team development capabilities, all of which add up to a data integration environment with a broad spread of capabilities. A major feature of the latest release is that the company has reengineered the underlying architecture of the product to be more services-based. In doing so, it allows for greater distribution of the platform across a grid environment as well as offering spin-off benefits (high availability and scalability) for SMP systems.
It is also worth noting the availability of Metadata Manager, Data Analyzer and PowerExchange. The first two of these are available as part of PowerCenter Advanced Edition, with the former providing an extended metadata environment that supports such things as data lineage throughout the target environment (that is, including business intelligence tools, databases, design tools and so forth), while the latter provides extended business intelligence and reporting capabilities to users of PowerCenter. PowerExchange, on the other hand, is a set of products (each of which is optional) that provide access to mainframe legacy data sources, including real-time and change data capture capabilities.
Key findings
In the opinion of Bloor Research the following represent the key facts of which prospective users should be aware:
The grid implementation in this release is particularly impressive and should provide significantly improved performance while, at the same time, enhancing availability.
Another major new feature of this release is the product's push-down optimisation, which allows transformations to be performed wherever that is most appropriate. We especially like the flexibility that this offers, not just in terms of mapping but also with respect to the deployment of the optimiser.
Java development is now supported from within the PowerCenter environment so that data lineage can now include coded transformations. You can call code from PowerCenter processes and vice versa.
In this release there is a single, web-based point of administration for all purposes rather than the multiple capabilities that were previously provided.
Data federation is provided through Informatica having licensed source code from Composite Software, which the company is integrating with PowerCenter. However, at the present time this task has not yet been completed
and tight integration will be introduced via a point release during the course of 2006.
While the standard PowerCenter product manages and processes structured data there is also an Unstructured Data Option that supports access to, and transformation of, unstructured (for example, pdf files) and semi-structured (such as HIPAA messages) data.
A major feature is that PowerCenter can be used to design the target database schema and it retains detailed and ongoing knowledge of the target database. Informatica leverages this information with its dependency analysis facility (which we especially like) that allows you to examine the effects of any change. You can visually inspect the lineage of a field through a data flow, and control and execute the ripple of a change going forward in that flow.
We like the analytic workflow capability provided by Data Analyzer, though we would prefer it to have a graphical icon-based interface. This analytic capability allows you to define a path through an analysis or query so that even the inexperienced can quickly perform root cause analysis.
There are some areas of the product where a more user-friendly approach would be useful. We are therefore pleased that this will be the emphasis in PowerCenter 9.
Informatica PowerCenter has quite impressive data profiling capabilities in its own right. However, in order to be able to offer a complete solution and for data cleansing purposes the company has recently acquired Similarity Systems. We believe this to be a very sensible move. A fuller discussion of this purchase and its implications is included in this review.
We like Metadata Manager and think that all major users of PowerCenter should consider its deployment because it extends data lineage throughout the environment. In particular, the fact that it operates at field level (and therefore supports things such as database views, which other products often do not) is impressive.
The bottom line
Informatica has long been part of a duopoly that dominates the data integration market and in terms of PowerCenter we do not expect that position to change. In such situations, products tend to leapfrog one another but, at present at least, we believe that Informatica has a significant lead over its main rival, not least with respect to performance. Moreover, the indications are that this is likely to remain the case for some time to come. For example, we expect Informatica to be the first of the leading vendors to have a fully integrated data integration platform (that is, incorporating fully integrated data federation as well as data movement and data quality) in place. All of this bodes well for Informatica. While one can always find criticisms we are seriously impressed by the latest release of PowerCenter and the company's plans going forward.
Vendor information
Background information
Informatica was founded in 1993 as a services company specialising in helping its customers to migrate to a client/server environment. It was not until 1996 that it introduced its first product, Informatica PowerMart, which was followed by Informatica PowerCenter in 1998.
Informatica markets and sells its products both through a direct sales force and by means of partners that OEM Informatica's products. In addition, the company has a co-marketing and cross-selling partnership with webMethods, to focus on Business Activity Monitoring, and it has also licensed the source code of Composite Software's EII (enterprise information integration) solution, which Informatica is integrating within PowerCenter (more about this later). The company is also very active in the SI (Systems Integrators) space, and it has a very broad range of partners in this area including Accenture, Fujitsu, IBM, KPMG, Tata Consulting, Infosys, LogicaCMG, CapGemini, Atos Origin and Teradata amongst others.
Historically, the company has not been particularly acquisitive but there have been some notable purchases over the years, most recently Striva and Similarity Systems. The former's technology now underpins the PowerExchange suite of real-time connectors while the latter is a brand new development (January 2006). This purchase will have a significant impact, both internally and with respect to partners in the data quality space, which we will discuss in detail in due course.
Product availability
PowerCenter is Informatica's main product but there are three additional products that it also licenses: PowerExchange, which provides real-time connectivity capability; Data Analyzer (previously PowerAnalyzer), which provides BI-type functionality for PowerCenter environments; and Metadata Manager (previously SuperGlue), which provides extended metadata options. While PowerExchange is available as a stand-alone option both Data Analyzer and Metadata Manager are available only as a part of the PowerCenter Advanced Edition. In addition, there are also various chargeable options within the PowerCenter Connect family, which provides the universal data access to source and target systems; while ATHANOR (the previous Similarity Systems product) and real-time processing are also PowerCenter options.
PowerCenter 8 was released in December 2005 for new customers. Version 8.1, scheduled to be available in April 2006, will be the version to which existing users can upgrade. There are two versions of PowerCenter: the Standard Edition and the Advanced Edition. In this review we shall concentrate on the latter.
The operating systems supported by PowerCenter include Windows 2000/2003, HP-UX (including 64-bit Itanium support), AIX (ditto), Sun Solaris and Linux (both Red Hat and SuSE).
Since version 7.1.3, which was released in September 2005, PowerCenter has supported unstructured and semi-structured data sources as well as structured ones, as follows:
Structured sources supported include Oracle, Sybase, Informix, SQL Server, DB2, IMS, VSAM, IDMS, ADABAS, AS/400, Netezza, Teradata, DATAllegro, Hyperion Essbase, SAS, Microsoft Access, flat files, XML and web logs in addition to the PowerConnect products, which support native connectivity to data sources such as SAP R/3, PeopleSoft and Siebel, as well as other facilities detailed in the section on PowerCenter (of which the various PowerConnect options form a part). Acquisition of data is via native interfaces, ODBC, remote integration or EAI. The PowerExchange system support (for real-time connectivity) is detailed in the relevant section that follows.
Unstructured support is provided for Microsoft Office products such as Excel and PowerPoint as well as Lotus Notes, Adobe Acrobat, HTML and so forth, which are accessed natively. Semi-structured support for things such as HIPAA documents, EDI messages and the like is provided by means of template libraries.
PowerCenter 8 represents a major change to the architecture of the product to provide improved performance and high availability. However, while the emphasis has been on architecture the company has not had time to focus on things like ease of use. While there are some features of this release that enhance that, such as the single point of administration (see later), the company has not had the time that it would have liked to further develop these capabilities. For this reason the next major release of PowerCenter (9.0) will focus on usability. Details are not available at this time.
Financial results
Informatica has grown rapidly in the last few years. For example, in March 2000 it had just 200 staff. Today it has over 1000. In part, of course, this is due to the company's acquisitions but it is primarily a reflection of the company's move into global markets. In addition to the United States, the company has offices in the UK, Germany, Switzerland, France, the Netherlands, Australia, Singapore, Canada, India, China, Taiwan, Korea and Japan. In Latin America Informatica products are distributed by Softtek, which has offices in 9 countries in Central and South America. Elsewhere there are distributors across Europe (where Informatica does not have its own offices), in South Africa, the Philippines, Israel and Saudi Arabia.
Informatica floated on the stock market (NASDAQ) in 1999 and in its most recent quarter (Q4, 2005) it reported revenues of $79.8m compared to $60m in the same period during 2004. On a Pro Forma basis (that is, excluding one-off items) net income was $14.7m compared to $5.2m while, on a GAAP basis, net income improved to $13.6m from a loss of $98.7m.
In the last full year (2005) the company reported total revenues of $267.4m versus $219.7m in 2004. Net income was significantly improved, both on a GAAP and Pro Forma basis, improving to $33.8m versus a loss of $104.4m, and to $39.3m from $13.7m, respectively.
Product description
Introduction
PowerCenter 8 represents a major architectural shift in terms of how the product works. In part, the idea behind this re-design is to offer a platform for all integration capabilities, whether provided by Informatica or third-parties, based on a service-oriented architecture. What is significant about this is that it separates the platform from the integration solutions that sit on top of it, with the former providing common services to the latter, in a similar fashion to the way that a database supports the applications that make use of it. In simple terms, this means that on the one hand you can concentrate on providing performance and scalability via the platform, while on the other you can focus on user functionality through the applications that make use of the platform. Thus, for example, it means that the facilities of Informatica's optimiser will be shareable across any front-end application that wants to use it. It also means that transformation models will be separated from the engine that implements them, which could potentially lead to the standardisation of such models.
More specifically, Informatica is intent on providing a complete data integration platform and not merely one that provides ETL. Of course, ETL per se is no longer necessarily the methodology you would use even for moving data into a data warehouse, and nor is it simply about warehousing environments: ETL today is also about data migration, ERP consolidation, data synchronisation and so forth. However, Informatica's vision is broader even than this. As previously mentioned, PowerCenter has already been expanded so that it can support the transformation of such things as EDI documents but the company sees the whole realm of EII and data federation (that is, the ability to address heterogeneous front-end [transactional] and back-end data sources within a single query in near real-time) as within the compass of its platform.
We agree with this vision. The connectors you use for access in an EII environment, the optimisation you require, and the transformation capabilities you need are all essentially the same in both ETL and EII environments, so it makes sense to have a single platform that supports all of these facilities with common services that support both sets of functionality.
Rather than create a whole new set of EII capability, Informatica has chosen instead to license source code from Composite Software (a leading EII vendor) and integrate that with PowerCenter. While this paper does not discuss this aspect of PowerCenter (our concern here is the movement of data) it is important to appreciate that this embedding means that the EII software can take advantage of new features within this release of PowerCenter such as the grid architecture, push-down optimisation and, of course, the product's transformation capabilities and PowerExchange. As we shall see in the sections that follow, much of this service-oriented approach has been implemented in PowerCenter 8 though the integration with Composite Software is not yet complete: currently it consists of metadata exchange and connection sharing but is not yet (it will be) integrated at an architectural level.
Architecture
PowerCenter itself consists of three elements: the PowerCenter engine, the metadata repository (which we discuss separately later) and the PowerCenter Connect options. The last of these provide connectivity to a wide variety of front and back-end resources. However, it should be appreciated that these are not just connectivity products, because they understand the metadata of the underlying software and therefore provide a degree of integration and metadata exchange that goes beyond mere connectivity. In addition, there is a range of connectivity options for supporting real-time data acquisition through web services, in EAI and B2B environments, through products such as webMethods Integration Platform, WebSphere MQ (including guaranteed delivery), Tibco Rendezvous, and via partnerships with vendors such as webMethods. PowerConnect for Remote Data provides the ability to securely share data across networks. PowerConnect for Remote Integration (in effect, a restricted-use instance of PowerCenter that can target data to PowerConnect for Remote Data) provides a high-performance, secure and cost-effective method for the integration of data at remote locations.
The PowerCenter engine, on the other hand, has a number of components, notably the Server and its associated facilities; the Designer, which does what its name suggests; management and administration functions; additional interoperability capabilities; plus the Real-time and Data Profiling options. We will discuss each of these in turn, followed by various complementary products (PowerExchange, Data Analyzer and Metadata Manager) that are also available.
Informatica Server
The Server has undergone the most change in this release and the work Informatica has done merits a detailed discussion of the new capabilities provided, which primarily relate to the new services-based architecture and the new push-down optimiser, each of which is discussed in the sections that follow. More generically, PowerCenter is based on what the company refers to as scalable pipeline processing. This employs shared, dynamic caching together with thread-based processing. Further, Informatica includes both a built-in parallel sorter and simultaneous support for heterogeneous targets without any need for an intermediate staging area.
Historically, PowerCenter was essentially a black box solution that consisted of three process engines: the Extractor, the Transformation Engine and the Loader. However, with the latest release of the product the architecture is much more service-oriented with these services being divided between primary and back-up services. While there are a number of services provided, arguably the most important are the Integration, Repository, Domain, Log and Grid services.
Some noteworthy features include the ability to conduct dynamic joins (amongst other real-time transformations), native access to mainframe-based DB2 systems and automatic partitioning to align with DB2 ESE partition schemes, incremental aggregation, and bulk loading (when [usually] this is provided by the database vendor). In addition, PowerCenter includes a built-in recovery mechanism, which automatically writes recovery information to both the data warehouse and the
metadata repository (see later), whenever data is written to the warehouse. If there is a failure of any sort, the system will incrementally process and load those rows that were not previously committed.
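The general idea behind this kind of recovery can be illustrated with a minimal Python sketch. It is not Informatica's implementation: the `load` function, the `fail_at` parameter and the `target` and `recovery` table names are all hypothetical, and a single SQLite database stands in for both the warehouse and the repository. The point is simply that a progress marker committed together with each row lets a restarted load skip what was already committed.

```python
import sqlite3


def load(conn, rows, fail_at=None):
    """Load rows in order, recording progress in the same transaction as each row.

    `fail_at` is a hypothetical hook that simulates a failure at that position.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS recovery (committed INTEGER)")
    marker = conn.execute("SELECT committed FROM recovery").fetchone()
    if marker is None:
        conn.execute("INSERT INTO recovery VALUES (0)")
        conn.commit()
        done = 0
    else:
        done = marker[0]
    for position, (id_, value) in enumerate(rows):
        if position < done:
            continue  # this row was committed before the failure
        if position == fail_at:
            raise RuntimeError("simulated failure")
        with conn:  # commits the row and the progress marker together
            conn.execute("INSERT INTO target VALUES (?, ?)", (id_, value))
            conn.execute("UPDATE recovery SET committed = ?", (position + 1,))
```

Calling `load` a second time with the same rows after a failure resumes from the first uncommitted row rather than reloading everything.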
In addition to these facilities, a major point of the last few releases of PowerCenter, in performance terms, has been a focus on improving the performance of data movement to and from flat files (which has again been improved in 8), with extended n-way parallel and in-memory capabilities. The latter has also been extended to provide in-memory support for the union of data from heterogeneous data sources.
Two further features are particularly noteworthy. The first of these allows you to capture performance statistics so that you can identify any bottlenecks, while the second allows users to choose from a variety of different data partitioning schemes. The offered choices include hash, round robin, pass-through and both key and range based options. In addition, these can be based on the source or target and can include provision for data skew. This is quite impressive: we can think of some database vendors who cannot offer this range of support for partitioning. Further, PowerCenter 8 includes support for dynamic partitioning, which allows the platform to leverage existing database partitioning schemes or adjust dynamically based on available resources.
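For readers unfamiliar with the partitioning schemes named above, the following Python sketch shows how hash, round robin and range partitioning differ in principle. The function names and row layout are hypothetical and say nothing about how PowerCenter implements these schemes internally.

```python
from bisect import bisect_right
from itertools import cycle
from zlib import crc32


def hash_partition(rows, key, n):
    """Send each row to the partition chosen by a stable hash of its key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts


def round_robin_partition(rows, n):
    """Deal rows out to the partitions in turn, ignoring their content."""
    parts = [[] for _ in range(n)]
    for part, row in zip(cycle(parts), rows):
        part.append(row)
    return parts


def range_partition(rows, key, upper_bounds):
    """Place each row in the first partition whose upper bound exceeds its key.

    `upper_bounds` must be sorted; rows at or above the last bound go to a
    final overflow partition.
    """
    parts = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        parts[bisect_right(upper_bounds, row[key])].append(row)
    return parts
```

Hash partitioning keeps rows with the same key together, round robin gives the most even spread, and range partitioning preserves key order across partitions, which is why a choice between them matters when the data is skewed.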
Facilities are provided to let you configure, schedule and monitor the facilities of the various services offered in the new architecture, through the use of a variety of parameters that define the what, where and when of each session. This is done by means of the Workflow Manager, which is graphical, easy to use and does not require any scripting. Workflow Manager includes the ability to manage both scheduled and "always on" (as in the PowerCenter Real-Time option) integration flows, both data and event-driven conditional execution of sessions within a workflow, and real-time notification of events both to external applications and via e-mail to administrators. This is enabled by the graphical Workflow Monitor, which is a dedicated systems management utility. A workflow API is also available, to enable integration with systems administration tools including SNMP managers such as Tivoli, HP OpenView and CA Unicenter.
Also included are operational dashboards and reports that have been built using Data Analyzer to deliver web-based metadata reporting functionality against the PowerCenter repository. Detail includes technical metadata relating to tables, objects and transformations, performance statistics over time, and error reporting. This enables a DBA not familiar with the PowerCenter user interface to easily, and potentially remotely, monitor and manage the PowerCenter environment.
Grid computing
This has had a major re-design in this release. Previously, you could either deploy the Informatica Server on a UNIX, Linux or Windows system (either SMP or single processor architectures) or there was a grid option that you could take up. However, in this release, not only has grid become the standard offering (with stand-alone systems being regarded as a single-instance grid) but it is also much more sophisticated than it was previously.
In PowerCenter 8 the software is implemented in domains so that integration services (that is, PowerCenter) for example, are executed by one or more nodes that exist within the relevant domain. Similarly, other functions of the environment are represented by services, such as repository services or the special-purpose SAP BW service (required because SAP uses a push rather than a pull architecture).
Each node has its own resources, which you can define; and there is both dynamic partitioning and load balancing across the nodes. In both cases, this is based on pre-built algorithms and is dynamic. Note, however, that the load balancing is statistically based with statistics being gathered on a continuous basis. This will impose an overhead on the relevant system(s) though Informatica does not believe this will be large (it has not quantified this). Alternatively, you can interface to third party load balancing products if you prefer.
Another major change to the way that grid support is implemented is that in this release the workflow you implement on the grid (which is all metadata driven and dynamically routed) is now based on a session or sessions being load balanced across nodes as opposed to a server-based approach in which each session was limited to a single server, thus providing a more granular level of control and routing, not to mention performance.
Finally, in so far as the new grid architecture of the product is concerned, this has not just been implemented to improve performance but also for high availability and resiliency reasons. Failover across the grid is supported as are checkpoints for restart purposes. In addition, another new facility is the ability to sustain transient failures: previously, if a network connection (say) failed then the session would fail, which was fine if this was a serious fault but which was a headache if it was only a momentary failure; now you can set the system to retry in the event of a failure so that the session can be continued.
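The retry behaviour described above follows a common pattern, sketched below in Python. This is an illustration only: `TransientError`, `run_with_retry` and its parameters are hypothetical names, not PowerCenter settings.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def run_with_retry(step, attempts=3, delay=0.0):
    """Call step(); retry on TransientError, up to `attempts` calls in total."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == attempts:
                raise  # still failing: treat it as a serious fault
            time.sleep(delay)  # give a momentary failure time to clear
```

A momentary failure is absorbed by the retries and the session continues, while a persistent fault still surfaces once the attempts are exhausted.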
Push-down optimisation
The other major new performance feature in this release is the introduction of a push-down optimiser. That is, it is a (cost-based) optimiser that has been designed to determine the best place to do any part of a data movement process. For example, the optimiser might determine that it is better to perform a join on a source database rather than extract the data and then join it. Similarly, it might determine that it would be better to load information to the target system and then perform transformations. In other words, you are no longer limited to ETL but can also perform ELT or TEL or any combination thereof. Moreover, the software can (if you want it to) optimise that placement decision for you.
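The difference between joining in the integration engine and pushing the join down to the source can be shown with a small Python sketch. It uses an in-memory SQLite database as a stand-in source, and the table names, data and function names are invented for illustration; it does not show how PowerCenter's optimiser decides between the two or what SQL it generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")


def join_in_engine(conn):
    """Extract both tables, then join and aggregate outside the database."""
    regions = dict(conn.execute("SELECT id, region FROM customers"))
    totals = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        region = regions[customer_id]
        totals[region] = totals.get(region, 0.0) + amount
    return sorted(totals.items())


def join_pushed_down(conn):
    """Ask the source database to join and aggregate; extract only the result."""
    sql = """
        SELECT c.region, SUM(o.amount)
        FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(sql).fetchall()
```

Both functions return the same totals per region; what changes is how many rows leave the source database, which is the kind of trade-off a cost-based optimiser weighs.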
The optimiser is, in fact, two-faced. That is, it analyses movement logic from the source forward and from the target backwards. Further, it is optional. You can implement the optimiser in full, you can turn it off completely, or you can opt for partial optimisation. In this last case you might, for example, just opt for source but not target optimisation. You can also turn off the optimiser for particular purposes. If, for example, you want to use the Oracle Loader for that purpose rather than what the optimiser suggests then there is a checkbox that allows you to do this. Similarly, you can specify the use of the optimiser through the Workflow Designer
(see next section) and you could also build decision points into a workflow where the optimiser was invoked or not.
The optimiser has knowledge of target and source databases where they are Oracle, SQL Server, Teradata or DB2; otherwise it is generic (ODBC). Note that you cannot change the SQL generated by the optimiser in the initial release but you will be able to in PowerCenter 8.1.
The Workflow Designer
Historically, PowerCenter was designed on the assumption that all development would be done within the product's environment (using Designer). In practice, of course, some transformations were too difficult or impossible to define in this way, so users resorted to external code. However, the problem with this approach is that you lose traceability and data lineage (see later). While previously not so much of a problem, current concerns (not to mention legislation) over corporate governance and compliance mean that this is no longer an acceptable solution. As a result, in this release Informatica has introduced support for Java. The resulting code is compiled and the generated byte code is stored in the repository so that it can be both reused and inspected. Support is bi-directional in the sense that Designer-created processes can invoke Java programs and vice versa.
The Designer itself is used to actually build the flows that define data movement and supports a three or four stage process, with each of these steps being represented by its own module. In principle, you start by analysing your source data, continue by defining or identifying the schema that will define your target system (traditionally, a data mart or warehouse), and then define how source data will be mapped into the target environment. However, this last stage can be broken down into two parts. If you simply want a unique one-off mapping then you stop with the Mapping Designer. However, if you want to be able to reuse that mapping, perhaps through the use of additional filters, then you can do so using the Mapplet Designer. In addition, there is a Visual Debugger included so that you can trace and resolve any problems or errors that may occur.
The four stages supported by the Designer are the:
Source Analyzer. As its name suggests, this is used to read, analyse (you might also want to profile the data at this point; see later) and reverse engineer the schema of operational databases and the structures of flat files. The information retrieved includes relevant table and field names, types and sizes and so forth, which can be extracted and used as the basis for mapping into the appropriate schema. After extraction, equivalent structures can be edited in order to refine the structure of the data definition, and fields can be combined or rearranged as required. There is also a table property editor.
In addition to source analysis per se, the Source Analyzer also provides limited facilities for verifying the accuracy of such things as table relationships (for example, primary and foreign key relationships). However, given the mismatch that often exists between data and metadata, it will often be useful
to undertake more detailed data analysis and discovery using the Informatica Data Profiling option, which is provided within Source Analyzer and described below.
Warehouse Designer. This is a visual tool that makes extensive use of wizards for warehouse/mart schema design. These specifically include a Star Schema wizard, a Multi-Dimensional wizard that targets other schema types including cubes, snowflakes, constellations, and so on, and a Slowly Changing Dimension wizard. Alternatively, users can enter target table definitions directly, or may create them by replicating and rearranging existing source table definitions. If the wizard-based approach is adopted then once the dimension levels and relevant measures are defined, the program can automatically generate the underlying tables and primary-foreign key relations for those tables.
In addition to these wizards, the Warehouse Designer also includes a Dimension Editor. Put simply, this allows the user to create, edit or delete dimensions. You can also define any levels or hierarchies that exist within each dimension. One useful extension that Informatica has added with version 8 is that it now supports pattern development. This will be particularly useful when working with dimensions as you previously had to define a separate process for each dimension; now you can create a pattern which will generate the relevant rules for each required dimension.
Mapping Designer. This is a visual tool for building and editing source-to-target mappings (business rules), which uses Data Flow Diagrams. Basically, what you do is to link source and target data via transformation objects, each of which is dragged and dropped into the model. These objects can be one of a dozen or more different types that offer a variety of functions. For example, you can perform standard mathematical functions, define customised calculations, call external or stored procedures, set up filters, look up values, normalise VSAM files, process COBOL Copybooks, perform comparisons and groupings, generate sequential IDs, define the way that you will handle updates, and perform data joins. In the case of these data joins, this can be done on the fly across heterogeneous sources at any time during the mapping or transformation process.
Mapplet Designer. Informatica uses the Mapplet Designer specifically as a tool for reusing mappings (including across different PowerCenter implementations). However, it should be noted that this tool can also work with externally developed mappings that have been written in Java, C, C++ or Basic in conjunction with Informatica's TX (transformation expression) API, which are then imported and registered as transformation objects.
In the Mapplet Designer then, you might define customised filters that would allow existing mappings to be reused in particular circumstances dependent on the filter. These are then saved either as a copy of the original mapping or as an instance of it. In the former case, any change to the original mapping will not affect the copy, while in the case of an instance any changes to the original are automatically inherited by the instance. In addition to
import and export. Multiple labels can be associated with a single object if
necessary, and labels can overlap a number of different deployment groups.
Queries: using a graphical interface within the client tools, sophisticated
queries can be created and executed against the PowerCenter repository
to select objects. These queries can be used to define dynamic deployment
groups when migrating objects, or simply to identify objects for analysis.
Queries can be created based on a number of object attributes and can be
saved for later use, or for use in multiple deployment groups.
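To make the idea of a query-defined, dynamic deployment group more concrete, the following is a minimal sketch in Python. It does not use any Informatica API: `RepoObject`, its attributes and `run_query` are hypothetical names invented purely for illustration, and the repository is just an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RepoObject:
    """Hypothetical stand-in for a repository object (not an Informatica type)."""
    name: str
    object_type: str
    labels: frozenset = field(default_factory=frozenset)


def run_query(objects, object_type=None, label=None):
    """Return the objects that match every attribute that was supplied."""
    selected = []
    for obj in objects:
        if object_type is not None and obj.object_type != object_type:
            continue
        if label is not None and label not in obj.labels:
            continue
        selected.append(obj)
    return selected


repository = [
    RepoObject("m_load_customers", "mapping", frozenset({"release_2", "reviewed"})),
    RepoObject("m_load_orders", "mapping", frozenset({"release_1"})),
    RepoObject("wf_nightly", "workflow", frozenset({"release_2"})),
]

# A "dynamic" group is re-evaluated each time the query runs, so newly
# labelled objects are picked up without editing the group itself.
deployment_group = run_query(repository, label="release_2")
print([obj.name for obj in deployment_group])
```

Because the group is defined by the query rather than by a fixed list, the same saved query could in principle feed several deployment groups, which is the reuse the text describes.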
Centralised management of distributed resources
The management of distributed resources rests on the use of a global metadata
repository, whereby local data marts are registered with the central repository. These
data marts may or may not be based on PowerCenter. In particular, PowerCenter
will work alongside SAP BW (Business Information Warehouse) data marts and
exchange suitable metadata, as we have discussed. However, it is obvious that data
marts built on PowerCenter will provide the greatest degree of integration and
reusability across the whole environment. In particular, this sort of organisation will more
easily enable local customisation of centrally held transformations and mappings.
Access from the local data marts to the central definitions is via hyperlinked
shortcuts. These not only ensure easy access to centralised data but also allow central
changes to be propagated to local data marts. In other words, this is a two-way
mechanism. Transformations can also be accessed via shortcuts.
Another facility offered by PowerCenter is that it includes an FTP streaming
server so that you can stage data to an intermediate data store or stream directly from
source to target via the transformation engine, without ever staging to disk.
Previously, the Workflow Manager provided a single interface for configuring
and monitoring these different systems, while the Repository Manager offered a
unified view across all systems' repositories within a distributed environment. In
other words, there were several administrative and management tools. While each
was logical in its own right it did make the environment more complex than was
necessary. In this release there is now a single, web-based point of
administration which spans the whole environment. In addition, a number of
administrative enhancements have been made: there is now a single, centralised log file, for
example, and there is also a single install, regardless of which options you have
licensed.
Interoperability
The major elements to discuss in this area are the product's support for Web
Services and security respectively.
In the case of the former, an illustration of which is shown in Figure 2, it is
important to note that Informatica PowerCenter comes with pre-built Web Services
interfaces out of the box, as opposed to being just "Web
Services ready" as is often the case. These pre-built Web
Services enable PowerCenter to act as either a client or
provider, to access data via Web Services through
PowerConnect for Web Services, and to punch out midstream
to a Web Services provider.
The other major interoperability feature is extended security
capability. This includes support for LDAP authentication
or custom authentication systems via a security
SDK (software development kit), support for RSA data
encryption, a partnership with Verisign for its Trust
Gateway to provide certification for Web Services, and
the implementation of object-level permission- and role-based security (with
inheritance) within the PowerCenter metadata repository. Note that with Verisign
being used for security purposes, a Web Service calling a PowerCenter workflow
is just as secure as when a developer accesses the user interface.
Other interoperability facilities exist within the area of standards, in particular in
trading partner management and e-Commerce, where Informatica PowerCenter
offers support for the following:
XML data integration provides the ability to natively source XML files and
to learn DTD and XML-schema grammar for validation and decomposition
purposes.
Native web server log data parsing uses CLF (common log format) to parse
Apache, Netscape and Microsoft server log files so that the user can perform
clickstream analysis using a third party tool.
Perl script support allows existing Windows or UNIX Perl script to be reused
as a transformation object.
External data integration to Acxiom's Data Network to provide customer
demographic information.
Java and C++ APIs for interfaces to deliver comparable functionality to that
provided via Web Services.
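As an illustration of what parsing CLF involves, the sketch below reads a few Common Log Format lines in Python and counts hits per page, which is the raw material for clickstream analysis. It is not PowerCenter's parser; the regular expression, the function name `parse_clf_line` and the sample log lines are assumptions made for this example.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line is not valid CLF."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    parts = record["request"].split()
    record["path"] = parts[1] if len(parts) >= 2 else None
    record["status"] = int(record["status"])
    # CLF writes "-" when no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample_log = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - frank [10/Oct/2000:13:55:40 -0700] "GET /products.html HTTP/1.0" 200 512',
    '10.0.0.7 - - [10/Oct/2000:13:56:02 -0700] "GET /index.html HTTP/1.0" 304 -',
    'not a log line',
]

records = [r for r in map(parse_clf_line, sample_log) if r is not None]
hits_per_page = Counter(r["path"] for r in records)
print(hits_per_page.most_common())
```

Malformed lines are skipped rather than raising an error, on the assumption that a clickstream load should tolerate occasional noise in the log.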
Metadata Manager
Before discussing Metadata Manager (formerly known as SuperGlue), it is
important to point out that PowerCenter is not devoid of metadata management
capabilities itself. However, these are largely limited to the PowerCenter environment
plus some metadata interchange capabilities, whereas Metadata Manager extends
beyond these boundaries. To be specific, PowerCenter was, from the beginning,
a metadata driven technology but the metadata was (and is) limited to ETL
processes. In other words, the metadata in PowerCenter describes sources, mappings
and targets together with operational metadata such as sessions, workflows and
schedules. With Metadata Manager, on the other hand, companies can
assemble and associate metadata from different products, even beyond Informatica's
product lines, such as data modelling tools, databases and data warehouses, business
intelligence and analysis tools, and so on.
Figure 2: Web Services example usage
This ability to support system-wide metadata is enhanced by intelligent lineage
algorithms (which work at the field level as opposed to the table level), which
means that you can get insight into the use of data across different processes and
systems. This has important consequences for data stewardship. It means that you
can track all the data in your Business Objects report, for example: where it came
from, how it was calculated and so on, in order to meet the requirements of
corporate governance standards like Sarbanes-Oxley, Basel II or FDA Part 11 (used
in the pharmaceutical sector). In addition, because of the field-level approach,
the product is able to support such things as Oracle Views, whereas table-based
approaches are limited to database tables. An example of a data lineage report is
illustrated in Figure 3.
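The principle behind field-level lineage can be sketched as a walk over a graph in which each derived field points at the fields it was computed from. The Python below is only an illustration of that principle, not Metadata Manager's algorithm; every system, table and field name in it is invented.

```python
# Each derived field maps to the fields it is computed from.
# All system, table and field names here are invented for illustration.
FIELD_SOURCES = {
    "report.revenue_by_region.total": ["warehouse.sales_fact.net_amount"],
    "warehouse.sales_fact.net_amount": [
        "staging.orders.gross_amount",
        "staging.orders.discount",
    ],
    "staging.orders.gross_amount": ["erp.order_lines.price", "erp.order_lines.qty"],
    "staging.orders.discount": ["erp.order_lines.discount_pct"],
}


def trace_lineage(field_name, sources=FIELD_SOURCES):
    """Return every upstream field, nearest first, without repeats."""
    seen = []
    queue = list(sources.get(field_name, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # also protects against cycles in the metadata
        seen.append(current)
        queue.extend(sources.get(current, []))
    return seen


def origin_fields(field_name, sources=FIELD_SOURCES):
    """Upstream fields that have no recorded sources of their own."""
    return [f for f in trace_lineage(field_name, sources) if f not in sources]


print(origin_fields("report.revenue_by_region.total"))
```

Working at the field level is what lets a trace of this kind answer "where did this report column come from?" rather than only "which tables fed this report?".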
The reason why Metadata Manager can support
non-Informatica objects is that it is an implementation
of the Object Management Group's (OMG) Common
Warehouse Metamodel (CWM). This has a number of
consequences. First, it means that it is open to third party
tools for queries and browsing. Secondly, it means that it
supports XMI (XML metadata interchange, which is part
of the CWM specification) for the exchange of metadata
with other sources, such as a data warehouse. Thirdly,
Informatica has used Data Analyzer technology to
implement the J2EE-based Metadata Manager Server, which
enables a Metadata Web Services architecture. Further,
Metadata Manager uses PowerCenter technology to
load and maintain the metadata warehouse, which has
standard warehouse advantages such as metadata history
(versions), metadata reporting, and so on. The connectors to database catalogs
like DB2 and Oracle, and to proprietary repositories like Business Objects, CA
ERwin, and so forth, are called XConnects and are developed and delivered by
Informatica as well as by third parties. The latter is possible because all interfaces
are standard and documented in an SDK guide.
One more point on the subject of CWM: Informatica has extended the standard
so that you can also report on non-CWM objects, which is essential to span the
whole environment. We understand that Informatica has made submissions to
the OMG about these extensions being added to the next version of the CWM
specification. However, as the extensions are implemented based on further OMG
standards such as MOF (Meta-Object Facility), which is a superset of CWM,
integration with other environments should be straightforward.
At the front-end, Metadata Manager exhibits many of the characteristics of an
enterprise portal and, indeed, it includes a portal integration kit. In particular, it
includes personalisation capabilities so that individual users will see detail relevant
to themselves, and there are Amazon-like facilities of the "if you were interested in
this then you might want to look at that" variety. The query and reporting
functions are based on Informatica Data Analyzer, which was extended specifically to
cater for the needs of Metadata Manager. Thus you can use it to see in which
environments a specific attribute is used, how a metric is calculated, what a definition
means, how often a particular report has been run, the number of PowerCenter
sessions, and so forth.
Figure 3: An example of a data lineage report
Data Analyzer
Data Analyzer (previously known as PowerAnalyzer) used
to be sold, as was Metadata Manager, as a stand-alone
product. Now it is only available as part of PowerCenter
Advanced Edition. Thus, while it was formerly available
as a general-purpose business intelligence tool it is now
only marketed for querying and reporting against
PowerCenter (and Metadata Manager).
When using Data Analyzer you start with all of the
relevant metadata in place (derived from PowerCenter and/
or Metadata Manager) upon which you can build queries,
activate those queries, drill-down, slice-and-dice, present
data in a variety of graphical formats, view data within
browser-based dashboards (for an example see Figure 4),
and export data in various formats, including Excel, CSV
(comma separated value), HTML, Adobe Acrobat .pdf or
as a data mining tree.
When it comes to designing new queries and reports,
there are four significant features of Data Analyzer, as
follows:
The Report Wizard provides a four step process for building a report: first
select the metrics (many are pre-defined or you can create your own) that you
want to use, then select the attributes that you want to measure. In the case
of attributes these will only be those that are relevant to the metrics you want
to assess and can be discovered by browsing and selecting, or by using the
product's find option. The third step is then to define filters and the order in
which you want rankings to be displayed. And, finally, the fourth step is to
determine the format and style of the report.
At first sight this may not seem anything remarkable. It is easy to miss that
starting with the selection of a metric or metrics is not how you usually go
about defining a report. Usually, you start by defining the set (customers,
say) that you want to report on and what fields you want to display. Then
you define how you are going to select those customers. The Informatica
Report Wizard turns this process on its head. If you want to investigate
customer churn you start by selecting that metric and go on from there, instead
of selecting customers and then deciding how to define churn. While this
approach may take a little getting used to it is, in our opinion, very intuitive,
and should mean that business users require far less assistance from the IT
department in creating queries.
Figure 4: Dashboard view
Analytic Workflow allows you to define a (reusable) route through relevant
information. For example, root cause analysis might help you to identify why
some critical measure has fallen below a pre-defined limit. The use of
workflows can make this process much simpler and faster. In many cases, these
workflows will map on to the steps of a business process, linking together
information from many sources and functions within the organisation. One
of the problems with traditional approaches is that you have to know what
you are looking for when you drill down into data. Analytic Workflow allows
you to define best practices in a consistent way that will guide less
experienced users through that process. In other words these workflows act in a
knowledge management capacity to capture the experience of senior business
professionals, which can then be used for knowledge transfer.
Alerts in Data Analyzer are sophisticated. You can, for example, use an
Informatica alert as the starting point for an analytic workflow. Or you could use
it to automatically generate a report and then send that to relevant recipients.
You could even, in conjunction with PowerCenter, use an alert to trigger a
data movement task. It is the extension of alerts beyond mere notification
that, in our view, makes this such a useful technology. The alerts themselves
may be threshold-based (that is, if a measure rises above or falls below a
pre-defined limit) or time-based (check at regular intervals or at a pre-defined
time) and can be requested from and delivered to any supported device. The
requesting and receiving devices do not have to be the same.
The presentation capabilities in Data Analyzer are web-based, with a
portal-style presentation environment, in that it includes the sort of customisation
and personalisation options that you would expect from a portal: the ability
to move panes around the screen, a shared documents area, and so on. Role
based security, with support for both users and user groups, is also provided.
If you want to integrate these presentation capabilities into a full-blown
Enterprise Information Portal then there are tools provided to do this, while
the company has already developed the relevant portlets to integrate with
a number of third party products. These are available as part of a Portal
Integration Kit and there is also a Java API available. Taken together, these
capabilities can be used to build highly customised environments, such as
balanced scorecards, that then make use of Data Analyzer as a platform to
support a wide range of projects and users, thereby extending the range of
environments in which the product can be used. It is also worth noting that
Informatica PowerAnalyzer has been certified, in the United States, under
Section 508, which is the US legislation regarding disabled access. In other
words the presentation environment is particularly easy to use and view.
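The point made above about alerts going beyond mere notification can be illustrated with a small Python sketch of a threshold-based alert that runs arbitrary follow-up actions. This is not Data Analyzer's alerting interface: the `ThresholdAlert` class, the metric name and the two actions are hypothetical and exist only to show the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ThresholdAlert:
    """Hypothetical alert definition; not part of any Informatica API."""
    metric: str
    lower: Optional[float] = None
    upper: Optional[float] = None
    actions: List[Callable[[str, float], None]] = field(default_factory=list)

    def check(self, value: float) -> bool:
        """Run every registered action if the value is outside the limits."""
        too_low = self.lower is not None and value < self.lower
        too_high = self.upper is not None and value > self.upper
        if too_low or too_high:
            for action in self.actions:
                action(self.metric, value)
            return True
        return False


events = []
alert = ThresholdAlert(
    metric="weekly_churn_pct",
    upper=5.0,
    actions=[
        lambda metric, value: events.append(f"notify: {metric}={value}"),
        lambda metric, value: events.append(f"start workflow for {metric}"),
    ],
)

alert.check(3.2)  # within limits, nothing happens
alert.check(6.8)  # breaches the upper limit, both actions run
print(events)
```

A time-based alert would follow the same shape, with a scheduler calling `check` at the chosen interval instead of the call being made on each new value.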
Optional products
The Real-Time option
This is a superset of PowerCenter that includes the ZL
Engine, a zero latency engine (see Figure 5) that provides
an "always on", trickle feed mechanism for processing live
data that allows the data warehousing environment to be
continuously updated with real-time data. For example,
in this illustration an XML report with respect to
students is read, then there is a filter for students that are
not active, the data is sorted by class and then a
calculation (GPA: grade point average) is performed before the
results are written to the student table.
While this diagram should be fairly self-explanatory, there are a couple of
particular points to note. The first of these is that the integration with the various
message queuing vendors is bi-directional while the second is that the software is
transaction aware. That is, it will only process complete transactions. Of course,
there is some hyperbole involved: zero wait processing assumes that processing
takes no time, which is never absolutely correct. However, the dependencies are
the volume of data being presented by the message system, and the complexity of
the transformations, and the ZL Engine should be able to cope with both of these
with no discernible lag time.
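The student example described above (read XML, filter, sort, calculate, write) can be sketched in a few lines of Python. This is a batch-style illustration under several assumptions, not the ZL Engine itself: the XML layout, the reading of the filter as "drop inactive students", the GPA as a plain average of grade points, and the SQLite `student` table are all invented for the example. A real-time engine would apply the same steps to each message as it arrived.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample message; the element and attribute names are assumptions.
STUDENT_XML = """
<students>
  <student id="1" name="Ana" class="B" active="true"><grades>4.0 3.0</grades></student>
  <student id="2" name="Ben" class="A" active="false"><grades>2.0 2.0</grades></student>
  <student id="3" name="Cy" class="A" active="true"><grades>3.0 3.5 4.0</grades></student>
</students>
"""


def read_students(xml_text):
    """Step 1: read the XML report into plain dictionaries."""
    for node in ET.fromstring(xml_text.strip()).iter("student"):
        yield {
            "id": int(node.get("id")),
            "name": node.get("name"),
            "class": node.get("class"),
            "active": node.get("active") == "true",
            "grades": [float(g) for g in node.findtext("grades", "").split()],
        }


def grade_point_average(grades):
    """Step 4: simple mean of the grade points (None if there are none)."""
    return sum(grades) / len(grades) if grades else None


def run_pipeline(xml_text, connection):
    active = [s for s in read_students(xml_text) if s["active"]]  # step 2: filter
    active.sort(key=lambda s: s["class"])                         # step 3: sort
    rows = [
        (s["id"], s["name"], s["class"], grade_point_average(s["grades"]))
        for s in active
    ]
    connection.execute(
        "CREATE TABLE IF NOT EXISTS student "
        "(id INTEGER PRIMARY KEY, name TEXT, class TEXT, gpa REAL)"
    )
    # Step 5: write the results to the student table.
    connection.executemany("INSERT OR REPLACE INTO student VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return rows


connection = sqlite3.connect(":memory:")
loaded_rows = run_pipeline(STUDENT_XML, connection)
print(loaded_rows)
```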
Data quality
Informatica first moved formally into the data quality space with the release of
PowerCenter 7, in which the company introduced its own data profiling
capabilities. At the same time, it entered into a partnership with FirstLogic, integrating
that company's facilities with its own, for those companies that wanted either
more advanced profiling and analysis capabilities, or data cleansing and matching,
or both. Subsequently, in mid-summer 2005, the company announced a
comparable partnership with Trillium. However, in January 2006 Informatica acquired
Similarity Systems, whose ATHANOR product will now be the company's main
data quality offering, though Informatica will continue to support third party
environments such as FirstLogic and Trillium.
While this is not the place to provide a detailed description of ATHANOR (which
would require a full review of its own) it is appropriate to give a brief
understanding of the product.
Perhaps the most important thing to note about ATHANOR is that it has
been designed for business users rather than IT experts. This means that
ATHANOR is much easier to use and requires a lot less training than some other
tools. Historically, this meant that there was a downside to the product in that
some elements of profiling, specifically with respect to structure rather than
content, were not present in ATHANOR. In order to rectify this, Similarity
acquired Evoke Axio during the course of 2005 to provide this sort of IT-level
functionality.
Figure 5: How the ZL Engine works
In terms of the product suite itself (there is more than one version: for example,
there is a version designed to support one-off data quality projects and another
to support ongoing quality assurance) it is more fruitful to consider ATHANOR
in terms of a logical architecture, and indeed in the way that you would use the
product, rather than simply on how the product is constructed. In practice, there
are five steps involved:
1. Data Quality Audit: conduct a data quality analysis, which consists of
profiling the data and then producing a number of reports, both low level
drill-down reports and high level scorecard reports. At this stage you would also
set targets for required and achievable levels of data quality.
2. Standardise: build and apply standardisation rules based on what you
found in the profiling stage. This includes things like parsing names or
product codes into their constituent parts; enhancing the data where appropriate
by determining the correct value for blank or incorrect fields (for example,
inserting the correct country field based on the city and/or other parts of the
address field); removing noise (extraneous data); and removing or replacing
bad or inconsistent data. This is achieved through a combination of look-up
(dictionary) and routine-based techniques.
3. Matching: this is carried out after standardisation as it usually generates
better results if you match against a standardised data set. This is where you
identify duplicates and things like households or subsidiaries. The product
enables matching and reconciliation across databases. This uses a
combination of user defined rules (again these can be based on dictionaries) and
mathematical matching algorithms.
4. Consolidation: this enables users to manage and automate the data
consolidation process. In other words it lets you merge duplicate records based
on keys (rather than necessarily overwriting data, though you can do this if
you wish), create linkages between related records, append data from
reference sources and, when appropriate, replace inaccurate data with reference
data.
5. Audit again and track: this is where the on-going nature of the process
kicks in. Once the data has been standardised, matched and consolidated
you would profile it again to see if you have come up to the targets you set
at stage one. The Data Quality scorecard is a framework for continuous
improvement of Data Quality and company-wide intelligence, providing you
with the facility to support any data quality programmes that you may have
put in place.
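Steps 2 to 4 above can be sketched very roughly in Python to show how standardisation, matching and consolidation fit together. This is not how ATHANOR works internally: the city-to-country dictionary, the 0.85 similarity threshold, the use of Python's `difflib` for fuzzy name matching and the sample records are all assumptions chosen to keep the example small.

```python
from difflib import SequenceMatcher

# Toy reference dictionary used to fill in a blank country from the city.
CITY_TO_COUNTRY = {"dublin": "Ireland", "london": "United Kingdom"}


def standardise(record):
    """Step 2: remove noise (extra spaces), fix case, fill a blank country."""
    clean = {key: " ".join(str(value).split()) for key, value in record.items()}
    clean["name"] = clean["name"].title()
    if not clean.get("country"):
        clean["country"] = CITY_TO_COUNTRY.get(clean["city"].lower(), "")
    return clean


def is_match(a, b, threshold=0.85):
    """Step 3: a user-defined rule (same city) plus a fuzzy name comparison."""
    same_city = a["city"].lower() == b["city"].lower()
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return same_city and similarity >= threshold


def consolidate(records):
    """Step 4: merge matches, filling blanks rather than overwriting data."""
    merged = []
    for record in records:
        for existing in merged:
            if is_match(existing, record):
                for key, value in record.items():
                    if not existing.get(key):
                        existing[key] = value
                break
        else:
            merged.append(dict(record))
    return merged


raw = [
    {"name": "  john   smith ", "city": "Dublin", "country": "", "phone": ""},
    {"name": "Jon Smith", "city": "dublin", "country": "Ireland", "phone": "555-0100"},
    {"name": "Mary Jones", "city": "London", "country": "", "phone": ""},
]

golden = consolidate([standardise(r) for r in raw])
print(golden)
```

Matching is run on the standardised records, which reflects the point in step 3 that matching against a standardised data set usually gives better results.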
Similarity Systems, which was founded in 2000, has established a high reputation
in a short period of time, especially in Europe. The software is based on Unicode
so that it can be used world-wide and we particularly like its emphasis on the
business user community rather than IT (though it does that too since the purchase
of Evoke). While it is still early days (as we write it is less than a week since the
announcement of Similarity's acquisition) we believe that this is a sound strategic
move on the part of Informatica that can only enhance its platform.
PowerExchange
PowerExchange is available both as a stand-alone product and in conjunction
with PowerCenter. There are basically two main points about PowerExchange.
First, there are the types of data movement that are supported for these
environments, of which there are three options: batch, real-time and change data capture
(CDC). While the first two of these are self-explanatory, the last may need some
explanation. Once an initial bulk update (say, to populate a new data warehouse)
has been done, the change data capture option allows trickle feeds of just the
data that has changed. This functionality is totally
metadata-driven and, in the case of the relational databases that are supported,
it should be noted that change data capture uses log-based capture rather than
burdening the database with external triggers. In this respect, one of the notable
features is support for pre-fetching in Oracle environments.
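The pattern of a bulk load followed by trickle feeds of changes can be sketched as follows. This is a conceptual Python illustration only, not PowerExchange's log-based capture: the change-log record layout (`seq`, `op`, `id`, `row`) and the function names are assumptions made for the example.

```python
def bulk_load(source_rows):
    """Initial load: copy every source row into the target, keyed by id."""
    return {row["id"]: dict(row) for row in source_rows}


def apply_changes(target, change_log, last_applied=0):
    """Apply only the changes recorded after `last_applied`, in log order."""
    for change in sorted(change_log, key=lambda c: c["seq"]):
        if change["seq"] <= last_applied:
            continue  # already applied in an earlier feed
        if change["op"] == "delete":
            target.pop(change["id"], None)
        else:  # "insert" and "update" both write the new row image
            target[change["id"]] = dict(change["row"])
        last_applied = change["seq"]
    return last_applied


source = [{"id": 1, "city": "Leeds"}, {"id": 2, "city": "York"}]
warehouse = bulk_load(source)

# Hypothetical change records, as they might be read from a database log.
change_log = [
    {"seq": 1, "op": "update", "id": 2, "row": {"id": 2, "city": "Hull"}},
    {"seq": 2, "op": "insert", "id": 3, "row": {"id": 3, "city": "Bath"}},
    {"seq": 3, "op": "delete", "id": 1},
]

checkpoint = apply_changes(warehouse, change_log)
print(checkpoint, warehouse)
```

Keeping a checkpoint (the last sequence number applied) is what allows each later feed to move only the data that has changed since the previous one.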
Secondly, there are the platforms it supports. These include not just the most
popular current databases and WebSphere MQ (across all relevant operating
systems) but also a variety of legacy data sources including IMS, VSAM, Adabas,
CA-Datacom, CA-IDMS, ICL IDMS-X, C-ISAM and flat file systems.
Apart from the features already mentioned, other capabilities worth noting
include multi-byte character support, parameterised SQL, and RSA encryption.
Summary
At any time over the last decade you could have asked any vendor in
Informatica's market about their biggest competitor and you would always have got the
same response: hand coding. Despite the success of Informatica and others the
majority of data integration tasks continue to be hand coded. However, that is
changing, firstly because legislation such as Sarbanes-Oxley is requiring facilities
such as data lineage and secondly because companies are increasingly realising the
importance of data quality. Both of these things are very difficult and complex
to build into hand-coded solutions which, as an approach, is therefore losing its
former popularity.
At the same time there is an increasing demand by large enterprises to reduce the
number of suppliers they deal with and where data movement projects might have
previously been independent decisions, more and more companies are now
mandating a corporate integration (as opposed to mere movement or ETL) solution.
Further, the platform that users require is expanding, most notably to include
data federation but also to incorporate support for unstructured and
semi-structured data and transformations.
This means that the market for tools such as PowerCenter is expanding, both at
the enterprise-level and for project-based developments in smaller organisations.
Informatica is well placed to capitalise on these trends.
Bloor Research Overview
Bloor Research has spent the last decade developing what is recognised as Europe's
leading independent IT research organisation. With its core research activities
underpinning a range of services, from research and consulting to events and
publishing, Bloor Research is committed to turning knowledge into client value across
all of its products and engagements. Our objectives are:
Save clients' time by providing comparison and analysis that is clear and
succinct.
Update clients' expertise, enabling them to have a clear understanding of IT
issues and facts and validate existing technology strategies.
Bring an independent perspective, minimising the inherent risks of product
selection and decision-making.
Communicate our visionary perspective of the future of IT.
Founded in 1989, Bloor Research is one of the world's leading IT research,
analysis and consultancy organisations, distributing research and analysis to IT user
and vendor organisations throughout the world via online subscriptions, tailored
research services and consultancy projects.
Copyright & Disclaimer
This document is subject to copyright. No part of this publication may be
reproduced by any method whatsoever without the prior consent of Bloor Research.
Due to the nature of this material, numerous hardware and software products
have been mentioned by name. In the majority, if not all, of the cases, these
product names are claimed as trademarks by the companies that manufacture the
products. It is not Bloor Research's intent to claim these names or trademarks as
our own.
Whilst every care has been taken in the preparation of this document to ensure
that the information is correct, the publishers cannot accept responsibility for any
errors or omissions.
Suite 4, Town Hall, 86 Watling Street East
TOWCESTER, Northamptonshire, NN12 6BS, United Kingdom