Bloor_Informatica_PowerCenter_8


  • 8/14/2019 Bloor_Informatica_PowerCenter_8


    Informatica PowerCenter 8

    Philip Howard

    An evaluation from Bloor Research


    PowerCenter 8

    Bloor Research 2006 Page

    Informatica PowerCenter 8

    Fast facts

    Informatica PowerCenter consists of the traditional data movement tools for which Informatica is well known, together with a number of options, including data quality, real-time processing, data federation (though this is not yet completely integrated) and team development capabilities. Together these add up to a data integration environment with a broad spread of capabilities. A major feature of the latest release is that the company has re-engineered the underlying architecture of the product to be more services-based. This allows the platform to be distributed more widely across a grid environment, and it also brings spin-off benefits (high availability and scalability) for SMP systems.

    It is also worth noting the availability of Metadata Manager, Data Analyzer and PowerExchange. The first two of these are available as part of PowerCenter Advanced Edition, with the former providing an extended metadata environment that supports such things as data lineage throughout the target environment (that is, including business intelligence tools, databases, design tools and so forth), while the latter provides extended business intelligence and reporting capabilities to users of PowerCenter. PowerExchange, on the other hand, is a set of products (each of which is optional) that provide access to mainframe legacy data sources, including real-time and change data capture capabilities.

    Key findings

    In the opinion of Bloor Research, the following are the key facts of which prospective users should be aware:

    The grid implementation in this release is particularly impressive and should provide significantly improved performance while, at the same time, enhancing availability.

    Another major new feature of this release is the product's push-down optimisation, which allows transformations to be performed wherever that is most appropriate. We especially like the flexibility that this offers, not just in terms of mapping but also with respect to the deployment of the optimiser.

    Java development is now supported from within the PowerCenter environment, so data lineage can now include coded transformations. You can call code from PowerCenter processes and vice versa.

    In this release there is a single, web-based point of administration for all purposes, rather than the multiple administration facilities that were previously provided.

    Data federation is provided through Informatica having licensed source code from Composite Software, which the company is integrating with PowerCenter. However, this task has not yet been completed, and tight integration will be introduced via a point release during the course of 2006.

    While the standard PowerCenter product manages and processes structured data, there is also an Unstructured Data Option that supports access to, and transformation of, unstructured data (for example, PDF files) and semi-structured data (such as HIPAA messages).

    A major feature is that PowerCenter can be used to design the target database schema, and it retains detailed and ongoing knowledge of the target database. Informatica leverages this information with its dependency analysis facility (which we especially like), which allows you to examine the effects of any change. You can visually inspect the lineage of a field through a data flow, and control and execute the ripple of a change going forward in that flow.

    We like the analytic workflow capability provided by Data Analyzer, though we would prefer it to have a graphical, icon-based interface. This analytic capability allows you to define a path through an analysis or query so that even inexperienced users can quickly perform root cause analysis.

    There are some areas of the product where a more user-friendly approach would be useful. We are therefore pleased that this will be the emphasis in PowerCenter 9.

    Informatica PowerCenter has quite impressive data profiling capabilities in its own right. However, in order to offer a complete solution, including data cleansing, the company has recently acquired Similarity Systems. We believe this to be a very sensible move. A fuller discussion of this purchase and its implications is included in this review.

    We like Metadata Manager and think that all major users of PowerCenter should consider deploying it, because it extends data lineage throughout the environment. In particular, the fact that it operates at field level (and therefore supports things such as database views, which other products often do not) is impressive.
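    The dependency analysis and field-level lineage described above are, at heart, graph traversals. The following Python sketch illustrates the idea under our own assumptions. Lineage is held as a hypothetical mapping from each field to the fields derived from it (all table and field names are invented, including a view). Impact analysis walks that graph forwards, and lineage walks it backwards. It is not how PowerCenter or Metadata Manager actually store or query their metadata.

```python
from collections import deque

# Hypothetical field-level lineage: each "table.field" maps to the fields
# derived from it. All names are invented for illustration.
LINEAGE = {
    "crm.customer.cust_id": ["stage.cust.id"],
    "stage.cust.id": [
        "dw.dim_customer.customer_key",
        "dw.v_active_customers.customer_key",  # a database view
    ],
    "dw.dim_customer.customer_key": ["bi.report_sales.customer"],
}


def impact_of_change(field, graph):
    """Return every field reachable downstream of `field`, breadth-first."""
    seen, order = {field}, []
    pending = deque([field])
    while pending:
        current = pending.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                pending.append(child)
    return order


def lineage_of(field, graph):
    """Return every upstream field feeding `field` by reversing the edges."""
    reverse = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return impact_of_change(field, reverse)


print(impact_of_change("crm.customer.cust_id", LINEAGE))
print(lineage_of("bi.report_sales.customer", LINEAGE))
```

    Because the graph is held at field level rather than table level, a change to one source column reaches the view and the report, while unrelated columns in the same tables are left out of the impact list.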

    The bottom line

    Informatica has long been part of a duopoly that dominates the data integration market and, in terms of PowerCenter, we do not expect that position to change. In such situations, products tend to leapfrog one another but, at present at least, we believe that Informatica has a significant lead over its main rival, not least with respect to performance. Moreover, the indications are that this is likely to remain the case for some time to come. For example, we expect Informatica to be the first of the leading vendors to have a fully integrated data integration platform in place (that is, one incorporating fully integrated data federation as well as data movement and data quality). All of this bodes well for Informatica. While one can always find criticisms, we are seriously impressed by the latest release of PowerCenter and by the company's plans going forward.


    Vendor information

    Background information

    Informatica was founded in 1993 as a services company specialising in helping its customers to migrate to a client/server environment. It was not until 1996 that it introduced its first product, Informatica PowerMart, which was followed by Informatica PowerCenter in 1998.

    Informatica markets and sells its products both through a direct sales force and through partners that OEM Informatica's products. In addition, the company has a co-marketing and cross-selling partnership with webMethods, focused on Business Activity Monitoring, and it has also licensed the source code of Composite Software's EII (enterprise information integration) solution, which Informatica is integrating within PowerCenter (more about this later). The company is also very active in the SI (Systems Integrators) space, and it has a very broad range of partners in this area including Accenture, Fujitsu, IBM, KPMG, Tata Consulting, Infosys, LogicaCMG, CapGemini, Atos Origin and Teradata, amongst others.

    Historically, the company has not been particularly acquisitive, but there have been some notable purchases over the years, most recently Striva and Similarity Systems. The former's technology now underpins the PowerExchange suite of real-time connectors, while the acquisition of the latter is a brand new development (January 2006). This purchase will have a significant impact, both internally and with respect to partners in the data quality space, which we will discuss in detail in due course.

    Product availability

    PowerCenter is Informatica's main product, but there are three additional products that it also licenses: PowerExchange, which provides real-time connectivity; Data Analyzer (previously PowerAnalyzer), which provides BI-type functionality for PowerCenter environments; and Metadata Manager (previously SuperGlue), which provides extended metadata options. While PowerExchange is available as a stand-alone option, both Data Analyzer and Metadata Manager are available only as part of the PowerCenter Advanced Edition. In addition, there are various chargeable options within the PowerCenter Connect family, which provides universal data access to source and target systems, while ATHANOR (the former Similarity Systems product) and real-time processing are also PowerCenter options.

    PowerCenter 8 was released in December 2005 for new customers. Version 8.1, scheduled to be available in April 2006, will be the version to which existing users can upgrade. There are two versions of PowerCenter: the Standard Edition and the Advanced Edition. In this review we shall concentrate on the latter.

    The operating systems supported by PowerCenter include Windows 2000/2003, HP-UX (including 64-bit Itanium support), AIX (ditto), Sun Solaris and Linux (both Red Hat and SuSE).


    Since version 7.1.3, which was released in September 2005, PowerCenter has supported unstructured and semi-structured data sources as well as structured ones, as follows:

    Structured sources supported include Oracle, Sybase, Informix, SQL Server, DB2, IMS, VSAM, IDMS, ADABAS, AS/400, Netezza, Teradata, DATAllegro, Hyperion Essbase, SAS, Microsoft Access, flat files, XML and web logs, in addition to the PowerConnect products, which support native connectivity to data sources such as SAP R/3, PeopleSoft and Siebel, as well as other facilities detailed in the section on PowerCenter (of which the various PowerConnect options form a part). Data is acquired via native interfaces, ODBC, remote integration or EAI. The PowerExchange system support (for real-time connectivity) is detailed in the relevant section that follows.

    Unstructured support is provided for Microsoft Office products such as Excel and PowerPoint, as well as Lotus Notes, Adobe Acrobat, HTML and so forth, all of which are accessed natively. Semi-structured support for things such as HIPAA documents, EDI messages and the like is provided by means of template libraries.

    PowerCenter 8 represents a major change to the architecture of the product, intended to provide improved performance and high availability. However, with the emphasis on architecture, the company has not had time to focus on things like ease of use. While some features of this release do improve usability, such as the single point of administration (see later), the company has not had the time that it would have liked to develop these capabilities further. For this reason the next major release of PowerCenter (9.0) will focus on usability. Details are not available at this time.

    Financial results

    Informatica has grown rapidly in the last few years. For example, in March 2000 it had just 200 staff; today it has over 1000. In part, of course, this is due to the company's acquisitions, but it is primarily a reflection of the company's move into global markets. In addition to the United States, the company has offices in the UK, Germany, Switzerland, France, the Netherlands, Australia, Singapore, Canada, India, China, Taiwan, Korea and Japan. In Latin America, Informatica products are distributed by Softtek, which has offices in 9 countries in Central and South America. Elsewhere there are distributors in those parts of Europe where Informatica does not have its own offices, and in South Africa, the Philippines, Israel and Saudi Arabia.

    Informatica floated on the stock market (NASDAQ) in 1999, and in its most recent quarter (Q4 2005) it reported revenues of $79.8m compared to $60m in the same period of 2004. On a pro forma basis (that is, excluding one-off items) net income was $14.7m compared to $5.2m while, on a GAAP basis, net income improved to $13.6m from a loss of $98.7m.


    In the last full year (2005) the company reported total revenues of $267.4m versus $219.7m in 2004. Net income improved significantly on both a GAAP and a pro forma basis: to $33.8m from a loss of $104.4m, and to $39.3m from $13.7m, respectively.


    Product description

    Introduction

    PowerCenter 8 represents a major architectural shift in how the product works. In part, the idea behind this redesign is to offer a platform, based on a service-oriented architecture, for all integration capabilities, whether they are provided by Informatica or by third parties. What is significant about this is that it separates the platform from the integration solutions that sit on top of it, with the former providing common services to the latter, in a similar fashion to the way that a database supports the applications that make use of it. In simple terms, this means that on the one hand you can concentrate on providing performance and scalability via the platform, while on the other you can focus on user functionality through the applications that make use of the platform. Thus, for example, the facilities of Informatica's optimiser will be shareable across any front-end application that wants to use them. It also means that transformation models will be separated from the engine that implements them, which could potentially lead to the standardisation of such models.

    More specifically, Informatica is intent on providing a complete data integration platform and not merely one that provides ETL. Of course, ETL per se is no longer necessarily the methodology you would use even for moving data into a data warehouse, nor is it simply about warehousing environments: ETL today is also about data migration, ERP consolidation, data synchronisation and so forth. However, Informatica's vision is broader even than this. As previously mentioned, PowerCenter has already been expanded so that it can support the transformation of such things as EDI documents, but the company sees the whole realm of EII and data federation (that is, the ability to address heterogeneous front-end [transactional] and back-end data sources within a single query in near real-time) as within the compass of its platform.

    We agree with this vision. The connectors you use for access in an EII environment, the optimisation you require, and the transformation capabilities you need are all essentially the same in both ETL and EII environments, so it makes sense to have a single platform that supports all of these facilities with common services that support both sets of functionality.

    Rather than create a whole new set of EII capabilities, Informatica has chosen instead to license source code from Composite Software (a leading EII vendor) and integrate it with PowerCenter. While this paper does not discuss this aspect of PowerCenter (our concern here is the movement of data), it is important to appreciate that this embedding means that the EII software can take advantage of new features within this release of PowerCenter, such as the grid architecture, push-down optimisation and, of course, the product's transformation capabilities and PowerExchange. As we shall see in the sections that follow, much of this service-oriented approach has been implemented in PowerCenter 8, though the integration with Composite Software is not yet complete: currently it consists of metadata exchange and connection sharing, but it is not yet (though it will be) integrated at an architectural level.


    Architecture

    PowerCenter itself consists of three elements: the PowerCenter engine, the metadata repository (which we discuss separately later) and the PowerCenter Connect options. The last of these provide connectivity to a wide variety of front- and back-end resources. However, it should be appreciated that these are not just connectivity products, because they understand the metadata of the underlying software and therefore provide a degree of integration and metadata exchange that goes beyond mere connectivity. In addition, there is a range of connectivity options for supporting real-time data acquisition through web services and in EAI and B2B environments, through products such as webMethods Integration Platform, WebSphere MQ (including guaranteed delivery) and TIBCO Rendezvous, and via partnerships with vendors such as webMethods. PowerConnect for Remote Data provides the ability to share data securely across networks. PowerConnect for Remote Integration (in effect, a restricted-use instance of PowerCenter that can target data to PowerConnect for Remote Data) provides a high-performance, secure and cost-effective method for integrating data at remote locations.

    The PowerCenter engine, on the other hand, has a number of components, notably the Server and its associated facilities; the Designer, which does what its name suggests; management and administration functions; additional interoperability capabilities; plus the Real-time and Data Profiling options. We will discuss each of these in turn, followed by various complementary products (PowerExchange, Data Analyzer and Metadata Manager) that are also available.

    Informatica Server

    The Server has undergone the most change in this release, and the work Informatica has done merits a detailed discussion of the new capabilities provided. These primarily relate to the new services-based architecture and the new push-down optimiser, each of which is discussed in the sections that follow. More generally, PowerCenter is based on what the company refers to as scalable pipeline processing. This employs shared, dynamic caching together with thread-based processing. Further, Informatica includes both a built-in parallel sorter and simultaneous support for heterogeneous targets without any need for an intermediate staging area.
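    The general shape of thread-based pipeline processing with fan-out to several targets can be sketched as follows. This is a generic producer/consumer pipeline built on Python's standard `queue` and `threading` modules, not Informatica's engine. Bounded queues stand in for the shared buffers between the extract, transform and load stages, and every transformed row is written to each target directly, with no intermediate staging area. The function names are illustrative only.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the row stream


def reader(rows, out_q):
    """Extract stage: push source rows into the first buffer."""
    for row in rows:
        out_q.put(row)
    out_q.put(_DONE)


def transformer(in_q, out_qs, fn):
    """Transform stage: apply fn and fan each result out to every target."""
    while True:
        row = in_q.get()
        if row is _DONE:
            for q in out_qs:
                q.put(_DONE)
            return
        result = fn(row)
        for q in out_qs:
            q.put(result)


def writer(in_q, sink):
    """Load stage: append rows to one target (a list stands in for a table)."""
    while True:
        row = in_q.get()
        if row is _DONE:
            return
        sink.append(row)


def run_pipeline(rows, fn, n_targets=2, buffer_size=100):
    """Run all stages concurrently; bounded queues apply back-pressure."""
    source_q = queue.Queue(maxsize=buffer_size)
    target_qs = [queue.Queue(maxsize=buffer_size) for _ in range(n_targets)]
    sinks = [[] for _ in range(n_targets)]
    threads = [
        threading.Thread(target=reader, args=(rows, source_q)),
        threading.Thread(target=transformer, args=(source_q, target_qs, fn)),
    ]
    threads += [threading.Thread(target=writer, args=(q, s))
                for q, s in zip(target_qs, sinks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sinks
```

    In CPython these threads overlap waiting (on I/O, for example) rather than CPU work. The point of the sketch is the structure: stages run concurrently and rows stream straight from source to all targets.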

    Historically, PowerCenter was essentially a black-box solution that consisted of three process engines: the Extractor, the Transformation Engine and the Loader. However, with the latest release of the product the architecture is much more service-oriented, with these services being divided between primary and back-up services. While a number of services are provided, arguably the most important are the Integration, Repository, Domain, Log and Grid services.

    Some noteworthy features include the ability to conduct dynamic joins (amongst other real-time transformations), native access to mainframe-based DB2 systems and automatic partitioning to align with DB2 ESE partition schemes, incremental aggregation, and bulk loading (where, as is usual, this is provided by the database vendor). In addition, PowerCenter includes a built-in recovery mechanism, which automatically writes recovery information to both the data warehouse and the metadata repository (see later) whenever data is written to the warehouse. If there is a failure of any sort, the system will incrementally process and load those rows that were not previously committed.
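    The recovery behaviour described above, in which only uncommitted rows are reprocessed after a failure, can be illustrated with a checkpoint that advances only when a batch is committed. In this minimal sketch a Python list stands in for the warehouse table and an integer counter for the recovery information held in the repository. The class name, the batch size and the `fail_at` failure-simulation parameter are all hypothetical.

```python
class RecoverableLoad:
    """Commit rows in batches and advance a checkpoint with every commit."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.target = []      # stands in for the warehouse table
        self.checkpoint = 0   # stands in for recovery info in the repository

    def run(self, rows, fail_at=None):
        """Load rows from the checkpoint onwards; fail_at simulates a crash."""
        pending = []
        for i in range(self.checkpoint, len(rows)):
            if i == fail_at:
                # Rows still in `pending` are lost, as after a real crash.
                raise RuntimeError(f"simulated failure at row {i}")
            pending.append(rows[i])
            if len(pending) == self.batch_size:
                self._commit(pending)
                pending = []
        if pending:
            self._commit(pending)

    def _commit(self, batch):
        # A real system would write the data and the checkpoint in one
        # transaction so that the two can never disagree.
        self.target.extend(batch)
        self.checkpoint += len(batch)
```

    After a simulated failure at row 5 with a batch size of 3, only the first batch is in the target. Calling `run` again resumes at the checkpoint, so every row ends up loaded exactly once, with no duplicates and no gaps.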

    In addition to these facilities, a major focus of the last few releases of PowerCenter, in performance terms, has been improving the performance of data movement to and from flat files (which has been improved again in version 8), with extended n-way parallel and in-memory capabilities. The in-memory capabilities have also been extended to support the union of data from heterogeneous data sources.

    Two further features are particularly noteworthy. The first allows you to capture performance statistics so that you can identify any bottlenecks, while the second allows users to choose from a variety of data partitioning schemes. The choices offered include hash, round robin, pass-through, and both key- and range-based options. In addition, these can be based on the source or the target and can include provision for data skew. This is quite impressive: we can think of some database vendors who cannot offer this range of support for partitioning.

    Further, PowerCenter 8 includes support for dynamic partitioning, which allows the platform to leverage existing database partitioning schemes or to adjust dynamically based on available resources.
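    To make the partitioning schemes named above concrete, here is a hedged Python sketch of hash, round-robin and key-range partitioning over rows held as dictionaries. Pass-through partitioning simply keeps whatever partitions the data arrived in, so it needs no code. The skew handling mentioned above is not modelled. The function names are our own and do not correspond to PowerCenter settings.

```python
import zlib
from bisect import bisect_right


def hash_partition(rows, key, n):
    """Rows with equal key values always land in the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts


def round_robin_partition(rows, n):
    """Deal rows out in turn, giving evenly sized partitions."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def key_range_partition(rows, key, boundaries):
    """`boundaries` are sorted upper limits: [100, 200] gives key < 100,
    100 <= key < 200 and key >= 200."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[key])].append(row)
    return parts
```

    Each scheme suits a different purpose. Hash partitioning keeps rows that share a key together, which matters before joins and aggregations. Round robin evens out volumes regardless of content. Key ranges can be aligned with an existing database partitioning scheme.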

    Facilities are provided to let you configure, schedule and monitor the various services offered in the new architecture, through a variety of parameters that define the what, where and when of each session. This is done by means of the Workflow Manager, which is graphical, easy to use and does not require any scripting. Workflow Manager includes the ability to manage both scheduled and "always on" (as in the PowerCenter Real-Time option) integration flows, both data- and event-driven conditional execution of sessions within a workflow, and real-time notification of events both to external applications and via e-mail to administrators. This is enabled by the graphical Workflow Monitor, which is a dedicated systems management utility. A workflow API is also available to enable integration with systems administration tools, including SNMP managers such as Tivoli, HP OpenView and CA Unicenter.
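    Conditional execution of sessions within a workflow can be modelled as a dependency graph whose links carry conditions. The sketch below assumes Python 3.9+ for `graphlib`. It runs hypothetical sessions in dependency order and follows a link only when its condition ("success", "failure" or "always") matches the upstream result. A session runs if any of its incoming links fires, which is our own simplifying assumption. A `notify` callback, standing in for e-mail or an external application, is called when a session fails. This illustrates the concept only; it is not the Workflow Manager's behaviour or its API.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def _link_satisfied(condition, upstream_status):
    """A link fires only if its upstream session actually ran."""
    if upstream_status == "skipped":
        return False
    if condition == "always":
        return True
    return (condition == "success") == (upstream_status == "ok")


def run_workflow(sessions, links, notify):
    """sessions: name -> callable returning True on success.
    links: (upstream, downstream, condition) with condition in
    {"success", "failure", "always"}.
    notify: called with the name of each failed session."""
    predecessors = {name: set() for name in sessions}
    for up, down, _ in links:
        predecessors[down].add(up)

    status = {}
    for name in TopologicalSorter(predecessors).static_order():
        incoming = [(up, cond) for up, down, cond in links if down == name]
        if incoming and not any(_link_satisfied(c, status[u]) for u, c in incoming):
            status[name] = "skipped"
            continue
        status[name] = "ok" if sessions[name]() else "failed"
        if status[name] == "failed":
            notify(name)
    return status
```

    For example, a hypothetical `cleanup_on_error` session linked on "failure" runs only when the load it depends on fails. In that case the sessions linked on "success" are skipped, and the failure is reported through `notify`.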

    Also included are operational dashboards and reports that have been built using Data Analyzer to deliver web-based metadata reporting against the PowerCenter repository. The detail includes technical metadata relating to tables, objects and transformations, performance statistics over time, and error reporting. This enables a DBA who is not familiar with the PowerCenter user interface to monitor and manage the PowerCenter environment easily, and potentially remotely.

    Grid computing

    This has had a major redesign in this release. Previously, you could either deploy the Informatica Server on a UNIX, Linux or Windows system (either SMP or single-processor architectures) or take up a separate grid option. However, in this release not only has grid become the standard offering (with stand-alone systems being regarded as a single-instance grid), but it is also much more sophisticated than it was previously.


    In PowerCenter 8 the software is implemented in domains so that integration services (that is, PowerCenter), for example, are executed by one or more nodes that exist within the relevant domain. Similarly, other functions of the environment are represented by services, such as repository services or the special-purpose SAP BW service (required because SAP uses a push rather than a pull architecture).

    Each node has its own resources, which you can define, and there is both dynamic partitioning and load balancing across the nodes. In both cases this is based on pre-built algorithms. Note, however, that the load balancing is statistically based, with statistics being gathered continuously. This will impose an overhead on the relevant system(s), though Informatica does not believe this will be large (it has not quantified it). Alternatively, you can interface to third-party load balancing products if you prefer.
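    A statistically based load balancer of the kind described above can be sketched as follows. Each node has a relative capacity (standing in for its defined resources), and observed load samples are smoothed with an exponentially weighted moving average. New work goes to the node with the lowest load relative to its capacity. The smoothing factor, the tie-breaking rule and the class name are our assumptions; Informatica's actual algorithm is not described here. The continuous `observe` calls are where the monitoring overhead mentioned above would arise.

```python
class StatsBalancer:
    """Pick the node with the lowest smoothed load relative to capacity."""

    def __init__(self, capacities, alpha=0.5):
        self.capacities = dict(capacities)        # node -> relative capacity
        self.load = {n: 0.0 for n in capacities}  # smoothed observed load
        self.alpha = alpha                        # weight of the newest sample

    def observe(self, node, sample):
        """Fold a new load sample into the node's moving average."""
        self.load[node] = self.alpha * sample + (1 - self.alpha) * self.load[node]

    def pick(self):
        """Least loaded node relative to capacity; ties broken by node name."""
        return min(self.capacities,
                   key=lambda n: (self.load[n] / self.capacities[n], n))
```

    A node with twice the capacity of another can carry twice the smoothed load before it stops being preferred. The moving average keeps a single momentary spike from steering all new sessions away from an otherwise idle node.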

    Another major change to the way that grid support is implemented is that, in this release, the workflow you implement on the grid (which is entirely metadata driven and dynamically routed) is based on a session or sessions being load balanced across nodes. Previously a server-based approach limited each session to a single server. The new approach provides a more granular level of control and routing, not to mention better performance.

    Finally, the new grid architecture of the product has been implemented not just to improve performance but also for high availability and resilience. Failover across the grid is supported, as are checkpoints for restart purposes. Another new facility is the ability to survive transient failures. Previously, if a network connection (say) failed then the session would fail, which was fine if this was a serious fault but was a headache if it was only a momentary failure. Now you can set the system to retry in the event of a failure so that the session can continue.
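    Retrying after a transient failure, rather than failing the whole session, amounts to a bounded retry loop. This minimal sketch assumes a hypothetical `TransientError` exception that distinguishes momentary faults (a dropped connection, say) from serious ones, which propagate immediately. The retry count and the exponential back-off parameters are illustrative and are not PowerCenter settings. The `sleep` parameter is injectable so that the delays can be observed without actually waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for momentary faults such as a dropped connection."""


def run_with_retry(task, retries=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call task(); on TransientError wait and retry up to `retries` times.
    Any other exception is treated as serious and propagates at once."""
    attempt = 0
    while True:
        try:
            return task()
        except TransientError:
            if attempt >= retries:
                raise  # still failing: give up and fail the session
            sleep(delay * backoff ** attempt)
            attempt += 1
```

    With the defaults, a task that fails twice and then succeeds waits 1 second and then 2 seconds before completing. A task that keeps failing is abandoned after its final retry.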

    Push-down optimisation

    The other major new performance feature in this release is the introduction of a push-down optimiser. This is a cost-based optimiser designed to determine the best place to perform each part of a data movement process. For example, the optimiser might determine that it is better to perform a join on a source database rather than extract the data and then join it. Similarly, it might determine that it would be better to load information to the target system and then perform transformations there. In other words, you are no longer limited to ETL but can also perform ELT or ETLT, or any combination thereof. Moreover, the software can (if you want it to) optimise that placement decision for you.
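    The placement decision can be framed as a small search problem. Data flows from the source, through the integration engine, to the target. Any valid plan therefore runs a prefix of the transformation steps on the source database, a middle run in the engine and a suffix on the target, which covers ETL, ELT and ETLT as special cases. The sketch below enumerates those split points and picks the cheapest plan. The step names and cost figures are invented; a real cost-based optimiser would derive its estimates from database statistics.

```python
def best_plan(steps, costs):
    """Choose where each ordered step runs: a prefix at the source, a middle
    run in the engine and a suffix at the target.

    steps: ordered list of step names.
    costs: step -> {location: estimated cost}; a missing location means
           the step cannot run there.
    Returns (total_cost, [(step, location), ...]) or None if no plan is valid.
    """
    n = len(steps)
    best = None
    for i in range(n + 1):          # steps[:i] pushed down to the source
        for j in range(i, n + 1):   # steps[j:] pushed down to the target
            plan = ([(s, "source") for s in steps[:i]] +
                    [(s, "engine") for s in steps[i:j]] +
                    [(s, "target") for s in steps[j:]])
            if any(loc not in costs[s] for s, loc in plan):
                continue            # some step is unsupported where placed
            total = sum(costs[s][loc] for s, loc in plan)
            if best is None or total < best[0]:
                best = (total, plan)
    return best


# Invented example: the join is cheap on the source, the aggregation is
# cheap on the target, and the lookup cannot run on the source.
steps = ["join_orders_customers", "lookup_region", "aggregate_sales"]
costs = {
    "join_orders_customers": {"source": 2, "engine": 10},
    "lookup_region": {"engine": 3, "target": 4},
    "aggregate_sales": {"engine": 8, "target": 1},
}
print(best_plan(steps, costs))
```

    With these made-up costs, the cheapest plan pushes the join to the source, performs the lookup in the engine and leaves the aggregation to the target. Neither pure ETL nor pure ELT would produce this mixed plan.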

    The optimiser is, in fact, bidirectional: it analyses movement logic from the source forward and from the target backwards. It is also optional. You can implement the optimiser in full, you can turn it off completely, or you can opt for partial optimisation. In this last case you might, for example, opt for source but not target optimisation. You can also turn off the optimiser for particular purposes. If, for example, you want to use the Oracle Loader for a load rather than what the optimiser suggests, then there is a checkbox that allows you to do this. Similarly, you can specify the use of the optimiser through the Workflow Designer (see next section), and you can also build decision points into a workflow that determine whether or not the optimiser is invoked.
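    The source-forward and target-backward analysis, together with the full, partial and disabled modes, can be sketched in a few lines. Here `source_ok` and `target_ok` are hypothetical predicates that say whether a database can execute a given step. Steps are pushed to the source from the front of the flow, and to the target from the back, until an unsupported step is met. Everything in between stays in the engine. The mode names mirror the options described above but are not PowerCenter parameter values.

```python
def push_down(steps, source_ok, target_ok, mode="full"):
    """Split ordered `steps` into (at_source, in_engine, at_target).

    source_ok / target_ok: predicates saying whether a database can run a step.
    mode: "full", "source" (source side only), "target" (target side only)
          or "none" (optimiser switched off).
    """
    if mode not in ("full", "source", "target", "none"):
        raise ValueError(f"unknown mode: {mode}")
    i = 0
    if mode in ("full", "source"):
        # Walk forward from the source while the source database can cope.
        while i < len(steps) and source_ok(steps[i]):
            i += 1
    j = len(steps)
    if mode in ("full", "target"):
        # Walk backward from the target, never overlapping the source part.
        while j > i and target_ok(steps[j - 1]):
            j -= 1
    return steps[:i], steps[i:j], steps[j:]
```

    A coded step such as a hypothetical `java_udf`, which no database can execute, acts as a barrier. The steps before it can go to the source, the steps after it can go to the target, and the step itself stays in the engine.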

    The optimiser has knowledge of source and target databases where they are Oracle, SQL Server, Teradata or DB2; otherwise it is generic (ODBC). Note that you cannot change the SQL generated by the optimiser in the initial release, but you will be able to in PowerCenter 8.1.

    The Workflow Designer

    Historically, PowerCenter was designed on the assumption that all development would be done within the product's environment (using Designer). In practice, of course, some transformations were too difficult or impossible to define in this way, so users resorted to external code. The problem with this approach, however, is that you lose traceability and data lineage (see later). While this was previously not much of a problem, current concerns (not to mention legislation) over corporate governance and compliance mean that it is no longer an acceptable solution. As a result, in this release Informatica has introduced support for Java. The resulting code is compiled and the generated byte code is stored in the repository so that it can be both reused and inspected. Support is bidirectional in the sense that Designer-created processes can invoke Java programs and vice versa.

    The Designer itself is used to build the flows that define data movement and supports a three- or four-stage process, with each of these steps represented by its own module. In principle, you start by analysing your source data, continue by defining or identifying the schema of your target system (traditionally, a data mart or warehouse), and then define how source data will be mapped into the target environment. However, this last stage can be broken down into two parts. If you simply want a unique, one-off mapping then you stop with the Mapping Designer. However, if you want to be able to reuse that mapping, perhaps through the use of additional filters, then you can do so using the Mapplet Designer. In addition, a Visual Debugger is included so that you can trace and resolve any problems or errors that may occur.

    The four stages supported by the Designer are:

    Source Analyzer. As its name suggests, this is used to read, analyse (you might also want to profile the data at this point; see later) and reverse engineer the schema of operational databases and the structures of flat files. The information retrieved includes relevant table and field names, types, sizes and so forth, which can be extracted and used as the basis for mapping into the appropriate schema. After extraction, equivalent structures can be edited in order to refine the data definition, and fields can be combined or rearranged as required. There is also a table property editor.

    In addition to source analysis per se, the Source Analyzer also provides limited facilities for verifying the accuracy of such things as table relationships (for example, primary and foreign key relationships). However, given the mismatch that often exists between data and metadata, it will often be useful to undertake more detailed data analysis and discovery using the Informatica Data Profiling option, which is provided within Source Analyzer and described below.

    Warehouse Designer. This is a visual tool that makes extensive use of wizards for warehouse/mart schema design. These specifically include a Star Schema wizard, a Multi-Dimensional wizard that targets other schema types including cubes, snowflakes, constellations and so on, and a Slowly Changing Dimension wizard. Alternatively, users can enter target table definitions directly, or create them by replicating and rearranging existing source table definitions. If the wizard-based approach is adopted then, once the dimension levels and relevant measures are defined, the program can automatically generate the underlying tables and the primary-foreign key relations for those tables.

    In addition to these wizards, the Warehouse Designer also includes a Dimension Editor. Put simply, this allows the user to create, edit or delete dimensions. You can also define any levels or hierarchies that exist within each dimension. One useful extension that Informatica has added with version 8 is support for pattern development. This will be particularly useful when working with dimensions: previously you had to define a separate process for each dimension, whereas now you can create a pattern that generates the relevant rules for each required dimension.

    Mapping Designer. This is a visual tool for building and editing source-to-target mappings (business rules), which uses Data Flow Diagrams. In essence, you link source and target data via transformation objects, each of which is dragged and dropped into the model. These objects can be one of a dozen or more different types that offer a variety of functions. For example, you can perform standard mathematical functions, define customised calculations, call external or stored procedures, set up filters, look up values, normalise VSAM files, process COBOL Copybooks, perform comparisons and groupings, generate sequential IDs, define the way that you will handle updates, and perform data joins. Joins can be performed on the fly across heterogeneous sources at any point during the mapping or transformation process.
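    To make the idea of chaining transformation objects between a source and a target more concrete, here is a minimal sketch in Python. It is not how PowerCenter executes a mapping; it simply models each transformation (filter, join across two differently shaped sources, lookup, sequence generator) as a step in a data flow. The source data, field names and function names are all hypothetical.

```python
import csv
import io
from itertools import count

# Hypothetical source 1: a delimited flat file (inlined as a string)
ORDERS_CSV = """order_id,customer_id,amount
1,C1,120.50
2,C2,0
3,C1,75.25
"""

# Hypothetical source 2: rows as they might arrive from a relational table
CUSTOMERS = [
    {"customer_id": "C1", "name": "Acme Ltd", "country_code": "GB"},
    {"customer_id": "C2", "name": "Globex", "country_code": "US"},
]

# Hypothetical reference data for the lookup step
COUNTRY_NAMES = {"GB": "United Kingdom", "US": "United States"}


def read_orders(text):
    """Source step: parse the flat file into typed rows."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"order_id": int(row["order_id"]),
               "customer_id": row["customer_id"],
               "amount": float(row["amount"])}


def filter_rows(rows, predicate):
    """Filter step: pass through only rows that satisfy the predicate."""
    return (r for r in rows if predicate(r))


def join(rows, other, key):
    """Join step: inner-join the stream with rows from a second, different source."""
    index = {o[key]: o for o in other}
    for r in rows:
        match = index.get(r[key])
        if match is not None:
            yield {**r, **match}


def lookup(rows, table, src_field, dst_field):
    """Lookup step: enrich each row from a reference table."""
    for r in rows:
        yield {**r, dst_field: table.get(r[src_field], "UNKNOWN")}


def add_sequence(rows, field, start=1):
    """Sequence step: assign a generated surrogate ID to each row."""
    ids = count(start)
    for r in rows:
        yield {field: next(ids), **r}


def run_mapping():
    """Chain the steps from source to target, as a data flow diagram would."""
    rows = read_orders(ORDERS_CSV)
    rows = filter_rows(rows, lambda r: r["amount"] > 0)
    rows = join(rows, CUSTOMERS, "customer_id")
    rows = lookup(rows, COUNTRY_NAMES, "country_code", "country")
    rows = add_sequence(rows, "order_key")
    return list(rows)  # the "target" here is simply an in-memory list


if __name__ == "__main__":
    for row in run_mapping():
        print(row)
```

    Because each step is a generator, rows stream through the chain one at a time rather than being staged between steps, which loosely mirrors the drag-and-drop linking of objects in a data flow diagram.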

    Mapplet Designer. Informatica positions the Mapplet Designer specifically as a tool for reusing mappings (including across different PowerCenter implementations). However, it should be noted that this tool can also work with externally developed mappings written in Java, C, C++ or Basic in conjunction with Informatica's TX (transformation expression) API, which are then imported and registered as transformation objects.

    In the Mapplet Designer, then, you might define customised filters that would allow existing mappings to be reused in particular circumstances, depending on the filter. These are then saved either as a copy of the original mapping or as an instance of it. In the former case, any change to the original mapping will not affect the copy, while in the case of an instance any changes to the original are automatically inherited by the instance. In addition to


    import and export. Multiple labels can be associated with a single object if necessary, and labels can overlap a number of different deployment groups.

    Queries: using a graphical interface within the client tools, sophisticated queries can be created and executed against the PowerCenter repository to select objects. These queries can be used to define dynamic deployment groups when migrating objects, or simply to identify objects for analysis. Queries can be created based on a number of object attributes and can be saved for later use, or for use in multiple deployment groups.

    Centralised management of distributed resources

    The management of distributed resources rests on the use of a global metadata repository, with which local data marts are registered. These data marts may or may not be based on PowerCenter. In particular, PowerCenter will work alongside SAP BW (Business Information Warehouse) data marts and exchange suitable metadata, as we have discussed. Naturally, however, data marts built on PowerCenter will provide the greatest degree of integration and reusability across the whole environment. Notably, this sort of organisation will more easily enable local customisation of centrally held transformations and mappings.

    Access from the local data marts to the central definitions is via hyperlinked shortcuts. These not only ensure easy access to centralised data but also allow central changes to be propagated to local data marts. In other words, this is a two-way mechanism. Transformations can also be accessed via shortcuts.
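    The propagation half of this mechanism is essentially reference semantics, the same distinction drawn earlier between an instance and a copy of a mapping. The conceptual sketch below illustrates it in Python; the classes `Mapping`, `CentralRepository` and `Shortcut` are hypothetical stand-ins, not PowerCenter's repository API.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mapping:
    """Hypothetical stand-in for a centrally held mapping or transformation."""
    name: str
    rules: List[str] = field(default_factory=list)


class CentralRepository:
    """Hypothetical stand-in for the global repository local marts register with."""

    def __init__(self) -> None:
        self.objects: Dict[str, Mapping] = {}


class Shortcut:
    """A reference to a central object: resolved on every use, never copied."""

    def __init__(self, repo: CentralRepository, name: str) -> None:
        self.repo = repo
        self.name = name

    def resolve(self) -> Mapping:
        return self.repo.objects[self.name]


central = CentralRepository()
central.objects["m_customers"] = Mapping("m_customers", ["country = 'UK'"])

# Local data mart A uses a shortcut; local data mart B takes a private copy
mart_a = Shortcut(central, "m_customers")
mart_b = copy.deepcopy(central.objects["m_customers"])

# A central change...
central.objects["m_customers"].rules.append("status = 'ACTIVE'")

# ...is seen through the shortcut but not by the copy
print(mart_a.resolve().rules)   # ["country = 'UK'", "status = 'ACTIVE'"]
print(mart_b.rules)             # ["country = 'UK'"]
```

    The trade-off is the usual one: a shortcut keeps every mart consistent with the centre automatically, while a copy gives a mart freedom to diverge at the cost of missing later central fixes.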

    Another facility offered by PowerCenter is an FTP streaming server, so that you can either stage data to an intermediate data store or stream it directly from source to target via the transformation engine, without ever staging to disk.

    Previously, the Workflow Manager provided a single interface for configuring and monitoring these different systems, while the Repository Manager offered a unified view across all systems' repositories within a distributed environment. In other words, there were several administrative and management tools. While each was logical in its own right, together they made the environment more complex than was necessary. In this release there is now a single, web-based point of administration which spans the whole environment. In addition, a number of administrative enhancements have been made: there is now a single, centralised log file, for example, and there is also a single install, regardless of which options you have licensed.

    Interoperability

    The major elements to discuss in this area are the product's support for Web Services and for security.

    In the case of Web Services, an illustration of which is shown in Figure 2, it is important to note that Informatica PowerCenter comes with pre-built Web Services interfaces out of the box, as opposed to being just "Web Services ready" as is often the case. These pre-built Web Services enable PowerCenter to act as either a client or a provider, to access data via Web Services through PowerConnect for Web Services, and to "punch out" midstream to a Web Services provider.

    The other major interoperability feature is extended security capabilities. These include support for LDAP authentication, or for custom authentication systems via a security SDK (software development kit); support for RSA data encryption; a partnership with Verisign for its Trust Gateway to provide certification for Web Services; and the implementation of object-level permission- and role-based security (with inheritance) within the PowerCenter metadata repository. Note that with Verisign being used for security purposes, a Web Service calling a PowerCenter workflow is just as secure as when a developer accesses the user interface.

    Other interoperability facilities exist within the area of standards, in particular in trading partner management and e-Commerce, where Informatica PowerCenter offers support for the following:

    XML data integration provides the ability to natively source XML files and to learn DTD and XML Schema grammar for validation and decomposition purposes.

    Native web server log data parsing uses CLF (Common Log Format) to parse Apache, Netscape and Microsoft server log files so that the user can perform clickstream analysis using a third party tool.

    Perl script support allows existing Windows or UNIX Perl scripts to be reused as transformation objects.

    External data integration with Acxiom's Data Network provides customer demographic information.

    Java and C++ APIs deliver functionality comparable to that provided via Web Services.
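    To show what parsing Common Log Format involves on the way to clickstream analysis, here is a minimal Python sketch. It is not Informatica's parser: the regular expression follows the standard CLF layout (host, ident, authuser, timestamp, request line, status, bytes), while the sample lines and the helper names `parse_clf_line` and `clickstreams` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [timestamp] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Parse one CLF line into a dict, or return None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body sent
    parts = rec["request"].split()
    rec["path"] = parts[1] if len(parts) >= 2 else None
    return rec


def clickstreams(records):
    """Group requested paths by visiting host, in time order (a crude clickstream)."""
    streams = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["time"]):
        streams[rec["host"]].append(rec["path"])
    return dict(streams)


# Hypothetical sample log lines
SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [10/Oct/2000:13:56:09 -0700] "GET /products.html HTTP/1.0" 304 -',
]

records = [r for r in map(parse_clf_line, SAMPLE) if r is not None]
print(clickstreams(records))
# {'127.0.0.1': ['/apache_pb.gif'], '10.0.0.2': ['/index.html', '/products.html']}
```

    Real clickstream analysis would also split each host's requests into sessions (for example, on gaps of inactivity), but that is the job of the downstream analysis tool mentioned above.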

    Figure 2: Web Services example usage

    Metadata Manager

    Before discussing Metadata Manager (formerly known as SuperGlue), it is important to point out that PowerCenter is not itself devoid of metadata management capabilities. However, these are largely limited to the PowerCenter environment plus some metadata interchange capabilities, whereas Metadata Manager extends beyond these boundaries. To be specific, PowerCenter was, from the beginning, a metadata-driven technology, but the metadata was (and is) limited to ETL processes. In other words, the metadata in PowerCenter describes sources, mappings and targets, together with operational metadata such as sessions, workflows and schedules. With Metadata Manager, on the other hand, companies can assemble and associate metadata from different products, even beyond Informatica's product lines, such as data modelling tools, databases and data warehouses, business intelligence and analysis tools, and so on.

    This ability to support system-wide metadata is enhanced by intelligent lineage algorithms (which work at the field level as opposed to the table level), which means that you can gain insight into the use of data across different processes and systems. This has important consequences for data stewardship. It means that you can track all the data in your Business Objects report, for example: where it came from, how it was calculated and so on, in order to meet the requirements of regulations such as Sarbanes-Oxley, Basel II or FDA 21 CFR Part 11 (used in the pharmaceutical sector). In addition, because of the field-level approach, the product is able to support such things as Oracle Views, whereas table-based approaches are limited to database tables. An example of a data lineage report is illustrated in Figure 3.

    The reason Metadata Manager can support non-Informatica objects is that it is an implementation of the Object Management Group's (OMG) Common Warehouse Metamodel (CWM). This has a number of consequences. First, it means that it is open to third party tools for queries and browsing. Secondly, it means that it supports XMI (XML Metadata Interchange, which is part of the CWM specification) for the exchange of metadata with other sources, such as a data warehouse. Beyond CWM itself, Informatica has used Data Analyzer technology to implement the J2EE-based Metadata Manager Server, which enables a Metadata Web Services architecture. Further, Metadata Manager uses PowerCenter technology to load and maintain the metadata warehouse, which brings standard warehouse advantages such as metadata history (versions), metadata reporting, and so on. The connectors to database catalogs like DB2 and Oracle, and to proprietary repositories like Business Objects, CA ERwin, and so forth, are called XConnects and are developed and delivered by Informatica as well as by third parties. The latter is possible because all interfaces are standard and documented in an SDK guide.

    One more point on the subject of CWM: Informatica has extended the standard so that you can also report on non-CWM objects, which is essential to span the whole environment. We understand that Informatica has made submissions to the OMG proposing that these extensions be added to the next version of the CWM specification. In any case, as the extensions are implemented using further OMG standards such as MOF (Meta Object Facility), on which CWM itself is based, integration with other environments should be straightforward.

    At the front-end, Metadata Manager exhibits many of the characteristics of an enterprise portal and, indeed, it includes a portal integration kit. In particular, it includes personalisation capabilities so that individual users will see detail relevant to themselves, and there are Amazon-like facilities of the "if you were interested in this then you might want to look at that" variety. The query and reporting functions are based on Informatica Data Analyzer, which was extended specifically to cater for the needs of Metadata Manager. Thus you can use it to see in which environments a specific attribute is used, how a metric is calculated, what a definition means, how often a particular report has been run, the number of PowerCenter sessions, and so forth.

    Figure 3: An example of a data lineage report

    Data Analyzer

    Data Analyzer (previously known as PowerAnalyzer) used to be sold, as was Metadata Manager, as a stand-alone product. Now it is only available as part of PowerCenter Advanced Edition. Thus, while it was formerly available as a general-purpose business intelligence tool, it is now only marketed for querying and reporting against PowerCenter (and Metadata Manager).

    When using Data Analyzer you start with all of the relevant metadata in place (derived from PowerCenter and/or Metadata Manager), upon which you can build queries, activate those queries, drill down, slice and dice, present data in a variety of graphical formats, view data within browser-based dashboards (for an example see Figure 4), and export data in various formats, including Excel, CSV (comma separated values), HTML, Adobe Acrobat .pdf or as a data mining tree.

    Figure 4: Dashboard view

    When it comes to designing new queries and reports, there are four significant features of Data Analyzer, as follows:

    The Report Wizard provides a four-step process for building a report. First, select the metrics (many are pre-defined, or you can create your own) that you want to use; then select the attributes against which you want to measure them. These will only be attributes that are relevant to the metrics you want to assess, and they can be discovered by browsing and selecting, or by using the product's find option. The third step is to define filters and the order in which you want rankings to be displayed. Finally, the fourth step is to determine the format and style of the report.

    At first sight this may not seem remarkable, and it is easy to miss that starting with the selection of a metric or metrics is not how you usually go about defining a report. Usually, you start by defining the set (customers, say) that you want to report on and what fields you want to display. Then you define how you are going to select those customers. The Informatica Report Wizard turns this process on its head. If you want to investigate customer churn you start by selecting that metric and go on from there, instead of selecting customers and then deciding how to define churn. While this approach may take a little getting used to, it is, in our opinion, very intuitive, and should mean that business users require far less assistance from the IT department in creating queries.

    Analytic Workflow allows you to define a (reusable) route through relevant information. For example, root cause analysis might help you to identify why some critical measure has fallen below a pre-defined limit. The use of workflows can make this process much simpler and faster. In many cases, these workflows will map on to the steps of a business process, linking together information from many sources and functions within the organisation. One of the problems with traditional approaches is that you have to know what you are looking for when you drill down into data. Analytic Workflow allows you to define best practices in a consistent way that will guide less experienced users through that process. In other words, these workflows act in a knowledge management capacity to capture the experience of senior business professionals, which can then be used for knowledge transfer.

    Alerts in Data Analyzer are sophisticated. You can, for example, use an Informatica alert as the starting point for an analytic workflow. Or you could use it to automatically generate a report and then send that to relevant recipients. You could even, in conjunction with PowerCenter, use an alert to trigger a data movement task. It is the extension of alerts beyond mere notification that, in our view, makes this such a useful technology. The alerts themselves may be threshold-based (that is, triggered if a measure rises above or falls below a pre-defined limit) or time-based (checked at regular intervals or at a pre-defined time) and can be requested from and delivered to any supported device. The requesting and receiving devices do not have to be the same.

    The presentation capabilities in Data Analyzer are web-based, with a portal-style presentation environment that includes the sort of customisation and personalisation options that you would expect from a portal: the ability to move panes around the screen, a shared documents area, and so on. Role-based security, with support for both users and user groups, is also provided. If you want to integrate these presentation capabilities into a full-blown Enterprise Information Portal then tools are provided to do this, and the company has already developed the relevant portlets to integrate with a number of third party products. These are available as part of a Portal Integration Kit and there is also a Java API available. Taken together, these capabilities can be used to build highly customised environments, such as balanced scorecards, that then make use of Data Analyzer as a platform to support a wide range of projects and users, thereby extending the range of environments in which the product can be used. It is also worth noting that Informatica PowerAnalyzer has been certified, in the United States, under Section 508, which is the US legislation regarding access for people with disabilities. In other words, the presentation environment is designed to be easy to use and view for users with disabilities.
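    Returning to the alerting facilities described above, the difference between threshold-based and time-based alerts, and the idea of an alert driving actions beyond notification, can be sketched in a few lines of Python. This is not Data Analyzer's alerting API: the classes `ThresholdAlert` and `TimeBasedAlert`, the measure names and the action callbacks are all hypothetical, and the "start analytic workflow" action merely appends a message to a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class ThresholdAlert:
    """Hypothetical alert that fires when a measure rises above `upper` or falls below `lower`."""
    name: str
    lower: Optional[float] = None
    upper: Optional[float] = None
    actions: List[Callable[[str, float], None]] = field(default_factory=list)

    def check(self, value: float) -> bool:
        breached = ((self.lower is not None and value < self.lower) or
                    (self.upper is not None and value > self.upper))
        if breached:
            for action in self.actions:   # notification is just one possible action
                action(self.name, value)
        return breached


@dataclass
class TimeBasedAlert:
    """Hypothetical alert that fires at `next_run` and then every `interval` thereafter."""
    name: str
    next_run: datetime
    interval: timedelta
    actions: List[Callable[[str, datetime], None]] = field(default_factory=list)

    def tick(self, now: datetime) -> bool:
        if now < self.next_run:
            return False
        for action in self.actions:
            action(self.name, now)
        while self.next_run <= now:       # skip any runs missed while idle
            self.next_run += self.interval
        return True


log: List[str] = []
churn = ThresholdAlert("churn_rate", upper=0.05, actions=[
    lambda n, v: log.append(f"notify managers: {n}={v}"),
    lambda n, v: log.append(f"start analytic workflow: investigate {n}"),  # hypothetical follow-on
])
churn.check(0.03)   # within limits: no actions run
churn.check(0.07)   # breach: both actions run

daily = TimeBasedAlert("daily_sales", datetime(2006, 1, 1, 9, 0), timedelta(days=1),
                       actions=[lambda n, t: log.append(f"send report: {n} at {t:%H:%M}")])
daily.tick(datetime(2006, 1, 1, 9, 0))
print(daily.next_run)  # 2006-01-02 09:00:00
```

    Keeping the actions as a list of callables is what lets one alert both notify and kick off further work, which is the "beyond mere notification" point made above.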


    Optional products

    The Real-Time option

    This is a superset of PowerCenter that includes the ZL Engine, a zero latency engine (see Figure 5) that provides an "always on", trickle feed mechanism for processing live data, allowing the data warehousing environment to be continuously updated with real-time data. For example, in this illustration an XML report about students is read, a filter removes students who are not active, the data is sorted by class and then a calculation (GPA: grade point average) is performed before the results are written to the student table.
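    The example flow just described can be sketched in Python as follows. This is not the ZL Engine: the XML layout, the attribute names, the credit-weighted GPA formula and the in-memory dictionary standing in for the student table are all assumptions made for illustration, and a `deque` stands in for the message queue.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical message payload: an XML report about students
MESSAGE = """<students>
  <student id="1" name="Ana" class="B" active="true">
    <grade points="4.0" credits="3"/><grade points="3.0" credits="4"/>
  </student>
  <student id="2" name="Ben" class="A" active="false">
    <grade points="2.0" credits="3"/>
  </student>
  <student id="3" name="Cy" class="A" active="true">
    <grade points="3.0" credits="2"/><grade points="3.0" credits="2"/>
  </student>
</students>"""


def gpa(student):
    """Credit-weighted grade point average, rounded to two places (assumed formula)."""
    grades = student.findall("grade")
    credits = sum(float(g.get("credits")) for g in grades)
    points = sum(float(g.get("points")) * float(g.get("credits")) for g in grades)
    return round(points / credits, 2) if credits else 0.0


def process_message(xml_text, student_table):
    root = ET.fromstring(xml_text)                         # 1. read the XML report
    active = [s for s in root.findall("student")
              if s.get("active") == "true"]                 # 2. filter out inactive students
    active.sort(key=lambda s: s.get("class"))               # 3. sort by class
    for s in active:                                        # 4. calculate GPA and write
        student_table[s.get("id")] = {"name": s.get("name"),
                                      "class": s.get("class"),
                                      "gpa": gpa(s)}


def run(queue, student_table):
    """Drain the messages that have arrived; a real engine would block and wait for more."""
    while queue:
        process_message(queue.popleft(), student_table)


table = {}
run(deque([MESSAGE]), table)
print(table)
```

    The point of the trickle-feed design is that each message is pushed through the whole flow as soon as it arrives, rather than being accumulated for a nightly batch.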

    While this diagram should be fairly self-explanatory, there are a couple of particular points to note. The first of these is that the integration with the various message queuing vendors is bi-directional, while the second is that the software is transaction aware. That is, it will only process complete transactions. Of course, there is some hyperbole involved: the notion of zero latency assumes that processing takes no time, which is never strictly true. In practice, the lag depends on the volume of data being presented by the messaging system and on the complexity of the transformations, and the ZL Engine should be able to cope with both of these with no discernible lag time.

    Figure 5: How the ZL Engine works

    Data quality

    Informatica first moved formally into the data quality space with the release of PowerCenter 7, in which the company introduced its own data profiling capabilities. At the same time, it entered into a partnership with FirstLogic, integrating that company's facilities with its own, for those companies that wanted either more advanced profiling and analysis capabilities, or data cleansing and matching, or both. Subsequently, in mid-summer 2005, the company announced a comparable partnership with Trillium. However, in January 2006 Informatica acquired Similarity Systems, whose ATHANOR product will now be the company's main data quality offering, though Informatica will continue to support third party environments such as FirstLogic and Trillium.

    While this is not the place to provide a detailed description of ATHANOR (which would require a full review of its own), it is appropriate to give a brief overview of the product.

    Perhaps the most important thing to note about ATHANOR is that it has been designed for business users rather than IT experts. This means that ATHANOR is much easier to use and requires a lot less training than some other tools. Historically, this meant that there was a downside to the product, in that some elements of profiling, specifically with respect to structure rather than content, were not present in ATHANOR. In order to rectify this, Similarity acquired Evoke Axio during the course of 2005 to provide this sort of IT-level functionality.

    In terms of the product suite itself (there is more than one version; for example, there is a version designed to support one-off data quality projects and another to support ongoing quality assurance), it is more fruitful to consider ATHANOR in terms of a logical architecture, and indeed in terms of the way that you would use the product, rather than simply how the product is constructed. In practice, there are five steps involved:

    1. Data Quality Audit: conduct a data quality analysis, which consists of profiling the data and then producing a number of reports, both low-level drill-down reports and high-level scorecard reports. At this stage you would also set targets for required and achievable levels of data quality.

    2. Standardise: build and apply standardisation rules based on what you found in the profiling stage. This includes things like parsing names or product codes into their constituent parts; enhancing the data where appropriate by determining the correct value for blank or incorrect fields (for example, inserting the correct country field based on the city and/or other parts of the address field); removing noise (extraneous data); and removing or replacing bad or inconsistent data. This is achieved through a combination of look-up (dictionary) and routine-based techniques.

    3. Matching: this is carried out after standardisation, as it usually generates better results if you match against a standardised data set. This is where you identify duplicates and things like households or subsidiaries. The product enables matching and reconciliation across databases. This uses a combination of user-defined rules (again, these can be based on dictionaries) and mathematical matching algorithms.

    4. Consolidation: this enables users to manage and automate the data consolidation process. In other words, it lets you merge duplicate records based on keys (rather than necessarily overwriting data, though you can do this if you wish), create linkages between related records, append data from reference sources and, when appropriate, replace inaccurate data with reference data.

    5. Audit again and track: this is where the ongoing nature of the process kicks in. Once the data has been standardised, matched and consolidated, you would profile it again to see whether you have met the targets you set at stage one. The Data Quality scorecard is a framework for continuous improvement of data quality and company-wide intelligence, providing you with the facility to support any data quality programmes that you may have put in place.
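    The five steps above can be illustrated end to end with a small Python sketch: audit a completeness measure, standardise (dictionary look-up plus noise removal), match (a rule combined with a fuzzy similarity score), consolidate (merge into a surviving record while keeping links), then audit again. This is not ATHANOR's algorithm. The dictionaries, noise words, records and the 0.85 threshold are hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for the product's matching algorithms.

```python
from difflib import SequenceMatcher

# Hypothetical reference dictionaries used by the standardisation step
CITY_TO_COUNTRY = {"dublin": "Ireland", "london": "United Kingdom"}
NOISE_WORDS = {"ltd", "limited", "plc"}

RECORDS = [
    {"id": 1, "name": "ACME Ltd", "city": "Dublin", "country": ""},
    {"id": 2, "name": "Acme Limited", "city": "dublin", "country": "Ireland"},
    {"id": 3, "name": "Globex plc", "city": "London", "country": ""},
]


def completeness(records, field):
    """Audit: share of records with a non-blank value in `field`."""
    return sum(1 for r in records if r.get(field)) / len(records)


def standardise(rec):
    """Normalise case, strip noise words, and fill a blank country from the city."""
    words = [w for w in rec["name"].lower().replace(".", "").split()
             if w not in NOISE_WORDS]
    out = dict(rec, name=" ".join(words).title(), city=rec["city"].strip().title())
    if not out["country"]:
        out["country"] = CITY_TO_COUNTRY.get(out["city"].lower(), "")
    return out


def is_match(a, b, threshold=0.85):
    """A rule (same city) combined with a fuzzy similarity score on the name."""
    if a["city"] != b["city"]:
        return False
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold


def consolidate(records):
    """Merge matches into the first-seen survivor, keeping links to merged ids."""
    survivors = []
    for rec in records:
        for s in survivors:
            if is_match(s, rec):
                s["linked_ids"].append(rec["id"])
                for k, v in rec.items():           # fill gaps rather than overwrite
                    if k != "id" and not s.get(k):
                        s[k] = v
                break
        else:
            survivors.append(dict(rec, linked_ids=[]))
    return survivors


before = completeness(RECORDS, "country")   # 1. audit
clean = [standardise(r) for r in RECORDS]   # 2. standardise
golden = consolidate(clean)                 # 3-4. match and consolidate
after = completeness(clean, "country")      # 5. audit again
print(round(before, 2), after, len(golden))  # 0.33 1.0 2
```

    Note that matching runs on the standardised records: "ACME Ltd" and "Acme Limited" only compare as identical once case and noise words have been normalised, which is the reason given above for standardising first.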

    Similarity Systems, which was founded in 2000, has established a strong reputation in a short period of time, especially in Europe. The software is based on Unicode so that it can be used worldwide, and we particularly like its emphasis on the business user community rather than IT (though, since the purchase of Evoke, it caters for IT as well). While it is still early days (as we write, it is less than a week since the announcement of Similarity's acquisition), we believe that this is a sound strategic move on the part of Informatica that can only enhance its platform.

    PowerExchange

    PowerExchange is available both as a stand-alone product and in conjunction with PowerCenter. There are basically two main points to make about PowerExchange.

    First, there are the types of data movement that are supported for the relevant environments, of which there are three options: batch, real-time and change data capture (CDC). While the first two of these are self-explanatory, the last may need some explanation. Once an initial bulk load has been done (say, to populate a new data warehouse), the change data capture option allows trickle feeds of just the data that has changed. This functionality is entirely metadata-driven and, in the case of the relational databases that are supported, it should be noted that change data capture uses log-based capture rather than burdening the database with triggers. In this respect, one of the notable features is support for pre-fetching in Oracle environments.
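    The pattern of a one-off bulk load followed by trickle feeds of changes can be sketched as below. This is not PowerExchange: reading a real database log is database-specific, so here the "log" is a plain Python list of hypothetical `Change` records, each carrying a log sequence number (LSN). Tracking the highest LSN applied, as a high-water mark, is what lets each trickle feed pick up only the changes it has not yet seen.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Change:
    lsn: int                    # hypothetical log sequence number (position in the log)
    op: str                     # "I" insert, "U" update, "D" delete
    key: int
    row: Optional[dict] = None  # new row image; None for deletes


def initial_load(source_rows: List[dict]) -> Dict[int, dict]:
    """One-off bulk copy of the full source table into the target."""
    return {r["id"]: dict(r) for r in source_rows}


def apply_changes(target: Dict[int, dict], log: List[Change], last_lsn: int) -> int:
    """Apply only log entries newer than `last_lsn`; return the new high-water mark."""
    for ch in sorted(log, key=lambda c: c.lsn):
        if ch.lsn <= last_lsn:
            continue                        # already applied by an earlier trickle feed
        if ch.op in ("I", "U"):
            target[ch.key] = dict(ch.row)
        elif ch.op == "D":
            target.pop(ch.key, None)
        else:
            raise ValueError(f"unknown operation {ch.op!r}")
        last_lsn = ch.lsn
    return last_lsn


source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
target = initial_load(source)
log = [Change(103, "D", 2),
       Change(101, "U", 1, {"id": 1, "v": "a2"}),
       Change(102, "I", 3, {"id": 3, "v": "c"})]
hw = apply_changes(target, log, last_lsn=100)
print(hw, target)  # 103 {1: {'id': 1, 'v': 'a2'}, 3: {'id': 3, 'v': 'c'}}
```

    Applying the same log again with the returned high-water mark changes nothing, which is the property that makes repeated trickle feeds safe.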

    Secondly, there are the platforms it supports. These include not just the most popular current databases and WebSphere MQ (across all relevant operating systems) but also a variety of legacy data sources including IMS, VSAM, Adabas, CA-Datacom, CA-IDMS, ICL IDMS-X, C-ISAM and flat file systems.

    Apart from the features already mentioned, other capabilities worth noting include multi-byte character support, parameterised SQL, and RSA encryption.


    Summary

    At any time over the last decade you could have asked any vendor in Informatica's market about their biggest competitor and you would always have got the same response: hand coding. Despite the success of Informatica and others, the majority of data integration tasks continue to be hand coded. However, that is changing, firstly because legislation such as Sarbanes-Oxley requires facilities such as data lineage, and secondly because companies are increasingly realising the importance of data quality. Both of these are very difficult and complex to build into hand-coded solutions, so hand coding as an approach is losing its former popularity.

    At the same time there is an increasing demand from large enterprises to reduce the number of suppliers they deal with and, where data movement projects might previously have been independent decisions, more and more companies are now mandating a corporate integration (as opposed to mere movement or ETL) solution. Further, the platform that users require is expanding, most notably to include data federation but also to incorporate support for unstructured and semi-structured data and transformations.

    This means that the market for tools such as PowerCenter is expanding, both at the enterprise level and for project-based developments in smaller organisations. Informatica is well placed to capitalise on these trends.


    Bloor Research Overview

    Bloor Research has spent the last decade developing what is recognised as Europe's leading independent IT research organisation. With its core research activities underpinning a range of services, from research and consulting to events and publishing, Bloor Research is committed to turning knowledge into client value across all of its products and engagements. Our objectives are:

    Save clients time by providing comparison and analysis that is clear and succinct.

    Update clients' expertise, enabling them to have a clear understanding of IT issues and facts and validate existing technology strategies.

    Bring an independent perspective, minimising the inherent risks of product selection and decision-making.

    Communicate our visionary perspective of the future of IT.

    Founded in 1989, Bloor Research is one of the world's leading IT research, analysis and consultancy organisations, distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services and consultancy projects.


    Copyright & Disclaimer

    This document is subject to copyright. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research.

    Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research's intent to claim these names or trademarks as our own.

    Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.


    Suite 4, Town Hall, 86 Watling Street East, TOWCESTER, Northamptonshire, NN12 6BS, United Kingdom