
Transcript of TDWI Best Practices Report | Next Generation Data Integration


Next Generation Data Integration

Second Quarter 2011

By Philip Russom

TDWI Best Practices Report

tdwi.org

TDWI Research


Research Sponsors

DataFlux

IBM

Informatica

SAP

Syncsort

Talend


© 2011 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to [email protected]. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.


Table of Contents

Research Methodology and Demographics

Introduction to Next Generation Data Integration
  Ten Rules for Next Generation Data Integration
  Why Care About NGDI Now?

Leading Generational Changes for Data Integration
  Expanding Into More DI Techniques
  Users’ Data Integration Tool Portfolios
  DI Tool and Platform Replacements
  Data Types Being Integrated
  Data Integration Architecture

Organizational Issues for NGDI
  Organizational Structures for DI Teams
  Unified Data Management
  Collaborative Data Integration

Catalog of NGDI Practices, Tools, and Platforms
  Potential Growth versus Commitment for DI Options
  Trends for Next Generation Data Integration Options

Vendor Products and Platforms for NGDI

Recommendations


About the Author

PHILIP RUSSOM is a well-known figure in data warehousing and business intelligence, having published more than 500 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Today, he’s TDWI Research Director for Data Management at The Data Warehousing Institute (TDWI), where he oversees many of TDWI’s research-oriented publications, services, and events. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research, Giga Information Group, and Hurwitz Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at [email protected].

About TDWI

TDWI, a division of 1105 Media, Inc., is the premier provider of in-depth, high-quality education and research in the business intelligence and data warehousing industry. TDWI is dedicated to educating business and information technology professionals about the best practices, strategies, techniques, and tools required to successfully design, build, maintain, and enhance business intelligence and data warehousing solutions. TDWI also fosters the advancement of business intelligence and data warehousing research and contributes to knowledge transfer and the professional development of its Members. TDWI offers a worldwide Membership program, five major educational conferences, topical educational seminars, role-based training, onsite courses, certification, solution provider partnerships, an awards program for best practices, live Webinars, resourceful publications, an in-depth research program, and a comprehensive Web site: tdwi.org.

About the TDWI Best Practices Reports Series

This series is designed to educate technical and business professionals about new business intelligence technologies, concepts, or approaches that address a significant problem or issue. Research for the reports is conducted via interviews with industry experts and leading-edge user companies and is supplemented by surveys of business intelligence professionals.

To support the program, TDWI seeks vendors that collectively wish to evangelize a new approach to solving business intelligence problems or an emerging technology discipline. By banding together, sponsors can validate a new market niche and educate organizations about alternative solutions to critical business intelligence issues. Please contact TDWI Research Director Philip Russom ([email protected]) to suggest a topic that meets these requirements.

Acknowledgments

TDWI would like to thank the many people who contributed to this report. First, we appreciate the many users who responded to our survey, especially those who responded to our requests for phone interviews. Second, we thank our report sponsors, who diligently reviewed outlines, survey questions, and report drafts. Finally, we would like to recognize TDWI’s production team: Jennifer Agee, Rod Gosser, and Denelle Hanlon.

Sponsors

DataFlux, IBM, Informatica, SAP, Syncsort, and Talend sponsored the research for this report.


Research Methodology and Demographics

Position
Corporate IT professionals 67%
Consultants 26%
Business sponsors/users 7%

Industry
Financial services 17%
Consulting/professional services 16%
Insurance 9%
Software/Internet 8%
Telecommunications 6%
Healthcare 5%
Manufacturing (non-computers) 5%
Retail/wholesale/distribution 4%
Government: federal 4%
Education 3%
Pharmaceuticals 3%
Media/entertainment/publishing 3%
Utilities 3%
Other 14%
(“Other” consists of multiple industries, each represented by 2% or less of respondents.)

Geography
United States 51%
Europe 25%
Asia 8%
Australia 4%
Canada 4%
Africa 2%
Central or South America 2%
Middle East 1%
Other 3%

Company Size by Revenue
Less than $100 million 22%
$100–500 million 14%
$500 million–$1 billion 11%
$1–5 billion 16%
$5–10 billion 9%
More than $10 billion 18%
Don’t know 10%

Based on 323 survey respondents.

Report Scope. Data integration (DI) has changed so quickly and completely in recent years that it scarcely resembles older definitions. For example, some people still think of DI as merely ETL for data warehousing or data movement utilities for database administration. Those basic tasks and use cases are still prominent in DI practice. Yet, DI practices and tools have broadened into many more techniques and use cases. While it’s good to have options, it’s hard to track them and determine in which situations they are ready for use. The purpose of this report is to accelerate users’ understanding of the many new products and options that have entered DI practices in recent years. It will also help readers map newly available technologies, products, and practices to real-world use cases.

Survey Methodology. In November 2010, TDWI sent an invitation via e-mail to the data management professionals in its database, asking them to complete an Internet-based survey. The invitation was also distributed via Web sites, newsletters, and publications from TDWI and other firms. The survey drew almost 350 responses. From these, we excluded incomplete responses and respondents who identified themselves as academics or vendor employees. The resulting 323 completed responses form the core data sample for this report.

Survey Demographics. A wide majority of survey respondents are corporate IT professionals (67%); the remainder consists of consultants (26%) or business sponsors/users (7%). We asked consultants to fill out the survey with a recent client in mind.

The financial services (17%) and consulting (16%) industries dominate the respondent population, followed by insurance (9%), software (8%), telecommunications (6%), and other industries. Most survey respondents reside in the U.S. (51%) or Europe (25%). Respondents are fairly evenly distributed across all sizes of companies and other organizations.

Other Research Methods. In addition to the survey, TDWI Research conducted many telephone interviews with technical users, business sponsors, and recognized data management experts. TDWI also received product briefings from vendors that offer products and services related to the best practices under discussion.


Introduction to Next Generation Data Integration

Data integration (DI) has undergone an impressive evolution in recent years. Today, DI is a rich set of powerful techniques, including ETL (extract, transform, and load), data federation, replication, synchronization, changed data capture, data quality, master data management, natural language processing, business-to-business data exchange, and more. Furthermore, vendor products for DI have achieved maturity, users have grown their DI teams to epic proportions, competency centers regularly staff DI work, new best practices continue to arise (such as collaborative DI and agile DI), and DI as a discipline has earned its autonomy from related practices such as data warehousing and database administration.

To help user organizations understand and embrace all that next generation data integration (NGDI) now offers, this report catalogs and prioritizes the many new options for DI. This report literally redefines data integration, showing that its newest generation is an amalgam of old and new techniques, best practices, organizational approaches, and home-grown or vendor-built functionality. The report brings readers up to date by discussing relatively recent (and ongoing) evolutions of DI that make it more agile, architected, collaborative, operational, real-time, and scalable. It points to new platforms for DI tools (open source, cloud, SaaS, and unified data management) and DI’s growing coordination with related best practices in data management (especially data quality, metadata and master data management, data integration acceleration, data governance, and stewardship). The report also quantifies trends among DI users who are moving into a new generation, and it provides an overview of representative vendors’ DI tools.

The goal is to help users make informed decisions about which combinations of DI options match their business and technology requirements for the next generation. But the report also raises the bar on DI, under the assumption that a truly sophisticated and powerful DI solution will leverage DI’s modern best practices using up-to-date tools.

Ten Rules for Next Generation Data Integration

Data integration has evolved and grown so fast and furiously in the last 10 years that it has transcended ancient definitions. Getting a grip on a modern definition of DI is difficult, because “data integration” has become an umbrella term and a broad concept that encompasses many things. To help you get that grip, the 10 rules for next generation data integration listed below provide an inventory of techniques, team structures, tool types, methods, mindsets, and other DI solution characteristics that are desirable for a fully modern next generation DI solution. Note that the list is a summary that helps you see the new-found immensity of DI; the rest of the report will drill into the details of these rules.

Admittedly, the list of 10 rules is daunting because it’s thorough. Few organizations will need or want to embrace all of them; you should pick and choose according to your organization’s requirements and goals. Even so, the list both defines the new generation of data integration and sets the bar high for those pursuing it.1

1 For a similar list with more details, see the TDWI Checklist Report Top Ten Best Practices for Data Integration, available on tdwi.org.

All aspects of DI have improved significantly of late

This report brings the reader up to date on DI’s many changes

DI’s 10 rules define desirable traits of its next generation


1. DI is a family of techniques. Some data management professionals still think of DI as merely ETL tools for data warehousing or data replication utilities for database administration. Those use cases are still prominent, as we’ll see when we discuss TDWI survey data. Yet, DI practices and tools have broadened into a dozen or more techniques and use cases.

2. DI techniques may be hand coded, based on a vendor’s tool, or both. TDWI survey data shows that migrating from hand coding to using a vendor DI tool is one of the strongest trends as organizations move into the next generation. A common best practice is to use a DI tool for most solutions, but augment it with hand coding for functions missing from the tool.

3. DI practices reach across both analytics and operations. DI is not just for data warehousing (DW). Nor is it just for operational database administration (DBA). It now has many use cases spanning many analytic and operational contexts, and expanding beyond DW and DBA work is one of the most prominent generational changes for DI.

4. DI is an autonomous discipline. Nowadays, there’s so much DI work to be done that DI teams with 13 or more specialists are the norm; some teams have more than 100! The diversity of DI work has broadened, too. Due to this growth, a prominent generational decision is whether to staff and fund DI as is, or to set up an independent team or competency center for DI.

5. DI is absorbing other data management disciplines. The obvious example is DI and data quality (DQ), which many users staff with one team and implement on one unified vendor platform. A generational decision is whether the same team and platform should also support master data management, replication, data sync, event processing, and data federation.

6. DI has become broadly collaborative. The larger number of DI specialists requires local collaboration among DI team members, as well as global collaboration with other data management disciplines, including those mentioned in the previous rule, plus teams for message/service buses, database administration, and operational applications.

7. DI needs diverse development methodologies. A number of pressures are driving generational changes in DI development strategies, including increased team size, operational versus analytic DI projects, greater interoperability with other data management technologies, and the need to produce solutions in a more lean and agile manner.

8. DI requires a wide range of interfaces. That’s because DI can access a wide range of source and target IT systems in a variety of information delivery speeds and frequencies. This includes traditional interfaces (native database connectors, ODBC, JDBC, FTP, APIs, bulk loaders) and newer ones (Web services, SOA, and data services). The new ones are critical to next generation requirements for real time and services. Furthermore, as many organizations extend their DI infrastructure, DI interfaces need to access data on-premises, in public and private clouds, and at partner and customer sites.

9. DI must scale. Architectures designed by users and servers built by vendors need to scale up and scale out to both burgeoning data volumes and increasingly complex processing, while still providing high performance at scale. With volume and complexity exploding, scalability is a critical success factor for future generations. Make it a top priority in your plans.

10. DI requires architecture. It’s true that some DI tools impose an architecture (usually hub and spoke), but DI developers still need to take control and design the details. DI architecture is important because it strongly enables or inhibits other next generation requirements for scalability, real time, high availability, server interoperability, and data services.

DI encompasses many techniques that may be hand coded or tool based, either analytic or operational

Don’t do DI in a vacuum. It needs coordination with many technical and business teams

Like any enterprise application, DI deserves architecture, which affects whether it can support next generation requirements


Why Care About NGDI Now?

Businesses face change more often than ever before. Recent history has seen businesses repeatedly adjusting to boom-and-bust economies, a recession, financial crises, shifts in global dynamics or competitive pressures, and a slow economic recovery. DI supports real-world applications and business goals, which are affected by economic issues. Periodically, you need to adjust DI solutions to align with technical and business goals for data.

The next generation is an opportunity to fix the failings of prior generations. For example, most DI solutions lack a recognizable architecture, whereas achieving next generation requirements—especially real time, data services, and high availability—requires a modern architecture. Older ETL solutions, in particular, are designed for serial processing, whereas they need to be redesigned for parallel processing to meet next generation performance requirements for massive data volumes.

Some DI solutions are in serious need of improvement or replacement. For example, most DI solutions for business-to-business (B2B) data exchange are legacies, based on low-end techniques such as hand coding, flat files, and file transfer protocol (FTP). These demand a serious makeover—or rip and replace—if they’re to bring modern DI techniques into B2B data exchange. Similar makeovers are needed with older data warehouses, customer data hubs, and data sync solutions.

Even mature DI solutions have room to grow. Successful DI solutions mature through multiple lifecycle stages. In many cases, NGDI focuses on the next phase of a carefully planned evolution.

For many, the next generation is about tapping more functions of DI tools they already have. For example, most DI platforms have supported data federation for a few years now, yet only 30% of users have tapped this capability. Also to be tapped are newer capabilities for real time, micro-batch processing, changed data capture (CDC), messaging, and complex event processing (CEP).

Unstructured data is still an unexplored frontier for most DI solutions. Many vendor DI platforms now support text analytics, text mining, and other forms of natural language processing. Handling non-structured and complex data types is a desirable generational milestone in text-laden industries such as insurance, healthcare, and federal government.

DI is on its way to becoming IT infrastructure. For most organizations, this is a few generations away. But you need to think ahead to the day when data integration infrastructure is open and accessible to most of the enterprise the way that local area networks are today. Evolving DI into a shared infrastructure fosters business integration via shared data.

DI is a growing and evolving practice. More organizations are doing more DI, yet staffing hasn’t kept pace with the growth. And DI is becoming more autonomous every day. You may need to rethink the headcount, skill sets, funding, management, ownership, and structure of DI teams.

The recession has changed business, so DI needs to realign with new business goals for data

Most DI solutions are out-of-date or feature-poor, in some respect

Plan to evolve DI into shared enterprise infrastructure

Many DI teams need a next generation reorganization


Leading Generational Changes for Data Integration

Expanding Into More DI Techniques

As pointed out earlier, DI consists of multiple, related data management techniques. The number of techniques applied in a DI solution or used by a DI team can be an indicator of DI maturity. For example, many DI solutions begin with a focus on one technique, then add others as the solution moves through project phases or generations. Increasing the number of techniques is often paralleled by increases in team head count and DI tools in the software portfolio. Many teams are driven to adopt more techniques because they begin supporting a new user constituency, which demands new approaches to data integration (as when new performance management requirements demand data federation). Hence, the number of DI techniques and the priority each receives are milestones on the road to the next generation of a DI solution.

To quantify this situation, a survey question asked respondents which DI techniques they’re using today, and in what priority order (see Figure 1). Respondents selected techniques from a short list of the most common ones. (Later, we’ll see responses from a much longer list.)

ETL is the most common first priority. Extract, transform, and load (ETL) is without doubt the preferred DI technique for business intelligence (BI) and data warehousing (DW). Given that a large portion of respondents are BI/DW professionals, it’s no surprise that ETL is in use at 95% of surveyed organizations. In fact, 75% identified it as their first priority among DI techniques.

ELT is the leading secondary priority. As a variant of ETL, ELT also scored well with the survey audience. TDWI sees ELT as a gainer, its use being driven up by the increased processing power of recent DBMS releases, the arrival of new analytic DBMSs, increased use of in-database processing, lingering hand-coded traditions for SQL, and increased use of secondary ETL tools (especially open source tools, which support in-database transforms).
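To make the ETL/ELT distinction concrete, here is a minimal sketch in Python, using SQLite as a stand-in DBMS. The table names, rows, and transform are hypothetical, and a real DI tool would generate equivalent logic rather than hand-written code; in practice a solution would use one path or the other, not both.

```python
import sqlite3

# Hypothetical extracted source rows: (name, amount in cents as text).
rows = [("alice", "1200"), ("bob", "3400")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staging (name TEXT, amount TEXT)")
con.execute("CREATE TABLE sales (name TEXT, amount_usd REAL)")

# ETL: transform in the integration engine, then load the finished rows.
transformed = [(n.upper(), float(a) / 100) for n, a in rows]
con.executemany("INSERT INTO sales VALUES (?, ?)", transformed)

# ELT: load the raw rows first, then push the transform down into the
# DBMS as SQL, exploiting the database's own processing power.
con.executemany("INSERT INTO staging VALUES (?, ?)", rows)
con.execute("""
    INSERT INTO sales
    SELECT UPPER(name), CAST(amount AS REAL) / 100 FROM staging
""")

print(con.execute("SELECT * FROM sales").fetchall())
```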

Replication and data synchronization are a significant, though tertiary, priority. At 45% total, these fared well in the survey. For moving data with little or no transformation (for which full ETL may be overkill), these kinds of tools are a good choice because of their low cost (relative to ETL), simplicity, changed data capture functions, minimal invasiveness, and their ability to run in real time or be event driven. Given their strengths, it seems odd that replication and synchronization aren’t used more in BI and DW contexts.
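As a rough illustration of the changed data capture function mentioned above: production CDC tools typically read the DBMS transaction log, but the effect can be sketched as a diff of two table snapshots keyed on the primary key. The rows here are invented.

```python
def capture_changes(old, new):
    """old/new: dicts mapping primary key -> row. Returns the change set."""
    inserts = {k: v for k, v in new.items() if k not in old}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    deletes = [k for k in old if k not in new]
    return inserts, updates, deletes

old = {1: "alice@a.com", 2: "bob@b.com"}
new = {1: "alice@a.com", 2: "bob@new.com", 3: "carol@c.com"}
print(capture_changes(old, new))
# ({3: 'carol@c.com'}, {2: 'bob@new.com'}, [])
```

Only the captured changes need to be shipped to the target, which is why CDC-based replication can run continuously with minimal invasiveness.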

Application integration technologies often transport data for integration. Judging by Figure 1, almost 40% of organizations surveyed are doing this today. This form of technology uses some type of bus to support messages, events, and services. Although not designed for DI, a bus can carry data in its messages and processing instructions via services. For organizations with a hefty bus implementation in place, this infrastructure is often open to and effective for some DI functions, especially those that must reach operational applications or operate in real time.

Data federation is finally ensconced as a DI technique. Federation has been around for years in low-end forms such as distributed queries and materialized views. Modern tools, however, provide superior design and maintenance functions for federation (plus higher performance) that make it far more compelling as a feature you’d depend on. Federation is also more compelling as it becomes ever more virtual. These advances help explain why federation has recently become an ensconced DI technique (30% in Figure 1).
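A toy sketch of what federation does at query time: rather than persisting integrated data, a federated view joins rows fetched live from two systems. The source names and fields are hypothetical; real federation tools add query optimization, caching, and pushdown.

```python
def federated_customer_view(crm_rows, billing_rows):
    """Join two live sources in memory and yield a unified, virtual view."""
    billing_by_id = {r["id"]: r for r in billing_rows}
    for c in crm_rows:
        b = billing_by_id.get(c["id"], {})
        yield {**c, "balance": b.get("balance")}  # nothing is stored

crm = [{"id": 1, "name": "Acme"}]          # fetched from a CRM at query time
billing = [{"id": 1, "balance": 120.50}]   # fetched from billing at query time
print(list(federated_customer_view(crm, billing)))
```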

Although not popular in DW, replication and sync are big elsewhere

ETL is by far the top DI priority, seconded by its variant, ELT

Federation has a new presence, and event processing has just arrived


Event processing is a recent addition to the DI arsenal. More than 20% of survey respondents have incorporated some form of event processing into their DI solutions, which is significant given the newness of this practice.

Which of the following DI techniques are you using in your DI solutions today? Click one or more of the following answers, in priority order from most used to least used.

Figure 1. Based on 323 responses. Sorted by first priority.

USER STORY GENERATIONAL CHANGE CAN ENTAIL A DEEPER DIVE INTO A VENDOR’S TOOL.

“To enhance our ability to track data lineage, standardize load scripts, validate domains, and cleanse our customer data, we purchased a vendor’s data integration platform. We have now replaced our old hand-coded scripts with this platform,” said Rick Ellis, the enterprise data architect at Frost Bank. “Today, the platform is up and running. We now need to enhance our knowledge of the integration platform’s functionality to perform data analysis and integrate new data stores, as well as address the business’ next generation of requirements.

“For example, we’ve made our first pass with a data quality solution, and this will continue to be a high priority. Our grass-roots data stewardship program made a meaningful contribution to the quality solution, and we have morphed stewardship into a broader data governance board to assist with other data management disciplines. Upcoming priorities are to get beyond matching, de-duping, and name-and-address cleansing and go into other quality functions. Before any changes are made, impact analysis is essential.

“In the longer term, our ETL team will assist with database migrations, consolidations, and upgrades to help keep the data clean. Plus, they will probably inherit business-to-business data exchange with partnering financial services companies. The vendor platform we acquired has functions for these, which should help as we grow beyond data warehousing into operational data integration.”

[Figure 1 is a stacked bar chart showing the percentage of respondents using each DI technique (extract, transform, and load (ETL); data federation or virtualization; extract, load, and transform (ELT); messaging or application integration; replication or data synchronization; event processing), broken down by priority order from first to sixth.]


Users’ Data Integration Tool Portfolios

There are different ways to characterize a user’s software portfolio. For DI tools, it’s interesting to assess portfolios by the number of tools and the number of tool providers. This is what the survey question in Figure 2 quantifies. A few generational trends are suggested by comparing results for “today” and “would prefer”:

Users would prefer to simplify their portfolios. If user preferences pan out, fewer will acquire DI tools from multiple vendors. According to the survey data, the number of user organizations using multiple DI tools from multiple vendors will drop from 44% to 25%. Part of this is the “one throat to choke” issue concerning support and maintenance. Related reasons may include the ongoing trends toward tool standardization and focusing on preferred suppliers for the sake of bulk discounts and other preferential treatment.

Users want to reduce the amount of hand coding. Only 18% of respondents report depending mostly on hand coding for DI. This seems low compared to other surveys TDWI has run. With this survey population, hand coding will drop down to a minuscule 1%. Migrating from hand coding to tool use as the primary development medium is, indeed, a prominent generational change for DI.

Users are very interested in integrated suites of tools. Only 9% report using one today, yet 42% of respondents would prefer one. Integrated suites are available today from a few software vendors. This kind of suite typically has a strong DI and/or DQ tool at its heart, with additional tools for master data management, metadata management, stewardship, governance, changed data capture, replication, event processing, data services development, data profiling, data monitoring, and so on. As you can see, the list can be quite long, amounting to an impressive arsenal of related data management tools and tool features. As more user organizations coordinate diverse data management teams and their solutions, it makes sense for the consolidated team to use a single platform for easier collaboration. Coordinated teams of this sort generally want to share meta and master data, profiles, development templates, and other development artifacts. Thus, one of the noticeable generational trends in DI is the movement toward the use of integrated suites.

Which of the following best describes your organization’s portfolio of DI tools today? For your organization’s next generation DI implementation, how would you prefer that the DI portfolio be?

                                                               TODAY   WOULD PREFER
Using multiple DI tools from multiple vendors                   44%        25%
Using just one DI tool                                          22%        24%
Mostly hand coded without much use of vendor DI tools           18%         1%
Using a DI tool that’s part of an integrated suite of
data management tools from one vendor                            9%        42%
Using multiple DI tools from one vendor                          3%         6%
Other                                                            4%         2%

Figure 2. Based on 323 respondents. Sorted by “today.”

DI tools and platforms from vendors tend to be feature-rich, especially when a single product supports multiple DI techniques. DI tools are like all enterprise software: Users employ the functionality they need and ignore the rest—at least for the time being. Eventually, business and technology requirements or resources change, and the DI team starts to employ functions they’ve previously ignored. For example, many users stick to core ETL functions for years before expanding their usage into functions that are tangential to ETL, such as changed data capture, services, and interoperability with buses. With the integrated data management suites discussed earlier in this report, users typically start with a particular tool type—usually for data integration or data quality—and later start using other tools built into the suite.

Many users desire more tools, but from fewer vendors

Approximately 60% of DI tool functions are untouched today

TDWI suspects that users have tapped a relatively small percentage of their DI tools’ functions. To test this, a survey question asked: “What approximate percentage of your primary DI tool’s functions are you using?” The question demanded responses for today and for three years from now. See Figure 3.

Survey responses show that, indeed, the percentage is rather low today, but will increase substantially in three years. For example, on the area graph, you can see that the largest concentration of users is employing between 30% and 50% of their DI tool’s functions today. In other words, the average DI shop is only using roughly 40% of tool functions, leaving the other 60% untouched. However, in three years, the largest concentration will be employing 50% to 80% of functions, for an average of approximately 65%.

What approximate percentage of your primary DI tool’s functions are you using?

[Figure 3 is an area graph plotting the percentage of tool functions used (0% to 100%) against the percentage of respondents, with one curve for today and one for three years from now.]

Figure 3. Based on 323 respondents.

DI Tool and Platform Replacements

One of the most extreme generational changes you can make for your DI solution is to rip out and replace its underlying tool or platform. As discussed later, the top reason for a replacement is the need for a unified platform that supports multiple tool types, including business-oriented functions (stewardship, exception processing). Other leading reasons are to get a DI platform that supports scalability and/or real-time functionality better than the current one does.

Those sound reasonable. But how many users really need to replace their DI platforms now? According to the survey, the answer is that relatively few user organizations are considering such an extreme change. (See Figure 4.) One-third of respondents are planning a platform replacement in 2011 (19%) or 2012 (14%). Yet, a whopping 62% report they have no plans to replace their DI platform. The conclusion is that most DI users are content with their current DI platform and tool portfolio.

Most organizations are content with their current DI tool or platform

When do you plan to replace your current primary data integration platform?

No plans to replace DI platform 62%

2011 19%

2012 14%

2013 2%

2014 1%

2015 1%

2016 0%

2017 or later 1%

Figure 4. Based on 323 respondents.

It’s apparent that most users are satisfied with their current DI platform and see no need to replace it. Even so, it’s interesting to hear what kinds of problems would be so onerous as to drive a user to rip and replace. The question expressed in Figure 5 speaks to the heart of this matter by asking: “What problems will eventually drive you to replace your current primary data integration platform?” Responses reveal a few generational trends:

Again, users are interested in integrated DI suites. At the top of the survey results in Figure 5, the multiple-choice answer most often selected is: “We need a unified platform that supports DI, plus data quality, governance, MDM, etc.” (40%). We also noted this interest in Figure 2. Here, respondents are going a step further to say that the demand for an integrated suite or platform would be so strong as to drive them to a platform replacement. Note that this is a dramatic generational shift, given that multi-vendor best-of-breed approaches to data management software portfolio management have been the norm for many years.

There’s a growing need for DI tool functions that business people can use. Note that 19% of respondents selected: “We need a platform with tools for some business users.” The growing inclusion of business people in the DI user community is a noteworthy trend. Some vendors are responding to this demand by supplying new, easy-to-use functionality for business-oriented tasks, such as stewardship, exception processing, and collaboration with a multi-functional team. This is yet another generational decision that planners of DI must consider.

Scalability is naturally a concern for DI. Scalability problems can manifest themselves in different ways, such as the cost of scaling up (37%) and inadequate data processing speed (35%). With any IT system, frustrations over scalability can lead to a change of platform, and DI platforms are especially susceptible, due to increases in data volumes and processing complexity.

Real-time and related capabilities are enabled or inhibited by a DI platform. A substantial 33% of respondents fear their DI platform may be poorly suited to real-time or on-demand workloads. They’re also concerned that the platform may suffer inadequate support for Web services and SOA (30%) or inadequate high availability (20%). These are all related, because users need services for real-time interfaces, and the interfaces aren’t real time if the DI platform isn’t highly available. For many organizations, accelerating DI functions into real time is just as pressing a generational goal as scaling to massive data volumes.

User fascination with single-vendor, integrated DI platforms recurs in survey questions

Scalability and real time are DI’s most pressing requirements

Legacy platforms or platform components can be a problem. A DI tool, like any IT system, can reach the end of its useful lifecycle. Apparently, a few survey respondents are at that stage, because they report that their “current platform is a legacy we must phase out” (18%). Legacy and related upgrade issues are also seen in responses to survey answers such as “current platform is 32-bit, and we need 64-bit” (12%) and “current platform is SMP, and we need MPP” (5%). Note that these upgrades are on the critical path to achieving generational goals in DI platform performance.

What problems will eventually drive you to replace your current primary data integration platform? (Select nine or fewer.)

We need a unified platform that supports DI, plus data quality, governance, MDM, etc. 40%

Cost of scaling up is too expensive 37%

Inadequate data processing speed 35%

Poorly suited to real-time or on-demand workloads 33%

Inadequate support for Web services and SOA 30%

Inadequate high availability 20%

We need a platform with tools for some business users 19%

Current platform is a legacy we must phase out 18%

Can’t secure the data properly 18%

We need a platform better suited to cloud or virtualization 15%

Inadequate support for in-memory processing 14%

Current platform is 32-bit, and we need 64-bit 12%

Current vendor has questionable practices or viability 8%

Current platform is SMP, and we need MPP 5%

Other 5%

Figure 5. Based on 1,100 responses from 323 respondents (3.4 responses per respondent on average).

USER STORY A PRIVATE CLOUD IS A LIKELY NEXT GENERATION PLATFORM FOR DI.

“Our data integration server runs in a shared server environment, which uses a popular operating system for server resource virtualization,” said the lead data integration specialist at an insurance company. “My team was concerned when we moved to the private cloud provided by IT, because we’re used to owning the servers, plus having one each for data integration, reporting and analysis, and the data warehouse. Not all software servers cohabitate and perform well under virtualized services, you know. But the data integration server I’m using does really well. IT recently upgraded the server farm controlled by virtualization, as part of our migration from legacy UNIX systems to Linux. With greater server bandwidth, I’m now able to set up larger virtual machines for ETL jobs and other routines. Data warehouse loads that used to take 20 hours now complete in about two.”


Data Types Being Integrated

The majority of data handled via DI tools and platforms today falls under the rubric of structured data. This is primarily about the tables and other data structures of relational databases. But other sources yield predictable structures, such as the record formats of most applications and the character-delimited rows of many flat files. In our survey, a whopping 99% of respondents report handling structured data today, and 78% will continue to do so in three years. See Figure 6.

The hegemony of structured data types has been the norm in DI for decades, and that’s old news. The latest news is that DI solutions have begun handling a wider range of data types. In particular, 84% of respondents report handling some form of complex data today (hierarchical or legacy sources) with their DI tools. Almost as many respondents anticipate handling complex data in three years. Similarly, 62% report handling semi-structured data today (XML and similar standards), and this should grow to 87% in three years.2

Three data types are poised for explosive growth, namely event data (messages, usually in real time), spatial data (longitude and latitude coordinates, GPS output), and unstructured data (mostly text expressing human language). All three will go from limited use today to over 90% use in three years. These and other non-structured data types are driven up by increased use of industry standards (SWIFT, ACORD, HL7), smart devices (smart meters, RFID), digital content (images, video), social media (Twitter, Facebook), and many types of Web applications.

Once again, the survey data of this report shows that more people than anticipated are handling events and their data through DI platforms. While that’s surprising, it’s not surprising that spatial data is on the rise. For years, TDWI has noted its Members adding tables and other structures to their data warehouses to fulfill new requirements for location data in support of asset management, actuarial assessments, delivery address augmentation, and consumer demographics. In fact, it’s a bit surprising that the handling of unstructured text is so low at present; TDWI has interviewed many of its Members who apply text mining or text analytic capabilities (whether built into their DI tool or supplied via a separate tool) to convert facts (discovered in textual documents) into structured data (typically a record or table row per discovered fact). For example, insurance companies regularly extract facts from text gathered in the claims process, then use that data to extend their analytic data sets for risk management and fraud detection.
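A deliberately simple sketch of the text-to-structured-data step described above: real text analytics relies on NLP models rather than a single regular expression, and the claim-note format here is invented purely for illustration.

```python
import re

# Hypothetical pattern: find "claim <number> ... $<amount>" facts in free text.
FACT = re.compile(r"claim\s+(?P<claim_id>\d+).*?\$(?P<amount>[\d,]+)", re.IGNORECASE)

def extract_facts(doc):
    """Turn facts discovered in text into structured rows (one dict per fact)."""
    return [{"claim_id": m["claim_id"],
             "amount": float(m["amount"].replace(",", ""))}
            for m in FACT.finditer(doc)]

print(extract_facts("Adjuster notes: claim 4411 settled for $2,300."))
# [{'claim_id': '4411', 'amount': 2300.0}]
```

Each extracted row can then be loaded like any other structured record, extending analytic data sets with facts that were previously locked in text.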

For the types of data on the following list, which are you integrating today through your primary data integration implementation? Which do you anticipate using in three years or so?

                                                     Using now   Using in 3 years
Structured data (tables, records)                       99%            78%
Complex data (hierarchical or legacy sources)           84%            79%
Semi-structured data (XML and similar standards)        62%            87%
Event data (messages, usually in real time)             43%            93%
Spatial data (long/lat coordinates, GPS output)         29%            95%
Unstructured data (human language, audio, video)        21%            95%

Figure 6. Based on varying numbers of responses from 323 respondents. Sorted by “using now.”

2 For more information about how various data types are handled via data integration, see the TDWI Monograph Complex Data: A New Challenge for Data Integration.


Structured data is still the bread and butter of DI, but other data types are catching up

Event, spatial, and textual data types are experiencing greater demand


USER STORY ADDRESSING COMPLEX DATA ON ITS OWN TERMS CAN BE GENERATIONAL.

“Traditionally, our enterprise data warehouse—or EDW—housed mostly source data for highly detailed reports. In terms of ETL, that means a lot of E and L, but little T,” said an enterprise data architect at a manufacturing company. “As I work on our next generation of data integration, I’m focused on integrating core data, not just collecting it, as in the past. I’m developing numerous transformations that will yield aggregated, enterprisewide views of data, instead of the concatenated data marts we have now. The data product of my work goes into an enterprise data model our group has recently designed, in close collaboration with a wide range of other technical and business people. It’s still in review, but we feel confident that the logical model is an accurate view of how the business needs to be represented.

“To ensure that I populate the enterprise data model from the most appropriate data sources, master data has become a priority. Our primary data domain is products, and there are many definitions of product here. We believe we can reduce all these to a single, master definition. But it will be complex and hierarchical, so we’re investigating an XML-based representation of product data. The catch is that few data modeling tools support complex data types, like XML. Plus, we’ll have to move XML hierarchies into and out of our EDW, which is cast in third normal form. These challenges are worth overcoming, because we really need to handle complex data like XML, if we’re to design a master hierarchy that accurately represents the relations among products, parts, subassemblies, and bills of material.”
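The hierarchy-to-rows problem the architect describes can be sketched as follows: shred nested XML into flat parent/child rows that a third-normal-form EDW can store. The product structure below is invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<product id="P1">
           <part id="A"><part id="A1"/></part>
           <part id="B"/>
         </product>"""

def shred(elem, parent_id=None, rows=None):
    """Walk the hierarchy depth-first, emitting one flat row per node."""
    rows = [] if rows is None else rows
    rows.append({"id": elem.get("id"), "parent_id": parent_id, "tag": elem.tag})
    for child in elem:
        shred(child, elem.get("id"), rows)
    return rows

for row in shred(ET.fromstring(doc)):
    print(row)
# {'id': 'P1', 'parent_id': None, ...}, {'id': 'A', 'parent_id': 'P1', ...}, ...
```

The self-referencing parent_id column preserves the hierarchy in relational form, so the same rows can be reassembled into XML on the way back out.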

Data Integration Architecture

To many people, the term data integration architecture sounds like an oxymoron. That’s because they don’t think that data integration has its own architecture. For example, a few data warehouse professionals still cling to the practices of the 1990s, when data integration was subsumed into the larger data warehouse architecture. Today, many data integration specialists still build one independent interface at a time—a poor practice that is inherently anti-architectural. A common misconception is that using a vendor product for data integration automatically assures architecture.

Here’s the problem: If you don’t fully embrace the existence of data integration architecture, you can’t address how architecture affects data integration’s scalability, high availability, staffing, cost, and ability to support real-time operation, master data management, SOA, and interoperability with related integration and quality tools. All of these are worth addressing.3

To get a sense of generational trends in DI architecture, the survey asked which architectural types respondents are using today, in priority order. The survey also asked what they’d prefer. See Figure 7 (page 16).

No consistent architecture. This is risky for any DI solution and the businesses that count on it. Without an architecture, there are few or no data standards, preferred interfaces, coding guidelines, or any other form of consistency. In turn, their absence works against reuse and performance tuning. Though 27% of respondents today lack a DI architecture, only 3% anticipate still being in this undesirable position in the future.

3 For a detailed discussion of DI architectures, see TDWI’s What Works in Data Integration (Volume 25) feature article, “Data Integration Architecture: What It Does, Where It’s Going, and Why You Should Care.”

DI demands architecture, as any application type would


Collections of point-to-point interfaces. Most point-to-point (P2P) interfaces are designed and built in a vacuum, without reference to standards. Most are hand coded. The colloquial name for this is “spaghetti coding.” Of course, you realize that P2P is not really an “architecture”—spaghetti is the antithesis of architecture! This is the last thing you want to inherit from other developers, because it’s nearly impossible to see relations among interfaces, much less the big picture of the DI solution. Lamentably, at 53%, P2P is the most common DI architecture today, and it’s the approach with the (current) highest first priority. Luckily, users surveyed anticipate cutting their dependence on P2P in half in the near future.

Hub-and-spoke architecture. This has become the preferred architecture for most integration technologies today, including the form of data integration known as extract, transform, and load (ETL). (Variations of ETL—such as TEL and ELT—may or may not have a recognizable hub.) However, this is not true of ETL alone; for example, hubs are common in deployments of data federation. Replication usually entails direct interfaces between databases, without a hub, but high-end replication tools support a control server or other device that acts as a hub. Data staging areas and operational data stores (ODSs) often serve as hubs, which are then critical for customer data integration and MDM. Enterprise application integration (EAI) tools and their buses depend on message queue management, and the queue is usually supported by a central integration server (i.e., a hub) through which messages are managed. Hub-and-spoke rated well in our survey, and users surveyed anticipate applying this architecture in the future.

Hub-and-spoke is popular, but that’s no reason to be doctrinaire about its application. At some point, most architectures evolve into some form of hybrid. Many successful DI implementations are mostly hub-and-spoke, but with a little bit of spaghetti thrown in. A common best practice in DI is to replace a spoke with a point-to-point interface when the spoke doesn’t scale or perform. Sometimes performance and scalability take precedence over architecture.

Data service architecture. Data integration architecture is heading out on the leading edge by incorporating service-oriented architecture (SOA). Note that SOA won’t replace current hub-based architectures for data integration. Hubs will remain, but be extended by services. The goal is to provide the ultimate spoke, namely the data integration service or simply data service. According to the survey, this type of DI architecture is set to grow the most, nearly doubling from 41% of respondents using it today to 73% in users’ next generation DI solutions.
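As a rough sketch of such a data service spoke (the endpoint, port, and data are invented): instead of shipping a batch extract, the hub answers requests for integrated records on demand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

# Hypothetical integrated data held at the hub.
CUSTOMERS = {"42": {"name": "Acme Corp", "balance": 120.5}}

class DataService(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /customers/42 returns the integrated customer record.
        key = self.path.rstrip("/").split("/")[-1]
        body = json.dumps(CUSTOMERS.get(key, {})).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), DataService).serve_forever()
```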

Buses for messages, events, and services. Similar to services, the use of buses with DI solutions is set to grow significantly (from 23% to 56%). Note that services and buses are related. Most services (regardless of type) are transported over a bus, as are responses to services. In addition, recall that survey questions discussed earlier in this report show that event processing is a new but growing technique for DI. It, too, may depend on an enterprise bus for event delivery and reaction. As the need for data services and event processing grows for next generation data integration solutions, so will the need for DI tools and platforms to access enterprise buses.
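A toy publish/subscribe bus makes the relationship concrete: DI logic subscribes to event topics and reacts as messages arrive, rather than polling sources on a schedule. The topic name and handler below are invented.

```python
from collections import defaultdict

class Bus:
    """Minimal in-process message bus: topics map to subscriber callbacks."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)
    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = Bus()
# A DI routine reacts to order events in real time instead of a nightly batch.
bus.subscribe("orders", lambda e: print("sync to warehouse:", e))
bus.publish("orders", {"order_id": 7, "amount": 42.0})
```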

Hand-coded spaghetti is not an architecture

Services and buses will reinvigorate DI architecture


Which of the following approaches to DI architecture are you using in your data integration infrastructure today? (Click one or more of the following answers, in priority order from used most to used least.) For your next generation data integration infrastructure, which DI architectures would you prefer to be using, in priority order?

Figure 7. Based on 323 respondents. Sorted by “today” and first priority. Note that values for fifth and sixth priorities (all 1% or 0%) are omitted to simplify the chart.

[Figure 7 is a grouped bar chart. For each type of DI architecture (collection of point-to-point interfaces; hub-and-spoke architecture; no consistent architecture; data service architecture; bus for messages, events, and services), paired bars compare "today" against "next generation," each broken down by priority order from first to fourth.]

4 The total annual compensation of the average DI specialist finally broke $100,000 in 2010. For details, see the 2011 TDWI Salary, Roles, and Responsibilities Report (available to TDWI Members on tdwi.org).

USER STORY ON THE LEADING EDGE: DATA INTEGRATION AS INFRASTRUCTURE.

“When I came into my current position, I immediately saw a need for data integration as a shared enterprise infrastructure. It would be analogous to a local area network that’s accessible to just about anyone, with ample bandwidth for everyone,” said the lead data integration specialist at a pharmaceutical company. “A generous site license from a leading data integration vendor was key to making this feasible. Today, use of the platform is free to any group, without much review of their purposes. Due to the large size of the company and the honest need for data integration, we’ve spawned over 400 implementations, supported by over 400 data integration developers worldwide.

“The site license isn’t cheap, but the business feels it’s worth the expense. Pharma companies tend to suffer dozens of siloed business units, each focused on a different pharma product. Data integration as a shared enterprise infrastructure has greatly accelerated the sharing of data across these units, which results in desirable knowledge transfers and more accurate reporting across the entire enterprise.”

Organizational Issues for NGDI

Organizational Structures for DI Teams

Corporations and other user organizations have hired more in-house data integration specialists in response to an increase in the amount of data warehousing work and operational data integration work outside of warehousing. In the “old days,” an organization had one or maybe two data integration specialists in house (if any), whereas a dozen or more are common today.

To quantify the size of DI teams today, the report survey asked: “How many full-time data integration specialists work in your organization?” See Figure 8. The survey required respondents to type an integer between zero and 99. A simple average of the entries tallies to 16.4 DI specialists per organization. Admittedly, this number is a bit skewed, because a few respondents reported having zero (7%) or 99 (5%). Treating these as outliers and omitting them brings the average down to 13.1. Either way, these numbers indicate rather sizable DI teams.
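The outlier handling can be reproduced in a few lines. The response distribution below is invented purely to mirror the report's 7% zero and 5% ninety-nine answers, not the actual survey data.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Illustrative responses per 100 respondents: 7 zeros, 5 ninety-nines,
# and 88 mid-sized teams (not the real survey entries).
responses = [0] * 7 + [99] * 5 + [13] * 88

trimmed = [r for r in responses if r not in (0, 99)]  # drop boundary answers
print(round(mean(responses), 1), round(mean(trimmed), 1))  # 16.4 vs 13.0
```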

To give this growth a context, let’s compare surveys. In a TDWI Technology Survey from May 2007, one-quarter of surveyed organizations reported having five or more DI specialists. In this report’s survey, roughly half of respondents fit that bill. By that standard, the number of DI specialists has doubled in the last four years. As another data point, the number of DI specialists filling out the TDWI Salary Survey has almost doubled in the same time frame—and their salaries have increased substantially!4

The average number of DI specialists per organization is in the range of 13.1 to 16.4


How many full-time data integration specialists work in your organization?

Number of Full-Time DI Specialists (Percentage of Respondents)
Zero 7%
1 to 5 39%
6 to 10 19%
11 to 15 8%
16 to 25 9%
26 to 50 10%
51 to 98 2%
99+ 5%

Figure 8. Based on 323 responses.

As the number of DI specialists grows and the breadth of their work expands over analytic and operational tasks, organizations are driven to reevaluate how and where they manage DI specialists and their work. Today, a number of team structures organize the work of data integration specialists, as seen in Figure 9:

BI/DW team. In many organizations, the bulk of DI still centers on data warehousing (DW) and business intelligence (BI), so it makes sense to manage DI work through the BI/DW team (59%).

Data architecture and administration. DI doesn’t just originate from BI/DW teams. Another common starting point is the database administration group (DBA, 15%). A common reorganization nowadays is for the DBA group to be subsumed into an enterprise data architecture group (EDA, 24%). This makes sense, because a lot of information lifecycle management work that EDA groups initiate involves operational DI to migrate, consolidate, sync, and upgrade operational databases.

DI managed by IT. One of the newer trends in DI is to treat DI platforms and solutions like shared infrastructure, akin to how networks and storage are managed centrally and openly made accessible to many enterprise organizations and their IT systems. In these cases, central IT management (25%) or the CIO’s office (12%) manages DI specialists and their work.

DI-specific teams. For many firms, a next generation priority is to find an appropriate home for DI specialists, as well as their tools, platforms, and solutions. A conclusion more and more organizations are coming to is that there should be an independent data integration team as a standalone unit (23%). The standalone unit often takes the shape of a data integration competency center (17%), although DI may also be folded into other forms of competency centers that are not exclusive to DI (12%). Among TDWI Members, the BI competency center is a common example. In all these cases, the competency center (sometimes called a center of excellence) provides shared human resources (namely, DI specialists) who can be allocated by the center’s manager to DI work as it arises, whether it’s analytic, operational, or both.

Recent years have seen the birth of the DI-specific team, often in a competency center


Most DI specialists work in a BI/DW team. The rest are strewn across other enterprise teams


Data stewardship and governance. Stewardship and governance are, themselves, evolving into a new generation. Both originated as advisory boards, where committee members identify data quality or data compliance problems and opportunities, then recommend that data experts in other groups take action to correct or leverage them. In the next generation, expect to see more data management professionals—especially DI and DQ specialists—reporting to a data governance board (5%) or data stewardship program (5%), so they can do the technical work that the board identifies as a priority.

Where you work, what kind of organizational structure coordinates the work of most data integration specialists? Select all that apply.

Data warehouse or business intelligence team 59%

Central IT management 25%

Enterprise data architecture group 24%

Data integration team—as a standalone unit 23%

Data integration competency center 17%

Database administration group 15%

CIO’s office 12%

Competency center—not exclusive to DI 12%

Data governance board 5%

Data stewardship program 5%

Other 5%

Figure 9. Based on 652 responses from 323 respondents (2 responses per respondent on average).

USER STORY: COMPETENCY CENTERS AND OTHER CENTRAL TEAMS OFFER ADVANTAGES.

“My employer is a large, multi-billion-dollar company that has grown mostly through mergers and acquisitions,” said Ron Woodyard, the primary integration manager at Cardinal Health. “This helps explain why we have so many data integration specialists and so much work to do. To handle it, we’ve brought close to 250 employees into our Integration Services Center (ISC). Around 140 members of the ISC constitute the pure integration team, while the other folks work on MDM, content management, EDI services, and so on. I run the ISC like a business, based on shared human resources and technology services.

“Centralizing data integration and similar work in the ISC has its advantages. Having most of the eggs in one basket makes it easier to align our work with the firm’s information agenda. With all data integration processing flowing through one center, capacity planning is more accurate, as opposed to quantifying many tools in many business units on many platforms. Having development standards and enforcing them through a code review process is a lot smoother. We can now source data once, then distribute it multiple times. And the ISC has saved money by replacing hand coding with vendor tools as the primary development method.”


Unified Data Management

In most organizations today, data and other information are managed in isolated silos by independent teams using various data management tools for data quality, data integration, data governance and stewardship, metadata and master data management, B2B data exchange, database administration and architecture, information lifecycle management, and so on. In response to this situation, some organizations are adopting what TDWI calls unified data management (UDM), a practice that holistically coordinates teams and integrates tools.

TDWI Research defines unified data management (UDM) as a best practice for coordinating diverse data management disciplines, so that data is managed according to enterprisewide goals that promote technical efficiencies and support strategic, data-oriented business goals.

The “big picture” that results from bringing diverse data disciplines together through UDM yields several benefits, such as cross-system data standards, cross-tool architectures, cross-team design and development synergies, leveraging data as an organizational asset, and assuring data’s integrity and lineage as it travels across multiple organizations and technology platforms. However, the ultimate goal of UDM is to achieve strategic, data-driven business objectives, such as fully informed operational excellence and business intelligence, plus related goals in governance, compliance, business transformation, and business integration.5

Data integration is but one of the many data management disciplines that may be coordinated via UDM and similar organizational practices. Yet, the need for UDM affects DI, in that DI specialists and their managers must revisit when and how certain DI work should be coordinated with related work by other data management teams. The priority and importance of such collaboration by DI specialists varies from one data management team to the next. These priorities are sorted in Figure 10.

BI and DW DI specialists have their priorities straight. Coordinating with BI/DW teams is both the greatest first priority and the greatest second priority. As pointed out earlier, TDWI’s survey populations tend to have a strong representation of DW and BI professionals. Even if we pare back the survey results to compensate for the survey population, the DI specialist’s commitment to BI/DW coordination is still clear.

Application integration and SOA As we’ve seen in other data points of this report, DI specialists are continuing the trend of integrating some data (usually time sensitive) over application buses. In another trend, they’re embracing data services and the concept of data virtualization. Both trends require more coordination between the DI specialist and application integration teams. These trends have progressed to the point that this coordination is now a high priority.

Data architecture and modeling There’s a long-standing tradition in which DI specialists get a lot (if not all) of the requirements they need to design and build a solution from a data architect or modeler. This is the case for most DI specialists working in a traditional DW team. More and more DI specialists work on database architecture and administration teams, where they get much of their direction from an enterprise data architect or similar team leader. (In some organizations, the data architect is called a data analyst.) As the next generation takes DI specialists off to independent teams, this coordination will most likely continue, but without the DI person reporting directly to an architect.

A good enterprise data management strategy will demand coordination among data disciplines

5 For a detailed discussion of unified data management (UDM), see the TDWI Best Practices Report Unified Data Management: A Collaboration of Data Disciplines and Business Strategies.


Governance, stewardship, and quality In one trend, DI specialists are getting involved as committee members for data governance and stewardship. In another trend, DI specialists coordinate their efforts ever deeper with DQ specialists. Put these together, and you can expect increased coordination between DI specialists and teams or boards for governance, stewardship, and quality.

Meta and master data According to TDWI surveys, most implementations of master data management (MDM) are home-grown, built atop a data integration tool (usually in the ETL style). All data management professionals have to do a fair amount of metadata management in the course of their work.

Secondary, supporting data management disciplines Data integration is a primary data management discipline in that it generates a deliverable, similar to other primary disciplines such as data quality and MDM. DI and the other primary disciplines demand a fair amount of coordination with secondary, supporting disciplines such as metadata management and data profiling.

Data archiving The use of DI tools in data archiving has come out of nowhere in recent years to become a sizable presence. That’s because enterprises are struggling to manage the giant volumes of data they’ve amassed. To reduce the burden of less valuable or older data on primary storage systems, they’re aggressively moving data into archives. Doing that with efficiency and sophistication requires DI tools and techniques. Suddenly, data archiving is part of the DI workload, and will increase in the next generation.

With which other data management practices or teams do you coordinate DI work?

Figure 10. Based on 323 respondents. Sorted by first priority. Note that seventh through thirteenth priorities (all 2% or less) are omitted to simplify the chart.

[Figure 10 lists the following practices, sorted by first priority: business intelligence and data warehousing; application integration and SOA; enterprise data architecture; data modeling; data governance; data quality; master data management; data archiving; metadata management; data stewardship; content management, including text analytics; inter-enterprise (or B2B) data exchange; and data profiling. Respondents ranked each from first through sixth priority, expressed as percentages of respondents.]

Besides risk and compliance, good data governance also provides a medium for coordinating data management work


USER STORY: SELECTING A PLATFORM IS A KEY GENERATIONAL DECISION.

“I spearhead my firm’s data management initiative, which involves the coordination of teams and solutions for data integration, data quality, MDM, warehouse design, and business intelligence,” said James Brousseau, the enterprise data architect at SonoSite, Inc. “Early on, we decided that coordinating this many tool types and disciplines would be easier and yield more sustainable results if we standardized on a platform that supports as many of these disciplines as possible. We also knew that data quality and data integration would be immediate needs. So we acquired a vendor platform that excels in both quality and integration, plus has other tools. To be sure we’d get the enterprisewide coordination we need, we made the platform an enterprise resource, owned and maintained by central IT, but accessible to various teams on an as-needed basis.

“With this foundation successfully deployed, we can now focus on next generation goals. Most of these revolve around transforming our data warehouse. Today, it’s mostly an operational data store for ERP reporting. We’ll keep that, plus evolve the warehouse into an enterprise-scope view of corporate performance that’s more appropriate to business intelligence. After that, the next priority will be to develop a gold copy of addresses and other customer data.”

Collaborative Data Integration

The need for collaboration around data integration has increased recently. On the technology side, data integration specialists are growing in number, data integration work is increasingly dispersed geographically, and data integration is more tightly coordinated with other data management practices (especially data quality and MDM). On the business side, business people have long taken an interest in data integration related to business intelligence and mergers, but they now need direct involvement due to new requirements for compliance and governance.

TDWI Research defines collaborative data integration as a collection of user best practices and tool functions that foster collaboration among the increasing number of technical and business people who are involved in data integration projects and initiatives.6

The leading business benefits of collaborative data integration are that it supports governance and gives business people self-service visibility into the details and progress of data integration projects. Technology benefits include more efficient and effective collaboration between the business and IT, the reuse of development objects, and more options for IT management to manage geographically dispersed teams.

Despite its benefits, there are barriers to collaborative data integration:

DI in terms business people can understand Business and technical people speak different languages, according to 60% of respondents in Figure 11. The problem is exacerbated because most DI tool implementations today lack a business-friendly view of data (52%). To alleviate this problem, some organizations create a semantic layer or data virtualization layer with a DI tool, using its metadata management, data services, and related capabilities.

DI tools for business people According to survey respondents, their current tools lack functions for business people to use (41%). As explained earlier, a number of vendor tools now include business-oriented functions for data governance, stewardship, exception processing, business views of data, requirements and specifications, and annotations for metadata and data profiles.

Collaborative tool features A number of respondents complained that their current tools lack adequate version control (20%). DI tools need the kind of collaborative functions that have been common in application development tools for years. For example, check in/out and versioning for routines, data flows, and other DI development artifacts are absolute requirements. Optional features include project management, project progress reports, object annotations, and discussion threads. Most collaborative functions should be accessible via a browser, so a wide range of people (regardless of location) can collaborate.

Collaboration reaches within newly expanded DI teams, plus across to related teams and business management

The point of new DI tool functions for business users is to let them collaborate over DI

6 For in-depth discussions of collaborative DI, see the two TDWI Monographs Collaborative Data Integration: Coordinating Efforts within Teams and Beyond and Second-Generation Collaborative Data Integration.

What are some barriers to collaboration for DI in your organization? Select all that apply.

Collaboration is not an issue for us. (If you check this, do not check other answers.) 17%

Business and technical people speak different languages 60%

Lack of a business-friendly view of data 52%

Our current tools lack functions for business people to use 41%

Our current tools lack adequate version control 20%

Other 5%

Figure 11. Based on 632 responses from 323 respondents (2 responses per respondent on average).

Catalog of NGDI Practices, Tools, and Platforms

At this point in the report, we’ve defined the terms and concepts of next generation data integration (NGDI), listed the drivers that push organizations into a new generation, and discussed common generational changes. As you have likely noticed, the next generation of data integration involves many different options, which include tool features and tool types, user-oriented techniques and methods, and team or organizational structures. Now it’s time to draw the big picture so we can answer questions about these options, such as:

• What are the many options that users need to incorporate into the next generation of their data integration solutions?

• Which ones are users adopting and growing the most?

• Which are in decline?

• At what rate is generational change occurring?

To help quantify these and other questions, TDWI presented survey respondents with a long list of options for data integration. (See the left side of Figure 12, page 25.) These options include a mix of vendor-oriented product features and product types, as well as user-oriented techniques and organizational structures. The list includes options that have arrived fairly recently (real-time functions, complex event processing), have been around for a few years but are just now experiencing broad adoption (changed data capture, high availability for DI servers, services), or have been around for years and are firmly established (ETL, hand coding, batch processing). The list is a catalog of available options for DI, and survey responses enable us to sort and interpret the list in a variety of ways.

Concerning the list of DI options presented in the survey, TDWI asked: “For the techniques, features, and practices on the following list, which are you using today in or around your primary data integration implementation?” To get a sense of how this will change over time, TDWI also asked: “Which do you anticipate using in three years or so?” Survey responses for these two questions are charted as pairs of bars on the left side of Figure 12. The “potential growth” chart in the middle of Figure 12 simply shows the per-row delta between responses for “using now” and “using in 3 years,” to provide an indication of how much the usage of a DI option will increase or decrease.

The options available for DI today are diverse in type and maturity

Survey responses enable us to predict the level of increased usage for a DI option

The survey question told the respondents: “Checking nothing on a row means you have no plans for using that technique now or in the future.” This enables us to quantify the approximate percent of user organizations surveyed that are using a particular DI option, whether now, in the future, or both. The cumulative usage measured here is a sign of how committed to a particular DI option users are, on average. These percentages are charted in the “commitment” column of Figure 12.
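To make the arithmetic concrete, here is a minimal sketch, assuming a simple in-memory representation of the two check boxes per respondent; the Response class and figure12_metrics function are illustrative names, not TDWI's actual tabulation code.

```python
from dataclasses import dataclass

@dataclass
class Response:
    using_now: bool          # respondent checked "using now"
    using_in_3_years: bool   # respondent checked "using in 3 years"

def figure12_metrics(responses: list[Response]) -> dict[str, float]:
    """Compute the Figure 12 metrics for one DI option."""
    n = len(responses)  # 323 respondents in the report's survey
    now = sum(r.using_now for r in responses) / n
    future = sum(r.using_in_3_years for r in responses) / n
    committed = sum(r.using_now or r.using_in_3_years for r in responses) / n
    return {
        "using_now": now,
        "using_in_3_years": future,
        "potential_growth": future - now,  # the per-row delta in Figure 12
        "commitment": committed,           # either box checked counts once
    }
```

For example, for MDM the survey found 24% checking “using now” and 69% checking “using in 3 years,” which yields the 45% potential-growth delta; commitment comes out at 72% because most, but not all, of the “using now” respondents also checked the future box.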

Potential Growth versus Commitment for DI Options

Figure 12 is fairly complex, so let’s explain how to read it. First off, Figure 12 is sorted by the “potential growth” column in descending order. “Master data management (MDM)” appears at the top of the chart, because—with a delta of 45%—this option has the greatest potential growth. However, not all organizations plan to use this option. In the commitment column, we see that 72% of survey respondents have committed to implement MDM at some point. Apparently, 28% of respondents have no plans to implement MDM. By scanning the commitment column in Figure 12, you can see that 72% is a very high level of commitment for a DI option. Coupled with the very high potential growth, it’s obvious that, in the wide majority of organizations, the next generation of DI will include some form of MDM.

From this, we see that there are two forces at work in Figure 12, as well as in the planning processes of user organizations.

• Potential growth The potential growth chart subtracts “using now” from “using in 3 years,” and the delta provides a rough indicator for the growth or decline in use of DI options over the next three years. The charted numbers are positive or negative. Note that a negative number indicates that the use of an option may decline or remain flat instead of grow. A positive number indicates growth, and that growth can be good or strong.

• Commitment Collected during the survey process, the numbers in the commitment column represent the number of survey respondents who selected “using now” and/or “using in 3 years.” However, that number is expressed as a percentage of 323, which is the total number of respondents who answered the questions in Figure 12. Note that the measure of commitment is cumulative, in that the commitment may be realized today, sometime in the near future, or both.

• Balance of commitment and potential growth To get a complete picture, it’s important to look at the metrics for both growth and commitment. For example, some features or techniques may have significant growth rates, but within a weakly committed segment of the user community (clouds, open source DI, SaaS). Or, they could have low growth rates, but be strongly committed through common use today (ETL, batch processing). Options seeing the greatest activity in the near future will most likely be those with strong ratings for both growth and commitment (MDM, data governance, data quality).

To help you visualize the balance of growth and commitment, Figure 13 includes the potential growth and commitment numbers from Figure 12 as opposing axes of a single chart. DI options are plotted in terms of growing or declining usage (x-axis) and narrow or broad commitment (y-axis).
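Figure 13 is a static chart in the report, but the same view can be rebuilt from Figure 12's numbers. The following sketch plots potential growth on the x-axis against commitment on the y-axis for a hand-picked subset of rows; matplotlib and the chosen subset are illustrative assumptions, not part of the report.

```python
import matplotlib.pyplot as plt

options = {  # option: (potential growth, commitment), from Figure 12
    "MDM": (0.45, 0.72),
    "Real-time DI": (0.40, 0.60),
    "ETL": (-0.11, 0.85),
    "Batch processing": (-0.24, 0.92),
    "Hadoop": (0.12, 0.16),
    "Public cloud for DI": (0.10, 0.12),
}

fig, ax = plt.subplots()
for name, (growth, commitment) in options.items():
    ax.scatter(growth, commitment)
    ax.annotate(name, (growth, commitment),
                textcoords="offset points", xytext=(4, 4))
ax.axvline(0, linewidth=0.5)  # boundary between declining and growing usage
ax.set_xlabel("Potential growth (3-year delta)")
ax.set_ylabel("Commitment (now and/or in 3 years)")
plt.show()
```

Even with only six rows plotted, the clusters described below start to emerge: strongly committed legacy staples on the upper left, hotly pursued disciplines on the upper right, and newer, weakly committed platform options near the bottom.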

Commitment and potential growth are two different metrics for the future of DI options

Survey responses also show which DI options are relatively common today


For the techniques, features, and practices on the following list, which are you using today in or around your primary data integration implementation? Which do you anticipate using in three years or so? (Answer these two questions for each row in the following table. Checking nothing on a row means you have no plans for using that technique now or in the future.)

USING IN 3 YEARS USING NOW POTENTIAL GROWTH COMMITMENT

Master data management (MDM) 69% 24% 45% 72%

Real-time data quality 50% 7% 42% 52%

Real-time data integration 56% 16% 40% 60%

Data governance and stewardship 69% 29% 40% 74%

Complex event processing (CEP) 46% 12% 34% 49%

Tool functions for business people 41% 9% 32% 42%

Metadata management 67% 37% 29% 72%

Text analytics or text mining 36% 7% 29% 38%

Real-time alerts 44% 15% 29% 46%

In-memory processing without landing data to disk 41% 14% 27% 44%

Data quality functions 73% 47% 26% 84%

Data profiling 68% 42% 26% 76%

Data federation and virtualization 47% 22% 25% 52%

Web services 51% 28% 24% 56%

Service-oriented architecture (SOA) 47% 25% 23% 52%

Interoperability with message bus or service bus 35% 14% 20% 37%

Single integrated platform for DI, DQ, MDM, etc. 31% 10% 20% 34%

Changed data capture (CDC) 67% 47% 20% 76%

High availability (HA) for DI server 35% 16% 19% 38%

Private cloud as a DI platform 23% 4% 19% 24%

Cross-team collaborative functions 40% 21% 19% 46%

Trickle or streaming data loads 25% 7% 18% 27%

Metadata repository used for non-metadata 28% 11% 17% 31%

Micro batches during business day 32% 17% 15% 36%

XML as source data or message type 53% 40% 13% 59%

DI tool licensed via open source 20% 8% 12% 23%

Hadoop-based data processing 15% 3% 12% 16%

DI tool licensed via software-as-a-service (SaaS) 17% 7% 10% 20%

Data synchronization 55% 45% 10% 65%

Public cloud as a DI platform 11% 2% 10% 12%

Inter-enterprise or B2B data exchange 22% 14% 9% 25%

Java messaging service (JMS) 20% 14% 6% 25%

Secondary DI tool to clear specific bottlenecks 11% 6% 5% 13%

Sort tool, to augment main DI tool 8% 5% 3% 10%

Replication 31% 29% 2% 39%

Extract, load and transform (ELT) 51% 49% 2% 61%

Extract, transform and load (ETL) 68% 80% -11% 85%

Batch processing 67% 91% -24% 92%

Hand-coded DI routines 22% 49% -27% 51%

Figure 12. Based on 323 respondents. The above charts are sorted by “potential growth.”


Next Generation DI Options Plotted for Growth and Commitment

Figure 13. Plots are approximate, based on values from Figure 12.

Trends for Next Generation Data Integration Options

Figures 12 and 13 show that most DI options will experience some level of growth in the near future. The figures also indicate which options will grow the most, and they reveal a number of trends concerning how users plan to apply various options to their next generation data integration solutions. In particular, six groups of options stand out based on combinations of growth and commitment. (See the groups circled, numbered, and labeled in Figure 13.)

1 Strong to moderate commitment, strong potential growth The options most likely to live up to our great expectations and sustain growth over the long haul are those that have solid survey results for both commitment and potential growth. Group 1 in Figure 13 has those numbers, and it includes some of the most hotly pursued features and techniques of recent years. In many ways, group 1 is the epitome of next generation data integration because of its mix of leading-edge options supported by real-world organizational commitment. Group 1 is a mix of growing real-time techniques, data management disciplines, and organizational practices. The real-time techniques include real-time data integration, real-time data quality, complex event processing (CEP), and real-time alerts. Real-time techniques appear prominently in other groups in Figure 13, reminding us that the gradual migration of DI solutions toward real-time operation is possibly the strongest trend in DI today. Among these, CEP is a relatively new addition to the DI inventory of options; it has come on strong, and TDWI expects CEP to become common in DI contexts in upcoming years.

Rates of growth and commitment identify six groups of next generation options

Options capable of real-time operation are numerous, and they are seeing strong growth

The data management disciplines of group 1 include MDM, data quality, data profiling, metadata management, and text analytics. Organizational practices include data governance and business people’s new hands-on involvement using DI tools.

2 Good commitment, good growth As with group 1, features and techniques seen in group 2 have real-time data movement in common, ranging from changed data capture (CDC) to Web services and SOA to data federation and data sync. Group 2 also includes ELT (which has replaced ETL in many user solutions and vendor tools) and XML (which is quickly becoming a common data type for DI thanks to its use in B2B data exchange and other operational DI practices).

3 Moderate commitment, good potential growth This group is an eclectic collection of DI options. Again, there are options that can move data in real time or close to it, as with trickle feeds, intraday microbatches, and message/service buses. High availability has become a priority because real-time DI isn’t real-time if it’s not highly available. Group 3 also includes collaborative DI, an organizational practice that has skyrocketed in recent years to coordinate work among burgeoning numbers of DI specialists. In a related practice, users often enable collaboration via shared project documents and development artifacts managed in a metadata repository; as you can see, such repositories now manage much more than metadata, handling master and reference data, browser views of data, discussion threads, object annotations, and a wide range of productivity documents.

4 Weak commitment, good growth It’s interesting that this category includes some of the newest options for data integration, including software as a service (SaaS), public and private clouds, and open source software for DI and related data management disciplines. The appearance of Hadoop in this group (plus text analytics in group 1) reminds us that DI solutions are progressively embracing the integration of unstructured data, especially in the form of natural language text. These options are so new to data integration that they have only a minimal commitment so far, but they should see good growth soon.

Group 4 also includes the use of sort tools and secondary DI tools to augment primary ones. TDWI has seen organizations clear performance bottlenecks with such tools. In a distributed DI architecture, these extra tools help to offload processing workloads from over-taxed DI servers at the hub of the architecture.

5 Strong commitment, declining growth This group includes three of the great pillars of traditional data integration, namely: extract, transform, and load (ETL), batch processing, and hand-coded routines. In fact, these are some of the most common components found in data integration solutions deployed today. If these are so popular, then why does the survey show them in decline?

Think of the many new real-time capabilities that users are employing in DI, plus the strong trend toward data services. Batch processing will never go away, because it’s still very useful. Yet, it’s being used less and being replaced in a growing number of use cases with other speeds and frequencies for processing and information delivery. Likewise, hand coding is being progressively supplanted by solutions built primarily atop a vendor DI tool, as described earlier. Hand coding won’t go away, either, because it’s indispensable for custom work that complements vendor tool capabilities. Long story short, batch processing and hand coding are becoming a smaller percentage of the options applied to DI, as more of the newer options become prominent.

Repositories manage more than metadata, and they enable new collaborative options

Older DI options won’t go away, but will be a lesser percentage of DI functions as they’re joined by new ones


ETL is a similar case. A common knee-jerk reaction to ETL is that it’s only for overnight batch processing, with heavy transformational processing in support of data warehousing. That might have been true in the early 1990s, but today’s ETL tools support most of the options listed in Figures 12 and 13. Ironically, as users progressively tap into more of these new functions, they usually don’t think of them as ETL, even when the functionality is available directly from an ETL tool or a DI platform with an ETL lineage. Similar to batch processing and hand coding, ETL is not going away. It’s just contracting as a percentage of DI capabilities as new options join it.

USER STORY: AN ALTERNATIVE VIEW OF DATA INTEGRATION.

“First, I’m not a fan of ETL, so I’m looking for a solution that will replace it,” said a data architect and solution architect at a large bank in the United States. “It’s ironic that ETL specialists are hardened technology guys, yet they’re supposed to satisfy business requirements. I need a solution that gives business users control over metadata, instead of the ETLers. That way, sales can view data one way this week, another way next week. Second, if I can’t replace ETL, then I’ll at least improve it by moving from a time-consuming waterfall development method to an agile one. Third, data integration should just expose data to mathematicians and statisticians for analytic purposes. The deliverable is mostly transactional data, with little or no transformation. Hence, there’s no real need for ETL in my department.”

Vendor Products and Platforms for NGDI

Since the firms that sponsored this report are all good examples of software vendors that offer tools, platforms, and services conducive to the next generation of DI, let’s take a brief look at the product portfolio of each, with a focus on next generation trends and requirements. The sponsors form a representative sample of the vendor community. Yet their DI offerings illustrate different approaches to DI tools and platforms.7

From a vendor’s viewpoint, one of the most challenging next-generation requirements to satisfy is the demand for data management tools that are appropriate to business people. For years, DataFlux has offered a mature DQ suite, and more recently it has built out the suite’s stewardship functions to evolve them toward data governance, exception processing, management dashboards for quality metrics, business-friendly views of data, and other needs specific to business users. DataFlux is a subsidiary of SAS, and a few years ago the two executed a reorganization that moved SAS’s DI products to DataFlux. This has helped deepen the integration between the DQ and DI tools. These tools, of course, also have tight integration with SAS’s DW, BI, and analytic tools. All of these together comprise a broad and deep portfolio of data management tools.

For many user organizations, DI’s next generation is about tapping more functions outside basic DI ones, which often requires acquiring more tool types. In response to this demand, the IBM Software Group provides a comprehensive portfolio of integrated products and capabilities for a variety of use cases. The IBM InfoSphere Information Server platform has common metadata services and integrated user-centric tooling designed to promote enterprisewide collaboration between lines of business and IT. The platform also supports automated integration of best practices, reference architectures, and control for reducing risk for future projects. Integrated capabilities include DI, DQ, CDC, replication, data federation, and many other data management disciplines. Multiple approaches to MDM are supported through the IBM Master Data Management Server. Data modeling and process tools are available through IBM’s Rational Software product line. IBM has taken a leadership position in the big data and analytics domain with the introduction of InfoSphere Streams and Hadoop-based InfoSphere BigInsights.

7 The vendors and products mentioned here are representative, and the list is not intended to be comprehensive.

TDWI survey data reveals that most users would prefer to acquire as many DI and related tools as possible from a single vendor—but only if the tools are fully integrated. Toward this end, Informatica has built up a broad portfolio encompassing DI, DQ, MDM, profiling, stewardship, data services, changed data capture, unstructured data processing, B2B data exchange, cloud data integration, information lifecycle management, CEP, and messaging. But Informatica has gone the extra mile by assuring a deep level of integration across development environments, expanded data analyst and steward capabilities, and interoperability among deployed servers. In recent years, Informatica has shown thought leadership with a number of next-generation DI issues by helping define and make practical DI competency centers, data services, cloud-based DI, business self service, and lean DI development methods.

Coordinating DI with other data management disciplines is a priority for next generation DI. SAP enables this goal by providing a comprehensive, integrated suite of enterprise information management (EIM) tools. Furthermore, SAP has extended this priority by providing tight ties among its portfolios for data management, operational applications, and business intelligence. The EIM portfolio includes a mature, integrated solution for DI, DQ, text analytics, data profiling, and metadata management. There are also tools for several next generation hot spots such as CEP, text analytics, CDC, and MDM. The recent acquisition of Sybase adds Sybase IQ (a columnar analytic database) and Sybase Replication Server (for high-end replication and synchronization). To serve the business user who needs to actively support data management work, the new SAP BusinessObjects Information Steward pulls together a business user interface for profiling, metadata, data definitions, and DQ rules.

Scalability and speed are near the top of the priority list for next generation data integration solutions, and Syncsort has long served organizations that have a pressing need to accelerate their data integration environments. Well-known for its high-speed mainframe sorting product (Syncsort MFX), Syncsort Incorporated offers a sophisticated portfolio of high-performance data integration solutions for open systems running on commodity hardware (Syncsort DMExpress) and data protection (Syncsort BEX). These can be deployed as standalone implementations. But DMExpress is often deployed to extend the data performance capabilities of existing DI environments—or independent software vendor (ISV) applications—to clear their performance and scalability bottlenecks. DMExpress is known for its efficiency, easy learning curve, flexible deployment options, and ability to integrate with other DI and data management tools to deliver extremely high performance at scale.

The Talend Unified Platform is in tune with a number of generational trends in data integration. Many users surveyed are interested in a unified platform, and Talend’s platform includes tools for DI, DQ, MDM, and data profiling. All four tools are built atop a shared platform with a unified metadata repository, only one metadata and administration server to deploy, and a common development GUI integrated into Eclipse. Talend has recently acquired application integration vendor Sopera, whose tool will soon be integrated into the platform. Another generational trend is to use a single tool or platform for analytic DI, operational DI, and other use cases. Talend has a reputation for serving multiple DI user constituencies. Finally, some users are also looking for cost-effective data management tools, and Talend’s open source tools are available at a modest price.



Recommendations

Modernize your definition of data integration DI has evolved so much in recent years that even data integration specialists find it hard to keep up with the changes. Avoid outmoded mindsets that banish data integration to a dark corner of data warehousing or database administration. You’ll never grasp the next generation of data integration if you can’t see its newly acquired diversity.

Help your colleagues understand that DI is a family of techniques It’s not just ETL or a DBA utility. The list of techniques is already long, and it will get longer.

Note that DI practices reach across analytic and operational boundaries This affects everything, from staffing and funding to tool selection and solution designs to development standards and architecture. Plan the next generation accordingly.

Get out more often DI has a new requirement for collaboration. You’re not doing the job fully unless you’re involved in stewardship and governance. Assume you should coordinate your work with that of other data management disciplines, especially data quality and master data management.

Think of stewardship and governance as data management disciplines They aren’t per se, but they might as well be, because these collaboration and control groups have tremendous influence on next generation data management.

Create a home for wayward DI specialists As the number of DI specialists and the diversity of DI work increases, expect to re-org the DI team. Most organizations continue to be successful with DI sourced from teams for data warehousing and database administration, but there’s a trend toward independent DI teams, sometimes organized as a competency center.

Admit that DI needs an architecture If you don’t have one, get one. Architecture can enable or inhibit critical next generation functions such as real time, scalability, and services. Tools assume certain architectures, but you still have to design your own. Besides, no rule says you must have only one DI tool. Many DI architectures have room for specialized tools that assist with scalability and speed.

Dig deeper into the DI tool you already have Modern tools are amazingly feature-rich, and survey data shows that organizations are using only about 40% of tool functionality.

Use a tool Hand coding is feature-poor and non-productive by nature, and there’s no way you can hand code most leading-edge requirements for the next generation, such as event processing, text analytics, and advanced DQ functions (e.g., identity resolution).

Anticipate integrating new data types Complex data (as in hierarchies and XML) and text (human language) are the most likely new data types for the average DI implementation.

Look into the newest DI functions—whether you need them or not Stay educated so you can map available DI options to new requirements as they arrive.

Be open to new platform choices It’s just a matter of time before DI tools are commonly running on private or public clouds and being licensed as open source or software-as-a-service.

Keep an eye on the DI techniques poised for greatest growth See the right-most column in Figure 13. You may not need all of these now, but you will someday.

Don’t forget the meat and potatoes ETL has lost its sex appeal for some people, but it’s still the heart and soul of most DI solutions. Likewise, protect and grow the DI disciplines that have the strongest demand from your user base, such as data quality, metadata management, CDC, data sync, and MDM. All future generations will be a mix of old and new, legacy and leading edge.

Redefine DI for yourself and your peers

Collaborate and coordinate to truly know and satisfy DI solution requirements

Most likely changes: DI team structure and DI architecture

More and deeper tool use is inevitable for upcoming generations

Know the new stuff, but don’t forget the old


Consider DI as infrastructure If your organization truly needs to share lots of data broadly across business units, making DI a centrally owned resource that’s openly shared is more likely to achieve enterprise goals than a plague of departmentally owned DI solutions.

Expect DI to keep evolving It’s just now exploring new frontiers such as extended collaboration and coordination, complex data, clouds, open source, services, and DI as infrastructure.

Assume there is a new generation of DI in your future Either business changes will force you into one or your current generation will age to the point that you need to bring it up to date. Most DI solutions are out of date or feature poor in some respect, anyway. Leverage one generation after the next to fix the failings of prior ones or reposition for tomorrow’s computing needs.

The rampant changes in DI aren’t over Revel in what’s to come!


Research Sponsors

Talend
www.talend.com

Talend is the recognized market leader in open source data management and application integration. Talend revolutionized the world of data integration when it released the first version of Talend Open Studio in 2006.

Talend’s data management solution portfolio now includes operational data integration, ETL, data quality, and master data management. Through the acquisition of Sopera in 2010, Talend also became a key player in application integration.

Unlike proprietary, closed solutions, which can only be afforded by the largest and wealthiest organizations, Talend makes middleware solutions available to organizations of all sizes, for all integration needs.

DataFlux
www.dataflux.com

DataFlux is a software and services company that enables business agility and IT efficiency. A wholly owned subsidiary of SAS (sas.com), DataFlux provides data management technology that helps organizations reduce costs, optimize revenue, and mitigate risks as well as manage critical aspects of data.

By providing solutions that meet the needs of business and IT users, DataFlux offers complete enterprise solutions, including enterprise data quality, data integration, data migration, data consolidation, master data management (MDM), and data governance. It also provides a full range of training and consulting services.

IBM
www.ibm.com/software/data/integration

IBM InfoSphere Information Server is a data integration platform that helps enterprises understand, cleanse, transform, and deliver trusted information to critical business initiatives. The platform provides everything needed to integrate heterogeneous information from across disparate systems, including capabilities to support information governance, data quality, data transformation, and data synchronization so that information is consistently defined, accurately represented, reliably transformed, and updated on an ongoing basis. Business and IT professionals use these capabilities to design, deploy, and monitor the core business rules, data integration, and data quality processes they need to deliver effective business analytics and to optimize their information architecture.

Informatica
www.informatica.com

Informatica is the world’s number one independent leader in data integration software. With Informatica, thousands of organizations around the world gain a competitive advantage in today’s global information economy with timely, relevant, and trustworthy data for their top business imperatives. With Informatica, enterprises gain a competitive advantage from all their information assets to grow revenues, increase profitability, further regulatory compliance, and foster customer loyalty. The Informatica Platform provides corporations with a comprehensive, unified, open, and economical approach to lower IT costs and gain competitive advantage from their information assets held in the traditional enterprise and in the Internet cloud.

SAP
www.sap.com

As market leader in enterprise application software, SAP (NYSE: SAP) helps companies of all sizes and industries run better. From back office to boardroom, warehouse to storefront, desktop to mobile device—SAP empowers people and organizations to work together more efficiently and use business insight more effectively to stay ahead of the competition. SAP applications and services enable more than 109,000 customers to operate profitably, adapt continuously, and grow sustainably.

Syncsort
www.syncsort.com

Syncsort is a global software company that helps the world’s most successful organizations rethink the economics of data. Syncsort provides extreme data performance and rapid time to value through easy-to-use data integration and data protection solutions. With over 12,000 deployments, Syncsort has transformed decision making and delivered more profitable results to thousands of customers worldwide.


TDWI Research

TDWI Research provides research and advice for business intelligence and data warehousing professionals worldwide. TDWI Research focuses exclusively on BI/DW issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence and data warehousing solutions. TDWI Research offers in-depth research reports, commentary, and inquiry services as well as custom research, topical conferences, and strategic planning services to user and vendor organizations.

1201 Monster Road SW

Suite 250

Renton, WA 98057-2996

T 425.277.9126

F 425.687.2842

E [email protected]

tdwi.org