The Glue in Data-Driven Businesses

A data integration strategy can help organizations get the most out of information that lies in every corner, from data warehouses to cloud applications.

EDITOR’S NOTE
EDWs YIELD TO DISTRIBUTED SYSTEMS, VIRTUAL INTEGRATION
INTEGRATION FOCUS DRIVES HEALTHY ANALYTICS
CLOUD APPS TAKE INTEGRATION NEEDS TO NEW LEVEL

EDITOR’S NOTE

A Clearer Path for Data

In 2010, officials at financial services company CDPQ realized that its data management architecture badly needed a transformation. Point-to-point connections between systems made moving data around an arduous process that wasted IT resources.

About 1,600 extraction jobs ran each night, many of them similar but required because of the lack of a well-integrated architecture, said Alexandre Synnett, vice president of data management at the Montreal-based company. Further illustrating the complexity, he told attendees at the 2015 TDWI Executive Summit in Las Vegas that there were as many servers as business users in the organization.

There were business ramifications, too. Users had a hard time accessing data to analyze it. CDPQ decided to bring its IT environment out of the 1980s and become more data-driven. Synnett said that as of February 2015, the company was close to completing the deployment of a new architecture in which data flows more fluidly from operational to analytics systems and the data integration process is much simpler. Now end users can more easily get to the data they need to make more-informed (and hopefully better) investment decisions.

CDPQ’s example points to the importance of effective data integration processes and the predicament companies can find themselves in when they lack smooth data pathways between systems. This guide offers insight and advice on building an integration strategy to support diverse pools of data and modern application needs. First, we look at how to create a virtual data warehouse architecture. Next, we examine a healthcare company’s integration-fueled analytics efforts. We close with tips on integrating on-premises and cloud apps.

Craig Stedman, Executive Editor, SearchDataManagement

ARCHITECTURE

EDWs Yield to Distributed Systems, Virtual Integration

A well-functioning enterprise data warehouse combines information from different subject areas in a central repository, providing senior executives, business managers and operational workers with easy access to clean and consistent data to support the decision-making process. The EDW traditionally has been the only way to provide that kind of data access. But now the need for increased agility and flexibility is leading many organizations to rethink their strategies and move toward a distributed data management architecture for storing, integrating and managing business intelligence and analytics data.

Without doubt, we’re seeing a significant change in the data landscape. It’s said that 90% of the world’s total data was generated over the past three years. This unprecedented, large-scale surge has made it extremely difficult and expensive for organizations to integrate and maintain data from disparate sources in a central data warehouse. The challenges are further heightened by the increasing focus on unstructured and semi-structured data types and the exponential growth of new data sources such as syndicated data services, mobile applications and social networks.

In addition, many end users now expect real-time or on-demand access to data. To top it off, data replication and consolidation processes are becoming more complicated as sources multiply, which is adding to maintenance overhead and creating more data quality concerns.

DIMINISHING RETURNS

Integrating data from across an organization in an enterprise data warehouse has its advantages: doing so gives users a comprehensive view of all aspects of the business. But the reality is that most of the time, data from only a subset of subject areas is analyzed together. As a result, the ROI of centrally integrating everything in an EDW starts to diminish as more data sources are added.

For example, in a manufacturing company, business executives frequently need to access and analyze a combination of sales, inventory and forecast data. Similarly, having access to a mix of data on raw materials, packaging and procurement contracts can provide a lot of insights to manufacturing managers. But sales and procurement are two different points in a supply chain, and integrating and maintaining data about them in a central repository might not provide enough of a financial return to justify the cost.

That raises a pair of questions: What really needs to be stored in the persistent layer of a physical data warehouse? And how can IT teams best offer access to integrated views of data? At a growing number of organizations, answering those questions is pointing the way to distributed architectures designed to simplify data management and better serve end users than a monolithic EDW does.

DISTRIBUTED DATA PUZZLE PIECES

A distributed architecture lets IT manage data in separate systems and create a logical data model that the company can use to integrate information for analysis without moving it to a single location. But you need more than relational databases and extract, transform and load tools to make the distributed data management approach work. Such architectures must include the following components as well.

Data virtualization tools. These technologies enable the development of virtual data warehouses that provide access to data without first having to extract it from source systems and load it into an EDW. Data virtualization abstracts data from multiple sources to create a unified view for information delivery, analysis and reporting. By avoiding the need to physically move data to a central repository, virtualization makes it easier to add new data sources as information needs evolve; it also gives better access to real-time data and reduces the need to maintain data at multiple layers of an architecture.
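The idea of a unified view that is assembled at query time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular virtualization product works: the two in-memory SQLite databases stand in for separate source systems, and the table names, column names and the `product_view` function are hypothetical.

```python
import sqlite3

# Two independent "source systems", each an in-memory SQLite database.
# Table and column names are hypothetical, chosen for illustration.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE sales (product_id INTEGER, units_sold INTEGER)")
sales_db.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 40), (2, 15), (1, 10)])

inventory_db = sqlite3.connect(":memory:")
inventory_db.execute("CREATE TABLE stock (product_id INTEGER, units_on_hand INTEGER)")
inventory_db.executemany("INSERT INTO stock VALUES (?, ?)", [(1, 100), (2, 5)])


def product_view():
    """Build a unified view at query time; nothing is copied to a central store."""
    sold = dict(sales_db.execute(
        "SELECT product_id, SUM(units_sold) FROM sales GROUP BY product_id"))
    on_hand = dict(inventory_db.execute(
        "SELECT product_id, units_on_hand FROM stock"))
    return [
        {"product_id": pid,
         "units_sold": sold.get(pid, 0),
         "units_on_hand": on_hand.get(pid, 0)}
        for pid in sorted(set(sold) | set(on_hand))
    ]


view = product_view()
```

Because each call reads the sources directly, a change in either system shows up the next time the view is requested, with no replication job in between.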

Centralized master data management processes. MDM is crucial to a distributed data warehouse architecture to help ensure that the logical data model functions effectively. Since data is being integrated on the fly from various source systems or siloed data marts that hold subsets of information, it’s imperative that the underlying master data for all the different sources conforms to common specifications and formats. Otherwise, data inconsistencies could hamper business intelligence efforts.
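To make "common specifications and formats" concrete, here is a minimal Python sketch of conforming customer records from two sources to one shared format before they are combined. The field names, the `COUNTRY_CODES` mapping and the `conform_customer` function are assumptions made up for this example; a real MDM process would cover far more rules.

```python
# Hypothetical shared specification: upper-case IDs, title-case names,
# two-letter country codes.
COUNTRY_CODES = {
    "us": "US", "usa": "US", "united states": "US",
    "ca": "CA", "canada": "CA",
}


def conform_customer(record):
    """Map a source record to the shared master format."""
    country_key = str(record["country"]).strip().lower()
    return {
        "customer_id": str(record["id"]).strip().upper(),
        "name": " ".join(str(record["name"]).split()).title(),
        "country": COUNTRY_CODES.get(country_key, "UNKNOWN"),
    }


# The same customer as two different systems might hold it.
from_crm = {"id": " c-001 ", "name": "acme  corp", "country": "USA"}
from_billing = {"id": "C-001", "name": "ACME CORP", "country": "united states"}

conformed_crm = conform_customer(from_crm)
conformed_billing = conform_customer(from_billing)
```

Once both records are conformed they compare as equal, so a view that integrates the two sources on the fly can match them instead of reporting two customers.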

Metadata management. Metadata is data about data. In the data warehouse context, there are three main categories: technical metadata that defines tables, fields, partitions and other data structures; business metadata that defines business rules and calculation logic; and process metadata that catalogs what data is available, where it comes from and how different data sets are related to one another.

Metadata must be properly maintained in a distributed architecture to help IT teams identify the lineage of information, avoid the introduction of redundant data and optimize the flow and use of data.
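As a rough illustration of the three categories and of lineage tracing, the following Python sketch keeps one catalog entry per data set. The `DatasetMetadata` class, the catalog contents and the `lineage` function are hypothetical and stand in for whatever a real metadata repository would provide.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    columns: dict                 # technical metadata: column name -> type
    business_rule: str            # business metadata: calculation logic
    sources: list = field(default_factory=list)  # process metadata: upstream sets


# A made-up catalog with four data sets.
CATALOG = {
    "orders_raw": DatasetMetadata("orders_raw", {"amount": "REAL"}, "as captured"),
    "customers_raw": DatasetMetadata("customers_raw", {"name": "TEXT"}, "as captured"),
    "orders_clean": DatasetMetadata(
        "orders_clean", {"amount": "REAL"}, "drop negative amounts",
        sources=["orders_raw"]),
    "revenue_by_customer": DatasetMetadata(
        "revenue_by_customer", {"revenue": "REAL"}, "sum of amount per customer",
        sources=["orders_clean", "customers_raw"]),
}


def lineage(name, catalog):
    """Return every upstream data set of `name`, nearest first, without repeats."""
    found = []
    queue = list(catalog[name].sources)
    while queue:
        current = queue.pop(0)
        if current in found:
            continue
        found.append(current)
        queue.extend(catalog[current].sources)
    return found


upstream = lineage("revenue_by_customer", CATALOG)
```

The same catalog can be scanned for data sets that share identical sources and rules, which is one simple way to spot redundant data before it spreads.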

Hadoop systems and NoSQL databases. The advent of new data sources in unstructured and semi-structured formats requires organizations to look beyond relational databases geared to structured transaction data. The need for alternatives can be effectively catered to by Hadoop clusters and NoSQL database systems that address data variety and can scale to handle large volumes.

Properly set up, a distributed data management architecture is able to house relational and NoSQL databases, Hadoop systems and other types of technologies under the same virtual roof. Data stored in any of them can be accessed as required, with no constraints from the end-user perspective. The distributed approach has the potential to save significant amounts of money on data integration, replication, storage and management processes. It also provides more agile integration capabilities, enabling organizations to respond faster to ever-changing analytics requirements and gain more insight to help drive better business results.

By Saurabh Jain

IMPLEMENTATION

Integration Focus Drives Healthy Analytics

Advanced analytics these days requires special attention to data sources and a robust data management plan. That shouldn’t be obscured by the excitement over the latest graduating class of model-wielding data scientists. To some extent, things haven’t really changed since the days when the phrase “garbage in, garbage out” was coined.

High-powered analytics tools can uncover inefficiencies and disclose opportunities. But, as a conversation with the head of analytics at a specialty pharmacy company shows, being able to cope with the incoming data is still the first order of business.

First, some background. Health organizations are at the center of a lot of the latest big data activity. That’s because there is a concerted effort underway to operate more efficiently and reduce costs in the healthcare sector. You may have noticed that the times, they are a-changin’ if you had to do something extraordinary, like, oh, take a physical or fill a prescription. Depending on your medical coverage, you may be channeled toward a particular test service, or a mail-order medicine dispenser. Be sure to have your insurance card with you!

That’s how it plays out at the micro level. At the macro level, a lot of diverse data has to be churned to streamline the process of health delivery, and that has to happen before analytics begin.

NEW ANALYTICS NEEDS

Healthcare is starting to resemble a regular business. As such, healthcare organizations require more sophisticated data analytics. Goals such as shorter hospitalizations and cost reductions on medications are behind the drive for stronger analytics, which is also intended to help industry players achieve better outcomes on patient care.

But achieving broad success with analytics requires special attention to data management, according to Craig Willis, director of analytics at Physicians Pharmacy Alliance (PPA), a company based in Cary, N.C., that provides pharmacy services to patients with complex medication needs.

Willis comes from the operations side of PPA. He began in patient services, without any particular background in statistics. What led him to what he’s doing now—at least in part—is that he has always been “data-driven.” His closeness to the company’s data enables him to work with the IT team to manage a moveable feast of data—clinical intervention, prescription and telephone records, medical claims data and more—that holds the key to improving care for chronically ill individuals who can account for the lion’s share of medical spending.

PPA has expanded its initial analytics efforts around business intelligence (BI) and data visualization tools from Tableau Software and is now working with SAS Institute’s Visual Analytics platform.

“We used Tableau for a year in a small deployment,” Willis said. “The results were great. But we wanted an enterprise platform that provided especially good back-end processing. What we really needed was something that could handle data management.”

DATA MANAGEMENT DEMANDS

Willis added that the variety of disparate data PPA collects and the speed at which it is generated were factors leading the company to seek more advanced data management capabilities to complement the advanced analytics it’s looking to do.

“Due to the amount and velocity of our data, it wasn’t possible to achieve our goals without powerful computing in the background,” Willis said. “If you’re working with small data sets, that’s not a requirement. But we have gigs and gigs and gigs of data, thousands of rows of data that can’t be successfully viewed otherwise.”

Also among the drivers for better data management tooling at PPA are new measures that estimate the effectiveness of health plans. Like other healthcare companies, PPA uses the National Committee for Quality Assurance’s Healthcare Effectiveness Data and Information Set (HEDIS) performance metrics to help ensure that its quality of care is as high as possible. But Willis said the metrics introduce their own complexity.

“The difficulty is that these measures change annually—they’re evolving,” he said. Moreover, the data required to calculate HEDIS scores comes from “a lot of different places,” including clinical applications and pharmacy and medical claims systems.

It may be obvious that data is diverse and evolving and that it requires integration. But stories like PPA’s bear repeating. Indications are that integrating data will continue to be the precursor to successful BI and big data analytics initiatives. That’s part of the reason research company Gartner Inc. forecasts annual growth of 9.6% between 2013 and 2018 for the data integration tools market, which it expects to reach $3.6 billion annually by the end of that period.

The temptation on some people’s part to overlook the importance of data management is natural. Hadoop hoopla has been near-deafening for several years now. The inclination to get on with the work and to do analytics that lead to beneficial business outcomes is valid. But many of the battles will only be won if an organization’s arsenal includes a solid data management plan.

By Jack Vaughan

STRATEGY

Cloud Apps Take Integration Needs to New Level

With the corporate use of cloud applications increasing, the integration points that IT and data management teams are responsible for are growing as well. More companies are opening up to the idea of going to the cloud, particularly for sales and marketing applications, “and now they need to get the data integrated with their on-premises applications,” consultant Rick Sherman said in an interview with SearchDataManagement.

Sherman, founder of consultancy Athena IT Solutions, added that doing so isn’t always a simple task. IT managers often have to pick up the integration pieces after individual business units deploy cloud applications on their own, he said. In the interview, Sherman discusses the hurdles typically faced during cloud data integration, the available technology options for integrating cloud and on-premises applications, and how to get started on an integration project.

What barriers are there to integrating data sources in the cloud and on-premises?

Each of these new cloud applications is another data silo, so there’s a tendency for [the data] to diverge or not be consistent. As far as technical issues, a lot of the integration that IT is used to doing is in data warehousing and business intelligence.

Some of the needs of data integration for the cloud are a little different because we’re not only dealing with a one-way transfer from a data source system to a warehouse. We’re also dealing with application-to-application integration, where you’re loading the data onto the cloud platform and synchronizing it between applications. There are different technologies you can use to do that: enterprise service buses, enterprise message services. But a lot of times, it presents issues to the IT group because they’re not familiar with those other technologies. They’re used to data integration tools that are in the ETL category: extract, transform and load.
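The difference between a one-way transfer and application-to-application synchronization can be sketched in Python. This is a simplified illustration, not a description of how an ESB, a messaging service or any ETL product behaves: plain dictionaries stand in for the applications, the `updated` counter is a hypothetical version stamp, and the "newer copy wins" rule is an assumption chosen to keep the example short.

```python
def load_one_way(source, warehouse):
    """ETL-style transfer: records flow from the source to the warehouse only."""
    for key, record in source.items():
        warehouse[key] = dict(record)


def sync_two_way(app_a, app_b):
    """Application-to-application sync: the newer copy of each record wins."""
    for key in set(app_a) | set(app_b):
        a, b = app_a.get(key), app_b.get(key)
        if b is None or (a is not None and a["updated"] >= b["updated"]):
            newest = a
        else:
            newest = b
        # Write the winning copy back to both sides.
        app_a[key] = dict(newest)
        app_b[key] = dict(newest)


cloud_crm = {
    "c1": {"email": "old@example.com", "updated": 1},
    "c2": {"email": "second@example.com", "updated": 5},
}
onprem_erp = {"c1": {"email": "new@example.com", "updated": 3}}

sync_two_way(cloud_crm, onprem_erp)   # both applications now agree

warehouse = {}
load_one_way(cloud_crm, warehouse)    # the warehouse only receives data
```

The one-way load never writes back to its source, while the sync has to decide which side is right for every record. That conflict handling is the part that tends to be unfamiliar to teams who have only built warehouse loads.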

Is integration in the cloud something that’s still primarily being done with ETL software?

ETL tools have still been the primary [choice]. When the first wave of integration came about for the cloud, you also got something called iPaaS, or integration platform as a service. What happened was the data integration vendors that use mainly ETL started incorporating other technologies into their integration techniques: the ESB variant, the EMS one. That has been beneficial to IT groups because within their existing tools, they can just add the newer ones that their vendors are offering.

How functional are the iPaaS offerings that are available now?

I would say a lot of the iPaaS tools are still in the data loading and data synchronization use cases, so they can do that well. When it comes to integrating and cleansing the data and making it more consistent, they’re not as mature or sophisticated. That’s not the use case they’re used to. We still have a lot of [cloud users] that are just synchronizing data. Whether you’re a big or small company, the first wave that happens is you get these applications and you want to load data into them and synchronize the data between applications. But at some point, you need to go beyond that.

What are some of the factors that IT and data management teams should consider when determining which cloud data integration option is right for a company?

They should look to see what integration vendors and products are in use now. Sometimes, integration technologies are embedded in or bundled with a cloud application to get you started, so they should do a quick assessment. If they’re using [ETL] tools and have expertise in them, they should look at those vendors’ iPaaS capabilities and see if they can expand that way.

If they don’t have a big need for ETL, or at the current time they’re just trying to synchronize applications, they should probably look at the iPaaS vendors and stick with those [platforms] because they’ll be on a subscription basis and won’t be overwhelmed with the sophisticated data integration [capabilities] that the other tools have.

And what steps should they take to get started on a cloud and on-premises integration project?

First, they should take an inventory of what on-premises applications and cloud applications they have. You’d think it would be simple, but the business may or may not have kept them abreast of all the cloud applications. [Second], they should do an assessment of what kind of data volumes and data updates each of those applications has, to [determine] the demand on the integrations they need to do. Third is to figure out, with these cloud applications, what types of integrations are needed. Do they need to move data between cloud applications? Between cloud and on-premises applications? Do they need to bring the data to a data warehouse or comparable database to do analytics?
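The three steps above could be captured in a small working inventory. The Python sketch below is one possible shape for it, under loud assumptions: the `App` and `Flow` classes, the volume figures and the category labels are all invented for illustration and do not come from the interview.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    location: str         # "cloud" or "on-premises"
    rows_per_day: int     # rough data volume (hypothetical unit)
    updates_per_day: int  # rough update frequency


@dataclass
class Flow:
    source: App
    target: App
    for_analytics: bool = False  # True when the target is a warehouse


def integration_type(flow):
    """Step 3: classify the kind of integration a flow needs."""
    if flow.for_analytics:
        return "load for analytics"
    locations = {flow.source.location, flow.target.location}
    if locations == {"cloud"}:
        return "cloud to cloud"
    if locations == {"cloud", "on-premises"}:
        return "cloud to on-premises"
    return "on-premises to on-premises"


# Step 1: inventory (all entries are made-up examples).
crm = App("crm", "cloud", 20_000, 5_000)
marketing = App("marketing", "cloud", 50_000, 500)
erp = App("erp", "on-premises", 200_000, 80_000)
warehouse = App("warehouse", "on-premises", 0, 0)

flows = [
    Flow(crm, marketing),
    Flow(crm, erp),
    Flow(erp, warehouse, for_analytics=True),
]

# Step 2: rank flows by the demand their source places on the integration.
by_demand = sorted(flows, key=lambda f: f.source.rows_per_day, reverse=True)
report = [(f.source.name, f.target.name, integration_type(f)) for f in by_demand]
```

Even a list this small makes gaps visible: any cloud application the business adopted without telling IT simply will not appear in it, which is the point of doing the inventory first.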

Based on that, they can start to look at what technologies make the most sense in order to be able to complete those integration tasks.

By Corlyn Voorhees

ABOUT THE AUTHORS

SAURABH JAIN is senior director of Mindtree Ltd.’s Data and Analytics Solutions consulting services practice. Jain has 15 years of industry experience and has worked in a variety of roles on a wide range of business intelligence, analytics and data warehouse initiatives. Email him at [email protected].

JACK VAUGHAN oversees editorial coverage for SearchDataManagement. Previously he was editor in chief for SearchSOA. Before joining TechTarget in 2004, he was editor at large at Application Development Trends and ADTmag.com. Email him at [email protected] and follow him on Twitter: @JackVaughanatTT.

CORLYN VOORHEES is working as an editorial assistant for SearchBusinessAnalytics and SearchDataManagement through Northeastern University’s co-op program. She is an undergraduate student at Northeastern, where she is double-majoring in journalism and communication studies. Email her at [email protected].

The Glue in Data-Driven Businesses is a SearchDataManagement.com e-publication.

Jason Sparapani | Managing Editor

Moriah Sargent | Associate Managing Editor

Craig Stedman | Executive Editor

Jacqui Biscobing | Site Managing Editor

Linda Koury | Director of Online Design

Doug Olender | Publisher | [email protected]

Annie Matthews | Director of [email protected]

TechTarget 275 Grove Street, Newton, MA 02466

www.techtarget.com

© 2015 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

COVER ART: FOTOLIA
