Does the 21st-Century "Big Data" Warehouse Mean the End of the Enterprise Data Warehouse?


Does the 21st-Century "Big Data"

Warehouse Mean the End of the Enterprise

Data Warehouse?25 August 2011| ID:G00213081

Mark A. Beyer|Donald Feinberg

The ideal enterprise data warehouse has been envisaged as a centralized repository for 25 years, but the time has come for a new type of warehouse to handle "big data." This "logical data warehouse" demands radical realignment of practices and a hybrid architecture of repositories and services.

Overview

The new data warehouse needed for the information management demands of the 21st century is not a replacement for existing practices. Rather, it involves a fundamental realignment of almost every existing practice in order to provide specific functionality within a restyled architecture that capitalizes on the greatest strength of every technique, approach and strategy. At the same time, it introduces fresh techniques and architectural capabilities to meet the demand, created by "big data," cloud utilization, operational technology and social media, for delivery of data to traditional, readily available and consumer-style analytics tools. The focus is on the data-processing or information management logic, not the physical infrastructure — this is a "logical data warehouse" (LDW).

Key Findings

• The vast majority of organizations (judging from over 75% of the data warehouse inquiries received from Gartner clients) select a single deployment style for what they term an enterprise data warehouse (EDW). In doing so they create a compromised environment that fails to deliver on some aspect of the associated SLA.

• Organizations that deploy an EDW almost all create second and third data warehouses or marts to support additional user needs (judging from up to 90% of the data warehouse inquiries received from Gartner clients), despite strict instructions to use the EDW.

• The architectural style of a data warehouse is usually determined by the available skills and tools, and secondarily by time-to-delivery, in preference to the anticipated future flexibility or extensibility of the solution.

Recommendations

• Start your evolution toward an LDW by identifying data assets that are not easily addressed by traditional data integration approaches and/or easily supported by a "single version of the truth." Consider all technology options for data access and do not focus only on consolidated repositories. This is especially relevant to "big data" issues.

• Identify pilot projects in which to use LDW concepts by focusing on highly volatile and significantly interdependent business processes.

• Use an LDW to create a single, logically consistent information resource independent of any semantic layer that is specific to an analytic platform. The LDW should manage reused semantics and reused data.

Table of Contents

Analysis

Ending the Era of Deficient Compromise

Service Level and Benefit Expectations — Revisited

A Combined Services and Information Asset Management Platform

The Logical Data Warehouse Architecture

Evolving Toward the Logical Data Warehouse

How Existing Technology Can Fit In


Recommended Reading

List of Tables

Table 1. Data Warehouse Architecture Principles, Service Drivers and Primary Limitations

List of Figures

Figure 1. Summary of Standard Data Warehouse Service Contracts

Figure 2. Information Capabilities Framework Management and Semantic Services Categories

Figure 3. Traditional Data Warehouse and Business Intelligence Infrastructure

Figure 4. Services-Oriented Analytics Information Management

Analysis

This document was revised on 5 September 2011. For more information, see the Corrections page on gartner.com.

Data warehouse architecture is undergoing an important evolution, as compared with the relative stasis of the previous 25 years. While the term "data warehouse" was coined around 1989, the architectural style predated the term (at American Airlines, Frito-Lay and Coca-Cola).

At its core, a data warehouse is a negotiated, consistent logical model that is populated using predefined transformation processes. Over the years, the various options — centralized EDW, federated marts, hub-and-spoke array of central warehouse with dependent marts, and virtual warehouse — have all served to emphasize certain aspects of the service expectations for a data warehouse. The common thread running through all styles is that they were repository-oriented. This, however, is changing: the data warehouse is evolving from competing repository concepts to include a fully enabled data management and information-processing platform. This new warehouse forces a complete rethink of how data is manipulated, and where in the architecture each type of processing occurs to support transformation and integration. It also introduces a governance model that is only loosely coupled with data models and file structures, as opposed to the very tight, physical orientation previously used.

This new type of warehouse — the LDW — is an information management and access engine that takes an architectural approach which de-emphasizes repositories in favor of new guidelines:

• The LDW follows a semantic directive to orchestrate the consolidation and sharing of information assets, as opposed to one that focuses exclusively on storing integrated datasets.

• The semantics are described by governance rules from data creation and use case business processes in a data management layer, instead of via a negotiated, static transformation process located within individual tools or platforms.

• Integration leverages both steady-state data assets in repositories and services in a flexible, audited model, via the best optimization and comprehension solution available.

Ending the Era of Deficient Compromise

Some would say that the result of compromise is that everyone is equally unhappy. The new data warehouse is expected to meet all previous data warehouse service-level expectations and to deliver all the originally intended benefits of a warehouse or integration platform — but without any artificial limitations based on use cases or deficient technology. At the same time, the new warehouse must integrate very non-traditional information assets.

Every data warehouse is deployed essentially to meet specific service-level expectations for the delivery and management of data. These expectations have been met using a wide variety of architectures and approaches. The basic premise behind the new data warehouse is that it will combine the strengths of every engineering approach previously used to create a variety of architectural styles into a new model that supports easy switching between styles or a hybrid of diverse delivery approaches. Existing architectures must be altered radically to meet these new demands.

There are many components and expectations associated with each of the traditional warehouse approaches. But for each of the traditional approaches, there is a principal service expectation, a primary design driver and some predominant limitation (otherwise alternatives would not have been necessary). Table 1 compares traditional data warehouse architectures and the LDW.

Table 1. Data Warehouse Architecture Principles, Service Drivers and Primary Limitations

Centralized repository: Normalized or slightly denormalized data in a single database; the traditional "enterprise data warehouse."

• Principal service-level expectation: Integrate and abstract data for reuse in analytics, or serve as a data-sharing platform for transactional systems.

• Primary design driver: Need to resolve similar or the same data that was designed and deployed in different applications and systems, because in those systems the data was designed specifically to support transactional roles.

• Primary limitations: Performance optimization is often difficult due to the more normalized nature of the data. Comprehension and usage barriers arise due to users' lack of familiarity with third normal form approaches. Inherited data governance from authoring applications makes ongoing rationalization and extensions difficult.

Federated marts: Multiple individual data models with join tables or views of selected information deployed in one or more databases.

• Principal service-level expectation: Isolate cost and deployment for rapid deployment, while producing more comprehensible reports in a short time-to-delivery model.

• Primary design driver: The demand for dynamic reporting within a well-described business process, based on one business process or business unit's specific information governance demands. Enables analysis by drilling down into well-organized reports.

• Primary limitations: Perpetuates parochial definitions and data design, merely deferring costs for rationalization of sometimes incompatible data models. Forces multiple maintenance points without actually integrating disparate data.

Virtual warehouse: A view or semantic layer over the top of transactional systems data, usually without a dedicated repository but sometimes using cache technology.

• Principal service-level expectation: Permit the abstraction of disparate models from disparate locations without actually moving the data.

• Primary design driver: Allow for consolidated reporting across multiple systems without having to add to the storage environment, while also avoiding significant additions to the compute/processing environment or server demands.

• Primary limitations: Dependent on external limitations for data volume, network capacity and source availability. Pressured by desired end-user and application connections, and often disrupted by downtime from these issues. Even the best-designed virtual warehouse often has to resort to some form of physically stored cache.

Hub-and-spoke array: Summaries, aggregates and even variants of similar dimensions, all derived from a central repository of transformed, remodeled and relocated transactional data. A second variant of the "enterprise data warehouse."

• Principal service-level expectation: Provide for integration of designated subsets of data, while delivering high-performance and comprehension-optimized data access.

• Primary design driver: The desire for multiple renderings of the same data for different use cases, each optimized for performance.

• Primary limitations: Time-to-deployment requires phased rollout, and poor planning of initial rollouts often forces a radical redesign two to five iterations later. Same issues as for the centralized repository.

Logical data warehouse:

• Principal service-level expectation: Combine the benefits of previous approaches in a "best fit" architecture. Add support for distributed data assets and parallel distribution of processing requirements with predictable and repeatable results, while continuing to support data centralization when appropriate. Support all previous forms of data warehouse architecture with easy switching between data management and delivery styles.

• Primary design driver: The need to account for the reuse of information transformation, quality and access services, regardless of information/data formats or locations, to support data redistribution or analytics. Also, the need to support new and diverse data types at the same time.

• Primary limitations: More of a barrier than a limitation, given that existing warehouse platforms and architectures were designed with centralized processing as an underlying assumption — even for federated approaches — and that existing semantics and data processing code will be difficult to adapt and reuse.


Source: Gartner (August 2011)

Service Level and Benefit Expectations — Revisited

Every data warehouse is expected to meet well-established and persistent service-level expectations as part of industry best practices in order to deliver the desired benefits of deploying that warehouse. In the past, many of the architectural, design and engineering approaches used to deploy warehouses equated to a series of compromises that favored some of these "service contracts" to the detriment of others, or even sacrificed some requirements due to time-to-market pressures. Figure 1 summarizes the services contracts of a data warehouse (see Note 1).

Figure 1. Summary of Standard Data Warehouse Service Contracts

Source: Gartner (August 2011)

The new warehouse has the same service expectations, but is not specifically a repository and it now includes a series of information management services. So, what precisely is the new architectural form?

A Combined Services and Information Asset Management Platform

The LDW incorporates best practices for service-oriented architecture, information governance, data warehouses and information management. It shifts the debate and the focus of data warehouse design from choosing between fixed implementation and architectural styles to applying best practices in multiple IT delivery areas for the most appropriate use.

The "old" data warehouse usually favored one specified engineering approach, often using procedural processing to extract data from designated repositories, validating the transformations against somewhat static business rules, and then loading the data. For example, traditional extraction, transformation and loading (ETL) identifies the table and column where the source data is and moves it in some type of processing stream to a target. The format and content is known at both source and target — such as when

Warehouse

Architecture

Principal Service-

Level Expectation

Primary Design Driver Primary Limitations

"enterprise data

warehouse."

Logical data warehouse

Combine the benefits of previous

approaches in a "best

fit" architecture. Add

support for distributed

data assets and

parallel distribution of

processing

requirements with

predictable and

repeatable results, while continuing to

support data

centralization when

appropriate. Support

all previous forms of

data warehouse

architecture with easy

switching between

data management and

delivery styles.

The need to account for the reuse of information

transformation, quality

and access services,

regardless of

information/data

formats or locations, to

support data

redistribution or

analytics. Also, the need

to support new and diverse data types at

the same time.

• More of a barrier

than a limitation,

given that existing

warehouse platforms

and architectures

were designed with

centralized

processing as an

underlying

assumption — even for federated

approaches — and

that existing

semantics and data

processing code will

be difficult to adapt

and reuse.

Page 4 of 10Print Document

15/01/2014http://my.gartner.com/portal/server.pt/gateway/PTARGS_0_2776563_353_256_2350...

Page 5: Does the 21st-Century _Big Data_ Warehouse Mean the End of the Enterprise Data Warehouse

moving "first name" in a customer table to "given name" in a warehouse. The quality rules might even be coded directly into the ETL system. By contrast, the LDW takes a data services approach to managing these various requirements.

A data services approach separates data access from processing, processing from transformation, and transformation from delivery. In a data services approach, the pieces are written separately to enable flexible job flows and easily coupled processing. Let's assume, for example, that there are seven sources for "given name." One level of services would manage the connection string. Another level would read the metadata table indicating that in three of the systems the column desired is named "fname_24," in two of the systems it is listed as "cus_firstname" and in another two systems "name_given" and "namenerstmal." All of these are equivalent to "given_name" and therefore subject to the same data quality rule. So the service that accesses the data passes each of them to one common quality service. Then, after the completion of quality operations, the data is passed on to a delivery service. Say, however, that one of the targets is a data warehouse that needs an insert service to a database management system (DBMS), that another delivery location is an XML message which needs XML structure around it, and that a third delivery location is an application which needs to write data to an array or cursor. It would then be possible to write code so that the XML is always created and additional services render the insert and the array build, or the three delivery functions (insert, XML and array) could be written as three services. The reuse rate of a function would then largely determine whether to code it as a loosely coupled service or a tightly coupled procedure.
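As an illustration only (ours, not the research note's), the following Python sketch shows one way the "given name" example could be wired together: a metadata registry maps each source's physical column to the shared element, a single quality service is reused across every source, and delivery is a separate, pluggable step. All source names, column aliases and function names below are hypothetical.

```python
# Hypothetical sketch of a data services approach to the "given name" example.
# A metadata registry maps each source's physical column to the shared element,
# one quality service is reused for every source, and delivery services are
# selected per target rather than hard-coded into a single ETL job.

# Metadata: seven sources expose the same logical element under different names.
COLUMN_REGISTRY = {
    "crm_a": "fname_24",
    "crm_b": "fname_24",
    "erp_a": "fname_24",
    "billing": "cus_firstname",
    "support": "cus_firstname",
    "web_shop": "name_given",
    "legacy_de": "namenerstmal",
}

def access_service(source: str, record: dict) -> str:
    """Read the 'given name' value from a source record using the registry."""
    return record[COLUMN_REGISTRY[source]]

def quality_service(value: str) -> str:
    """One shared quality rule applied to every equivalent column."""
    return value.strip().title()

# Delivery services: each target format is a separate, reusable function.
def deliver_insert(value: str) -> str:
    return f"INSERT INTO warehouse.customer (given_name) VALUES ('{value}')"

def deliver_xml(value: str) -> str:
    return f"<customer><given_name>{value}</given_name></customer>"

def deliver_array(value: str) -> list:
    return [value]

if __name__ == "__main__":
    record = {"cus_firstname": "  ada "}
    clean = quality_service(access_service("billing", record))
    for deliver in (deliver_insert, deliver_xml, deliver_array):
        print(deliver(clean))
```

Whether the three delivery functions stay separate or are composed into one tightly coupled routine is exactly the reuse-rate decision described above.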

The LDW participates in, and is a beneficiary of, a wider information capabilities services-style approach (see "Information Management in the 21st Century"). In a 21st-century information management architecture, the new warehouse participates in an information capabilities framework (ICF): see "The Information Capabilities Framework: An Aligned Vision For Information Infrastructure."

Since a data warehouse serves primarily as a rationalization and integration engine, it is expected to perform most of its information management duties using data management verbs that integrate and organize data. Additionally, the warehouse is expected to deliver integrated information in an optimized fashion, supporting both comprehension and performance. The new warehouse, therefore, must "decide" when a consolidated repository or a transient (virtual) style of delivery is appropriate. Organization will take place at two levels — first putting information assets together and then determining whether a summary/aggregated dataset is the best organization approach for an end-use case.

An ICF specifies that, regardless of how an application or repository is designed, the information management approach used is expected to perform duties and services from well-established categories of information management functions (see Figure 2). Further, information itself is treated as an object with value, integrity and rules of behavior. Some of these rules can be deployed as logical policies that are enacted against any asset type, as long as the actual content is the same. For example, a person's name is his or her name regardless of whether it is in a database, a document or spoken in an audio clip. The LDW architect simply determines where each of these verb classes will be provided — in a database, on a services bus, in a view layer and so on. Importantly, an EDW uses a "dedicated" semantic style only, but an LDW uses all semantic styles based on which is most appropriate for the applicable SLA.
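One way to picture "logical policies enacted against any asset type" is sketched below (our illustration; the asset types, extraction helpers and rule are hypothetical): the same policy object is applied to a person's name whether it arrives from a table row, a document or a speech-to-text transcript.

```python
# Illustrative sketch: a single logical policy (standardize a person's name)
# applied to the same content regardless of the asset type that carries it.

def standardize_name(value: str) -> str:
    """The policy itself: one rule, defined once."""
    return " ".join(part.capitalize() for part in value.split())

def name_from_table_row(row: dict) -> str:
    return row["given_name"]

def name_from_document(text: str) -> str:
    # Toy extraction standing in for text analytics.
    return text.split("Name:")[1].strip()

def name_from_transcript(transcript: str) -> str:
    # Toy extraction standing in for speech-to-text plus entity detection.
    return transcript.replace("my name is", "").strip()

if __name__ == "__main__":
    assets = [
        name_from_table_row({"given_name": "ada lovelace"}),
        name_from_document("Contract 42. Name: ada lovelace"),
        name_from_transcript("my name is ada lovelace"),
    ]
    print([standardize_name(a) for a in assets])  # same policy, three asset types
```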

Figure 2. Information Capabilities Framework Management and Semantic Services Categories

Source: Gartner (August 2011)

The Logical Data Warehouse Architecture

In a services-oriented approach to data management it is imperative to understand that nothing is required to execute in the same order every time. Orchestration of services can be declared or dynamic. Declared orchestration executes almost as modularized procedural code, the difference being that the services are free-standing operations and can also be called by other composite processes to occur in a different order. Dynamic orchestration reacts to metadata instructions that are often received as audits of the environment. For example, an analytic query that anticipates putting multiple sources together could get the information from an integrated repository or from a federated view; it might decide which is best by comparing the latency of the integrated repository data with the intention of the querying user to capture newer or older data.
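A minimal sketch of that dynamic decision (ours, not Gartner's), assuming the orchestration layer can read the repository's last load time and the query's freshness requirement from metadata; the names are illustrative.

```python
# Illustrative sketch of dynamic orchestration: route an analytic query to the
# integrated repository or to a federated view of the sources, based on how
# stale the repository is relative to the freshness the query requires.
from datetime import datetime, timedelta

def choose_access_path(last_repository_load: datetime,
                       max_acceptable_staleness: timedelta) -> str:
    """Return which delivery style satisfies the query's latency requirement."""
    staleness = datetime.utcnow() - last_repository_load
    if staleness <= max_acceptable_staleness:
        return "integrated_repository"   # consolidated data is fresh enough
    return "federated_view"              # go to the sources for newer data

if __name__ == "__main__":
    loaded_at = datetime.utcnow() - timedelta(hours=10)
    print(choose_access_path(loaded_at, timedelta(hours=24)))  # integrated_repository
    print(choose_access_path(loaded_at, timedelta(hours=1)))   # federated_view
```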

Within an ICF, the data warehouse, like any other use case, must determine a primary semantic "entry point" to begin using a services architecture. For the data warehouse this primary entry point is defined by the primary service contract, to deliver a consolidated view of disparate data in optimal fashion. A warehouse needs to access sources and deliver that consolidated view.

Therefore, the LDW is designed, first and foremost, using a combination of services and physical data repositories. Secondly, it can be designed with a focus on declared or dynamic orchestration. Finally, it is possible to design some of the LDW using any combination of physical repositories, virtual data objects, declared orchestration or dynamic orchestration. It is also possible to begin with a physical repository approach with highly dedicated, declared access, and then evolve slowly toward more dynamic and mixed data delivery approaches.

Evolving Toward the Logical Data Warehouse

Traditional data warehouses and BI environments have a fairly consistent architecture (see Figure 3). Some of the capabilities are on different platforms, but there is primarily a unidirectional flow of data toward one set of new models and data governance rules.

Figure 3. Traditional Data Warehouse and Business Intelligence Infrastructure

BI = business intelligence; DBMS = database management system; DW = data warehouse; ETL = extraction, transformation and loading; LDAP = Lightweight Directory Access Protocol; ODS = operational data store; OLAP = online analytical processing; RDBMS = relational database management system

Source: Gartner (August 2011)

If we assume an initial state with a traditional data warehouse, the following points most likely apply:

• You already have a data integration process with "describe" and "organize" functions that specify both the source and target states of the data. They may or may not be deployed as modular code or metadata that drives the process.

• You already have functions that resolve differences between the governance rules of sources and your warehouse target. You also have integration processes that resolve formatting issues.

• You have some implementation rules — sometimes embedded at design time, sometimes deployed when ready for runtime.


• You may or may not have auditing capabilities built into your processing (such as for profiling, record counts of completed versus dropped transformations and data quality "outs"). However, they are probably designed for permanent use of a combined "consolidation" and "dedicated" semantic layer. In other words, they are probably procedural and not dynamic (at least not without returning to the design tools and redeploying).

• Your existing orchestration is most likely not dynamic — and, unless you are using a virtual warehouse strategy, the concept of using registries for data sources and target objects is most likely nonexistent.

• There is probably little or no ability to use other repositories as information assets in query responses — such external assets are either loaded directly during a transformation-and-load process or loaded from one of your source systems (as with postal data added to an ERP system and then relayed to the warehouse).

Let's assume that, instead of accepting this unidirectional, static orchestration, you want to develop an LDW. To do this, you start by introducing the LDW concepts used for dynamic consolidation, integration and implementation, as depicted in Figure 4 (note that the diagram uses today's terminology — transformation using "ETL/ELT," "federation" and so on — but that these concepts are deconstructed, for evolution toward a modern architecture, in "Information Management in the 21st Century"):

• The data integration process can be broken into sourcing, collation, data quality, formatting and domain governance segments, based on information availability and governance rules. For example, the sourcing/extraction process can be a registry semantic layer using "describe" verbs that tell the service "where" the data is (a sketch of such a registry follows this list). If data for "person" is located in documents, clickstreams and enterprise systems, one service can use textual analysis and search for documents, another service can use MapReduce to read massive volumes of tags in "clicks," and a traditional native driver access approach can pull data from the enterprise system database. A data quality process can then verify the work done by each service and undertake an enrichment and/or value substitution process, before prepping the data for delivery. If the data is dynamic and constantly changing, the data integration process can deliver a virtual data object, but if the data is already validated by a master data management process and fairly static, it can be loaded into a table or file. A final service can determine the appropriate load or access format and put the data into that format.

• In relation to latency issues, you are no longer bound by load restrictions. It is possible to indicate in a metadata layer that there are different requirements for different analytic end-use cases. For example, one department may require higher-quality data but tolerate higher-latency delivery (it would get data from fully validated tables), while another department might be prepared to risk inconsistencies in data but require low latency (it would get a combined-registry delivery of yesterday's data in the tables with today's data from the OLTP system —"dirty but fast"). Or, instead of this fixed approach, you could have a service that negotiates whether the quality SLA is being met for each of the departments and switches between strategies dynamically. For example, the department requiring low latency might receive data from the warehouse repository in the morning, after the previous night's load had brought everything up to date, but in the afternoon it might receive a composite view. And, instead of switching at a predetermined time of day, the switch would be based on how far out of synchronization the two sources are, based on record counts and data quality ratings.

• A dynamic service that determines when to write summary or aggregate data is generally faster than one that performs a query-time summary of detailed rows. It could even switch on the basis of CPU and storage utilization/performance audits, and change its approach throughout the day. It could also switch dynamically between approaches on the basis of system audits that determine whether more memory is added for caching, or even if it is worthwhile to perform caching.

• Adding external data based on services written to read and analyze those data sources also becomes easier. For example, adding operational-technology data, such as the millions of records generated each day by RFID-enabled supply chain management tracking systems or utility smart grid meters, requires massive data integration processing in a procedural manner when using traditional warehouses. But by developing two or three variants of the same MapReduce function, the LDW can orchestrate the preferred approach for different analyst audiences and leave the data in the source or the historian software (see "Historian Software and the Smart Grid: Questions and Misconceptions").
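To make the registry idea from the first bullet concrete, here is our illustrative sketch (not part of the research note): the "describe" layer is simply a mapping from each holding of "person" data to the access service able to read it. The locations and stand-in access functions are hypothetical.

```python
# Hypothetical sketch of a registry ("describe") layer: it records where each
# holding of "person" data lives and which access service can read it, so the
# sourcing step is driven by metadata rather than hard-coded extract jobs.

def read_documents(location: str) -> list:
    return [f"text-analysis result from {location}"]      # stand-in for textual analysis/search

def read_clickstream(location: str) -> list:
    return [f"mapreduce output from {location}"]          # stand-in for a MapReduce job

def read_enterprise_db(location: str) -> list:
    return [f"rows fetched via native driver from {location}"]

PERSON_REGISTRY = [
    {"location": "dms://contracts", "access_service": read_documents},
    {"location": "hdfs://weblogs/clicks", "access_service": read_clickstream},
    {"location": "jdbc://erp/customers", "access_service": read_enterprise_db},
]

def source_person_data() -> list:
    """Sourcing step: dispatch to whichever access service the registry names."""
    results = []
    for entry in PERSON_REGISTRY:
        results.extend(entry["access_service"](entry["location"]))
    return results

if __name__ == "__main__":
    for item in source_person_data():
        print(item)
```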

Note that with the LDW approach, the differing styles of support, such as federation, data repositories, messaging and reductions, are not mutually exclusive. They are just manifestations of data delivery. The focus is on getting the data first, then figuring out the delivery approach that best achieves the SLA with the querying application. The transformations occur in separate services.
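The per-department SLA negotiation described in the list above can be expressed as a small rule that compares measured drift between the warehouse and the OLTP system against each department's quality and latency tolerances. This is our sketch under assumed names and thresholds, not the note's.

```python
# Illustrative sketch of SLA-driven switching between delivery strategies:
# a department that tolerates staleness gets the fully validated warehouse
# tables; a department that needs low latency gets a composite ("dirty but
# fast") view whenever the warehouse has drifted too far from the OLTP system.

def choose_delivery(dept_sla: dict, records_out_of_sync: int, quality_rating: float) -> str:
    """Pick a delivery strategy by comparing measured drift against the SLA."""
    if records_out_of_sync <= dept_sla["max_records_out_of_sync"] \
            and quality_rating >= dept_sla["min_quality_rating"]:
        return "validated_warehouse_tables"
    if dept_sla["low_latency_required"]:
        return "composite_view_warehouse_plus_oltp"
    return "wait_for_next_load"

if __name__ == "__main__":
    finance = {"max_records_out_of_sync": 0, "min_quality_rating": 0.99,
               "low_latency_required": False}
    operations = {"max_records_out_of_sync": 5000, "min_quality_rating": 0.90,
                  "low_latency_required": True}
    # Morning, right after the nightly load: the warehouse tables suffice.
    print(choose_delivery(operations, records_out_of_sync=100, quality_rating=0.97))
    # Afternoon, drift has grown: operations switches to the composite view.
    print(choose_delivery(operations, records_out_of_sync=250000, quality_rating=0.97))
    print(choose_delivery(finance, records_out_of_sync=250000, quality_rating=0.97))
```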

Figure 4. Services-Oriented Analytics Information Management


ETL = extraction, transformation and loading; ELT = extraction, loading and transformation

Source: Gartner (August 2011)

How Existing Technology Can Fit In

Note that Figure 4 does not argue for specific technologies to perform each approach. This is because multiple engineered solutions can be used to deliver the same architecture and design, as noted in the two different design scenarios:

1. Use a BI platform and DBMS stack. While tending toward a more dedicated semantic, a BI platform deployed in tandem with a dedicated DBMS can deliver the entire approach. For example, the BI platform could negotiate with the DBMS when to use a table as opposed to a federated view of data. But any form of dynamic approach to using federation, materialized views or tables would have to be leveraged by the DBMS optimizer — and all the options would have to be maintained in the database. Of course, some semantic layers in BI platforms fail to properly combine platform optimization with DBMS optimization, while others can accomplish this task, and still others are improving. This is one disadvantage of an engineering approach to "use what is available," instead of "designing to purpose."

2. Use an enterprise service bus (ESB), data integration tools and DBMS. An ESB can define discrete services or register services provided by the data integration tool (which becomes a development workbench with orchestration occurring in the ESB). The DBMS works in its usual fashion — optimizing for view and table use to respond to queries (by maintaining cubes, views, indices, etc.). In addition, the DBMS or ESB could manage external calls to nondatabase types of information as service calls to other application services, or by orchestrating calls to the functions of other tools or repositories. This could even include calls to content management systems, sentiment analysis tools and text analysis tools.
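A generic sketch of the second scenario follows (ours; no specific ESB product or vendor API is implied): data integration functions are registered on a bus under service names, and a composite flow orchestrates them alongside a call to a non-database capability such as text or sentiment analysis. All service names and the toy logic are hypothetical.

```python
# Generic sketch of scenario 2: an ESB-like registry holds named services
# (some wrapping data integration functions, one wrapping an external text
# analysis tool), and a composite flow orchestrates them in order.

SERVICE_REGISTRY = {}

def register(name):
    """Decorator that registers a callable on the bus under a service name."""
    def wrap(func):
        SERVICE_REGISTRY[name] = func
        return func
    return wrap

@register("extract.customers")
def extract_customers():
    return [{"given_name": " ada ", "comment": "love the new portal"}]

@register("quality.trim_names")
def trim_names(rows):
    return [{**r, "given_name": r["given_name"].strip().title()} for r in rows]

@register("enrich.sentiment")
def sentiment(rows):
    # Stand-in for an orchestrated call out to a sentiment/text analysis tool.
    return [{**r, "sentiment": "positive" if "love" in r["comment"] else "neutral"}
            for r in rows]

def orchestrate(flow):
    """Run a declared composite flow: each step is looked up on the bus."""
    data = SERVICE_REGISTRY[flow[0]]()
    for step in flow[1:]:
        data = SERVICE_REGISTRY[step](data)
    return data

if __name__ == "__main__":
    print(orchestrate(["extract.customers", "quality.trim_names", "enrich.sentiment"]))
```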

In addition, many data integration tool vendors support variations of this infrastructure to some degree. Database vendors support capabilities to deploy access to external information assets and even to externally managed parallel distributed processes.

The main shortcoming of an approach that uses existing technologies is the inability to integrate data management with business process management. A business process management tool could add the ability to link analytics data sources with analytics processing, and then provide the results to an operational application for use in on-demand analytics. The ability to link process management with analytics is the first step in a Pattern-Based Strategy.


Recommended Reading

Some documents may not be available as part of your current Gartner subscription.

"The State of Data Warehousing in 2011"

"Magic Quadrant for Data Warehouse Database Management Systems"

"Analytics and Learning Technology: CIOs, CTOs Should Rethink Art of the Possible"

"Magic Quadrant for Data Integration Tools"

"Data Architectures to Support Performance Management Applications"

"Magic Quadrant for Data Quality Tools"

"Hype Cycle for Data Management, 2011"

"Applying Gartner's Pace Layer Model to Human Capital Management"

"Cool Vendors in Data Management and Integration, 2011"

Strategic Planning Assumptions

• By 2014, 85% of organizations will fail to deploy new strategies to address data complexity and volume in their analytics.

• Organizations that fail to deploy strategies to address data complexity and volume issues for their analytics by 2012 will see the cost of ownership of their data warehouse and mart environments more than double as they make disorganized attempts to meet this new demand.

• By 2014, organizations that have deployed analytics supporting new, complex data types and large volumes of data will outperform their market peers by more than 20% in revenue, margins, penetration and retention.

Note 1

How "Able" Is Your Data Warehouse?

Many organizations recognize that best practices demand a data warehouse that provides subject-oriented, integrated, consistent and time-variant data for critical corporate information. The overall architecture of the warehouse can achieve these objectives by adhering to six basic architectural principles.

Data warehouses should be:

• Extensible. It should be easy to add more data sources or to change data sources during the life of the data warehouse.

• Flexible. The data warehouse should be modeled to a level of abstraction that supports modifications to the data model as more data subject areas are added.

• Repeatable. Data warehouses should provide consistent, predictable query response times; as a result, they may themselves introduce redundancy as needed.

• Reusable. Data in the warehouse should be fully qualified to allow multiple departments to use it in a variety of contexts. This relates to the abstraction rules in the data model, and to the data integration transformation rules that consolidate and collate data to support the introduction of commonly held data enrichment and cleansing rules.

• Scalable. The data warehouse must be able to support more rows of data, and the data architecture must account for storage of and access to data, as well as its archive and retirement.

• Available. The data warehouse must be able to operate in virtually nonstop mode, with provisions for reconfiguration, migration, backup, data insertion and performance optimization.

These "-ables," which were originally conceived as a group by other analysts, have existed for years. However, many organizations have attempted to achieve all six in a single data architecture tier, an approach that has proved untenable in the end-user market. It is best to think of these six expectations as clauses in a service contract — which the warehouse is expected to fulfill.

Note 2

Gartner's ICF Definitions

Gartner's information capabilities framework (ICF) is the collection of technical capabilities required to create business value from information assets. It is a conceptual model that is people, process and technology independent and allows IT leaders to think holistically about the capabilities required to describe, organize, integrate, share and govern information in an application-independent manner. It is independent of use case and information source and does not rely on, nor advocate, any technology or architectural style. However, it does take into account the specifics of use cases.

An "information capability" is a representation of the actions needed for the information to be used, treated, organized or developed for the general management of, and for specific purposes throughout, the organization.

An "information use case" represents the usage of information throughout the organization to create business value.

The ICF's common capabilities layer provides the range of functionalities used to describe, organize, integrate, share and govern the information, and the capabilities required to interact with physical data stores (operate), to prepare the information for consumption (provision) and to increase the value of the information by making it more easily used and found, and by providing context (enrich).

The ICF's information semantic styles layer provides the specific entry or "gate" into information management functions or capabilities. These services follow styles or approaches that support specific assumptions on how an application interacts with the data it uses.

The ICF's specialized capabilities layer deals with the range of functionalities used to support use-case-specific requirements.

© 2011 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.