Data Mashups Defined and the Differences from Traditional Data Integration Approaches

Byron Igoe, Product Manager, InetSoft Technology, for the Minnesota Chapter of The Data Management Association


Page 1

Data Mashups Defined and the Differences from Traditional Data Integration Approaches

Byron Igoe

Product Manager

InetSoft Technology

for the Minnesota Chapter of The Data Management Association

Page 2

Presentation Outline

I. Traditional Data Integration

a. ETL & EII

b. Spreadmarts

II. Meaning and Origins of Data Mashup

a. In-Memory Data Federation

b. Combining Formal and Informal Data Sources

c. Differences from Traditional Techniques

III. Data Management and Data Mashup

a. Data Warehousing

b. Meta Data

c. Data Governance

d. Enterprise Content Management

e. Data Modeling

Page 3

Traditional Data Integration: ETL

Extract, Transform and Load

a well-understood convention for preparing data for analysis

reasons for being:

reorganization

conversion

cleansing

mapping

pre-calculations of business metrics

transformations

aggregations

save processing resources during analyses

ensure data quality
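The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the source rows, the region mapping and the table name sales_by_region are hypothetical, and a real ETL tool does far more.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system.
RAW_ORDERS = [
    {"region": " ne ", "amount": "120.50"},
    {"region": "NE", "amount": "79.50"},
    {"region": "sw", "amount": "n/a"},   # dirty value, dropped during cleansing
    {"region": "SW", "amount": "200"},
]

# Hypothetical mapping from source codes to warehouse names.
REGION_NAMES = {"NE": "Northeast", "SW": "Southwest"}


def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: cleanse, convert, map and pre-aggregate."""
    totals = {}
    for row in rows:
        code = row["region"].strip().upper()          # cleansing
        try:
            amount = float(row["amount"])             # conversion
        except ValueError:
            continue                                  # skip rows that fail conversion
        region = REGION_NAMES.get(code, "Unknown")    # mapping
        totals[region] = totals.get(region, 0.0) + amount  # aggregation
    return sorted(totals.items())


def load(conn, aggregates):
    """Load: write the pre-aggregated rows into the warehouse table."""
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", aggregates)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
result = warehouse.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(result)
```

Because the totals are computed once at load time, later analyses read two summary rows instead of re-scanning the raw orders.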

Page 4

ETL (continued)

Data warehousing trends

growth in number of data sources

range of 3 to 30 “official” data sources currently

users desire to use data sources discovered via the Web

using reports or feeds from vendors & partners

growth in data¹

Annual global data production: 5 exabytes

5,000,000,000,000,000,000 – 18 zeroes

Equivalent of 37K US Libraries of Congress

Almost 1 GB per person on earth

Growing at 30% per year

1 zettabyte by 2010 – 21 zeroes

what are the data sizes and growth rates at your enterprise?

¹Source: UC Berkeley study, 2003
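For readers who want to check the units, here is a small arithmetic sketch. The 5 exabyte and 30% figures come from the slide; the world population value is an assumption added here (a rough 2003 estimate), not a number from the presentation.

```python
EXABYTE = 10**18    # decimal exabyte: 1 followed by 18 zeroes
GIGABYTE = 10**9

annual_bytes = 5 * EXABYTE          # figure quoted on the slide
growth_rate = 0.30                  # figure quoted on the slide
world_population = 6_300_000_000    # ASSUMPTION: rough 2003 population, not from the slide

# Data produced per person in one year, in gigabytes.
gb_per_person = annual_bytes / world_population / GIGABYTE

# Production one year later if the quoted growth rate holds.
next_year_exabytes = annual_bytes * (1 + growth_rate) / EXABYTE

print(f"{gb_per_person:.2f} GB per person")
print(f"{next_year_exabytes:.1f} exabytes the following year")
```

The same two lines of arithmetic can be applied to an enterprise's own data sizes and growth rates.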

Page 5

ETL (continued)

Limitations and challenges of traditional ETL & data warehousing

cumbersome to add data sources

bottleneck for ever-increasing user demands

overkill for some data sources, especially transient ones

rigidity of business metric definitions

inflexibility to process changes

lag in data availability

Page 6

Traditional Data Integration: EII

Enterprise Information Integration

same principle as ETL, creating a single data source from many

arose from the data warehouse’s limited data timeliness

difference from data warehousing: a virtual data warehouse

benefits:

data is "real-time"

more adaptable to changes in definitions/processes

limitations:

bottlenecks and slow turnaround time to incorporate changes to definitions and processes

still relies on IT efforts to respond to demands
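The "virtual data warehouse" idea can be illustrated with a minimal sketch: two separate sources are queried at request time and joined on the fly, with nothing copied into a warehouse. The two in-memory SQLite databases, their table names and the customer_totals function are hypothetical stand-ins, not any particular EII product's API.

```python
import sqlite3

# Two hypothetical source systems, modelled as separate in-memory databases.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)", [(1, 100.0), (2, 250.0)])


def customer_totals():
    """Virtual view: query both sources at request time and join the results.

    Nothing is stored in between, so each call reflects the current source data.
    """
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, amount in billing.execute(
        "SELECT customer_id, amount FROM invoices"
    ):
        name = names.get(customer_id, "unknown")
        totals[name] = totals.get(name, 0.0) + amount
    return totals


before = customer_totals()
billing.execute("INSERT INTO invoices VALUES (1, 50.0)")  # new activity at the source
after = customer_totals()
print(before)
print(after)
```

The second call picks up the new invoice without any reload step, which is the "real-time" benefit. The join logic still lives in code that someone has to maintain, which is where the reliance on IT comes from.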

Page 7

Spreadmarts

The “bane” of the business intelligence specialist!

the use of spreadsheets to store copies of enterprise data

arose from users’ frustrations with

lack of any business intelligence front-end application, or

too-hard-to-use versions of early (and some current) applications

graphical charting limitations of a BI app

tedious change request form processes

slow turnaround times to change requests

not having a way to bring in external data

Page 8

Spreadmarts (continued)

The current position in business intelligence

now BI vendors and enterprises are learning to accept the spreadsheet as a very user-friendly tool

but still aim to rein in the use of spreadmarts per se because they:

are error-prone

institutionalize labor inefficiency

can become corrupted

have data size limitations

are not ideal for sharing

keep knowledge “locked up”

don’t have governance controls

violate Sarbanes-Oxley requirements

in search of the “right” solution

Page 9

Meaning and Origins of Data Mashup

A mashup is “the creation of a new work from two sources that were not initially designed to be combined”

first used in music in the early ’00s, especially rap music

next used in Web 2.0 environment, especially Web portals, like My Yahoo

next entered enterprise application space, limited to “screen scraping”

now we define “data mashup” as “data transformation and integration that can be done by users with minimal skills”

examples:

joining two datasets that weren’t previously combined

creating a new business metric on the fly

importing external or user-created data
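The three examples above can be sketched together in plain Python. The sales figures, the pasted CSV text and the metric name revenue_per_ad_dollar are all hypothetical; this shows the idea only, not how any particular mashup tool implements it.

```python
import csv
import io

# Hypothetical "formal" data, as it might come from an enterprise source.
sales = [
    {"product": "A", "revenue": 1000.0},
    {"product": "B", "revenue": 400.0},
]

# Hypothetical user-created data, e.g. pasted from a spreadsheet.
USER_CSV = "product,ad_spend\nA,250\nB,200\n"
ad_spend = {
    row["product"]: float(row["ad_spend"])
    for row in csv.DictReader(io.StringIO(USER_CSV))
}

# Join the two datasets on product and derive a new metric on the fly.
mashup = []
for row in sales:
    spend = ad_spend.get(row["product"])
    if spend is None:
        continue  # inner join: keep only products present in both datasets
    mashup.append(
        {
            "product": row["product"],
            "revenue": row["revenue"],
            "ad_spend": spend,
            "revenue_per_ad_dollar": row["revenue"] / spend,  # new metric
        }
    )

for row in mashup:
    print(row)
```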

Page 10

The Differences from Traditional Techniques

it’s the middle ground between “IT controlled” and “user defined”

“collaboration” is born

in the traditional models, IT defines how multiple sources are connected

painstaking process, especially for mergers, process changes, etc.

with data mashup, the connections are created on the fly

Page 11

The Business Case Benefits of Mashups

Higher ROI on BI investment

higher success rate of deployment due to higher:

end-user satisfaction

usage rates

adoption rates

greater number of actionable learnings leading to:

more sales and/or

greater efficiency

increased speed of:

decisions

competitive responses

reactions to customer feedback

Page 12

The Business Case Benefits of Mashups (continued)

Lower TCO

reduced personnel needed to support a BI solution

end-user self-service

save on change request processes

save on manpower to code requests

reduce report request backlog

reduced number of highly-skilled analysts or DBAs needed to satisfy business demands

end-users meet their own needs more often

Page 13

The Advent of In-Memory Data Federation

Moore’s law: increasing power and falling costs of CPU & memory allow in-memory transformation, pre-aggregation and caching

Enables data mashup as well
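A minimal sketch of in-memory caching and pre-aggregation follows. The CachedAggregate class and the fetch_rows source are hypothetical names made up for this illustration; the point is only that the source is read once and later requests are served from memory.

```python
class CachedAggregate:
    """Hold a pre-aggregated result in memory and reuse it across requests."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows   # callable that reads (key, value) rows from the source
        self._totals = None             # in-memory cache, filled on first use

    def total_for(self, key):
        if self._totals is None:
            # First request: read the source once and aggregate in memory.
            totals = {}
            for k, value in self._fetch_rows():
                totals[k] = totals.get(k, 0) + value
            self._totals = totals
        return self._totals.get(key, 0)

    def invalidate(self):
        """Drop the cached totals so the next request re-reads the source."""
        self._totals = None


# Counter showing how often the (hypothetical) source is actually read.
source_reads = {"count": 0}


def fetch_rows():
    source_reads["count"] += 1
    return [("east", 10), ("west", 5), ("east", 7)]


cache = CachedAggregate(fetch_rows)
east = cache.total_for("east")
west = cache.total_for("west")
print(east, west, source_reads["count"])
```

Both lookups are answered from one read of the source. Calling invalidate() trades that speed for freshness.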

Page 14

The Trade-offs of these Techniques

Technique         Development Time   Development Skill   Latency   Performance   Adaptability
ETL               high               high                high      high          low
Data Federation   high               high                low       medium        low
Spreadsheet       low                low                 high      low           high
Data Mashup       low                low                 low       medium        high

Page 15

Combining Formal and Informal Data Sources

how a data mashup works

similar to what a user is doing in Excel

creating new formulas

bringing in external data

doing what-if scenarios

live connections to the enterprise sources are maintained

data mashup "refreshes" automatically on each use

can save it to a shared folder for re-use and collaboration
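The behaviour described above can be sketched as a saved definition that re-reads its sources every time it is used. The Mashup class, the orders and targets data and the attainment formula are all hypothetical names invented for this illustration.

```python
class Mashup:
    """A saved definition: live sources plus a user formula, re-run on each use."""

    def __init__(self, sources, formula):
        self.sources = sources    # name -> callable returning the current data
        self.formula = formula    # combines the fetched data into a result

    def run(self, **what_if):
        # Fetch from every source on each call, so results are never stale copies.
        data = {name: fetch() for name, fetch in self.sources.items()}
        return self.formula(data, **what_if)


# Hypothetical enterprise source whose contents change over time.
orders = [100.0, 200.0]
# Hypothetical user-supplied data, such as a target typed in by an analyst.
targets = {"quarter": 500.0}


def attainment(data, uplift=0.0):
    """User formula: share of target reached, with an optional what-if uplift."""
    total = sum(data["orders"]) * (1 + uplift)
    return total / data["targets"]["quarter"]


mashup = Mashup(
    sources={"orders": lambda: list(orders), "targets": lambda: dict(targets)},
    formula=attainment,
)

first = mashup.run()                 # 300 / 500
orders.append(100.0)                 # the enterprise source changes
refreshed = mashup.run()             # same mashup, refreshed data: 400 / 500
scenario = mashup.run(uplift=0.25)   # what-if scenario: 400 * 1.25 / 500
print(first, refreshed, scenario)
```

Because the object stores the definition rather than a copy of the data, sharing it with a colleague shares the logic while each run still reads live values.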

Page 16

Data Management and Data Mashup

Relative to Data Warehousing

data mashups can be seen as an expedient alternative to data warehousing in some cases

data mashup can be a precursor to data warehousing

allows quick and inexpensive experimentation

when satisfied, codify the mashup into a data warehouse for performance benefits

Page 17

Data Management and Data Mashup

Relative to Impact on Pre-Aggregation

pre-aggregation improves downstream processing

with many traditional techniques:

pre-aggregations are designed before reports and dashboards

usage of pre-aggregated data is explicit

in the data mashup model, pre-aggregation can be built into the engine
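One way to picture pre-aggregation "built into the engine" is an engine that decides by itself whether a summary it already holds can answer a request. The sketch below is a toy under that assumption; the Engine class, its fields and the sample rows are hypothetical and do not describe any specific product.

```python
from collections import defaultdict


class Engine:
    """Answer total-by-column requests, using a pre-aggregate when it can."""

    def __init__(self, detail_rows):
        self.detail_rows = detail_rows
        # Built once by the engine; callers never reference it directly.
        self._by_region_product = defaultdict(float)
        for row in detail_rows:
            self._by_region_product[(row["region"], row["product"])] += row["amount"]
        self.answered_from = None   # records which path served the last request

    def total_by(self, column):
        totals = defaultdict(float)
        if column in ("region", "product"):
            # Roll up the smaller pre-aggregate instead of scanning detail rows.
            index = 0 if column == "region" else 1
            for key, amount in self._by_region_product.items():
                totals[key[index]] += amount
            self.answered_from = "pre-aggregate"
        else:
            # No suitable pre-aggregate: fall back to the detail rows.
            for row in self.detail_rows:
                totals[row[column]] += row["amount"]
            self.answered_from = "detail"
        return dict(totals)


rows = [
    {"region": "East", "product": "A", "rep": "Kim", "amount": 10.0},
    {"region": "East", "product": "B", "rep": "Lee", "amount": 5.0},
    {"region": "West", "product": "A", "rep": "Kim", "amount": 7.0},
]
engine = Engine(rows)
by_region = engine.total_by("region")
region_path = engine.answered_from
by_rep = engine.total_by("rep")
rep_path = engine.answered_from
print(by_region, region_path)
print(by_rep, rep_path)
```

The caller asks the same kind of question both times. Only the engine knows that one answer came from the summary and the other from the detail rows, in contrast to the traditional model where the report explicitly names the pre-aggregated table.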

Page 18

Data Management and Data Mashup

Importance of Meta Data

creation of mashups depends on meta data: data type compatibility

transformation options, like grouping and aggregation, differ based on the field type
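A small sketch shows how field metadata might drive both checks. The field names, type labels and the lists of allowed groupings and aggregations are assumptions chosen for illustration, not rules taken from the presentation.

```python
# Hypothetical field metadata: field name -> declared type.
ORDERS_META = {"customer_id": "integer", "order_date": "date", "amount": "number"}
CUSTOMERS_META = {"id": "integer", "name": "string", "signup_date": "date"}

# Assumed rules for which transformations each field type supports.
AGGREGATIONS = {
    "number": ["sum", "average", "min", "max", "count"],
    "integer": ["sum", "average", "min", "max", "count"],
    "date": ["min", "max", "count"],
    "string": ["count", "distinct count"],
}
GROUPINGS = {
    "date": ["year", "quarter", "month", "day"],
    "string": ["value"],
    "integer": ["value"],
    "number": ["range"],
}


def can_join(left_meta, left_field, right_meta, right_field):
    """Two fields are join-compatible here only if their declared types match."""
    return left_meta[left_field] == right_meta[right_field]


def options_for(meta, field):
    """Return the grouping and aggregation choices to offer for a field."""
    field_type = meta[field]
    return {"group_by": GROUPINGS[field_type], "aggregate": AGGREGATIONS[field_type]}


print(can_join(ORDERS_META, "customer_id", CUSTOMERS_META, "id"))
print(can_join(ORDERS_META, "amount", CUSTOMERS_META, "name"))
print(options_for(ORDERS_META, "order_date"))
```

With metadata like this, a tool can offer "group by month" for a date field and "sum" for an amount without the user knowing the underlying types.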

Page 19

Data Management and Data Mashup

Relative to Data Governance

data mashups are a major improvement over spreadmarts

data quality is enhanced

live data is used

no copying & pasting

changes to master data mappings take effect immediately

data security is enhanced

security defined at source system level

all derived mashups automatically secured

overcome limitations of Excel’s security

concern: is it giving too much power to users?

no different from what users will inevitably do in Excel
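The claim that derived mashups are secured automatically can be illustrated with a sketch in which the permission check lives only in the source. SecuredSource, revenue_mashup and the users shown are hypothetical; a real deployment would rely on the source system's own security model.

```python
class SecuredSource:
    """A source that filters rows by the requesting user's permissions."""

    def __init__(self, rows, allowed_regions):
        self._rows = rows
        self._allowed_regions = allowed_regions   # user -> set of permitted regions

    def read(self, user):
        allowed = self._allowed_regions.get(user, set())
        return [row for row in self._rows if row["region"] in allowed]


def revenue_mashup(source, user):
    """A derived mashup: it only ever sees rows the source releases to this user."""
    return sum(row["revenue"] for row in source.read(user))


source = SecuredSource(
    rows=[
        {"region": "East", "revenue": 100.0},
        {"region": "West", "revenue": 40.0},
    ],
    allowed_regions={"alice": {"East", "West"}, "bob": {"East"}},
)

alice_total = revenue_mashup(source, "alice")
bob_total = revenue_mashup(source, "bob")
guest_total = revenue_mashup(source, "guest")   # unknown user: no rows released
print(alice_total, bob_total, guest_total)
```

The mashup contains no security logic of its own, so any new mashup built on the same source inherits the same restrictions.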

Page 20

Data Management and Data Mashup

Relative to Enterprise Content Management

data mashups are re-usable & shareable

data integrity is always maintained

more easily embedded in other applications, portals

Page 21

Data Management and Data Mashup

Relative to Data Modeling

data mashups are situated on top of various data sources

data mashups can use:

physical tables

pre-defined SQL, or

logical models
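The three kinds of input listed above can be sketched against one small SQLite database. The table ord_hdr, its columns and the LOGICAL_MODEL mapping are hypothetical; the logical model here is assumed to be nothing more than business-friendly names mapped to physical columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ord_hdr (cust_nm TEXT, ord_amt REAL)")
db.executemany(
    "INSERT INTO ord_hdr VALUES (?, ?)",
    [("Acme", 100.0), ("Acme", 40.0), ("Globex", 75.0)],
)

KNOWN_TABLES = {"ord_hdr"}


def from_table(table):
    """1. Physical table: read the rows exactly as stored."""
    if table not in KNOWN_TABLES:   # table names cannot be bound as SQL parameters
        raise ValueError(f"unknown table: {table}")
    return db.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()


# 2. Pre-defined SQL: a query written ahead of time, reused as a source.
TOTALS_SQL = (
    "SELECT cust_nm, SUM(ord_amt) FROM ord_hdr GROUP BY cust_nm ORDER BY cust_nm"
)


def from_sql(sql):
    return db.execute(sql).fetchall()


# 3. Logical model: business names mapped onto physical columns.
LOGICAL_MODEL = {
    "table": "ord_hdr",
    "fields": {"Customer": "cust_nm", "Order Amount": "ord_amt"},
}


def from_model(model, business_fields):
    columns = ", ".join(
        f'{model["fields"][name]} AS "{name}"' for name in business_fields
    )
    cursor = db.execute(f'SELECT {columns} FROM {model["table"]} ORDER BY rowid')
    headers = [column[0] for column in cursor.description]
    return [dict(zip(headers, row)) for row in cursor]


table_rows = from_table("ord_hdr")
sql_rows = from_sql(TOTALS_SQL)
model_rows = from_model(LOGICAL_MODEL, ["Customer", "Order Amount"])
print(table_rows)
print(sql_rows)
print(model_rows)
```

The logical model lets a user ask for "Customer" and "Order Amount" without knowing the physical column names.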

Page 22

Questions and Discussion