Getting Data to Applications: Why Do We Fail, and How We Can Do Better?


Arnon Rosenthal, Frank Manola, Scott Renner

Toward an Industrial Revolution for Data Interoperability

Incremental, (full) Interfaces, Incentives


Goal: A Common Operational Picture (COP)

[Figure: a source tier and a view tier; labels include logistics, mapmaker, intelligence, operations, sensor, naval, NIMA info products, ground, air]

User sees data values, assembled and expressed in the user’s own terms

The “Common Operational Picture” warehouse or federation: an integrated subset of information sources, with presentations for different users


Current Status

Read-only access is insufficiently ambitious for a guiding vision, but it is driving many industrial solutions

Proposed architectures (e.g., messaging) often don’t fit

- Metadata

- Operations: update / annotate / subscribe

- Fusion

Numerous initiatives are likely to fail, e.g., common operational pictures

- Today’s technology: Costly, little reuse, skill-intensive


Toward Attainable Goals (and more realistic slogans)

“Give everyone transparent (read) access to all data”. (Any success stories?)

The vision of perfection crowds out the ability to live with imperfection!

Restate the challenge: Prepare data/software systems to work with partners -- including unknown future ones?

Connection-creation as a core competence for IT

- Describe each service that is offered or wanted (e.g., some operation on some data)

- Reduce cost of establishing the software connection

- Reuse knowledge captured when a connection is built


What Do We Mean “Industrial Revolution”?

Small tasks, each with one skill

Many atomic steps become automatable

Each produces reusable knowledge (as opposed to motivating a few lines within a program)

“Market-driven” (as connections are made) rather than giant initiatives


Future of Large Info Management Architectures

Consensus among researchers for scalable sharing

- Each data resource describes what it offers

- Each consumer describes what it wants

- Discovery and brokering processes create a connection

(prototypes automate some cases)

Is it really so different from today? Each functional task is already performed by today’s developers

- Key difference: “describe and generate”
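As a minimal sketch of “describe and generate”: a provider’s offer and a consumer’s want can share one descriptive form, so that a discovery step can pair them mechanically. The class name, fields, party names, and the simple matching rule below are illustrative assumptions, not something the talk specifies.

```python
from dataclasses import dataclass

# Hypothetical description form, shared by offers and wants.
@dataclass(frozen=True)
class Description:
    party: str             # who offers or wants the data
    topic: str             # what the data is about, e.g. "fuel_depots"
    attributes: frozenset  # attribute names offered or wanted

def discover(offers, wants):
    """Pair each want with every offer on the same topic that covers its attributes."""
    pairs = []
    for want in wants:
        for offer in offers:
            if offer.topic == want.topic and want.attributes <= offer.attributes:
                pairs.append((offer.party, want.party))
    return pairs

offers = [
    Description("logistics_db", "fuel_depots", frozenset({"name", "location", "capacity"})),
    Description("map_service", "roads", frozenset({"name", "geometry"})),
]
wants = [Description("ops_planner", "fuel_depots", frozenset({"name", "location"}))]

print(discover(offers, wants))
```

A real broker would also have to reconcile vocabularies and representations; this sketch assumes both sides already use the same terms.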


A word from our sponsor: We’re Hiring

Researchers / Consultants, Prototypers, Systems Engineers (or make us an offer)

Main offices: suburbs of Boston and Washington DC

- Also jobs in Norfolk, Montgomery, St. Louis, San Diego, … + Europe, Asia

We’re a nonprofit working mostly for the US government (A good place to learn. So you’ll get more stock options later)

US Citizens and Permanent residents only (so MITRE can get you a security clearance)


Talk Outline

Why do current approaches so often fail?

- We act as if we believe ridiculous things -- in architectures and in design discussions

Where should we try to go? Incremental Interoperability

- Aim to revolutionize -- incrementally

How to Start Moving in this Direction?

- Scope of talk: Create logical connectivity -- development and logical admin

- Omits: Systems planning, execution performance (cache selection, indexing, dissemination)


Tacit Assumptions -- and Antidotes -- 2

“End State” fallacies:

- Architectures are for a perfect end state (?)

- Systems conform and consumers benefit only when transition is complete (?)

- You’ll add flexibility later (?)

- Config. mgt. is a sufficient strategy for change (?)

Advice nuggets: Architect for manageable, adaptable, imperfect systems (for 2001, 2002, … 2999)

- Transitional states are within the architecture

- Architect for adaptability. How to contract for it?

- Config. management is only a brake



Mandates will elicit good quality metadata (?)

- Local administrators will rush to keep you up to date (?)

Advice nuggets: Active (operational) metadata is kept accurate

- Passive metadata is untested, and soon too obsolete to drive automated processing (except browsing)

More carrots, fewer sticks

- If your tools use the metadata to ease the providers’ tasks, you’ll get better metadata

Calls for metadata should include an exploitation plan



“Midpoint” fallacy: Design a compromise interface (msg?). Build around and above it. (?)

“Message interface” fallacy: “Send message Mxyz” is a fine interface between systems (?)

- Support interfaces procedurally (e.g., Java + parser) (?)

Describe the “natural” interface.

- One interface supports all subsets.

- Connectors are separate & declarative (e.g. SQL + fns?)

On the consumer’s interface, generate

- operations (e.g., query, update, subscribe)

- metadata, e.g., units, error, access controls
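One way to picture a “separate and declarative” connector is as data that a generic engine interprets, so that both the operation and the consumer-side metadata are generated from the same description. The field names, the `CONNECTOR` layout, and the tiny transform table below are hypothetical; the talk suggests something like SQL plus functions, for which this is only a Python stand-in.

```python
# Hypothetical declarative connector: data, not procedural code.
# Each consumer field names a source field, an optional transform, and its units.
CONNECTOR = {
    "depot_name": {"source": "NAME", "transform": None, "units": None},
    "range_km": {"source": "RANGE_MI", "transform": "miles_to_km", "units": "km"},
}

# Assumed small transform library, looked up by name.
TRANSFORMS = {"miles_to_km": lambda miles: miles * 1.609344}

def apply_connector(connector, source_row):
    """Generic engine: build a consumer record from a source record."""
    result = {}
    for field, rule in connector.items():
        value = source_row[rule["source"]]
        if rule["transform"] is not None:
            value = TRANSFORMS[rule["transform"]](value)
        result[field] = value
    return result

def consumer_metadata(connector):
    """Derive consumer-side metadata (here, only units) from the same description."""
    return {field: rule["units"] for field, rule in connector.items()}

row = apply_connector(CONNECTOR, {"NAME": "Depot A", "RANGE_MI": 100.0})
print(row)
print(consumer_metadata(CONNECTOR))
```

Because the connector is data, supporting a different subset of the source means editing the description rather than rewriting a parser.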


Tacit Assumption 6: Interoperability Metaphor: Universal Plug

Two Prongs: Too Simple

Important element of truth: Design to plug into the “infosphere”, not into one neighbor


A Better Interoperability Metaphor: A Multi-Pin Connector

[Figure: a 25-pin connector; pin labels include transactions, CORBA/DCOM, SQL, XML, data]

Track resolution of each pin’s issues

All the pins have to fit -- and many are compound

Data: each attribute has semantics, format, quality


Organization of the Section


Where Should We Want to Go?

- Approach

- Taxonomy of needed capabilities

How to Start Moving in this Direction?

Research Agenda: Risk Mitigation


Transition is the steady state, with good ways to cope

Descriptions of sources, consumers exist -- sometimes

- When building the next connection, capture more

You’re still funded to build connections

No giant process cutover

- Discovery and brokering tools work with whatever descriptions they find

Integration contractors already do discovery and brokering!

- Manually, with too little reuse!

For everything, there are multiple ways to do it

- Choose one, but work with those who chose differently

- Connections and transforms are partially known


Steps to Connect a Consumer to Provider(s) (with metadata reuse)

Obtain descriptions of each player

- Use same form for consumers’ needs as for providers

- May employ intermediary vocabularies

Discover potential (source, consumer) pairs

Obtain transforms for

- Element representations (e.g., miles ↔ km; jpeg ↔ gif)

- Object and set representations (e.g., ODBC ↔ XML)

- Protocols (e.g., DCOM ↔ CORBA)

- Pull versus push, whole versus changes

Generate the entire connection (tuned for efficiency)

What vendor can supply the framework?
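The “obtain transforms” and “generate the connection” steps can be sketched as a search over a transform library followed by composition. The library contents and function names here are assumptions made for illustration; a real library would also hold format and protocol transforms.

```python
from collections import deque

# Hypothetical transform library: (from_representation, to_representation) -> function.
LIBRARY = {
    ("miles", "km"): lambda x: x * 1.609344,
    ("km", "m"): lambda x: x * 1000.0,
    ("m", "km"): lambda x: x / 1000.0,
}

def find_chain(library, start, goal):
    """Breadth-first search for the shortest chain of transforms from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        rep, chain = queue.popleft()
        if rep == goal:
            return chain
        for (src, dst), fn in library.items():
            if src == rep and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [fn]))
    return None  # no known path: someone must supply a new transform

def generate_connection(library, start, goal):
    """Compose the chain into a single callable, or return None if no chain exists."""
    chain = find_chain(library, start, goal)
    if chain is None:
        return None

    def connection(value):
        for fn in chain:
            value = fn(value)
        return value

    return connection

miles_to_m = generate_connection(LIBRARY, "miles", "m")
print(miles_to_m(1.0))
```

When no chain exists (for example from "m" to "miles" in this library), the gap itself is useful knowledge: it names the one transform that, once written, becomes reusable by later connections.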


Metadata Drives Connection Creation (when there is enough metadata)

[Figure labels: repository / knowledge base; transform library (+); discovery process; brokering process; new “Wants” from consumer; execute]


Connection Creation Drives Metadata


[Figure labels: metadata capture tools (+); transform library (+); discovery process; brokering process; execute]


Connection Creation Drives Vocabularies (?)


[Figure labels: metadata capture tools (+); transform library (+); discovery process; brokering process; optimizer; vocabulary and interface creation tools; execute]


Toward an “industrial revolution” for IT: Re-imagine Existing Processes as Simpler Steps

Each step should

- Require just one or two skills

- Benefit from existing resources -- metadata and transforms

- Be fully automated (sometimes)

- Produce reusable resources for later steps

Key challenges:

- Incentives: It must be made easier to generate from resource atoms than to code it all yourself!

- To support these incentives, we may need tools that assemble the atomic components into a solution


Data Descriptions: A Taxonomy (foil 1 of 2)

Data admin for requirements parallels admin for offers!

- Use same constructs

- Enables (partly) automated comparisons

Interpretation: element semantics, element representation, schema

Scope and completeness of what you provide (population), e.g., images of all US air-fuel depots since 1970, plus some NATO fuel depots since 1990

Delivery style (push/pull, whole / changes)

(Is offer/need model adequate for update transactions?)


Data Descriptions’ Taxonomy (foil 2 of 2)

Quality of service

- Data quality, timeliness, attribution, completeness, obligation (to continue providing), cost, …

Guidance for data merging (match-up, conflict resolution)

Server information (catch-all), e.g.,

- Access language, protocols, address, security domains, …
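The taxonomy across these two foils can be sketched as a single record type used for both offers and needs, which is what makes a partly automated comparison possible. The field names and the comparison rules are assumptions; in particular, the population is simplified to “region covered since year”, which glosses over the “all” versus “some” distinction in the fuel-depot example.

```python
from dataclasses import dataclass, field

@dataclass
class DataDescription:
    # Interpretation
    semantics: str                 # e.g. "fuel_depot_image"
    representation: str            # e.g. "jpeg"
    # Scope (population): region -> earliest year covered
    population: dict = field(default_factory=dict)
    # Delivery style
    delivery: str = "pull"         # "pull" or "push"
    granularity: str = "whole"     # "whole" or "changes"
    # Quality of service and server information (catch-alls, not compared here)
    quality: dict = field(default_factory=dict)
    server: dict = field(default_factory=dict)

def compare(offer, need):
    """Return a list of mismatches between an offer and a need (empty = compatible)."""
    issues = []
    if offer.semantics != need.semantics:
        issues.append("semantics differ")
    if offer.representation != need.representation:
        issues.append(f"needs transform {offer.representation} -> {need.representation}")
    for region, since in need.population.items():
        if region not in offer.population:
            issues.append(f"no coverage of {region}")
        elif offer.population[region] > since:
            issues.append(f"{region} covered only since {offer.population[region]}")
    if offer.delivery != need.delivery:
        issues.append(f"delivery is {offer.delivery}, need {need.delivery}")
    return issues

offer = DataDescription(
    semantics="fuel_depot_image", representation="jpeg",
    population={"US": 1970, "NATO": 1990},
)
need = DataDescription(
    semantics="fuel_depot_image", representation="gif",
    population={"NATO": 1980},
)
for issue in compare(offer, need):
    print(issue)
```

Returning the list of mismatches, rather than a yes/no answer, lets a partial match still tell the consumer what is missing.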


Talk Outline


Discussion of a “low risk” approach

- What the goal system looks like

- How it evolves

- Tool and technology details

How to Start Moving in this Direction? How to:

- Simplify the task of interfacing to a particular system

- Establish more connections

- Make created interfaces “first class”

Research Agenda: Risk Mitigation


Getting Started along the New Road

Provide help in creating needed interfaces

- Focus on individual programs, small initiatives

- Give incremental benefits, to keep all aboard

- What’s the minimum to give some benefits?

Separate existing work into atomic tasks that require fewer skills, and are sometimes automatable

- No giant cutovers, with massive retraining, coordination

Issues

- What does each program need to do?

- What requires coalitions, or central funding? (e.g., repository, brokers)


Tasks (examples)

Define vocabularies for

- Metadata (how to say “means the same”, or “distanceUnits = km”, or “CORBA 3.0 interface”)

- Aspects to be brokered (of scope, representation, …)

- Frequently-exchanged domain data (Part#, Facility#)

Describe portions of systems in terms of these vocabs

- Be opportunistic, e.g., when building new connections

Provide transforms among major representations, protocols

Provide brokers for various aspects (simple brokers first)

“Partial brokering” must help metadata providers


Who Will Be Most Interested? (Suggested Initial Targets)

Find a system which needs multiple interfaces. (to customers and/or feeders)

Good candidates

- Non-dominant players who must connect to multiple others

- Dominant player with bad ease-of-connecting (MIDB?)

Issue: How soon till it’s helpful?

- Generate, based on own entries in metadata repository

- Transformers are quickly helpful (esp. harder ones, e.g., coordinates, image formats)

Perhaps attach to DBMS, or to XML engine?


Example Initiatives (and their benefits)

Publish interface in one formalism (with description), e.g., SQL

- Tools generate the additional interfaces, without disturbing the original publisher, e.g., XML, CORBA, DCOM, HTML, …

Publish interface in one vocabulary, for all exported info, e.g., Supply

- Tools generate “closest feasible” interface in other vocabularies that have been related to it, e.g., Repair, Procurement, Defense finance, …

- Transform representations (image format, coord system)

Provide interfaces as (root concept, well known modifier)

Derive metadata, additional operations (e.g., update)
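Generating a “closest feasible” interface in a related vocabulary might look like the sketch below: terms with a recorded relationship are translated, and the rest are reported rather than silently dropped. The term names and the Supply-to-Repair mapping are invented for illustration.

```python
# Hypothetical term relationships between a "Supply" and a "Repair" vocabulary.
SUPPLY_TO_REPAIR = {
    "part_number": "component_id",
    "quantity_on_hand": "spares_available",
}

# Interface published once, in the Supply vocabulary.
SUPPLY_INTERFACE = ["part_number", "quantity_on_hand", "reorder_point"]

def closest_feasible(interface, term_map):
    """Translate the terms that have a known relationship; report the rest."""
    generated = [term_map[t] for t in interface if t in term_map]
    unmapped = [t for t in interface if t not in term_map]
    return generated, unmapped

repair_interface, not_exported = closest_feasible(SUPPLY_INTERFACE, SUPPLY_TO_REPAIR)
print(repair_interface)
print(not_exported)
```

The original publisher is not disturbed: only the mapping grows as more terms are related, and each new relationship improves every generated interface that depends on it.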


Summary: Try an approach that hasn’t failed consistently!

Identified pitfalls that are too rarely avoided

Described incremental steps toward large scale data admin for diverse, changing, incomplete systems

Generate connections from reusable resources (system metadata, vocabulary metadata, transforms): active metadata

- Separation of skills, use point and click

- Incentives: Make “provide resource + generate” easier than writing connecting code

Connection-creation creates more reusable resources

- Projects cooperate to create vocabularies, acquire tools

It’s a low risk approach -- begin prototyping


Challenges for Database Researchers

Better brokering for matching requirements to sets of views

- Assume multiple ontologies, spotty connection, incremental improvement

- Explain the shortfalls, understandably

Scalable fusion (to match objects, resolve data conflicts) without n x n pairwise administration
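One possible reading of “without n x n pairwise administration” is that each source is described once against a shared key, and conflicts are resolved by a single per-attribute policy instead of per-pair rules. The sketch below assumes that reading; the record layout, the key normalization, and the “latest report wins” policy are all hypothetical choices.

```python
def fuse(records, key_of, resolve):
    """Group records by shared key, then resolve conflicts attribute by attribute."""
    groups = {}
    for rec in records:
        groups.setdefault(key_of(rec), []).append(rec)
    fused = {}
    for key, recs in groups.items():
        attrs = {a for r in recs for a in r["data"]}
        fused[key] = {
            a: resolve([r for r in recs if a in r["data"]], a) for a in sorted(attrs)
        }
    return fused

def latest_wins(recs, attr):
    """Assumed conflict policy: take the value from the most recently reported record."""
    return max(recs, key=lambda r: r["reported"])["data"][attr]

records = [
    {"source": "naval", "id": "FAC-7", "reported": 2000, "data": {"fuel_tons": 120}},
    {"source": "air", "id": "fac-7", "reported": 2001,
     "data": {"fuel_tons": 90, "runway": True}},
]

# Match-up rule: identifiers are compared case-insensitively (an assumption).
fused = fuse(records, key_of=lambda r: r["id"].upper(), resolve=latest_wins)
print(fused)
```

Adding a new source then requires only its own key mapping, so administration grows with the number of sources rather than with the number of source pairs.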

Pragmatic

- Acquisition guidance, e.g., metrics on flexibility (what should be in each acquisition contract?)

- Combine techniques for learning metadata? No more discovery heuristics!

Automate physical DBA work (caching, optimization)
