Getting Data to Applications: Why Do We Fail, and How We Can Do Better?
Arnon Rosenthal,
Frank Manola, Scott Renner
Toward an Industrial Revolution for Data Interoperability
Incremental, (full) Interfaces, Incentives
Goal: A Common Operational Picture (COP)
[Figure: two-tier diagram with a source tier and a view tier. Labels: logistics, mapmaker, intelligence, operations; sensor, naval, NIMA info products, ground, air]
User sees data values, assembled and expressed in the user’s own terms
The “Common Operational Picture” warehouse or federation: an integrated subset of information sources, with presentations for different users
Current Status
Read-only access is insufficiently ambitious for a guiding vision, but it is driving many industrial solutions
Proposed architectures (e.g., messaging) often don’t fit
- Metadata
- Operations: update / annotate / subscribe
- Fusion
Numerous initiatives are likely to fail, e.g., common operational pictures
- Today’s technology: Costly, little reuse, skill-intensive
Toward Attainable Goals (and more realistic slogans)
“Give everyone transparent (read) access to all data”. (Any success stories?)
The vision of perfection crowds out the ability to live with imperfection!
Restate the challenge: Prepare data/software systems to work with partners -- including unknown future ones?
Connection-creation as a core competence for IT
- Describe each service that is offered or wanted (e.g., some operation on some data)
- Reduce cost of establishing the software connection
- Reuse knowledge captured when a connection is built
What Do We Mean by “Industrial Revolution”?
Small tasks, each with one skill
Many atomic steps become automatable
Each produces reusable knowledge
(as opposed to motivating a few lines within a program)
“Market-driven” (as connections are made) rather than giant initiatives
Future of Large Info Management Architectures
Consensus among researchers for scalable sharing
- Each data resource describes what it offers
- Each consumer describes what it wants
- Discovery and brokering processes create a connection
(prototypes automate some cases)
Is it really so different from today? Each functional task is already performed by today’s developers
- Key difference: “describe and generate”
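The “describe and generate” idea above can be sketched in a few lines. This is a minimal illustration, not the authors’ tooling: the Description fields, the party names, and the exact-match rule are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Description:
    """One offered or wanted service: some operation on some data."""
    party: str
    data: str        # hypothetical vocabulary term, e.g. "fuel_depots"
    operation: str   # e.g. "query", "update", "subscribe"

def broker(offers, wants):
    """Pair each want with every offer for the same data and operation."""
    index = {}
    for offer in offers:
        index.setdefault((offer.data, offer.operation), []).append(offer)
    return [(want.party, offer.party)
            for want in wants
            for offer in index.get((want.data, want.operation), [])]

# Hypothetical descriptions; a real repository would hold many more.
offers = [Description("logistics_db", "fuel_depots", "query"),
          Description("sensor_feed", "tracks", "subscribe")]
wants = [Description("planner_app", "fuel_depots", "query"),
         Description("planner_app", "tracks", "update")]

connections = broker(offers, wants)
```

Here only the fuel-depot query is matched; the unmet “update tracks” want is the kind of shortfall a broker would need to report.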
A word from our sponsor: We’re Hiring
Researcher / Consultants, Prototypers, Systems Engineers (or make us an offer)
Main offices: suburbs of Boston and Washington DC
- Also jobs in Norfolk, Montgomery, St. Louis, San Diego, … + Europe, Asia
We’re a nonprofit working mostly for the US government (A good place to learn. So you’ll get more stock options later)
US Citizens and Permanent residents only (so MITRE can get you a security clearance)
Talk Outline
Why do current approaches so often fail?
- We act as if we believe ridiculous things -- in architectures and in design discussions
Where should we try to go? Incremental Interoperability
- Aim to revolutionize -- incrementally
How to Start Moving in this Direction?
- Scope of talk: Create logical connectivity -- development and logical admin
- Omits: Systems planning, execution performance (cache selection, indexing, dissemination)
Tacit Assumptions -- and Antidotes -- 2
“End State” fallacies:
- Architectures are for a perfect end state (?)
- Systems conform and consumers benefit only when transition is complete (?)
- You’ll add flexibility later (?)
- Config. mgt. is a sufficient strategy for change (?)
Advice Nuggets
Architect for manageable, adaptable, imperfect systems (for 2001, 2002, … 2999)
- Transitional states are within the architecture
Architect for adaptability. How to contract for it?
- Config. management is only a brake
Tacit Assumptions -- and Antidotes -- 3
Mandates will elicit good quality metadata (?)
- Local administrators will rush to keep you up to date (?)
Advice Nuggets
Active (operational) metadata is kept accurate
- Passive metadata is untested, and soon too obsolete to drive automated processing (except browsing)
More carrots, fewer sticks
- If your tools use the metadata to ease the providers’ tasks, you’ll get better metadata
Calls for metadata should include an exploitation plan
Tacit Assumptions -- and Antidotes -- 4
“Midpoint” Fallacy: Design a compromise interface (a message?), then build around and above it (?)
“Message interface” Fallacy: “Send message Mxyz” is a fine interface between systems (?)
- Support interfaces procedurally (e.g., Java + parser) (?)
Describe the “natural” interface.
- One interface supports all subsets.
- Connectors are separate & declarative (e.g. SQL + fns?)
On the consumer’s interface, generate
- operations (e.g., query, update, subscribe)
- metadata, e.g., units, error, access controls
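As a rough sketch of a separate, declarative connector, the mapping below is plain data, and both the consumer’s view and its metadata are generated from it. The field names, the CONNECTOR layout, and the helper functions are hypothetical, chosen only to illustrate the idea.

```python
# Declarative connector: data, not hand-written parsing code.
# Each entry: consumer field -> (provider field, conversion, consumer-side units).
CONNECTOR = {
    "range_km": ("range_miles", lambda v: v * 1.609344, "km"),
    "depot_id": ("facility_no", str, None),
}

def apply_connector(connector, record):
    """Build the consumer's view of one provider record."""
    return {target: convert(record[source])
            for target, (source, convert, _units) in connector.items()}

def derive_metadata(connector):
    """Derive consumer-side metadata (here, units) from the same spec."""
    return {target: units
            for target, (_source, _convert, units) in connector.items()
            if units is not None}

provider_record = {"range_miles": 10.0, "facility_no": 42}
view = apply_connector(CONNECTOR, provider_record)
meta = derive_metadata(CONNECTOR)
```

Because the connector is data, a tool can inspect it, reuse it for a subset of fields, or derive further metadata from it, which is harder when the mapping is buried in procedural code.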
Tacit Assumption 6: Interoperability Metaphor: Universal Plug
Two Prongs: Too Simple
Important element of truth: Design to plug into the “infosphere”, not into one neighbor
A Better Interoperability Metaphor: A Multi-Pin Connector
[Figure: a connector with pins numbered 1 to 25; labels include transactions, CORBA/DCOM, SQL, XML, and Data]
Track Resolution of Each Pin’s Issues
All the Pins Have To Fit -- and Many are compound
Data: Each attribute has semantics, format, quality
Organization of the Section
Why do current approaches so often fail?
Where Should We Want to Go?
- Approach
- Taxonomy of needed capabilities
How to Start Moving in this Direction?
Research Agenda: Risk Mitigation
Transition is the steady state, with good ways to cope
Descriptions of sources, consumers exist -- sometimes
- When building the next connection, capture more
You’re still funded to build connections
No giant process cutover
- Discovery and brokering tools work with whatever descriptions they find
Integration contractors already do discovery and brokering!
- Manually, with too little reuse!
For everything, there are multiple ways to do it
- Choose one, but work with those who chose differently
- Connections and transforms are partially known
Steps to Connect a Consumer to Provider(s) (with metadata reuse)
Obtain descriptions of each player
- Use same form for consumers’ needs as for providers
- May employ intermediary vocabularies
Discover potential (source, consumer) pairs
Obtain transforms for
- Element representations (e.g., miles ↔ km; jpeg ↔ gif)
- Object and set representations (e.g., ODBC ↔ XML)
- Protocols (e.g., DCOM ↔ CORBA)
- Pull versus push, whole versus changes
Generate the entire connection (tuned for efficiency)
What vendor can supply the framework?
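One way to picture the “obtain transforms” step is a small transform library that chains registered conversions when no direct one exists. This is a sketch under assumptions: the TRANSFORMS registry and both helper functions are hypothetical, and only element representations (units) are covered.

```python
from collections import deque

# Hypothetical transform library: (from_repr, to_repr) -> function.
TRANSFORMS = {
    ("miles", "km"): lambda x: x * 1.609344,
    ("km", "miles"): lambda x: x / 1.609344,
    ("km", "m"): lambda x: x * 1000.0,
}

def find_chain(source, target):
    """Breadth-first search for the shortest chain of registered transforms."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, chain = queue.popleft()
        if current == target:
            return chain
        for (src, dst), fn in TRANSFORMS.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, chain + [fn]))
    return None  # no known path: someone must supply a new transform

def convert(value, source, target):
    """Apply the chain of transforms that leads from source to target."""
    chain = find_chain(source, target)
    if chain is None:
        raise LookupError(f"no transform from {source} to {target}")
    for fn in chain:
        value = fn(value)
    return value

meters = convert(2.0, "miles", "m")  # goes miles -> km -> m
```

Each transform registered while building one connection becomes reusable for later ones, which is the metadata reuse the slide title refers to.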
Metadata Drives Connection Creation (when there is enough metadata)
[Figure: Repository/Knowledge Base; Transform Library +; Brokering process; Discovery process; New “Wants” from consumer; execute]
Connection Creation Drives Metadata
[Figure: Repository/Knowledge Base; Metadata capture tools +; Transform Library +; Brokering process; Discovery process; New “Wants” from consumer; execute]
Connection Creation Drives Vocabularies (?)
[Figure: Repository/Knowledge Base; Metadata capture tools +; Transform Library +; Brokering process; Discovery process; Optimizer; Vocab and I/f creation tools; New “Wants” from consumer; execute]
Toward an “industrial revolution” for IT: Re-imagine Existing Processes as Simpler Steps
Each step should
- Require just one or two skills
- Benefit from existing resources -- metadata and transforms
- Be fully automated (sometimes)
- Produce reusable resources for later steps
Key challenges:
- Incentives: It must be made easier to generate from resource atoms than to code it all yourself!
- To support these incentives, we may need tools that assemble the atomic components into a solution
Data Descriptions: A Taxonomy (foil 1 of 2)
Data admin for requirements parallels admin for offers!
- Use same constructs
- Enables (partly) automated comparisons
Interpretation: element semantics, element representation, schema
Scope and completeness of what you provide (population), e.g., images of
+ all US air-fuel depots since 1970
+ some NATO fuel depots since 1990
Delivery style (push/pull, whole / changes)
(Is offer/need model adequate for update transactions?)
Data Descriptions’ Taxonomy (foil 2 of 2)
Quality of service
- Data quality, timeliness, attribution, completeness, obligation (to continue providing), cost, …
Guidance for data merging (match-up, conflict resolution)
Server information, e.g. (catch-all)
- Access language, protocols, address, security domains, …
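To make the taxonomy concrete, here is a minimal sketch in which one construct describes both an offer and a need, so the two can be compared partly automatically. The field names and the comparison rules are illustrative assumptions that cover only a few of the aspects listed above.

```python
from dataclasses import dataclass

@dataclass
class DataDescription:
    """Same construct for an offer and for a need (fields are illustrative)."""
    element: str          # element semantics, e.g. "depot_distance"
    representation: str   # element representation, e.g. "km"
    since_year: int       # scope: population covered since this year
    delivery: str         # delivery style: "push" or "pull"
    max_age_hours: float  # quality of service: timeliness

def shortfalls(offer, need):
    """List the ways an offer fails to meet a need; empty means a match."""
    problems = []
    if offer.element != need.element:
        problems.append("different element semantics")
    if offer.representation != need.representation:
        problems.append(
            f"needs transform {offer.representation} -> {need.representation}")
    if offer.since_year > need.since_year:
        problems.append(
            f"scope starts {offer.since_year}, need {need.since_year}")
    if offer.delivery != need.delivery:
        problems.append(f"delivery is {offer.delivery}, need {need.delivery}")
    if offer.max_age_hours > need.max_age_hours:
        problems.append("data may be too stale")
    return problems

offer = DataDescription("depot_distance", "miles", 1990, "pull", 24.0)
need = DataDescription("depot_distance", "km", 1970, "pull", 48.0)
report = shortfalls(offer, need)
```

In this example the report names a missing unit transform and a scope gap, so a person knows what remains to be done rather than just seeing “no match”.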
Talk Outline
Why do current approaches so often fail?
Discussion of a “low risk” approach
- What the goal system looks like
- How it evolves
- Tool and technology details
How to Start Moving in this Direction? How to:
- Simplify the task of interfacing to a particular system
- Establish more connections
- Make created interfaces “first class”
Research Agenda: Risk Mitigation
Getting Started along the New Road
Provide help in creating needed interfaces
- Focus on individual programs, small initiatives
- Give incremental benefits, to keep all aboard
What’s the minimum to give some benefits?
Separate existing work into atomic tasks that require fewer skills, and are sometimes automatable
- No giant cutovers, with massive retraining, coordination
Issues
- What does each program need to do?
- What requires coalitions, or central funding? (e.g., repository, brokers)
Tasks (examples)
Define vocabularies for
- Metadata (how to say “means the same”, or “distanceUnits = km”, or “Corba3.0 interface”)
- Aspects to be brokered (of scope, representation, …)
- Frequently-exchanged domain data (Part#, Facility#)
Describe portions of systems in terms of these vocabs
- Be opportunistic, e.g., when building new connections
Provide transforms among major representations, protocols
Provide brokers for various aspects (simple brokers first)
“Partial brokering” must help metadata providers
Who Will Be Most Interested? (Suggested Initial Targets)
Find a system which needs multiple interfaces. (to customers and/or feeders)
Good candidates
- Non-dominant players who must connect to multiple others
- Dominant player with bad ease-of-connecting (MIDB?)
Issue: How soon until it’s helpful?
- Generate, based on own entries in metadata repository
- Transformers are quickly helpful (esp. harder ones, e.g., coordinates, image formats)
Perhaps attach to DBMS, or to XML engine?
Example Initiatives (and their benefits)
Publish interface in one formalism (with description), e.g., SQL
- Tools generate the additional interfaces, without disturbing the original publisher, e.g., XML, CORBA, DCOM, html, …
Publish interface in one vocabulary, for all exported info, e.g., Supply
- Tools generate “closest feasible” interface in other vocabularies that have been related to it, e.g., Repair, Procurement, Defense finance, …
- Transform representations (image format, coord system)
Provide interfaces as (root concept, well known modifier)
Derive metadata, additional operations (e.g., update)
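A small sketch of “publish once, generate the rest”: one interface description drives both a SQL definition and an XML rendering. The INTERFACE layout, the table name, and the column names are hypothetical; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# One published description (hypothetical table and columns).
INTERFACE = {
    "name": "fuel_depot",
    "columns": [("depot_id", "INTEGER"), ("capacity_l", "REAL")],
}

def to_sql(interface):
    """Generate a CREATE TABLE statement from the description."""
    cols = ", ".join(f"{name} {sqltype}"
                     for name, sqltype in interface["columns"])
    return f"CREATE TABLE {interface['name']} ({cols});"

def to_xml(interface, row):
    """Render one row as XML using the same description."""
    element = ET.Element(interface["name"])
    for (name, _sqltype), value in zip(interface["columns"], row):
        ET.SubElement(element, name).text = str(value)
    return ET.tostring(element, encoding="unicode")

sql = to_sql(INTERFACE)
xml = to_xml(INTERFACE, (7, 5000.0))
```

The publisher maintains only INTERFACE; adding another output form means adding one generator, not touching the publisher’s system.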
Summary: Try an approach that hasn’t failed consistently!
Identified pitfalls that are too rarely avoided
Described incremental steps toward large-scale data admin for diverse, changing, incomplete systems
Generate connections from reusable resources (system metadata, vocabulary metadata, transforms) active metadata
- Separation of skills, use point and click
- Incentives: Make “provide resource + generate” easier than writing connecting code
Connection-creation creates more reusable resources
- Projects cooperate to create vocabularies, acquire tools
It’s a low risk approach -- begin prototyping
Challenges for Database Researchers
Better brokering for matching requirements to sets of views
- Assume multiple ontologies, spotty connection, incremental improvement
- Explain the shortfalls, understandably
Scalable fusion (to match objects, resolve data conflicts) without n x n pairwise administration
Pragmatic
- Acquisition guidance, e.g., metrics on flexibility (what should be in each acquisition contract?)
- Combine techniques for learning metadata? No more discovery heuristics!
Automate physical DBA work (caching, optimization)