Provenance: concepts, architecture and envisioned tools
Professor Luc Moreau
L.Moreau@ecs.soton.ac.uk
University of Southampton
www.gridprovenance.org
Provenance Team
University of Southampton: Luc Moreau, Victor Tan, Paul Groth, Simon Miles
IBM UK (Project Coordinator): John Ibbotson, Neil Hardman, Alexis Biller
University of Wales, Cardiff: Omer Rana, Arnaud Contes, Vikas Deora
Universitat Politècnica de Catalunya (UPC): Steven Willmott, Javier Vazquez
SZTAKI: Laszlo Varga, Arpad Andics
German Aerospace Center (DLR): Andreas Schreiber, Guy Kloss, Frank Danneman
Overview
Context
Provenance Concepts & Definitions
Architectural Design
Provenance Tools
Conclusions
Context: Importance of Past Processes
Context (1)
Aerospace engineering: maintaining a historical record of design processes for up to 99 years.
Organ transplant management: tracking previous decisions is crucial to maximise matching efficiency and patient recovery rates.
Context (2)
High Energy Physics: tracking, analysing and verifying data sets in the ATLAS experiment at the Large Hadron Collider (CERN).
Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval).
Concepts & Definitions
Provenance: dictionary definition
Oxford English Dictionary:
the fact of coming from some particular source or quarter; origin, derivation;
the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.
Concept vs representation
Provenance Definition
Our definition of provenance in the context of applications for which process matters to end users:
The provenance of a piece of data is the process that led to that piece of data
Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases
Provenance “Lifecycle”
Figure: Core interfaces to the Provenance Store. The application uses three interfaces: Record (documentation of execution), Query (provenance of data, such as the application's results) and Administer (the store and its contents).
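The division of responsibilities between the three interfaces can be illustrated with a minimal in-memory sketch. The class and method names (`ProvenanceStore`, `record`, `query`, `administer_delete`) are hypothetical; the reference implementation described later is a Web Service, and this toy only shows which operations belong to which interface.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store exposing record, query and administer operations."""

    # p-assertions indexed by the data item (e.g. a message id) they document
    _assertions: dict[str, list[str]] = field(default_factory=dict)

    def record(self, data_id: str, assertion: str) -> None:
        """Record interface: store documentation of execution about data_id."""
        self._assertions.setdefault(data_id, []).append(assertion)

    def query(self, data_id: str) -> list[str]:
        """Query interface: return a copy of the documentation recorded for data_id."""
        return list(self._assertions.get(data_id, []))

    def administer_delete(self, data_id: str) -> int:
        """Administer interface: drop documentation, returning how many assertions were removed."""
        return len(self._assertions.pop(data_id, []))


store = ProvenanceStore()
store.record("M2", "Actor 1 sent M2")
store.record("M2", "M2 is in reply to M1")
print(store.query("M2"))  # ['Actor 1 sent M2', 'M2 is in reply to M1']
print(store.administer_delete("M2"), store.query("M2"))  # 2 []
```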
Nature of Documentation
We represent the provenance of some data by documenting the process that led to the data. This documentation can be complete or partial; it can be accurate or inaccurate; it can present conflicting or consensual views of the actors involved; and it can provide operational details of execution or be abstract.
p-assertion
A given element of process documentation is referred to as a p-assertion.
p-assertion: an assertion that is made by an actor and pertains to a process.
Service Oriented Architecture
Broad definition of a service: a component that takes some inputs and produces some outputs.
Services are brought together to solve a given problem, typically via a workflow definition that specifies their composition.
Interactions with services take place through messages constructed according to the services' interface specifications.
The term actor denotes either a client or a service in an SOA.
A process is defined as the execution of a workflow.
Process Documentation (1)
Figure: Actor 1 and Actor 2 exchange messages M1, M2, M3 and M4.
Actor 1: “I received M1 and M4; I sent M2 and M3.”
Actor 2: “I received M3; I sent M4.”
From these p-assertions, we can derive that M3 was sent by Actor 1 and received by Actor 2 (and, conversely, that M4 was sent by Actor 2 and received by Actor 1).
If actors are black boxes, these assertions are of limited use, because we do not know the dependencies between messages.
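The derivation above amounts to matching each "sent" assertion with a "received" assertion for the same message. A minimal sketch, assuming p-assertions are encoded as hypothetical `(actor, kind, message)` tuples and that each message has exactly one sender and one receiver:

```python
# Interaction p-assertions from the example, as (actor, "sent"/"received", message)
assertions = [
    ("Actor 1", "received", "M1"),
    ("Actor 1", "received", "M4"),
    ("Actor 1", "sent", "M2"),
    ("Actor 1", "sent", "M3"),
    ("Actor 2", "received", "M3"),
    ("Actor 2", "sent", "M4"),
]


def derive_flows(assertions):
    """Pair each message's sender with its receiver when both ends were asserted."""
    senders = {m: a for a, kind, m in assertions if kind == "sent"}
    receivers = {m: a for a, kind, m in assertions if kind == "received"}
    # Only messages documented on both sides yield a complete actor-to-actor flow
    return {m: (senders[m], receivers[m]) for m in senders.keys() & receivers.keys()}


for message, (sender, receiver) in sorted(derive_flows(assertions).items()):
    print(f"{message}: {sender} -> {receiver}")
```

M1 and M2 produce no flow because only one side of each exchange is documented here; nothing in these assertions says how M2 depends on M1.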
Process Documentation (2)
Figure: Actor 1 and Actor 2 exchange messages M1, M2, M3 and M4.
Actor 1: “M2 is in reply to M1”; “M3 is caused by M1”; “M2 is caused by M4”.
Actor 2: “M4 is in reply to M3”.
These assertions help identify the order of messages, but not how the data were computed.
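Treating "in reply to" and "caused by" as dependency edges, a topological sort recovers a message order consistent with the assertions. This sketch uses Python's standard `graphlib` (3.9+); the `(effect, relation, cause)` tuple encoding is an assumption for illustration.

```python
from graphlib import TopologicalSorter

# Relationship p-assertions from the example, as (effect, relation, cause)
relationships = [
    ("M2", "in reply to", "M1"),
    ("M3", "caused by", "M1"),
    ("M2", "caused by", "M4"),
    ("M4", "in reply to", "M3"),
]

# Each message depends on the messages it replies to or is caused by
graph: dict[str, set[str]] = {}
for effect, _relation, cause in relationships:
    graph.setdefault(effect, set()).add(cause)

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['M1', 'M3', 'M4', 'M2']
```

Here the order is unique because the assertions chain M1, M3, M4 and M2; with fewer assertions several orders could be valid, and the sort would return just one of them.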
Process Documentation (3)
Figure: Actor 1 computes its outputs with functions f1 and f2; Actor 2 computes its output with function f.
Actor 1: M3 = f1(M1); M2 = f2(M1, M4).
Actor 2: M4 = f(M3).
These assertions help identify how data are computed, but provide no information about non-functional characteristics of the computation (time, resources used, etc.).
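Because these assertions state how each message was computed, they can in principle be replayed to reproduce a result. A sketch under loud assumptions: the bodies given for `f`, `f1` and `f2` are hypothetical stand-ins (the slides do not say what they compute), and messages are simple numbers.

```python
# Relationship p-assertions of the form output = function(inputs)
relations = {
    "M3": ("f1", ["M1"]),
    "M4": ("f", ["M3"]),
    "M2": ("f2", ["M1", "M4"]),
}

# Hypothetical stand-ins for the actors' internal computations
functions = {
    "f1": lambda m1: m1 * 2,
    "f": lambda m3: m3 + 1,
    "f2": lambda m1, m4: m1 + m4,
}


def reproduce(message, known):
    """Recompute a message's value by replaying the recorded relationships."""
    if message in known:
        return known[message]
    name, inputs = relations[message]
    value = functions[name](*(reproduce(i, known) for i in inputs))
    known[message] = value  # memoise intermediate results
    return value


print(reproduce("M2", {"M1": 10}))  # M3 = 20, M4 = 21, M2 = 10 + 21 = 31
```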
Process Documentation (4)
Figure: Actor 1 and Actor 2 exchange messages M1, M2, M3 and M4, and also document their internal state:
“I used a 386 cluster”; “Request sat in queue for 6 min”; “I used a SPARC processor”; “I used algorithm x version x.y.z”.
Types of p-assertions (1)
Interaction p-assertion: an assertion of the contents of a message by an actor that has sent or received that message.
Example (Actor 1): “I received M1 and M4; I sent M2 and M3.”
Types of p-assertions (2)
Relationship p-assertion: an assertion, made by an actor, that describes how the actor obtained output data, or the whole message sent in an interaction, by applying some function to input data or to messages from other interactions.
Examples: “M2 is in reply to M1”; “M3 is caused by M1”; “M2 is caused by M4”; M3 = f1(M1); M2 = f2(M1, M4).
Types of p-assertions (3)
Actor state p-assertion: an assertion made by an actor about its internal state in the context of a specific interaction.
Examples: “I used a SPARC processor”; “I used algorithm x version x.y.z”.
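The three kinds of p-assertion can be sketched as plain data types. The class names and fields below (`asserter`, `interaction_id`, `view`, `relation`, `inputs`, `state`) are hypothetical illustrations of the definitions above, not the project's data model or schema; which actor made the state assertion is also assumed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class InteractionPAssertion:
    asserter: str        # actor making the assertion (sender or receiver)
    interaction_id: str  # identifies the message exchange
    view: str            # "sender" or "receiver"
    content: str         # asserted contents of the message


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str
    output: str               # message or data item that was produced
    relation: str             # e.g. "f2", "in reply to", "caused by"
    inputs: tuple[str, ...]   # messages or data it was obtained from


@dataclass(frozen=True)
class ActorStatePAssertion:
    asserter: str
    interaction_id: str  # interaction whose context the state refers to
    state: str           # e.g. "used algorithm x version x.y.z"


PAssertion = Union[InteractionPAssertion, RelationshipPAssertion, ActorStatePAssertion]

documentation: list[PAssertion] = [
    InteractionPAssertion("Actor 1", "M1", "receiver", "<M1 contents>"),
    RelationshipPAssertion("Actor 1", "M2", "f2", ("M1", "M4")),
    ActorStatePAssertion("Actor 2", "M3", "used SPARC processor"),
]
for p in documentation:
    print(type(p).__name__, p.asserter)
```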
Data flow
Interaction p-assertions allow us to specify the flow of data between actors.
Relationship p-assertions allow us to characterise the flow of data “inside” an actor.
The overall data flow (internal + external) constitutes a directed acyclic graph (DAG), which characterises the process that led to a result.
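Given such a DAG, the provenance of a result can be read off as everything it transitively depends on. A minimal sketch, assuming the graph has already been assembled into a hypothetical `derived_from` mapping from each message id to the ids it was derived from:

```python
from graphlib import TopologicalSorter

# Internal edges come from relationship p-assertions (M3 = f1(M1), ...).
# Interaction p-assertions let one message id link the sender's and receiver's
# sides, so both actors' relationship assertions land in a single graph.
derived_from = {
    "M3": {"M1"},
    "M4": {"M3"},
    "M2": {"M1", "M4"},
}


def provenance(item, graph):
    """Return every data item that the given item transitively depends on."""
    seen, stack = set(), [item]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# prepare() raises graphlib.CycleError if the documented flow is not a DAG
TopologicalSorter(derived_from).prepare()
print(sorted(provenance("M2", derived_from)))  # ['M1', 'M3', 'M4']
```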
Architectural Design
Interfaces to Provenance Store
Figure: The application interacts with the Provenance Store through three interfaces: Record (documentation of execution), Query (provenance of data, such as the application's results) and Administer (the store and its contents).
Provenance Tools
Five core deliverables:
Data model and schema
Provenance store
Client-side libraries
Generic provenance tools
Methodology
Provenance Modelling
Provenance Store Reference Implementation
Implementation of the recording, querying and managing interfaces
Provenance store implemented as a Web Service
Client-side libraries for using the Provenance Store
Axis handler for automatically recording communication between Axis-based Web Services
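An Axis handler sits in a Web Service's message path, so communication is recorded without changing service code. The decorator below only mimics that idea inside a single Python process: `record_interactions`, the `recorded` list (standing in for the PS client-side library) and the dictionary layout are hypothetical, and this is not the Axis handler API.

```python
import functools

# Hypothetical stand-in for the PS client-side library: a list of recorded assertions
recorded: list[dict] = []


def record_interactions(actor):
    """Decorator that records a service's request and reply as interaction p-assertions."""
    def decorate(service):
        @functools.wraps(service)
        def wrapper(request):
            # The service acts as receiver of the request...
            recorded.append({"actor": actor, "view": "receiver", "content": request})
            reply = service(request)
            # ...and as sender of the reply
            recorded.append({"actor": actor, "view": "sender", "content": reply})
            return reply
        return wrapper
    return decorate


@record_interactions("Actor 2")
def square(request: int) -> int:
    return request * request


square(7)
for entry in recorded:
    print(entry)
```

The service body stays free of recording code, which is the property the handler approach aims for.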
Implementation Diagram
Figure: Components shown are two AxisHandlers, three instances of the PS client-side library, a Web Service, a WS client, a query actor WS, and the Provenance Store, which accesses backend stores (eXist, DB2, …) through an OGSA-DAI interface. Links are either WS calls or Java calls.
Implementation Details
The current functional prototype is a pure Web Services solution (based on Tomcat/Axis).
Security will be based on WS-Security.
WSRF offers a number of interesting opportunities, and we are considering mapping the (technology-neutral) architecture onto a WSRF-oriented stack.
Query Interface
Purpose:
Obtain the provenance of some specific data
Allow for “navigation” of the documentation of execution
Abstract interface:
Allows us to view the provenance store as if it contained XML data structures
Independent of the technology used for running the application and of the internal store representation
Seamless navigation of application-dependent and application-independent provenance representations
Structure of Documentation
The documentation of processes recorded by actors can be categorised into a hierarchy: all documentation consists of message exchanges; each message exchange has a message sender's view and a message receiver's view; each view contains the message content, the state of the actor during the exchange, and relationships.
XML Query Languages
Two existing query languages provide ways of navigating hierarchical data: XPath and XQuery.
For instance, we can use XPath to refer to the message exchange with ID 345, then the client's view of that exchange, then the body of the message exchanged:
//messageExchange[id="345"]/clientView/messageContent
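The path can be tried against a small XML view of the store using Python's `xml.etree.ElementTree`, which implements a subset of XPath (relative paths only, so the leading `//` becomes `.//`). The sample document is a hypothetical rendering of the hierarchy, with the exchange ID stored as a child `id` element to match the `[id="345"]` predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML view of the store: exchange -> client's view -> message content
document = ET.fromstring("""
<documentation>
  <messageExchange>
    <id>345</id>
    <clientView>
      <messageContent>hello from 345</messageContent>
    </clientView>
  </messageExchange>
  <messageExchange>
    <id>346</id>
    <clientView>
      <messageContent>hello from 346</messageContent>
    </clientView>
  </messageExchange>
</documentation>
""".strip())

# [id='345'] keeps exchanges whose child <id> element has the text "345"
matches = document.findall(".//messageExchange[id='345']/clientView/messageContent")
print([m.text for m in matches])  # ['hello from 345']
```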
Navigating Message Content
If message content is in XML format, or can be mapped to it, then XPath and XQuery can be used to navigate into the message content.
For example, we can extend the previous XPath with application-specific navigation to the SOAP envelope that encloses the message, then the body of the message within the envelope, then the customer name within the body:
//messageExchange[id="345"]/clientView/messageContent/soap:Envelope/soap:Body//customerName
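Navigating into namespaced SOAP content works the same way once the `soap` prefix is bound to the SOAP 1.1 envelope namespace. The stored message, its `order`/`customer` structure and the unqualified `customerName` element are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical stored exchange whose message content is a SOAP envelope
document = ET.fromstring(f"""
<documentation xmlns:soap="{SOAP_NS}">
  <messageExchange>
    <id>345</id>
    <clientView>
      <messageContent>
        <soap:Envelope>
          <soap:Body>
            <order><customer><customerName>Alice</customerName></customer></order>
          </soap:Body>
        </soap:Envelope>
      </messageContent>
    </clientView>
  </messageExchange>
</documentation>
""".strip())

# "//" inside the path searches all descendants of soap:Body
path = (".//messageExchange[id='345']/clientView/messageContent"
        "/soap:Envelope/soap:Body//customerName")
names = [e.text for e in document.findall(path, {"soap": SOAP_NS})]
print(names)  # ['Alice']
```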
Other Query Requirements
Execution Filtering: include/exclude all p-assertions that are marked as part of an execution by a single actor.
Functionality Filtering: include/exclude p-assertions that have one of a given set of operation types.
Process Filtering: include/exclude p-assertions that belong to a given (set of) process(es).
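The three filters can be pictured as include/exclude predicates over p-assertions that compose by chaining. This is a sketch under assumptions: the `PAssertion` record, its `actor`, `operation` and `process` fields, the operation names and the process identifiers are all hypothetical, not the project's query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAssertion:
    # Hypothetical record: asserting actor, operation type, process, and text
    actor: str
    operation: str
    process: str
    text: str


def execution_filter(ps, actor, include=True):
    """Keep (or drop) p-assertions recorded as part of one actor's execution."""
    return [p for p in ps if (p.actor == actor) == include]


def functionality_filter(ps, operations, include=True):
    """Keep (or drop) p-assertions whose operation type is in the given set."""
    return [p for p in ps if (p.operation in operations) == include]


def process_filter(ps, processes, include=True):
    """Keep (or drop) p-assertions belonging to the given process(es)."""
    return [p for p in ps if (p.process in processes) == include]


ps = [
    PAssertion("Actor 1", "align", "run-1", "I sent M2"),
    PAssertion("Actor 2", "blast", "run-1", "I sent M4"),
    PAssertion("Actor 2", "blast", "run-2", "I used a SPARC processor"),
]
# Filters compose: Actor 2's assertions, excluding process run-2
result = process_filter(execution_filter(ps, "Actor 2"), {"run-2"}, include=False)
print([p.text for p in result])  # ['I sent M4']
```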
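Generic Tools
Figure: Dependencies between the generic tools (Navigation, Relations, Comparison, Conflict, Analysis, Assertion Engine, Visualisation); an arrow labelled “uses” from A to B means that A makes use of B.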
Generic Tools
Analysis: constraint satisfaction over p-assertions and their content
Comparison: comparison between assertions
Conflict detection: detect conflicts between assertions
Rule engine: verify that the provenance of some data satisfies some constraints
Visualisation: implemented as a portlet (using the eXo Portal Framework, JSR 168 compliant)
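One simple form of conflict detection follows from interaction p-assertions: the sender and the receiver each assert the contents of the same message, so disagreement between their views is a conflict. The keying by `(interaction id, view)` and the sample contents are hypothetical.

```python
# Hypothetical interaction p-assertions keyed by (interaction id, view)
interaction_assertions = {
    ("M3", "sender"): "<request x='1'/>",
    ("M3", "receiver"): "<request x='1'/>",
    ("M4", "sender"): "<result>42</result>",
    ("M4", "receiver"): "<result>24</result>",
}


def detect_conflicts(assertions):
    """Return interactions whose sender and receiver assert different contents."""
    conflicts = []
    interactions = sorted({interaction for interaction, _view in assertions})
    for interaction in interactions:
        sent = assertions.get((interaction, "sender"))
        received = assertions.get((interaction, "receiver"))
        # Only compare when both parties documented the message
        if sent is not None and received is not None and sent != received:
            conflicts.append(interaction)
    return conflicts


print(detect_conflicts(interaction_assertions))  # ['M4']
```

Exact string comparison is a simplification; comparing real message contents would likely need an XML-aware notion of equivalence (whitespace, attribute order, namespaces).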
Methodology
How to design applications (whether legacy or new) so that they become provenance-aware:
Sets of useful schemas
Guidelines on what to record
Key Deliverables
NOW: First functional prototype
NOW: Architecture (technology-independent), first public version
04/06: Set of tools
04/06: Final architecture
09/06: Web Service standardisation proposal
09/06: Full implementation, secure and scalable
09/06: Methodology: how to make your application provenance-aware
Conclusions
Figure: Applying provenance. Provenance information is recorded and queried for compliance, reproduction and analysis; the provenance architecture and methodology standardise the documentation of business processes, applied in healthcare, distribution, finance, aerospace, automobile and pharmaceutical domains.
Conclusions
Provenance is a mostly unexplored area that is crucial to developing trusted systems.
Definition of provenance; specification of provenance representation; architecture; tools
Data models; Provenance Store; client-side tools; generic tools; methodology
Conclusions
Current work: system and protocol design, architecture specification, generic support for use cases
Pursue deployment in concrete applications and performance evaluation
Work towards a standardisation proposal
Methodology
Software soon to be available
Tell us about your use cases: we are keen to find new collaborations in this space!
Download the architecture definition from www.gridprovenance.org