A Sightseeing Tour of Prov and Some of its Extensions

A Sightseeing Tour of PROV and Some of its Extensions Khalid Belhajjame LAMSADE, Université Paris-Dauphine 16/03/16 MADICS: ReProVirtuFlow 1

Transcript of A Sightseeing Tour of Prov and Some of its Extensions

Page 1: A Sightseeing Tour of Prov and Some of its Extensions

A Sightseeing Tour of PROV and Some of its Extensions

Khalid Belhajjame

LAMSADE, Université Paris-Dauphine

16/03/16, MADICS: ReProVirtuFlow

Page 2: A Sightseeing Tour of Prov and Some of its Extensions

Why do we care about provenance …

- Help explain results and outliers
- Assess trust and quality
- Promote systems transparency: users are able to determine whether a particular use of information is appropriate under a set of rules
- Assist in debugging
- Promote reuse and reproducibility

Page 3: A Sightseeing Tour of Prov and Some of its Extensions

A bit of History

Provenance is not a new topic. There has been a lot of provenance work in databases, workflows, information retrieval, …

By 2009, a number of models and vocabularies for expressing provenance information had been proposed:

- Open Provenance Model (OPM)
- Proof Markup Language (PML)
- Provenance Vocabulary
- PREservation Metadata: Implementation Strategies (PREMIS)
- Semantic Web Applications in Neuromedicine (SWAN) Ontology
- Dublin Core
- …

Page 4: A Sightseeing Tour of Prov and Some of its Extensions

A bit of History

2009-2010: W3C Provenance Incubator Group

- Objective: provide a state of the art and possible recommendations for standardization efforts

2011: W3C Provenance Working Group

- Objective: define a standard vocabulary, primarily for the Semantic Web

2013: The W3C Provenance Working Group published a number of PROV recommendations and notes: PROV-DM, PROV-O, …

Since then, a number of models and vocabularies have extended PROV and/or defined mapping rules to it.

Page 5: A Sightseeing Tour of Prov and Some of its Extensions

Family of PROV documents

Page 6: A Sightseeing Tour of Prov and Some of its Extensions

Family of PROV documents

Page 7: A Sightseeing Tour of Prov and Some of its Extensions

Provenance

The W3C Provenance Working Group defined provenance as:

"Provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing."

Page 8: A Sightseeing Tour of Prov and Some of its Extensions

PROV …

… is not a recommendation for representing and collecting provenance information that should be adopted internally by all systems.

- That is not realistic, and won't happen any time soon.

Instead, the aim is to facilitate and promote interoperability between domains and applications that adopt their own specific representations of provenance.

- More pragmatic, and thus likely to happen.

Page 9: A Sightseeing Tour of Prov and Some of its Extensions

Example

Page 10: A Sightseeing Tour of Prov and Some of its Extensions

PROV Core Structures

Page 11: A Sightseeing Tour of Prov and Some of its Extensions

Entity

An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.

Example: An entity may be the document at IRI http://www.bbc.co.uk/news/science-environment-17526723, a file in a file system, a car, or an idea.

Page 12: A Sightseeing Tour of Prov and Some of its Extensions

Activity

An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.

Example: An activity may be the publishing of a document on the Web, sending a twitter message, extracting metadata embedded in a file, driving a car from Paris to Lyon, etc.

Page 13: A Sightseeing Tour of Prov and Some of its Extensions

Agent

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.

Example: A site selling books on the Web and the companies hosting them can be seen as agents.
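As a rough illustration of the three core types defined on the preceding slides (entity, activity, agent), the sketch below models each as a small Python record and prints it in a PROV-N-style notation. The `Element` class, the `ex:` namespace, and the identifiers are assumptions made for this example; they are not taken from the talk or from any PROV library.

```python
from dataclasses import dataclass, field

# The three PROV core types named on the slides.
KINDS = ("entity", "activity", "agent")


@dataclass
class Element:
    """One PROV element: its kind, an identifier, and optional attributes."""
    kind: str
    identifier: str
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject anything that is not one of the three core types.
        if self.kind not in KINDS:
            raise ValueError(f"unknown PROV element kind: {self.kind!r}")

    def to_provn(self) -> str:
        """Render the element as a PROV-N-style statement."""
        if not self.attributes:
            return f"{self.kind}({self.identifier})"
        pairs = ", ".join(
            f'{name}="{value}"' for name, value in self.attributes.items()
        )
        return f"{self.kind}({self.identifier}, [{pairs}])"


# Hypothetical identifiers in an assumed "ex:" namespace.
elements = [
    Element("entity", "ex:article", {"prov:label": "news article"}),
    Element("activity", "ex:publishing"),
    Element("agent", "ex:bookshop", {"prov:label": "online book seller"}),
]
statements = [element.to_provn() for element in elements]
for statement in statements:
    print(statement)
```

The single `Element` class with a `kind` field keeps the sketch short; a fuller model would probably give each type its own class, since activities also carry start and end times.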

Page 14: A Sightseeing Tour of Prov and Some of its Extensions

Usage and Generation

Usage is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity.

Example: A program beginning to read an input file.

Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.

Example: The completed creation of a file by a program.
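Because an entity only becomes available for usage after its generation, a provenance record can be checked for usages that are timestamped too early. The following is a minimal sketch of such a check, assuming each record carries a timestamp; the class names, the `usages_before_generation` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Generation:
    """Completion of the production of `entity` by `activity` at `time`."""
    entity: str
    activity: str
    time: datetime


@dataclass(frozen=True)
class Usage:
    """Beginning of the utilization of `entity` by `activity` at `time`."""
    activity: str
    entity: str
    time: datetime


def usages_before_generation(generations, usages):
    """Return usages recorded earlier than the generation of the entity they use."""
    # Assumes at most one generation record per entity.
    generated_at = {g.entity: g.time for g in generations}
    return [
        u for u in usages
        if u.entity in generated_at and u.time < generated_at[u.entity]
    ]


# Hypothetical records: one program writes a file, two others read it.
generations = [
    Generation("ex:output.csv", "ex:writer", datetime(2016, 3, 16, 10, 0)),
]
usages = [
    Usage("ex:reader", "ex:output.csv", datetime(2016, 3, 16, 10, 5)),
    Usage("ex:early_reader", "ex:output.csv", datetime(2016, 3, 16, 9, 55)),
]
suspicious = usages_before_generation(generations, usages)
print([u.activity for u in suspicious])
```

Usages of entities with no generation record are left alone here, since a provenance record is often incomplete.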

Page 15: A Sightseeing Tour of Prov and Some of its Extensions

Derivation

Derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

Example: The transformation of a relational table into a linked data set.
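A common use of derivation records is to ask which upstream entities a result depends on. The sketch below follows derivation links backwards from an entity. Note that this is an application-level lineage query: as far as I know, PROV does not define derivation as transitive, so the result should be read as "reachable through derivation links", not as inferred derivations. The `upstream` function and the identifiers are assumptions for illustration.

```python
def upstream(entity, was_derived_from):
    """Collect every entity reachable from `entity` by following derivation links.

    `was_derived_from` is an iterable of (derived, source) pairs.
    """
    # Index the direct sources of each derived entity.
    direct = {}
    for derived, source in was_derived_from:
        direct.setdefault(derived, set()).add(source)

    # Depth-first walk; `seen` also protects against cycles in bad data.
    seen, stack = set(), [entity]
    while stack:
        for source in direct.get(stack.pop(), ()):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Hypothetical chain: relational table -> CSV export -> linked data set.
derivations = [
    ("ex:linked_dataset", "ex:csv_export"),
    ("ex:csv_export", "ex:relational_table"),
]
print(sorted(upstream("ex:linked_dataset", derivations)))
```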

Page 16: A Sightseeing Tour of Prov and Some of its Extensions

Association and Attribution

An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity.

Example: The workflow system is responsible for the enactment of a workflow execution.

Attribution is the ascribing of an entity to an agent.

Example: A blog post can be attributed to an author, a mobile phone to its manufacturer.
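To make the difference concrete, association links an agent to an activity, while attribution links an agent to an entity. The sketch below records both kinds of link as plain pairs, renders them in a PROV-N-style notation, and groups them by agent. The function names and identifiers are hypothetical and chosen to mirror the slide's examples.

```python
from collections import defaultdict

# Hypothetical records in an assumed "ex:" namespace.
associations = [
    ("ex:workflow_run", "ex:workflow_system"),  # (activity, agent)
]
attributions = [
    ("ex:blog_post", "ex:author"),              # (entity, agent)
    ("ex:mobile_phone", "ex:manufacturer"),
]


def to_statements(associations, attributions):
    """Render both relations as PROV-N-style statements."""
    lines = [
        f"wasAssociatedWith({activity}, {agent})"
        for activity, agent in associations
    ]
    lines += [
        f"wasAttributedTo({entity}, {agent})"
        for entity, agent in attributions
    ]
    return lines


def responsibilities(associations, attributions):
    """Group activities and entities under the agent responsible for them."""
    by_agent = defaultdict(lambda: {"activities": [], "entities": []})
    for activity, agent in associations:
        by_agent[agent]["activities"].append(activity)
    for entity, agent in attributions:
        by_agent[agent]["entities"].append(entity)
    return dict(by_agent)


statements = to_statements(associations, attributions)
summary = responsibilities(associations, attributions)
print("\n".join(statements))
```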

Page 17: A Sightseeing Tour of Prov and Some of its Extensions

PROV Core Structures

Page 18: A Sightseeing Tour of Prov and Some of its Extensions

W3C PROV Implementations: Preliminary Analysis

Source: https://khalidbelhajjame.wordpress.com/2013/04/04/w3c-prov-implementations/

Page 19: A Sightseeing Tour of Prov and Some of its Extensions

PROV Compliant Vocabularies

This list is by no means complete …

[Figure: ProvONE, wfprov, wfdesc, DC and PAV, each linked to PROV by an "extends" or "mapsTo" edge]

Page 20: A Sightseeing Tour of Prov and Some of its Extensions

ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance

[Figure labels: Prospective provenance, Retrospective provenance]
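Prospective provenance describes a workflow as it is specified, while retrospective provenance records what happened when it ran. The sketch below illustrates that split with two plain record types and a helper that groups execution records under the step they enact. `Step`, `Execution`, and `executions_by_step` are hypothetical names for this example and are not ProvONE terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Prospective provenance: a workflow step as specified."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class Execution:
    """Retrospective provenance: one run of a step and the data it touched."""
    step: str
    used: tuple
    generated: tuple


def executions_by_step(workflow, executions):
    """Group execution records under the workflow step they enact."""
    runs = {step.name: [] for step in workflow}
    for execution in executions:
        if execution.step not in runs:
            raise ValueError(f"execution of unknown step: {execution.step!r}")
        runs[execution.step].append(execution)
    return runs


# Hypothetical two-step workflow that has been run twice up to "align".
workflow = [
    Step("align", inputs=("reads",), outputs=("alignment",)),
    Step("plot", inputs=("alignment",), outputs=("figure",)),
]
executions = [
    Execution("align", used=("ex:reads_v1",), generated=("ex:alignment_v1",)),
    Execution("align", used=("ex:reads_v2",), generated=("ex:alignment_v2",)),
]
runs = executions_by_step(workflow, executions)
print({name: len(records) for name, records in runs.items()})
```

One specified step can have many execution records, or none at all, which is why the two kinds of provenance are kept in separate structures here.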

Page 21: A Sightseeing Tour of Prov and Some of its Extensions

PAV ontology: provenance, authoring and versioning

Page 22: A Sightseeing Tour of Prov and Some of its Extensions

PAV ontology: provenance, authoring and versioning

Page 23: A Sightseeing Tour of Prov and Some of its Extensions

Acknowledgements

- W3C Provenance Working Group
- DataONE Workflow and Provenance Interest Group
- PAV's friends: Paolo Ciccarese, Stian Soiland-Reyes, Alasdair JG Gray, Carole Goble and Tim Clark
