CyberSecuritySoton.org @CybSecSoton

Provenance Analytics and Crowdsourcing

Trung Dong Huynh

Web and Internet Science Research Group

Cybercrime Workshop 30 Jan 2014

PROVENANCE: A SHORT INTRODUCTION

Provenance in Fine Art and Antiques

Provenance of a painting:
• Sales receipts
• Auction and exhibition catalogues
• Gallery stickers
• Letters from artists

“establishing provenance is essentially a matter of documentation”

Provenance of Digital Objects

Goal: we aim to express how data was created and how it evolved
• who played what role in creating the data
• how the data was revised over time, and by whom
• what other data was used in the process
• which tool(s) were used to generate each version

Interchangeability is key!

Provenance: A Definition

Provenance is “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing”

(PROV-DM – The PROV Data Model)

Prof Luc Moreau

The PROV Data Model

A PROV Example

entity(isbn:0002261022, [prov:label="The Glass Palace"])

entity(isbn:2020669595, [prov:label="Le Palais des Miroirs"])

agent(AmitavGhosh)

wasAttributedTo(isbn:0002261022, AmitavGhosh)

activity(writingTheBook)

wasAssociatedWith(writingTheBook, AmitavGhosh, -)

wasGeneratedBy(isbn:0002261022, writingTheBook, -)

agent(ChristianneBesse)

wasAttributedTo(isbn:2020669595, ChristianneBesse, [prov:role="translator"])

activity(translation)

wasAssociatedWith(translation, ChristianneBesse, -)

wasGeneratedBy(isbn:2020669595, translation, -)

used(translation, isbn:0002261022, -)

The provenance of two books:

• “The Glass Palace”, written by Amitav Ghosh

• “Le Palais des Miroirs”, the French translation of Amitav Ghosh's book, by Christianne Besse
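The statements above can be queried mechanically. The following is a minimal Python sketch, not part of the original slides: it stores the same PROV statements as plain tuples (optional arguments and attributes dropped) and walks wasGeneratedBy and used to recover what a book was derived from and who contributed to it. The helper names `objects`, `sources`, and `contributors` are hypothetical and chosen only for illustration.

```python
# PROV statements from the slide as (relation, subject, object) tuples.
# Optional arguments ("-") and attributes are omitted in this sketch.
statements = [
    ("wasAttributedTo", "isbn:0002261022", "AmitavGhosh"),
    ("wasAssociatedWith", "writingTheBook", "AmitavGhosh"),
    ("wasGeneratedBy", "isbn:0002261022", "writingTheBook"),
    ("wasAttributedTo", "isbn:2020669595", "ChristianneBesse"),
    ("wasAssociatedWith", "translation", "ChristianneBesse"),
    ("wasGeneratedBy", "isbn:2020669595", "translation"),
    ("used", "translation", "isbn:0002261022"),
]


def objects(relation, subject):
    """Return the objects of every statement matching relation and subject."""
    return [o for r, s, o in statements if r == relation and s == subject]


def sources(entity):
    """Entities used, directly or transitively, to generate `entity`."""
    found = []
    stack = [entity]
    while stack:
        current = stack.pop()
        for activity in objects("wasGeneratedBy", current):
            for used_entity in objects("used", activity):
                if used_entity not in found:
                    found.append(used_entity)
                    stack.append(used_entity)
    return found


def contributors(entity):
    """Agents associated with the activities along the entity's lineage."""
    agents = []
    for e in [entity] + sources(entity):
        for activity in objects("wasGeneratedBy", e):
            for agent in objects("wasAssociatedWith", activity):
                if agent not in agents:
                    agents.append(agent)
    return agents


print(sources("isbn:2020669595"))       # the original book
print(contributors("isbn:2020669595"))  # translator first, then author
```

Asking for the sources of the French edition returns the original book, and its contributors include both the translator and the author, even though no single statement links Amitav Ghosh to the translation directly.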

Why Is Provenance Needed?

• Open Information Systems: origin of data
• News and Media: sources and references of news, blogs, etc.
• Science: how the results were obtained; can they be reproduced?
• Manufacturing & business: traceability of faults (e.g. suppliers, designers, contractors); certificates of origin
• Health: traceability of medicine, lab test results, organs
• Policy and Law: compliance; privacy protection

COLLABMAP – A CROWDSOURCING APPLICATION

• 38,000 micro-tasks
• 160 contributors
• 5,151 buildings

Provenance in CollabMap

A Provenance Graph from CollabMap

Provenance Graphs as Networks

Benefits from network analytics
• Network structure
• Extrapolations
• Sampling
• Similarity
• Compression

Network metrics
• Number of nodes
• Number of edges
• Graph diameter
• Maximum finite distances (between each pair of node types: entities, activities, agents)
• Node degree distribution
• Densification exponent
• (and more)
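To make the metrics listed above concrete, here is a small Python sketch that computes several of them on a toy provenance graph (the two-book example, with shortened hypothetical node names). It rests on assumptions not stated in the slides: edges point from effect to cause as in PROV, the diameter is measured on the undirected version of the graph, and maximum finite distances follow edge direction. The densification exponent is left out because it needs a series of graphs rather than a single one.

```python
from collections import Counter, deque

# Toy provenance graph; node names are illustrative only.
node_types = {
    "book": "entity", "translated_book": "entity",
    "writing": "activity", "translation": "activity",
    "ghosh": "agent", "besse": "agent",
}
# Directed edges (subject, object) for attribution, association,
# generation and usage statements.
edges = [
    ("book", "ghosh"), ("writing", "ghosh"), ("book", "writing"),
    ("translated_book", "besse"), ("translation", "besse"),
    ("translated_book", "translation"), ("translation", "book"),
]


def adjacency(edge_list, directed=True):
    """Build a node -> neighbours map, optionally ignoring edge direction."""
    adj = {node: set() for node in node_types}
    for src, dst in edge_list:
        adj[src].add(dst)
        if not directed:
            adj[dst].add(src)
    return adj


def bfs_distances(start, adj):
    """Shortest hop counts from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(edge_list):
    """Longest finite shortest path, ignoring edge direction (assumption)."""
    adj = adjacency(edge_list, directed=False)
    return max(max(bfs_distances(n, adj).values()) for n in adj)


def max_finite_distances(edge_list):
    """Longest finite directed distance for each (source type, target type)."""
    adj = adjacency(edge_list, directed=True)
    mfd = {}
    for src in adj:
        for dst, d in bfs_distances(src, adj).items():
            if dst != src:
                key = (node_types[src], node_types[dst])
                mfd[key] = max(mfd.get(key, 0), d)
    return mfd


def degree_distribution(edge_list):
    """Map each total degree (in + out) to the number of nodes having it."""
    degree = Counter()
    for src, dst in edge_list:
        degree[src] += 1
        degree[dst] += 1
    return Counter(degree.values())


print(len(node_types), len(edges), diameter(edges))
print(max_finite_distances(edges))
print(degree_distribution(edges))
```

Type pairs with no connecting path, such as agent to entity in this toy graph, simply do not appear in the result of `max_finite_distances`.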

Data Quality Assessment

Classifying the trustworthiness of CollabMap data:

• Calculate network metrics of CollabMap provenance graphs

• Supervised learning from user votes (on buildings, routes, and route sets)

• Classification accuracy: over 95%

Strong correlation between the network metrics of a provenance graph and data quality in CollabMap
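The classification pipeline described above can be outlined in a few lines of Python. This is only a sketch under loud assumptions: the slides do not name the learning algorithm, so a nearest-centroid classifier stands in for it; the feature vectors (nodes, edges, diameter) and their labels are invented toy values, not CollabMap data; and `centroids` and `classify` are hypothetical helper names.

```python
import math

# Invented toy training set: (nodes, edges, diameter) of a provenance graph,
# labelled from user votes. These are NOT real CollabMap measurements.
training = [
    ((12, 18, 4), "trusted"),
    ((15, 22, 5), "trusted"),
    ((14, 20, 4), "trusted"),
    ((5, 5, 2), "untrusted"),
    ((6, 7, 2), "untrusted"),
    ((4, 4, 1), "untrusted"),
]


def centroids(samples):
    """Average the feature vectors of each label (the 'training' step)."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}


def classify(features, model):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(features, model[label]))


model = centroids(training)
print(classify((13, 19, 4), model))
print(classify((5, 6, 2), model))
```

In a real setting the feature vector would hold the network metrics computed for each provenance graph, and accuracy would be estimated on held-out labelled graphs rather than on hand-picked examples.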

Potential Applications

• Auditing: confirm that provenance was properly recorded
• On-the-fly classifications
• Detection of abnormality
• Community detection
• Identifying key actors and key links
• Inferring missing links
  edge directions
  edge types

Conclusions

• Provenance analytics is new and unexplored

• Promising novel applications

• We want to test the approach on new data and applications

Contact Details

Trung Dong Huynh

[email protected]

about.me/dong.huynh