CyberSecuritySoton.org @CybSecSoton Provenance Analytics and Crowdsourcing Trung Dong Huynh Web and...
CyberSecuritySoton.org @CybSecSoton
Provenance Analytics and Crowdsourcing
Trung Dong Huynh
Web and Internet Science Research Group
Cybercrime Workshop 30 Jan 2014
Provenance in Fine Art and Antiques
Provenance of a painting:
• Sales receipts
• Auction and exhibition catalogues
• Gallery stickers
• Letters from artists
“establishing provenance is essentially a matter of documentation”
Provenance of Digital Objects
Goal: we aim to express how data was created and evolved
• who played what role in creating the data
• how the data was revised over time, by whom
• what other data was used in the process
• which tool(s) were used to generate each version
Interchangeability is key!
Provenance: A Definition
Provenance is “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing”
(PROV-DM – The PROV Data Model)
Prof Luc Moreau
A PROV Example
entity(isbn:0002261022, [prov:label="The Glass Palace"])
entity(isbn:2020669595, [prov:label="Le Palais des Miroirs"])
agent(AmitavGhosh)
wasAttributedTo(isbn:0002261022, AmitavGhosh)
activity(writingTheBook)
wasAssociatedWith(writingTheBook, AmitavGhosh, -)
wasGeneratedBy(isbn:0002261022, writingTheBook, -)
agent(ChristianneBesse)
wasAttributedTo(isbn:2020669595, ChristianneBesse, [prov:role='translator'])
activity(translation)
wasAssociatedWith(translation, ChristianneBesse, -)
wasGeneratedBy(isbn:2020669595, translation, -)
used(translation, isbn:0002261022, -)
The provenance of two books:
• “The Glass Palace”, written by Amitav Ghosh
• “Le Palais des Miroirs”, the French translation of Amitav Ghosh's book, by Christianne Besse
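The statements above form a small graph, and one way to read it is to ask which agents were involved, directly or indirectly, in producing a given book. The sketch below is an illustration only: it re-encodes the PROV statements as plain Python triples (a made-up encoding, not the API of any PROV library), and the names `STATEMENTS`, `AGENTS` and `contributors` are hypothetical.

```python
# Toy encoding of the PROV statements above as (relation, source, target)
# triples. This structure is an assumption made for this sketch.
STATEMENTS = [
    ("wasAttributedTo", "isbn:0002261022", "AmitavGhosh"),
    ("wasAssociatedWith", "writingTheBook", "AmitavGhosh"),
    ("wasGeneratedBy", "isbn:0002261022", "writingTheBook"),
    ("wasAttributedTo", "isbn:2020669595", "ChristianneBesse"),
    ("wasAssociatedWith", "translation", "ChristianneBesse"),
    ("wasGeneratedBy", "isbn:2020669595", "translation"),
    ("used", "translation", "isbn:0002261022"),
]
AGENTS = {"AmitavGhosh", "ChristianneBesse"}


def contributors(entity, statements=STATEMENTS, agents=AGENTS):
    """Return every agent reachable from `entity` by following PROV edges."""
    # Build a source -> targets adjacency map, ignoring the relation name.
    edges = {}
    for _relation, source, target in statements:
        edges.setdefault(source, set()).add(target)

    # Depth-first traversal from the entity, collecting agents on the way.
    seen, stack, found = set(), [entity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in agents:
            found.add(node)
        stack.extend(edges.get(node, ()))
    return found


if __name__ == "__main__":
    print(sorted(contributors("isbn:2020669595")))
```

Because the translation activity used the original book, the traversal from the French edition reaches both the translator and the original author, while the traversal from the original edition reaches the author only.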
Why Is Provenance Needed?
• Open Information Systems: origin of data
• News and Media: sources and references of news, blogs, etc.
• Science: how the results were obtained; can they be reproduced?
• Manufacturing & business: traceability of faults (e.g. suppliers, designers, contractors); certificates of origin
• Health: traceability of medicine, lab test results, organs
• Policy and Law: compliance; privacy protection
Provenance Graphs as Networks
Benefits from network analytics
• Network structure
• Extrapolations
• Sampling
• Similarity
• Compression
Network metrics
• Number of nodes
• Number of edges
• Graph diameter
• Maximum finite distances (between each pair of node types: entities, activities, agents)
• Node degree distribution
• Densification exponent
• (and more)
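Several of the metrics listed above can be computed with a breadth-first search. The sketch below is a minimal illustration on the two-book example, not the implementation used in the talk. It assumes the diameter is measured with edge direction ignored and the maximum finite distances are measured along edge direction; the slides do not state either convention. The names `NODE_TYPES`, `EDGES` and `graph_metrics` are hypothetical.

```python
from collections import Counter, deque

# Node types and directed edges of the two-book PROV example
# (encoding chosen for this sketch only).
NODE_TYPES = {
    "isbn:0002261022": "entity",
    "isbn:2020669595": "entity",
    "writingTheBook": "activity",
    "translation": "activity",
    "AmitavGhosh": "agent",
    "ChristianneBesse": "agent",
}
EDGES = [
    ("isbn:0002261022", "AmitavGhosh"),
    ("writingTheBook", "AmitavGhosh"),
    ("isbn:0002261022", "writingTheBook"),
    ("isbn:2020669595", "ChristianneBesse"),
    ("translation", "ChristianneBesse"),
    ("isbn:2020669595", "translation"),
    ("translation", "isbn:0002261022"),
]


def bfs_distances(adjacency, start):
    """Shortest hop counts from `start` to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def graph_metrics(node_types, edges):
    directed, undirected = {}, {}
    for source, target in edges:
        directed.setdefault(source, set()).add(target)
        undirected.setdefault(source, set()).add(target)
        undirected.setdefault(target, set()).add(source)

    # Diameter: longest shortest path with direction ignored (assumption).
    # Unreachable pairs are skipped, so only finite distances count.
    diameter = max(
        max(bfs_distances(undirected, node).values()) for node in node_types
    )

    # Maximum finite distance for each (source type, target type) pair,
    # following edge direction (assumption).
    mfd = {}
    for source in node_types:
        for target, d in bfs_distances(directed, source).items():
            if target != source:
                key = (node_types[source], node_types[target])
                mfd[key] = max(mfd.get(key, 0), d)

    # Degree distribution: how many nodes have each total (in + out) degree.
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    degree_distribution = Counter(degree[node] for node in node_types)

    return {
        "nodes": len(node_types),
        "edges": len(edges),
        "diameter": diameter,
        "mfd": mfd,
        "degree_distribution": dict(degree_distribution),
    }


if __name__ == "__main__":
    for name, value in graph_metrics(NODE_TYPES, EDGES).items():
        print(name, value)
```

The resulting numbers form a fixed-length feature vector per provenance graph, which is the kind of input a classifier can consume.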
Data Quality Assessment
Classifying the trustworthiness of CollabMap data:
• Calculate network metrics of CollabMap provenance graphs
• Supervised learning from user votes (on buildings, routes, route sets)
• Classification accuracy: over 95%
Strong correlation between network metrics of provenance graphs and data quality in CollabMap
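The pipeline above (metrics as features, user votes as labels) can be sketched with any supervised learner. The slides do not name the learner used for CollabMap, so a nearest-centroid classifier is swapped in here purely for brevity. All feature values and the labels "trusted" and "uncertain" are made up for illustration and are not CollabMap data.

```python
from math import dist


def train_centroids(samples):
    """Average the feature vectors of each label.

    `samples` is a list of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(classify(centroids, f) == label for f, label in samples)
    return hits / len(samples)


# Hypothetical training set: (nodes, edges, diameter) per provenance graph,
# labelled by an imagined user vote. Values are invented for this sketch.
TRAIN = [
    ((12, 15, 4), "trusted"),
    ((14, 18, 5), "trusted"),
    ((4, 3, 2), "uncertain"),
    ((5, 4, 2), "uncertain"),
]

if __name__ == "__main__":
    model = train_centroids(TRAIN)
    print(classify(model, (11, 14, 4)))
    print(accuracy(model, TRAIN))
```

In practice the features would need scaling and the accuracy would be measured on held-out votes rather than on the training set.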
Potential Applications
• Auditing: confirm that provenance was properly recorded
• On-the-fly classifications
• Detection of abnormality
• Community detection
• Identifying key actors and key links
• Inferring missing links, edge directions, edge types
Conclusions
• Provenance analytics is new and unexplored
• Promising novel applications
• We want to test the approach on new data and applications