Provenance Analytics at AAAI Human Computation Conference 2013

Interpretation of Crowdsourced Activities using Provenance Network Analysis

T. Dong Huynh (1), Mark Ebden (2), Mateo Venanzi (1), Sarvapali Ramchurn (1), Stephen Roberts (2), and Luc Moreau (1)

(1) University of Southampton, (2) University of Oxford

Corresponding author: [email protected]


Trung Dong Huynh presents the paper "Interpretation of Crowdsourced Activities using Provenance Network Analysis": how analysing provenance graphs can help interpret crowdsourced activities in CollabMap.

Transcript of Provenance Analytics at AAAI Human Computation Conference 2013

Page 1: Provenance Analytics at AAAI Human Computation Conference 2013

Interpretation of Crowdsourced Activities using Provenance Network Analysis

T. Dong Huynh (1), Mark Ebden (2), Mateo Venanzi (1), Sarvapali Ramchurn (1), Stephen Roberts (2), and Luc Moreau (1)

(1) University of Southampton, (2) University of Oxford

Corresponding author: [email protected]

Page 2: Provenance Analytics at AAAI Human Computation Conference 2013

Talk Outline

1. CollabMap: a crowdsourcing mapping application
2. Provenance
3. Provenance network analysis
4. Data quality classification
5. Conclusions and future work

Page 3: Provenance Analytics at AAAI Human Computation Conference 2013
Page 4: Provenance Analytics at AAAI Human Computation Conference 2013


Page 5: Provenance Analytics at AAAI Human Computation Conference 2013

• 38,000 micro-tasks
• 160 contributors
• 5,151 buildings

Page 6: Provenance Analytics at AAAI Human Computation Conference 2013

Provenance Overview

Page 7: Provenance Analytics at AAAI Human Computation Conference 2013


Provenance in CollabMap

Page 8: Provenance Analytics at AAAI Human Computation Conference 2013


Example Provenance Graph

Page 9: Provenance Analytics at AAAI Human Computation Conference 2013


Page 10: Provenance Analytics at AAAI Human Computation Conference 2013

Provenance graphs as networks

• Network analysis
• ...

Network metrics:
– Number of nodes
– Number of edges
– Graph diameter
– Maximum finite distances (between each pair of node types: entities, activities, agents)
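The metrics listed on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the node names, the edge direction (each edge points from a node to the node it depends on), and the choice to compute the diameter on the undirected graph while computing maximum finite distances (MFD) along edge direction are all assumptions made for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts of shortest paths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_metrics(node_types, edges):
    """node_types maps node -> 'entity' | 'activity' | 'agent';
    edges is a list of (source, target) pairs."""
    directed = {n: set() for n in node_types}
    undirected = {n: set() for n in node_types}
    for src, dst in edges:
        directed[src].add(dst)
        undirected[src].add(dst)
        undirected[dst].add(src)
    # Assumption: diameter = longest finite shortest path, ignoring direction.
    diameter = max(
        (max(bfs_distances(undirected, n).values()) for n in node_types),
        default=0,
    )
    # Assumption: MFD follows edge direction, keyed by (source type, target type).
    mfd = {}
    for n in node_types:
        for m, d in bfs_distances(directed, n).items():
            if m != n:
                key = (node_types[n], node_types[m])
                mfd[key] = max(mfd.get(key, 0), d)
    return {"nodes": len(node_types), "edges": len(edges),
            "diameter": diameter, "mfd": mfd}

# Hypothetical toy graph: a building is drawn by one user and verified by another.
node_types = {
    "building1": "entity", "vote1": "entity",
    "draw1": "activity", "verify1": "activity",
    "user1": "agent", "user2": "agent",
}
edges = [
    ("building1", "draw1"),    # building1 was generated by draw1
    ("draw1", "user1"),        # draw1 was associated with user1
    ("vote1", "verify1"),      # vote1 was generated by verify1
    ("verify1", "building1"),  # verify1 used building1
    ("verify1", "user2"),      # verify1 was associated with user2
]
metrics = network_metrics(node_types, edges)
print(metrics)
```

Pairs of node types with no connecting path simply do not appear in the `mfd` dictionary, which corresponds to a metric with no finite value.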

Page 11: Provenance Analytics at AAAI Human Computation Conference 2013


Dependency Graphs

Page 12: Provenance Analytics at AAAI Human Computation Conference 2013

Methodology

Buildings, Routes, and Route Sets generated in CollabMap were each given a trust label, "trusted" or "uncertain", calculated from their user votes.

The data were randomly divided into training sets and test sets.

Decision tree classifiers were trained on the training sets to predict the trust labels of the buildings, routes, and route sets, taking their dependency graphs' network metrics as input features.

The sensitivity, specificity, and accuracy of the classifiers were assessed on the relevant test sets.
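The random split and the three evaluation measures can be sketched as follows. The slides do not state the split ratio, the random seed, or which label counts as the positive class, so the 30% test fraction, the seed, and treating "trusted" as positive are assumptions chosen for illustration; the function names are hypothetical.

```python
import random

def split_dataset(items, test_fraction=0.3, seed=0):
    """Randomly divide items into (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def evaluate(true_labels, predicted_labels, positive="trusted"):
    """Return (sensitivity, specificity, accuracy) for a binary labelling."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    # Sensitivity: share of truly positive items predicted positive.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity: share of truly negative items predicted negative.
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / total if total else float("nan")
    return sensitivity, specificity, accuracy

# Hypothetical labels, only to exercise the functions.
truth = ["trusted", "trusted", "trusted", "trusted", "uncertain", "uncertain"]
preds = ["trusted", "trusted", "trusted", "uncertain", "uncertain", "trusted"]
sensitivity, specificity, accuracy = evaluate(truth, preds)
train, test = split_dataset(range(10))
```

Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two, which is a quick sanity check when reading a results table.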

Page 13: Provenance Analytics at AAAI Human Computation Conference 2013

Classification Performance

Local Deployment

            Sensitivity  Specificity  Accuracy
Building    96.61%       99.17%       97.00%
Route       94.78%       97.32%       95.28%
Route Set   97.23%       97.78%       97.77%

AMT Deployment

            Sensitivity  Specificity  Accuracy
Building    93.55%       53.37%       86.86%
Route       99.56%       94.61%       97.39%
Route Set   100%         96.27%       98.26%

Page 14: Provenance Analytics at AAAI Human Computation Conference 2013


Classification Decision Tree (for Routes)

Page 15: Provenance Analytics at AAAI Human Computation Conference 2013

Crowd Behavioural Change

Relevance values of various features in classifying buildings.

Network Metric        Local   AMT
Number of nodes       0.087   0.474
Number of edges       0.900   0.505
Graph diameter        0.012   0.021
MFD (entity entity)   0.001   –

Page 16: Provenance Analytics at AAAI Human Computation Conference 2013

Conclusions

A novel methodology for analysing properties of crowd-generated data using provenance graphs:
– generic and application-independent
– allowing the application of machine learning techniques
– a principled approach to interpret crowdsourced data.

Page 17: Provenance Analytics at AAAI Human Computation Conference 2013

Future Work

• Validating the method in new application domains.
• Investigating ways to detect crowd behavioural changes.
• Extending the analytics to include provenance network metrics that characterise the evolution of provenance graphs.
• Incorporating generic node attributes (e.g. the values of votes in verification micro-tasks).
• Applying graph analytics methods to identify key agents (e.g. users), activities, data in a task or a deployment.

Page 18: Provenance Analytics at AAAI Human Computation Conference 2013

• Contact: Dong [email protected]
• CollabMap: www.collabmap.org
• The Orchid Project: www.orchid.ac.uk

Page 19: Provenance Analytics at AAAI Human Computation Conference 2013

Deployments

Local Community
• People living in the area
• Statistics:
– Users: 160
– Buildings: 5,151
– Routes: 4,914
– Route sets: 5,142
– Votes: 44,158
– Micro-tasks: 38,448

Amazon Mechanical Turk
• World-wide
• Statistics:
– Users: 424
– Buildings: 800
– Routes: 994
– Route sets: 1,000
– Votes: 9,581
– Micro-tasks: 9,039

Page 20: Provenance Analytics at AAAI Human Computation Conference 2013


Recording Provenance

Page 21: Provenance Analytics at AAAI Human Computation Conference 2013

The CollabMap workflow

Five micro-tasks:
A. Draw a building
B. Verify a building
C. Draw an evacuation route
D. Verify route(s)
E. Verify completion

Page 22: Provenance Analytics at AAAI Human Computation Conference 2013

Local Deployment

Network Metric            Building  Route   Route Set
Number of nodes           0.087     0.704   0.502
Number of edges           0.900     0.193   0.190
Graph diameter            0.012     0.025   0.308
MFD (entity entity)       0.001     0.067   –
MFD (entity activity)     –         0.006   –
MFD (activity activity)   –         0.005   –

AMT Deployment

Network Metric            Building  Route   Route Set
Number of nodes           0.474     0.893   0.230
Number of edges           0.505     0.020   0.770
Graph diameter            0.021     0.046   –
MFD (entity entity)       –         0.006   –
MFD (entity activity)     –         0.035   –
MFD (activity activity)   –         –       –

Page 23: Provenance Analytics at AAAI Human Computation Conference 2013

Local classifiers' performance on AMT data

            Sensitivity  Specificity  Accuracy
Building    72.43%       50.19%       77.23%
Route       99.78%       93.08%       96.48%
Route Set   100%         90.53%       95.05%