Provenance Analytics at AAAI Human Computation Conference 2013

Interpretation of Crowdsourced Activities using Provenance Network Analysis

T. Dong Huynh (1), Mark Ebden (2), Mateo Venanzi (1), Sarvapali Ramchurn (1), Stephen Roberts (2), and Luc Moreau (1)

(1) University of Southampton
(2) University of Oxford
Corresponding author: [email protected]

Talk Outline
1. CollabMap: a crowdsourcing mapping application
2. Provenance
3. Provenance network analysis
4. Data quality classification
5. Conclusions and future work

• 38,000 micro-tasks
• 160 contributors
• 5,151 buildings

Provenance Overview


Provenance in CollabMap


Example Provenance Graph


Provenance graphs as networks

• Network analysis
• ...

Network metrics:
– Number of nodes
– Number of edges
– Graph diameter
– Maximum finite distances (between each pair of node types: entities, activities, agents)
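The network metrics listed on this slide can be computed with a few breadth-first searches. The following is a minimal sketch, not the authors' implementation. It assumes that the graph diameter is measured with edge direction ignored and that the maximum finite distance (MFD) follows edge direction and skips unreachable pairs; the talk does not state either convention. The function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return shortest-path lengths from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def network_metrics(node_types, edges):
    """Compute the network metrics of one provenance graph.

    node_types maps a node id to "entity", "activity" or "agent";
    edges is a list of directed (source, target) pairs.
    """
    directed = {node: set() for node in node_types}
    undirected = {node: set() for node in node_types}
    for source, target in edges:
        directed[source].add(target)
        undirected[source].add(target)
        undirected[target].add(source)

    # Assumption: the diameter ignores edge direction.
    diameter = max(
        (max(bfs_distances(undirected, node).values()) for node in node_types),
        default=0,
    )

    # Assumption: MFD follows edge direction; unreachable pairs are skipped.
    mfd = {}
    for source in node_types:
        for target, distance in bfs_distances(directed, source).items():
            if target != source:
                pair = (node_types[source], node_types[target])
                mfd[pair] = max(mfd.get(pair, 0), distance)

    return {
        "nodes": len(node_types),
        "edges": len(edges),
        "diameter": diameter,
        "mfd": mfd,
    }


# Hypothetical toy graph: one drawn building and one verification vote.
node_types = {
    "building1": "entity", "vote1": "entity",
    "draw1": "activity", "verify1": "activity",
    "user1": "agent", "user2": "agent",
}
edges = [
    ("building1", "draw1"),    # building1 was generated by draw1
    ("draw1", "user1"),        # draw1 was carried out by user1
    ("vote1", "verify1"),      # vote1 was generated by verify1
    ("verify1", "building1"),  # verify1 used building1
    ("verify1", "user2"),      # verify1 was carried out by user2
]
metrics = network_metrics(node_types, edges)
print(metrics)
```

Each value in the returned dictionary can serve as one input feature for a classifier, with one MFD feature per ordered pair of node types.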


Dependency Graphs


Methodology

Buildings, routes, and route sets generated in CollabMap were given trust labels, trusted or uncertain, calculated from their user votes.

The data were randomly divided into training sets and test sets.

Decision tree classifiers were trained on the training sets to predict the trust labels of the buildings, routes, and route sets, taking their dependency graphs’ network metrics as input features.

The sensitivity, specificity, and accuracy of the classifiers were assessed on the relevant test sets.
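The steps above (random split, training, assessment) can be sketched end to end. This is a dependency-free illustration under stated assumptions, not the classifiers used in the study: a depth-1 decision tree (a decision stump) stands in for the full decision trees, "trusted" is assumed to be the positive class, the 70/30 split ratio is arbitrary, and the toy samples and their labelling rule are hypothetical rather than CollabMap data.

```python
import random

TRUSTED, UNCERTAIN = "trusted", "uncertain"


def split_data(samples, test_fraction=0.3, seed=42):
    """Randomly divide samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict(stump, features):
    """Apply a depth-1 tree: compare one feature against one threshold."""
    feature, threshold, low_label, high_label = stump
    return low_label if features[feature] <= threshold else high_label


def train_stump(training_set):
    """Choose the feature, threshold and leaf labels with the fewest training errors."""
    best_errors, best_stump = None, None
    for feature in training_set[0][0]:
        thresholds = sorted({features[feature] for features, _ in training_set})
        for threshold in thresholds:
            for low, high in ((TRUSTED, UNCERTAIN), (UNCERTAIN, TRUSTED)):
                stump = (feature, threshold, low, high)
                errors = sum(
                    predict(stump, features) != label
                    for features, label in training_set
                )
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump


def evaluate(stump, samples, positive=TRUSTED):
    """Return sensitivity, specificity and accuracy on the given samples."""
    tp = fn = tn = fp = 0
    for features, label in samples:
        guess = predict(stump, features)
        if label == positive:
            tp += guess == positive
            fn += guess != positive
        else:
            fp += guess == positive
            tn += guess != positive

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "accuracy": ratio(tp + tn, len(samples)),
    }


# Hypothetical samples: (network metrics of a dependency graph, trust label).
samples = [
    ({"nodes": n, "edges": 2 * n - 1}, TRUSTED if n > 20 else UNCERTAIN)
    for n in range(5, 45)
]
training_set, test_set = split_data(samples)
stump = train_stump(training_set)
training_results = evaluate(stump, training_set)
test_results = evaluate(stump, test_set)
print(stump, test_results)
```

Keeping the test set out of training is what makes the reported sensitivity, specificity, and accuracy an estimate of performance on unseen data.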


Classification Performance

Local Deployment

Data type   Sensitivity  Specificity  Accuracy
Building    96.61%       99.17%       97.00%
Route       94.78%       97.32%       95.28%
Route Set   97.23%       97.78%       97.77%

AMT Deployment

Data type   Sensitivity  Specificity  Accuracy
Building    93.55%       53.37%       86.86%
Route       99.56%       94.61%       97.39%
Route Set   100%         96.27%       98.26%

Classification Decision Tree (for Routes)


Crowd Behavioural Change

Network Metric        Local   AMT
Number of nodes       0.087   0.474
Number of edges       0.900   0.505
Graph diameter        0.012   0.021
MFD (entity entity)   0.001   –

Relevance values of various features in classifying buildings.

Conclusions

A novel methodology for analysing properties of crowd-generated data using provenance graphs:
– generic and application-independent
– allowing the application of machine learning techniques
– a principled approach to interpret crowdsourced data.

Future Work
• Validating the method in new application domains.
• Investigating ways to detect crowd behavioural changes.
• Extending the analytics to include provenance network metrics that characterise the evolution of provenance graphs.
• Incorporating generic node attributes (e.g. the values of votes in verification micro-tasks).
• Applying graph analytics methods to identify key agents (e.g. users), activities, and data in a task or a deployment.

• Contact: Dong [email protected]
• CollabMap: www.collabmap.org
• The Orchid Project: www.orchid.ac.uk


Deployments

Local Community
• People living in the area
• Statistics:
– Users: 160
– Buildings: 5,151
– Routes: 4,914
– Route sets: 5,142
– Votes: 44,158
– Micro-tasks: 38,448

Amazon Mechanical Turk
• World-wide
• Statistics:
– Users: 424
– Buildings: 800
– Routes: 994
– Route sets: 1,000
– Votes: 9,581
– Micro-tasks: 9,039

Recording Provenance

The CollabMap workflow

Five micro-tasks:
A. Draw a building
B. Verify a building
C. Draw an evacuation route
D. Verify route(s)
E. Verify completion

Relevance values of network metrics by data type

AMT Deployment

Network Metric            Building  Route   Route Set
Number of nodes           0.474     0.893   0.230
Number of edges           0.505     0.020   0.770
Graph diameter            0.021     0.046   –
MFD (entity entity)       –         0.006   –
MFD (entity activity)     –         0.035   –
MFD (activity activity)   –         –       –

Local Deployment

Network Metric            Building  Route   Route Set
Number of nodes           0.087     0.704   0.502
Number of edges           0.900     0.193   0.190
Graph diameter            0.012     0.025   0.308
MFD (entity entity)       0.001     0.067   –
MFD (entity activity)     –         0.006   –
MFD (activity activity)   –         0.005   –

Local classifiers’ performance on AMT data

Data type   Sensitivity  Specificity  Accuracy
Building    72.43%       50.19%       77.23%
Route       99.78%       93.08%       96.48%
Route Set   100%         90.53%       95.05%
