Provenance Analytics at AAAI Human Computation Conference 2013
Interpretation of Crowdsourced Activities using Provenance Network Analysis
T. Dong Huynh (1), Mark Ebden (2), Mateo Venanzi (1), Sarvapali Ramchurn (1), Stephen Roberts (2), and Luc Moreau (1)
(1) University of Southampton, (2) University of Oxford
Corresponding author: [email protected]
Talk Outline
1. CollabMap: a crowdsourcing mapping application
2. Provenance
3. Provenance network analysis
4. Data quality classification
5. Conclusions and future work
• 38,000 micro-tasks
• 160 contributors
• 5,151 buildings
Provenance Overview
Provenance in CollabMap
Example Provenance Graph
Provenance graphs as networks
• Network analysis
• ...
Network metrics:
– Number of nodes
– Number of edges
– Graph diameter
– Maximum finite distances (MFD) between each pair of node types (entities, activities, agents)
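The metrics listed on this slide can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the toy graph and all node names are hypothetical, the diameter is taken over the undirected version of the graph, and MFD is taken as the longest finite directed shortest-path distance between two node types. The talk does not spell out these definitions, so treat them as one plausible reading.

```python
from collections import deque

# Hypothetical toy provenance graph (not CollabMap data): node -> node type.
NODE_TYPES = {
    "building1": "entity",
    "vote1": "entity",
    "draw_building": "activity",
    "verify_building": "activity",
    "user1": "agent",
    "user2": "agent",
}
# Directed edges (source depends on target), again purely illustrative.
EDGES = [
    ("building1", "draw_building"),    # entity generated by activity
    ("draw_building", "user1"),        # activity associated with agent
    ("verify_building", "building1"),  # activity used entity
    ("vote1", "verify_building"),      # entity generated by activity
    ("verify_building", "user2"),      # activity associated with agent
]


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def network_metrics(node_types, edges):
    directed = {n: [] for n in node_types}
    undirected = {n: [] for n in node_types}
    for src, dst in edges:
        directed[src].append(dst)
        undirected[src].append(dst)
        undirected[dst].append(src)

    # Assumption: diameter = longest shortest path, ignoring edge direction.
    diameter = max(
        max(bfs_distances(undirected, n).values()) for n in node_types
    )

    # Assumption: MFD(type_a, type_b) = longest finite directed distance
    # from any node of type_a to any other node of type_b.
    mfd = {}
    for src in node_types:
        for dst, d in bfs_distances(directed, src).items():
            if d == 0:
                continue  # skip the source itself
            key = (node_types[src], node_types[dst])
            mfd[key] = max(mfd.get(key, 0), d)

    return {
        "nodes": len(node_types),
        "edges": len(edges),
        "diameter": diameter,
        "mfd": mfd,
    }


metrics = network_metrics(NODE_TYPES, EDGES)
print(metrics)
```

Type pairs with no connecting path (for example agent to entity in this toy graph) simply do not appear in the `mfd` dictionary, which corresponds to the "–" entries in a metrics table.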
Dependency Graphs
Methodology
• Buildings, routes, and route sets generated in CollabMap were given a trust label, either trusted or uncertain, calculated from their user votes.
• The data were randomly divided into training sets and test sets.
• Decision tree classifiers were trained on the training sets to predict the trust labels of the buildings, routes, and route sets, taking the network metrics of their dependency graphs as input features.
• The sensitivity, specificity, and accuracy of the classifiers were assessed on the corresponding test sets.
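The split and evaluation steps can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the helper names, the 30% test fraction, and the choice of "trusted" as the positive class are all assumptions, and the decision tree training itself is left out.

```python
import random


def train_test_split(items, test_fraction=0.3, seed=0):
    """Randomly divide items into (training set, test set).

    test_fraction and seed are illustrative defaults, not values from the talk.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(true_labels, predicted_labels, positive="trusted"):
    """Sensitivity, specificity and accuracy for a two-class problem.

    Assumption: "trusted" is the positive class, "uncertain" the negative one.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Hypothetical labels, only to exercise the functions.
truth = ["trusted", "trusted", "trusted", "trusted", "uncertain", "uncertain"]
preds = ["trusted", "trusted", "trusted", "uncertain", "uncertain", "trusted"]
scores = evaluate(truth, preds)
train, test = train_test_split(range(10))
print(scores, len(train), len(test))
```

Because accuracy is a weighted average of sensitivity and specificity (weighted by the share of each class in the test set), it always lies between the two.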
Classification Performance

Local Deployment
            Sensitivity  Specificity  Accuracy
Building    96.61%       99.17%       97.00%
Route       94.78%       97.32%       95.28%
Route Set   97.23%       97.78%       97.77%

AMT Deployment
            Sensitivity  Specificity  Accuracy
Building    93.55%       53.37%       86.86%
Route       99.56%       94.61%       97.39%
Route Set   100%         96.27%       98.26%
Classification Decision Tree (for Routes)
Crowd Behavioural Change
Relevance values of various features in classifying buildings.

Network Metric        Local   AMT
Number of nodes       0.087   0.474
Number of edges       0.900   0.505
Graph diameter        0.012   0.021
MFD (entity entity)   0.001   -
Conclusions
A novel methodology for analysing properties of crowd-generated data using provenance graphs:
– generic and application-independent
– allowing the application of machine learning techniques
– a principled approach to interpret crowdsourced data.
Future Work
• Validating the method in new application domains.
• Investigating ways to detect crowd behavioural changes.
• Extending the analytics to include provenance network metrics that characterise the evolution of provenance graphs.
• Incorporating generic node attributes (e.g. the values of votes in verification micro-tasks).
• Applying graph analytics methods to identify key agents (e.g. users), activities, and data in a task or a deployment.
• Contact: Dong [email protected]
• CollabMap: www.collabmap.org
• The Orchid Project: www.orchid.ac.uk
Deployments

Local Community
• People living in the area
• Statistics:
– Users: 160
– Buildings: 5,151
– Routes: 4,914
– Route sets: 5,142
– Votes: 44,158
– Micro-tasks: 38,448

Amazon Mechanical Turk
• World-wide
• Statistics:
– Users: 424
– Buildings: 800
– Routes: 994
– Route sets: 1,000
– Votes: 9,581
– Micro-tasks: 9,039
Recording Provenance
The CollabMap workflow
Five micro-tasks:
A. Draw a building
B. Verify a building
C. Draw an evacuation route
D. Verify route(s)
E. Verify completion
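To make the idea of recording provenance for such micro-tasks concrete, here is a rough sketch covering tasks A and B only. The relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`) are standard W3C PROV relations, but the `ProvenanceLog` class, its method, and every identifier are hypothetical; how CollabMap actually records provenance is not shown in this transcript.

```python
class ProvenanceLog:
    """Hypothetical in-memory provenance store (not CollabMap's implementation)."""

    def __init__(self):
        self.nodes = {}      # node id -> "entity" | "activity" | "agent"
        self.relations = []  # (relation name, subject id, object id)

    def record_microtask(self, activity_id, user_id, used=(), generated=()):
        """Record one micro-task: who performed it, what it read and produced."""
        self.nodes[activity_id] = "activity"
        self.nodes[user_id] = "agent"
        self.relations.append(("wasAssociatedWith", activity_id, user_id))
        for entity_id in used:
            self.nodes[entity_id] = "entity"
            self.relations.append(("used", activity_id, entity_id))
        for entity_id in generated:
            self.nodes[entity_id] = "entity"
            self.relations.append(("wasGeneratedBy", entity_id, activity_id))


log = ProvenanceLog()
# Micro-task A: a user draws a building outline.
log.record_microtask("draw_building_1", "user_a", generated=["building_1"])
# Micro-task B: another user verifies that building and casts a vote.
log.record_microtask(
    "verify_building_1", "user_b", used=["building_1"], generated=["vote_1"]
)

for relation in log.relations:
    print(relation)
```

Each relation is a directed edge between typed nodes, so a log like this can be read directly as a provenance graph over entities, activities, and agents.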
AMT Deployment
Network Metric            Building  Route   Route Set
Number of nodes           0.474     0.893   0.230
Number of edges           0.505     0.020   0.770
Graph diameter            0.021     0.046   –
MFD (entity entity)       –         0.006   –
MFD (entity activity)     –         0.035   –
MFD (activity activity)   –         –       –

Local Deployment
Network Metric            Building  Route   Route Set
Number of nodes           0.087     0.704   0.502
Number of edges           0.900     0.193   0.190
Graph diameter            0.012     0.025   0.308
MFD (entity entity)       0.001     0.067   –
MFD (entity activity)     –         0.006   –
MFD (activity activity)   –         0.005   –
Local classifiers’ performance on AMT data
            Sensitivity  Specificity  Accuracy
Building    72.43%       50.19%       77.23%
Route       99.78%       93.08%       96.48%
Route Set   100%         90.53%       95.05%