Jacobs socs-2013



Computer Vision with Humans in the Loop

David Jacobs (University of Maryland, College Park)

Introduction

- BIOTRACKER:
  - Combines computer vision, state-of-the-art mobile phone technologies, and the internet
  - Encourages science enthusiasts to gather biological data
  - Helps scientists identify new species
- Several projects under the large umbrella called BIOTRACKER:
  - Clustering images with humans in the loop
  - Subclustering: summarizing large image databases
  - Odd Leaf Out: a computer game to identify labeling errors
  - And many others!

Active Image Clustering (Biswas and Jacobs)

Goal: Improve clustering performance while minimizing total human effort

- Cluster images with pairwise constraints (must-link and can’t-link) from humans
- Main contribution: find the best image pair out of O(N^2) possible image pairs
- Look at the effect of each image pair on the overall clustering
- Choose the pair for which the expected change in clustering is maximum
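The pair-selection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold-based single-link clustering, the distance-based guess for the probability of a must-link answer, and all function names (`cluster`, `changed_pairs`, `best_pair`) are assumptions made only for illustration.

```python
from itertools import combinations


def cluster(dist, threshold):
    """Toy single-link clustering (assumption): join points closer than threshold."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] < threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]


def changed_pairs(labels_a, labels_b):
    """Count point pairs whose same-cluster status differs between two clusterings."""
    n = len(labels_a)
    return sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in combinations(range(n), 2)
    )


def with_constraint(dist, i, j, value):
    """Copy the distance matrix and overwrite the distance between i and j."""
    new = [row[:] for row in dist]
    new[i][j] = new[j][i] = value
    return new


def best_pair(dist, threshold):
    """Return the pair with the largest expected change in clustering, and that score."""
    base = cluster(dist, threshold)
    max_d = max(max(row) for row in dist) or 1.0
    best, best_score = None, -1.0
    for i, j in combinations(range(len(dist)), 2):
        # Assumed answer model: closer images are more likely to be must-link.
        p_same = 1.0 - dist[i][j] / max_d
        # Must-link is simulated with distance 0; can't-link only removes the
        # direct edge, which is a simplification of a real can't-link constraint.
        must = cluster(with_constraint(dist, i, j, 0.0), threshold)
        cant = cluster(with_constraint(dist, i, j, float("inf")), threshold)
        score = p_same * changed_pairs(base, must) + (1.0 - p_same) * changed_pairs(base, cant)
        if score > best_score:
            best, best_score = (i, j), score
    return best, best_score
```

Each candidate pair triggers two re-clusterings, so this brute-force loop is only practical for small N; it is meant to show the selection criterion, not an efficient search.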

Experimental Results

- Clustering performance is evaluated using the Relative Jaccard’s Coefficient w.r.t. ground truth
- We use two different domains (leaves and faces):
  - Leaf dataset (subset of the database collected for Leafsnap)
  - Face dataset (subset of the PubFig dataset)

[Figure: (a) Leaf - 1042 (b) Face - 500]
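For reference, the standard pairwise Jaccard coefficient between a clustering and the ground truth can be sketched as follows. The poster does not define the "relative" variant, so it is not implemented here; the function name `pairwise_jaccard` is an assumption for illustration.

```python
from itertools import combinations


def pairwise_jaccard(predicted, truth):
    """Pairs grouped together in both labelings, divided by pairs grouped together in either."""
    both = either = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        both += same_pred and same_true
        either += same_pred or same_true
    # Two labelings that never group any pair agree trivially.
    return both / either if either else 1.0
```

A value of 1.0 means the clustering groups exactly the same pairs as the ground truth; lower values mean more pairs are wrongly merged or wrongly split.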

Active Subclustering (Biswas and Jacobs)

[Figure: ACTIVE SUBCLUSTERING and PASSIVE SUBCLUSTERING; DIFFERENT FINAL SUBCLUSTERING OUTPUT]

- Clustering large datasets is hard, even with humans in the loop
- Cluster only a subset of the data; this is useful in many applications

Odd Leaf Out (Hansen et al.)

- Odd Leaf Out is an online game.
- The game helps in refining large image databases for computer vision research.
- It is fun for players and provides useful information for vision researchers and biological enthusiasts.
- Research questions:
  - How do we build a game that is interesting, simple, and useful?
  - How can we motivate users to continue to play when we are dealing with some imperfect data that will sometimes provide two “correct” answers?
  - How do we choose the game elements (in Odd Leaf Out, a set of six images)?
  - How can data provided by novice users be employed to enhance the work of experts?

Game Design

Selection of Image Sets: We choose five images from one species and one from a different one. We can create a set using each leaf in our database as a seed leaf (say this is Li1 and is in species S). The other five leaves are chosen in the following way:

- Seed leaf (Li1)
- Least similar leaf to the seed leaf in S (Li2)
- A leaf from a species other than S (Lj); set difficulty depends on the distance between Lj and Li1
- Distinct randomly chosen leaf from S (Li3)
- Distinct randomly chosen leaf from S (Li4)
- Distinct randomly chosen leaf from S (Li5)
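The selection rule above can be sketched as follows. This is a minimal illustration, not the game's actual code: the function name `build_image_set`, the `species` mapping, and the `distance` callable are hypothetical, and picking the odd leaf uniformly at random is an assumption, since the poster only says that set difficulty depends on its distance to the seed.

```python
import random


def build_image_set(seed, leaves, species, distance, rng=random):
    """Build one six-image set: five leaves from the seed's species and one odd leaf."""
    same = [leaf for leaf in leaves if species[leaf] == species[seed] and leaf != seed]
    other = [leaf for leaf in leaves if species[leaf] != species[seed]]
    if len(same) < 4 or not other:
        raise ValueError("need four more leaves of the seed species and one other species")

    # Li2: the leaf of the same species that is least similar to the seed.
    least_similar = max(same, key=lambda leaf: distance(seed, leaf))
    # Li3..Li5: three distinct random leaves from the rest of the species.
    remaining = [leaf for leaf in same if leaf != least_similar]
    randoms = rng.sample(remaining, 3)
    # Lj: odd leaf from another species (random choice is an assumption).
    odd = rng.choice(other)

    return {
        "seed": seed,
        "least_similar": least_similar,
        "random": randoms,
        "odd": odd,
        # Distance between Lj and Li1, which the poster ties to set difficulty.
        "odd_distance": distance(seed, odd),
    }
```

Passing a seeded `random.Random` instance as `rng` makes the generated sets reproducible, which helps when comparing how players respond to the same set.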

Different versions of the game: We have four versions of the game: Three Lives, Contestation, Multiple Guesses, and Skip.

Database: For all our experiments, we use the leaf dataset collected as part of a project called Leafsnap. Leafsnap is an iPhone application developed by researchers at the University of Maryland, Columbia University, and the Smithsonian Institution. The iPhone application is now available in the Apple App Store!

What Do We Get From This Game?

- Identify errors in the dataset
- Discover whether color helps humans identify leaves (caution: leaf color changes over the year)
- Feedback on how enjoyable or difficult the game is, which we will use to improve the game

[Figure: the game interface]

Example Cases

We give two sample scenarios that can happen if labels are wrong; in reality we see many other scenarios.

- When the Odd leaf is wrongly labeled, it can be the same as the other five leaves. Players pick all the leaves with equal probability.
- When one of the non-Odd leaves is wrongly labeled, there are two different-looking leaves. Players pick the Odd leaf and the wrongly labeled leaf with equal probabilities.
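The two scenarios suggest a simple check on how player picks are distributed across a set. The sketch below is only an illustration of that reasoning: the function name `diagnose_set`, the returned labels, and the `tolerance` threshold are assumptions, not part of the game.

```python
def diagnose_set(picks, odd_index, tolerance=0.1):
    """Classify one image set from pick counts; picks[k] is how often leaf k was chosen."""
    total = sum(picks)
    if total == 0:
        return ("no_data", None)
    shares = [count / total for count in picks]
    uniform = 1.0 / len(picks)

    # Scenario 1: picks spread evenly, so the Odd leaf may really match the others.
    if all(abs(share - uniform) <= tolerance for share in shares):
        return ("odd_leaf_suspect", odd_index)

    # Scenario 2: the Odd leaf and exactly one other leaf split the picks about evenly.
    rivals = [
        k for k, share in enumerate(shares)
        if k != odd_index and abs(share - 0.5) <= tolerance
    ]
    if abs(shares[odd_index] - 0.5) <= tolerance and len(rivals) == 1:
        return ("non_odd_leaf_suspect", rivals[0])

    return ("no_obvious_error", None)
```

A flagged set is only a candidate for expert review; as noted above, real play data shows many other scenarios that these two rules do not cover.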

About Biotracker

- People in Biotracker: David Jacobs, Jennifer Preece, Derek Hansen, Dana Rotman, Anne Bowser, Carol Boston, Yurong He, Arijit Biswas, Jen Hammond, Cynthia Parr, and many others!

- Publications from Biotracker:
  - Arijit Biswas, David Jacobs. Active Image Clustering: Seeking Constraints from Humans to Complement Algorithms. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  - Derek Hansen, David Jacobs, Darcy Lewis, Arijit Biswas, Jennifer Preece, Dana Rotman, and Eric Stevens. Odd Leaf Out: Improving visual recognition with games. In Proceedings of the IEEE International Conference on Social Computing, Boston, MA, 2011.
  - Ahn J., Hammock J., Parr C., Preece J., Shneiderman B., Schulz K., Hansen D., Rotman D., He Y. Visually Exploring Social Participation in Encyclopedia of Life. ASE International Conference on Social Informatics, 2012.
  - Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C.S., Lewis, D., Jacobs, D. Dynamic changes in motivation in collaborative ecological citizen science projects. CSCW 2012.
  - Rotman, D., Procita, K., Hansen, D., Sims Parr, C., and Preece, J. Supporting content curation communities: The case of the Encyclopedia of Life. J. Am. Soc. Inf. Sci., 2012.
  - Neeraj Kumar, Peter N. Belhumeur, Arijit Biswas, David Jacobs, W. John Kress, Ida Lopez, Joao V. B. Soares. Leafsnap: A Computer Vision System for Automatic Plant Species Identification. European Conference on Computer Vision (ECCV), 2012.

Conclusion

- Improved image clustering with humans in the loop
- Clustering a subset of a dataset
- Finding labeling errors in large image databases
- Many other projects are ongoing!

Acknowledgement: This work was supported by NSF grant #0968546.

University of Maryland, College Park email: [email protected] WWW: http://biotrackers.net/