![Page 1: Taking’Computer’Vision’ Into’The’Wild’courses.cs.washington.edu/courses/...into-the-wild.pdf · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011](https://reader034.fdocuments.us/reader034/viewer/2022052013/602a02e1f6dfea64583a77bf/html5/thumbnails/1.jpg)
Taking Computer Vision Into The Wild
Neeraj Kumar
October 4, 2011 CSE 590V – Fall 2011
University of Washington
![Page 2: Taking’Computer’Vision’ Into’The’Wild’courses.cs.washington.edu/courses/...into-the-wild.pdf · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011](https://reader034.fdocuments.us/reader034/viewer/2022052013/602a02e1f6dfea64583a77bf/html5/thumbnails/2.jpg)
A Joke
Q. What is computer vision?
A. If it doesn’t work (in the wild), it’s computer vision.
(I’m only half-joking)

Instant Object Recognition Paper*
1. Design new algorithm
2. Pick dataset(s) to evaluate on
3. Repeat until conference deadline:
   a. Train classifiers
   b. Evaluate on test set
   c. Tune parameters and tweak algorithm
4. Brag about results with ROC curves
*Just add grad students
- Fixed set of training examples
- Fixed set of classes/objects
- Training examples only have one object, often in center of image
- Fixed test set, usually from same overall dataset as training
- MTurk filtering, pruning responses, long training times, …
- How does it do on real data? New classes?

Instant Object Recognition Paper
1. User proposes new object class
2. System gathers images from Flickr
3. Repeat until convergence:
   a. Choose windows to label
   b. Get labels from MTurk
   c. Improve classifier (detector)
4. Also evaluate on Pascal VOC
[S. Vijayanarasimhan & K. Grauman – Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds (CVPR 2011)]
- Which windows to pick?
- Which images to label?
- What representation?
- How does it compare to state of the art?

Object Representation
Root from here
Deformable Parts: Root + Parts + Context
P=6 parts, from bootstrap set
C=3 context windows, excluding object candidate, defined to the left, right, above

Features: Sparse Max Pooling

| Step | Bag of Words | Sparse Max Pooling |
| --- | --- | --- |
| Base features | SIFT | SIFT |
| Build vocabulary tree | ✔ | ✔ |
| Quantize features | Nearest neighbor, hard decision | Weighted nearest neighbors, sparse coded |
| Aggregate features | Spatial pyramid | Max pooling |

[Y.-L. Boureau, F. Bach, Y. LeCun, J. Ponce – Learning Mid-level Features for Recognition (CVPR 2010)]
[J. Yang, K. Yu, Y. Gong, T. Huang – Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification (CVPR 2009)]
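
The two columns of the comparison above can be contrasted in a few lines of NumPy. This is only a sketch under stated assumptions: the function names, the `k` and `sigma` parameters, and the Gaussian weighting over the `k` nearest codewords are illustrative stand-ins for true sparse coding, and the spatial pyramid and vocabulary tree are left out.

```python
import numpy as np

def pairwise_sq_dists(descriptors, codebook):
    """Squared Euclidean distance from every descriptor to every codeword."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2)

def bag_of_words(descriptors, codebook):
    """Hard decision: each descriptor counts only for its nearest codeword."""
    nearest = pairwise_sq_dists(descriptors, codebook).argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def sparse_max_pool(descriptors, codebook, k=2, sigma=1.0):
    """Soft decision over the k nearest codewords, then max pooling.

    Assumption: Gaussian weights on the k nearest codewords stand in for
    a real sparse-coding solver.
    """
    d = pairwise_sq_dists(descriptors, codebook)
    nn = np.argsort(d, axis=1)[:, :k]              # k nearest codewords
    rows = np.arange(len(descriptors))[:, None]
    d_nn = d[rows, nn]
    # Subtract the row minimum so the exponentials cannot all underflow.
    w = np.exp(-(d_nn - d_nn.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    codes = np.zeros_like(d)
    codes[rows, nn] = w / w.sum(axis=1, keepdims=True)
    # Max pooling: keep the strongest response per codeword.
    return codes.max(axis=0)
```

With hard assignment, repeated descriptors raise a bin count; with max pooling, a codeword's value reflects only its single strongest activation in the window.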

How to Generate Root Windows?
100,000s of possible locations, aspect ratios, sizes
× 1000s of images
= too many possibilities!

Jumping Windows
[Figure: training image and novel query image]
• Build lookup table of how frequently a given feature in a grid cell predicts a bounding box
• Use lookup table to vote for candidate windows in query image, à la generalized Hough transform
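
The two bullets above can be sketched as a small voting scheme. This is a simplified illustration, not the method from the paper: it stores exact pixel offsets instead of grid-cell bins, skips merging of nearby windows, and the names `build_lookup` and `vote_for_windows` as well as the `(word, x, y)` feature format are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_lookup(training_images):
    """Record, per visual word, where the bounding box sat relative to it.

    training_images: list of (features, box) pairs, where features is a
    list of (word, x, y) and box is (x0, y0, width, height).
    """
    table = defaultdict(Counter)
    for features, (x0, y0, w, h) in training_images:
        for word, x, y in features:
            inside = x0 <= x < x0 + w and y0 <= y < y0 + h
            if inside:
                # Key: feature offset inside the box plus the box size.
                table[word][(x - x0, y - y0, w, h)] += 1
    return table

def vote_for_windows(features, table):
    """Each query feature votes for the boxes its word predicted in training."""
    votes = Counter()
    for word, x, y in features:
        for (dx, dy, w, h), count in table.get(word, {}).items():
            votes[(x - dx, y - dy, w, h)] += count
    return votes
```

Calling `vote_for_windows(query_features, table).most_common(n)` then gives the `n` candidate root windows with the most support, so only those need to be scored by the classifier.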

Pick Examples via Hyperplane Hashing
[P. Jain, S. Vijayanarasimhan & K. Grauman – Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning (NIPS 2010)]
• Want to label “hard” examples near the hyperplane boundary
• But hyperplane keeps changing, so have to recompute distances…
• Instead, hash all unlabeled examples into table
• At run-time, hash current hyperplane to get index into table, to pick examples close to it
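
A minimal sketch of the idea in the bullets above, using a two-bit hash in which the second bit is computed on the negated normal for the hyperplane query. Under that construction a point perpendicular to the normal (that is, lying on the hyperplane) collides with the query with probability about 1/4 per bit pair, and a point parallel to the normal essentially never collides. The function names and the single-table layout are assumptions for illustration; a practical system would use several tables and tune the number of bit pairs.

```python
import numpy as np
from collections import defaultdict

def make_hash_family(dim, n_pairs, rng):
    """Draw n_pairs random Gaussian direction pairs (u, v)."""
    U = rng.standard_normal((n_pairs, dim))
    V = rng.standard_normal((n_pairs, dim))
    return U, V

def hash_point(z, U, V):
    """Hash key of a database point z: signs of u.z and v.z."""
    bits = np.concatenate([U @ z >= 0, V @ z >= 0])
    return tuple(bits.tolist())

def hash_hyperplane(w, U, V):
    """Hash key of a hyperplane with normal w: signs of u.w and v.(-w)."""
    bits = np.concatenate([U @ w >= 0, V @ (-w) >= 0])
    return tuple(bits.tolist())

def build_table(points, U, V):
    """Hash every unlabeled example once, before active learning starts."""
    table = defaultdict(list)
    for idx, z in enumerate(points):
        table[hash_point(z, U, V)].append(idx)
    return table

def candidates_near(w, table, U, V):
    """Indices of examples sharing a bucket with the current hyperplane."""
    return table.get(hash_hyperplane(w, U, V), [])
```

Because the table is built once, each active-learning round only has to hash the updated normal `w` instead of recomputing distances to every unlabeled example.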

Comparison on Pascal VOC
• Comparable to state-of-the-art, better on a few classes
• Many fewer annotations required!
• Training time is 15 mins vs 7 hours (LSVM) vs 1 week (SP+MKL)

Online Live Learning for Pascal
• Comparable to state-of-the-art, better on fewer classes
• But using Flickr data vs. Pascal data, and automatically

Sample Results
[Figure: example detections, grouped as Correct and Incorrect]

Lessons Learned
• It is possible to leave the sandbox
• And still do well on sandbox evaluations
• Sparse max pooling with a part model works well
• Linear SVMs can be competitive with these features
• Jumping windows is MUCH faster than sliding
• Picking examples to get labeled is a big win
• Linear SVMs also allow for fast hyperplane hashing

Limitations
“Hell is other ~~people~~ users”
With apologies to Jean-Paul Sartre

Solving Real Problems for Users
Users want to do stuff
It doesn’t work well enough
Users express their displeasure gracefully*
*With apologies to John Gabriel

…And Never The Twain Shall Meet?
[Figure: bar chart, "Pascal VOC Results from Previous Paper". Vertical axis from 0 to 100. Classes: bicycle, bird, bottle, car, chair, dining table, horse, person, potted plant, sofa, tv/monitor, and Mean. Methods: Ours, BoF SP, LLC SP, LSVM+HOG, SP+MKL. Annotations: "Current Best of Vision Algorithms", "User Expectation", "?"]

Unsolved Vision Problems
Segmentation, Detection, Shape Estimation, Stereo, Tracking, Geometry
Simplify Problem!

Columbia University
University of Maryland
Smithsonian Institution
Easier Segmentation for Leafsnap

Plants vs Birds

| Plants | Birds |
| --- | --- |
| 2D | 3D |
| Doesn’t move | Moves |
| Okay to pluck from tree | Not okay to pluck from tree |
| Mostly single color | Many colors |
| Very few parts | Many parts |
| Adequately described by boundary | Not well described by boundary |
| Relatively easy to segment | Hard to segment |

Human-Computer Cooperation
[S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder, P. Perona, S. Belongie – Visual Recognition with Humans in the Loop (ECCV 2010)]
What color is it?
Red!
Where’s the beak?
Top-right!
Where’s the tail?
Bottom-left!
Describe its beak
Uh, it’s pointy?
Where is it?
Okay.
Bottom-left!
<Shape descriptor>

20 Questions
http://20q.net/

Information Gain for 20Q
Pick most informative question to ask next
Expected information gain of class c, given image & previous responses
Probability of getting response u_i, given image & previous responses
Entropy of class c, given image and possible new response u_i
Entropy of class c right now
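
The annotations above label the terms of an equation that did not survive extraction. One reconstruction consistent with those labels is given below. The notation is an assumption: x stands for the image, U^{t-1} for the set of previous responses, and H for Shannon entropy, so the bracket is the entropy "right now" minus the entropy after the possible new response.

```latex
I\left(c;\, u_i \mid x, U^{t-1}\right)
  = \sum_{u_i} p\left(u_i \mid x, U^{t-1}\right)
    \Big[ H\left(c \mid x, U^{t-1}\right)
        - H\left(c \mid x, u_i \cup U^{t-1}\right) \Big]
```

The question asked next is the one whose index i maximizes this quantity.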

Answers make distribution peakier

Incorporating Computer Vision
Probability of class c, given image and any set of responses
Bayes’ rule
Assume variations in user responses are NOT image-dependent
Probabilities affect entropies!
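
Under the stated assumption that responses do not depend on the image once the class is known, Bayes' rule gives p(c | x, U) ∝ p(U | c) p(c | x), so the vision classifier acts as a prior that the answers then sharpen. The sketch below illustrates how that prior changes which question has the highest expected information gain. The array layout `answer_model[q, c, a] = p(answer a to question q | class c)`, the function names, and the extra assumption that answers are conditionally independent of each other given the class are all illustrative choices, not details taken from the slides.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability classes."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def posterior(vision_prior, answer_model, responses):
    """p(c | x, U): vision prior times the likelihood of each answer so far.

    responses: dict mapping question index -> answer index.
    """
    post = np.asarray(vision_prior, dtype=float).copy()
    for q, a in responses.items():
        post *= answer_model[q, :, a]
    return post / post.sum()

def expected_info_gain(post, answer_model, q):
    """Current entropy minus the expected entropy after asking question q."""
    p_answer = post @ answer_model[q]          # p(answer | x, U), shape (A,)
    gain = entropy(post)
    for a, pa in enumerate(p_answer):
        if pa > 0:
            gain -= pa * entropy(post * answer_model[q, :, a] / pa)
    return gain

def next_question(post, answer_model, asked=()):
    """Index of the not-yet-asked question with the largest expected gain."""
    gains = {q: expected_info_gain(post, answer_model, q)
             for q in range(answer_model.shape[0]) if q not in asked}
    return max(gains, key=gains.get)
```

If the vision prior already rules out half of the classes, a question that only separates those halves has zero expected gain, and a different question is chosen than under a uniform prior.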

Incorporating Computer Vision…
…leads to different questions

Ask for User Confidences

Modeling User Responses is Effective!

Birds-200 Dataset
http://www.vision.caltech.edu/visipedia/CUB-200.html

Results
With fewer questions, CV does better
With more questions, humans do better

Lessons Learned
• Computer vision is not (yet) good enough for users
• But users can meet vision halfway
• Minimizing user effort is key!
• Users are not to be trusted (fully)
• Adding vision improves recognition
• For fine-scale categorization, attributes do better than 1-vs-all classifiers if there are enough of them

| Classifier | 200 (1-vs-all) | 288 attr. | 100 attr. | 50 attr. | 20 attr. | 10 attr. |
| --- | --- | --- | --- | --- | --- | --- |
| Avg # Questions | 6.43 | 6.72 | 7.01 | 7.67 | 8.81 | 9.52 |

Limitations
• Real system still requires much human effort
• Only birds
• Collecting and labeling data
• Crowdsourcing?
• Experts?
• Building usable system
• Minimizing

Visipedia
http://www.vision.caltech.edu/visipedia/