Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois...
-
Upload
poppy-warren -
Category
Documents
-
view
222 -
download
0
Transcript of Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois...
![Page 1: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/1.jpg)
Computer Vision: Summary and Anti-summary
Computer VisionCS 543 / ECE 549
University of Illinois
Derek Hoiem
05/04/2010
![Page 2: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/2.jpg)
Today’s class• Administrative stuff
– HW 4 due– Posters next Tues + reports due– Feedback and Evaluation (end of class)
• Review of important concepts
• Anti-summary: important concepts that were not covered
• Wide open problems
![Page 3: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/3.jpg)
Review• Geometry• Matching• Grouping• Categorization
![Page 4: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/4.jpg)
Geometry (on board)• Projection matrix P = K [R t] relates 2d point x
and 3d point X in homogeneous coordinates: x = P X
• Parallel lines in 3D converge at the vanishing point in the image– A 3D plane has a vanishing line in the image
• In two views, points that correspond to the same 3D point are related by the fundamental matrix: x’T F x = 0
![Page 5: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/5.jpg)
Matching• Does this patch match that patch?
– In two simultaneous views? (stereo)– In two successive frames? (tracking, flow, SFM)– In two pictures of the same object? (recognition)
? ?
![Page 6: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/6.jpg)
MatchingRepresentation: be invariant/robust to expected deformations but nothing else•Change in viewpoint
– Rotation invariance: rotate and/or affine warp patch according to dominant orientations
•Change in lighting or camera gain– Average intensity invariance: oriented gradient-based matching– Contrast invariance: normalize gradients by magnitude
•Small translations– Translation robustness: histograms over small regions
But can one representation do all of this?•SIFT: local normalized histograms of
oriented gradients provides robustness to in-plane orientation, lighting, contrast, translation
•HOG: like SIFT but does not rotate to dominant orientation
![Page 7: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/7.jpg)
MatchingSearch: efficiently localize matching patches• Interest points: find repeatable, distinctive points
– Long-range matching: e.g., wide baseline stereo, panoramas, object instance recognition
– Harris: points with strong gradients in orthogonal directions (e.g., corners) are precisely repeatable in x-y
– Difference of Gaussian: points with peak response in Laplacian image pyramid are somewhat repeatable in x-y-scale
• Local search– Short range matching: e.g., tracking, optical flow– Gradient descent on patch SSD, often with image pyramid
• Windowed search– Long-range matching: e.g., recognition, stereo w/ scanline
![Page 8: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/8.jpg)
MatchingRegistration: match sets of points that satisfy deformation
constraints• Geometric transformation (e.g., affine)
– Least squares fit (SVD), if all matches can be trusted– Hough transform: each potential match votes for a range of
parameters• Works well if there are very few parameters (3-4)
– RANSAC: repeatedly sample potential matches, compute parameters, and check for inliers
• Works well if fraction of inliers is high and few parameters (4-8)• Other cases
– One-to-one correspondence (Hungarian algorithm)– Small local deformation of ordered points
B1B2
B3
A1
A2
A3
![Page 9: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/9.jpg)
Grouping• Clustering: group items (patches, pixels, lines, etc.) that have similar
appearance– Discretize continuous values; typically, represent points within cluster
by center– Improve efficiency: e.g., cluster interest points before recognition– Enable counting: histograms of interest points, color, texture
• Segmentation: group pixels into regions of coherent color, texture, motion, and/or label– Mean-shift clustering– Watershed– Graph-based segmentation: e.g., MRF and graph cuts
• EM, mixture models: probabilistically group items that are likely to be drawn from the same distribution, while estimating the distributions’ parameters
![Page 10: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/10.jpg)
CategorizationMatch objects, parts, or scenes that may vary in
appearance• Categories are typically defined by human and may be related
by function, cost, or other non-visual attributes• Naïve matching or clustering approach will usually fail
– Elements within category often do not have obvious visual similarities – Possible deformations are not easily defined
• Typically involves example-based machine learning approach
Training LabelsTraining
Images
Classifier Training
Image Features
Trained Classifier
![Page 11: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/11.jpg)
Categorization
Representation: ideally should be compact, comprehensive, direct
• Histograms of quantized interest points (SIFT, HOG), color, texture– Typical for image or region categorization– Degree of spatial invariance is controllable by using spatial
pyramids
• HOG features at specified position– Often used for finding parts or objects
![Page 12: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/12.jpg)
Object Categorization
Sliding Window Detector• May work well for rigid objects
• Combines window-based matching search with feature-based classifier
Object or Background?
![Page 13: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/13.jpg)
Object Categorization
Parts-based model• Defined by models of part appearance,
geometry or spatial layout, and search algorithm
• May work better for articulated objects
![Page 14: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/14.jpg)
Vision as part of an intelligent system
3D Scene
FeatureExtraction
Interpretation
Action
Texture Color Optical Flow
Stereo Disparity
Grouping Surfaces Bits of objects
Sense of depth
Objects Agents and goals
Shapes and properties
Open paths Words
Walk, touch, contemplate, smile, evade, read on, pick up, …
Motion patterns
![Page 15: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/15.jpg)
Anti-summary• Summary of things not covered
Term “anti-summary” from Murphy’s book “Big Book of Concepts”
![Page 16: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/16.jpg)
Context
• Biederman’s Relations among Objects in a Well-Formed Scene (1981):
– Support
– Size
– Position
– Interposition
– Likelihood of Appearance
Hock, Romanski, Galie, & Williams 1978
![Page 17: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/17.jpg)
Machine learning• Probabilistic approaches
– Logistic regression– Bayesian network (special case: tree-shaped
graph)
• Support vector machines• Energy-based approaches
– Conditional random fields (special case: lattice)– Graph cuts and belief propagation (BP-TRW)
Further reading:• Learning from Data: Concepts, Theory, and Methods by Cherkassky and Muliel (2007): I have not read this, but reviews say it is good for SVM and statistical learning)• Machine Learning by Tom Mitchell (1997): Good but somewhat outdated introduction to learning • Heckerman’s tutorial on learning with Bayesian Networks (1995)
![Page 18: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/18.jpg)
3D Object Models
• Aspect graphs (see Forsyth and Ponce)
• Model 3D spatial configuration of parts– E.g., see recent work by Silvio Savarese and others
![Page 19: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/19.jpg)
Action recognition• Common method: compute SIFT + optical flow
on image, classify• Starting to move beyond this
Tennis serve Volleyball smash
http://vision.stanford.edu/documents/YaoFei-Fei_CVPR2010b.pdf
![Page 20: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/20.jpg)
Geometry• Beyond pinhole cameras
– Special cameras, radial distortion
• Geometry from 3 or more views– Trifocal tensors
Much of this covered in Forsyth and Ponce or Hartley and Zisserman
![Page 21: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/21.jpg)
Other sensors• Infrared, LIDAR, structured light
http://bighugelabs.com/onblack.php?id=527329336
![Page 22: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/22.jpg)
Physics-based vision
• Models of atmospheric effects, reflection, partial transparency– Depth from fog– Removing rain from photos– Rendering– See Shree Nayar’s projects:
http://www.cs.columbia.edu/CAVE/projects/pbv.php
![Page 23: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/23.jpg)
Wide-open problems
Computer vision is potentially worth billions per year, but there are major challenges to overcome first.
Major $$$• Driver assistance (MobileEye received >$100M in
funding from Goldman Sachs)• Entertainment (MS Natal and others)• Robot workers
![Page 24: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/24.jpg)
Wide-open problems
• How do you represent actions and activities?– Probably involves motion, pose, objects, and goals– Compositional
![Page 25: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/25.jpg)
Wide-open problems
• How to build vision systems that manage the visual world’s complexity?– Potentially thousands of categories but it depends
how you think about it
![Page 26: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/26.jpg)
Wide-open problems
• How should we adjust vision systems to solve particular tasks?– How do we know what is important?
![Page 27: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/27.jpg)
Wide-open problems
• Can we build a “core” vision system that can easily be extended to perform new tasks or even learn on its own?– What kind of representations might allow this?– What should be built in and what should be
learned?
![Page 28: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/28.jpg)
See you next week!• Projects on Tuesday
– Pizza provided– Posters: 24” wide x 32” tall
• Online feedback forms
![Page 29: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/29.jpg)
![Page 30: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/30.jpg)
Wide-open problems• How do you represent actions and activities?
– Usually involve motion and objects– Compositional– Probably involves motion fields + pose estimation + object recognition + goal
reasoning
• How to build vision systems that manage the visual world’s complexity?– Potentially thousands of categories but it depends how you think about it
• How should we adjust vision systems to solve particular tasks?– How do we know what is important?
• Can we build a “core” vision system that can easily be extended to perform new tasks or even learn on its own?– What should be built in and what should be learned?– What kind of representations might allow this?
![Page 31: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/31.jpg)
Anti-summary outline• Context
– (well-formed scene)• Physics-based vision • Machine learning
– Probabilistic approaches• Logistic regression• Bayesian network (special case: tree-shaped graph)
– Support vector machines– Energy-based approaches
• Conditional random fields (special case: lattice)• Graph cuts and belief propagation (BP-TRW)
• Action recognition– Current methods: bag of SIFT + optical flow
• Geometry– Beyond perspective projection (non-pinhole cameras, radial distortion)– Tri-focal tensors
• Object recognition– Aspect graphs (how does an object’s appearance change as you move around it?)– 3D object models
• Other sensors: laser range finders, infrared, structured light
![Page 32: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/32.jpg)
The Vision Process• Scene two eyes extract low-level
representation: {color, edges/bars/blobs, motion, stereo} interpretation: {3D surfaces, physical properties, distance, objects of interest, reading, reasoning about agents, adjusting actions}
![Page 33: Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.](https://reader036.fdocuments.us/reader036/viewer/2022062308/56649e5e5503460f94b57592/html5/thumbnails/33.jpg)
Summary outline• Geometry
– Single-view (projection matrix)– Multiview (epipolar geometry)– Vanishing points + SFM + stereo
• Matching– Image filtering– Interest points– RANSAC and Hough
• Tracking– Repeated detection + simple motion model (e.g., doesn’t move
far, constant velocity)• Categorization
– HOG + histograms– Matching and modeling