

Using Context to Improve Robot Vision
Catie Meador, Swarthmore College

Faculty Advisor: Dr. Mohan Sridharan
Texas Tech University, 2012 NSF Research Experiences for Undergraduates Site Project

Abstract

Vision is a rich source of information for robots deployed in the real world. Although considerable research has been performed in robot vision, existing algorithms are still inadequate for accurate scene understanding and object recognition in the real world. Robots frequently find it difficult to recognize objects and complete assigned tasks in challenging scenarios with significant clutter. Improving the ability of robots to fully exploit the information encoded in visual inputs is hence crucial for their widespread deployment. Contextual cues are very important for object recognition in humans, and recent research in computer vision and robot vision has therefore focused on using context to improve object recognition on robots. Contextual cues enable robots to use known information about some objects in the domain to locate other, related objects more effectively. This research project describes the context of objects in images using color histograms and local image gradient (SIFT) features of the neighboring image segments. Once the robot has learned the typical context of desired objects, those objects can be recognized in test images by comparing the context of candidate image regions with the learned context using the nearest neighbor algorithm. The approach is evaluated on a set of images captured by a camera mounted on a mobile robot.

Context

• “Any information that might be relevant to object detection, categorization, and classification tasks, but not directly due to the physical appearance of the object, as perceived by the image acquisition system” (Marques et al. 2011).

• Visual: objects in the background, etc.

• Non-visual: GPS, weather conditions, etc.

• Context is important in human vision.

• Many objects are partially defined by their function, for example, doorknobs are used to open doors.

• Some objects typically co-occur with others; for example, cars usually appear on roads.

• Objects tend to occur in specific scenes, for example, fish tend to be in water.

Conclusion

Results:
• Hypothesized that context information would help recognize objects correctly.

• Context information is represented by comparing color histograms of regions neighboring the Region of Interest (ROI).

• Context information is used to determine the best fit among learned models.

• Experiment 1: Learned models of the target object (box) compared against test images inside the lab and in the hallway.

• 89% of test images matched the appropriate model, which indicates that this method of context matching is useful.

• Experiment 2: Learned models of various target objects (human, robot, box, book, car, and airplane) compared against test images in four settings: a landscape, a robot soccer field, buildings, and a desk.

• 75% of test images matched the appropriate model, again indicating that this method is effective.
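As a rough illustration (not the project's actual code), the best-fit matching used in both experiments can be sketched as follows: the neighboring-segment histograms of a test ROI are scored against each learned model, and the model with the smallest mean nearest-histogram distance wins. All function names, the two-bin histograms, and the L1 metric here are assumptions for the sketch.

```python
def l1(a, b):
    """L1 distance between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def match_context(test_hists, learned_models, distance=l1):
    """Pick the learned model whose context histograms best fit the test
    image's neighboring-segment histograms: for each test histogram, take
    the distance to its nearest model histogram, then average."""
    def score(model_hists):
        return sum(min(distance(t, m) for m in model_hists)
                   for t in test_hists) / len(test_hists)
    return min(learned_models, key=lambda name: score(learned_models[name]))

# Tiny example with hypothetical 2-bin context histograms per setting:
models = {"lab": [[0.9, 0.1], [1.0, 0.0]], "hallway": [[0.1, 0.9]]}
print(match_context([[0.95, 0.05]], models))  # prints lab
```

A real pipeline would use full 3D color histograms per neighboring segment, but the scoring structure is the same.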

Future Work:
• Train and test on additional objects in cluttered backgrounds.

• Combine SIFT and color histograms in nearest neighbor calculation.

• Improve the segmentation algorithm to be more consistent.

• Only consider the largest neighboring segments in comparisons, or weight calculations by the size of each segment and number of segments.

References:
• Felzenszwalb, P. & Huttenlocher, D. Efficient Graph-Based Image Segmentation. IJCV 59:2 (2004).
• Galleguillos, C. & Belongie, S. Context based object categorization: A critical survey. CVIU 114 (2010).
• Hoiem, D. (2004). Putting Context into Vision [PowerPoint slides]. Retrieved from: www.cs.uiuc.edu/~dhoiem/presentations/putting_context_into_vision_final.ppt
• Li, X. & Sridharan, M. Vision-based Autonomous Learning of Object Models on a Mobile Robot. ARMS 12 (2012).
• Marques, O., Barenholtz, E., & Charvillat, V. Context modeling in computer vision: techniques, implications, and applications. Multimedia Tools Appl 51 (2011).
• Torralba, A., Murphy, K., Freeman, W., & Rubin, M. Context-based vision system for place and object recognition. ICCV ’03 (2003).

This research is supported by NSF Grant No. CNS 1005212. Opinions, findings, conclusions, or recommendations expressed in this presentation are those of the authors and do not necessarily reflect the views of NSF.

Image 1: SIFT vectors, represented by red dots, for two images, where the box is the target object. SIFT feature vectors are matched across the two images, with green lines representing vectors that are inside the target object (box) and blue lines representing vectors in the background.

Image 2: An RGB image of a car, and the corresponding segmented image, each showing the location of the Region of Interest (ROI), outlined in red.

Acknowledgements:
I would like to thank Dr. Mohan Sridharan and Xiang Li for their help and guidance. In addition, I am grateful to Dr. Susan Urban, Texas Tech University, and the National Science Foundation for providing me with this research opportunity this summer.

Methods

• Different algorithms were used to analyze and compare the context of images taken from the robot’s camera.

SIFT:
• Scale-Invariant Feature Transform (SIFT) vectors characterize objects of interest by local image gradient features.
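Full SIFT is involved; the toy descriptor below illustrates only its core idea, a histogram of local gradient orientations weighted by gradient magnitude. It deliberately omits scale-space keypoint detection, rotation invariance, spatial pooling, and normalization, and every name in it is an assumption for this sketch, not the project's implementation.

```python
import math

def gradient_orientation_histogram(patch, bins=8):
    """Toy gradient-orientation descriptor for a grayscale patch (a list of
    rows of intensities). Central differences give (dx, dy) per interior
    pixel; each pixel votes for its orientation bin, weighted by gradient
    magnitude. Real SIFT adds much more on top of this step."""
    hist = [0.0] * bins
    h, w = len(patch), len(patch[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    return hist

# A vertical edge puts all its energy in the 0-radian (rightward) bin:
edge = [[0, 0, 255, 255] for _ in range(4)]
print(gradient_orientation_histogram(edge)[0])  # prints 1020.0
```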

Color Histograms:
• A color histogram models the distribution of pixels in a specific image region in the 3D color space.

• Objects with similar color histograms are assumed to be similar.
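A minimal pure-Python sketch of such a 3D histogram is shown below. The bin count, the (h, s, v) tuple format, and the 0-255 channel range are assumptions for illustration; the project's actual binning may differ.

```python
def color_histogram(pixels, bins=8):
    """3D color histogram over (h, s, v) pixel tuples, each channel in
    0..255, normalized so the bins sum to 1. Each pixel falls into one of
    bins**3 cells of the color cube."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for h, s, v in pixels:
        idx = (h // step) * bins * bins + (s // step) * bins + (v // step)
        hist[idx] += 1
    total = sum(hist) or 1.0  # guard against an empty region
    return [c / total for c in hist]

# Two pixels at opposite corners of the color cube split the mass evenly:
hist = color_histogram([(0, 0, 0), (255, 255, 255)])
print(hist[0], hist[-1])  # prints 0.5 0.5
```

Because the histogram is normalized, regions of different sizes can be compared directly.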

Nearest Neighbor Algorithm:
• Compares one vector to a set of vectors to find the vector that is most similar.

• Used to compare both SIFT vectors and color histograms.

• Used to find which contextual segments are most similar between images.

Image 3: An HSV image of a box in the lab setting, and the corresponding segmented image, each showing the location of the Region of Interest (ROI), outlined in black. The HSV color space was used in this experiment because it is typically more robust to illumination changes than RGB.
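The nearest neighbor comparison described in the Methods reduces to a linear scan over candidate vectors; a minimal sketch follows. The function names and the choice of L1 distance are assumptions; the same scan applies whether the vectors are SIFT descriptors, color histograms, or per-segment context descriptors.

```python
def l1(a, b):
    """L1 (city-block) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest_neighbor(query, candidates, distance=l1):
    """Return the index of the candidate vector most similar to `query`,
    i.e. the one at minimum distance."""
    return min(range(len(candidates)),
               key=lambda i: distance(query, candidates[i]))

# The middle candidate is closest to the query (distance 1 vs. 2 and 8):
print(nearest_neighbor([1, 1], [[0, 0], [1, 2], [5, 5]]))  # prints 1
```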