Post on 15-Apr-2017
Robot Vision
Chapter 6.
2
Introduction
Computer vision
– Endowing machines with the means to “see”
Create an image of a scene and extract features
– Very difficult problem for machines
Several different scenes can produce identical images.
Images can be noisy.
Cannot directly “invert” the image to reconstruct the scene.
3
Human Vision (1)
4
Human Vision (2)
5
Human Vision (3)
6
Steering an Automobile
ALVINN system [Pomerleau 1991,1993]
– Uses Artificial Neural Network
Uses a 30 × 32 TV image as input (960 input nodes)
5 hidden nodes
30 output nodes
– Training regime: modified “on-the-fly”
A human driver drives the car, and his actual steering
angles are taken as correct labels for the corresponding
inputs.
Shifted and rotated images were also used for training.
– ALVINN has driven for 120 consecutive kilometers at
speeds up to 100km/h.
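The layer sizes above can be sketched as a small feed-forward network. This is only an illustration of the 960-5-30 shape: the sigmoid units, the random weights, and the helper names (`init_layer`, `forward_layer`, `steer`) are assumptions, not details of the actual ALVINN implementation.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 30 * 32, 5, 30   # sizes quoted on the slide

def init_layer(n_in, n_out, rng):
    """Small random weights plus one trailing bias per unit (assumed initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward_layer(weights, inputs):
    """Sigmoid units (an assumption): weighted sum plus bias, squashed to (0, 1)."""
    outputs = []
    for row in weights:
        total = row[-1] + sum(w * x for w, x in zip(row, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs

def steer(image, w_hidden, w_output):
    """Return the index of the most active of the 30 steering units."""
    pixels = [p for row in image for p in row]   # flatten 30 x 32 -> 960
    hidden = forward_layer(w_hidden, pixels)
    output = forward_layer(w_output, hidden)
    return max(range(len(output)), key=output.__getitem__)

rng = random.Random(0)
w_hidden = init_layer(N_INPUT, N_HIDDEN, rng)
w_output = init_layer(N_HIDDEN, N_OUTPUT, rng)
image = [[rng.random() for _ in range(32)] for _ in range(30)]
direction = steer(image, w_hidden, w_output)     # untrained, so the value is arbitrary
```

In training, the human driver's steering angle would supply the target for the 30 output units; that step is omitted here.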
7
Steering an Automobile-ALVINN
8
Two stages of Robot Vision (1)
Finding objects in the scene
– Looking for “edges” in the image
Edge: a part of the image across which the image intensity or some other
property of the image changes abruptly.
– Attempting to segment the image into regions
Region: a part of the image in which the image intensity or some other
property of the image changes only gradually.
9
Two stages of Robot Vision (2)
Image processing stage
– Transform the original image into one that is more
amenable to the scene analysis stage.
– Involves various filtering operations that help reduce
noise, accentuate edges, and find regions.
Scene analysis stage
– Attempt to create an iconic or a feature-based
description of the original scene, providing the task-
specific information.
10
Two stages of Robot Vision (3)
Scene analysis stage produces task-specific
information.
– If only the disposition of the blocks is important, appropriate
iconic model can be (C B A FLOOR)
– If it is important to determine whether there is another block on
top of the block labeled C, adequate description will include the
value of a feature, CLEAR_C.
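The two kinds of description can be contrasted in a few lines. This sketch assumes the list (C B A FLOOR) is read from top to bottom; `is_clear` is a hypothetical helper name.

```python
# Iconic model: the stack listed from top to bottom (assumed reading of
# the list (C B A FLOOR)).
iconic_model = ["C", "B", "A", "FLOOR"]

def is_clear(model, block):
    """A block is clear when nothing sits above it, i.e. it heads the list."""
    return model[0] == block

# Feature-based description: keep only the feature the task needs.
features = {"CLEAR_C": is_clear(iconic_model, "C")}
print(features)   # {'CLEAR_C': True}
```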
11
Averaging (1)
Original image can be represented as an m*n array of
numbers. The numbers represent the light intensities
at corresponding points in the image.
Certain irregularities in the image can be smoothed by
an averaging operation.
Averaging operation involves sliding an averaging
window all over the image array.
12
Averaging (2)
Smoothing operation thickens broad lines and eliminates thin lines and
small details.
The averaging window is centered at each pixel, and the weighted sum
of all the pixel numbers within the averaging window is computed. This
sum then replaces the original value at that pixel.
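The sliding-window operation described above can be sketched as follows. The zero treatment of pixels outside the image and the equal-weight 3 × 3 window are assumptions chosen for the example.

```python
def smooth(image, weights):
    """Slide a weighted window over the image; pixels outside the image
    are treated as zero (an assumed border rule)."""
    rows, cols = len(image), len(image[0])
    k = len(weights) // 2          # half-width of a square, odd-sized window
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += weights[dr + k][dc + k] * image[rr][cc]
            result[r][c] = total   # weighted sum replaces the original value
    return result

box = [[1 / 9] * 3 for _ in range(3)]          # equal weights: plain averaging
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 9, 0, 0],                      # one isolated bright pixel
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
smoothed = smooth(image, box)                  # the spike is spread over a 3 x 3 patch
```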
13
Averaging (3)
Common function used for smoothing is a Gaussian of
two dimensions.
Convolving an image with a Gaussian is equivalent to
finding the solution to a diffusion equation when the
initial condition is given by the image intensity field.
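A two-dimensional Gaussian window can be sampled as below. The truncation radius and the normalisation to unit sum are assumptions made for this sketch; `gaussian_kernel` is a hypothetical helper name.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a (2*radius+1)^2 grid
    and normalise so the weights sum to 1."""
    size = 2 * radius + 1
    kernel = [[math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
               for x in range(size)]
              for y in range(size)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]

kernel = gaussian_kernel(sigma=1.0, radius=2)   # 5 x 5 smoothing window
```

A larger sigma gives a wider window and stronger smoothing.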
14
Averaging (4)
15
Edge enhancement (1)
Edge: any boundary between parts of the image with
markedly different values of some property.
Edges are often related to important object properties.
Edges in the image occur at places where the second
derivative of the image intensity crosses zero.
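A one-dimensional sketch of this idea: take a discrete second derivative of an intensity profile and look for sign changes. The profile values and the helper names are made up for illustration.

```python
def second_difference(signal):
    """Discrete second derivative: s[i-1] - 2*s[i] + s[i+1]."""
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def zero_crossings(values):
    """Indices i where the sign changes between values[i] and values[i+1]."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]

profile = [0, 0, 1, 5, 8, 8, 8]        # a blurred step from dark to bright
d2 = second_difference(profile)        # [1, 3, -1, -3, 0]
crossings = zero_crossings(d2)         # [1]
# d2[j] belongs to profile[j + 1], so the edge lies between
# profile[2] and profile[3], where the intensity rises fastest.
```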
16
Edge enhancement (2)
17
Combining Edge Enhancement with Averaging (1)
Edge enhancement alone would tend to emphasize
noise elements along with enhancing edges.
To be less sensitive to noise, both operations are
needed. (First averaging and then edge enhancing)
We can convolve the one-dimensional image with the
second derivative of a Gaussian curve to combine
both operations.
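A minimal one-dimensional sketch of the combined operation, assuming an unnormalised kernel truncated at four samples either side and a convolution restricted to positions where the kernel fits entirely inside the signal:

```python
import math

def gaussian_second_derivative(sigma, radius):
    """Samples of the second derivative of exp(-x^2 / (2 sigma^2)), unnormalised."""
    return [((x * x - sigma * sigma) / sigma ** 4) * math.exp(-x * x / (2 * sigma * sigma))
            for x in range(-radius, radius + 1)]

def convolve(signal, kernel):
    """'Valid' 1-D convolution: only positions where the kernel fits."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]

step = [0.0] * 10 + [1.0] * 10                       # ideal edge between samples 9 and 10
kernel = gaussian_second_derivative(sigma=1.0, radius=4)
response = convolve(step, kernel)
# response[i] is centred on step[i + 4]; the sign flips between
# response[5] and response[6], i.e. between samples 9 and 10.
```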
18
Combining Edge Enhancement with Averaging (2)
Laplacian is a second-derivative-type operation that enhances edges of any
orientation.
Laplacian of the two-dimensional Gaussian function looks like an upside-
down hat, often called a sombrero function.
Entire averaging/edge-finding operation can be achieved by convolving
the image with the sombrero function (called Laplacian filtering).
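For reference, the sombrero function can be written out explicitly. The normalisation constant and the symbol σ (the width of the Gaussian) are conventional choices assumed here, not taken from the slides.

```latex
\[
G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)
\]
\[
\nabla^2 G(x, y)
  = \frac{\partial^2 G}{\partial x^2} + \frac{\partial^2 G}{\partial y^2}
  = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G(x, y)
\]
```

The expression is negative at the center and positive for x² + y² > 2σ², which gives the upside-down hat shape.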
19
6.4.4 Finding Region
Another method for processing image
to find “regions”
Finding regions ↔ finding outlines
20
A region of the image
A region is homogeneous.
– The difference in intensity values of pixels in the region is no
more than some threshold ε
– A polynomial surface of degree k can be fitted to the intensity
values of pixels in the region with largest error less than ε
For no two adjacent regions is it the case that the union of all the
pixels in these two regions satisfies the homogeneity property.
Each region corresponds to a world object or a meaningful part of
one.
21
Split-and-merge method
1. The algorithm begins with just one candidate region,
the whole image.
2. Until no more splits need be made:
1. All candidate regions that do not satisfy the
homogeneity property are each split into four
equal-sized candidate regions.
3. Adjacent candidate regions are merged if their pixels
satisfy the homogeneity property.
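The steps above can be sketched for a small square image. This is a simplified illustration, not the book's implementation: it assumes the side length is a power of two, uses "intensity range no more than a threshold `eps`" as the homogeneity property, and merges greedily.

```python
def homogeneous(image, blocks, eps):
    """Intensity range over all pixels of the given blocks is at most eps."""
    values = [image[r][c] for (r0, c0, s) in blocks
              for r in range(r0, r0 + s) for c in range(c0, c0 + s)]
    return max(values) - min(values) <= eps

def split(image, eps):
    """Steps 1-2: quarter every candidate block that is not homogeneous."""
    candidates, done = [(0, 0, len(image))], []     # (row, col, side)
    while candidates:
        r, c, s = candidates.pop()
        if s == 1 or homogeneous(image, [(r, c, s)], eps):
            done.append((r, c, s))
        else:
            h = s // 2
            candidates += [(r, c, h), (r, c + h, h), (r + h, c, h), (r + h, c + h, h)]
    return done

def touching(a, b):
    """True when two square blocks share part of a side."""
    (ra, ca, sa), (rb, cb, sb) = a, b
    rows_overlap = ra < rb + sb and rb < ra + sa
    cols_overlap = ca < cb + sb and cb < ca + sa
    side_by_side = (ca + sa == cb or cb + sb == ca) and rows_overlap
    stacked = (ra + sa == rb or rb + sb == ra) and cols_overlap
    return side_by_side or stacked

def merge(image, blocks, eps):
    """Step 3: join adjacent regions while their union stays homogeneous."""
    regions = [[b] for b in blocks]
    changed = True
    while changed:
        changed = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                adjacent = any(touching(a, b) for a in regions[i] for b in regions[j])
                if adjacent and homogeneous(image, regions[i] + regions[j], eps):
                    regions[i] += regions.pop(j)
                    changed = True
                    break
            if changed:
                break
    return regions

image = [[1, 1, 9, 9],
         [1, 1, 9, 9],
         [1, 1, 1, 1],
         [1, 1, 1, 1]]
regions = merge(image, split(image, eps=1), eps=1)   # one dark L-shape, one bright square
```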
22
23
Regions Found by Split Merge for a Grid-World Scene (from Fig.6.12)
24
“Cleaned Up” the regions found by Split-and-merge method
Eliminating very small regions (some of which are
transitions between larger regions).
Straightening bounding lines.
Taking into account the known shapes of objects likely
to be in the scene.
25
6.4.5 Using Image Attributes Other Than Intensity
Image attributes other than the homogeneity
Visual texture
fine-grained variation of the surface reflectivity of
the objects
Ex) a field of grass, a section of carpet, foliage in
tree, the fur of animals
The reflectivity variations in objects cause
similar fine-grained structure in image intensity.
26
Methods for analyzing texture
Structural methods
– Represent regions in the image by a tessellation (pattern) of
primitive “texels”, small shapes comprising black and white
parts
Statistical methods
– Based on the idea that image texture is best described by a
probability distribution for the intensity values over regions of
the image.
– Ex) an image of a grassy field in which the blades of grass are
oriented vertically would have
a probability distribution that peaks for thin, vertically
oriented regions of high intensity, separated by regions of low
intensity
27
Other attributes
If we had a direct way to measure the range from the camera to
objects in the scene, we could produce a “range image” and look
for abrupt range differences.
– Range image : each pixel value represents the distance from the
corresponding point in the scene to the camera.
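Looking for abrupt range differences can be sketched as a threshold on neighbouring range values. The sample ranges, the threshold `jump`, and the restriction to horizontal neighbours are assumptions made for brevity.

```python
def range_edges(range_image, jump):
    """Positions (row, col) where the range changes by more than `jump`
    between a pixel and its right-hand neighbour."""
    edges = []
    for r, row in enumerate(range_image):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) > jump:
                edges.append((r, c))
    return edges

# Distances in metres: a far wall on the left, a nearer object on the right.
range_image = [[5.0, 5.1, 2.0, 2.1],
               [5.0, 5.0, 2.0, 2.0]]
edges = range_edges(range_image, jump=1.0)   # [(0, 1), (1, 1)]
```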
Motion, color
28
6.5 Scene Analysis (1)
Scene Analysis
– Extracting from the image the needed information about the scene
– Requires either additional images (for stereo vision) or general information about the kinds of scenes, since the scene-to-image transformation is many-to-one.
The required knowledge
– very general or quite specific
– explicit or implicit
29
6.5 Scene Analysis (2)
Knowledge of surface reflectivity characteristics and shading of intensity in the image
give information about the shape of smooth objects in the scene.
Iconic scene analysis
– Build a model of the scene or parts of the scene
Feature-based scene analysis
– Extracts features of the scene needed by task
– Task-oriented or purposive vision
30
6.5.1 Interpreting Lines and Curves in the Image
Interpreting the line drawing
– Association between scene properties and the
components of a line drawing
Trihedral vertex polyhedra
The scene is assumed to contain only
planar surfaces such that no
more than three surfaces
intersect in a point
31
Three kinds of edges in Trihedral vertex polyhedra (1/2)
There are only three kinds of ways in which two planes can
intersect in a scene edge.
– Occlude
One kind of edge is formed by two planes, with
one of them occluding the other.
Labeled in Fig. 6.15 with arrows (→).
The arrowhead points along the edge such
that the surface doing the occluding is to the right of
the arrow.
32
Three kinds of edges in Trihedral vertex polyhedra (2/2)
– Blade
Two planes can intersect such that both planes
are visible in the scene.
Two surfaces form a convex edge.
Labeled with pluses (+).
– Fold
Edge is concave.
Labeled with minuses (−).
33
Labels for Lines at Junctions
34
Line-labeling scene analysis (1/2)
1. Labeling all of the junctions in the image as V, W, Y, or T junctions according to the shape of the junctions
in the image
35
Line-labeling scene analysis (2/2)
2. Assign +, −, or → labels to the lines in the image.
An image line that connects two junctions must have a
consistent labeling.
If there is no consistent labeling, either
there must have been some error in converting the image
into a line drawing, or
the scene must not have been one of trihedral polyhedra.
Constraint satisfaction problem
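The consistency requirement can be sketched as a small backtracking search. The junction catalog and the three-line drawing below are hypothetical toy data, not the real catalog of legal trihedral junction labelings; the strings ">" and "<" stand for an occluding arrow along or against a line's reference direction.

```python
LABELS = ["+", "-", ">", "<"]

def consistent(assignment, junctions, catalog):
    """Every junction whose lines are all labeled must match a catalog entry."""
    for kind, lines in junctions:
        if all(line in assignment for line in lines):
            if tuple(assignment[line] for line in lines) not in catalog[kind]:
                return False
    return True

def label_lines(lines, junctions, catalog, assignment=None):
    """Backtracking search; returns one consistent labeling or None."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(lines):
        return dict(assignment)
    line = lines[len(assignment)]
    for label in LABELS:
        assignment[line] = label
        if consistent(assignment, junctions, catalog):
            result = label_lines(lines, junctions, catalog, assignment)
            if result is not None:
                return result
        del assignment[line]
    return None

# Hypothetical toy drawing: three lines meeting pairwise at three V junctions.
lines = ["a", "b", "c"]
junctions = [("V", ("a", "b")), ("V", ("b", "c")), ("V", ("c", "a"))]
toy_catalog = {"V": {(">", ">"), ("+", "-")}}        # made-up legal labelings

labeling = label_lines(lines, junctions, toy_catalog)
impossible = label_lines(lines, junctions, {"V": {("+", "-")}})
```

Here `labeling` assigns ">" to all three lines, while `impossible` is `None`: line b would have to be both "−" and "+", so no consistent labeling exists.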
36
6.5.2 Model-Based Vision (1/2)
If we knew that the scene contained a parallelepiped (in Figure
6.15), we could attempt to fit a projection of a parallelepiped to
components of an image of this scene.
Generalized cylinders serve as building blocks for model construction.
Each cylinder has 9 parameters.
37
Model-Based Vision (2/2)
An example rough scene
reconstruction of a human
figure
– Hierarchical representation
– Each cylinder in the model
can be articulated into a
set of smaller cylinders
38
6.6 Stereo Vision and Depth Information
Depth information can be obtained using stereo vision, which is based on triangulation calculations using two (or more) images.
Some depth information can be extracted from a single image.
– The analysis of texture in the image can indicate that some elements in the scene are closer than are others.
– More precise depth information: if we know that a perceived object is on the floor and we know the camera height above the floor, we can calculate the distance to the object.
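A sketch of the floor-distance calculation, assuming a level floor and that the direction to the object is expressed as a depression angle below the horizontal (this parametrisation and the function name are assumptions):

```python
import math

def distance_on_floor(camera_height, depression_angle):
    """Horizontal distance to a floor point seen `depression_angle` radians
    below the horizontal by a camera `camera_height` above the floor."""
    return camera_height / math.tan(depression_angle)

d = distance_on_floor(camera_height=1.0, depression_angle=math.radians(45))
# At 45 degrees the distance equals the camera height (about 1.0).
```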
39
Depth Calculation from a Single Image
40
Stereo Vision
Stereo vision uses triangulation.
Two lenses whose centers are separated by a baseline, b.
The image points of a scene point, at distance d, created by these
lenses.
The angles of these image points, measured from the lens centers, one for each lens.
The optical axes are parallel, the image planes are coplanar, and the
scene point is in the same plane as that formed by two parallel optical
axes.
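Under the stated geometry the triangulation reduces to one line. The symbols `alpha` and `beta` are assumed names for the two angles, each measured between a lens's optical axis and the ray to the scene point, positive toward the other lens.

```python
import math

def stereo_depth(baseline, alpha, beta):
    """Distance d of a scene point from the baseline.

    If the point's foot on the baseline is x from the first lens, then
    tan(alpha) = x / d and tan(beta) = (baseline - x) / d, so
    d = baseline / (tan(alpha) + tan(beta)).
    """
    return baseline / (math.tan(alpha) + math.tan(beta))

# A point 2 m away, midway between lenses 0.2 m apart.
d = stereo_depth(baseline=0.2, alpha=math.atan(0.05), beta=math.atan(0.05))
```

The farther the point, the smaller the two angles become, so depth accuracy drops with distance for a fixed baseline.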
41
Triangulation in Stereo Vision
42
The main complication
In scenes containing more than one point, it must be
established which pair of points in the two images
correspond to the same scene point.
We must be able to identify a corresponding pixel in
the other image: the correspondence problem.
43
Techniques for correspondence problem
Geometric analysis reveals that we need only search
along one dimension (epipolar line).
One-dimensional searches can be implemented by
cross-correlation of two image intensity profiles along
corresponding epipolar lines.
We do not have to find correspondences between
individual pairs of image points but can do so
between pairs of larger image components, such as
lines.
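The one-dimensional search can be sketched with normalised cross-correlation between a small patch of one intensity profile and every window of the other. The profiles, the window width, and the helper names are made up for illustration.

```python
import math

def ncc(a, b):
    """Normalised cross-correlation of two equal-length sequences (1.0 = same shape)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0

def best_match(left, right, start, width):
    """Position along the right profile that best matches left[start:start+width]."""
    patch = left[start:start + width]
    best_pos, best_score = None, -2.0
    for pos in range(len(right) - width + 1):
        score = ncc(patch, right[pos:pos + width])
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos

# Intensity profiles along a pair of corresponding epipolar lines.
left = [1, 1, 2, 9, 7, 3, 1, 1, 1, 1]
right = [1, 1, 1, 1, 2, 9, 7, 3, 1, 1]
pos = best_match(left, right, start=2, width=4)
shift = pos - 2          # offset between the two views for this patch
```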
44
Assignments
Page 111~112
– Ex.6.2, Ex. 6.3