Model fitting
ECE 847: Digital Image Processing
Stan Birchfield, Clemson University
Acknowledgment
Many slides are courtesy of Frank Dellaert, Georgia Tech, Jay Summet, et al.,
from http://www-static.cc.gatech.edu/classes/AY2007/cs4495_fall/html/materials.html
Fitting
• Choose a parametric object / some objects to represent a set of tokens
• Most interesting case is when the criterion is not local
  – can't tell whether a set of points lies on a line by looking only at each point and the next
• Three main questions:
  – what object represents this set of tokens best?
  – which of several objects gets which token?
  – how many objects are there?
(you could read line for object here, or circle, or ellipse, or ...)
Line fitting
• Least squares
• Minimize vertical distance
• Minimize perpendicular distance
Fitting using SVD
• Many 2D curves can be represented by equations that are linear in the coefficients of the curve
• Lines: ax + by + c = 0 (fitting sketched below)
• Conics: x'Ax = 0
  – includes parabolas, hyperbolas, ellipses
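As an illustration, here is a minimal MATLAB sketch (synthetic data and made-up variable names; in practice the coordinates would usually be normalized first) that fits a line ax + by + c = 0 by stacking one row [x y 1] per point and taking the right singular vector associated with the smallest singular value:

% minimal sketch: total least squares line fit via SVD (synthetic data)
x = (0:0.1:1)';
y = 2*x + 1 + 0.01*randn(size(x));
A = [x, y, ones(size(x))];     % each row asks that a*x + b*y + c be close to 0
[~, ~, V] = svd(A);
soln = V(:, end);              % singular vector with the smallest singular value
a = soln(1);  b = soln(2);  c = soln(3);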
Fitting and the Hough Transform
• Purports to answer all three questions
• We explain for lines
• One representation: a line is the set of points (x, y) such that x cos θ + y sin θ = r
• Different choices of θ, r > 0 give different lines
• For any token (x, y) there is a one-parameter family of lines through this point, given by x cos θ + y sin θ = r
• Each point gets to vote for each line in the family; if there is a line that has lots of votes, that should be the line passing through the points
[Figure: tokens in image space and their votes in (θ, r) space. The line through the tokens has θ = 45° = 0.785 rad, r = √2/2 = 0.707; axes are θ from 0 to 3.14 rad and r from 0 to 1.55; brightest point = 20 votes.]
Mechanics of the Hough transform
• Construct an array representing θ, r
• For each point, render the curve (θ, r) into this array, adding one at each cell (see the sketch below)
• Difficulties
  – how big should the cells be? (too big, and we cannot distinguish between quite different lines; too small, and noise causes lines to be missed)
• How many lines?
  – count the peaks in the Hough array
• Who belongs to which line?
  – tag the votes
• Hardly ever satisfactory in practice, because problems with noise and cell size defeat it
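A minimal MATLAB sketch of the voting loop on a synthetic edge image (the image, array sizes, and quantization below are arbitrary illustrative choices):

% minimal sketch: Hough voting for lines x cos(theta) + y sin(theta) = r
E = zeros(100, 100);
E(sub2ind(size(E), 20:80, 20:80)) = 1;       % synthetic diagonal line of tokens
nTheta = 180;  nR = 200;
thetas = linspace(0, pi, nTheta);
rMax = norm(size(E));
acc = zeros(nR, nTheta);                     % accumulator array over (r, theta) cells
[ys, xs] = find(E);
for k = 1:numel(xs)
  for t = 1:nTheta                           % one-parameter family of lines per token
    r = xs(k)*cos(thetas(t)) + ys(k)*sin(thetas(t));
    if r >= 0
      ri = 1 + round((nR - 1)*r/rMax);       % quantize r into a cell
      acc(ri, t) = acc(ri, t) + 1;           % one vote per cell
    end
  end
end
[votes, idx] = max(acc(:));                  % peak = best-supported line
[ri, ti] = ind2sub(size(acc), idx);
bestR = (ri - 1)*rMax/(nR - 1);  bestTheta = thetas(ti);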
Hough transform
image space / line space
Notice that (θ, r) and (θ + π, −r) are the same line. That's why we get two peaks. Solution: let 0 <= θ < π.
Note the symmetry: flip vertically, then slide by π.
[Figure: tokens and their votes in (θ, r) space; brightest point = 6 votes.]
Noise Lowers the Peaks
Noise Increases the Votes in Spurious Accumulator Elements
Optimizations to the Hough Transform
Noise: If the orientation of tokens (pixels) is known, only accumulator elements for lines with that general orientation are voted on. (Most edge detectors give orientation information.)
Speed: The accumulator array can start coarse, with voting repeated at a finer scale in areas of interest.
Real World Example
[Figure panels: Original, Edge Detection, Found Lines, Parameter Space]
Fitting other objects
The Hough transform can be used to fit points to any object that can be parameterized (e.g., circle, ellipse).
Objects of arbitrary shape can be parameterized by building an R-Table. (Assumes orientation information for each token is available.)
R and beta value(s) are obtained from the R-Table, based upon omega (orientation)
Circle Example
With no orientation, each token (point) votes for all possible circles. With orientation, each token can vote for a smaller number of circles (see the sketch below).
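A minimal MATLAB sketch of circle voting with a known radius on a synthetic circle; the gradient-direction map here is a made-up radial stand-in, whereas for a real image the orientation would come from the edge detector:

% minimal sketch: circle Hough with known radius R, using token orientation
R = 12;  sz = 100;  cx0 = 50;  cy0 = 50;
t = linspace(0, 2*pi, 300);
E = zeros(sz);
E(sub2ind([sz sz], round(cy0 + R*sin(t)), round(cx0 + R*cos(t)))) = 1;
[Y, X] = ndgrid(1:sz, 1:sz);
Gdir = atan2(Y - cy0, X - cx0);             % stand-in for edge orientation
acc = zeros(sz);
[ys, xs] = find(E);
for k = 1:numel(xs)
  for s = [-1 1]                            % with orientation: only two candidate centers
    cx = round(xs(k) + s*R*cos(Gdir(ys(k), xs(k))));
    cy = round(ys(k) + s*R*sin(Gdir(ys(k), xs(k))));
    if cx >= 1 && cx <= sz && cy >= 1 && cy <= sz
      acc(cy, cx) = acc(cy, cx) + 1;
    end
  end
end
[~, idx] = max(acc(:));
[cyHat, cxHat] = ind2sub(size(acc), idx);   % estimated circle center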
Real World Circle Examples
Crosshair indicates results of the Hough transform; bounding box found via motion differencing.
Finding Coins
Original / Edges (note noise)
Finding Coins (Continued)
Penny / Quarters
Finding Coins (Continued)
Coin finding sample images from: Vivik Kwatra
Note that because the quarters and penny are different sizes, a different Hough transform (with separate accumulators) was used for each circle size.
Generalized Hough transform
Conclusion
• Finding lines and other parameterized objects is an important task for computer vision.
• The (generalized) Hough transform can detect arbitrary shapes from (edge-detected) tokens.
• Success rate depends directly upon the noise in the edge image.
• Downsides: can be slow, especially for objects in arbitrary scales and orientations (extra parameters increase accumulator space exponentially).
Tensor voting
• Another voting technique
Line fitting can be maximum likelihood, but the choice of model is important
Line fitting
Who came from which line?
• Assume we know how many lines there are, but which lines are they?
  – easy, if we know who came from which line
• Three strategies
  – Incremental line fitting
  – K-means
  – Probabilistic (later!)
Robustness
• As we have seen, squared error can be a source of bias in the presence of noise points
  – One fix is EM - we'll do this shortly
  – Another is an M-estimator
    • Square nearby, threshold far away
  – A third is RANSAC
    • Search for good points
Too small
Too large
RANSAC
• Choose a small subset uniformly at random
• Fit to that
• Anything that is close to the result is signal; all others are noise
• Refit
• Do this many times and choose the best (see the sketch below)
• Issues
  – How many times?
    • Often enough that we are likely to have a good line
  – How big a subset?
    • Smallest possible
  – What does close mean?
    • Depends on the problem
  – What is a good line?
    • One where the number of nearby points is so big it is unlikely to be all outliers
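A minimal MATLAB sketch of RANSAC line fitting on synthetic data (the number of trials and the distance threshold are arbitrary choices for illustration):

% minimal sketch: RANSAC line fitting
rng(1);
t = rand(60,1);
pts = [t, 0.7*t + 0.1 + 0.01*randn(60,1)];   % points near a line
pts = [pts; rand(20,2)];                     % plus uniformly scattered noise
nTrials = 200;  tol = 0.02;  best = 0;
for it = 1:nTrials
  s = pts(randperm(size(pts,1), 2), :);      % smallest possible subset: 2 points
  d = s(2,:) - s(1,:);
  n = [-d(2), d(1)]/norm(d);                 % unit normal of the candidate line
  dist = abs((pts - s(1,:))*n');             % distance of every point to the line
  inliers = dist < tol;                      % "close" points count as signal
  if sum(inliers) > best
    best = sum(inliers);
    bestInliers = inliers;
  end
end
coeffs = polyfit(pts(bestInliers,1), pts(bestInliers,2), 1);   % refit to the consensus set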
Fitting curves other than lines
• In principle, an easy generalisation
  – The probability of obtaining a point, given a curve, is given by a negative exponential of distance squared
• In practice, rather hard
  – It is generally difficult to compute the distance between a point and a curve
K-means
• Recall line fitting
• Now suppose that you have more than one line
• But you do not know which points belong to which line
K-Means
• Choose a fixed number of clusters (K)
• Choose cluster centers and point-cluster allocations to minimize the error
• Can't do this by search, because there are too many possible allocations
• Algorithm
  – fix cluster centers; allocate points to closest cluster
  – fix allocation; compute best cluster centers
• x could be any set of features for which we can compute a distance (careful about scaling)

Error = Σ_{i ∈ clusters} Σ_{j ∈ elements of i'th cluster} || x_j − μ_i ||²
* From Marc Pollefeys COMP 256 2003
K-Means
* From Marc Pollefeys COMP 256 2003
K-Means
compute mean of data points
assign data points to clusters
Image Segmentation by K-Means
• Select a value of K
• Select a feature vector for every pixel (color, texture, position, or a combination of these, etc.)
• Define a similarity measure between feature vectors (usually Euclidean distance)
• Apply the K-Means algorithm (a sketch follows below)
• Apply a connected components algorithm (to enforce spatial continuity)
• Merge any components of size less than some threshold to an adjacent component that is most similar to it
* From Marc Pollefeys COMP 256 2003
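A minimal MATLAB sketch of the first two iterated steps on per-pixel RGB features (it assumes I is an h-by-w-by-3 double image; K, the initialization, and the iteration count are illustrative choices):

% minimal sketch: K-means on per-pixel RGB feature vectors
K = 3;
[h, w, ~] = size(I);
X = reshape(I, h*w, 3);                 % one feature vector per pixel
mu = X(randperm(h*w, K), :);            % initialize centers from random pixels
for iter = 1:20
  % fix cluster centers; allocate points to the closest cluster
  d = zeros(h*w, K);
  for k = 1:K
    d(:,k) = sum((X - mu(k,:)).^2, 2);  % squared Euclidean distance to center k
  end
  [~, labels] = min(d, [], 2);
  % fix allocation; compute the best cluster centers
  for k = 1:K
    if any(labels == k)
      mu(k,:) = mean(X(labels == k, :), 1);
    end
  end
end
L = reshape(labels, h, w);              % label image, ready for connected components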
Example
Image / Clusters on intensity / Clusters on color
Idea
• Data generated from a mixture of Gaussians
• Latent (hidden) variables: correspondence between data items and Gaussians
Expectation-Maximization (Generalized K-Means)
• Notice:
  – Given the mixture model, it's easy to calculate the correspondence
  – Given the correspondence, it's easy to estimate the mixture models
• K-Means involves
  – Model (hypothesis space): mixture of N Gaussians
  – Latent variables: correspondence of data and Gaussians
• Replace hard assignments with soft assignments → EM
• EM is guaranteed to converge (EM steps do not decrease likelihood)
Learning a Gaussian Mixture (with known covariance)

E-Step:
E[z_ij] = p(x = x_i | μ = μ_j) / Σ_{n=1}^{k} p(x = x_i | μ = μ_n)
        = exp(−(x_i − μ_j)² / (2σ²)) / Σ_{n=1}^{k} exp(−(x_i − μ_n)² / (2σ²))

M-Step:
μ_j ← Σ_{i=1}^{m} E[z_ij] x_i / Σ_{i=1}^{m} E[z_ij]
Generalized K-Means (EM)
compute weighted mean of data points
compute weights of assigning data points to clusters
EM Clustering: Results
Application: Clustering Flow
Clustered Flow
ML correspondence
Missing variable problems
• In many vision problems, if some variables were known, the maximum likelihood inference problem would be easy
  – fitting: if we knew which line each token came from, it would be easy to determine line parameters
  – segmentation: if we knew the segment each pixel came from, it would be easy to determine the segment parameters
  – fundamental matrix estimation: if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix
  – etc.
• This sort of thing happens in statistics, too
Missing variable problems
• Strategy
  – estimate appropriate values for the missing variables
  – plug these in, now estimate parameters
  – re-estimate appropriate values for missing variables, continue
• e.g.
  – guess which line gets which point
  – now fit the lines
  – now reallocate points to lines, using our knowledge of the lines
  – now refit, etc.
• We've seen this line of thought before (k-means)
Missing variables - strategy
• We have a problem with parameters, missing variables
• This suggests: iterate until convergence
  – replace missing variables with expected values, given fixed values of parameters
  – fix missing variables, choose parameters to maximise likelihood given fixed values of missing variables
• e.g., iterate till convergence
  – allocate each point to a line with a weight, which is the probability of the point given the line
  – refit lines to the weighted set of points
• Converges to a local extremum
• A somewhat more general form is available
EM
• E-step:
  – calculate errors
  – calculate probabilities
• M-step:
  – re-calculate RGB clusters
• Demo with fixed sigma = 20: segment01.m
MATLAB code essentials
% init models (assumes I is an h-by-w-by-3 double image; nrClusters chosen for illustration)
nrClusters = 4;
sigma = 20;
[h, w, ~] = size(I);
for g = 1:nrClusters
  m(:,g) = 128 + 50*randn(3,1);      % random RGB cluster means
  pi(g) = 1/nrClusters;              % equal mixing weights
end
for iter = 1:20
  % E-step
  sumq = zeros(h, w);
  for g = 1:nrClusters
    % calculate errors
    E{g} = zeros(h, w);
    for c = 1:3
      e = (I(:,:,c) - m(c,g))/sigma;
      E{g} = E{g} + e.*e;
    end
    % unnormalized probabilities
    q{g} = pi(g)*exp(-0.5*E{g});
    sumq = sumq + q{g};
  end
  % normalize probabilities
  for g = 1:nrClusters
    p{g} = q{g}./sumq;
  end
  % M-step
  for g = 1:nrClusters
    P = p{g};
    R = P.*I(:,:,1);
    G = P.*I(:,:,2);
    B = P.*I(:,:,3);
    pi(g) = sum(P(:));
    m(:,g) = [sum(R(:)); sum(G(:)); sum(B(:))]/pi(g);
  end
  pi = pi/sum(pi);
end
Expectation-Maximization
• See my TR online
• We have parameters Θ and data U
• Also: hidden "nuisance" variables J
• We want to find the optimal Θ*
Example: Mixture
• 2 Components
• Gaussian
Posterior in Parameter Space
• 2D !
EM
• Successive lower bounds
Line Fitting
• Parameters Θ = (θ, c)
• c = distance to origin
• distance from a point to the line: | x cos θ + y sin θ − c |
Lines and robustness
• We have one line, and n points
• Some come from the line, some from "noise"
• This is a mixture model:
• We wish to determine
  – line parameters
  – P(comes from line), call it λ

P(point | line and noise params) = P(point | line) P(comes from line) + P(point | noise) P(comes from noise)
                                 = P(point | line) λ + P(point | noise) (1 − λ)
Estimating the mixture model
• Introduce a set of hidden variables, δ, one for each point. They are one when the point is on the line, and zero when off.
• If these are known, the negative log-likelihood becomes (the line's parameters are θ, c):
• Here K is a normalising constant, k_n is the noise intensity (we'll choose this later).

Q(θ, c; x) = Σ_i [ δ_i (x_i cos θ + y_i sin θ − c)² / (2σ²) + (1 − δ_i) k_n ] + K
Substituting for delta
• We shall substitute the expected value of δ_i, for a given Θ
• recall Θ = (θ, c, λ)
• E[δ_i] = 1 · P(δ_i = 1 | Θ) + 0 · P(δ_i = 0 | Θ)
• Notice that if k_n is small and positive, then if the distance is small, this value is close to 1, and if it is large, close to zero

P(δ_i = 1 | Θ, x_i) = P(x_i | δ_i = 1, Θ) P(δ_i = 1) / [ P(x_i | δ_i = 1, Θ) P(δ_i = 1) + P(x_i | δ_i = 0, Θ) P(δ_i = 0) ]
                    = λ exp(−(x_i cos θ + y_i sin θ − c)² / (2σ²)) / [ λ exp(−(x_i cos θ + y_i sin θ − c)² / (2σ²)) + (1 − λ) exp(−k_n) ]
Algorithm for line fitting
• Obtain some start point Θ^(0) = (θ^(0), c^(0), λ^(0))
• Now compute the δ's using the formula above
• Now compute the maximum likelihood estimate of Θ^(1)
  – θ, c come from fitting to the weighted points
  – λ comes by counting
• Iterate to convergence (a sketch of the loop follows below)
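A minimal MATLAB sketch of this loop on synthetic data (σ, k_n, the data, and the iteration count are arbitrary illustrative choices; the first pass simply fits all points with equal weight to obtain a start point):

% minimal sketch: EM line fitting with an outlier process
rng(0);
t = rand(40,1);
x = [t; rand(10,1)];                          % 40 points near a line plus 10 noise points
y = [0.5*t + 0.2 + 0.01*randn(40,1); rand(10,1)];
sigma = 0.02;  kn = 2;  lambda = 0.5;
delta = ones(size(x));                        % equal weights give the start point
for iter = 1:30
  % M-step: theta, c from fitting to the weighted points; lambda by counting
  mx = sum(delta.*x)/sum(delta);   my = sum(delta.*y)/sum(delta);
  C = [x-mx, y-my]' * ((delta*[1 1]).*[x-mx, y-my]);   % weighted scatter matrix
  [V, D] = eig((C+C')/2);
  [~, k] = min(diag(D));
  nrm = V(:,k);                               % line normal = smallest-scatter direction
  theta = atan2(nrm(2), nrm(1));
  c = mx*cos(theta) + my*sin(theta);
  if iter > 1, lambda = mean(delta); end      % counting (keep 0.5 on the first pass)
  % E-step: expected deltas from the posterior formula above
  d2 = (x*cos(theta) + y*sin(theta) - c).^2;
  pl = lambda*exp(-d2/(2*sigma^2));
  pn = (1 - lambda)*exp(-kn);
  delta = pl./(pl + pn);
end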
The expected values of the deltas at the maximum (notice the one value close to zero).
Closeup of the fit
Other examples
• Segmentation
  – a segment is a gaussian that emits feature vectors (which could contain colour; or colour and position; or colour, texture and position)
  – segment parameters are mean and (perhaps) covariance
  – if we knew which segment each point belonged to, estimating these parameters would be easy
  – rest is on same lines as fitting a line
• Fitting multiple lines
  – rather like fitting one line, except there are more hidden variables
  – easiest is to encode as an array of hidden variables, which represent a table with a one where the i'th point comes from the j'th line, zeros otherwise
  – rest is on same lines as above

Issues with EM
• Local maxima
  – can be a serious nuisance in some problems
  – no guarantee that we have reached the "right" maximum
• Starting
  – k-means to cluster the points is often a good idea
Choosing parameters
• What about the noise parameter, and the sigma for the line?
  – several methods
    • from first-principles knowledge of the problem (seldom really possible)
    • play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much)
  – notice that if k_n is large, this says that points very seldom come from noise, however far from the line they lie
    • usually biases the fit, by pushing outliers into the line
    • rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem
Local maximum
which is an excellent fit to some points
and the deltas for this maximum
A dataset that is well fitted by four lines
Result of EM fitting, with one line (or at least, one available local maximum).
Result of EM fitting, with two lines (or at least, one available local maximum).
Seven lines can produce a rather logical answer
Figure from "Color and Texture Based Image Segmentation Using EM and Its Application to Content Based Image Retrieval," S.J. Belongie et al., Proc. Int. Conf. Computer Vision, 1998. © 1998 IEEE.
Segmentation with EM
Motion segmentation with EM
• Model an image pair (or video sequence) as consisting of regions of parametric motion
  – affine motion is popular (sketched below):
    [vx; vy] = [a b; c d] [x; y] + [tx; ty]
• Now we need to
  – determine which pixels belong to which region
  – estimate parameters
• Likelihood
  – assume I(x, y, t) = I(x + vx, y + vy, t + 1) + noise
• Straightforward missing variable problem, rest is calculation
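A minimal MATLAB sketch of the flow field implied by one affine motion layer (the parameter values below are made up for illustration):

% minimal sketch: dense flow field for one affine motion layer
a = 0.01;  b = 0.00;  c = 0.00;  d = -0.01;  tx = 0.5;  ty = -0.3;
[X, Y] = meshgrid(1:64, 1:64);               % pixel coordinates in the region
Vx = a*X + b*Y + tx;                         % vx(x, y) at every pixel
Vy = c*X + d*Y + ty;                         % vy(x, y) at every pixel
% the layer's likelihood then compares I(x, y, t) with I(x + vx, y + vy, t + 1)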
Three frames from the MPEG "flower garden" sequence
Figure from "Representing Images with Layers," J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
Grey level shows the region number with the highest probability
Segments and motion fields associated with them. Figure from "Representing Images with Layers," J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
If we use multiple frames to estimate the appearance of a segment, we can fill in occlusions; so we can re-render the sequence with some segments removed.
Figure from "Representing Images with Layers," J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994. © 1994 IEEE.
Some generalities
• Many, but not all, problems that can be attacked with EM can also be attacked with RANSAC
  – need to be able to get a parameter estimate with a manageably small number of random choices
  – RANSAC is usually better
• Didn't present EM in the most general form
  – in the general form, the likelihood may not be a linear function of the missing variables
  – in this case, one takes an expectation of the likelihood, rather than substituting expected values of missing variables
  – this issue doesn't seem to arise in vision applications
Model Selection
• We wish to choose a model to fit to data
  – e.g. is it a line or a circle?
  – e.g. is this a perspective or orthographic camera?
  – e.g. is there an aeroplane there or is it noise?
• Issue
  – In general, models with more parameters will fit a dataset better, but are poorer at prediction
  – This means we can't simply look at the negative log-likelihood (or fitting error)

Top is not necessarily a better fit than bottom (actually, almost always worse).
We can discount the fitting error with some term in the number of parameters in the model.
Discounts
• AIC (an information criterion)
  – choose the model with the smallest value of −2L(D; Θ*) + 2p (sketched below)
  – p is the number of parameters
• BIC (Bayes information criterion)
  – choose the model with the smallest value of −2L(D; Θ*) + p log N
  – N is the number of data points
• Minimum description length
  – same criterion as BIC, but derived in a completely different way
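A minimal MATLAB sketch comparing a line and a parabola on synthetic 1D data; it assumes Gaussian residuals with a known sigma, so −2L equals the sum of squared errors divided by σ², up to a constant (all values below are made up for illustration):

% minimal sketch: scoring two candidate models by AIC and BIC
x = linspace(0, 1, 50)';  sigma = 0.1;
y = 2*x + 0.3 + sigma*randn(50,1);           % synthetic data from a line
N = numel(y);
models = {1, 2};                             % polynomial degrees: line, parabola
for k = 1:2
  p = models{k} + 1;                         % number of parameters
  coeffs = polyfit(x, y, models{k});
  sse = sum((y - polyval(coeffs, x)).^2);
  negTwoL = sse/sigma^2;                     % -2 log-likelihood, up to a constant
  AIC(k) = negTwoL + 2*p;
  BIC(k) = negTwoL + p*log(N);
end
[~, bestAIC] = min(AIC);                     % choose the model with the smallest value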
Cross-validation
• Split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other
• Average over multiple different splits
• Choose the model with the smallest value of this average (see the sketch below)
• The difference in averages for two different models is an estimate of the difference in KL divergence of the models from the source of the data
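A minimal MATLAB sketch of 2-fold cross-validation on the same synthetic data and candidate models as the AIC/BIC sketch above (x, y, sigma, N, and models are reused):

% minimal sketch: 2-fold cross-validation for model selection
idx = randperm(N);
half = round(N/2);
splits = {idx(1:half), idx(half+1:end); idx(half+1:end), idx(1:half)};
for k = 1:2
  nll = 0;
  for s = 1:2
    train = splits{s,1};  test = splits{s,2};    % fit to one piece ...
    coeffs = polyfit(x(train), y(train), models{k});
    r = y(test) - polyval(coeffs, x(test));      % ... score on the other
    nll = nll + 0.5*sum(r.^2)/sigma^2;           % held-out negative log-likelihood (up to a constant)
  end
  cvScore(k) = nll/2;                            % average over the two splits
end
[~, best] = min(cvScore);                        % choose the model with the smallest average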
Model averaging
• Very often, it is smarter to use multiple models for prediction than just one
• e.g. motion capture data
  – there are a small number of schemes that are used to put markers on the body
  – given we know the scheme S and the measurements D, we can estimate the configuration of the body X
• We want
  P(X | D) = P(X | S1, D) P(S1 | D) + P(X | S2, D) P(S2 | D) + P(X | S3, D) P(S3 | D)
• If it is obvious what the scheme is from the data, then averaging makes little difference
• If it isn't, then not averaging underestimates the variance of X: we think we have a more precise estimate than we do.