CS 231A Section 2: Math Background for PS1
Jiahui Shi
Announcement
• Updated deadline of PS1: 10/14, 12 noon
• Extra office hour for PS1: 10/13, 7‐9pm, Gates 104
Overview
• Maximizing and minimizing quadratic forms
• Least squares
• Linear filtering
• Maximum likelihood estimation
Quadratic Forms
• Quadratic forms look like (A is symmetric)
  $f(x) = x^T A x$
or, if we have a 2x2 matrix,
  $x^T A x = a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2$
Quadratic Forms in Optimization
• We will look at the following optimization problem for a symmetric matrix A; this will be useful for PS1:
  $\min_x \; x^T A x \quad \text{subject to} \quad \|x\|_2 = 1$
• This can be solved using the eigenvector corresponding to the smallest eigenvalue.
• Similarly, the eigenvector corresponding to the largest eigenvalue will give the max value.
Optimizing Quadratic FormsOptimizing Quadratic Forms
• Formulate the Lagrangian
  $\mathcal{L}(x, \lambda) = x^T A x - \lambda (x^T x - 1)$
• Now we take the partial derivative with respect to x and set it to zero:
  $\nabla_x \mathcal{L} = 2 A x - 2 \lambda x = 0$
Optimizing Quadratic FormsOptimizing Quadratic Forms
• Using this condition, we know that our optimal x's must satisfy
  $A x = \lambda x$
• Our possible optimizing vectors are just the eigenvectors of A.
• Plugging in:
  $x^T A x = x^T (\lambda x) = \lambda x^T x = \lambda$
so the objective value at each eigenvector is simply its eigenvalue.
• The constraint on the matrix A is pretty stringent: it must be square and symmetric.
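As a sanity check of this recipe (not from the original slides), here is a minimal numpy sketch; the small symmetric test matrix A is assumed purely for illustration, and np.linalg.eigh returns eigenvalues in ascending order:

```python
import numpy as np

# A small symmetric test matrix (assumed for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is for symmetric matrices; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(A)

x_min = eigvecs[:, 0]   # unit eigenvector for the smallest eigenvalue
x_max = eigvecs[:, -1]  # unit eigenvector for the largest eigenvalue

# The objective x^T A x at each unit eigenvector equals its eigenvalue.
print(x_min @ A @ x_min, eigvals[0])   # min of x^T A x over ||x|| = 1
print(x_max @ A @ x_max, eigvals[-1])  # max of x^T A x over ||x|| = 1
```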
Linear Regression
• Fit a linear model to data
• PS1: use the least squares approach to fit a model to our data with minimum total error.
Least Squares
• Given A and y, find the optimal x s.t. Ax = y.
• The residual is defined as $r = A x - y$.
• Assume A is skinny (#rows > #columns, or more equations than unknowns) and full rank.
• We want to minimize the squared 2‐norm of the residual:
  $\min_x \|A x - y\|_2^2$
Least Squares
• To minimize
  $\|A x - y\|_2^2 = x^T A^T A x - 2 y^T A x + y^T y$
take the gradient with respect to x and set it equal to zero:
  $2 A^T A x - 2 A^T y = 0$
Solving gives
  $x = (A^T A)^{-1} A^T y$
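A short numpy check of this closed form (not from the slides; the dimensions and data are made up). In practice np.linalg.lstsq, which solves the problem via an SVD, is preferred over forming $A^T A$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))  # skinny: 10 equations, 3 unknowns
y = rng.standard_normal(10)

# Closed form from the normal equations: x = (A^T A)^{-1} A^T y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Numerically preferred SVD-based solver.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True: same minimizer
```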
Geometric Approach
• Pick $\hat{y} = A x$ by requiring the residual to be orthogonal to the range of A:
  $A^T (A x - y) = 0$
thus we have
  $A^T A x = A^T y$
since A is skinny (#rows > #columns) and full rank, $A^T A$ is invertible and we can write
  $x = (A^T A)^{-1} A^T y$
Least Squares Solution
• When there is an optimization problem of the form
  $\min_x \|A x - y\|_2^2$
we can immediately write down the solution,
  $x^* = (A^T A)^{-1} A^T y$
• The trick is to get it in that form; to better illustrate this, let's see an example.
Example Problem
Example Problem
• Solution:
Linear Filtering
• Remember that convolution is linear
• Convolution properties: commutative, associative, distributive, shift‐invariant
• As a result, applying two filters in sequence is the same as applying their convolution once:
  $f_2 * (f_1 * I) = (f_2 * f_1) * I$
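A quick numpy demonstration of this fact (a sketch, not from the slides; the two 1‐D filters are arbitrary choices for illustration):

```python
import numpy as np

f1 = np.array([1.0, 2.0, 1.0]) / 4.0   # e.g., a small blur filter
f2 = np.array([-1.0, 0.0, 1.0])        # e.g., a derivative filter
signal = np.arange(8, dtype=float)

# Filtering twice in sequence ...
seq = np.convolve(f2, np.convolve(f1, signal))
# ... equals filtering once with the pre-combined filter (associativity).
combined = np.convolve(np.convolve(f2, f1), signal)

print(np.allclose(seq, combined))  # True
```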
Linear Filtering
• Trig identities are very useful for the problem set and for analyzing linear filters in general
– Useful for the problem set: see the identities sketched below
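The slide's own list of identities did not survive extraction; as an assumption, the angle-sum and product-to-sum identities below are the standard ones used when analyzing how linear filters act on sinusoidal inputs:

```latex
% Hedged reconstruction: standard identities commonly used when
% analyzing linear filters applied to sinusoids.
\begin{align*}
\sin(a \pm b) &= \sin a \cos b \pm \cos a \sin b \\
\cos(a \pm b) &= \cos a \cos b \mp \sin a \sin b \\
\cos a \cos b &= \tfrac{1}{2}\left[\cos(a-b) + \cos(a+b)\right] \\
\sin a \sin b &= \tfrac{1}{2}\left[\cos(a-b) - \cos(a+b)\right] \\
\sin a \cos b &= \tfrac{1}{2}\left[\sin(a+b) + \sin(a-b)\right]
\end{align*}
```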
Likelihood function
• You have a model which is parameterized by $\theta$, which characterizes the probability of each of the outputs
• This model is also a function of the observed input data
• The model is given by $p(y \mid x; \theta)$, which reads: the probability of $y$ given $x$ and parameterized by $\theta$
• Assuming all our data is i.i.d., we can write the probability of all our data as
  $p(\vec{y} \mid X; \theta) = \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta)$
Likelihood function
• This distribution describes the probability of the output data given our input data and the model
• We want to find the model $\theta$, to make predictions in the future
• To find our model we use a likelihood function
• The likelihood is defined as
  $L(\theta) = p(\vec{y} \mid X; \theta)$
which is a function of $\theta$
Maximum likelihood estimation
• Once we have $L(\theta)$, we would like to maximize it over $\theta$
• This is known as maximum likelihood (ML) estimation
• To make the maximization easier, while still solving an equivalent optimization problem, we will maximize the log-likelihood $\ell(\theta) = \log L(\theta)$ instead of trying to directly maximize $L(\theta)$
Maximum a posteriori (MAP)
• Very similar to maximum likelihood, but now we view our model $\theta$ as a random variable
• This allows us to put some prior distribution $p(\theta)$ over what values $\theta$ can take on
• With this prior we can write the probability of our data as
  $p(\vec{y} \mid X, \theta)\, p(\theta)$
• We can maximize this the same way as in maximum likelihood; in fact, maximum likelihood is a MAP problem with a uniform prior
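As an illustration not in the slides: for a linear-Gaussian model with a zero-mean Gaussian prior on the weights, the MAP estimate reduces to ridge regression. A minimal numpy sketch, where lam is an assumed prior-strength parameter and the data is made up:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
y = A @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(20)

lam = 0.1  # assumed prior strength (noise variance / prior variance)

# MAP with a zero-mean Gaussian prior <=> ridge regression:
#   x = argmin ||Ax - y||^2 + lam ||x||^2 = (A^T A + lam I)^{-1} A^T y
x_map = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ y)

# lam -> 0 recovers the ML / least squares solution (uniform prior).
x_ml, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_map, x_ml)
```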
Solving MAP and ML estimate
• Write down $p(\vec{y} \mid X; \theta)$ (and multiply by the prior $p(\theta)$ for MAP)
• Now take the log:
  $\ell(\theta) = \sum_{i=1}^m \log p(y^{(i)} \mid x^{(i)}; \theta)$
• Now maximize by taking the gradient and setting it equal to zero
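A small worked sketch of this recipe (not from the slides) for i.i.d. Gaussian data with unknown mean: differentiating the log-likelihood and setting it to zero gives the sample mean, which the code below also verifies numerically. The data and grid are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # i.i.d. samples

# Log-likelihood of a Gaussian with unknown mean mu (sigma fixed):
#   l(mu) = -sum_i (y_i - mu)^2 / (2 sigma^2) + const
# Setting dl/dmu = sum_i (y_i - mu) / sigma^2 = 0 gives mu_ML = mean(y).
mu_ml = data.mean()

# Numerical check: evaluate l(mu) on a grid and pick the maximizer.
grid = np.linspace(4.0, 6.0, 2001)
loglik = np.array([-np.sum((data - mu) ** 2) for mu in grid])
print(mu_ml, grid[np.argmax(loglik)])  # nearly identical
```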