CS 231A Section 2: Math Background for PS1
Jiahui Shi
Announcement
• Updated deadline of PS1: 10/14, 12 noon
• Extra office hour for PS1: 10/13, 7‐9pm, Gates 104
Overview
• Maximizing and minimizing quadratic forms
• Least squares
• Linear filtering
• Maximum likelihood estimation
Quadratic Forms
• Quadratic forms look like (A is symmetric)
  $f(x) = x^T A x$
or, if we have a 2x2 matrix,
  $x^T A x = a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2$
Quadratic Forms in Optimization
• We will look at the following optimization problem for a symmetric matrix A; this will be useful for PS1:
  $\min_x \; x^T A x \quad \text{subject to} \quad \|x\|_2 = 1$
• This can be solved using the eigenvector corresponding to the smallest eigenvalue.
• Similarly, the eigenvector corresponding to the largest eigenvalue will give the max value.
Optimizing Quadratic FormsOptimizing Quadratic Forms
• Formulate the Lagrangian
  $\mathcal{L}(x, \lambda) = x^T A x - \lambda (x^T x - 1)$
• Now we take the partial derivative with respect to x and set it to zero:
  $\nabla_x \mathcal{L} = 2 A x - 2 \lambda x = 0$
Optimizing Quadratic FormsOptimizing Quadratic Forms
• Using this condition, we know that our optimal x's must satisfy
  $A x = \lambda x$
• Our possible optimizing vectors are just the eigenvectors of A.
• Plugging in:
  $x^T A x = x^T (\lambda x) = \lambda x^T x = \lambda$
so the objective value at each eigenvector is simply its eigenvalue.
• The constraint on the matrix A is pretty stringent: it must be square and symmetric.
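As a sanity check of this recipe (not from the original slides), here is a minimal numpy sketch; the small symmetric test matrix A is assumed purely for illustration, and np.linalg.eigh returns eigenvalues in ascending order:

```python
import numpy as np

# A small symmetric test matrix (assumed for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigh is for symmetric matrices; eigenvalues come back ascending.
eigvals, eigvecs = np.linalg.eigh(A)

x_min = eigvecs[:, 0]   # unit eigenvector for the smallest eigenvalue
x_max = eigvecs[:, -1]  # unit eigenvector for the largest eigenvalue

# The objective x^T A x at each unit eigenvector equals its eigenvalue.
print(x_min @ A @ x_min, eigvals[0])   # min of x^T A x over ||x|| = 1
print(x_max @ A @ x_max, eigvals[-1])  # max of x^T A x over ||x|| = 1
```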
Linear Regression
• Fit a linear model to data
• PS1: use the least squares approach to fit a model to our data with minimum total error.
Least Squares
• Given A and y, find the optimal x s.t. Ax = y.
• The residual is defined as $r = A x - y$.
• Assume A is skinny (#rows > #columns, or more equations than unknowns) and full rank.
• We want to minimize the squared 2‐norm of the residual:
  $\min_x \|A x - y\|_2^2$
Least Squares
• To minimize
  $\|A x - y\|_2^2 = x^T A^T A x - 2 y^T A x + y^T y$
take the gradient with respect to x and set it equal to zero:
  $2 A^T A x - 2 A^T y = 0$
Solving gives
  $x = (A^T A)^{-1} A^T y$
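A short numpy check of this closed form (not from the slides; the dimensions and data are made up). In practice np.linalg.lstsq, which solves the problem via an SVD, is preferred over forming $A^T A$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))  # skinny: 10 equations, 3 unknowns
y = rng.standard_normal(10)

# Closed form from the normal equations: x = (A^T A)^{-1} A^T y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Numerically preferred SVD-based solver.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # True: same minimizer
```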
Geometric Approach
• Pick $\hat{y} = A x$ by requiring the residual to be orthogonal to the range of A:
  $A^T (A x - y) = 0$
thus we have
  $A^T A x = A^T y$
since A is skinny (#rows > #columns) and full rank, $A^T A$ is invertible and we can write
  $x = (A^T A)^{-1} A^T y$
Least Squares Solution
• When there is an optimization problem of the form
  $\min_x \|A x - y\|_2^2$
we can immediately write down the solution,
  $x^* = (A^T A)^{-1} A^T y$
• The trick is to get it in that form; to better illustrate this, let's see an example.
Example Problem
Example Problem
• Solution:
Linear Filtering
• Remember that convolution is linear
• Convolution properties: commutative, associative, distributive, shift‐invariant
• As a result, applying two filters in sequence is the same as applying their convolution once:
  $f_2 * (f_1 * I) = (f_2 * f_1) * I$
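A quick numpy demonstration of this fact (a sketch, not from the slides; the two 1‐D filters are arbitrary choices for illustration):

```python
import numpy as np

f1 = np.array([1.0, 2.0, 1.0]) / 4.0   # e.g., a small blur filter
f2 = np.array([-1.0, 0.0, 1.0])        # e.g., a derivative filter
signal = np.arange(8, dtype=float)

# Filtering twice in sequence ...
seq = np.convolve(f2, np.convolve(f1, signal))
# ... equals filtering once with the pre-combined filter (associativity).
combined = np.convolve(np.convolve(f2, f1), signal)

print(np.allclose(seq, combined))  # True
```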
Linear Filtering
• Trig identities are very useful for the problem set and for analyzing linear filters in general
– Useful for the problem set: see the identities sketched below
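The slide's own list of identities did not survive extraction; as an assumption, the angle-sum and product-to-sum identities below are the standard ones used when analyzing how linear filters act on sinusoidal inputs:

```latex
% Hedged reconstruction: standard identities commonly used when
% analyzing linear filters applied to sinusoids.
\begin{align*}
\sin(a \pm b) &= \sin a \cos b \pm \cos a \sin b \\
\cos(a \pm b) &= \cos a \cos b \mp \sin a \sin b \\
\cos a \cos b &= \tfrac{1}{2}\left[\cos(a-b) + \cos(a+b)\right] \\
\sin a \sin b &= \tfrac{1}{2}\left[\cos(a-b) - \cos(a+b)\right] \\
\sin a \cos b &= \tfrac{1}{2}\left[\sin(a+b) + \sin(a-b)\right]
\end{align*}
```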
Likelihood function
• You have a model which is parameterized by $\theta$, which characterizes the probability of each of the outputs
• This model is also a function of the observed input data
• The model is given by $p(y \mid x; \theta)$, which reads: the probability of $y$ given $x$ and parameterized by $\theta$
• Assuming all our data is i.i.d., we can write the probability of all our data as
  $p(\vec{y} \mid X; \theta) = \prod_{i=1}^m p(y^{(i)} \mid x^{(i)}; \theta)$
Likelihood function
• This distribution describes the probability of the output data given our input data and the model
• We want to find the model $\theta$, to make predictions in the future
• To find our model we use a likelihood function
• The likelihood is defined as
  $L(\theta) = p(\vec{y} \mid X; \theta)$
which is a function of $\theta$
Maximum likelihood estimation
• Once we have $L(\theta)$, we would like to maximize it over $\theta$
• This is known as maximum likelihood (ML) estimation
• To make the maximization easier, while still solving an equivalent optimization problem, we will maximize the log-likelihood $\ell(\theta) = \log L(\theta)$ instead of trying to directly maximize $L(\theta)$
Maximum a posteriori (MAP)
• Very similar to maximum likelihood, but now we view our model $\theta$ as a random variable
• This allows us to put some prior distribution $p(\theta)$ over what values $\theta$ can take on
• With this prior we can write the probability of our data as
  $p(\vec{y} \mid X, \theta)\, p(\theta)$
• We can maximize this the same way as in maximum likelihood; in fact, maximum likelihood is a MAP problem with a uniform prior
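As an illustration not in the slides: for a linear-Gaussian model with a zero-mean Gaussian prior on the weights, the MAP estimate reduces to ridge regression. A minimal numpy sketch, where lam is an assumed prior-strength parameter and the data is made up:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 4))
y = A @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(20)

lam = 0.1  # assumed prior strength (noise variance / prior variance)

# MAP with a zero-mean Gaussian prior <=> ridge regression:
#   x = argmin ||Ax - y||^2 + lam ||x||^2 = (A^T A + lam I)^{-1} A^T y
x_map = np.linalg.solve(A.T @ A + lam * np.eye(4), A.T @ y)

# lam -> 0 recovers the ML / least squares solution (uniform prior).
x_ml, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x_map, x_ml)
```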
Solving MAP and ML estimate
• Write down $p(\vec{y} \mid X; \theta)$ (and multiply by the prior $p(\theta)$ for MAP)
• Now take the log:
  $\ell(\theta) = \sum_{i=1}^m \log p(y^{(i)} \mid x^{(i)}; \theta)$
• Now maximize by taking the gradient and setting it equal to zero
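A small worked sketch of this recipe (not from the slides) for i.i.d. Gaussian data with unknown mean: differentiating the log-likelihood and setting it to zero gives the sample mean, which the code below also verifies numerically. The data and grid are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # i.i.d. samples

# Log-likelihood of a Gaussian with unknown mean mu (sigma fixed):
#   l(mu) = -sum_i (y_i - mu)^2 / (2 sigma^2) + const
# Setting dl/dmu = sum_i (y_i - mu) / sigma^2 = 0 gives mu_ML = mean(y).
mu_ml = data.mean()

# Numerical check: evaluate l(mu) on a grid and pick the maximizer.
grid = np.linspace(4.0, 6.0, 2001)
loglik = np.array([-np.sum((data - mu) ** 2) for mu in grid])
print(mu_ml, grid[np.argmax(loglik)])  # nearly identical
```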