Data Modeling and Least Squares Fitting
Transcript of Data Modeling and Least Squares Fitting
Data Modeling and Least Squares Fitting
COS 323
Data Modeling
• Given: data points, functional form, find constants in function
• Example: given (x_i, y_i), find line through them; i.e., find a and b in y = ax+b
[Figure: scatter of data points (x_1, y_1) … (x_7, y_7) with fitted line y = ax+b]
Data Modeling
• You might do this because you actually care about those numbers…
– Example: measure position of falling object, fit parabola

$$ p = -\tfrac{1}{2}\,g\,t^2 $$
[Figure: position vs. time with fitted parabola; estimate g from the fit]
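The fit on this slide can be sketched in a few lines of Python (not part of the original lecture; data are synthetic and noise-free, so the fit recovers g exactly). For the one-parameter model p = a t² with a = −g/2, setting the derivative of Σᵢ(pᵢ − a tᵢ²)² to zero gives a = Σᵢ tᵢ²pᵢ / Σᵢ tᵢ⁴:

```python
# Sketch: estimate g by least-squares fitting p = a*t^2, where a = -g/2.
ts = [0.1 * i for i in range(1, 11)]       # sample times (s), made up
g_true = 9.81
ps = [-0.5 * g_true * t * t for t in ts]   # noise-free positions (m)

# Minimizing sum_i (p_i - a t_i^2)^2 gives a = sum(t^2 p) / sum(t^4).
a = sum(t * t * p for t, p in zip(ts, ps)) / sum(t ** 4 for t in ts)
g_est = -2.0 * a
```

With real (noisy) measurements the same formula would give a least-squares estimate of g rather than the exact value.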
Data Modeling
• … or because some aspect of behavior is unknown and you want to ignore it
– Example: measuring relative resonant frequency of two ions, want to ignore magnetic field drift
Least Squares
• Nearly universal formulation of fitting: minimize squares of differences between data and function
– Example: for fitting a line, minimize

$$ \sum_i \bigl( y_i - (a x_i + b) \bigr)^2 $$

with respect to a and b
– Most general solution technique: take derivatives w.r.t. unknown variables, set equal to zero
Least Squares
• Computational approaches:
– General numerical algorithms for function minimization
– Take partial derivatives; general numerical algorithms for root finding
– Specialized numerical algorithms that take advantage of the form of the function
– Important special case: linear least squares
Linear Least Squares
• General pattern:

$$ \text{Given } (x_i, y_i),\ \text{solve for } a, b, c: \qquad y_i = a\,f(x_i) + b\,g(x_i) + c\,h(x_i) $$

• Note that the dependence on the unknowns is linear, not necessarily the function!
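As a sketch in Python (illustrative, not from the lecture): with hypothetical basis functions f, g, h, each data point contributes one row [f(xᵢ), g(xᵢ), h(xᵢ)], and only the unknowns a, b, c need to enter linearly:

```python
import math

# Hypothetical basis functions: any fixed functions of x are allowed,
# because only the dependence on the unknowns a, b, c must be linear.
def f(x): return 1.0
def g(x): return x
def h(x): return math.sin(x)

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
# One row per data point: y_i ≈ a*f(x_i) + b*g(x_i) + c*h(x_i)
A = [[f(x), g(x), h(x)] for x in xs]
```

Fitting y = a + b·x + c·sin(x) this way is still linear least squares, even though the model is not a linear function of x.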
Solving Linear Least Squares Problem
• Take partial derivatives:

$$ E(a,b) = \sum_i \bigl( y_i - a\,f(x_i) - b\,g(x_i) \bigr)^2 $$

$$ \frac{\partial E}{\partial a} = -2\sum_i f(x_i)\bigl( y_i - a\,f(x_i) - b\,g(x_i) \bigr) = 0 \;\Rightarrow\; a\sum_i f(x_i)f(x_i) + b\sum_i f(x_i)g(x_i) = \sum_i y_i\,f(x_i) $$

$$ \frac{\partial E}{\partial b} = -2\sum_i g(x_i)\bigl( y_i - a\,f(x_i) - b\,g(x_i) \bigr) = 0 \;\Rightarrow\; a\sum_i f(x_i)g(x_i) + b\sum_i g(x_i)g(x_i) = \sum_i y_i\,g(x_i) $$
Solving Linear Least Squares Problem
• For convenience, rewrite as a matrix equation:

$$ \begin{pmatrix} \sum_i f(x_i)f(x_i) & \sum_i f(x_i)g(x_i) \\ \sum_i f(x_i)g(x_i) & \sum_i g(x_i)g(x_i) \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i y_i\,f(x_i) \\ \sum_i y_i\,g(x_i) \end{pmatrix} $$

• Factor:

$$ \begin{pmatrix} f(x_1) & g(x_1) \\ f(x_2) & g(x_2) \\ \vdots & \vdots \end{pmatrix}^{\mathsf T} \begin{pmatrix} f(x_1) & g(x_1) \\ f(x_2) & g(x_2) \\ \vdots & \vdots \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} f(x_1) & g(x_1) \\ f(x_2) & g(x_2) \\ \vdots & \vdots \end{pmatrix}^{\mathsf T} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix} $$
Linear Least Squares
• There’s a different derivation of this: an overconstrained linear system

$$ A\,\mathbf{x} = \mathbf{b} $$

• A has n rows and m < n columns: more equations than unknowns
Linear Least Squares
• Interpretation: find x that comes “closest” to satisfying Ax = b
– i.e., minimize b – Ax
– i.e., minimize |b – Ax|
– Equivalently, minimize |b – Ax|² or (b – Ax)·(b – Ax)

$$ \min_{\mathbf{x}}\;(\mathbf{b} - A\mathbf{x})^{\mathsf T}(\mathbf{b} - A\mathbf{x}) $$

$$ (\mathbf{b} - A\mathbf{x})^{\mathsf T}(\mathbf{b} - A\mathbf{x}) = \mathbf{b}^{\mathsf T}\mathbf{b} - 2\,\mathbf{x}^{\mathsf T}A^{\mathsf T}\mathbf{b} + \mathbf{x}^{\mathsf T}A^{\mathsf T}A\,\mathbf{x} $$

$$ \frac{\partial}{\partial \mathbf{x}} = -2\,A^{\mathsf T}\mathbf{b} + 2\,A^{\mathsf T}A\,\mathbf{x} = 0 \;\Rightarrow\; A^{\mathsf T}A\,\mathbf{x} = A^{\mathsf T}\mathbf{b} $$
Linear Least Squares
• If fitting data to a linear function:
– Rows of A are functions of x_i
– Entries in b are y_i
– Minimizing sum of squared differences!

$$ A = \begin{pmatrix} f(x_1) & g(x_1) \\ f(x_2) & g(x_2) \\ \vdots & \vdots \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \end{pmatrix} $$

$$ A^{\mathsf T}A = \begin{pmatrix} \sum_i f(x_i)f(x_i) & \sum_i f(x_i)g(x_i) \\ \sum_i f(x_i)g(x_i) & \sum_i g(x_i)g(x_i) \end{pmatrix}, \qquad A^{\mathsf T}\mathbf{b} = \begin{pmatrix} \sum_i y_i\,f(x_i) \\ \sum_i y_i\,g(x_i) \end{pmatrix} $$
Linear Least Squares
• Compare the two expressions we’ve derived – equal!

$$ \begin{pmatrix} \sum_i f(x_i)f(x_i) & \sum_i f(x_i)g(x_i) \\ \sum_i f(x_i)g(x_i) & \sum_i g(x_i)g(x_i) \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_i y_i\,f(x_i) \\ \sum_i y_i\,g(x_i) \end{pmatrix} $$

$$ \begin{pmatrix} f(x_1) & g(x_1) \\ \vdots & \vdots \end{pmatrix}^{\mathsf T} \begin{pmatrix} f(x_1) & g(x_1) \\ \vdots & \vdots \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} f(x_1) & g(x_1) \\ \vdots & \vdots \end{pmatrix}^{\mathsf T} \begin{pmatrix} y_1 \\ \vdots \end{pmatrix} $$
Ways of Solving Linear Least Squares
• Option 1:

    for each x_i, y_i:
        compute f(x_i), g(x_i), etc.
        store in row i of A
        store y_i in b
    compute (A^T A)^-1 A^T b

• (A^T A)^-1 A^T is known as the “pseudoinverse” of A
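A minimal sketch of Option 1 in Python for a two-parameter line fit, with the 2×2 inverse written out by hand (everything here is illustrative; in practice one would call a library routine rather than invert explicitly):

```python
# Sketch: pseudoinverse solve for y = a + b*x, so A has two columns.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]            # synthetic data, exactly y = 1 + 2x

A = [[1.0, x] for x in xs]           # row i holds f(x_i) = 1, g(x_i) = x_i
b = ys                               # entries of b are the y_i

# A^T A is 2x2, so it can be inverted directly.
AtA = [[sum(r[i] * r[j] for r in A) for j in range(2)] for i in range(2)]
Atb = [sum(r[i] * yi for r, yi in zip(A, b)) for i in range(2)]
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
inv = [[ AtA[1][1] / det, -AtA[0][1] / det],
       [-AtA[1][0] / det,  AtA[0][0] / det]]

# x = (A^T A)^-1 A^T b: the pseudoinverse applied to b
a_fit = inv[0][0] * Atb[0] + inv[0][1] * Atb[1]
b_fit = inv[1][0] * Atb[0] + inv[1][1] * Atb[1]
```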
Ways of Solving Linear Least Squares
• Option 2:

    for each x_i, y_i:
        compute f(x_i), g(x_i), etc.
        store in row i of A
        store y_i in b
    compute A^T A and A^T b
    solve A^T A x = A^T b

• These are known as the “normal equations” of the least squares problem
Ways of Solving Linear Least Squares
• These can be inefficient, since A is typically much larger than A^T A and A^T b
• Option 3:

    for each x_i, y_i:
        compute f(x_i), g(x_i), etc.
        accumulate outer product in U
        accumulate product with y_i in v
    solve U x = v
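Option 3 sketched in Python for a two-basis case (f(x) = 1, g(x) = x, chosen for illustration): U and v are accumulated one data point at a time, so A is never stored:

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 3.0, 3.5, 4.0]       # synthetic data, exactly y = 2 + 0.5x

U = [[0.0, 0.0], [0.0, 0.0]]         # accumulates A^T A
v = [0.0, 0.0]                        # accumulates A^T b

for x, y in zip(xs, ys):
    row = [1.0, x]                    # f(x) = 1, g(x) = x for this sketch
    for i in range(2):
        v[i] += row[i] * y
        for j in range(2):
            U[i][j] += row[i] * row[j]    # outer product of this row

# Solve the 2x2 system U x = v by elimination.
m = U[1][0] / U[0][0]
b_fit = (v[1] - m * v[0]) / (U[1][1] - m * U[0][1])
a_fit = (v[0] - U[0][1] * b_fit) / U[0][0]
```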
Special Case: Constant
• Let’s try to model a function of the form y = a
• In this case, f(x_i) = 1 and we are solving

$$ \Bigl(\sum_i 1\Bigr)\,a = \sum_i y_i \;\Rightarrow\; a = \frac{1}{n}\sum_i y_i $$

• Punchline: the mean is the least-squares estimator for the best constant fit
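A quick Python check of the punchline (numbers are arbitrary): scanning a grid of candidate constants, none beats the mean:

```python
# Sketch: the mean minimizes the sum of squared differences to a constant.
ys = [2.0, 4.0, 9.0]
mean = sum(ys) / len(ys)             # least-squares constant fit

def sse(a):
    return sum((y - a) ** 2 for y in ys)

# Brute-force scan over candidate constants in [0, 10].
best = min(sse(0.01 * k) for k in range(1001))
```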
Special Case: Line
• Fit to y = a + bx

$$ A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \end{pmatrix}, \qquad A^{\mathsf T}A = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}, \qquad A^{\mathsf T}\mathbf{b} = \begin{pmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{pmatrix} $$

$$ a = \frac{\sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i}{n\sum_i x_i^2 - \bigl(\sum_i x_i\bigr)^2}, \qquad b = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \bigl(\sum_i x_i\bigr)^2} $$
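The closed-form a and b transcribed into Python (a sketch with synthetic data; the perturbations happen to cancel, so the fit is exact here):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 6.9, 9.1]            # y = 1 + 2x plus small perturbations

n = len(xs)
Sx  = sum(xs)
Sy  = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

den = n * Sxx - Sx * Sx              # n*sum(x^2) - (sum x)^2
a = (Sxx * Sy - Sx * Sxy) / den      # intercept
b = (n * Sxy - Sx * Sy) / den        # slope
```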
Weighted Least Squares
• Common case: the (x_i, y_i) have different uncertainties associated with them
• Want to give more weight to measurements of which you are more certain
• Weighted least squares minimization:

$$ \min \sum_i w_i \bigl( y_i - f(x_i) \bigr)^2 $$

• If the uncertainty is σ_i, best to take w_i = 1/σ_i²
Weighted Least Squares
• Define the weight matrix W as

$$ W = \begin{pmatrix} w_1 & & & 0 \\ & w_2 & & \\ & & w_3 & \\ 0 & & & w_4 \end{pmatrix} $$

• Then solve the weighted least squares problem via

$$ A^{\mathsf T} W A\,\mathbf{x} = A^{\mathsf T} W\,\mathbf{b} $$
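A sketch in Python: with W diagonal, AᵀWA and AᵀWb are just the usual sums with a per-point weight multiplied in, so W never has to be built explicitly. Data and weights here are made up; the points lie exactly on y = 1 + 2x, which any positive weighting recovers:

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]            # exactly y = 1 + 2x
ws = [4.0, 1.0, 1.0, 0.25]           # hypothetical weights, e.g. 1/sigma^2

# Entries of A^T W A and A^T W b for rows [1, x].
S1  = sum(ws)
Sx  = sum(w * x for w, x in zip(ws, xs))
Sxx = sum(w * x * x for w, x in zip(ws, xs))
Sy  = sum(w * y for w, y in zip(ws, ys))
Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))

den = S1 * Sxx - Sx * Sx
a = (Sxx * Sy - Sx * Sxy) / den      # weighted intercept
b = (S1 * Sxy - Sx * Sy) / den       # weighted slope
```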
Error Estimates from Linear Least Squares
• For many applications, finding the values is useless without an estimate of their accuracy
• Residual is b – Ax
• Can compute χ² = (b – Ax)·(b – Ax)
• How do we tell whether the answer is good?
– Lots of measurements
– χ² is small
– χ² increases quickly with perturbations to x
Error Estimates from Linear Least Squares
• Let’s look at the increase in χ² when we perturb x by Δx:

$$ \chi^2 = \bigl(\mathbf{b} - A(\mathbf{x} + \Delta\mathbf{x})\bigr)^{\mathsf T}\bigl(\mathbf{b} - A(\mathbf{x} + \Delta\mathbf{x})\bigr) $$

$$ = (\mathbf{b} - A\mathbf{x})^{\mathsf T}(\mathbf{b} - A\mathbf{x}) - 2\,\Delta\mathbf{x}^{\mathsf T}A^{\mathsf T}(\mathbf{b} - A\mathbf{x}) + \Delta\mathbf{x}^{\mathsf T}A^{\mathsf T}A\,\Delta\mathbf{x} $$

At the minimum, A^T(b – Ax) = 0, so

$$ \Delta\chi^2 = \Delta\mathbf{x}^{\mathsf T}A^{\mathsf T}A\,\Delta\mathbf{x} $$

• So, the bigger A^T A is, the faster the error increases as we move away from the current x
Error Estimates from Linear Least Squares
• C = (A^T A)⁻¹ is called the covariance of the data
• The “standard variance” in our estimate of x is

$$ \sigma_{\mathbf{x}}^2 = \frac{\chi^2}{n - m}\,C $$

• This is a matrix:
– Diagonal entries give the variance of the estimates of the components of x
– Off-diagonal entries explain mutual dependence
• n – m is (# of samples) minus (# of degrees of freedom in the fit): consult a statistician…
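A sketch in Python for a line fit (made-up data, n = 4 samples, m = 2 parameters): fit via the closed form, compute χ² from the residuals, build C = (AᵀA)⁻¹ with the 2×2 inverse, and read variances off the diagonal of (χ²/(n−m))·C:

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]            # noisy samples of roughly y = 1 + 2x
n, m = len(xs), 2

Sx, Sxx = sum(xs), sum(x * x for x in xs)
Sy, Sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
den = n * Sxx - Sx * Sx

a = (Sxx * Sy - Sx * Sxy) / den      # intercept
b = (n * Sxy - Sx * Sy) / den        # slope

# chi^2 = (b - Ax).(b - Ax): sum of squared residuals
chi2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# C = (A^T A)^-1 for the 2x2 case
C = [[ Sxx / den, -Sx / den],
     [-Sx / den,   n / den]]

# Variance of the estimates: diagonal of (chi^2 / (n - m)) * C
var_a = chi2 / (n - m) * C[0][0]
var_b = chi2 / (n - m) * C[1][1]
```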
Special Case: Constant
• For the constant fit y = a, we had a = (1/n) Σ_i y_i, and A^T A = n, so C = 1/n:

$$ \sigma_a^2 = \frac{\chi^2}{n-1}\,C = \frac{\sum_i (y_i - a)^2}{n-1} \cdot \frac{1}{n} $$

• The first factor, square-rooted, is the “standard deviation of samples”; dividing it by √n gives the “standard deviation of the mean”
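The constant-fit formulas sketched in Python with arbitrary numbers:

```python
ys = [4.0, 6.0, 5.0, 7.0, 3.0]
n = len(ys)
a = sum(ys) / n                      # least-squares constant = mean

chi2 = sum((y - a) ** 2 for y in ys)
var_samples = chi2 / (n - 1)         # square of "standard deviation of samples"
var_mean = var_samples / n           # square of "standard deviation of mean"
```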
Things to Keep in Mind
• In general, uncertainty in estimated parameters goes down slowly: like 1/√(# samples)
• Formulas for special cases (like fitting a line) are messy: simpler to think in the A^T A x = A^T b form
• All of these minimize “vertical” squared distance
– The square is not always appropriate
– Vertical distance is not always appropriate