Quadratic form and functional optimization
Transcript of Quadratic form and functional optimization
![Page 1: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/1.jpg)
Quadratic Form and Functional Optimization
9th June, 2011 Junpei Tsuji
![Page 2: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/2.jpg)
Optimization of a multivariate quadratic function

$$J(x_1, x_2) = 1.2 + (0.2,\ 0.3)\begin{pmatrix}x_1\\x_2\end{pmatrix} + \frac{1}{2}(x_1,\ x_2)\begin{pmatrix}3 & 1\\1 & 4\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}$$

$$= 1.2 + 0.2x_1 + 0.3x_2 + \frac{3}{2}x_1^2 + x_1x_2 + 2x_2^2$$

Minimum: $(x_1, x_2, J) \approx (-0.045,\ -0.064,\ 1.1859)$
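As a quick numerical check of this example (not part of the original slides), the stationary point can be computed with NumPy by solving the linear system that sets the gradient to zero:

```python
import numpy as np

# Quadratic from the slide: J(x) = 1.2 + b^T x + (1/2) x^T A x
A = np.array([[3.0, 1.0],
              [1.0, 4.0]])
b = np.array([0.2, 0.3])

def J(x):
    return 1.2 + b @ x + 0.5 * x @ A @ x

# Stationary point: grad J = b + A x = 0  =>  x* = -A^{-1} b
x_star = -np.linalg.solve(A, b)
print(np.round(x_star, 3))      # approximately [-0.045 -0.064]
print(round(J(x_star), 4))
```

Because $\mathbf{A}$ is positive definite, this stationary point is the minimum.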
![Page 3: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/3.jpg)
Quadratic approximation by Taylor expansion

$$f(\mathbf{x}) \approx \bar{f} + \bar{\mathbf{g}}^T(\mathbf{x}-\bar{\mathbf{x}}) + \frac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T \bar{\mathbf{H}} (\mathbf{x}-\bar{\mathbf{x}})$$

(constant, linear form, and quadratic form terms)

where

- $\mathbf{x} := (x_1, x_2, \cdots, x_n)^T$
- $\bar{f} := f(\bar{\mathbf{x}})$
- $\bar{\mathbf{g}} := \left(\dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \cdots, \dfrac{\partial f}{\partial x_n}\right)^T_{\mathbf{x}=\bar{\mathbf{x}}}$ : gradient (Jacobian)
- $\bar{\mathbf{H}} := \begin{pmatrix}\dfrac{\partial^2 f}{\partial x_1\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_n}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial^2 f}{\partial x_n\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n\partial x_n}\end{pmatrix}_{\mathbf{x}=\bar{\mathbf{x}}}$ : Hessian (constant)
![Page 4: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/4.jpg)
Completing the square

$$f(\mathbf{x}) = \bar{f} + \bar{\mathbf{g}}^T(\mathbf{x}-\bar{\mathbf{x}}) + \frac{1}{2}(\mathbf{x}-\bar{\mathbf{x}})^T\bar{\mathbf{H}}(\mathbf{x}-\bar{\mathbf{x}})$$

- Let $\bar{\mathbf{x}} = \mathbf{x}^*$, where $\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$. Then

$$f(\mathbf{x}) = f^* + \frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T\mathbf{H}^*(\mathbf{x}-\mathbf{x}^*)$$

(constant and quadratic form terms only)
![Page 5: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/5.jpg)
Completing the square: $f(\mathbf{x}) = c + \mathbf{b}^T\mathbf{x} + \dfrac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}$

$$f(\mathbf{x}) = d + \frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^T\mathbf{A}(\mathbf{x}-\mathbf{x}_0) = d + \frac{1}{2}\mathbf{x}_0^T\mathbf{A}\mathbf{x}_0 - \frac{1}{2}\mathbf{x}_0^T(\mathbf{A}+\mathbf{A}^T)\mathbf{x} + \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x}$$

Matching coefficients:

- $\mathbf{b}^T = -\dfrac{1}{2}\mathbf{x}_0^T(\mathbf{A}+\mathbf{A}^T)$, so $\mathbf{x}_0^T = -2\mathbf{b}^T(\mathbf{A}+\mathbf{A}^T)^{-1}$ and $\mathbf{x}_0 = -2(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{b}$
- $c = d + \dfrac{1}{2}\mathbf{x}_0^T\mathbf{A}\mathbf{x}_0$, so $d = c - \dfrac{1}{2}\mathbf{x}_0^T\mathbf{A}\mathbf{x}_0 = c - 2\mathbf{b}^T(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{A}(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{b}$

Therefore,

$$f(\mathbf{x}) = c - 2\mathbf{b}^T(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{A}(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{b} + \frac{1}{2}\left(\mathbf{x}+2(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{b}\right)^T\mathbf{A}\left(\mathbf{x}+2(\mathbf{A}+\mathbf{A}^T)^{-1}\mathbf{b}\right)$$

- If $\mathbf{A}$ is a symmetric matrix,

$$f(\mathbf{x}) = c - \frac{1}{2}\mathbf{b}^T\mathbf{A}^{-1}\mathbf{b} + \frac{1}{2}(\mathbf{x}+\mathbf{A}^{-1}\mathbf{b})^T\mathbf{A}(\mathbf{x}+\mathbf{A}^{-1}\mathbf{b})$$
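The general (non-symmetric $\mathbf{A}$) identity above can be checked numerically. This is a sketch added here for verification, with a randomly generated $\mathbf{A}$, $\mathbf{b}$, $c$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))          # a general (non-symmetric) matrix
b = rng.normal(size=n)
c = 1.7

def f(x):
    return c + b @ x + 0.5 * x @ A @ x

# Completing the square:
#   x0 = -2 (A + A^T)^{-1} b,   d = c - (1/2) x0^T A x0
x0 = -2.0 * np.linalg.solve(A + A.T, b)
d = c - 0.5 * x0 @ A @ x0

x = rng.normal(size=n)               # arbitrary test point
lhs = f(x)
rhs = d + 0.5 * (x - x0) @ A @ (x - x0)
print(np.isclose(lhs, rhs))          # True: the two forms agree everywhere
```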
![Page 6: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/6.jpg)
Quadratic form

$$f(\mathbf{x}) = \mathbf{x}^T\mathbf{S}\mathbf{x}$$

where $\mathbf{S}$ is a symmetric matrix.
![Page 7: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/7.jpg)
Symmetric matrix

- A symmetric matrix $\mathbf{S}$ is defined as a matrix that satisfies $\mathbf{S}^T = \mathbf{S}$.
- A symmetric matrix $\mathbf{S}$ has real eigenvalues $\lambda_i$ and eigenvectors $\mathbf{u}_i$ that form an orthonormal basis:

$$\mathbf{S}\mathbf{u}_i = \lambda_i\mathbf{u}_i, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$$

$$(\mathbf{u}_i, \mathbf{u}_j) = \delta_{ij} \quad (\delta_{ij}\text{ is the Kronecker delta})$$
![Page 8: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/8.jpg)
Diagonalization of a symmetric matrix

- Define an orthogonal matrix $\mathbf{U}$ as follows: $\mathbf{U} = (\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_n)$
- Then $\mathbf{U}$ satisfies $\mathbf{U}^T\mathbf{U} = \mathbf{I}$, hence $\mathbf{U}^{-1} = \mathbf{U}^T$, where $\mathbf{I}$ is the identity matrix.

$$\mathbf{S}\mathbf{U} = \mathbf{S}(\mathbf{u}_1, \cdots, \mathbf{u}_n) = (\mathbf{S}\mathbf{u}_1, \cdots, \mathbf{S}\mathbf{u}_n) = (\lambda_1\mathbf{u}_1, \cdots, \lambda_n\mathbf{u}_n) = \mathbf{U}\,\mathrm{diag}(\lambda_1, \cdots, \lambda_n)$$

$$\therefore\ \mathbf{S} = \mathbf{U}\,\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n)\,\mathbf{U}^T$$
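This decomposition can be reproduced with NumPy's `eigh` routine for symmetric matrices (a verification sketch; note `eigh` returns eigenvalues in ascending rather than the slide's descending order):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
S = M + M.T                      # construct a symmetric matrix

lam, U = np.linalg.eigh(S)       # real eigenvalues, orthonormal eigenvectors
print(np.allclose(U.T @ U, np.eye(3)))           # U^T U = I
print(np.allclose(S, U @ np.diag(lam) @ U.T))    # S = U diag(lam) U^T
```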
![Page 9: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/9.jpg)
Transformation to principal axes

$$f(\mathbf{x}) = \mathbf{x}^T\mathbf{S}\mathbf{x}$$

- Now substitute $\mathbf{x} = \mathbf{U}\mathbf{z}$, where $\mathbf{z} = (z_1, z_2, \cdots, z_n)^T$:

$$f(\mathbf{U}\mathbf{z}) = (\mathbf{U}\mathbf{z})^T\mathbf{S}(\mathbf{U}\mathbf{z}) = \mathbf{z}^T\mathbf{U}^T\mathbf{S}\mathbf{U}\mathbf{z} = \mathbf{z}^T\,\mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_n)\,\mathbf{z}$$

$$\therefore\ f = \sum_{i=1}^n \lambda_i z_i^2$$
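The principal-axis identity $\mathbf{x}^T\mathbf{S}\mathbf{x} = \sum_i \lambda_i z_i^2$ with $\mathbf{z} = \mathbf{U}^T\mathbf{x}$ can be checked on a random symmetric matrix (a small verification sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
S = M + M.T                      # symmetric matrix
lam, U = np.linalg.eigh(S)

x = rng.normal(size=3)
z = U.T @ x                      # principal-axis coordinates: x = U z
print(np.isclose(x @ S @ x, np.sum(lam * z**2)))   # True
```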
![Page 10: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/10.jpg)
Contour surfaces

- If we set $f(\mathbf{x})$ equal to a constant $c$:

$$f(\mathbf{x}) = \sum_{i=1}^n \lambda_i z_i^2 = c$$

- When $n = 2$:
  - the locus of $\mathbf{z}$ is an ellipse if $\lambda_1\lambda_2 > 0$;
  - the locus of $\mathbf{z}$ is a hyperbola if $\lambda_1\lambda_2 < 0$.
![Page 11: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/11.jpg)
Contour surfaces

[Figure: contours of $f(\mathbf{z}) = \sum_{i=1}^2 \lambda_i z_i^2 = \mathrm{const.}$ in the $(z_1, z_2)$ plane for $\lambda_1\lambda_2 > 0$; example $f(x_1, x_2) = -x_1^2 - 2x_2^2 + 20.0$. The center is a maximal or minimal point.]
![Page 12: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/12.jpg)
Transformation to principal axes

[Figure: contours $f(\mathbf{x}) = \mathrm{const.}$ with principal axes $x'_1, x'_2$; the rotation $\mathbf{x}' = \mathbf{U}^T\mathbf{x}$, i.e. $\mathbf{x} = \mathbf{U}\mathbf{x}'$, aligns the coordinate axes with the principal axes.]
![Page 13: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/13.jpg)
Parallel translation

[Figure: contours $f(\mathbf{x}) = \mathrm{const.}$ in the $(x_1, x_2)$ plane centered at $\bar{\mathbf{x}}$; the translation $\mathbf{x}' = \mathbf{x} - \bar{\mathbf{x}}$ moves the center to the origin of the $(x'_1, x'_2)$ axes.]
![Page 14: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/14.jpg)
Contour surface of a quadratic function

$$f(\mathbf{x}) = f^* + \frac{1}{2}(\mathbf{x}-\mathbf{x}^*)^T\mathbf{H}^*(\mathbf{x}-\mathbf{x}^*)$$

[Figure: contours $f(\mathbf{x}) = \mathrm{const.}$ in the $(x_1, x_2)$ plane, centered at $\mathbf{x}^*$.]
![Page 15: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/15.jpg)
Contour surfaces

[Figure: contours of $f(\mathbf{z}) = \sum_{i=1}^2 \lambda_i z_i^2 = \mathrm{const.}$ in the $(z_1, z_2)$ plane for $\lambda_1\lambda_2 < 0$; example $f(x_1, x_2) = x_1^2 - x_2^2$. The center is a saddle point.]
![Page 16: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/16.jpg)
Stationary points

[Figure: surface of $f(x_1, x_2) = x_1^3 + x_2^3 + 3x_1x_2 + 2$, showing a saddle point and a maximal point.]
![Page 17: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/17.jpg)
Stationary points

[Figure: surface of $f(x_1, x_2) = \exp\left(-\frac{1}{3}x_1^3 + x_1 - x_2^2\right)$, showing a maximal point and a saddle point.]
![Page 18: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/18.jpg)
![Page 19: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/19.jpg)
Newton–Raphson method

- Newton's method is an approximate solver of $\mathbf{g}(\mathbf{x}) = \mathbf{0}$ (a stationary point of $f$) that repeatedly applies a quadratic approximation.

$$f(\mathbf{x} + \Delta\mathbf{x}) \approx f(\mathbf{x}) + \mathbf{g}(\mathbf{x})^T\Delta\mathbf{x} + \frac{1}{2}\Delta\mathbf{x}^T\mathbf{H}(\mathbf{x})\Delta\mathbf{x}$$

$$\frac{\partial f(\mathbf{x}+\Delta\mathbf{x})}{\partial\,\Delta\mathbf{x}} = \mathbf{g}(\mathbf{x}) + \mathbf{H}(\mathbf{x})\Delta\mathbf{x}$$

[Figure: $f(\mathbf{x})$, its quadratic approximation at $\mathbf{x}$, the step to $\mathbf{x}+\Delta\mathbf{x}$, and the stationary point $\mathbf{x}^*$ where $\mathbf{g}(\mathbf{x}^*) = \mathbf{0}$.]
![Page 20: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/20.jpg)
Algorithm of Newton's method

Procedure Newton($\mathbf{g}(\mathbf{x})$, $\mathbf{H}(\mathbf{x})$):

1. Initialize $\mathbf{x}$.
2. Calculate $\mathbf{g}(\mathbf{x})$ and $\mathbf{H}(\mathbf{x})$.
3. Solve the following simultaneous equations, giving $\Delta\mathbf{x}$: $\mathbf{g}(\mathbf{x}) + \mathbf{H}(\mathbf{x})\Delta\mathbf{x} = \mathbf{0}$
4. Update $\mathbf{x}$ as follows: $\mathbf{x} \leftarrow \mathbf{x} + \Delta\mathbf{x}$
5. If $\|\Delta\mathbf{x}\| < \delta$ then return $\mathbf{x}$, else go back to 2.
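The procedure above translates directly into NumPy. Here is a minimal sketch, reusing the quadratic example from page 2 (for which a single Newton step already reaches the stationary point):

```python
import numpy as np

def newton(grad, hess, x, delta=1e-10, max_iter=100):
    """Newton's method: solve g(x) + H(x) dx = 0, then update x <- x + dx."""
    for _ in range(max_iter):
        dx = np.linalg.solve(hess(x), -grad(x))
        x = x + dx
        if np.linalg.norm(dx) < delta:
            break
    return x

# Example: J(x) = 1.2 + b^T x + (1/2) x^T A x, so g(x) = b + A x, H(x) = A
A = np.array([[3.0, 1.0], [1.0, 4.0]])
b = np.array([0.2, 0.3])
grad = lambda x: b + A @ x
hess = lambda x: A

x_star = newton(grad, hess, np.zeros(2))
print(np.round(x_star, 3))   # stationary point, approximately [-0.045 -0.064]
```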
![Page 21: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/21.jpg)
Linear regression

$$y = f(\mathbf{x}) = \beta_0 + \sum_{j=1}^p \beta_j x_j$$

[Figure: $n$ samples $(\mathbf{x}_i, y_i)$ in a $p$-dimensional space, fitted by the hyperplane $y = f(\mathbf{x})$.]

We would like to find $\boldsymbol{\beta}^*$ that minimizes the residual sum of squares (RSS).
![Page 22: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/22.jpg)
Linear regression
min๐ท
RSS ๐ท โข where
RSS ๐ท = ๏ฟฝ ๐ฆ๐ โ ๐ ๐๐ 2๐
๐=1
= ๏ฟฝ ๐ฆ๐ โ ๐ฝ0 + ๏ฟฝ๐ฝ๐๐ฅ๐๐
๐
๐=1
2๐
๐=1
โข Given ๐ฟ,๐,๐ท as follows:
๐ฟ =๐ฅ11 โฏ ๐ฅ1๐โฎ โฑ โฎ๐ฅ๐1 โฏ ๐ฅ๐๐
1โฎ1
, ๐ =๐ฆ1โฎ๐ฆ๐
, ๐ท =๐ฝ1โฎ๐ฝ๐
โด RSS ๐ท = ๐ โ ๐ฟ๐ท 2
![Page 23: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/23.jpg)
Linear regression

$$\mathrm{RSS}(\boldsymbol{\beta}) = J(\boldsymbol{\beta}) = \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2 = (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) = \mathbf{y}^T\mathbf{y} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$

Using the matrix-derivative identities

- $\dfrac{\partial}{\partial\boldsymbol{\beta}}\,\mathbf{a}^T\boldsymbol{\beta} = \mathbf{a}$
- $\dfrac{\partial}{\partial\boldsymbol{\beta}}\,\boldsymbol{\beta}^T\mathbf{a} = \mathbf{a}$
- $\dfrac{\partial}{\partial\boldsymbol{\beta}}\,\boldsymbol{\beta}^T\mathbf{A}\boldsymbol{\beta} = (\mathbf{A}+\mathbf{A}^T)\boldsymbol{\beta}$

and the symmetry of $\mathbf{X}^T\mathbf{X}$:

$$J'(\boldsymbol{\beta}) = \frac{\partial J}{\partial\boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$
![Page 24: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/24.jpg)
Linear regression

Given $\boldsymbol{\beta}^*$ that satisfies $J'(\boldsymbol{\beta}^*) = \mathbf{0}$:

$$\mathbf{X}^T\mathbf{y} = \mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^*, \qquad \mathbf{y}^T\mathbf{X} = \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}$$

$$\therefore\ \boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

$$\therefore\ J(\boldsymbol{\beta}) = \mathbf{y}^T\mathbf{y} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* - \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$

$$= \mathbf{y}^T\mathbf{y} - \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* + \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* - \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$

$$\therefore\ J(\boldsymbol{\beta}) = \mathbf{y}^T\mathbf{y} - \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* + (\boldsymbol{\beta}-\boldsymbol{\beta}^*)^T\mathbf{X}^T\mathbf{X}(\boldsymbol{\beta}-\boldsymbol{\beta}^*)$$

(completing the square)
![Page 25: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/25.jpg)
Linear regression

$$J(\boldsymbol{\beta}) = \mathbf{y}^T\mathbf{y} - \boldsymbol{\beta}^{*T}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* + (\boldsymbol{\beta}-\boldsymbol{\beta}^*)^T\mathbf{X}^T\mathbf{X}(\boldsymbol{\beta}-\boldsymbol{\beta}^*) = \|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}^*\|^2 + (\boldsymbol{\beta}-\boldsymbol{\beta}^*)^T\mathbf{X}^T\mathbf{X}(\boldsymbol{\beta}-\boldsymbol{\beta}^*)$$

$$= J(\boldsymbol{\beta}^*) + \frac{1}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}^*)^T\mathbf{H}(\boldsymbol{\beta}-\boldsymbol{\beta}^*)$$

(constant plus quadratic form: the residual sum of squares (RSS) of linear regression)

[Figure: contours $J(\boldsymbol{\beta}) = \mathrm{const.}$ in the $(\beta_1, \beta_2)$ plane, centered at $\boldsymbol{\beta}^*$.]

$$\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}, \qquad \mathbf{H} = 2\mathbf{X}^T\mathbf{X}$$
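The closed form $\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ can be compared against NumPy's least-squares solver (a verification sketch on synthetic data; the coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.hstack([rng.normal(size=(n, p)), np.ones((n, 1))])  # last column: intercept
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: beta* = (X^T X)^{-1} X^T y
beta_star = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver
beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_star, beta_ref))   # True
```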
![Page 26: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/26.jpg)
Hessian

- $\mathbf{H} := \left(\dfrac{\partial^2 J}{\partial\beta_i\,\partial\beta_j}\right) = 2\mathbf{X}^T\mathbf{X}$
- $\mathbf{H}$ has the following two features:
  - symmetric matrix: $\mathbf{H}^T = \mathbf{H}$
  - positive-definite matrix: $\forall\mathbf{u}\ne\mathbf{0},\ \mathbf{u}^T\mathbf{H}\mathbf{u} > 0$

Therefore, $\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ is the minimum of $J(\boldsymbol{\beta})$.
![Page 27: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/27.jpg)
Analysis of residuals

$$\mathbf{y}^* = \mathbf{X}\boldsymbol{\beta}^*$$

- Substituting $\boldsymbol{\beta}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ in the above:

$$\mathbf{y}^* = \mathbf{X}\boldsymbol{\beta}^* = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\,\mathbf{y}$$

$$\therefore\ \mathbf{y}^* = \mathcal{H}\mathbf{y} \quad (\mathcal{H}\text{: hat matrix})$$

- The vector of residuals $\mathbf{e}$ can then be expressed as follows:

$$\mathbf{e} = \mathbf{y} - \mathbf{y}^* = \mathbf{y} - \mathcal{H}\mathbf{y} = (\mathbf{I}-\mathcal{H})\mathbf{y}$$

$$\mathbf{e}^T\mathbf{e} = \mathbf{y}^T(\mathbf{I}-\mathcal{H})^T(\mathbf{I}-\mathcal{H})\mathbf{y}$$
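A useful consequence of the normal equations, $\mathbf{X}^T\mathbf{e} = \mathbf{X}^T\mathbf{y} - \mathbf{X}^T\mathbf{X}\boldsymbol{\beta}^* = \mathbf{0}$, is that the residuals are orthogonal to every column of $\mathbf{X}$. A quick numerical sketch (synthetic data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.hstack([rng.normal(size=(12, 2)), np.ones((12, 1))])
y = rng.normal(size=12)

beta_star = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_star               # residual vector e = (I - H) y
print(np.allclose(X.T @ e, 0))      # True: residuals orthogonal to columns of X
```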
![Page 28: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/28.jpg)
Analysis of residuals

$$\mathcal{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$$

The hat matrix $\mathcal{H}$ is a projection matrix, which satisfies the following equations:

1. Projection: $\mathcal{H}^2 = \mathcal{H}$

$$\mathcal{H}^2 = \mathcal{H}\mathcal{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \cdot \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{X})(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \mathcal{H}$$

2. Orthogonal (symmetric): $\mathcal{H}^T = \mathcal{H}$
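Both projection properties are easy to confirm numerically (a verification sketch with a random design matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.hstack([rng.normal(size=(10, 2)), np.ones((10, 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X (X^T X)^{-1} X^T

print(np.allclose(H @ H, H))    # projection: H^2 = H
print(np.allclose(H.T, H))      # symmetric:  H^T = H
```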
![Page 29: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/29.jpg)
Analysis of residuals

$$\begin{pmatrix}y_1^*\\ \vdots\\ y_n^*\end{pmatrix} = \begin{pmatrix}x_{11} & \cdots & x_{1p} & 1\\ \vdots & \ddots & \vdots & \vdots\\ x_{n1} & \cdots & x_{np} & 1\end{pmatrix}\begin{pmatrix}\beta_1^*\\ \vdots\\ \beta_p^*\\ \beta_0^*\end{pmatrix} = \beta_1^*\begin{pmatrix}x_{11}\\ \vdots\\ x_{n1}\end{pmatrix} + \cdots + \beta_p^*\begin{pmatrix}x_{1p}\\ \vdots\\ x_{np}\end{pmatrix} + \beta_0^*\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}$$

This is a linear combination in a $(p+1)$-dimensional vector space, spanned by the columns $\mathbf{a}_1, \cdots, \mathbf{a}_p, \mathbf{a}_{p+1} = \mathbf{1}$.
![Page 30: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/30.jpg)
Analysis of residuals

[Figure: in the $n$-dimensional space, $\mathbf{y}^* = \mathcal{H}\mathbf{y}$ is the projection of $\mathbf{y}$ onto the $(p+1)$-dimensional hyperplane spanned by the columns $\mathbf{a}_i, \mathbf{a}_j, \cdots$; the residual $\mathbf{e}$ is orthogonal to that hyperplane.]
![Page 31: Quadratic form and functional optimization](https://reader033.fdocuments.us/reader033/viewer/2022060116/557dc639d8b42a8f188b5501/html5/thumbnails/31.jpg)
Analysis of residuals

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}$$

- $\boldsymbol{\beta} = \mathbf{X}^{-}\mathbf{y}$, where $\mathbf{X}^{-}$ is the Moore–Penrose generalized inverse (pseudoinverse).

1. Unique solution: $n = p$
2. Many solutions: $n < p$
3. No solution: $n > p$

$$\mathbf{X}^{-} = \begin{cases}\mathbf{X}^{-1} & (n = p)\\[4pt] \mathbf{X}^T(\mathbf{X}\mathbf{X}^T)^{-1} & (n < p:\ \boldsymbol{\beta} = \mathbf{X}^{-}\mathbf{y}\text{ minimizes }\|\boldsymbol{\beta}\|)\\[4pt] (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T & (n > p:\ \boldsymbol{\beta} = \mathbf{X}^{-}\mathbf{y}\text{ minimizes }\|\mathbf{y}-\mathbf{X}\boldsymbol{\beta}\|^2)\end{cases}$$
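NumPy's `pinv` computes the Moore–Penrose pseudoinverse, and in the full-rank over- and underdetermined cases it reduces to the two closed forms above (a verification sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Overdetermined (n > p): no exact solution; pinv gives the least-squares fit
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
beta = np.linalg.pinv(X) @ y
print(np.allclose(beta, np.linalg.solve(X.T @ X, X.T @ y)))   # (X^T X)^{-1} X^T y

# Underdetermined (n < p): many solutions; pinv gives the minimum-norm one
X2 = rng.normal(size=(3, 8))
y2 = rng.normal(size=3)
beta2 = np.linalg.pinv(X2) @ y2
print(np.allclose(beta2, X2.T @ np.linalg.solve(X2 @ X2.T, y2)))   # X^T (X X^T)^{-1} y
```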