Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating...

126
Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower [email protected] Advisor Margarida Pinheiro Mello [email protected] July 25, 2011

Transcript of Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating...

Page 1: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Calculating Hessian matrices

StudentRobert Mansel Gower [email protected]

AdvisorMargarida Pinheiro Mello [email protected]

July 25, 2011

Page 2: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Contents

1 Motivation

2 Computational graph

3 GradientForward GradientPartial derivatives on computational graph

4 HessianForward HessianHessian on computational graphNew Reverse Hessian algorithmComparative tests

Page 3: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Motivation

Motivation from nonlinear programming

Second order Taylor approximations very common in nonlinearprogramming.

Hessians desirable in interior-point and augmented Lagrangianmethods.

Sensitivity analysis

Page 4: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0−1 0

v1 = h(v−1) 1 v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 5: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0−1 0

v1 = h(v−1) 1 v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))

v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 6: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0−1 0

v1 = h(v−1) 1 v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 7: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0−1 0

v1 = h(v−1) 1 v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 8: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0−1 0

v1 = h(v−1) 1 v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 9: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0

−1 0

v1 = h(v−1) 1

v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 10: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0

−1 0

v1 = h(v−1)

1

v2 = g(v−1, v0)2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 11: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0

−1 0

v1 = h(v−1)

1

v2 = g(v−1, v0)

2

v3 = f (v2, v1) 3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 12: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0

−1 0

v1 = h(v−1)

1

v2 = g(v−1, v0)

2

v3 = f (v2, v1)

3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 13: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Representation

−1 0

1 2

3

v−1 = x−1 v0 = x0

−1 0

v1 = h(v−1)

1

v2 = g(v−1, v0)

2

v3 = f (v2, v1)

3

f (h(x−1), g(x−1, x0))v−1 = x−1v0 = x0v1 = h(v−1)v2 = g(v−1, v0)v3 = f (v2, v1)

Indices of matrices and vectors shifted by −n.y ∈ Rm : y = (y1−n, . . . , ym−n)T

Node numbering is in order of evaluation.

(j is a predecessor of i) ≡ j ∈ P(i).

(i is a sucessor of j) ≡ i ∈ S(j).

Page 14: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:

vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 15: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , n

Independent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 16: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:

vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 17: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.

Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 18: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 19: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E )

& φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 20: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives coded

TIME(eval(f (x))) = O(`+ n).

Page 21: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Computational graph

Function Evaluation ≡ Computational Graph

Nodes for Independent variables:vi−n = xi−n, for i = 1, . . . , nIndependent nodes Z = {1− n, . . . , 0}

Nodes for Intermediate variables:vi = φi (vP(i)), for i = 1, . . . , `.Intermediate nodes V = {1, . . . , `}

Function Evaluation ≡ G = (Z ∪ V ,E ) & φ set of elementalfunctions with derivatives codedTIME(eval(f (x))) = O(`+ n).

Page 22: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Forward Gradient: The first attempt

Set of elemental function = Sums, multiplication and unaryfunctions.

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj .

Each j passes on ∂φi∂vj∇vj to each sucessor i .

Page 23: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Forward Gradient: The first attempt

Set of elemental function = Sums, multiplication and unaryfunctions.

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj .

Each j passes on ∂φi∂vj∇vj to each sucessor i .

Page 24: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Forward Gradient: The first attempt

Set of elemental function = Sums, multiplication and unaryfunctions.

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj .

Each j passes on ∂φi∂vj∇vj to each sucessor i .

Page 25: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Resume of Forward gradient

For each node i one stores ∇vi = ( ∂vi∂x1−n

, . . . , ∂vi∂x0).

Memory complexity: O(n`).

For each node visit, perform n-dimension vector arithmetic.

Time complexity: O(n`).

Storing and calculating all ∇vi ’s is expensive andunnecessary.

Page 26: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Resume of Forward gradient

For each node i one stores ∇vi = ( ∂vi∂x1−n

, . . . , ∂vi∂x0).

Memory complexity: O(n`).

For each node visit, perform n-dimension vector arithmetic.

Time complexity: O(n`).

Storing and calculating all ∇vi ’s is expensive andunnecessary.

Page 27: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Resume of Forward gradient

For each node i one stores ∇vi = ( ∂vi∂x1−n

, . . . , ∂vi∂x0).

Memory complexity: O(n`).

For each node visit, perform n-dimension vector arithmetic.

Time complexity: O(n`).

Storing and calculating all ∇vi ’s is expensive andunnecessary.

Page 28: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Resume of Forward gradient

For each node i one stores ∇vi = ( ∂vi∂x1−n

, . . . , ∂vi∂x0).

Memory complexity: O(n`).

For each node visit, perform n-dimension vector arithmetic.

Time complexity: O(n`).

Storing and calculating all ∇vi ’s is expensive andunnecessary.

Page 29: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Resume of Forward gradient

For each node i one stores ∇vi = ( ∂vi∂x1−n

, . . . , ∂vi∂x0).

Memory complexity: O(n`).

For each node visit, perform n-dimension vector arithmetic.

Time complexity: O(n`).

Storing and calculating all ∇vi ’s is expensive andunnecessary.

Page 30: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Forward Gradient

Resume of Forward gradient

For each node i one stores ∇vi = ( ∂vi∂x1−n

, . . . , ∂vi∂x0).

Memory complexity: O(n`).

For each node visit, perform n-dimension vector arithmetic.

Time complexity: O(n`).

Storing and calculating all ∇vi ’s is expensive andunnecessary.

Page 31: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 32: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 33: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 34: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x0=∂φ3

∂v2

∂φ2

∂x0+ 0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 35: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x0=∂φ3

∂v2

∂φ2

∂x0+ 0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 36: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x0=∂φ3

∂v2

∂φ2

∂x0+ 0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 37: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x0=∂φ3

∂v2

∂φ2

∂x0+ 0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 38: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

x−1 x0

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

∂f

∂x0=∂φ3

∂v2

∂φ2

∂x0+ 0

∂f

∂x−1=∂φ3

∂v2

∂φ2

∂v−1+∂φ3

∂v1

∂φ1

∂v−1

∂f

∂xi=

∑p|path from i to `

(weight of path p)

Page 39: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2 v0 =

∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 40: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2 v0 =

∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 41: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2 v0 =

∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 42: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2 v0 =

∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 43: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

2v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2 v0 =

∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 44: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

1v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2 v0 =

∂φ2

∂x0v2

v−1 =∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 45: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2

v0 =∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 46: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Gradient

Partial derivatives on computational graph

Reverse Gradient - Accumulating paths

−1 0

1 2

3

f (x) = φ3(φ1(x−1), φ2(x−1, x0))

∂φ1∂x−1

∂φ2∂x0

∂φ2∂x−1

∂φ3∂v2

∂φ3∂v1

v3 = 1 3

v1 = ∂φ3∂v1

v2 = ∂φ3∂v2

v−1 =∂φ2

∂x−1v2

v0 =∂φ2

∂x0v2v−1 =

∂φ1

∂x−1v1 +

∂φ2

∂x−1v2

∂f

∂x−1= v−1

∂f

∂x0= v0

vi =∑path from i to `

(path weight)

vj =∑i∈S(j)

∂φi∂vj

vi

TIME(∇f (x))= TIME(f (x))

Page 47: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian: McCormick and Jackson 1986

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj

v ′′i =∑

j ,k∈P(i)

∇vj ·∂2φi

∂vj∂vk· ∇vT

k +∑

j∈P(i)

∂φi

∂vj· v ′′j

Page 48: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian: McCormick and Jackson 1986

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj

v ′′i =∑

j ,k∈P(i)

∇vj ·∂2φi

∂vj∂vk· ∇vT

k +∑

j∈P(i)

∂φi

∂vj· v ′′j

Page 49: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian: McCormick and Jackson 1986

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj

v ′′i =∑

j ,k∈P(i)

∇vj ·∂2φi

∂vj∂vk· ∇vT

k +∑

j∈P(i)

∂φi

∂vj· v ′′j

Page 50: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian: McCormick and Jackson 1986

vi = φi (vP(i))

∇vi =∑

j∈P(i)

∂φi

∂vj∇vj

v ′′i =∑

j ,k∈P(i)

∇vj ·∂2φi

∂vj∂vk· ∇vT

k +∑

j∈P(i)

∂φi

∂vj· v ′′j

Page 51: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian resume

For each node, store and calculate a n × n matrix.

Is it necessary to calculate the gradient and Hessian of eachnode?

Gain a deeper understanding on the problem using gradientgraph.

Page 52: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian resume

For each node, store and calculate a n × n matrix.

Is it necessary to calculate the gradient and Hessian of eachnode?

Gain a deeper understanding on the problem using gradientgraph.

Page 53: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Forward Hessian resume

For each node, store and calculate a n × n matrix.

Is it necessary to calculate the gradient and Hessian of eachnode?

Gain a deeper understanding on the problem using gradientgraph.

Page 54: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Calculating the Hessian using the computational graph

Function’s computational graph + vi nodes and dependencies= gradient computational graph.

Interpret partial derivative on augmented graph: Second orderderivative.

Eliminate unnecessary symmetries on augmented graph.

Page 55: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Calculating the Hessian using the computational graph

Function’s computational graph + vi nodes and dependencies= gradient computational graph.

Interpret partial derivative on augmented graph: Second orderderivative.

Eliminate unnecessary symmetries on augmented graph.

Page 56: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Forward Hessian

Calculating the Hessian using the computational graph

Function’s computational graph + vi nodes and dependencies= gradient computational graph.

Interpret partial derivative on augmented graph: Second orderderivative.

Eliminate unnecessary symmetries on augmented graph.

Page 57: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

The adjoint variables of the Reverse Gradient satisfy

vj =∑i∈S(j)

vi∂φi∂vj≡ ϕj .

Gradient’s graph has 2(`+ n) nodes:(v1−n, . . . , v`) and (v1−n, . . . , v`).

node ←→ vj .

i ∈ P () iff j ∈ P(i).

Page 58: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

The adjoint variables of the Reverse Gradient satisfy

vj =∑i∈S(j)

vi∂φi∂vj≡ ϕj .

Gradient’s graph has 2(`+ n) nodes:(v1−n, . . . , v`) and (v1−n, . . . , v`).

node ←→ vj .

i ∈ P () iff j ∈ P(i).

Page 59: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

The adjoint variables of the Reverse Gradient satisfy

vj =∑i∈S(j)

vi∂φi∂vj≡ ϕj .

Gradient’s graph has 2(`+ n) nodes:(v1−n, . . . , v`) and (v1−n, . . . , v`).

node ←→ vj .

i ∈ P () iff j ∈ P(i).

Page 60: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

The adjoint variables of the Reverse Gradient satisfy

vj =∑i∈S(j)

vi∂φi∂vj≡ ϕj .

Gradient’s graph has 2(`+ n) nodes:(v1−n, . . . , v`) and (v1−n, . . . , v`).

node ←→ vj .

i ∈ P () iff j ∈ P(i).

Page 61: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 62: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 63: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1 ←−v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 64: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2 ←−v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 65: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1) ←−

Page 66: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1∂2f

∂xi∂xj=∑

p from −1 to −1

Weight(p)

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 67: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 68: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

cos(v−1)

cos(v−1)

∂φ1

∂v−1= cos(v−1)

∂ϕ−1

∂v1= cos(v−1)

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) ←− v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1) ←−

Page 69: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

cos(v−1)

cos(v−1)∂ϕj

∂vk= ckj =

∂φk

∂vj

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) ←− v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1) ←−

Page 70: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v3

v3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1 ←−v1 = sin(v−1) v1 = v3v2 ←−v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 71: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

v3

v3

∂ϕ1

∂v2= v3

∂ϕ2

∂v1= v3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1 ←−v1 = sin(v−1) v1 = v3v2 ←−v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 72: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

∂ϕj

∂vk= ckj =

∂ϕk

∂vj

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 73: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0)

−1 0

1 2

3

∂ϕj

∂vk= ckj =

∂ϕk

∂vj

ckj =∑

i∈S(k)∩S(j)

vi∂2φi

∂vj∂vk

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 74: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0) = φ3(φ1(x−1), φ2(x−1, x0))

−1 0

1 2

3

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 75: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0) = φ3(φ1(x−1), φ2(x−1, x0))

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 76: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0) = φ3(φ1(x−1), φ2(x−1, x0))

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 77: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

f (x) = sin(x−1)(x−1 + x0) = φ3(φ1(x−1), φ2(x−1, x0))

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

v−1 = x−1 v3 = 1v0 = x0 v2 = v3v1v1 = sin(v−1) v1 = v3v2v2 = (v−1 + v0) v0 = v21v3 = v1v2 v−1 = v21 + v1 cos(v−1)

Page 78: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 79: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1

∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 80: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1

∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1

∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 81: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1

∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 82: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 83: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 84: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 85: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

∂2f∂x2−1

= 2c1−1c21c2−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 86: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

−1 0

1 2

3

∂2f∂x2−1

= c1−1c21c2−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1∂2f∂x2−1

= c1−1c21c2−1 + c2−1c21c1−1 + c−1−1

∂2f∂x2−1

= 2c1−1c21c2−1 + c−1−1

Fold mirror subgraph.

More symmetry

k 99K j iff k L99 j

ckj = cjk

Page 87: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

m k

i j

∂2f

∂xi∂xj=∑

nonlinear

edge {m, k}

∑{p| from i to m}

( weight of p) cmk

∑{p| from j to k}

( weight of p) .

Page 88: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Hessian on computational graph

m k

i j

∂2f

∂xi∂xj=∑

nonlinear

edge {m, k}

∑{p| from i to m}

( weight of p) cmk

∑{p| from j to k}

( weight of p) .

Page 89: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Building shortcuts

P(m) = {i , j}.

(m, k) ∈ Path

⇒ path 3 (i ,m, k) or path 3 (j ,m, k)

i j

m k

cmk

cjmcmk

cimcmk

Figure: Pushing the edge {m, k}

Page 90: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Building shortcuts

P(m) = {i , j}.(m, k) ∈ Path

⇒ path 3 (i ,m, k) or path 3 (j ,m, k)

i j

m kcmk

cjmcmk

cimcmk

Figure: Pushing the edge {m, k}

Page 91: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Building shortcuts

P(m) = {i , j}.(m, k) ∈ Path

⇒ path 3 (i ,m, k) or path 3 (j ,m, k)

i j

m kcmk

cjmcmk

cimcmk

Figure: Pushing the edge {m, k}

Page 92: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Building shortcuts

P(m) = {i , j}.(m, k) ∈ Path

⇒ path 3 (i ,m, k) or path 3 (j ,m, k)

i j

m k

cmk

cjmcmk

cimcmk

Figure: Pushing the edge {m, k}

Page 93: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 94: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

66

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 95: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6

5

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 96: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6

4

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 97: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6

3

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 98: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6

2

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 99: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6

1

f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 100: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5v6 = 1v5 = v4v4 = v5

Page 101: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

Simple example of edge pushing execution

−2 −1 0

1 2

4

3

5

6f (x) = (x−2 + 1)(x−1 + 1)3(x0 + 1)

v1 = v−2 + 1v2 = v−1 + 1v3 = v0 + 1v4 = v1v2v5 = 3v3v6 = v4v5

f ′′ =

0 X XX 0 XX X 0

Page 102: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

pushing of nonlinear edges

i

j k

i

j k

i

j k

i

j k

i

j k

i

j k

sweepingnode i

wii

wiicji wiickiwiicjicki

wik

wikcji2wikcik

pwip

p

wipcji wipcki

Page 103: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

New Reverse Hessian algorithm

The pseudo-code of edge pushing

Input: x ∈ Rn,for i = `, . . . , 1 do

Create nonlinear edges if φi is nonlinear ;Push nonlinear edges adjacent to i ;

end

Page 104: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Competitor for edge pushing: Graph coloring

edge pushing implementation aimed at large sparse Hessians.

state-of-the-art competitor: graph coloring methodsGebremedhin, Manne, Pothen, Walther, Tarafdar

Efficient Computation of Sparse Hessians Using Coloring andAutomatic Differentiation(2009)What Color Is Your Jacobian? Graph Coloring for ComputingDerivatives(2005)

Page 105: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Sparsity Pattern1 Color Compact

Calculates once

f ′′(x) f ′′(x)S

1Uses Walther’s 2008 algorithm

Page 106: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Sparsity Pattern1 Color Compact

Calculates once

f ′′(x) f ′′(x)S

1Uses Walther’s 2008 algorithm

Page 107: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Sparsity Pattern1 Color Compact Derive Compact Recover

Repeat for each x ∈ RnCalculates once

f ′′(x) f ′′(x)S

1Uses Walther’s 2008 algorithm

Page 108: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Sparsity Pattern1 Color Compact Derive Compact Recover

Repeat for each x ∈ RnCalculates once

f ′′(x) f ′′(x)S

1Uses Walther’s 2008 algorithm

Page 109: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Invests a large initial time in 1st run ⇒ fast subsequent runs.

Two different coloring methods with different recoveries: Starand Acyclic.

Page 110: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Invests a large initial time in 1st run ⇒ fast subsequent runs.

Two different coloring methods with different recoveries: Starand Acyclic.

Page 111: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Test set chosen from CUTE

n = 50′000.# colors

Name Pattern Star Acyclic

cosine B 1 3 2chainwoo B 2 3 3bc4 B 1 3 2cragglevy B 1 3 2pspdoc B 2 5 3scon1dls B 2 5 3morebv B 2 5 3augmlagn 5× 5 diagonal blocks 5 5lminsurf B 5 11 6brybnd B 5 13 7arwhead arrow 2 2nondquar arrow + B 1 4 3sinquad frame + diagonal 3 3bdqrtic arrow + B 3 8 5noncvxu2 irregular 12 7ncvxbqp1 irregular 12 7

Page 112: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Numeric Results edge pushing × Colouring methods

Star AcyclicName 1st 2nd 1st 2nd e p

cosine 9.93 0.16 9.68 2.52 0.15chainwoo 35.07 0.33 33.24 5.08 0.30bc4 10.02 0.25 10.00 2.56 0.25cragglevy 28.17 0.79 28.15 2.60 0.48pspdoc 10.31 0.35 10.27 4.39 0.23scon1dls 11.00 0.59 10.97 4.96 0.40morebv 10.36 0.46 10.33 4.49 0.35augmlagn 15.99 0.68 8.36 16.74 0.27lminsurf 9.30 1.01 9.24 3.89 0.35brybnd 11.87 2.44 11.73 12.63 1.68arwhead 176.50 0.16 45.86 0.24 0.20nondquar 166.59 0.18 28.64 2.57 0.12sinquad 606.72 0.26 888.57 1.51 0.32bdqrtic 262.64 1.34 96.87 7.80 0.80noncvxu2 29.69 1.10 29.27 7.76 0.28ncvxbqp1 13.51 2.42 – – 0.37

Averages 87.98 0.78 82.08 5.32 0.41

Variances 25 083.44 0.54 50 313.10 19.32 0.14

1

Page 113: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Graphical comparison: Star 2nd run versus edge pushing.

0.16

0.33

0.25

0.79

0.35

0.59

0.46

0.68

1.01

2.44

0.16

0.18

0.26

1.34

1.1

2.42

0.15

0.3

0.25

0.48

0.23

0.4

0.35

0.27

0.35

1.68

0.2

0.12

0.32

0.8

0.28

0.37

cosine

chainwoo

bc4

cragglevy

pspdoc

scon1dls

morebv

augmlagn

lminsurf

brybnd

arwhead

nondquar

sinquad

bdqrtic

noncvxu2

ncvxbqp1

edge pushing

Star

Time (in seconds)

Functions

1

Page 114: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:

New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:

New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 115: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.

New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:

New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 116: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.

Propagate from known contributions (nonlinear edges).

Algebraic representation:

New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 117: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:

New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 118: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:

New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 119: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:New correctness.

New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 120: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 121: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:New correctness.New algorithms.

edge pushing

Exploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 122: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:New correctness.New algorithms.

edge pushingExploits the symmetry and sparsity.

Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 123: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:New correctness.New algorithms.

edge pushingExploits the symmetry and sparsity.Promising test results.

Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 124: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

Summing up

Graph representation:New algorithm.New perspective.Propagate from known contributions (nonlinear edges).

Algebraic representation:New correctness.New algorithms.

edge pushingExploits the symmetry and sparsity.Promising test results.Lives up to Griewank 16th rule.

The calculation of gradients by nonincrementalreverse makes the corresponding computationalgraph symmetric, a property that should beexploited and maintained in accumulating Hes-sians.

Page 125: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

HANK U

QUESTIONS?

Page 126: Calculating Hessian matrices - Télécom ParisTech · Calculating Hessian matrices Calculating Hessian matrices Student Robert Mansel Gower gowerrobert@gmail.com Advisor Margarida

Calculating Hessian matrices

Hessian

Comparative tests

HANK UQUESTIONS?