Simple implementation and examples for piecewise linearization with abs-normal form
Koichi KUBOTA
Chuo University
AD2016
1 / 31
Outline
▶ Introduction
  ▶ ANF: abs-normal form
  ▶ Rewrite function
▶ Implementation notes
  ▶ Listing up the abs operations with value zero
  ▶ Computation of derivatives
  ▶ Local minimization
▶ Branch and bound search
▶ Examples
  ▶ Simple case
  ▶ Max-min of many planes
    ▶ Local minimal point: n = 32 (64 planes, s = 63)
    ▶ Not a local minimal point: n = 16 (32 planes, s = 31)
▶ Conclusions and further work
2 / 31
Introduction
Previous tools
▶ Most AD tools can treat the derivative of |x| at x ≠ 0.
▶ But they cannot compute its derivative at x = 0.
  ▶ (The directional derivatives can still be computed.)
▶ Some tools may display alerts and may compute an element of the subgradient...
Yaad: Yet another AD with piecewise linearization
▶ computation of the piecewise linearization, including abs, min, max, at a critical point with the abs-normal form (ANF, given by A. Griewank [2013])
▶ reverse/forward accumulation of derivatives
▶ checking the first-order local optimality with Linear Programming
3 / 31
Abs-normal form [Griewank2013]
The abs-normal form (ANF) was introduced by A. Griewank (2013). ANF can treat |x| at x = 0 systematically by means of the directional derivatives.
Original function and evaluation procedure
y = f(x)    (f : R^n → R^m)

w_1 = ϕ_1(x_1, …, x_n)
w_2 = ϕ_2(x_1, …, x_n, w_1)
  ⋮
w_k = ϕ_k(x_1, …, x_n, w_1, …, w_{k−1})
  ⋮
w_{ℓ−m} = ϕ_{ℓ−m}(x_1, …, x_n, w_1, …, w_{ℓ−m−1})
y_1 = w_{ℓ−m+1} = ϕ_{ℓ−m+1}(x_1, …, x_n, w_1, …, w_{ℓ−m})
  ⋮
y_m = w_ℓ = ϕ_ℓ(x_1, …, x_n, w_1, …, w_{ℓ−m})
4 / 31
Abs-normal form [Griewank2013]
Extract absolute operations with zero arguments
Note that the values of the following w_{β_1}, …, w_{β_s} are zero!

  ⋮
w_{β_1} = …
  ⋮
w_{α_1} = ϕ_{α_1}(x_1, …, x_n, w_1, …, w_{α_1−1}) = |w_{β_1}|
  ⋮
w_{β_2} = …
  ⋮
w_{α_2} = ϕ_{α_2}(x_1, …, x_n, w_1, …, w_{α_2−1}) = |w_{β_2}|
  ⋮
w_{β_s} = …
  ⋮
w_{α_s} = ϕ_{α_s}(x_1, …, x_n, w_1, …, w_{α_s−1}) = |w_{β_s}|
  ⋮
5 / 31
Abs-normal form [Griewank2013]
Rename variables
▶ results: w_{α_1} ⇒ v_1, …, w_{α_s} ⇒ v_s
▶ arguments: w_{β_1} ⇒ u_1, …, w_{β_s} ⇒ u_s

v_1 = |u_1|
  ⋮
v_s = |u_s|
6 / 31
Rewrite the function
Define functions g : R^n × R^s → R^s and h : R^n × R^s → R^m:

u = g(x, v)
v = abs(u)    (componentwise: v_k = abs(u_k) = |u_k|)
y = h(x, v)
7 / 31
Rewrite function: example
Example 1: f(x1, x2) = min(|x1 + x2|, |x1 − x2|), where n = 2, m = 1, and min(a, b) = (a + b − |a − b|) ∗ 0.5.

Consider the case x1 = 0.0 and x2 = 0.0 (s = 3):

w1 = x1 + x2
w2 = abs(w1)
w3 = x1 − x2
w4 = abs(w3)
w5 = w2 − w4
w6 = abs(w5)
w7 = w2 + w4 − w6
y = w8 = w7 ∗ 0.5

u1 ≡ w1, u2 ≡ w3, u3 ≡ w5,
v1 ≡ w2 = |u1|, v2 ≡ w4 = |u2|, v3 ≡ w6 = |u3|.
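As a minimal illustration (a hypothetical C++ sketch, not the actual Yaad code), the program below evaluates the trace of Example 1 and records which abs arguments are exactly zero; at the origin it reports s = 3. A production tool would record this via operator overloading and would likely compare against a tolerance rather than exact zero.

#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical sketch: evaluate the trace of Example 1 and record the
// trace indices whose abs() argument is exactly zero; these become the
// switching variables u_k with v_k = |u_k|.
int main() {
    double x1 = 0.0, x2 = 0.0;
    std::vector<int> zeros;                  // indices beta_k with w_beta = 0
    auto abs_rec = [&](double w, int idx) {  // abs() that "lists up" zeros
        if (w == 0.0) zeros.push_back(idx);
        return std::fabs(w);
    };
    double w1 = x1 + x2;
    double w2 = abs_rec(w1, 1);              // u1 = w1, v1 = w2
    double w3 = x1 - x2;
    double w4 = abs_rec(w3, 3);              // u2 = w3, v2 = w4
    double w5 = w2 - w4;
    double w6 = abs_rec(w5, 5);              // u3 = w5, v3 = w6
    double w7 = w2 + w4 - w6;
    double y = w7 * 0.5;                     // y = w8
    std::printf("y = %g, s = %zu\n", y, zeros.size());  // y = 0, s = 3
    return 0;
}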
8 / 31
Rewrite function: example
Example 1 (mod): f(x1, x2) = min(|x1 + x2|, |x1 − x2|), where n = 2, m = 1, and min(a, b) = (a + b − |a − b|) ∗ 0.5.

Consider the case x1 = 0.1 and x2 = 0.1 (s = 1):

w1 = x1 + x2
w2 = abs(w1)
w3 = x1 − x2
w4 = abs(w3)
w5 = w2 − w4
w6 = abs(w5)
w7 = w2 + w4 − w6
y = w8 = w7 ∗ 0.5

Here only w3 = x1 − x2 = 0 vanishes, so there is a single critical abs operation:

u1 ≡ w3,
v1 ≡ w4 = |u1|.
9 / 31
Rewrite the function
Example 1: x1 = 0.0 and x2 = 0.0

u1 = x1 + x2
v1 = abs(u1)
u2 = x1 − x2
v2 = abs(u2)
u3 = v1 − v2
v3 = abs(u3)
w7 = v1 + v2 − v3
y = w8 = w7 ∗ 0.5

⇒ (reordered; [k] marks the position in the original trace)

u1 = x1 + x2    [1]
u2 = x1 − x2    [3]
u3 = v1 − v2    [5]
v1 = abs(u1)    [2]
v2 = abs(u2)    [4]
v3 = abs(u3)    [6]
w7 = v1 + v2 − v3    [7]
y = w8 = w7 ∗ 0.5    [8]

u = (u1, u2, u3)^T = g(x1, x2, v1, v2, v3) = (x1 + x2, x1 − x2, v1 − v2)^T

y = h(x1, x2, v1, v2, v3) = 0.5 v1 + 0.5 v2 − 0.5 v3
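Read directly off the formulas above, g and h for Example 1 can also be written as ordinary C++ functions; this is only a sketch of the rewritten form, not the tool's actual interface.

#include <array>
#include <cstdio>

// g and h for Example 1 (n = 2, s = 3, m = 1), as displayed above.
std::array<double, 3> g(double x1, double x2, const std::array<double, 3>& v) {
    return {x1 + x2, x1 - x2, v[0] - v[1]};        // u1, u2, u3
}
double h(double x1, double x2, const std::array<double, 3>& v) {
    return 0.5 * v[0] + 0.5 * v[1] - 0.5 * v[2];   // y
}

int main() {
    std::array<double, 3> v = {0.0, 0.0, 0.0};     // at the origin all v_k = 0
    std::printf("y = %g\n", h(0.0, 0.0, v));       // prints y = 0
    return 0;
}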
10 / 31
Rewrite procedure for linearization
Note that uk = 0.0 for k = 1, 2, 3.
Rewrite for linearization:

∆u1 = ∆x1 + ∆x2
∆v1 = abs(∆u1) = sign(∆u1) · ∆u1 = σ1∆u1
∆u2 = ∆x1 − ∆x2
∆v2 = abs(∆u2) = σ2∆u2
∆u3 = ∆v1 − ∆v2
∆v3 = abs(∆u3) = σ3∆u3
∆w7 = ∆v1 + ∆v2 − ∆v3
∆y = ∆w8 = ∆w7 ∗ 0.5

σk = 1 if ∆uk > 0, σk = −1 if ∆uk < 0
11 / 31
Rewrite procedure for linearization
Note that uk = 0.0 for k = 1, 2, 3.
Rewrite for linearization (reordered: all ∆u first, then all ∆v):

∆u1 = ∆x1 + ∆x2
∆u2 = ∆x1 − ∆x2
∆u3 = ∆v1 − ∆v2
∆v1 = abs(∆u1) = σ1∆u1
∆v2 = abs(∆u2) = σ2∆u2
∆v3 = abs(∆u3) = σ3∆u3
∆w7 = ∆v1 + ∆v2 − ∆v3
∆y = ∆w8 = ∆w7 ∗ 0.5

σk = 1 if ∆uk > 0, σk = −1 if ∆uk < 0
12 / 31
Differentiation of Absolute function [Griewank2013]
Absolute function
v = |u| = abs(u) = sign(u) · u

v + ∆v = |u + ∆u| = sign(u + ∆u) · (u + ∆u)
       = sign(u + ∆u) · ∆u + sign(u + ∆u) · u

∆v = sign(u + ∆u) · ∆u + (sign(u + ∆u) − sign(u)) · u

When u = 0:

∆v = sign(∆u) · ∆u = |∆u|
13 / 31
Directional derivatives
Now we can compute the directional derivative for a given ∆x:

y = f(x)
y + ∆y = f(x + ∆x) = f(x) + (∂f/∂x)∆x + · · ·

Rewrite the above as follows:

∆u = (∂g/∂x)∆x + (∂g/∂v)∆v,
∆v = sign(u + ∆u) · ∆u + (sign(u + ∆u) − sign(u)) · u
   = |∆u| when all elements of u are zero,
∆y = (∂h/∂x)∆x + (∂h/∂v)∆v.

Note that when we give a value of the direction ∆x, we get the values of ∆u, ∆v and ∆y. Moreover, when we assume the signs of ∆u first, we get the coefficients of ∆v and ∆y as linear combinations of ∆x = (∆x1, …, ∆xn).
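Since each ∆uk involves only ∆v1, …, ∆v(k−1), a single forward sweep through the trace produces ∆u, ∆v and ∆y for any given ∆x. A minimal sketch for Example 1 at the origin (all uk = 0, so ∆vk = |∆uk|; hypothetical code):

#include <cmath>
#include <cstdio>

// Forward sweep for Example 1 at the origin: given a direction
// (dx1, dx2), propagate du -> dv in trace order and return dy.
double directional_derivative(double dx1, double dx2) {
    double du1 = dx1 + dx2;
    double dv1 = std::fabs(du1);     // dv = |du| because u = 0 here
    double du2 = dx1 - dx2;
    double dv2 = std::fabs(du2);
    double du3 = dv1 - dv2;
    double dv3 = std::fabs(du3);
    return 0.5 * (dv1 + dv2 - dv3);  // dy
}

int main() {
    // (1.0, 0.5): dv1 = 1.5, dv2 = 0.5, dv3 = 1.0, so dy = 0.5, which
    // equals f(1.0, 0.5) -- at the origin the piecewise linearization
    // of this (positively homogeneous) f coincides with f itself.
    std::printf("dy = %g\n", directional_derivative(1.0, 0.5));
    return 0;
}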
14 / 31
What this talk is about

▶ We check the above concepts with a C++ operator-overloading program that generates the computational graph G.
▶ For a scalar function f(x), we check the first-order optimality condition (minimization or maximization) by solving LPs (Linear Programming) repeatedly.
▶ The number of different LPs to solve is O(2^s) (naively counted).
▶ We reduce this number by branch and bound.
15 / 31
Computation of derivatives
Implementation outline

The partial derivatives ∂g/∂x, ∂g/∂v, ∂h/∂x, ∂h/∂v are computed as follows.

(i) Make a computational graph G of f, as well as the sets of renamed nodes, i.e., V = {v1, …, vs} and U = {u1, …, us}.
(ii) For each node v in V, change the node type of v from "result of an abs() operation" to "new independent variable".
(iii) Compute ∂gk/∂xj (j = 1, …, n) and ∂gk/∂vℓ (ℓ = 1, …, s) by reverse mode (or forward mode) for all uk ∈ U (k = 1, …, s). That is, after a topological sort by depth-first search from uk, the adjoint values are computed for all w ≺ uk in G.
(iv) Compute ∂h/∂xj (j = 1, …, n) and ∂h/∂vℓ (ℓ = 1, …, s) by reverse mode from the node corresponding to y = f(x).
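The "Example" slide later obtains these matrices "with the forward AD technique". As a stand-in for the graph-based steps (i)–(iv), here is a hand-coded forward-mode sketch for Example 1: v1, v2, v3 are treated as independent variables (step (ii)), one unit tangent is seeded per sweep, and each sweep yields one column of U, L, J, V.

#include <cstdio>

// Forward-mode sketch for Example 1: seed a unit tangent on each of
// the n + s = 5 independents (x1, x2, v1, v2, v3) in turn; the linear
// tangent trace then delivers one column of U, L, J, V per sweep.
int main() {
    const int n = 2, s = 3;
    double U[3][2], L[3][3], J[2], V[3];
    for (int i = 0; i < n + s; ++i) {
        double dx1 = (i == 0), dx2 = (i == 1);
        double dv1 = (i == 2), dv2 = (i == 3), dv3 = (i == 4);
        double du[3] = {dx1 + dx2,   // tangent of u1 = x1 + x2
                        dx1 - dx2,   // tangent of u2 = x1 - x2
                        dv1 - dv2};  // tangent of u3 = v1 - v2
        double dy = 0.5 * dv1 + 0.5 * dv2 - 0.5 * dv3;
        for (int k = 0; k < s; ++k) {
            if (i < n) U[k][i] = du[k]; else L[k][i - n] = du[k];
        }
        if (i < n) J[i] = dy; else V[i - n] = dy;
    }
    std::printf("U = [%g %g; %g %g; %g %g]\n",
                U[0][0], U[0][1], U[1][0], U[1][1], U[2][0], U[2][1]);
    std::printf("V = [%g %g %g]\n", V[0], V[1], V[2]);
    return 0;
}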
16 / 31
Representation of ∆v, ∆u, ∆y [Griewank2013]
∆u = (∂g/∂x)∆x + (∂g/∂v)∆v,
∆v = Σ∆u, where Σ ≡ diag(σ1, …, σs) is a diagonal matrix.

We can eliminate ∆v and get the explicit form of ∆u with respect to ∆x, that is,

∆u = (I − (∂g/∂v)Σ)^{−1}(∂g/∂x)∆x.

Thus the directional derivative ∆y = (∂f/∂x)∆x is computed by

∆y = (∂h/∂x)∆x + (∂h/∂v)Σ(I − (∂g/∂v)Σ)^{−1}(∂g/∂x)∆x.    (1)

Writing U ≡ (∂g/∂x), L ≡ (∂g/∂v), J ≡ (∂h/∂x) and V ≡ (∂h/∂v), this reads

∆y = (J + VΣ(I − LΣ)^{−1}U)∆x.
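Because the renamed trace orders the uk so that L is strictly lower triangular, (I − LΣ)^{−1}U needs no explicit inverse: forward substitution suffices. A sketch (hypothetical code) evaluating this formula for Example 1 with Σ = diag(1, 1, 1), using the U, L, J, V computed later on the "Example" slide:

#include <cstdio>

// Sketch for Example 1: with a fixed sign pattern sigma, form the row
// vector  J + V*Sigma*(I - L*Sigma)^{-1}*U  by forward substitution
// (I - L*Sigma is unit lower triangular in trace order).
int main() {
    const int n = 2, s = 3;
    double U[3][2] = {{1, 1}, {1, -1}, {0, 0}};
    double L[3][3] = {{0, 0, 0}, {0, 0, 0}, {1, -1, 0}};
    double J[2] = {0, 0}, V[3] = {0.5, 0.5, -0.5};
    int sigma[3] = {1, 1, 1};          // case (1) of the table below
    double M[3][2];                    // rows of (I - L*Sigma)^{-1} * U
    for (int k = 0; k < s; ++k)
        for (int j = 0; j < n; ++j) {
            M[k][j] = U[k][j];
            for (int p = 0; p < k; ++p)          // forward substitution
                M[k][j] += L[k][p] * sigma[p] * M[p][j];
        }
    double dy[2];
    for (int j = 0; j < n; ++j) {
        dy[j] = J[j];
        for (int k = 0; k < s; ++k) dy[j] += V[k] * sigma[k] * M[k][j];
    }
    std::printf("dy = %g*dx1 + %g*dx2\n", dy[0], dy[1]);  // 1 and -1
    return 0;
}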
17 / 31
Check the local minimum of the linearization
One of the key advantages of the abs-normal form is that the coefficients of a directional derivative ∆y can be computed from the signs of ∆u; the signs of ∆u are in turn determined by the value of the direction ∆x.

Usually
Given ∆x, compute y = f(x) ⇒ y + ∆y = f(x + ∆x).

Usually (abs-normal form)
Given ∆x, compute u = g(x, v), v = |u| ⇒ y = h(x, v).
Σ is determined by ∆x: ∆u = U∆x, ∆v = Σ∆u ⇒ ∆y = J∆x + V∆v.

Fix Σ first (abs-normal form)
Fix Σ, compute ∆y = J∆x + V∆v.
Then check the existence of a direction ∆x that realizes the given Σ.
18 / 31
Check the existence of ∆x with respect to Σ
The k-th diagonal element of Σ, σk, indicates the sign of ∆uk.

Check the existence (or find the subdomain)

▶ Compute ∆uk as a linear combination of ∆x (k = 1, …, s): require ∆uk > 0 for σk = 1, or ∆uk < 0 for σk = −1.
▶ Construct an s × n matrix A whose k-th row is ∆uk (if σk = 1) or −∆uk (if σk = −1).
▶ Check the feasibility of an LP: min_{∆x} ∆y s.t. A∆x > 0 and −∞ < ∆xj < ∞. (A crude 2-D sketch of this check follows below.)
▶ If it is feasible, there is a direction ∆x that realizes all the signs of the ∆uk equal to the σk.
▶ If it is not feasible, there is no direction that realizes Σ.
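A real implementation poses this check as an LP. For the two-dimensional examples of this talk there is a crude substitute: A∆x > 0 is homogeneous, hence feasible iff some unit direction (cos t, sin t) satisfies every row strictly. The sketch below (hypothetical code, with a tolerance standing in for strict inequality) scans angles; it can in principle miss a very thin feasible cone, which an exact LP solver would not.

#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

// Crude n = 2 stand-in for the LP feasibility check of A*dx > 0:
// scan unit directions (cos t, sin t) and test every row strictly.
bool feasible2d(const std::vector<std::array<double, 2>>& A) {
    const double PI = std::acos(-1.0);
    for (int i = 0; i < 3600; ++i) {
        double t = 2.0 * PI * i / 3600;
        double c = std::cos(t), sn = std::sin(t);
        bool ok = true;
        for (const auto& row : A)
            if (row[0] * c + row[1] * sn <= 1e-12) { ok = false; break; }
        if (ok) return true;
    }
    return false;
}

int main() {
    // rows sigma_k * du_k for Example 1 with sigma = (1, 1, 1):
    // du1 = dx1 + dx2, du2 = dx1 - dx2, du3 = 2*dx2
    std::vector<std::array<double, 2>> A = {{1, 1}, {1, -1}, {0, 2}};
    std::printf("feasible: %s\n", feasible2d(A) ? "yes" : "no");  // yes
    return 0;
}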
19 / 31
Check the first-order condition for a local minimum of the linearization
▶ There are 2^s combinations of the values of the σk.
▶ Thus, we can find a direction that gives ∆y < 0 (or ∆y > 0), if one exists, after solving the LP 2^s times.
▶ When we want to verify a local minimum (maximum), we check that there is no direction that gives ∆y < 0 (∆y > 0).
20 / 31
Algorithm
▶ (i) Fix a combination of the diagonal values of Σ = diag(σ1, …, σs) = diag(±1, …, ±1).
▶ (ii) Compute ∆u = (I − LΣ)^{−1}U∆x and ∆y = (J + VΣ(I − LΣ)^{−1}U)∆x. The coefficients of ∆x1, …, ∆xn in this ∆y may give one of the generalized gradients.
▶ (iii) Check for a direction ∆x that realizes the directional derivative ∆y, via the feasibility of A∆x > 0 with an LP.
▶ (iv) Finally, for each matrix A corresponding to all the possible diagonal matrices Σ, check the infeasibility of ∆y < 0 together with A∆x > 0. When there is no feasible solution, the given point x is a local minimum or a stationary point of the linearization. (A naive sketch of this loop follows below.)
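A naive sketch of this loop for Example 1 (hypothetical code; an angle scan stands in for the LP): all 2^s = 8 sign patterns are enumerated, the rows σk∆uk are built by forward substitution, and each realizable pattern is searched for a descent direction. Consistent with the "Generalized gradient" table below, it reports a first-order local minimum.

#include <cmath>
#include <cstdio>

// Naive first-order optimality check for Example 1 (s = 3): for every
// sign pattern Sigma, compute the rows M[k] of (I - L*Sigma)^{-1}*U by
// forward substitution and the gradient row dy = J + V*Sigma*M, then
// scan unit directions for one realizing the pattern with dy.d < 0.
int main() {
    const int n = 2, s = 3;
    const double PI = std::acos(-1.0);
    double U[3][2] = {{1, 1}, {1, -1}, {0, 0}};
    double L[3][3] = {{0, 0, 0}, {0, 0, 0}, {1, -1, 0}};
    double J[2] = {0, 0}, V[3] = {0.5, 0.5, -0.5};
    bool descent = false;
    for (int mask = 0; mask < (1 << s); ++mask) {
        int sigma[3];
        for (int k = 0; k < s; ++k) sigma[k] = ((mask >> k) & 1) ? 1 : -1;
        double M[3][2], dy[2] = {J[0], J[1]};
        for (int k = 0; k < s; ++k)
            for (int j = 0; j < n; ++j) {
                M[k][j] = U[k][j];
                for (int p = 0; p < k; ++p)
                    M[k][j] += L[k][p] * sigma[p] * M[p][j];
            }
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < s; ++k) dy[j] += V[k] * sigma[k] * M[k][j];
        for (int i = 0; i < 3600; ++i) {          // realizing direction?
            double t = 2.0 * PI * i / 3600;
            double d[2] = {std::cos(t), std::sin(t)};
            bool ok = true;
            for (int k = 0; k < s && ok; ++k)
                ok = sigma[k] * (M[k][0] * d[0] + M[k][1] * d[1]) > 1e-12;
            if (ok && dy[0] * d[0] + dy[1] * d[1] < -1e-12) descent = true;
        }
    }
    std::printf("%s\n", descent ? "descent direction exists"
                                : "first-order local minimum");
    return 0;
}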
21 / 31
Branch and bound search
▶ The number of LPs to solve is at most 2^s.
▶ Let A(k) denote the k × n submatrix of A consisting of its first k rows (k ≤ s).
▶ When A∆x > 0 is feasible, every subsystem A(k)∆x > 0 is also feasible.
▶ When A(k)∆x > 0 is infeasible, every extended system A∆x > 0 is also infeasible.
▶ Construct the matrix A from the first row to the s-th row step by step (gk denotes the k-th component of g(x, v)):

∆u1 = (∂g1/∂x)∆x
∆u2 = (∂g2/∂x)∆x + (∂g2/∂v1)∆v1 = (∂g2/∂x)∆x + (∂g2/∂v1)σ1∆u1
  ⋮
∆uk = (∂gk/∂x)∆x + Σ_{j=1}^{k−1} (∂gk/∂vj)σj∆uj
22 / 31
Branch and bound
Construct the submatrix A(k)

∆uk = (∂gk/∂x)∆x + Σ_{j=1}^{k−1} (∂gk/∂vj) · σj · ∆uj    (k = 1, …, s).

After computing ∆uk in explicit form with respect to ∆x under the selected sign combination σ1, …, σk, we have ∆u1, …, ∆uk. Then we can check the feasibility of A(k)∆x > 0, where the coefficient matrix A(k) has the rows σ1∆u1, …, σk∆uk (each ∆uj taken as its row of coefficients of ∆x).

▶ When A(k)∆x > 0 is feasible, we check whether A(k+1)∆x > 0 is feasible or infeasible, for σk+1 = 1 and for σk+1 = −1.
▶ When it is infeasible, there is no feasible solution with the current σ1, …, σk; the whole subtree is pruned and the next combination is tried. (A recursive sketch follows below.)
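A recursive sketch of this search for Example 1 (hypothetical code, angle scan in place of the LP): σk is fixed one level at a time, the row σk∆uk is appended, and a subtree is abandoned as soon as the partial system becomes infeasible. In this tiny example every prefix happens to be feasible, so nothing is pruned; the payoff appears at s = 63 (Example 2-1), where 2^63 leaves shrink to 10340 LP calls.

#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

const int n = 2, s = 3;
double U[3][2] = {{1, 1}, {1, -1}, {0, 0}};
double L[3][3] = {{0, 0, 0}, {0, 0, 0}, {1, -1, 0}};
int checks = 0;

// Angle-scan stand-in for the LP feasibility of A(k)*dx > 0 (n = 2).
bool feasible(const std::vector<std::array<double, 2>>& rows) {
    ++checks;
    const double PI = std::acos(-1.0);
    for (int i = 0; i < 3600; ++i) {
        double t = 2.0 * PI * i / 3600;
        double c = std::cos(t), sn = std::sin(t);
        bool ok = true;
        for (const auto& r : rows)
            if (r[0] * c + r[1] * sn <= 1e-12) { ok = false; break; }
        if (ok) return true;
    }
    return false;
}

// Depth-first search over sigma_1, ..., sigma_s with pruning: row k of
// (I - L*Sigma)^{-1}*U depends only on sigma_1, ..., sigma_{k-1}.
void search(int k, double M[3][2],
            std::vector<std::array<double, 2>>& rows, std::vector<int>& sigma) {
    if (k == s) return;  // leaf: realizable Sigma; run step (iv) here
    for (int j = 0; j < n; ++j) {
        M[k][j] = U[k][j];
        for (int p = 0; p < k; ++p) M[k][j] += L[k][p] * sigma[p] * M[p][j];
    }
    for (int sk : {1, -1}) {
        sigma.push_back(sk);
        rows.push_back({sk * M[k][0], sk * M[k][1]});
        if (feasible(rows)) search(k + 1, M, rows, sigma);  // else prune
        rows.pop_back();
        sigma.pop_back();
    }
}

int main() {
    double M[3][2];
    std::vector<std::array<double, 2>> rows;
    std::vector<int> sigma;
    search(0, M, rows, sigma);
    std::printf("feasibility checks: %d (2^s leaves: %d)\n", checks, 1 << s);
    return 0;
}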
23 / 31
Example
Again, f(x1, x2) = min(|x1 + x2|, |x1 − x2|), at x1 = 0.0 and x2 = 0.0.
Abs-normal form
∆u = (∆u1, ∆u2, ∆u3)^T = (∆x1 + ∆x2, ∆x1 − ∆x2, ∆v1 − ∆v2)^T,

(∆v1, ∆v2, ∆v3)^T = (σ1∆u1, σ2∆u2, σ3∆u3)^T,

∆y = 0.5 ∗ (∆v1 + ∆v2 − ∆v3)
Partial derivatives
With the forward AD technique, we have

U = (∂g/∂x) = [ 1  1 ]        L = (∂g/∂v) = [ 0  0  0 ]
              [ 1 −1 ]                      [ 0  0  0 ]
              [ 0  0 ]                      [ 1 −1  0 ]

J = (∂h/∂x) = [ 0  0 ],       V = (∂h/∂v) = [ 0.5  0.5  −0.5 ].
24 / 31
Explicit ∆y form with respect to ∆x
Case: ∆u1 > 0, ∆u2 > 0 and ∆u3 > 0

Σ = diag(σ1, σ2, σ3) = diag(1, 1, 1), so ∆v1 = ∆u1, ∆v2 = ∆u2, ∆v3 = ∆u3.

The conditions ∆u1 = ∆x1 + ∆x2 > 0, ∆u2 = ∆x1 − ∆x2 > 0 and ∆u3 = ∆v1 − ∆v2 = ∆u1 − ∆u2 = 2∆x2 > 0 hold for ∆x1 > ∆x2 > 0.

∆y = (J + VΣ(I − LΣ)^{−1}U)∆x

   = ( [ 0  0 ] + [ 0.5  0.5  −0.5 ] [ 1  0  0 ] [ 1  1 ] ) ∆x
                                     [ 0  1  0 ] [ 1 −1 ]
                                     [ 1 −1  1 ] [ 0  0 ]

   = [ 1  −1 ] ∆x = ∆x1 − ∆x2.
25 / 31
Generalized gradient
Table: The generalized gradient ∆y

No.  ∆u1  ∆u2  ∆u3   σ1  σ2  σ3   Domain             ∆y
(1)  > 0  > 0  > 0    1   1   1   ∆x1 > ∆x2 > 0      ∆x1 − ∆x2
(2)  < 0  > 0  > 0   −1   1   1   0 > ∆x1 > ∆x2      ∆x1 − ∆x2
(3)  > 0  < 0  > 0    1  −1   1   0 < ∆x1 < ∆x2      −∆x1 + ∆x2
(4)  < 0  < 0  > 0   −1  −1   1   ∆x1 < ∆x2 < 0      −∆x1 + ∆x2
(5)  > 0  > 0  < 0    1   1  −1   ∆x1 > −∆x2 > 0     ∆x1 + ∆x2
(6)  < 0  > 0  < 0   −1   1  −1   0 < ∆x1 < −∆x2     −∆x1 − ∆x2
(7)  > 0  < 0  < 0    1  −1  −1   0 < −∆x1 < ∆x2     ∆x1 + ∆x2
(8)  < 0  < 0  < 0   −1  −1  −1   −∆x1 > ∆x2 > 0     −∆x1 − ∆x2
27 / 31
Max-min of planes
Scalar function f(x1, x2) is defined with 2n planes as

f(x1, x2) ≡ max_{0≤ℓ<n} min(a_{2ℓ}x1 + b_{2ℓ}x2, a_{2ℓ+1}x1 + b_{2ℓ+1}x2).

The k-th plane a_k x1 + b_k x2 (k = 0, …, 2n − 1) is defined by the three points (p_k, q_k, r_k), (p_{k+1}, q_{k+1}, r_{k+1}) and (0, 0, 0), where (p_k, q_k) = (cos(πk/n), sin(πk/n)) and the r_k are given arbitrarily ((p_{2n}, q_{2n}, r_{2n}) ≡ (p_0, q_0, r_0)).
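A hypothetical sketch of this construction with the Example 2-1 data below: each plane is fixed by two adjacent sample points on the unit circle plus the origin, so (ak, bk) comes from a 2×2 solve (Cramer's rule), and f is the max of pairwise mins. In the actual tool, min and max would be evaluated through their abs identities, exposing the critical abs operations.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch of the Example 2 test function (r_k as in Example 2-1): the
// k-th plane z = a_k*x1 + b_k*x2 passes through (p_k, q_k, r_k),
// (p_{k+1}, q_{k+1}, r_{k+1}) and the origin, so (a_k, b_k) solves a
// 2x2 linear system, done here by Cramer's rule.
int main() {
    const int n = 32;
    const double PI = std::acos(-1.0);
    std::vector<double> a(2 * n), b(2 * n);
    std::vector<double> p(2 * n + 1), q(2 * n + 1), r(2 * n + 1);
    for (int k = 0; k <= 2 * n; ++k) {
        p[k] = std::cos(PI * k / n);
        q[k] = std::sin(PI * k / n);
        r[k] = (k % 2 == 0) ? 0.3 : 1.0;  // r_{2k} = 0.3, r_{2k+1} = 1.0
    }
    for (int k = 0; k < 2 * n; ++k) {
        double det = p[k] * q[k + 1] - q[k] * p[k + 1];  // = sin(pi/n) != 0
        a[k] = (r[k] * q[k + 1] - q[k] * r[k + 1]) / det;
        b[k] = (p[k] * r[k + 1] - r[k] * p[k + 1]) / det;
    }
    auto f = [&](double x1, double x2) {  // max over l of pairwise mins
        double m = -1e300;
        for (int l = 0; l < n; ++l)
            m = std::max(m, std::min(a[2 * l] * x1 + b[2 * l] * x2,
                                     a[2 * l + 1] * x1 + b[2 * l + 1] * x2));
        return m;
    };
    std::printf("f(0,0) = %g\n", f(0.0, 0.0));  // every plane passes through 0
    return 0;
}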
Figure: Origin is a local minimal point.
28 / 31
Example 2-1

n = 32, r_{2k} = 0.3, r_{2k+1} = 1.0 (k = 0, …, 31).
The origin (0, 0) is a locally minimal point.
There are 63 absolute operations whose results are zero, so s = 63.
The naive number of LPs to solve is 2^s = 2^63 ≃ 10^19; with the branch and bound search, the total number of LPs actually solved is only 10340.
Computational time is about 7 seconds (Ubuntu 14.04 LTS, VMware Fusion 7.1.3, MacBook Pro Core i7).
Figure: Origin is a local minimal point.
29 / 31
Example 2-2

n = 16, r_26 = −0.1, r_{2k} = 0.3 (k = 0, …, 12, 14, 15), and r_{2k+1} = 1.0 (k = 0, …, 15).
The origin (0, 0) is not a locally minimal point: there are directions along which the value of f decreases.
There are 31 absolute operations whose results are zero, so s = 31.
The naive number of LPs to solve is 2^s = 2^31 ≃ 2 × 10^9; with the branch and bound search, the total number of LPs actually solved is only 2222.
Figure: Origin is not a local minimal point.
30 / 31
Conclusion
Conclusion:
▶ A simple implementation of piecewise linearization with the "abs-normal form" in C++.
▶ An efficient way to check for a local minimal point with the branch and bound technique.
▶ Two-dimensional examples with 2n planes were shown, where the number of LPs solved is reduced: 2^63 → 10340 and 2^31 → 2222.
Future work:
▶ More practical experiments, and an investigation of the optimal topological order for the branch and bound, are needed.
▶ Investigation of higher derivatives with absolute operations, and of the effects of numerical computational errors.
31 / 31