
Implicit Regularization in Nonconvex Statistical Estimation

Yuxin Chen

Electrical Engineering, Princeton University

Cong Ma, Princeton ORFE

Kaizheng Wang, Princeton ORFE

Yuejie Chi, CMU ECE / OSU ECE

Nonconvex estimation problems are everywhere

Empirical risk minimization is usually nonconvex

    minimize_x ℓ(x; y)

• low-rank matrix completion
• graph clustering
• dictionary learning
• mixture models
• deep learning
• ...

3/ 27

Blessing of randomness

statistical models → benign landscape → efficient algorithms that exploit geometry

4/ 27

Optimization-based methods: two-stage approach

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

x

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

basin of attraction

x

x

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

x

x

basin of attraction

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

basin of attraction

x

x

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

x

x

basin of attraction

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

x

z

1

z

2

z

3

z

4

x

basin of attraction

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

x

z

1

z

2

z

3

z

4

x

basin of attraction

1

A

x

Ax

y = |Ax|2

| · |2

entrywise

squared magnitude

minimizeX ` (b)

s.t. bk = a

kXak, k = 1, · · · ,m

X 0

yk =

(hxk,i+ k, with prob.

12

hxk,i+ k, else

y hx,i

y hx,i

initial guess z

0

x

z

1

z

2

z

3

z

4

x

basin of attraction

1

• Start from an appropriate initial point

• Proceed via some iterative optimization algorithms

5/ 27


Proper regularization is often recommended

Improves computation by stabilizing search directions

• phase retrieval: trimming
• matrix completion: regularized cost, projection
• blind deconvolution: regularized cost, projection

6/ 27

How about unregularized gradient methods?

                   phase retrieval            matrix completion              blind deconvolution
    regularized    trimming                   regularized cost, projection   regularized cost, projection
    unregularized  suboptimal comput. cost    ?                              ?

Are unregularized methods suboptimal for nonconvex estimation?

6/ 27

Phase retrieval / solving quadratic systems

[Figure: design matrix A applied to the unknown signal x, producing Ax and then observations y = |Ax|² via the entrywise squared magnitude]


Recover x♮ ∈ R^n from m random quadratic measurements

    y_k = |a_k^⊤ x♮|², k = 1, . . . , m

Assume w.l.o.g. ‖x♮‖_2 = 1

7/ 27
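To make this measurement model concrete, here is a minimal NumPy sketch (ours, not from the talk) that draws a unit-norm signal and forms the quadratic measurements, assuming the i.i.d. Gaussian design used later in the talk; the function name and the sizes n = 100, m = 600 are illustrative choices.

```python
import numpy as np

def make_phase_retrieval_data(n=100, m=600, rng=None):
    """Draw a unit-norm planted signal x_nat and Gaussian design vectors a_k,
    and return the phaseless measurements y_k = (a_k^T x_nat)^2."""
    rng = np.random.default_rng(rng)
    x_nat = rng.standard_normal(n)
    x_nat /= np.linalg.norm(x_nat)      # w.l.o.g. ||x_nat||_2 = 1
    A = rng.standard_normal((m, n))     # rows are the design vectors a_k
    y = (A @ x_nat) ** 2                # entrywise squared magnitude
    return A, y, x_nat
```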

Wirtinger flow (Candes, Li, Soltanolkotabi ’14)

Empirical loss minimization

    minimize_x f(x) = (1/m) Σ_{k=1}^{m} [ (a_k^⊤ x)² − y_k ]²

• Initialization by spectral method
• Gradient iterations: for t = 0, 1, . . .

    x^{t+1} = x^t − η_t ∇f(x^t)

8/ 27
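The two steps above translate directly into code. Below is a minimal sketch (ours, reusing make_phase_retrieval_data from the earlier snippet) of a standard spectral initializer followed by the plain gradient loop; the constant step size and the iteration budget are placeholders rather than the tuned choices analyzed later in the talk.

```python
import numpy as np

def spectral_init(A, y):
    """Spectral initialization: top eigenvector of Y = (1/m) sum_k y_k a_k a_k^T,
    rescaled so that its norm matches the estimated signal energy."""
    m, _ = A.shape
    Y = (A * y[:, None]).T @ A / m
    _, eigvecs = np.linalg.eigh(Y)               # eigenvalues in ascending order
    return eigvecs[:, -1] * np.sqrt(np.mean(y))  # mean(y) estimates ||x_nat||_2^2

def wirtinger_flow(A, y, eta, iters):
    """Vanilla gradient descent on f(x) = (1/m) sum_k ((a_k^T x)^2 - y_k)^2."""
    m, _ = A.shape
    x = spectral_init(A, y)
    for _ in range(iters):
        Ax = A @ x
        # nabla f(x) = (4/m) sum_k ((a_k^T x)^2 - y_k) (a_k^T x) a_k
        x = x - eta * (4 / m) * A.T @ ((Ax ** 2 - y) * Ax)
    return x
```

Note that the estimate can only be recovered up to a global sign, since y is invariant under x ↦ −x.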

Gradient descent theory revisited

Two standard conditions that enable linear convergence of GD
• (local) restricted strong convexity (or regularity condition)
• (local) smoothness

9/ 27

Gradient descent theory revisited

f is said to be α-strongly convex and β-smooth if

    0 ≺ αI ⪯ ∇²f(x) ⪯ βI,  ∀x

ℓ_2 error contraction: GD with η = 1/β obeys

    ‖x^{t+1} − x♮‖_2 ≤ (1 − α/β) ‖x^t − x♮‖_2

• Attains ε-accuracy within O((β/α) log(1/ε)) iterations

10/ 27
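For reference, the contraction above follows from a standard one-step argument; the sketch below assumes global α-strong convexity and β-smoothness with minimizer x♮ (the talk only needs local/restricted versions of these conditions), and is textbook material rather than anything specific to WF.

```latex
\begin{align*}
x^{t+1}-x^{\natural}
 &= x^{t}-x^{\natural}-\eta\bigl(\nabla f(x^{t})-\nabla f(x^{\natural})\bigr) \\
 &= \Bigl(I-\eta\, H\Bigr)\bigl(x^{t}-x^{\natural}\bigr),
 \qquad H := \int_{0}^{1}\nabla^{2}f\bigl(x^{\natural}+\tau(x^{t}-x^{\natural})\bigr)\,\mathrm{d}\tau ,
\end{align*}
```

using ∇f(x♮) = 0. Since αI ⪯ H ⪯ βI, taking η = 1/β gives ‖I − ηH‖ ≤ max{|1 − ηα|, |1 − ηβ|} = 1 − α/β, and hence ‖x^{t+1} − x♮‖_2 ≤ (1 − α/β) ‖x^t − x♮‖_2.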

What does this optimization theory say about WF?

Gaussian designs: a_k i.i.d. ∼ N(0, I_n), 1 ≤ k ≤ m

Population level (infinite samples)

    E[∇²f(x)] ≻ 0 and is well-conditioned (locally)

Consequence: WF converges within logarithmic iterations if m → ∞

Finite-sample level (m ≍ n log n)

    ∇²f(x) ≻ 0 but ill-conditioned (condition number ≍ n), even locally

Consequence (Candes et al ’14): WF attains ε-accuracy within O(n log(1/ε)) iterations if m ≍ n log n

Too slow ... can we accelerate it?

11/ 27

One solution: truncated WF (Chen, Candes ’15)

Regularize / trim gradient components to accelerate convergence

12/ 27

But wait a minute ...

WF converges in O(n) iterations

Step size taken to be η_t = O(1/n)

This choice is suggested by worst-case (generic) optimization theory

Does it capture what really happens?

13/ 27

Numerical surprise with η_t = 0.1

[Figure: relative error vs. iteration count, dropping from 10^0 to below 10^-15 within about 500 iterations]

Vanilla GD (WF) can proceed much more aggressively!

14/ 27
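A curve of this shape can be reproduced with the sketches above (hypothetical problem sizes; the error is measured up to the unavoidable global sign ambiguity of phase retrieval):

```python
import numpy as np

A, y, x_nat = make_phase_retrieval_data(n=100, m=600, rng=0)
x = spectral_init(A, y)
errs = []
for t in range(500):
    Ax = A @ x
    x = x - 0.1 * (4 / len(y)) * A.T @ ((Ax ** 2 - y) * Ax)   # eta_t = 0.1
    errs.append(min(np.linalg.norm(x - x_nat), np.linalg.norm(x + x_nat)))

print([f"{e:.1e}" for e in errs[::100]])   # error roughly every 100 iterations
```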

A second look at gradient descent theory

Which region enjoys both strong convexity and smoothness?

[Figure: neighborhood of x♮ with sampling vectors a_1, a_2; the region where |a_k^⊤ (x − x♮)| ≲ √(log n) · ‖x − x♮‖_2]

• x is not far away from x♮
• x is incoherent w.r.t. sampling vectors (incoherence region)

15/ 27
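The incoherence condition in the second bullet above is easy to probe numerically: for a direction chosen independently of the design, max_k |a_k^⊤ u| concentrates around √(2 log m) ‖u‖_2, whereas a direction aligned with one of the a_k drives the same quantity up to order √n. A small NumPy check (our illustration; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 6000
A = rng.standard_normal((m, n))      # rows are the sampling vectors a_k

u = rng.standard_normal(n)           # direction independent of all a_k
u /= np.linalg.norm(u)
v = A[0] / np.linalg.norm(A[0])      # direction aligned with a_1

print("independent direction:", np.max(np.abs(A @ u)))   # ~ sqrt(2 log m) ~ 4.2
print("aligned direction:    ", np.max(np.abs(A @ v)))   # ~ sqrt(n) ~ 31.6
```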

A second look at gradient descent theory

region of local strong convexity + smoothness

• Prior theory only ensures that iterates remain in the ℓ_2 ball, but not in the incoherence region
• Prior works enforce explicit regularization to promote incoherence

16/ 27

Our findings: GD is implicitly regularized

region of local strong convexity + smoothness

GD implicitly forces iterates to remain incoherent

17/ 27

Theoretical guarantees

Theorem 1 (Phase retrieval). Under i.i.d. Gaussian design, WF achieves

• max_k |a_k^⊤ (x^t − x♮)| ≲ √(log n) ‖x♮‖_2   (incoherence)
• ‖x^t − x♮‖_2 ≲ (1 − η/2)^t ‖x♮‖_2   (near-linear convergence)

provided that the step size η ≍ 1/log n and the sample size m ≳ n log n.

• Much more aggressive step size: 1/log n (vs. 1/n)
• Computational complexity: n/log n times faster than existing theory for WF

18/ 27

Key ingredient: leave-one-out analysis

For each 1 ≤ l ≤ m, introduce leave-one-out iterates x^{t,(l)} by dropping the l-th measurement

[Figure: the design matrix A and measurements y = |Ax|² with the l-th row / entry removed]

19/ 27

Key ingredient: leave-one-out analysis

[Figure: incoherence region w.r.t. a_1, containing both the leave-one-out iterate x^{t,(1)} and the true iterate x^t]

• Leave-one-out iterates x^{t,(l)} are independent of a_l, and are hence incoherent w.r.t. a_l with high prob.
• Leave-one-out iterates x^{t,(l)} ≈ true iterates x^t

20/ 27
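In code, the leave-one-out sequence is literally the same gradient iteration run on the loss with the l-th measurement removed. A minimal sketch (ours; for a faithful construction the spectral initialization should also be recomputed without the l-th sample, which is omitted here for brevity):

```python
import numpy as np

def wf_iterates(A, y, x0, eta, iters):
    """Run vanilla WF from x0 on measurements (A, y) and return all iterates."""
    m = len(y)
    x, xs = x0.copy(), [x0.copy()]
    for _ in range(iters):
        Ax = A @ x
        x = x - eta * (4 / m) * A.T @ ((Ax ** 2 - y) * Ax)
        xs.append(x.copy())
    return xs

def leave_one_out_iterates(A, y, x0, eta, iters, l):
    """Same iteration with the l-th measurement dropped: x^{t,(l)} never sees a_l."""
    keep = np.arange(len(y)) != l
    return wf_iterates(A[keep], y[keep], x0, eta, iters)
```

Comparing the two sequences empirically illustrates both bullets at once: max_t ‖x^{t,(l)} − x^t‖_2 stays small, while |a_l^⊤ (x^{t,(l)} − x♮)| behaves like the independent-direction case in the earlier numerical check.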

This recipe is quite general

Low-rank matrix completion

[Figure: partially observed matrix with most entries missing (Fig. credit: Candes)]

Given partial samples Ω of a low-rank matrix M, fill in missing entries

22/ 27

Prior art

    minimize_X f(X) = Σ_{(j,k)∈Ω} ( e_j^⊤ X X^⊤ e_k − M_{j,k} )²

Existing theory on gradient descent requires

• regularized loss (solve min_X f(X) + R(X) instead): Keshavan, Montanari, Oh ’10; Sun, Luo ’14; Ge, Lee, Ma ’16
• projection onto set of incoherent matrices: Chen, Wainwright ’15; Zheng, Lafferty ’16

23/ 27

Theoretical guarantees

Theorem 2 (Matrix completion). Suppose M is rank-r, incoherent and well-conditioned. Vanilla gradient descent (with spectral initialization) achieves ε accuracy

• in O(log(1/ε)) iterations, w.r.t. ‖·‖_F, ‖·‖, and ‖·‖_{2,∞} (the last norm capturing incoherence)

if the step size η ≲ 1/σ_max(M) and the sample size is ≳ n r³ log³ n

• Byproduct: vanilla GD controls entrywise error; errors are spread out across all entries

24/ 27
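For the symmetric, positive semidefinite formulation on the "Prior art" slide, vanilla gradient descent with spectral initialization can be sketched as follows; this is our minimal illustration, with the step size, the rescaling by the observation probability p, and the problem sizes all placeholders rather than the constants from the theorem.

```python
import numpy as np

def vanilla_gd_mc(M_obs, mask, r, eta, iters):
    """Vanilla GD on f(X) = sum_{(j,k) in Omega} ((X X^T)_{jk} - M_{jk})^2.
    M_obs: n x n matrix holding the observed entries (zeros elsewhere);
    mask: boolean indicator of Omega; r: target rank; eta ~ 1/sigma_max(M)."""
    p = mask.mean()                           # empirical observation probability
    # spectral initialization: top-r eigenpairs of the rescaled zero-filled matrix
    vals, vecs = np.linalg.eigh(M_obs / p)
    X = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))
    for _ in range(iters):
        R = (X @ X.T - M_obs) * mask          # residual on observed entries only
        X = X - eta * 2 * (R + R.T) @ X       # gradient step on f
    return X
```

The update uses ∇_X f = 2 (R + R^⊤) X with R the masked residual, which is the exact gradient of the loss stated on the previous slide.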

Blind deconvolution

[Figure credits: Romberg; EngineeringsALL]

Reconstruct two signals from their convolution

Vanilla GD attains ε-accuracy within O(log(1/ε)) iterations

25/ 27

Incoherence region in high dimensions

[Figure: 2-dimensional picture vs. high-dimensional mental representation; the incoherence region is vanishingly small]

26/ 27

Summary

• Implicit regularization: vanilla gradient descent automatically forces iterates to stay incoherent
• Enables error control in a much stronger sense (e.g. entrywise error control)

Paper:

“Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution”, Cong Ma, Kaizheng Wang, Yuejie Chi, Yuxin Chen, arXiv:1711.10467

27/ 27