WHY ARE SO MANY MATRICES OF LOW RANK IN ...pi.math.cornell.edu › ~ajt › presentations ›...

26
WHY ARE SO MANY MATRICES OF LOW RANK IN COMPUTATIONAL MATH? Alex Townsend Cornell University [email protected] Gil Strang Bernhard Beckermann https://goo.gl/NeGvlB Slides available @ Related paper @ https://goo.gl/iyrqTZ

Transcript of WHY ARE SO MANY MATRICES OF LOW RANK IN ...pi.math.cornell.edu › ~ajt › presentations ›...

WHY ARE SO MANY MATRICES OF LOW RANK IN COMPUTATIONAL MATH?

Alex Townsend Cornell [email protected]

Gil Strang Bernhard Beckermann

https://goo.gl/NeGvlBSlides available @ Related paper @ https://goo.gl/iyrqTZ

A DREADED QUESTIONMy DPhil thesis exploited the fact that many functions

are — for all practical purposes — of low rank.

Complex analysisVector calculusGlobal optimization

Problem: I didn’t understand this

May 2016: “F1 can be approximated by a rank 11 function to 15-digits”

MATHEMATICAL RANK

rank(X) = k if k is smallest s.t. X = u1v⇤1 + · · ·+ ukv

⇤k

Mathematical rank

m X

n

if k is smallest s.t. �k+1(X) = 0

singular value

Singular valuesplot

�k(X)/�1(X)

10⇥ 10 random

10⇥ 10 ones

WHAT DO LOW RANK MATRICES LOOK LIKE?

=red = 1white = 0

= + +

blue = 1 white = 0 orange = -1

``Extremely low rank matrices can be highly aligned to the grid.”

Austria Greece Scotland

NUMERICAL RANK

For 0 < ✏ < 1

Numerical rank

rank✏(X) = k if k is smallest s.t.

kX �Xkk2 ✏kXk2 for some Xk with rank(Xk) = k

336

289

Singular valuesplot

�k(X)/�1(X)

Bear

✏ = 10�2

“AVERAGE” MATRICES ARE NOT LOW RANK

almost surely of full rank!=

[Edelman (1988)]

Theorem: Let X 2 Cn⇥n and Xij ⇠ CN (0, 12I2). Then,

P [rank✏(X) n� 1] e�4(n✏)2 .

With n = 1000 and ✏ = 10�15, we have e�4(n✏)2 ⇡ 10�26

NUMERICALLY LOW RANK MATRICESVandermonde matrices:

Vn =⇥x

0 |x1 | · · · |xn�1⇤

x 2 Rn⇥1

Cauchy matrices: (under separation conditions)(Cn)ij =1

xi � yj

Solution to Poisson’s equation: uxx

+ uyy

= f Xij = u(i/n, j/n)

Pos. Def. Hankel matrices: (An)ij = hi+j

Appear in nonuniform transforms:

Diego Antolin-Ruiz

(F̃n

)ij

= e�2⇡ixij

10�2

State-of-the-art: Bear(Eckart-Young theorem)

COMMON ARGUMENTS

rank(H1000) = 1000Mathematical rankrank✏(H1000) = 28Numerical rank

Example: (The Hilbert matrix)

(Hn)ij =1

i+ j � 1

• The world is smooth: If

John Reade

f : [0, 1]2 ! C is smooth,

then has decaying s.v.’s.Xij = f(i/n, j/n)

Our reason rank✏(H1000) 34rank✏(H1000) 719Smoothness argument ( 86)

[Grasedyck, 1994]

• The world is Vandermonde: Sloppy-models

Jim Sethna

• The world is correlated: Rank correlation• The world is well-separated: FMM

THE WORLD IS SYLVESTER

THE WORLD IS SYLVESTER

2

6664

12

32

. . .n� 1

2

3

7775Hn �Hn

2

6664

� 12� 3

2. . .�n+ 1

2

3

7775=

2

6664

1 1 . . . 11 1 . . . 1...

.... . .

...1 1 . . . 1

3

7775

Example: (Hilbert matrix)

AX �XB = FSylvester matrix equation:

2

6664

x1

x2

. . .xn

3

7775Vn � Vn

2

6664

�11

. . .1

3

7775=

2

6664

⇥ 0 . . . 0⇥ 0 . . . 0...

.... . .

...⇥ 0 . . . 0

3

7775

Example: (Vandermonde matrix)

James Sylvester

Displacement rank matrices [Morf, 1974]

Others have linked Sylvester matrices to numerical low rank:[Penzl, 00], [Antoulas, 02], [Grasedyck, 04],[Sabino, 06], [Baker, et al.,15]

(Hn)ij =1

i+ j � 1

Vn =⇥x

0 |x1 | · · · |xn�1⇤

If A,B are normal and rank(F ) = r, thenTheorem:

�(A) = spectrum of A

ZOLOTAREV BOUNDSAX �XB = FSylvester matrix equation:

�k(H1000)/�1(H1000)Example: (The Hilbert matrix)

(Hn)ij =1

i+ j � 1

rank(F ) = 1

DHn �HnD = F

�1+kr(X) Zk(�(A),�(B))kXk2Zk = Zolotarev number

ZOLOTAREV NUMBERS

Yegor Zolotarevrational functionwhere

We have

Re

Im

small

�(A)�(B)

large

�(A) ✓ E�(B) ✓ F

Simplification

[Zolotarev, 1877]Studied by Goncar, Lebedev, Ganelius, Beckermann, Saff,

Wachpress, Guttel, Nakatsukasa.

Zk(�(A),�(B)) = infrk2Rk,k

supz2�(A) |rk(z)|infz2�(B) |rk(z)|

�1+kr(X) Zk(�(A),�(B))kXk2

Zk(�(A),�(B)) Zk(E ,F)

POTENTIAL THEORY

Andrei Gonchar

Re

Im

small

�(A)�(B)

large

�(A) ✓ E�(B) ✓ F

Simplification

Theorem: [Ganelius, 71], [Ganelius, 72]

If E and F are disjoint closed polygons or disks, then

Let 0 < a < b < 1. Then,

Theorem:

Zk(�(A),�(B)) Zk(E ,F)

Zk([�b,�a], [a, b]) 4⇢�2k, ⇢ � exp

✓⇡2

2 log(4b/a)

Zk(E ,F) 4000k2e�kV (E,F), V (E ,F) = min. condenser energy

[Beckermann & T.,16], [Braess,86], [Gonchar, 69]

THREE EXAMPLES

HILBERT MATRIX2

6664

12

32

. . .n� 1

2

3

7775Hn �Hn

2

6664

� 12� 3

2. . .�n+ 1

2

3

7775=

2

6664

1 1 . . . 11 1 . . . 1...

.... . .

...1 1 . . . 1

3

7775

12 n� 1

2�n+ 12 � 1

2

(Hn)ij =1

i+ j � 1

�(A) ✓ E�(B) ✓ F

�k(H1000)/�1(H1000)

Condition number bounds in [Wilf, 70]

With a = 12 and b = n� 1

2

Therefore,

rank✏(Hn) dlog(8n� 4) log(4/✏)/⇡2e

Zk([�b,�a], [a, b]) 4 exp

✓�k⇡2

log(8n� 4)

[Beckermann & T., 2017]

CAUCHY MATRIX

2

64x1

. . .xn

3

75Cn � Cn

2

64y1

. . .yn

3

75 =

2

641 · · · 1...

. . ....

1 · · · 1

3

75

a b c d

�1 1

Mobius transform

(Cn)ij =1

xi � yj, {x1, . . . , xn} ✓ [a, b], {y1, . . . , yn} ✓ [c, d]

↵�↵

�k(C1000)/�1(C1000)

Therefore,

rank✏(Cn) dlog(16�) log(4/✏)/⇡2e

With b < c or d < a, then

� =|c� a||d� b||c� b||d� a|

where

Zk([a, b], [c, d]) 4 exp

✓�k⇡2

log(16�)

VANDERMONDE MATRIX2

6664

x1

x2

. . .xn

3

7775Vn � Vn

2

6664

�11

. . .1

3

7775=

2

6664

⇥ 0 . . . 0⇥ 0 . . . 0...

.... . .

...⇥ 0 . . . 0

3

7775

E+

E�rank✏(Vn) 2d4 log(8bn/2c/⇡) log(4/✏)/⇡2e+ 2

Therefore,

x = linspace(-1,1,1000);

�k(V1000)/�1(V1000)

Vn =⇥x

0 |x1 | · · · |xn�1⇤

x 2 Rn⇥1

For the condition number, see [Beckermann, 00] & [Pan, 16]

Let x 2 Rn⇥1 and n be even:

Z2k(R, E+ [ E�) 4 exp

✓�k⇡2

4 log(4n/⇡)

If A,B are normal and rank(F ) = r, thenTheorem:

�(A) = spectrum of A

SYLVESTER TO ZOLOTAREV

Zk = Zolotarev number

�1+kr(X) Zk(�(A),�(B))kXk2

AX �XB = FSolve

ADI AS A RANK-REVEALING ALGORITHM

(A� pjI)X(B + qjI)� (A+ qjI)X(B � pjI) = (pj + qj)F

For any “shifts” pj and qj , we have

[Wachpress, 88]

[Peaceman & Rachford, 1955]The alternating direction implicit (ADI) method

Y0 = 0

Select shifts p0, . . . , pK�1 and q0, . . . , qK�1

for j = 0, . . . ,K � 1

Yj+1/2(B + qjI) = F � (A+ qjI)Yj

(A� pjI)Yj+1 = F � Yj+1/2(B � pjI)end

Optimal shifts: �1+jr(X) kX � Yjk2 = Zj(�(A),�(B))kXk2Our idea: rank(Yj) = rj, rank(F ) = r

Henry RachfordD. Peaceman

LOG-RANK MATRICES

JUST ANOTHER REASON SO FARQu: Why are so many matrices of low rank?

Ans: The world is Sylvester. Many matrix satisfyAX �XB = F, rank(F ) = r ⌧ n

and under certain assumptions are therefore numerically of low rank.

Examples that do not fit:

(H(2)n )ij =

1

(i+ j � 1)2�k(C

(2)1000)/�1(C

(2)1000)

(C(2)n )ij =

1

|!i � zj |2

LOG-RANK MATRICESm X

n

mnStorage

m

nkk

X ⇡

k(m+ n)

Mat-vec O(mn) O(k(m+ n))

Includes:

Hn C(2)n

p, q = integers

Def. A family of matrices X1, X2, . . . , are of (p, q)-log-rank if

Xn 2 Cn⇥n, rank✏(Xn) C log(n+ 1)

plog(2/✏)q, C > 1

WE CAN GENERAL LOG-RANK MATRICES TOO

(C(2)n )ij =

1

|!i � zj |2=

1

(!i � zj)(!i � zj)

Heather Wilber

2

64!1

. . .!n

3

75C(2)n � C(2)

n

2

64z1

. . .zn

3

75 = Cn =Cauchy matrix

Example: (from fast multipole method)

)if !i and zj separated

AX �XB = F, rank(F ) = rUp to now:�1+kr(X) Zk(�(A),�(B))kXk2

Dan Fortunato

Is there an optimal-complexity spectral element method?

(Log-rank is the key idea.)

Madeleine Udell

Why are low rank matrices found in big datasets?(Work just started.)

WHAT’S NEXT?

Explain the abundance of sloppy models.

Jim Sethna Katherine Quinn

A small selection of our bounds:

The world is Sylvester

WHY ARE SO MANY MATRICES OF LOW RANK?

If you have a low rank matrix and want to prove it: email me

SUMMARY

CHARACTERIZATION OF LOG-RANK MATRICES

Theorem: X1, X2, . . . , are a family of (c1p, c2q)-log-rank matrices (c1, c2 small

constants) if there exist normal A1, A2, . . . , and B1, B2, . . . , such that

AnXn �XnBn = Fn,

where �(An) and �(Bn) are separated by a gap independent of n and F1, F2, . . . ,are a family of (p, q)-log-rank matrices.