WHY ARE SO MANY MATRICES OF LOW RANK IN ...pi.math.cornell.edu › ~ajt › presentations ›...
Transcript of WHY ARE SO MANY MATRICES OF LOW RANK IN ...pi.math.cornell.edu › ~ajt › presentations ›...
WHY ARE SO MANY MATRICES OF LOW RANK IN COMPUTATIONAL MATH?
Alex Townsend Cornell [email protected]
Gil Strang Bernhard Beckermann
https://goo.gl/NeGvlBSlides available @ Related paper @ https://goo.gl/iyrqTZ
A DREADED QUESTIONMy DPhil thesis exploited the fact that many functions
are — for all practical purposes — of low rank.
Complex analysisVector calculusGlobal optimization
Problem: I didn’t understand this
May 2016: “F1 can be approximated by a rank 11 function to 15-digits”
MATHEMATICAL RANK
rank(X) = k if k is smallest s.t. X = u1v⇤1 + · · ·+ ukv
⇤k
Mathematical rank
m X
n
if k is smallest s.t. �k+1(X) = 0
singular value
Singular valuesplot
�k(X)/�1(X)
10⇥ 10 random
10⇥ 10 ones
WHAT DO LOW RANK MATRICES LOOK LIKE?
=red = 1white = 0
= + +
blue = 1 white = 0 orange = -1
``Extremely low rank matrices can be highly aligned to the grid.”
Austria Greece Scotland
NUMERICAL RANK
For 0 < ✏ < 1
Numerical rank
rank✏(X) = k if k is smallest s.t.
kX �Xkk2 ✏kXk2 for some Xk with rank(Xk) = k
336
289
Singular valuesplot
�k(X)/�1(X)
Bear
✏ = 10�2
“AVERAGE” MATRICES ARE NOT LOW RANK
almost surely of full rank!=
[Edelman (1988)]
Theorem: Let X 2 Cn⇥n and Xij ⇠ CN (0, 12I2). Then,
P [rank✏(X) n� 1] e�4(n✏)2 .
With n = 1000 and ✏ = 10�15, we have e�4(n✏)2 ⇡ 10�26
NUMERICALLY LOW RANK MATRICESVandermonde matrices:
Vn =⇥x
0 |x1 | · · · |xn�1⇤
x 2 Rn⇥1
Cauchy matrices: (under separation conditions)(Cn)ij =1
xi � yj
Solution to Poisson’s equation: uxx
+ uyy
= f Xij = u(i/n, j/n)
Pos. Def. Hankel matrices: (An)ij = hi+j
Appear in nonuniform transforms:
Diego Antolin-Ruiz
(F̃n
)ij
= e�2⇡ixij
10�2
State-of-the-art: Bear(Eckart-Young theorem)
COMMON ARGUMENTS
rank(H1000) = 1000Mathematical rankrank✏(H1000) = 28Numerical rank
Example: (The Hilbert matrix)
(Hn)ij =1
i+ j � 1
• The world is smooth: If
John Reade
f : [0, 1]2 ! C is smooth,
then has decaying s.v.’s.Xij = f(i/n, j/n)
Our reason rank✏(H1000) 34rank✏(H1000) 719Smoothness argument ( 86)
[Grasedyck, 1994]
• The world is Vandermonde: Sloppy-models
Jim Sethna
• The world is correlated: Rank correlation• The world is well-separated: FMM
THE WORLD IS SYLVESTER
2
6664
12
32
. . .n� 1
2
3
7775Hn �Hn
2
6664
� 12� 3
2. . .�n+ 1
2
3
7775=
2
6664
1 1 . . . 11 1 . . . 1...
.... . .
...1 1 . . . 1
3
7775
Example: (Hilbert matrix)
AX �XB = FSylvester matrix equation:
2
6664
x1
x2
. . .xn
3
7775Vn � Vn
2
6664
�11
. . .1
3
7775=
2
6664
⇥ 0 . . . 0⇥ 0 . . . 0...
.... . .
...⇥ 0 . . . 0
3
7775
Example: (Vandermonde matrix)
James Sylvester
Displacement rank matrices [Morf, 1974]
Others have linked Sylvester matrices to numerical low rank:[Penzl, 00], [Antoulas, 02], [Grasedyck, 04],[Sabino, 06], [Baker, et al.,15]
(Hn)ij =1
i+ j � 1
Vn =⇥x
0 |x1 | · · · |xn�1⇤
If A,B are normal and rank(F ) = r, thenTheorem:
�(A) = spectrum of A
ZOLOTAREV BOUNDSAX �XB = FSylvester matrix equation:
�k(H1000)/�1(H1000)Example: (The Hilbert matrix)
(Hn)ij =1
i+ j � 1
rank(F ) = 1
DHn �HnD = F
�1+kr(X) Zk(�(A),�(B))kXk2Zk = Zolotarev number
ZOLOTAREV NUMBERS
Yegor Zolotarevrational functionwhere
We have
Re
Im
small
�(A)�(B)
large
�(A) ✓ E�(B) ✓ F
Simplification
[Zolotarev, 1877]Studied by Goncar, Lebedev, Ganelius, Beckermann, Saff,
Wachpress, Guttel, Nakatsukasa.
Zk(�(A),�(B)) = infrk2Rk,k
supz2�(A) |rk(z)|infz2�(B) |rk(z)|
�1+kr(X) Zk(�(A),�(B))kXk2
Zk(�(A),�(B)) Zk(E ,F)
POTENTIAL THEORY
Andrei Gonchar
Re
Im
small
�(A)�(B)
large
�(A) ✓ E�(B) ✓ F
Simplification
Theorem: [Ganelius, 71], [Ganelius, 72]
If E and F are disjoint closed polygons or disks, then
Let 0 < a < b < 1. Then,
Theorem:
Zk(�(A),�(B)) Zk(E ,F)
Zk([�b,�a], [a, b]) 4⇢�2k, ⇢ � exp
✓⇡2
2 log(4b/a)
◆
Zk(E ,F) 4000k2e�kV (E,F), V (E ,F) = min. condenser energy
[Beckermann & T.,16], [Braess,86], [Gonchar, 69]
HILBERT MATRIX2
6664
12
32
. . .n� 1
2
3
7775Hn �Hn
2
6664
� 12� 3
2. . .�n+ 1
2
3
7775=
2
6664
1 1 . . . 11 1 . . . 1...
.... . .
...1 1 . . . 1
3
7775
12 n� 1
2�n+ 12 � 1
2
(Hn)ij =1
i+ j � 1
�(A) ✓ E�(B) ✓ F
�k(H1000)/�1(H1000)
Condition number bounds in [Wilf, 70]
With a = 12 and b = n� 1
2
Therefore,
rank✏(Hn) dlog(8n� 4) log(4/✏)/⇡2e
Zk([�b,�a], [a, b]) 4 exp
✓�k⇡2
log(8n� 4)
◆
[Beckermann & T., 2017]
CAUCHY MATRIX
2
64x1
. . .xn
3
75Cn � Cn
2
64y1
. . .yn
3
75 =
2
641 · · · 1...
. . ....
1 · · · 1
3
75
a b c d
�1 1
Mobius transform
(Cn)ij =1
xi � yj, {x1, . . . , xn} ✓ [a, b], {y1, . . . , yn} ✓ [c, d]
↵�↵
�k(C1000)/�1(C1000)
Therefore,
rank✏(Cn) dlog(16�) log(4/✏)/⇡2e
With b < c or d < a, then
� =|c� a||d� b||c� b||d� a|
where
Zk([a, b], [c, d]) 4 exp
✓�k⇡2
log(16�)
◆
VANDERMONDE MATRIX2
6664
x1
x2
. . .xn
3
7775Vn � Vn
2
6664
�11
. . .1
3
7775=
2
6664
⇥ 0 . . . 0⇥ 0 . . . 0...
.... . .
...⇥ 0 . . . 0
3
7775
E+
E�rank✏(Vn) 2d4 log(8bn/2c/⇡) log(4/✏)/⇡2e+ 2
Therefore,
x = linspace(-1,1,1000);
�k(V1000)/�1(V1000)
Vn =⇥x
0 |x1 | · · · |xn�1⇤
x 2 Rn⇥1
For the condition number, see [Beckermann, 00] & [Pan, 16]
Let x 2 Rn⇥1 and n be even:
Z2k(R, E+ [ E�) 4 exp
✓�k⇡2
4 log(4n/⇡)
◆
If A,B are normal and rank(F ) = r, thenTheorem:
�(A) = spectrum of A
SYLVESTER TO ZOLOTAREV
Zk = Zolotarev number
�1+kr(X) Zk(�(A),�(B))kXk2
AX �XB = FSolve
ADI AS A RANK-REVEALING ALGORITHM
(A� pjI)X(B + qjI)� (A+ qjI)X(B � pjI) = (pj + qj)F
For any “shifts” pj and qj , we have
[Wachpress, 88]
[Peaceman & Rachford, 1955]The alternating direction implicit (ADI) method
Y0 = 0
Select shifts p0, . . . , pK�1 and q0, . . . , qK�1
for j = 0, . . . ,K � 1
Yj+1/2(B + qjI) = F � (A+ qjI)Yj
(A� pjI)Yj+1 = F � Yj+1/2(B � pjI)end
Optimal shifts: �1+jr(X) kX � Yjk2 = Zj(�(A),�(B))kXk2Our idea: rank(Yj) = rj, rank(F ) = r
Henry RachfordD. Peaceman
JUST ANOTHER REASON SO FARQu: Why are so many matrices of low rank?
Ans: The world is Sylvester. Many matrix satisfyAX �XB = F, rank(F ) = r ⌧ n
and under certain assumptions are therefore numerically of low rank.
Examples that do not fit:
(H(2)n )ij =
1
(i+ j � 1)2�k(C
(2)1000)/�1(C
(2)1000)
(C(2)n )ij =
1
|!i � zj |2
LOG-RANK MATRICESm X
n
mnStorage
m
nkk
X ⇡
k(m+ n)
Mat-vec O(mn) O(k(m+ n))
Includes:
Hn C(2)n
p, q = integers
Def. A family of matrices X1, X2, . . . , are of (p, q)-log-rank if
Xn 2 Cn⇥n, rank✏(Xn) C log(n+ 1)
plog(2/✏)q, C > 1
WE CAN GENERAL LOG-RANK MATRICES TOO
(C(2)n )ij =
1
|!i � zj |2=
1
(!i � zj)(!i � zj)
Heather Wilber
2
64!1
. . .!n
3
75C(2)n � C(2)
n
2
64z1
. . .zn
3
75 = Cn =Cauchy matrix
Example: (from fast multipole method)
)if !i and zj separated
AX �XB = F, rank(F ) = rUp to now:�1+kr(X) Zk(�(A),�(B))kXk2
Dan Fortunato
Is there an optimal-complexity spectral element method?
(Log-rank is the key idea.)
Madeleine Udell
Why are low rank matrices found in big datasets?(Work just started.)
WHAT’S NEXT?
Explain the abundance of sloppy models.
Jim Sethna Katherine Quinn
A small selection of our bounds:
The world is Sylvester
WHY ARE SO MANY MATRICES OF LOW RANK?
If you have a low rank matrix and want to prove it: email me
SUMMARY
CHARACTERIZATION OF LOG-RANK MATRICES
Theorem: X1, X2, . . . , are a family of (c1p, c2q)-log-rank matrices (c1, c2 small
constants) if there exist normal A1, A2, . . . , and B1, B2, . . . , such that
AnXn �XnBn = Fn,
where �(An) and �(Bn) are separated by a gap independent of n and F1, F2, . . . ,are a family of (p, q)-log-rank matrices.