
Discrete random matrices

Terence Tao

University of California, Los Angeles

SIAM DM 10 - June 14, 2010




What is a random matrix?

A random matrix is a matrix, each of whose entries is a random variable.

There are many different types of random matrices that are of interest: square ($n \times n$) versus rectangular ($n \times p$), small versus large, dense versus sparse, continuous versus discrete, symmetric (or Hermitian) versus non-symmetric, independent entries versus coupled entries, etc.




Given such a random matrix, one is interested in properties such as invertibility, or statistics such as the distribution of eigenvalues and eigenvalue gaps, particularly in the asymptotic regime $n \to \infty$.

To control such quantities, tools from many areas of mathematics are used: linear algebra, complex analysis, free probability, high-dimensional geometry, random walks, enumerative combinatorics, additive combinatorics, ...

We will focus here on a few selected results, in which combinatorial tools play a prominent role. (This is not an exhaustive survey.)




Random matrix models

There are many, many random matrix models of interest, but we shall focus on just two:

Bernoulli ensemble (random sign matrices): These are $n \times n$ matrices whose entries are equal to $\pm 1$ (with equal probability of each), with all entries independent. (Among other things, such matrices can model the effect of roundoff error in numerical analysis.)

Symmetric Bernoulli ensemble: These are similar to random sign matrices, but where the matrix is now constrained to be symmetric. (These are closely related to the adjacency matrices of random graphs of Erdős-Rényi type.)




The singularity problem

Let $M_n$ be an $n \times n$ random sign matrix. What is the asymptotic behaviour of the singularity probability

$$p_n = P(\det(M_n) = 0)$$

that $M_n$ is singular, as $n \to \infty$?

Equivalently, given $n$ independent random vectors $X_1, \dots, X_n$ in the discrete unit cube $\{-1,1\}^n$, what is the probability that $X_1, \dots, X_n$ span a proper subspace of $\mathbb{R}^n$?
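For very small $n$ the singularity probability can be computed exactly by brute force. The following is a minimal sketch, not a method from the talk: the function names `leibniz_det` and `count_singular` are hypothetical, and the enumeration over all $2^{n^2}$ sign matrices is only feasible for $n \leq 4$ or so.

```python
from itertools import permutations, product


def leibniz_det(rows):
    """Exact integer determinant as the signed sum over all permutations."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        # The sign of the permutation is determined by its inversion count.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for j in range(n):
            term *= rows[j][perm[j]]
        total += term
    return total


def count_singular(n):
    """Count singular n x n sign matrices by enumerating all 2^(n^2) of them."""
    count = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        if leibniz_det(rows) == 0:
            count += 1
    return count


for n in (1, 2, 3):
    singular = count_singular(n)
    print(n, singular, singular / 2 ** (n * n))
```

At these sizes the probability is far from its asymptotic behaviour, so the numbers only illustrate the definition of $p_n$.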




Results

By considering the probability that two rows $X_i, X_j$ match, one has $p_n \geq (\frac{1}{2} + o(1))^n$. This is conjectured to be the sharp bound.

In 1967, Komlós showed that $p_n = o(1)$.

In a breakthrough paper in 1995, Kahn, Komlós, and Szemerédi showed that $p_n \leq (0.999 + o(1))^n$.

The 0.999 was improved to 0.958 (T.-Vu, 2006), then to $3/4 = 0.75$ (T.-Vu, 2007), then to $1/\sqrt{2} \approx 0.707$ (Bourgain-Vu-Wood, 2009). This seems to be the limit of current methods.




Proof techniques

$1 - p_n$ is the probability that the rows $X_1, \dots, X_n$ are linearly independent. We can factor this as

$$1 - p_n = \prod_{i=1}^{n} P(X_i \notin V_i \mid X_1, \dots, X_{i-1} \text{ independent})$$

where $V_i$ is the span of $X_1, \dots, X_{i-1}$. It turns out that the dominant term is the $i = n$ term: heuristically, we thus have

$$p_n \approx P(X_n \in V_n).$$

A crucial observation: because the rows $X_1, \dots, X_n$ are jointly independent random variables, the random vector $X_n$ and the hyperplane $V_n$ are also independent random variables.




Rich and poor hyperplanes

Call a hyperplane $V$ rich if $P(X \in V)$ is larger than some threshold $\lambda$ to be optimised later, where $X$ is a random element of $\{-1,1\}^n$; thus $V$ captures a large fraction of the discrete cube $\{-1,1\}^n$. Call $V$ poor if it is not rich.

Example: the hyperplane $\{x_1 = x_2\}$ is rich, but a generic hyperplane (whose coefficients are linearly independent over the rationals) will be poor.

Clearly, if $V_n$ is poor, the probability $P(X_n \in V_n)$ will be low. So the main task is to deal with the rich hyperplanes.

Key observation: The rich hyperplanes have a special structure, and there are very few of them (entropy is low).




Inverse Littlewood-Offord theory

Let $V$ be a hyperplane with normal vector $(v_1, \dots, v_n)$. Then

$$P(X \in V) = P(S = 0)$$

where $S$ is the random walk

$$S := \pm v_1 \pm \dots \pm v_n$$

and the $\pm$ denote independent Bernoulli signs. So classifying the rich hyperplanes is equivalent to the inverse Littlewood-Offord problem: which random walks are concentrated at the origin?
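The identity $P(X \in V) = P(S = 0)$ makes the rich/poor distinction easy to experiment with for integer normal vectors. The sketch below is an illustration under assumptions of my own: the helper names are hypothetical, and the vector of powers of two is only an integer stand-in for a "generic" normal vector (its signed sums are always odd, so they never vanish).

```python
from collections import Counter
from fractions import Fraction


def walk_distribution(v):
    """Exact distribution of S = ±v_1 ± ... ± v_n for integer steps v_i."""
    dist = Counter({0: Fraction(1)})
    for step in v:
        new_dist = Counter()
        for value, prob in dist.items():
            # Each sign is chosen independently with probability 1/2.
            new_dist[value + step] += prob / 2
            new_dist[value - step] += prob / 2
        dist = new_dist
    return dist


def hyperplane_probability(v):
    """P(X in V) for the hyperplane V with normal vector v, i.e. P(S = 0)."""
    return walk_distribution(v)[0]


n = 10
rich_normal = [1, -1] + [0] * (n - 2)       # the hyperplane {x_1 = x_2}
spread_normal = [2 ** i for i in range(n)]  # signed sums are odd, hence non-zero

print(hyperplane_probability(rich_normal))
print(hyperplane_probability(spread_normal))
```

The first hyperplane contains half of the cube regardless of $n$, while the second contains no point of the cube at all.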




A key example

Suppose all the $v_i$ are equal, $v_1 = \dots = v_n = v$ (with $v \neq 0$). Then $S$ is essentially a simple random walk, and (if $n$ is even) the probability that $S = 0$ is equal to

$$\frac{1}{2^n} \binom{n}{n/2} \approx n^{-1/2}.$$
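The $n^{-1/2}$ decay can be checked numerically against Stirling's approximation $\sqrt{2/(\pi n)}$ for the central binomial probability. This is a small sanity check rather than part of the argument; the function name `return_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def return_probability(n):
    """P(S = 0) for S = ±v ± ... ± v with n terms and v != 0."""
    if n % 2:
        return Fraction(0)  # an odd number of equal steps cannot cancel
    # S = 0 exactly when half of the signs are +, half are -.
    return Fraction(comb(n, n // 2), 2 ** n)


for n in (10, 100, 1000):
    exact = float(return_probability(n))
    stirling = sqrt(2 / (pi * n))
    print(n, exact, stirling, exact / stirling)
```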




More generally, if the $v_1, \dots, v_n$ lie in a generalised arithmetic progression

$$P = \{a_1 w_1 + \dots + a_r w_r : |a_1| \leq N_1, \dots, |a_r| \leq N_r\}$$

of some bounded rank (or “dimension”) $r = O(1)$, then (by the central limit theorem) $S$ mostly takes values in the larger progression

$$C\sqrt{n} P := \{a_1 w_1 + \dots + a_r w_r : |a_1| \leq C\sqrt{n} N_1, \dots, |a_r| \leq C\sqrt{n} N_r\}$$

for some constant $C$. From this and the pigeonhole principle we see that $S$ concentrates somewhere:

$$\sup_x P(S = x) \gg_r n^{-r/2} / |P|.$$




Erdős' inverse Littlewood-Offord theorem

(Erdős, 1945) If $P(S = 0) \geq k^{-1/2}$, then at most $O(k)$ of the coefficients $v_1, \dots, v_n$ are non-zero. (This is enough to get a bound of $p_n = O(n^{-1/2})$.)

The proof uses Sperner's theorem, noting that the set of sign patterns for which $\pm v_1 \pm \dots \pm v_n = 0$ can be viewed as an antichain in $\{-1,1\}^n$ if one normalises all the $v_1, \dots, v_n$ to be positive numbers.

There are many refinements and generalisations of this result (Kleitman, Frankl-Füredi, Halász, Sárközy-Szemerédi, Stanley, ...).
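The Sperner-type bound behind this result says that, when all $v_i$ are non-zero, no value of $S$ is attained by more than $\binom{n}{\lfloor n/2 \rfloor}$ sign patterns. The sketch below checks this by brute force on random small integer vectors; the helper name, the choice $n = 8$ and the range of step sizes are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product
from math import comb
import random


def max_point_mass_count(v):
    """Largest number of sign patterns giving the same value of ±v_1 ± ... ± v_n."""
    counts = Counter(
        sum(sign * step for sign, step in zip(signs, v))
        for signs in product((-1, 1), repeat=len(v))
    )
    return max(counts.values())


rng = random.Random(0)  # fixed seed so the experiment is reproducible
n = 8
for _ in range(200):
    v = [rng.choice([-3, -2, -1, 1, 2, 3]) for _ in range(n)]
    # Antichain bound: at most C(n, n/2) of the 2^n patterns can agree.
    assert max_point_mass_count(v) <= comb(n, n // 2)

# Equal steps attain the bound; widely spread steps are far below it.
print(max_point_mass_count([1] * n), comb(n, n // 2))
print(max_point_mass_count([2 ** i for i in range(n)]))
```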




More inverse Littlewood-Offord theorems

More generally, there is a family of results (T.-Vu, Rudelson-Vershynin, Nguyen-Vu) that assert (roughly speaking) that if $P(S = 0)$ is large (e.g. larger than $n^{-B}$ for some fixed $B$), then most of the $v_1, \dots, v_n$ are contained inside a generalised arithmetic progression

$$\{a_1 w_1 + \dots + a_r w_r : |a_1| \leq N_1, \dots, |a_r| \leq N_r\}$$

of controlled size. These results are similar in spirit to the inverse sumset theorems of Freiman and Ruzsa in additive combinatorics. (This is enough to get a bound of $p_n = O(n^{-A})$ for any fixed $A$.)




These inverse Littlewood-Offord theorems can either be established by Fourier analytic techniques (e.g. by computing the characteristic function $E e^{itS}$ of the random walk $S$) or by greedy algorithm methods, in which one uses the concentrated nature of $S$ to locate directions in which the distribution of $S$ is “smooth”, and then “quotients” out these directions.




But the most powerful method, pioneered by Kahn, Komlós and Szemerédi, is the swapping trick, in which one replaces the random Bernoulli vector $X \in \{-1,1\}^n$ with a sparse Bernoulli vector $Y \in \{-1,0,1\}^n$ which has many coefficients equal to zero, and replaces the random walk $S$ by a lazy random walk $S'$ in which many steps are set to zero. A Fourier-analytic argument of Halász then shows that (in typical situations) the lazy walk is significantly more concentrated than the ordinary walk,

$$P(S = 0) \leq c P(S' = 0)$$

(where $0 < c < 1$ is an absolute constant), and thus $Y$ concentrates in $V$ more often than $X$,

$$P(X \in V) \leq c P(Y \in V).$$
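The comparison between the ordinary and lazy walks can be explored numerically. The sketch below is only an illustration under assumptions not taken from the slides: the steps $v_i$ are integers, and the lazy walk keeps each step with probability $\mu = 1/4$ (with a uniformly random sign) and sets it to zero otherwise. For this particular $\mu$, the elementary inequality $|\cos\theta| \leq \frac{3}{4} + \frac{1}{4}\cos 2\theta$ gives $P(S = 0) \leq P(S' = 0)$ for every integer vector, i.e. the comparison with $c = 1$. The gain to some $c < 1$ depends on the structure of $V$ and is not reproduced here. The helper name `zero_probability` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def zero_probability(v, mu):
    """P(walk = 0) when step i is +v_i or -v_i with probability mu/2 each, else 0."""
    dist = Counter({0: Fraction(1)})
    for step in v:
        new_dist = Counter()
        for value, prob in dist.items():
            new_dist[value] += prob * (1 - mu)        # step set to zero
            new_dist[value + step] += prob * mu / 2   # step taken with sign +
            new_dist[value - step] += prob * mu / 2   # step taken with sign -
        dist = new_dist
    return dist[0]


mu = Fraction(1, 4)  # assumed laziness parameter, chosen for illustration
examples = ([1] * 8, [1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 2, 3, 5, 8, 13, 21])
for v in examples:
    ordinary = zero_probability(v, Fraction(1))  # mu = 1 is the ordinary walk S
    lazy = zero_probability(v, mu)               # the lazy walk S'
    print(v, float(ordinary), float(lazy), float(ordinary / lazy))
```

The printed ratio $P(S = 0) / P(S' = 0)$ plays the role of $c$ for each individual vector; it never exceeds 1 for this choice of $\mu$.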




Morally, the probability $p_n$ that $X_1, \dots, X_n$ are dependent is something like

$$p_n \approx \sum_{V \text{ rich}} P(X_1, \dots, X_n \in V) = \sum_{V \text{ rich}} P(X \in V)^n.$$

Similarly, the probability $p'_n$ that $Y_1, \dots, Y_n$ are dependent is something like

$$p'_n \approx \sum_{V \text{ rich}} P(Y \in V)^n.$$

Since $p'_n$ is trivially bounded by 1, and $P(X \in V) \leq c P(Y \in V)$, this morally gives an upper bound of $c^n$ for $p_n$.




Modulo a lot of details, the best bounds for $p_n$ then come from trying to optimise in $c$ while also controlling various error terms not mentioned here.

One also needs Fourier-based inverse Littlewood-Offord theory to deal with those exceptional $V$ for which the inequality $P(X \in V) \leq c P(Y \in V)$ fails.

It appears though that $(1/\sqrt{2} + o(1))^n$ is basically the limit of current methods.




Determinant of a Bernoulli matrix

Now we consider the determinant $\det(M_n)$ of a Bernoulli matrix $M_n = (a_{i,j})_{1 \leq i,j \leq n}$.

For comparison, Hadamard's inequality gives the bound $|\det(M_n)| \leq n^{n/2}$, with equality if and only if $M_n$ is a Hadamard matrix (an orthogonal Bernoulli matrix).

Turán observed that in the Leibniz expansion

$$\det(M_n) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{j=1}^{n} a_{j,\sigma(j)}$$

the summands have mean zero, variance one, and are pairwise independent, so $\det(M_n)$ has mean zero and variance $n!$; in particular, the standard deviation of $\det(M_n)$ is $\sqrt{n!} = e^{-n/2+o(n)} n^{n/2}$.
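Turán's second-moment computation and Hadamard's bound can both be confirmed exhaustively for tiny $n$. This is a minimal sketch with hypothetical function names; it enumerates every sign matrix, so it is limited to $n \leq 3$ or $4$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def leibniz_det(rows):
    """Exact determinant as the signed sum over all permutations."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for j in range(n):
            term *= rows[j][perm[j]]
        total += term
    return total


def determinant_moments(n):
    """Return (mean, second moment, max |det|) over all n x n sign matrices."""
    total = total_sq = max_abs = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        d = leibniz_det(rows)
        total += d
        total_sq += d * d
        max_abs = max(max_abs, abs(d))
    count = 2 ** (n * n)
    return Fraction(total, count), Fraction(total_sq, count), max_abs


for n in (2, 3):
    mean, second_moment, max_abs = determinant_moments(n)
    # Compare the second moment with n! and max |det| with Hadamard's n^(n/2).
    print(n, mean, second_moment, factorial(n), max_abs, n ** (n / 2))
```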




The methods used to control the singularity probability $p_n = P(\det(M_n) = 0)$ can also be applied to estimate the size of the determinant $\det(M_n)$.

The starting point is to replace the formula

$$1 - p_n = \prod_{i=1}^{n} P(X_i \notin V_i \mid X_1, \dots, X_{i-1} \text{ independent})$$

by the iterated base-times-height formula

$$|\det(M_n)| = \prod_{i=1}^{n} \operatorname{dist}(X_i, V_i).$$
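The base-times-height formula can be verified exactly by computing the squared distances $\operatorname{dist}(X_i, V_i)^2$ with Gram-Schmidt over the rationals and comparing their product with $\det(M_n)^2$. The sketch below does this for one hand-picked $4 \times 4$ sign matrix; the matrix and the helper names are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def leibniz_det(rows):
    """Exact determinant as the signed sum over all permutations."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for j in range(n):
            term *= rows[j][perm[j]]
        total += term
    return total


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


def squared_distances(rows):
    """dist(X_i, V_i)^2 for V_i = span(X_1, ..., X_{i-1}), via exact Gram-Schmidt."""
    basis = []   # orthogonal, non-normalised vectors spanning the current V_i
    result = []
    for row in rows:
        residual = [Fraction(x) for x in row]
        for b in basis:
            coeff = dot(residual, b) / dot(b, b)
            residual = [r - coeff * x for r, x in zip(residual, b)]
        d2 = dot(residual, residual)  # squared height of X_i above V_i
        result.append(d2)
        if d2 != 0:
            basis.append(residual)
    return result


matrix = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, -1],
]
d2 = squared_distances(matrix)
print(d2, prod(d2), leibniz_det(matrix) ** 2)
```

Working with squared distances keeps the arithmetic exact, so the two sides agree as rational numbers rather than up to rounding.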




Using this and some concentration of measure tools (e.g. Talagrand's inequality), one can show that $\log |\det(M_n)|$ is asymptotically comparable to $\log \sqrt{(n-1)!}$ with high probability (Girko, T.-Vu).




There is also some partial understanding of how the determinant behaves modulo $p$ for various moduli $p$.

From Gaussian elimination it is easy to see that $\det(M_n)$ is always an integer multiple of $2^{n-1}$.

For fixed odd primes $p$, the probability that $\det(M_n)$ is coprime to $p$ (i.e. that $M_n$ is invertible mod $p$) is asymptotic to

$$\prod_{j=1}^{\infty} \left(1 - \frac{1}{p^j}\right)$$

(Kahn-Komlós, 2001); not coincidentally, this is the asymptotic probability that $n$ randomly chosen vectors in $\mathbb{F}_p^n$ form a basis.

More precise asymptotics and generalisations were obtained recently (Maples, 2010).
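Both arithmetic facts are easy to probe numerically. The sketch below checks the divisibility by $2^{n-1}$ exhaustively for $n = 3$ and evaluates a truncation of the infinite product for $p = 3$; the function names and the truncation at 200 factors are assumptions made for illustration.

```python
from itertools import permutations, product


def leibniz_det(rows):
    """Exact determinant as the signed sum over all permutations."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for j in range(n):
            term *= rows[j][perm[j]]
        total += term
    return total


def all_determinants(n):
    """Yield the determinant of every n x n sign matrix."""
    for entries in product((-1, 1), repeat=n * n):
        yield leibniz_det([entries[i * n:(i + 1) * n] for i in range(n)])


def limiting_invertibility_probability(p, terms=200):
    """Truncated product prod_{j >= 1} (1 - p^(-j))."""
    value = 1.0
    for j in range(1, terms + 1):
        value *= 1 - p ** (-j)
    return value


n = 3
print(all(d % 2 ** (n - 1) == 0 for d in all_determinants(n)))
print(limiting_invertibility_probability(3))
```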




Permanent of a Bernoulli matrix

A somewhat different method also allows one to control the permanent

$$\operatorname{Per}(M_n) := \sum_{\sigma \in S_n} \prod_{j=1}^{n} a_{j,\sigma(j)}$$

of a random sign matrix $M_n = (a_{i,j})_{1 \leq i,j \leq n}$.




The basic idea is to expand the permanent of each $k \times k$ minor as a random sum of $(k-1) \times (k-1)$ minors, and view the evolution of these minors in $k$ as a random birth-death process, using tools such as Erdős' inverse Littlewood-Offord theorem to prevent the minors from becoming unexpectedly small too frequently.

As a consequence, one can show that the permanent of $M_n$ is equal in absolute value to $n^{(1/2+o(1))n}$ with probability $1 - o(1)$ (T.-Vu, 2009).
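As a sanity check on the scale $n^{(1/2+o(1))n}$, note that the summands of the permanent are pairwise uncorrelated with mean zero and variance one, so the second moment of $\operatorname{Per}(M_n)$ is $n!$, just as for the determinant. The sketch below confirms this exhaustively for small $n$. It is not the birth-death argument described above, only a second-moment check with hypothetical function names.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def permanent(rows):
    """Permanent as the unsigned sum over all permutations."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for j in range(n):
            term *= rows[j][perm[j]]
        total += term
    return total


def permanent_second_moment(n):
    """E[Per(M_n)^2] computed over all 2^(n^2) sign matrices."""
    total_sq = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        total_sq += permanent(rows) ** 2
    return Fraction(total_sq, 2 ** (n * n))


for n in (2, 3):
    print(n, permanent_second_moment(n), factorial(n))
```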




Least singular value

The least singular value $\sigma_n$ of an $n \times n$ matrix $A_n$ can be defined by the formula

$$\sigma_n := \inf_{\|x\| = 1} \|A_n x\|.$$

It is positive precisely when $A_n$ is invertible. Thus for instance, if $A_n$ is a Bernoulli matrix, then

$$P(\sigma_n = 0) = p_n \leq (1/\sqrt{2} + o(1))^n.$$

The least singular value of a matrix is related to its condition number $\sigma_1 / \sigma_n$, which is a quantity of importance in numerical linear algebra.




By variants of the above methods, one can obtain further tail bounds on $\sigma_n$, for instance

$$P(\sigma_n \leq t/\sqrt{n}) \leq O(t + c^n)$$

for all $t > 0$ and some absolute constant $c < 1$ (Rudelson-Vershynin, 2008; related results by T.-Vu).




In particular, we have $\sigma_n \gg n^{-1/2}$ with high probability.

These sorts of lower bounds are of importance in deriving the circular law that describes the asymptotic bulk distribution of the eigenvalues of such random matrices (Götze-Tikhomirov 2007, Pan-Zhou 2008, T.-Vu 2008, T.-Vu-Krishnapur 2009).

By using linear algebra tools and variants of the Berry-Esseen theorem we also know that the least singular value of a Bernoulli matrix has the same asymptotic distribution as that of its Gaussian counterpart (which can be explicitly computed). (T.-Vu, 2009)




The symmetric case

The techniques discussed above rely heavily on the joint independence of all the entries in a Bernoulli matrix.

In general, we do not know how to extend these methods to non-independent settings, such as symmetric Bernoulli matrices.

Nevertheless, some results are known in this case.

For instance, if $q_n$ is the probability that a random symmetric Bernoulli matrix $B_n$ is singular, then one has $q_n = O(n^{-1/2+o(1)})$ (Costello, 2008; earlier results by Costello-T.-Vu, 2006).




The basic idea is to expand

$$B_n = \begin{pmatrix} B_{n-1} & X \\ X^* & b_{nn} \end{pmatrix}$$

where $X \in \{-1,1\}^{n-1}$ is a random vector independent of $B_{n-1}$ and $b_{nn}$. By the Schur complement formula, $B_n$ is invertible if $B_{n-1}$ and $b_{nn} - X^* B_{n-1}^{-1} X$ are invertible.

The expression $b_{nn} - X^* B_{n-1}^{-1} X$ is a quadratic expression in the coefficients of $X$. One then needs a quadratic inverse Littlewood-Offord theory that classifies those quadratic forms that can concentrate at the origin. This theory is only partially developed at present.
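The Schur complement step rests on the identity $\det(B_n) = \det(B_{n-1}) \, (b_{nn} - X^* B_{n-1}^{-1} X)$, valid when $B_{n-1}$ is invertible. The sketch below checks it in exact rational arithmetic on one hand-picked symmetric sign matrix. The example matrix and the helper name `det_and_solve` are hypothetical, and plain Gauss-Jordan elimination is used only for simplicity.

```python
from fractions import Fraction


def det_and_solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals.

    Returns (det(matrix), y) with matrix @ y = rhs; y is None if singular.
    """
    n = len(matrix)
    a = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0), None
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        pivot_row = [x / a[col][col] for x in a[col]]
        a[col] = pivot_row
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], pivot_row)]
    return det, [a[r][n] for r in range(n)]


# B_{n-1}, X and b_nn for an illustrative 4 x 4 symmetric sign matrix B_n.
b_small = [[1, 1, -1], [1, -1, 1], [-1, 1, 1]]
x = [1, 1, 1]
b_nn = 1
b_full = [row + [xi] for row, xi in zip(b_small, x)] + [x + [b_nn]]

det_small, y = det_and_solve(b_small, x)          # y = B_{n-1}^{-1} X
schur = b_nn - sum(xi * yi for xi, yi in zip(x, y))
det_full, _ = det_and_solve(b_full, [0] * 4)

print(det_small, schur, det_full, det_small * schur)
```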




Open problems

Despite much recent progress, there are still plenty of open questions remaining. For instance:

Apart from the moment method, most of the methods described here rely heavily on the independence properties of the entries. For instance, we have no non-trivial bounds on the singularity probability of the adjacency matrix of a random regular graph. How can one reduce reliance on independence?

Can we improve the bounds for the singularity probability of a symmetric Bernoulli matrix beyond the current record of $O(n^{-1/2+o(1)})$? In truth, the bound should be exponentially decaying in $n$.




The upper bounds for the singularity probability $p_n = P(\det(M_n) = 0)$ for random sign matrices rely on Fourier-analytic methods, and these methods seem to be unable to do better than $(1/\sqrt{2} + o(1))^n$. Are there non-Fourier analytic methods that can at least get an exponential bound $c^n$ for some $c < 1$?

The known upper bounds for $p_n$ also apply to $P(\det(M_n) = a)$ for any other $a$. But for non-zero $a$, the bounds should be far better, and in fact be super-exponentially decaying (e.g. of shape $n^{-n/2}$ or so). Can one improve the current bound $P(\det(M_n) = a) \leq (1/\sqrt{2} + o(1))^n$ when $a$ is non-zero?




Are the eigenvalues of random sign matrices all simple? We don't even know that the expected proportion of simple eigenvalues is $1 - o(1)$ yet. (But we do know this in the symmetric case, thanks to recent results on the eigenvalue gap distribution.)

In many cases, we still lack good large deviation bounds: how small are the tail events that the distribution deviates far from the expected scenario?

