
Big Data is Low Rank

Madeleine Udell

Operations Research and Information Engineering, Cornell University

UC Davis, 11/5/2018

1 / 40


Data table

age   gender   state   income     education     ···
29    F        CT      $53,000    college       ···
57    ?        NY      $19,000    high school   ···
?     M        CA      $102,000   masters       ···
41    F        NV      $23,000    ?             ···
⋮     ⋮        ⋮       ⋮          ⋮

- detect demographic groups?
- find typical responses?
- identify related features?
- impute missing entries?

2 / 40


Data table

m examples (patients, respondents, households, assets)
n features (tests, questions, sensors, times)

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}$$

- the ith row of A is the feature vector for the ith example
- the jth column of A gives the values of the jth feature across all examples

3 / 40


Low rank model

given: m × n data table A, with k ≪ m, n
find: X ∈ R^{m×k}, Y ∈ R^{k×n} for which

$$X Y \approx A, \qquad \text{i.e.,} \quad x_i y_j \approx A_{ij},$$

where

$$X = \begin{bmatrix} \text{—}\,x_1\,\text{—} \\ \vdots \\ \text{—}\,x_m\,\text{—} \end{bmatrix}, \qquad Y = \begin{bmatrix} | & & | \\ y_1 & \cdots & y_n \\ | & & | \end{bmatrix}$$

interpretation:
- X and Y are a (compressed) representation of A (see the storage sketch below)
- x_i^T ∈ R^k is a point associated with example i
- y_j ∈ R^k is a point associated with feature j
- the inner product x_i y_j approximates A_{ij}
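As a quick storage illustration (a minimal sketch; the dimensions and the random placeholder factors are made up, not fitted to any data):

m, n, k = 10_000, 1_000, 10
X, Y = randn(m, k), randn(k, n)           # stand-ins for fitted factors
A_approx = X * Y                          # rank-k approximation of the data table
compression = (m * k + k * n) / (m * n)   # = 0.011: about 1% of the storage of A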

4 / 40


Why?

- reduce storage; speed transmission
- understand (visualize, cluster)
- remove noise
- infer missing data
- simplify data processing

5 / 40


Outline

PCA

Generalized low rank models

Applications
    Impute missing data
    Dimensionality reduction
    Causal inference
    Automatic machine learning

Why low rank?

6 / 40


Principal components analysis

PCA: for A ∈ R^{m×n},

$$\text{minimize} \quad \|A - XY\|_F^2 = \sum_{i=1}^m \sum_{j=1}^n (A_{ij} - x_i y_j)^2$$

with variables X ∈ R^{m×k}, Y ∈ R^{k×n}

- old roots [Pearson 1901, Hotelling 1933]
- least squares low rank fitting
- (analytical) solution via the SVD A = UΣV^T (a minimal sketch follows)
- (numerical) solution via alternating minimization
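A minimal Julia sketch of the SVD route (pca_fit is an illustrative name; it keeps the top k singular triples, which by the Eckart–Young theorem gives the best rank-k fit in Frobenius norm):

using LinearAlgebra

function pca_fit(A, k)
    F = svd(A)                              # A = U * Diagonal(S) * Vt
    X = F.U[:, 1:k] * Diagonal(F.S[1:k])    # m × k
    Y = F.Vt[1:k, :]                        # k × n
    return X, Y                             # X * Y is the best rank-k approximation of A
end

With missing entries or non-quadratic losses (next slides), this shortcut no longer applies and alternating minimization takes over.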

7 / 40


Outline

PCA

Generalized low rank models

Applications
    Impute missing data
    Dimensionality reduction
    Causal inference
    Automatic machine learning

Why low rank?

8 / 40


Generalized low rank model

$$\text{minimize} \quad \sum_{(i,j)\in\Omega} L_j(x_i y_j, A_{ij}) \;+\; \sum_{i=1}^m r_i(x_i) \;+\; \sum_{j=1}^n \tilde r_j(y_j)$$

- loss functions L_j for each column
  - e.g., different losses for reals, booleans, categoricals, ordinals, ...
- regularizers r : R^{1×k} → R (on the rows of X), r̃ : R^k → R (on the columns of Y)
- observe only (i, j) ∈ Ω (other entries are missing)

Note: can be NP-hard to optimize exactly...
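A sketch of evaluating this objective for candidate factors (all names are illustrative: L is a vector of per-column loss functions, rx and ry the row and column regularizers, Ω a list of observed (i, j) pairs):

using LinearAlgebra

function glrm_objective(A, Ω, X, Y, L, rx, ry)
    obj = sum(L[j](dot(X[i, :], Y[:, j]), A[i, j]) for (i, j) in Ω)   # data-fitting terms
    obj += sum(rx(X[i, :]) for i in 1:size(X, 1))                     # regularize rows of X
    obj += sum(ry(Y[:, j]) for j in 1:size(Y, 2))                     # regularize columns of Y
    return obj
end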

9 / 40


Matrix completion

observe A_{ij} only for (i, j) ∈ Ω ⊂ {1, ..., m} × {1, ..., n}

$$\text{minimize} \quad \sum_{(i,j)\in\Omega} (A_{ij} - x_i y_j)^2 \;+\; \lambda \sum_{i=1}^m \|x_i\|^2 \;+\; \lambda \sum_{j=1}^n \|y_j\|^2$$

two regimes:
- some entries missing: don't waste data; "borrow strength" from the entries that are not missing
- most entries missing: matrix completion still works!

Theorem ([Keshavan Montanari 2010])
If A has rank k′ ≤ k and |Ω| = O(nk′ log n) (and A is incoherent and Ω is chosen uniformly at random), then matrix completion exactly recovers the matrix A with high probability.
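A minimal alternating-minimization sketch for this quadratic objective (each step is a ridge regression over the observed entries; the name complete, Ω as a list of (i, j) pairs, and the hyperparameters are illustrative, and this is not the algorithm analyzed in the theorem):

using LinearAlgebra

function complete(A, Ω, k; λ = 1.0, iters = 50)
    m, n = size(A)
    X, Y = randn(m, k), randn(k, n)
    for _ in 1:iters
        for i in 1:m
            js = [j for (i2, j) in Ω if i2 == i]           # features observed for example i
            isempty(js) && continue
            Yi = Y[:, js]                                  # k × |js|
            X[i, :] = (Yi * Yi' + λ * I) \ (Yi * A[i, js])
        end
        for j in 1:n
            is = [i for (i, j2) in Ω if j2 == j]           # examples observed for feature j
            isempty(is) && continue
            Xj = X[is, :]                                  # |is| × k
            Y[:, j] = (Xj' * Xj + λ * I) \ (Xj' * A[is, j])
        end
    end
    return X, Y                                            # X * Y approximates A on and off Ω
end

Sketches on later slides reuse this complete helper; in practice a library would be used instead.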

10 / 40


Maximum likelihood low rank estimation

Choose loss function to maximize (log) likelihood of observations:

- gaussian noise: L(u, a) = (u − a)² (derived below)
- laplacian (heavy-tailed) noise: L(u, a) = |u − a|
- gaussian + laplacian noise: L(u, a) = huber(u − a)
- poisson (count) noise: L(u, a) = exp(u) − au + a log a − a
- bernoulli (coin toss) noise: L(u, a) = log(1 + exp(−au))
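For example, under additive gaussian noise A_{ij} = x_i y_j + ε with ε ∼ N(0, σ²), the negative log-likelihood of an observed entry a given the prediction u = x_i y_j is

$$-\log p(a \mid u) = \frac{(a - u)^2}{2\sigma^2} + \log\bigl(\sigma\sqrt{2\pi}\bigr),$$

which, up to a positive scaling and an additive constant, is the quadratic loss L(u, a) = (u − a)²; the other rows follow from the same calculation for their noise models.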

11 / 40


Maximum likelihood low rank estimation works

Theorem (Template)
If |Ω| = O(n log n) samples, drawn uniformly at random from the matrix entries, are observed according to a probabilistic model with parameter Z, then the solution of (appropriately) regularized maximum likelihood estimation is close to the true Z with high probability.

examples (not exhaustive!):

- additive gaussian noise [Candes Plan 2009]
- additive subgaussian noise [Keshavan Montanari Oh 2009]
- gaussian + laplacian noise [Xu Caramanis Sanghavi 2012]
- 0-1 (Bernoulli) observations [Davenport et al. 2012]
- entrywise exponential family distribution [Gunasekar Ravikumar Ghosh 2014]
- multinomial logit [Kallus Udell 2016]

12 / 40


Losses

$$\text{minimize} \quad \sum_{(i,j)\in\Omega} L_j(x_i y_j, A_{ij}) \;+\; \sum_{i=1}^m r_i(x_i) \;+\; \sum_{j=1}^n \tilde r_j(y_j)$$

choose a loss L : R × F → R adapted to the data type F:

data type     loss                L(u, a)
real          quadratic           (u − a)²
real          absolute value      |u − a|
real          huber               huber(u − a)
boolean       hinge               (1 − ua)₊
boolean       logistic            log(1 + exp(−au))
integer       poisson             exp(u) − au + a log a − a
ordinal       ordinal hinge       ∑_{a′=1}^{a−1} (1 − u + a′)₊ + ∑_{a′=a+1}^{d} (1 + u − a′)₊
categorical   one-vs-all          (1 − u_a)₊ + ∑_{a′≠a} (1 + u_{a′})₊
categorical   multinomial logit   exp(u_a) / ∑_{a′=1}^{d} exp(u_{a′})

13 / 40
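A few of the losses in the table above, written as plain Julia functions (a sketch; the huber threshold convention shown is one common choice, not necessarily the one used in the talk):

quad_loss(u, a)     = (u - a)^2
abs_loss(u, a)      = abs(u - a)
huber_loss(u, a; δ = 1.0) = abs(u - a) <= δ ? (u - a)^2 : 2δ * abs(u - a) - δ^2
hinge_loss(u, a)    = max(1 - u * a, 0.0)         # boolean labels a ∈ {−1, +1}
logistic_loss(u, a) = log1p(exp(-a * u))          # boolean labels a ∈ {−1, +1}
poisson_loss(u, a)  = exp(u) - a * u + a * log(a) - a   # counts a ≥ 1 (take 0·log 0 = 0 for a = 0)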


Regularizers

$$\text{minimize} \quad \sum_{(i,j)\in\Omega} L_j(x_i y_j, A_{ij}) \;+\; \sum_{i=1}^m r_i(x_i) \;+\; \sum_{j=1}^n \tilde r_j(y_j)$$

choose regularizers r, r̃ to impose structure:

structure     r(x)              r̃(y)
small         ‖x‖₂²             ‖y‖₂²
sparse        ‖x‖₁              ‖y‖₁
nonnegative   1(x ≥ 0)          1(y ≥ 0)
clustered     1(card(x) = 1)    0
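The same regularizers as plain Julia functions (a sketch; the indicator 1(·) is implemented as 0 when the condition holds and ∞ otherwise):

r_small(x; γ = 1.0)  = γ * sum(abs2, x)                    # ‖x‖₂²  ("small")
r_sparse(x; γ = 1.0) = γ * sum(abs, x)                     # ‖x‖₁   ("sparse")
r_nonneg(x)          = all(x .>= 0) ? 0.0 : Inf            # indicator of the nonnegative orthant
r_cluster(x)         = count(!iszero, x) == 1 ? 0.0 : Inf  # exactly one nonzero entry per row of X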

14 / 40


Outline

PCA

Generalized low rank models

Applications
    Impute missing data
    Dimensionality reduction
    Causal inference
    Automatic machine learning

Why low rank?

15 / 40


Impute missing data

impute the most likely true data A_{ij}:

$$A_{ij} = \operatorname*{argmin}_a \; L_j(x_i y_j, a)$$

- implicit constraint: A_{ij} ∈ F_j
- when L_j is the quadratic, ℓ₁, or huber loss, A_{ij} = x_i y_j
- if F ≠ R, argmin_a L_j(x_i y_j, a) ≠ x_i y_j
- e.g., for the hinge loss L(u, a) = (1 − ua)₊, A_{ij} = sign(x_i y_j)
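A sketch of this imputation rule for a few loss types, where u stands for the prediction x_i y_j (illustrative helpers; the ordinal case assumes rounding to the nearest level, which matches the ordinal hinge only approximately):

impute_real(u)            = u                               # quadratic, ℓ₁, huber: argmin is u itself
impute_bool(u)            = u >= 0 ? 1 : -1                 # hinge, logistic: best label is sign(u)
impute_ordinal(u, levels) = clamp(round(Int, u), 1, levels) # assumption: round to the nearest level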

16 / 40


Impute heterogeneous data

[Figure: imputation on a synthetic table with mixed data types. Panels: mixed data types; remove entries; qpca rank 10 recovery, with error; glrm rank 10 recovery, with error.]

17 / 40



Hospitalizations are low rank

[Figure: on a hospitalization data set, GLRM outperforms PCA.]

[Schuler Udell et al., 2016]

18 / 40


American community survey

2013 ACS:

- 3M respondents, 87 economic/demographic survey questions
  - income
  - cost of utilities (water, gas, electric)
  - weeks worked per year
  - hours worked per week
  - home ownership
  - looking for work
  - use of food stamps
  - education level
  - state of residence
  - ...
- 1/3 of responses missing

19 / 40


Fitting a GLRM to the ACS

- construct a rank 10 GLRM with loss functions respecting the data types
  - huber loss for real values
  - hinge loss for booleans
  - ordinal hinge loss for ordinals
  - one-vs-all hinge loss for categoricals
- scale losses and regularizers by 1/σ²_j
- fit the GLRM

in 2 lines of code:

glrm, labels = GLRM(A, 10, scale = true)

X,Y = fit!(glrm)

20 / 40


American community survey

most similar features (in demography space):

- Alaska: Montana, North Dakota
- California: Illinois, cost of water
- Colorado: Oregon, Idaho
- Ohio: Indiana, Michigan
- Pennsylvania: Massachusetts, New Jersey
- Virginia: Maryland, Connecticut
- Hours worked: weeks worked, education

21 / 40


Low rank models for dimensionality reduction¹

U.S. Wage & Hour Division (WHD) compliance actions:

company                  zip     violations   ···
Holiday Inn              14850   109          ···
Moosewood Restaurant     14850   0            ···
Cornell Orchards         14850   0            ···
Lakeside Nursing Home    14850   53           ···
⋮                        ⋮       ⋮

- 208,806 rows (cases) × 252 columns (violation info)
- 32,989 zip codes...

¹ labor law violation demo: https://github.com/h2oai/h2o-3/blob/master/h2o-r/demos/rdemo.census.labor.violations.large.R

22 / 40


Low rank models for dimensionality reduction

ACS demographic data:

zip     unemployment   mean income   ···
94305   12%            $47,000       ···
06511   19%            $32,000       ···
60647   23%            $23,000       ···
94121   4%             $178,000      ···
⋮       ⋮              ⋮

- 32,989 rows (zip codes) × 150 columns (demographic info)
- a GLRM embeds zip codes into a (low dimensional) demography space

23 / 40


Low rank models for dimensionality reduction

[Figure: zip code features.]

24 / 40


Low rank models for dimensionality reduction

build 3 sets of features to predict violations:

- categorical: expand zip code to a categorical variable
- concatenate: join the tables on zip
- GLRM: replace zip code by the low dimensional zip code features (sketched below)
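A minimal sketch of the GLRM feature construction (illustrative names: D is the zip-by-demographics matrix from the previous slide, zips its row labels, and pca_fit the helper sketched earlier; a full GLRM with heterogeneous losses would be used in practice):

X_zip, Y = pca_fit(D, 10)          # X_zip: 32,989 × 10 embedding of zip codes
zip_features = Dict(zips[z] => X_zip[z, :] for z in 1:length(zips))
# each case in the violations table then gets the 10-dimensional vector for its zip code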

fit a supervised (deep learning) model:

method        train error   test error   runtime
categorical   0.2091690     0.2173612    23.76
concatenate   0.2258872     0.2515906    4.47
GLRM          0.1790884     0.1933637    4.36

25 / 40


Causal inference with messy data

Income   College   Gender   State   Age   IQ
5500     1         M        CT      29    110
3000     0         ?        NY      57    ?
5800     1         F        ?       ?     108
3300     0         F        CA      41    99
4000     0         ?        NV      50    ?

fit a low rank model to the noisy, partially observed covariates, then estimate the treatment effect of college

[Kallus Mao Udell, 2018]
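A rough sketch of the impute-then-estimate idea (illustrative only: covariates, Ω, treatment, and outcome are placeholder names, complete is the helper sketched on the matrix-completion slide, and plain least squares stands in for the matched and doubly robust estimators of the paper):

using LinearAlgebra

X, Y = complete(covariates, Ω, k)           # low rank fit to the noisy, partially observed covariates
U = X * Y                                   # denoised, completed covariate matrix
Z = hcat(ones(size(U, 1)), treatment, U)    # intercept, treatment indicator, covariates
coef = Z \ outcome                          # least squares; coef[2] estimates the treatment effect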

26 / 40


Matrix completion reduces error

[Figure: relative RMSE vs. p for binary covariates in high dimension (methods: Lasso, Ridge, OLS, MF (Proposal), Oracle), and RMSE vs. number of proxies on the Twins data (methods: Match, MF + Match, LR, MF + LR, DR, MF + DR, PS Match, MF + PS Match).]

[Kallus Mao Udell, 2018]

27 / 40


Low rank method for automatic machine learning

- yellow blocks are fully observed: results of each algorithm on each training dataset
- last row is a new dataset
- blue blocks are observed results of algorithms on the new dataset
- white blocks are unknown entries
- fit a low rank model to impute the white blocks (sketched below)

[Yang Akimoto Udell, 2018]

28 / 40
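A minimal sketch of that imputation step (illustrative: E is the (datasets + 1) × algorithms error matrix, Ω its observed entries, and complete is the alternating-minimization helper sketched earlier):

X, Y = complete(E, Ω, k)            # low rank fit to the partially observed error matrix
predicted = vec(X[end, :]' * Y)     # predicted errors of every algorithm on the new dataset
best_algorithm = argmin(predicted)  # choose the algorithm with the smallest predicted error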


Low rank fit correctly identifies best algorithm type

[Yang Akimoto Udell, 2018]

29 / 40


Experiment design for timely model selection

Which algorithms to use to predict performance?

$$\begin{array}{ll} \underset{v}{\text{maximize}} & \log\det\Bigl(\sum_{j=1}^n v_j\, y_j y_j^T\Bigr) \\[4pt] \text{subject to} & \sum_{j=1}^n v_j t_j \le \tau \\[4pt] & v_j \in [0, 1] \quad \forall j \in [n] \end{array}$$

- t_j: estimated runtime of each machine learning model
- τ: runtime budget

[Yang Akimoto Udell, 2018]
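A greedy heuristic sketch in the spirit of this design problem (an assumption of this writeup, not the solver used in the paper; Y holds the columns y_j, t the estimated runtimes, τ the budget):

using LinearAlgebra

function greedy_design(Y, t, τ; ridge = 1e-6)
    k, n = size(Y)
    chosen, spent = Int[], 0.0
    M = ridge * Matrix(1.0I, k, k)     # keeps logdet finite before any selections
    while true
        gains = [(logdet(M + Y[:, j] * Y[:, j]') - logdet(M), j)
                 for j in 1:n if !(j in chosen) && spent + t[j] <= τ]
        isempty(gains) && break
        gain, j = maximum(gains)       # pick the largest marginal logdet gain within budget
        push!(chosen, j); spent += t[j]
        M += Y[:, j] * Y[:, j]'
    end
    return chosen
end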

30 / 40


Oboe: Time-constrained AutoML

[Figure: balanced error rate vs. runtime budget (1–64 s) on (a) OpenML and (b) UCI, and average rank vs. runtime budget on (c) OpenML and (d) UCI, comparing Oboe, auto-sklearn, and random selection. In (a) and (b), the shaded area spans the 25th–75th percentiles; in (c) and (d), rank 1 is best and 3 is worst.]

[Yang Akimoto Udell, 2018]

31 / 40


Outline

PCA

Generalized low rank models

Applications
    Impute missing data
    Dimensionality reduction
    Causal inference
    Automatic machine learning

Why low rank?

32 / 40


Latent variable models

Suppose A ∈ R^{m×n} is generated by a latent variable model (LVM):

- α_i ∼ 𝒜 iid, i = 1, ..., m
- β_j ∼ ℬ iid, j = 1, ..., n
- A_{ij} = g(α_i, β_j)

33 / 40


Latent variable models: examples

inner product:
- α_i ∼ 𝒜 ⊆ R^k iid, i = 1, ..., m
- β_j ∼ ℬ ⊆ R^k iid, j = 1, ..., n
- A_{ij} = α_i^T β_j
- rank of A? k

univariate:
- α_i ∼ Unif(0, 1) iid, i = 1, ..., m
- β_j ∼ Unif(0, 1) iid, j = 1, ..., n
- A_{ij} = g(α_i, β_j)
- rank of A?
  - can be large!
  - but if g is analytic with sup_{x∈R} |g^{(d)}(x)| ≤ M, an entrywise ε-approximation to A has rank O(log(1/ε)) (checked numerically below)
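A quick numerical check of this claim (a sketch; the logistic-style link g below is an arbitrary smooth choice, not one from the talk):

using LinearAlgebra

m, n = 500, 400
α, β = rand(m), rand(n)
g(a, b) = 1 / (1 + exp(-(a - b)))                 # a smooth, analytic link
A = [g(α[i], β[j]) for i in 1:m, j in 1:n]
σ = svdvals(A)
numerical_rank = count(s -> s > 1e-6 * σ[1], σ)   # tiny compared with min(m, n)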

34 / 40



Nice latent variable models

We say an LVM is nice if

- the distributions 𝒜 and ℬ have bounded support
- g is piecewise analytic and on each piece, for some constants C, M ∈ R,

  $$\|D^{\mu} g(\alpha, \beta)\| \le C\, M^{|\mu|}\, \|g\|$$

  (here ‖g‖ = sup_{x ∈ dom g} |g(x)| is the sup norm)

Examples: g(α, β) = poly(α, β) or g(α, β) = exp(poly(α, β))

35 / 40


Rank of nice latent variable models?

Question: Suppose A ∈ R^{m×n} is drawn from a nice LVM. How does the rank of an ε-approximation to A change with m and n?

$$\begin{array}{ll} \text{minimize} & \operatorname{Rank}(X) \\ \text{subject to} & \|X - A\|_\infty \le \varepsilon \end{array}$$

Answer: the rank grows as O(log(m + n)/ε²)

Theorem (Udell and Townsend, 2017)
Nice latent variable models are of log rank.

36 / 40



Proof sketch

- For each α, expand g around β = 0 in its Taylor series:

  $$g(\alpha, \beta) - g(\alpha, 0) = \langle \nabla g(\alpha, 0), \beta\rangle + \langle \nabla^2 g(\alpha, 0), \beta\beta^\top\rangle + \cdots = \begin{bmatrix} \nabla g(\alpha, 0) \\ \mathrm{vec}(\nabla^2 g(\alpha, 0)) \\ \vdots \end{bmatrix}^{\!\top} \begin{bmatrix} \beta \\ \mathrm{vec}(\beta\beta^\top) \\ \vdots \end{bmatrix}$$

  i.e., collect the terms depending on α and the terms depending on β
- Notice: this is a very high dimensional inner product!
- Apply the Johnson–Lindenstrauss lemma to reduce the dimension

37 / 40


Johnson–Lindenstrauss Lemma

Lemma (The Johnson–Lindenstrauss Lemma)
Consider x₁, ..., x_n ∈ R^N. Pick 0 < ε < 1 and set r = ⌈8(log n)/ε²⌉. There is a linear map Q : R^N → R^r such that for all 1 ≤ i, j ≤ n,

$$(1 - \varepsilon)\|x_i - x_j\|^2 \le \|Q(x_i - x_j)\|^2 \le (1 + \varepsilon)\|x_i - x_j\|^2 .$$

(Here, ⌈a⌉ is the smallest integer greater than or equal to a.)

Lemma (Variant of the Johnson–Lindenstrauss Lemma)
Under the same assumptions, there is a linear map Q : R^N → R^r such that for all 1 ≤ i, j ≤ n,

$$\bigl| x_i^T x_j - x_i^T Q^T Q\, x_j \bigr| \le \varepsilon\,\bigl(\|x_i\|^2 + \|x_j\|^2 - x_i^T x_j\bigr).$$

38 / 40



Summary

big data is low rank
- in social science
- in medicine
- in machine learning

we can exploit low rank to
- fill in missing data
- embed data in vector space
- infer causality
- choose machine learning models

39 / 40


References

- Generalized Low Rank Models. M. Udell, C. Horn, R. Zadeh, and S. Boyd. Foundations and Trends in Machine Learning, 2016.
- Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choices. N. Kallus and M. Udell. EC 2016.
- Discovering Patient Phenotypes Using Generalized Low Rank Models. A. Schuler, V. Liu, J. Wan, A. Callahan, M. Udell, D. Stark, and N. Shah. Pacific Symposium on Biocomputing (PSB), 2016.
- Low rank models for high dimensional categorical variables: labor law demo. A. Fu and M. Udell. https://github.com/h2oai/h2o-3/blob/master/h2o-r/demos/rdemo.census.labor.violations.large.R
- Causal Inference with Noisy and Missing Covariates via Matrix Factorization. N. Kallus, X. Mao, and M. Udell. NIPS 2018.
- OBOE: Collaborative Filtering for AutoML Initialization. C. Yang, Y. Akimoto, D. Kim, and M. Udell.
- Why are Big Data Matrices Approximately Low Rank? M. Udell and A. Townsend. SIMODS, to appear.

40 / 40