Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix,...

61
Numerical Linear Algebra for data-related applications Yousef Saad Department of Computer Science and Engineering University of Minnesota Modelling 2019 Olomouc, September 20, 2019

Transcript of Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix,...

Page 1: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Numerical Linear Algebra for data-relatedapplicationsYousef Saad

Department of Computer Scienceand Engineering

University of Minnesota

Modelling 2019Olomouc, September 20, 2019

Page 2: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Introduction: a historical perspective

In 1953, George Forsythe published a paper titled:“Solving linear systems can be interesting”.

ä Survey of the state of the art linear algebra at that time: di-rect methods, iterative methods, conditioning, preconditioning,The Conjugate Gradient method, acceleration methods, ....

ä An amazing paper in which the author was urging researchersto start looking at solving linear systems

Olomouc, 09-20-2019 p. 2

Page 3: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Introduction: a historical perspective

In 1953, George Forsythe published a paper titled:“Solving linear systems can be interesting”.

ä Survey of the state of the art linear algebra at that time: di-rect methods, iterative methods, conditioning, preconditioning,The Conjugate Gradient method, acceleration methods, ....

ä An amazing paper in which the author was urging researchersto start looking at solving linear systems

ä 66 years later – we can certainly state that:

“Linear Algebra problems in Machine Learn-ing can be interesting”

Olomouc, 09-20-2019 p. 3

Page 4: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Focus of numerical linear algebra changed many times overthe years

ä This is because linear algebra is a key tool when solvingchallenging new problems in various disciplines

1940s–1950s: Major issue: the flutter problem in aerospaceengineering→ eigenvalue problem [cf. Olga Taussky Todd]

ä Then came the discoveries of the LR and QR algorithms.The package Eispack followed a little later

1960s: Problems related to the power grid promoted what wewould call today general sparse matrix techniques

Late 1980s: Thrust on parallel matrix computations.

Late 1990s: Spur of interest in “financial computing”

Olomouc, 09-20-2019 p. 4

Page 5: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Solution of PDEs (e.g., Fluid Dynamics) and problems in me-chanical eng. (e.g. structures) major force behind numericallinear algebra algorithms in the past few decades.

ä Strong new forces are now reshaping the field today: Appli-cations related to the use of “data”

ä Machine learning is appearing in unexpected places:

• design of materials

• machine learning in geophysics

• self-driving cars, ..

• ....

Olomouc, 09-20-2019 p. 5

Page 6: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Big impact on the economy

ä New economy driven by Google,Facebook, Netflix, Amazon, Twitter,Ali-Baba, Tencent, ..., and even thebig department stores (Walmart, ...)ä Huge impact on Jobs

Olomouc, 09-20-2019 p. 6

Page 7: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Big impact on the economy

ä New economy driven by Google,Facebook, Netflix, Amazon, Twitter,Ali-Baba, Tencent, ..., and even thebig department stores (Walmart, ...)ä Huge impact on Jobs

ä In contrast: Old economy [drivenby Boeing, GM, Ford, Mining industry,US Steel, Aerospatiale, ...] does nothave as much to offer...

ä Imperative to look at what you we do under new lenses:DATA

Olomouc, 09-20-2019 p. 7

Page 8: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

LARGE SYSTEMSH2 / HSS matrices

Ax=b GraphPartitioning

Model reduction A x = xλ DomainDecomposition

−∆ u = f

Preconditioning

Sparse matrices

Page 9: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Tran

sla

te H2 / HSS matrices

Conquer

Divide &

Data Sparsity

Ax=b GraphPartitioning

Model reduction A x = xλ DomainDecomposition

−∆ u = f

Preconditioning

TΣA = U V PCA ClusteringDimension

Reduction

Semi−Supervised

Learning

Regression

Sparse matricesLARGE SYSTEMS

LASSO

GraphLaplaceans

BIG DATA!

Page 10: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Introduction, background, and motivation

Common goal of data mining & machine learning: to extractmeaningful information or patterns from data. Very broadarea – includes: data analysis, pattern recognition, informa-tion retrieval, ...

ä Main tools used: linear algebra; Statistics; Optimization;graph theory; approximation theory; ...

Olomouc, 09-20-2019 p. 10

Page 11: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

A few sample (classes of) problems & methods:

ä Classification: ’Benign – Malignant’, ’Dangerous-Safe’, Facerecognition, digit recognition, pattern recognition, ..

ä Graphs/ networks analysis: Pagerank, communicability, nodecentrality, ...

ä Matrix completion: Recommender systems

ä Projection type methods: PCA, LSI, Clustering, Eigenmaps,LLE, Isomap, ...

ä (Deep) Neural Networks: Convolutional Neural Networks;Image recognition; Speech recognition; ...

ä Computational statistics: [ Diag(inv(Cov)), Log det (A), ...]

Olomouc, 09-20-2019 p. 11

Page 12: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Two broad classes methods: supervised and unsupervisedlearning.

ä General approach in both methods: Reduce dimension firstthen tackle task in lower dimension space.

Olomouc, 09-20-2019 p. 12

Page 13: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Major tool of Data Mining: Dimension reduction

ä Goal is not as much to reduce size (& cost) but to:

• Reduce noise and redundancy in data before performing atask [e.g., classification as in digit/face recognition]

• Discover important ‘features’ or ‘paramaters’

The problem: Given: X = [x1, · · · , xn] ∈ Rm×n, find a

low-dimens. representation Y = [y1, · · · , yn] ∈ Rd×n of X

ä Achieved by a mapping Φ : x ∈ Rm −→ y ∈ Rd so:

φ(xi) = yi, i = 1, · · · , n

Olomouc, 09-20-2019 p. 13

Page 14: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

m

n

X

Y

x

y

i

id

n

ä Φ may be linear : yj = W>xj, ∀j, or, Y = W>X

ä ... or nonlinear (implicit).

ä Mapping Φ required to: Preserve proximity? Maximizevariance? Preserve a certain graph?

Olomouc, 09-20-2019 p. 14

Page 15: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Basics: Principal Component Analysis (PCA)

In Principal Component Analysis W is computed to maxi-mize variance of projected data:

maxW∈Rm×d;W>W=I

n∑i=1

∥∥∥∥∥∥yi − 1

n

n∑j=1

yj

∥∥∥∥∥∥2

2

, yi = W>xi.

ä Leads to maximizing

Tr[W>(X − µe>)(X − µe>)>W

], µ = 1

nΣni=1xi

ä SolutionW = { dominant eigenvectors } of the covariancematrix≡ Set of left singular vectors of X̄ = X − µe>

Olomouc, 09-20-2019 p. 15

Page 16: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

SVD:

X̄ = UΣV >, U>U = I, V >V = I, Σ = Diag

ä Optimal W = Ud ≡ matrix of first d columns of U

ä Solution W also minimizes ‘reconstruction error’ ..

∑i

‖xi −WW Txi‖2 =∑i

‖xi −Wyi‖2

ä In some methods recentering to zero is not done, i.e., X̄replaced by X.

Olomouc, 09-20-2019 p. 16

Page 17: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Unsupervised learning

“Unsupervised learning” : meth-ods do not exploit labeled dataä Example of digits: perform a 2-Dprojectionä Images of same digit tend tocluster (more or less)ä Such 2-D representations arepopular for visualizationä Can also try to find natural clus-ters in data, e.g., in materialsä Basic clusterning technique: K-means

−6 −4 −2 0 2 4 6 8−5

−4

−3

−2

−1

0

1

2

3

4

5PCA − digits : 5 −− 7

567

SuperhardPhotovoltaic

Superconductors

Catalytic

Ferromagnetic

Thermo−electricMulti−ferroics

Olomouc, 09-20-2019 p. 17

Page 18: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Example: Digit images (a random sample of 30)

5 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

20

5 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

20

5 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

20

5 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

20

5 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

205 10 15

10

20

Olomouc, 09-20-2019 p. 18

Page 19: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

2-D ’reductions’:

−10 −5 0 5 10−6

−4

−2

0

2

4

6PCA − digits : 0 −− 4

01234

−0.2 −0.1 0 0.1 0.2−0.2

−0.15

−0.1

−0.05

0

0.05

0.1

0.15LLE − digits : 0 −− 4

0.07 0.071 0.072 0.073 0.074−0.15

−0.1

−0.05

0

0.05

0.1

0.15

0.2K−PCA − digits : 0 −− 4

01234

−5.4341 −5.4341 −5.4341 −5.4341 −5.4341

x 10−3

−0.25

−0.2

−0.15

−0.1

−0.05

0

0.05

0.1ONPP − digits : 0 −− 4

Olomouc, 09-20-2019 p. 19

Page 20: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

GRAPH PARTITIONING→ CLUSTERING

Page 21: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Graph Laplaceans - Definition

ä “Laplace-type” matri-ces associated with gen-eral undirected graphs –

● ●

● ●

●●

● −→ L =

?

ä Given a graph G = (V,E) define

• A matrix W of weights wij for each edge with:wij ≥ 0, wii = 0, and wij = wji ∀(i, j)

• The diagonal matrix D = diag(di) with di =∑j 6=iwij

ä Corresponding graph Laplacean of G is:

L = D −W

Olomouc, 09-20-2019 p. 21

Page 22: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Gershgorin’s theorem→ L is positive semidefinite.

ä One eigenvalue equal to zero

ä Simplest case:

wij =

{1 if (i, j) ∈ E & i 6= j0 else di =

∑j 6=i

wij

Example: Consider the graph

●● ●

5

2

34

1

L =

1 −1 0 0 0−1 2 0 0 −10 0 1 0 −10 0 0 1 −10 −1 −1 −1 3

Olomouc, 09-20-2019 p. 22

Page 23: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Basic results on graph Laplaceans

Proposition:(i) L is symmetric semi-positive definite.(ii) L is singular with 1 as a null vector.(iii) If G is connected, then Null(L) = span{ 1}(iv) IfG has k > 1 connected componentsG1, G2, · · · , Gk,then the nullity of L is k and Null(L) is spanned by thevectors z(j), j = 1, · · · , k defined by:

(z(j))i =

{1 if i ∈ Gj

0 if not.

Olomouc, 09-20-2019 p. 23

Page 24: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

A few properties of graph Laplaceans

Define: oriented incidence matrix H : (1)First orient theedges i ∼ j into i → j or j → i. (2) Rows of Hindexed by vertices of G. Columns indexed by edges. (3)For each (i, j) in E, define the corresponding column in Has√w(i, j)(ei − ej).

Example: In previous ex-ample, orient i → j sothat j > i [lower triangularmatrix representation].Then matrix L is: −→

H =

1 0 0 0−1 1 0 00 0 1 00 0 0 10 −1 −1 −1

Property 1 L = HHT

Olomouc, 09-20-2019 p. 24

Page 25: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

A few properties of graph Laplaceans

x

xj

i

Strong relation between xTLx andlocal distances between entries of x

ä Let L = any graph Laplacean

Then:

Property 2: for any x ∈ Rn : [Recall L = D −W ]

x>Lx =∑j>i

wij|xi − xj|2

Olomouc, 09-20-2019 p. 25

Page 26: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Property 3: (generalization) for any Y ∈ Rd×n :

Tr [Y LY >] =∑j>i

wij‖yi − yj‖2

ä Note: yj = j-th colunm of Y . Usually d < n. Each columncan represent a data sample.

Property 4: For the particular L = I − 1n

1 1>

XLX> = X̄X̄> == n× Covariance matrix

Property 5: L is singular and admits the null vector1 =ones(n,1)

Olomouc, 09-20-2019 p. 26

Page 27: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Property 6: (Graph partitioning) Consider situation whenwij ∈{0, 1}. If x is a vector of signs (±1) then

x>Lx = 4× (‘number of edge cuts’)

... where edge-cut = pair (i, j) with xi 6= xj

ä Consequence: Can be usedto partition graphs....

+1

−1

ä ...by minimizing (Lx, x) subject to x ∈ {−1, 1}n and1Tx = 0 [balanced sets]

minx∈{−1,1}n; 1Tx=0

(Lx, x)

(x, x)

ä This problem is hard [combinatorial]→

Olomouc, 09-20-2019 p. 27

Page 28: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Instead we solve a relaxed form of this problem :

minx∈{−1,1}n; 1Tx=0

(Lx, x)

(x, x)→ min

x∈Rn; 1Tx=0

(Lx, x)

(x, x)

ä Define v = u2 then lab = sign(v −med(v))

Background:

ä Consider any symmetric (real) matrix A with eigenvaluesλ1 ≤ λ2 ≤ · · · ≤ λn and eigenvectors u1, · · · , un

ä Recall that:(Min reached for x = u1)

minx∈Rn

(Ax, x)

(x, x)= λ1

Page 29: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä In addition:(Min reached for x = u2)

minx⊥u1

(Ax, x)

(x, x)= λ2

ä For a graph Laplacean u1 = 1 = vector of all ones and

ä ...vector u2 is called the Fiedler vector. It solves the relaxedoptimization problem -

Olomouc, 09-20-2019 p. 29

Page 30: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Clustering

ä Problem: we are given n data items: x1, x2, · · · , xn.Would like to ‘cluster’ them, i.e., group them so that each groupor cluster contains items that are similar in some sense.

ä Example: materialsSuperhard

Photovoltaic

Superconductors

Catalytic

Ferromagnetic

Thermo−electricMulti−ferroics

ä Example: Digits

−6 −4 −2 0 2 4 6 8−5

−4

−3

−2

−1

0

1

2

3

4

5PCA − digits : 5 −− 7

567

ä Refer to each group as a ‘cluster’ or a ‘class’

ä Unsupervised learning

Olomouc, 09-20-2019 p. 30

Page 31: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Example: Block structure detection→ Community Detection

ä Communities modeled by an ‘affinity’ graph [e.g., ’user Asends frequent e-mails to user B’]ä Adjacency Graph represented by a sparse matrix

← OriginalmatrixGoal: Find

ordering soblocks areas dense aspossible→

ä Use ‘blocking’ techniques for sparse matricesä Advantage of this viewpoint: need not know # of clusters.[data: www-personal.umich.edu/~mejn/netdata/]

Olomouc, 09-20-2019 p. 31

Page 32: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Example of application Data set from :

http://www-personal.umich.edu/~mejn/netdata/

ä Network connecting bloggers of different political orienta-tions [2004 US presidentual election]

ä ‘Communities’: liberal vs. conservative

ä Graph: 1, 490 vertices (blogs): 758 liberal, 732 conserva-tive. Edge: i→ j : a citation between blogs i and j

ä Used blocking algorithm (Density theshold=0.4): subgraphs[note: density = |E|/|V |2.]

ä Details in [J. Chen & YS, IEEE TKDE (24), 2012]

Olomouc, 09-20-2019 p. 32

Page 33: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

A basic clustering method: K-means (Background)

ä A basic algorithm that uses Euclidean distance

1 Select p initial centers: c1, c2, ..., cp for classes1, 2, · · · , p2 For each xi do: determine class of xi as argmink‖xi−ck‖3 Redefine each ck to be the centroid of class k4 Repeat until convergence

●●

●●

●●

● ●

●●●

c

c

c

1

2

3

ä Simple algorithmä Works well (gives goodresults) but can be slowä Performance depends oninitialization

Olomouc, 09-20-2019 p. 33

Page 34: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Clustering methods based on similarity graphs

ä Perform clustering by exploiting a graph that describes thesimilarities between any two items in the data.

ä Need to:

1. decide what nodes are in the neighborhood of a given node

2. quantify their similarities - by assigning a weight to any pairof nodes.

Example: For text data: Can decide that any columns i andj with a cosine greater than 0.95 are ‘similar’ and assign thatcosine value to wij

Olomouc, 09-20-2019 p. 34

Page 35: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

First task: build a ‘similarity’ graph

Need: a similarity graph, i.e., a graph that captures the simi-larity between any two items

GraphData

w = ?ij

i

j

ä For each data item find a small number of its nearest neigh-bors

Olomouc, 09-20-2019 p. 35

Page 36: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Two techniques are often used:

ε-graph: Edges consist of pairs (xi, xj) such thatρ(xi, xj) ≤ ε

kNN graph: Nodes adjacent to xi are those nodes x` withthe k with smallest distances ρ(xi, x`).

ä ε-graph is undirected and is geometrically motivated. Is-sues: 1) may result in disconnected components 2) what ε?

ä kNN graphs are directed in general (can be trivially fixed).

ä kNN graphs especially useful in practice.

ä See [J. Chen, H-R Fang, YS, JMLR, 2009] for algorithms.

Olomouc, 09-20-2019 p. 36

Page 37: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Similarity graphs: Using ‘heat-kernels’

Define weight between i and j as:

wij = fij ×

e−‖xi−xj‖

2

σ2X if ‖xi − xj‖ < r

0 if not

ä Note ‖xi − xj‖ could be any measure of distance...

ä fij = optional = some measure of similarity - other thandistance

ä Only nearby points kept.

ä Sparsity depends on parameters

Olomouc, 09-20-2019 p. 37

Page 38: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Edge cuts, ratio cuts, normalized cuts, ...

ä Assume now that we have built a ‘similarity graph’

ä Setting is identical with that of graph partitioning.

ä Need a Graph Laplacean: L = D − W with wii =0, wij ≥ 0 and D = diag(W ∗ ones(n, 1)) [matlab]

ä Partition vertex set V in two sets A and B with

A ∪B = V, A ∩B = ∅

ä Define cut(A,B) =∑

u ∈A,v∈Bw(u, v)

Olomouc, 09-20-2019 p. 38

Page 39: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Naive approach: use this measure to partition graph, i.e.,

... Find A and B that minimize cut(A,B).

ä Issue: Small sets, isolated nodes, big imbalances,

●●

●●

●●●

●●

●●

●●

●●

Better cut

Min−cut 2

Min−cut 1

Olomouc, 09-20-2019 p. 39

Page 40: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Normalized cuts [Shi-Malik,2000]

ä Recall notation w(X,Y ) =∑x∈X,y∈Y w(x, y) - then

define:

ncut(A,B) = cut(A,B)w(A,V )

+ cut(A,B)w(B,V )

ä Goal is to avoid small sets A, B

ä Let x be an indicator vector:

xi =

{1 if i ∈ A0 if i ∈ B

ä Recall that: xTLx =∑

(i,j)∈E wij|xi − xj|2 (note: eachedge counted once)

Olomouc, 09-20-2019 p. 40

Page 41: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Let β =w(A, V )

w(B, V )=

xTD 1

( 1− x)TD 1y = x− β( 1− x)

ä Then we need tosolve:

minyi {0,−β}

yTLy

yTDySubject to yTD 1 = 0

ä + Relax→ need to solve Generalized eigenvalue problem

Ly = λDy

ä y1 = 1 is eigenvector associated with eigenvalue λ1 = 0

ä y2 associated with second eigenvalue solves problem.

Olomouc, 09-20-2019 p. 41

Page 42: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Spectral clustering: General approach

1 Given: Collection of data samples {x1, x2, · · · , xn}

2 Build a similarity graph betweenitems

i

j

w(i,j)=?

3 Compute (smallest) eigenvector (s) of resulting graphLaplacean4 Use k-means on eigenvector (s) of Laplacean

ä For normalized cuts solve generalized eigen problem.

ä Application: Image segmentation

Olomouc, 09-20-2019 p. 42

Page 43: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

DEMO

Page 44: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

AMG→ DATA COARSENING

Page 45: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Multilevel techniques in brief

ä Divide and conquer paradigms as well as multilevel methodsin the sense of ‘domain decomposition’

ä Main principle: very costly to do an SVD [or Lanczos] onthe whole set. Why not find a smaller set on which to do theanalysis – without too much loss?

ä Tools used: graph coarsening, divide and conquer –

ä For text data we use hypergraphs

Olomouc, 09-20-2019 p. 45

Page 46: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Multilevel Dimension Reduction

Main Idea: coarsen fora few levels. Use theresulting data set X̂ tofind a projector P fromRm to Rd. P can be usedto project original data ornew data

l

l l

ll

l.

.

.

.

. .

.

.

.

.

.

.

.

..

. .

.

.

.

l

l l

ll

l.

.

.

.

. .

.

.

.

.

.

.

.

..

. .

.

.

.

l

l ll

l

l.

.

.

.

. .

.

.

.

.

.

.

.

..

. .

.

.

.

dY

n

Y

n

d

r

r

Coarsen

Coarsen

Graphn

mX

Last Level Xm

nr

r

Project

ä Gain: Dimension reduction is done with a much smaller set.Hope: not much loss compared to using whole data

Olomouc, 09-20-2019 p. 46

Page 47: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Making it work: Use of Hypergraphs for sparse data

1 2 3 4 5 6 7 8 9* * * * a

* * * * bA = * * * * c

* * * d* * e

6 6

l

6 6

l

ll

6

66

66

l

1

2

3

4

5

67

89

a b

cd

e

net e

net d

Left: a (sparse) data set of n entries in Rm represented by amatrix A ∈ Rm×n

Right: corresponding hypergraph H = (V,E) with vertex setV representing to the columns of A.

Olomouc, 09-20-2019 p. 47

Page 48: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Hypergraph Coarsening uses column matching – similar toa common one used in graph partitioning

ä Compute the non-zero inner product 〈a(i), a(j)〉 betweentwo vertices i and j, i.e., the ith and jth columns of A.

ä Note: 〈a(i), a(j)〉 = ‖a(i)‖‖a(j)‖ cos θij.

Modif. 1: Parameter: 0 < ε < 1. Match two vertices, i.e.,columns, only if angle between the vertices satisfies:

tan θij ≤ ε

Modif. 2: Scale coarsened columns. If i and j matched andif ‖a(i)‖0 ≥ ‖a(j)‖0 replace a(i) and a(j) by

c(`) =

(√1 + cos2 θij

)a(i)

Olomouc, 09-20-2019 p. 48

Page 49: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Call C the coarsened matrix obtained from A using theapproach just described

Lemma: Let C ∈ Rm×c be the coarsened matrix of Aobtained by one level of coarsening of A ∈ Rm×n, withcolumns a(i) and a(j) matched if tan θi ≤ ε. Then

|xTAATx− xTCCTx| ≤ 3ε‖A‖2F ,

for any x ∈ Rm with ‖x‖2 = 1.

ä Very simple bound for Rayleigh quotients for any x.

ä Implies some bounds on singular values and norms - skipped.

Olomouc, 09-20-2019 p. 49

Page 50: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Tests: Comparing singular values

20 25 30 35 40 45 5025

30

35

40

45

50

55

Exactpower−it1coarsencoars+ZSrand

Singular values 20 to 50 and approximations

20 25 30 35 40 45 5020

22

24

26

28

30

32

34

36

38

Exactpower−it1coarsencoars+ZSrand

Singular values 20 to 50 and approximations

20 25 30 35 40 45 5025

30

35

40

45

50

Exactpower−it1coarsencoars+ZSrand

Singular values 20 to 50 and approximations

Results for the datasets CRANFIELD (left), MEDLINE (middle),and TIME (right).

ä See [S. Ubaru, YS. NLAA, 2018]

Olomouc, 09-20-2019 p. 50

Page 51: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

LANCZOS FOR EIGENVALUES→ LANCZOS FOR DIM. REDUCTION

Page 52: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

IR: Use of the Lanczos algorithm (J. Chen, YS ’09)

ä Lanczos algorithm = Projection method on Krylov subspaceSpan{v,Av, · · · , Am−1v}

ä Can get singular vectors with Lanczos, & use them in LSI

ä Better: Use the Lanczos vectors directly for the projection

ä K. Blom and A. Ruhe [SIMAX, vol. 26, 2005] perform aLanczos run for each query [expensive].

ä Proposed: One Lanczos run- random initial vector. Thenuse Lanczos vectors in place of singular vectors.

ä In short: Results comparable to those of SVD at a muchlower cost.

Olomouc, 09-20-2019 p. 52

Page 53: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Tests: IR

Informationretrievaldatasets

# Terms # Docs # queries sparsityMED 7,014 1,033 30 0.735CRAN 3,763 1,398 225 1.412

Pre

proc

essi

ngtim

es

Med dataset.

0 50 100 150 200 250 3000

10

20

30

40

50

60

70

80

90Med

iterations

prep

roce

ssin

g tim

e

lanczostsvd

Cran dataset.

0 50 100 150 200 250 3000

10

20

30

40

50

60Cran

iterations

prep

roce

ssin

g tim

e

lanczostsvd

Olomouc, 09-20-2019 p. 53

Page 54: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Average query times

Med dataset

0 50 100 150 200 250 3000

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5x 10

−3 Med

iterations

aver

age

quer

y tim

e

lanczostsvdlanczos−rftsvd−rf

Cran dataset.

0 50 100 150 200 250 3000

0.5

1

1.5

2

2.5

3

3.5

4x 10

−3 Cran

iterations

aver

age

quer

y tim

e

lanczostsvdlanczos−rftsvd−rf

Olomouc, 09-20-2019 p. 54

Page 55: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Average retrieval precision

Med dataset

0 50 100 150 200 250 3000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9Med

iterations

aver

age

prec

isio

n

lanczostsvdlanczos−rftsvd−rf

Cran dataset

0 100 200 300 400 500 6000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9Cran

iterations

aver

age

prec

isio

n

lanczostsvdlanczos−rftsvd−rf

Retrieval precision comparisons

Olomouc, 09-20-2019 p. 55

Page 56: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

A few words on Deep Neural Networks (DNNs)

ä Ideas of neural networks goes back to the 1960s - werepopularized in early 1990s – then laid dormant until recently.

ä Two reasons for the come-back:

• DNN are remarkably effective in some applications

• big progress made in hardware [→ affordable ‘training cost’]

Olomouc, 09-20-2019 p. 56

Page 57: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

ä Training a neural network can be viewed as a problemof approximating a function φ which is defined via sets ofparameters:

φ( )x

1st set o

f parameters

2nd set of p

arameters

4th set of p

arameters

3rd set o

f parameters

y

1

2

y

x

x

x1

2

n

.

.

my

.

.

.

Problem: find sets of parameters such that φ(x) ≈ y

Olomouc, 09-20-2019 p. 57

Page 58: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Input: x, Output: ySet: z0 = xFor l = 1 : L+1 Do:

zl = σ(W Tl zl−1 + bl)

EndSet: y = φ(x) := zL+1

• layer # 0 = input layer• layer # (L+1) = output layer

Layer

Input

Layer

OutputHidden

Layer

ä A matrix Wl is associated with layers 1,2, L+ 1.

ä Problem: Find φ (i.e., matrices Wl) s.t. φ(x) ≈ y

Olomouc, 09-20-2019 p. 58

Page 59: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

DNN (continued)

ä Problem is not convex, highly parameterized, ...,

ä .. Main method used: Stochastic gradient descent [basic]

ä It all looks like alchemy... but it works well for certain appli-cations

ä Training is still quite expensive – GPUs can help

ä *Very* active area of research

Olomouc, 09-20-2019 p. 59

Page 60: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

Conclusion

ä *Many* interesting new matrix problems in areas that involvethe effective mining of data

ä Among the most pressing issues is that of reducing compu-tational cost - [SVD, SDP, ..., too costly]

ä Many online resources available

ä Huge potential in areas like materials science though inertiahas to be overcome

ä To a researcher in computational linear algebra : Tsunami ofchange on types or problems, algorithms, frameworks, culture,..

ä But change should be welcome :

In the words of “Who Moved My Cheese?” [ Spencer Johnson,2002]

Page 61: Yousef Saad Department of Computer Science and Engineeringsaad/PDF/OlomoucTut.pdfFacebook, Netflix, Amazon, Twitter, Ali-Baba, Tencent, ..., and even the big department stores (Walmart,

“If you do not change, you can become extinct !”

ä In the words of Einstein:

“Life is like riding a bicycle: To keep your balance you need tokeep moving”

Thank you !

ä Visit my web-site at www.cs.umn.edu/~saad

Olomouc, 09-20-2019 p. 61