-
U Kang 1
Large Scale Data Analysis Using Deep Learning
Linear Algebra
U Kang, Seoul National University
-
U Kang 2
In This Lecture
Overview of linear algebra (but not a comprehensive survey)
Focused on the subset most relevant to deep learning
For details on linear algebra, refer to Linear Algebra and Its Applications by Gilbert Strang
-
U Kang 3
Scalar
A scalar is a single number
Integers, real numbers, rational numbers, etc.
We denote scalars with an italic font: a, n, x
-
U Kang 4
Vectors
A vector is a 1-D array of numbers
Can be real, binary, integer, etc.
Notation for type and size: e.g., $x \in \mathbb{R}^n$ for a real vector with n entries
-
U Kang 5
Matrices
A matrix is a 2-D array of numbers
Example notation for type and shape: $A \in \mathbb{R}^{m \times n}$ for a real matrix with m rows and n columns
-
U Kang 6
Tensors
A tensor is an array of numbers that may have:
Zero dimensions, and be a scalar
One dimension, and be a vector
Two dimensions, and be a matrix
Three dimensions or more, e.g., a matrix over time, or a knowledge base of (subject, verb, object) triples, ...
-
U Kang 7
Matrix Transpose
$(A^T)_{i,j} = A_{j,i}$
$(AB)^T = B^T A^T$
-
U Kang 8
Matrix Product
C = AB, where $C_{i,j} = \sum_k A_{i,k} B_{k,j}$
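A minimal NumPy sketch (added here, not part of the original slides) checking the element-wise definition of the matrix product and the transpose rule from the previous slide:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
B = np.array([[1., 0., 2.], [0., 1., 1.]])     # 2 x 3

C = A @ B                                      # C = AB, shape 3 x 3
# element-wise definition: C[i, j] = sum_k A[i, k] * B[k, j]
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
                      for j in range(B.shape[1])]
                     for i in range(A.shape[0])])
assert np.allclose(C, C_manual)

# transpose rule: (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)
```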
-
U Kang 9
Matrix Product
Matrix product as a sum of outer products: $AB = \sum_k A_{:,k} B_{k,:}$
Product with a diagonal matrix: $\mathrm{diag}(v)\,x = v \odot x$, i.e., element i of x is scaled by $v_i$
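An illustrative NumPy check (added, not from the slides) of both points: the product as a sum of outer products, and multiplication by a diagonal matrix as element-wise scaling:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)

# AB as a sum of outer products of the columns of A with the rows of B
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(A @ B, outer_sum)

# multiplying by a diagonal matrix scales element i of x by v_i
v = np.array([1., 2., 3.])
x = np.array([4., 5., 6.])
assert np.allclose(np.diag(v) @ x, v * x)
```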
-
U Kang 10
Identity Matrix
Example identity matrix: $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
-
U Kang 11
Systems of Equations
$Ax = b$ expands to $A_{1,:}\,x = b_1,\ \ldots,\ A_{m,:}\,x = b_m$ (one linear equation per row of A)
-
U Kang 12
Solving Systems of Equations
A linear system of equations can have:
No solution: 3x + 2y = 6, 3x + 2y = 12
Many solutions: 3x + 2y = 6, 6x + 4y = 12
Exactly one solution: this means multiplication by the matrix is an invertible function: $Ax = b \Rightarrow x = A^{-1}b$
-
U Kang 13
Matrix Inversion
Matrix inverse: an inverse $A^{-1}$ of an n x n matrix A satisfies
$A A^{-1} = A^{-1} A = I_n$
Solving a system using an inverse
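A small NumPy sketch (added, not from the slides) solving a system both with an explicit inverse and with np.linalg.solve; the latter is preferred in practice because it avoids forming $A^{-1}$:

```python
import numpy as np

A = np.array([[3., 1.], [1., 2.]])
b = np.array([9., 8.])

x_inv = np.linalg.inv(A) @ b      # explicit inverse: x = A^{-1} b
x_solve = np.linalg.solve(A, b)   # preferred: solve without forming A^{-1}
assert np.allclose(x_inv, x_solve)
assert np.allclose(A @ x_solve, b)
```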
-
U Kang 14
Invertibility
Matrix A cannot be inverted if:
More rows than columns
More columns than rows
Redundant rows/columns ("linearly dependent", "low rank" rather than full rank)
The number 0 is an eigenvalue of A
A non-invertible matrix is called a singular matrix; an invertible matrix is called non-singular
-
U Kang 15
Linear Dependence and Span
Linear combination of vectors $\{v^{(1)}, \ldots, v^{(n)}\}$: $\sum_i c_i v^{(i)}$
The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors
The matrix-vector product Ax can be viewed as a linear combination of the column vectors of A: $Ax = \sum_i x_i A_{:,i}$ (see the sketch below)
The span of the columns of A is called the column space or range of A
A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors; otherwise, the set is linearly dependent
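A short NumPy illustration (added, not from the slides) of Ax as a linear combination of the columns of A:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # columns A[:, 0], A[:, 1]
x = np.array([10., -1.])

# Ax equals the combination of A's columns with coefficients x_i
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(A @ x, combo)
```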
-
U Kang 16
Rank of a Matrix
Rank of a matrix A: the number of linearly independent columns (or rows) of A
For example: what is the rank of a given matrix A?
Why?
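For a concrete illustration (added, not from the slides), np.linalg.matrix_rank reports the number of linearly independent rows/columns:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])          # second row = 2 * first row
B = np.array([[1., 2.],
              [0., 1.]])

print(np.linalg.matrix_rank(A))   # 1: rows are linearly dependent
print(np.linalg.matrix_rank(B))   # 2: full rank, so B is invertible
```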
-
U Kang 17
Norms
Functions that measure how “large” a vector is
Similar to a distance between zero and the point represented by the vector
Formally, a norm is any function f that satisfies the following: $f(x) = 0 \Rightarrow x = 0$; the triangle inequality $f(x + y) \le f(x) + f(y)$; and $f(\alpha x) = |\alpha|\, f(x)$ for all $\alpha \in \mathbb{R}$
-
U Kang 18
Norms
$L^p$ norm: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
Most popular norm: the $L^2$ norm (p = 2), the Euclidean distance; for a matrix/tensor, the Frobenius norm ($L_F$) does the same thing
$L^1$ norm (p = 1): called the 'Manhattan' distance
Max norm (as p goes to infinity): $\|x\|_\infty = \max_i |x_i|$
Dot product in terms of norms: $x^T y = \|x\|_2 \|y\|_2 \cos\theta$, where $\theta$ is the angle between x and y
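A quick NumPy sketch (added, not from the slides) computing the norms listed above:

```python
import numpy as np

x = np.array([3., -4.])
print(np.linalg.norm(x, 2))        # L2 (Euclidean) norm: 5.0
print(np.linalg.norm(x, 1))        # L1 (Manhattan) norm: 7.0
print(np.linalg.norm(x, np.inf))   # max norm: 4.0

A = np.array([[1., 2.], [3., 4.]])
print(np.linalg.norm(A, 'fro'))    # Frobenius norm of a matrix
```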
-
U Kang 19
Special Matrices and Vectors
Diagonal matrix
2 by 2 example: $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$
We use diag(v) to denote the diagonal matrix where v is the vector containing the diagonal elements
Inverse of a diagonal matrix is computed easily: $\mathrm{diag}(v)^{-1} = \mathrm{diag}([1/v_1, \ldots, 1/v_n]^T)$, which exists only if every $v_i$ is non-zero
-
U Kang 20
Special Matrices and Vectors
Unit vector: a vector with unit norm, $\|x\|_2 = 1$
Symmetric matrix: $A = A^T$
Orthogonal matrix: a square matrix whose rows and columns are mutually orthonormal, so $A^T A = A A^T = I$ and $A^{-1} = A^T$
-
U Kang 21
Eigenvector and Eigenvalue
Eigenvector and eigenvalue of A: a non-zero vector v and a scalar λ such that $Av = \lambda v$
Eigenvalue/eigenvector pairs are defined only for a square matrix $A \in \mathbb{R}^{n \times n}$
Number of eigenvalues: n (there may be duplicates)
Eigenvalues and eigenvectors can be real or complex
-
U Kang 22
Intuition
A as a vector transformation: e.g., $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$ maps $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to $x' = Ax = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$
-
U Kang 23
Intuition
By definition, eigenvectors remain parallel to themselves ('fixed points'): for $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$, $A v_1 = \lambda_1 v_1$ with $v_1 \approx \begin{bmatrix} 0.52 \\ 0.85 \end{bmatrix}$ and $\lambda_1 \approx 3.62$
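The same 2 x 2 example can be checked numerically; the snippet below (added, not from the slides) uses np.linalg.eig:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
vals, vecs = np.linalg.eig(A)      # columns of vecs are eigenvectors

# each eigenvector is only scaled by its eigenvalue, never rotated
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)

print(vals)                        # approximately 1.38 and 3.62 (order not guaranteed)
print(vecs[:, np.argmax(vals)])    # approximately +/- [0.52, 0.85]
```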
-
U Kang 24
Eigendecomposition
Eigendecomposition of a matrix: let A be a square (n x n) matrix with n linearly independent eigenvectors (i.e., A is diagonalizable)
Then A can be factorized as $A = V \mathrm{diag}(\lambda) V^{-1}$
where V is an (n x n) matrix whose i-th column is the i-th eigenvector of A, and $\lambda$ is the vector of the corresponding eigenvalues
-
U Kang 25
Eigendecomposition
Every real symmetric matrix has a real, orthogonal eigendecomposition $A = Q \Lambda Q^T$, where Λ is diagonal
A real orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (orthonormal vectors): $Q^T Q = Q Q^T = I$, so $Q^T = Q^{-1}$
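An illustrative check (added, not from the slides) that a real symmetric matrix is reconstructed exactly from its orthogonal eigendecomposition, using np.linalg.eigh:

```python
import numpy as np

# a real symmetric matrix
A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

lam, Q = np.linalg.eigh(A)          # eigh is specialized for symmetric matrices
assert np.allclose(Q.T @ Q, np.eye(3))            # Q is orthogonal: Q^T Q = I
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)     # A = Q diag(lambda) Q^T
```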
-
U Kang 26
Eigendecomposition
Interpreting matrix-vector multiplication Ax using eigendecomposition
-
U Kang 27
Eigendecomposition
Understanding the optimization of $f(x) = x^T A x$ subject to $\|x\|_2 = 1$, using eigenvalues and eigenvectors: whenever x is a unit eigenvector of A, $f(x)$ equals the corresponding eigenvalue; for symmetric A, the maximum of f over the unit sphere is the largest eigenvalue and the minimum is the smallest
-
U Kang 28
Eigendecomposition
Positive definite matrix: a real symmetric matrix whose eigenvalues are all positive
Positive semidefinite matrix: all eigenvalues are non-negative
Negative definite matrix: all eigenvalues are negative
Negative semidefinite matrix: all eigenvalues are non-positive
For a positive semidefinite matrix A, $\forall x,\ x^T A x \ge 0$; for a positive definite matrix A, $\forall x \ne 0,\ x^T A x > 0$
-
U Kang 29
Singular Value Decomposition (SVD)
Similar to eigendecomposition
More general; matrix need not be square
$A = U \Lambda V^T$
-
U Kang 30
SVD - Definition
A [n x m] = U [n x r] Λ [r x r] (V [m x r])^T
A: n x m matrix (e.g., n documents, m terms)
U: n x r matrix (n documents, r concepts); its columns are the left singular vectors
Λ: r x r diagonal matrix (strength of each 'concept'); r is the rank of the matrix
V: m x r matrix (m terms, r concepts); its columns are the right singular vectors
-
U Kang 31
SVD - Definition
[Figure: A (n x m) = U (n x r) × Λ (r x r) × V^T (r x m)]
-
U Kang 32
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where
U, Λ, V: unique (*)
U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other): $U^T U = I$, $V^T V = I$ (I: identity matrix)
Λ: diagonal; the singular values are positive and sorted in decreasing order
-
U Kang 33
SVD - Properties
$A \approx U \Lambda V^T = \sum_i \sigma_i u_i v_i^T$: keeping only the top-k singular values and vectors gives the best rank-k approximation of A in the Frobenius norm ($L_F$)
[Figure: A (n x m) ≈ U × Λ × V^T]
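A minimal NumPy sketch (added, not from the slides) of the rank-k approximation: truncating the SVD and checking that the Frobenius error equals the norm of the discarded singular values:

```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) Vt

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # keep the top-k singular triplets

# A_k is the best rank-k approximation of A in the Frobenius norm;
# the error equals the norm of the discarded singular values
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.linalg.norm(s[k:]))
```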
-
U Kang 34
SVD and eigendecomposition
The left-singular vectors of A are the eigenvectors of $AA^T$
The right-singular vectors of A are the eigenvectors of $A^T A$
The non-zero singular values of A are the square roots of the eigenvalues of $A^T A$
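A short numerical check (added, not from the slides) of the relation between the singular values of A and the eigenvalues of $A^T A$:

```python
import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# the eigenvalues of A^T A are the squared singular values of A
eig_vals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # sort in decreasing order
assert np.allclose(eig_vals, s**2)
```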
-
U Kang 35
Moore-Penrose Pseudoinverse
Assume we want to solve Ax = y for x. What if A is not invertible? What if A is not square?
Still, we can find the ‘best’ x by using pseudoinverse
For an (n x m) matrix A, the pseudoinverse A+ is an (m x n) matrix
-
U Kang 36
Moore-Penrose Pseudoinverse
If the equation has:
Exactly one solution: the pseudoinverse gives the same answer as the inverse
No solution: it gives the solution with the smallest error $\|Ax - y\|_2$ (over-specified case)
Many solutions: it gives the solution with the smallest norm of x (under-specified case)
-
U Kang 37
Pseudoinverse: Over-specified case
No solution: the pseudoinverse gives the solution with the smallest error $\|Ax - y\|_2$
$[3\ 2]^T [x] = [1\ 2]^T$ (i.e., 3x = 1, 2x = 2)
$([3\ 2]^T)^+ [1\ 2]^T = \ldots = 7/13$; this method is called 'least squares'
[Figure: the reachable points (3x, 2x) form a line; the desired point y = (1, 2) lies off that line]
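The same numbers can be reproduced with NumPy (snippet added, not from the slides); np.linalg.pinv and np.linalg.lstsq both return the least-squares solution 7/13:

```python
import numpy as np

A = np.array([[3.], [2.]])        # 2 equations, 1 unknown: 3x = 1, 2x = 2
y = np.array([1., 2.])

x = np.linalg.pinv(A) @ y         # least-squares solution
print(x)                          # [0.538...] = 7/13

# same answer from the dedicated least-squares routine
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ls)
```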
-
U Kang 38
Pseudoinverse: Under-specified case
Many solutions: this gives us the solution with the smallest norm of x
$[1\ 2]\,[w\ z]^T = 4$ (i.e., w + 2z = 4)
$[1\ 2]^+ \cdot 4 = [1/5\ 2/5]^T \cdot 4 = [4/5\ 8/5]^T$
[Figure: all possible solutions form a line in the (w, z) plane; x0 marks the shortest-length solution]
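And the under-specified example in NumPy (snippet added, not from the slides); the pseudoinverse picks the minimum-norm solution [4/5, 8/5]:

```python
import numpy as np

A = np.array([[1., 2.]])          # one equation, two unknowns: w + 2z = 4
y = np.array([4.])

x = np.linalg.pinv(A) @ y
print(x)                          # [0.8, 1.6] = [4/5, 8/5], the minimum-norm solution
assert np.allclose(A @ x, y)
```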
-
U Kang 39
Computing the Pseudoinverse
The SVD allows the computation of the pseudoinverse: $A^+ = V \Lambda^+ U^T$, where $\Lambda^+$ is obtained by taking the reciprocal of each non-zero singular value of Λ and transposing the result
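A sketch of this computation (added, not from the slides), assuming all singular values are non-zero so that their reciprocals exist:

```python
import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A^+ = V diag(1/sigma) U^T (reciprocals of the non-zero singular values)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```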
-
U Kang 40
Trace
$\mathrm{Tr}(A) = \sum_i A_{i,i}$
$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$ (the trace is invariant to cyclic permutations)
$\mathrm{Tr}(A) = \mathrm{Tr}(A^T)$
$\|A\|_F = \sqrt{\mathrm{Tr}(AA^T)}$
For a scalar a, $\mathrm{Tr}(a) = a$
The trace of a matrix also equals the sum of its eigenvalues
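A quick NumPy verification (added, not from the slides) of the trace identities above:

```python
import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
C = np.random.rand(3, 3)

# cyclic property of the trace
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# Frobenius norm via the trace
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T)))

# trace equals the sum of the eigenvalues
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)
```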
-
U Kang 41
What you need to know
Linear algebra is crucial for understanding notations and mechanisms of many ML/deep learning methods
Important concepts: matrix product, identity, invertibility, norm, eigendecomposition, singular value decomposition, pseudoinverse, and trace
-
U Kang 42
Questions?