-
U Kang 1
Large Scale Data Analysis Using Deep Learning
Linear Algebra
U Kang, Seoul National University
-
U Kang 2
In This Lecture
Overview of linear algebra (but not a comprehensive survey)
Focused on the subset most relevant to deep learning
For details on linear algebra, refer to Linear Algebra and Its Applications by Gilbert Strang
-
U Kang 3
Scalar
A scalar is a single number
Integers, real numbers, rational numbers, etc.
We denote scalars with an italic font: a, n, x
-
U Kang 4
Vectors
A vector is a 1-D array of numbers
Can be real, binary, integer, etc.
Notation for type and size: e.g., $x \in \mathbb{R}^n$ for a real vector with n entries
-
U Kang 5
Matrices
A matrix is a 2-D array of numbers
Example notation for type and shape: $A \in \mathbb{R}^{m \times n}$ for a real matrix with m rows and n columns
-
U Kang 6
Tensors
A tensor is an array of numbers that may have:
Zero dimensions, and be a scalar
One dimension, and be a vector
Two dimensions, and be a matrix
Three dimensions or more, e.g., a matrix over time, or a knowledge base of (subject, verb, object) triples, ...
-
U Kang 7
Matrix Transpose
$(A^T)_{i,j} = A_{j,i}$
$(AB)^T = B^T A^T$
-
U Kang 8
Matrix Product
C = AB, where $C_{i,j} = \sum_k A_{i,k} B_{k,j}$
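A minimal NumPy sketch (added here, not part of the original slides) checking the element-wise definition of the matrix product and the transpose rule from the previous slide:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
B = np.array([[1., 0., 2.], [0., 1., 1.]])     # 2 x 3

C = A @ B                                      # C = AB, shape 3 x 3
# element-wise definition: C[i, j] = sum_k A[i, k] * B[k, j]
C_manual = np.array([[sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
                      for j in range(B.shape[1])]
                     for i in range(A.shape[0])])
assert np.allclose(C, C_manual)

# transpose rule: (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)
```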
-
U Kang 9
Matrix Product
Matrix product as a sum of outer products: $AB = \sum_k A_{:,k} B_{k,:}$
Product with a diagonal matrix: $\mathrm{diag}(v)\,x = v \odot x$, i.e., element i of x is scaled by $v_i$
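An illustrative NumPy check (added, not from the slides) of both points: the product as a sum of outer products, and multiplication by a diagonal matrix as element-wise scaling:

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)

# AB as a sum of outer products of the columns of A with the rows of B
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(A @ B, outer_sum)

# multiplying by a diagonal matrix scales element i of x by v_i
v = np.array([1., 2., 3.])
x = np.array([4., 5., 6.])
assert np.allclose(np.diag(v) @ x, v * x)
```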
-
U Kang 10
Identity Matrix
Example identity matrix: $I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
-
U Kang 11
Systems of Equations
$Ax = b$ expands to $A_{1,:}\,x = b_1,\ \ldots,\ A_{m,:}\,x = b_m$ (one linear equation per row of A)
-
U Kang 12
Solving Systems of Equations
A linear system of equations can have:
No solution: 3x + 2y = 6, 3x + 2y = 12
Many solutions: 3x + 2y = 6, 6x + 4y = 12
Exactly one solution: this means multiplication by the matrix is an invertible function: $Ax = b \Rightarrow x = A^{-1}b$
-
U Kang 13
Matrix Inversion
Matrix inverse: an inverse $A^{-1}$ of an n x n matrix A satisfies
$A A^{-1} = A^{-1} A = I_n$
Solving a system using an inverse
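A small NumPy sketch (added, not from the slides) solving a system both with an explicit inverse and with np.linalg.solve; the latter is preferred in practice because it avoids forming $A^{-1}$:

```python
import numpy as np

A = np.array([[3., 1.], [1., 2.]])
b = np.array([9., 8.])

x_inv = np.linalg.inv(A) @ b      # explicit inverse: x = A^{-1} b
x_solve = np.linalg.solve(A, b)   # preferred: solve without forming A^{-1}
assert np.allclose(x_inv, x_solve)
assert np.allclose(A @ x_solve, b)
```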
-
U Kang 14
Invertibility
Matrix A cannot be inverted if:
More rows than columns
More columns than rows
Redundant rows/columns ("linearly dependent", "low rank" rather than full rank)
The number 0 is an eigenvalue of A
A non-invertible matrix is called a singular matrix; an invertible matrix is called non-singular
-
U Kang 15
Linear Dependence and Span
Linear combination of vectors $\{v^{(1)}, \ldots, v^{(n)}\}$: $\sum_i c_i v^{(i)}$
The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors
The matrix-vector product Ax can be viewed as a linear combination of the column vectors of A: $Ax = \sum_i x_i A_{:,i}$ (see the sketch below)
The span of the columns of A is called the column space or range of A
A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors; otherwise, the set is linearly dependent
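A short NumPy illustration (added, not from the slides) of Ax as a linear combination of the columns of A:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # columns A[:, 0], A[:, 1]
x = np.array([10., -1.])

# Ax equals the combination of A's columns with coefficients x_i
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(A @ x, combo)
```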
-
U Kang 16
Rank of a Matrix
Rank of a matrix A: the number of linearly independent columns (or rows) of A
For example: what is the rank of a given matrix A?
Why?
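For a concrete illustration (added, not from the slides), np.linalg.matrix_rank reports the number of linearly independent rows/columns:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])          # second row = 2 * first row
B = np.array([[1., 2.],
              [0., 1.]])

print(np.linalg.matrix_rank(A))   # 1: rows are linearly dependent
print(np.linalg.matrix_rank(B))   # 2: full rank, so B is invertible
```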
-
U Kang 17
Norms
Functions that measure how “large” a vector is
Similar to a distance between zero and the point represented by the vector
Formally, a norm is any function f that satisfies the following: $f(x) = 0 \Rightarrow x = 0$; the triangle inequality $f(x + y) \le f(x) + f(y)$; and $f(\alpha x) = |\alpha|\, f(x)$ for all $\alpha \in \mathbb{R}$
-
U Kang 18
Norms
$L^p$ norm: $\|x\|_p = \left(\sum_i |x_i|^p\right)^{1/p}$
Most popular norm: the $L^2$ norm (p = 2), the Euclidean distance; for a matrix/tensor, the Frobenius norm ($L_F$) does the same thing
$L^1$ norm (p = 1): called the 'Manhattan' distance
Max norm (as p goes to infinity): $\|x\|_\infty = \max_i |x_i|$
Dot product in terms of norms: $x^T y = \|x\|_2 \|y\|_2 \cos\theta$, where $\theta$ is the angle between x and y
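A quick NumPy sketch (added, not from the slides) computing the norms listed above:

```python
import numpy as np

x = np.array([3., -4.])
print(np.linalg.norm(x, 2))        # L2 (Euclidean) norm: 5.0
print(np.linalg.norm(x, 1))        # L1 (Manhattan) norm: 7.0
print(np.linalg.norm(x, np.inf))   # max norm: 4.0

A = np.array([[1., 2.], [3., 4.]])
print(np.linalg.norm(A, 'fro'))    # Frobenius norm of a matrix
```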
-
U Kang 19
Special Matrices and Vectors
Diagonal matrix
2 by 2 example: $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$
We use diag(v) to denote the diagonal matrix where v is the vector containing the diagonal elements
Inverse of a diagonal matrix is computed easily: $\mathrm{diag}(v)^{-1} = \mathrm{diag}([1/v_1, \ldots, 1/v_n]^T)$, which exists only if every $v_i$ is non-zero
-
U Kang 20
Special Matrices and Vectors
Unit vector: a vector with unit norm, $\|x\|_2 = 1$
Symmetric matrix: $A = A^T$
Orthogonal matrix: a square matrix whose rows and columns are mutually orthonormal, so $A^T A = A A^T = I$ and $A^{-1} = A^T$
-
U Kang 21
Eigenvector and Eigenvalue
Eigenvector and eigenvalue of A: a non-zero vector v and a scalar λ such that $Av = \lambda v$
Eigenvalue/eigenvector pairs are defined only for a square matrix $A \in \mathbb{R}^{n \times n}$
Number of eigenvalues: n (there may be duplicates)
Eigenvalues and eigenvectors can be real or complex
-
U Kang 22
Intuition
A as a vector transformation: e.g., $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$ maps $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ to $x' = Ax = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$
-
U Kang 23
Intuition
By definition, eigenvectors remain parallel to themselves ('fixed points'): for $A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$, $A v_1 = \lambda_1 v_1$ with $v_1 \approx \begin{bmatrix} 0.52 \\ 0.85 \end{bmatrix}$ and $\lambda_1 \approx 3.62$
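The same 2 x 2 example can be checked numerically; the snippet below (added, not from the slides) uses np.linalg.eig:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
vals, vecs = np.linalg.eig(A)      # columns of vecs are eigenvectors

# each eigenvector is only scaled by its eigenvalue, never rotated
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A @ v, lam * v)

print(vals)                        # approximately 1.38 and 3.62 (order not guaranteed)
print(vecs[:, np.argmax(vals)])    # approximately +/- [0.52, 0.85]
```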
-
U Kang 24
Eigendecomposition
Eigendecomposition of a matrix: let A be a square (n x n) matrix with n linearly independent eigenvectors (i.e., A is diagonalizable)
Then A can be factorized as $A = V \mathrm{diag}(\lambda) V^{-1}$
where V is an (n x n) matrix whose i-th column is the i-th eigenvector of A, and $\lambda$ is the vector of the corresponding eigenvalues
-
U Kang 25
Eigendecomposition
Every real symmetric matrix has a real, orthogonal eigendecomposition $A = Q \Lambda Q^T$, where Λ is diagonal
A real orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (orthonormal vectors): $Q^T Q = Q Q^T = I$, so $Q^T = Q^{-1}$
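An illustrative check (added, not from the slides) that a real symmetric matrix is reconstructed exactly from its orthogonal eigendecomposition, using np.linalg.eigh:

```python
import numpy as np

# a real symmetric matrix
A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

lam, Q = np.linalg.eigh(A)          # eigh is specialized for symmetric matrices
assert np.allclose(Q.T @ Q, np.eye(3))            # Q is orthogonal: Q^T Q = I
assert np.allclose(A, Q @ np.diag(lam) @ Q.T)     # A = Q diag(lambda) Q^T
```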
-
U Kang 26
Eigendecomposition
Interpreting matrix-vector multiplication Ax using eigendecomposition
-
U Kang 27
Eigendecomposition
Understanding the optimization of $f(x) = x^T A x$ subject to $\|x\|_2 = 1$, using eigenvalues and eigenvectors: whenever x is a unit eigenvector of A, $f(x)$ equals the corresponding eigenvalue; for symmetric A, the maximum of f over the unit sphere is the largest eigenvalue and the minimum is the smallest
-
U Kang 28
Eigendecomposition
Positive definite matrix: a real symmetric matrix whose eigenvalues are all positive
Positive semidefinite matrix: all eigenvalues are non-negative
Negative definite matrix: all eigenvalues are negative
Negative semidefinite matrix: all eigenvalues are non-positive
For a positive semidefinite matrix A, $\forall x,\ x^T A x \ge 0$; for a positive definite matrix A, $\forall x \ne 0,\ x^T A x > 0$
-
U Kang 29
Singular Value Decomposition (SVD)
Similar to eigendecomposition
More general; matrix need not be square
$A = U \Lambda V^T$
-
U Kang 30
SVD - Definition
A [n x m] = U [n x r] Λ [r x r] (V [m x r])^T
A: n x m matrix (e.g., n documents, m terms)
U: n x r matrix (n documents, r concepts); its columns are the left singular vectors
Λ: r x r diagonal matrix (strength of each 'concept'); r is the rank of the matrix
V: m x r matrix (m terms, r concepts); its columns are the right singular vectors
-
U Kang 31
SVD - Definition
[Figure: A (n x m) = U (n x r) × Λ (r x r) × V^T (r x m)]
-
U Kang 32
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where
U, Λ, V: unique (*)
U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other): $U^T U = I$, $V^T V = I$ (I: identity matrix)
Λ: diagonal; the singular values are positive and sorted in decreasing order
-
U Kang 33
SVD - Properties
$A \approx U \Lambda V^T = \sum_i \sigma_i u_i v_i^T$: keeping only the top-k singular values and vectors gives the best rank-k approximation of A in the Frobenius norm ($L_F$)
[Figure: A (n x m) ≈ U × Λ × V^T]
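A minimal NumPy sketch (added, not from the slides) of the rank-k approximation: truncating the SVD and checking that the Frobenius error equals the norm of the discarded singular values:

```python
import numpy as np

A = np.random.rand(6, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) Vt

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # keep the top-k singular triplets

# A_k is the best rank-k approximation of A in the Frobenius norm;
# the error equals the norm of the discarded singular values
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.linalg.norm(s[k:]))
```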
-
U Kang 34
SVD and eigendecomposition
The left-singular vectors of A are the eigenvectors of $AA^T$
The right-singular vectors of A are the eigenvectors of $A^T A$
The non-zero singular values of A are the square roots of the eigenvalues of $A^T A$
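A short numerical check (added, not from the slides) of the relation between the singular values of A and the eigenvalues of $A^T A$:

```python
import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# the eigenvalues of A^T A are the squared singular values of A
eig_vals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # sort in decreasing order
assert np.allclose(eig_vals, s**2)
```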
-
U Kang 35
Moore-Penrose Pseudoinverse
Assume we want to solve Ax = y for x. What if A is not invertible? What if A is not square?
Still, we can find the ‘best’ x by using pseudoinverse
For an (n x m) matrix A, the pseudoinverse A+ is an (m x n) matrix
-
U Kang 36
Moore-Penrose Pseudoinverse
If the equation has:
Exactly one solution: the pseudoinverse gives the same answer as the inverse
No solution: it gives the solution with the smallest error $\|Ax - y\|_2$ (over-specified case)
Many solutions: it gives the solution with the smallest norm of x (under-specified case)
-
U Kang 37
Pseudoinverse: Over-specified case
No solution: the pseudoinverse gives the solution with the smallest error $\|Ax - y\|_2$
$[3\ 2]^T [x] = [1\ 2]^T$ (i.e., 3x = 1, 2x = 2)
$([3\ 2]^T)^+ [1\ 2]^T = \ldots = 7/13$; this method is called 'least squares'
[Figure: the reachable points (3x, 2x) form a line; the desired point y = (1, 2) lies off that line]
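The same numbers can be reproduced with NumPy (snippet added, not from the slides); np.linalg.pinv and np.linalg.lstsq both return the least-squares solution 7/13:

```python
import numpy as np

A = np.array([[3.], [2.]])        # 2 equations, 1 unknown: 3x = 1, 2x = 2
y = np.array([1., 2.])

x = np.linalg.pinv(A) @ y         # least-squares solution
print(x)                          # [0.538...] = 7/13

# same answer from the dedicated least-squares routine
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ls)
```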
-
U Kang 38
Pseudoinverse: Under-specified case
Many solutions: this gives us the solution with the smallest norm of x
$[1\ 2]\,[w\ z]^T = 4$ (i.e., w + 2z = 4)
$[1\ 2]^+ \cdot 4 = [1/5\ 2/5]^T \cdot 4 = [4/5\ 8/5]^T$
[Figure: all possible solutions form a line in the (w, z) plane; x0 marks the shortest-length solution]
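And the under-specified example in NumPy (snippet added, not from the slides); the pseudoinverse picks the minimum-norm solution [4/5, 8/5]:

```python
import numpy as np

A = np.array([[1., 2.]])          # one equation, two unknowns: w + 2z = 4
y = np.array([4.])

x = np.linalg.pinv(A) @ y
print(x)                          # [0.8, 1.6] = [4/5, 8/5], the minimum-norm solution
assert np.allclose(A @ x, y)
```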
-
U Kang 39
Computing the Pseudoinverse
The SVD allows the computation of the pseudoinverse: $A^+ = V \Lambda^+ U^T$, where $\Lambda^+$ is obtained by taking the reciprocal of each non-zero singular value of Λ and transposing the result
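A sketch of this computation (added, not from the slides), assuming all singular values are non-zero so that their reciprocals exist:

```python
import numpy as np

A = np.random.rand(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A^+ = V diag(1/sigma) U^T (reciprocals of the non-zero singular values)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```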
-
U Kang 40
Trace
$\mathrm{Tr}(A) = \sum_i A_{i,i}$
$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$ (the trace is invariant to cyclic permutations)
$\mathrm{Tr}(A) = \mathrm{Tr}(A^T)$
$\|A\|_F = \sqrt{\mathrm{Tr}(AA^T)}$
For a scalar a, $\mathrm{Tr}(a) = a$
The trace of a matrix also equals the sum of its eigenvalues
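A quick NumPy verification (added, not from the slides) of the trace identities above:

```python
import numpy as np

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
C = np.random.rand(3, 3)

# cyclic property of the trace
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# Frobenius norm via the trace
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T)))

# trace equals the sum of the eigenvalues
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)
```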
-
U Kang 41
What you need to know
Linear algebra is crucial for understanding notations and mechanisms of many ML/deep learning methods
Important concepts: matrix product, identity, invertibility, norm, eigendecomposition, singular value decomposition, pseudoinverse, and trace
-
U Kang 42
Questions?