Applied Math for Machine Learning - AIoT Lab
Transcript of Applied Math for Machine Learning - AIoT Lab
![Page 1: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/1.jpg)
Applied Math for Machine Learning
Prof. Kuan-Ting Lai
2021/3/11
![Page 2: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/2.jpg)
Applied Math for Machine Learning
β’ Linear Algebra
β’Probability
β’Calculus
β’Optimization
![Page 3: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/3.jpg)
Linear Algebra
β’ Scalarβ real numbers
β’ Vector (1D)β Has a magnitude & a direction
β’ Matrix (2D)β An array of numbers arranges in rows &
columns
β’ Tensor (>=3D)β Multi-dimensional arrays of numbers
![Page 4: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/4.jpg)
Real-world examples of Data Tensors
β’ Timeseries Data β 3D (samples, timesteps, features)
β’ Images β 4D (samples, height, width, channels)
β’ Video β 5D (samples, frames, height, width, channels)
4
![Page 5: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/5.jpg)
Vector Dimension vs. Tensor Dimension
β’ The number of data in a vector is also called βdimensionβ
β’ In deep learning , the dimension of Tensor is also called βrankβ
β’ Matrix = 2d array = 2d tensor = rank 2 tensor
https://deeplizard.com/learn/video/AiyK0idr4uM
![Page 6: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/6.jpg)
The Matrix
![Page 7: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/7.jpg)
Matrix
β’ Define a matrix with m rows and n columns:
Santanu Pattanayak, βPro Deep Learning with TensorFlow,β Apress, 2017
![Page 8: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/8.jpg)
Matrix Operations
β’ Addition and Subtraction
![Page 9: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/9.jpg)
Matrix Multiplication
β’ Two matrices A and B, where
β’ The columns of A must be equal to the rows of B, i.e. n == p
β’ A * B = C, where
β’m
n
p
m
![Page 10: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/10.jpg)
Example of Matrix Multiplication (3-1)
https://www.mathsisfun.com/algebra/matrix-multiplying.html
![Page 11: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/11.jpg)
Example of Matrix Multiplication (3-2)
https://www.mathsisfun.com/algebra/matrix-multiplying.html
![Page 12: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/12.jpg)
Example of Matrix Multiplication (3-3)
https://www.mathsisfun.com/algebra/matrix-multiplying.html
![Page 14: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/14.jpg)
Dot Product
β’ Dot product of two vectors become a scalar
β’ Inner product is a generalization of the dot product
β’ Notation: π£1 β π£2 or π£1ππ£2
![Page 15: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/15.jpg)
Dot Product in a Matrix
![Page 16: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/16.jpg)
Outer Product
https://en.wikipedia.org/wiki/Outer_product
![Page 17: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/17.jpg)
Linear Independence
β’ A vector is linearly dependent on other vectors if it can be expressed as the linear combination of other vectors
β’ A set of vectors π£1, π£2,β― , π£π is linearly independent if π1π£1 +π2π£2 +β―+ πππ£π = 0 implies all ππ = 0, βπ β {1,2,β―π}
![Page 18: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/18.jpg)
Span the Vector Space
β’ n linearly independent vectors can span n-dimensional space
![Page 19: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/19.jpg)
Rank of a Matrix
β’ Rank is:β The number of linearly independent row or column vectors
β The dimension of the vector space generated by its columns
β’ Row rank = Column rank
β’ Example:
https://en.wikipedia.org/wiki/Rank_(linear_algebra)
Row-echelon
form
![Page 20: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/20.jpg)
Identity Matrix Iβ’ Any vector or matrix multiplied by I remains unchanged
β’ For a matrix π΄, π΄πΌ = πΌπ΄ = π΄
![Page 21: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/21.jpg)
Inverse of a Matrix
β’ The product of a square matrix π΄ and its inverse matrix π΄β1
produces the identity matrix πΌ
β’ π΄π΄β1 = π΄β1π΄ = πΌ
β’ Inverse matrix is square, but not all square matrices has inverses
![Page 22: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/22.jpg)
Pseudo Inverse
β’ Non-square matrix and have left-inverse or right-inverse matrix
β’ Example:
β Create a square matrix π΄ππ΄
β Multiplied both sides by inverse matrix (π΄ππ΄)β1
β (π΄ππ΄)β1π΄π is the pseudo inverse function
π΄π₯ = π, π΄ β βπΓπ, π β βπ
π΄ππ΄π₯ = π΄ππ
π₯ = (π΄ππ΄)β1π΄ππ
![Page 23: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/23.jpg)
Norm
β’ Norm is a measure of a vectorβs magnitude
β’ π2 norm
β’ π1 norm
β’ ππ norm
β’ πβ norm
![Page 24: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/24.jpg)
Eigen Vectors
β’ Eigenvector is a non-zero vector that changed by only a scalar factor Ξ»when linear transform π΄ is applied to:
β’ π₯ are Eigenvectors and π are Eigenvalues
β’ One of the most important concepts in machine learning, ex:β Principle Component Analysis (PCA)
β Eigenvector centrality
β PageRank
β β¦
π΄π₯ = ππ₯, π΄ β βπΓπ, π₯ β βπ
![Page 25: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/25.jpg)
Example: Shear Mappingβ’ Horizontal axis is the
Eigenvector
![Page 26: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/26.jpg)
Principle Component Analysis (PCA)
β’ Eigenvector of Covariance Matrix
https://en.wikipedia.org/wiki/Principal_component_analysis
![Page 27: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/27.jpg)
NumPy for Linear Algebra
β’ NumPy is the fundamental package for scientific computing with Python. It contains among other things:βa powerful N-dimensional array objectβsophisticated (broadcasting) functionsβtools for integrating C/C++ and Fortran codeβuseful linear algebra, Fourier transform, and random
number capabilities
![Page 28: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/28.jpg)
Create Tensors
Scalars (0D tensors) Vectors (1D tensors) Matrices (2D tensors)
![Page 29: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/29.jpg)
Create 3D Tensor
![Page 30: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/30.jpg)
Attributes of a Numpy Tensor
β’ Number of axes (dimensions, rank)β x.ndim
β’ Shapeβ This is a tuple of integers showing how many data the tensor has along each axis
β’ Data typeβ uint8, float32 or float64
![Page 31: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/31.jpg)
Numpy Multiplication
![Page 32: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/32.jpg)
Unfolding the Manifold
β’ Tensor operations are complex geometric transformation in high-dimensional spaceβ Dimension reduction
![Page 33: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/33.jpg)
Basics of Probability
![Page 34: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/34.jpg)
Three Axioms of Probability
β’ Given an Event πΈ in a sample space π, S = π=1Ϊπ πΈπ
β’ First axiomβ π πΈ β β, 0 β€ π(πΈ) β€ 1
β’ Second axiomβ π π = 1
β’ Third axiomβ Additivity, any countable sequence of mutually exclusive events πΈπβ π π=1Ϊ
π πΈπ = π πΈ1 + π πΈ2 +β―+ π πΈπ = Οπ=1π π πΈπ
![Page 35: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/35.jpg)
Union, Intersection, and Conditional Probability β’ π π΄ βͺ π΅ = π π΄ + π π΅ β π π΄ β© π΅
β’ π π΄ β© π΅ is simplified as π π΄π΅
β’ Conditional Probability π π΄|π΅ , the probability of event A given B has occurred
β π π΄|π΅ = ππ΄π΅
π΅
β π π΄π΅ = π π΄|π΅ π π΅ = π π΅|π΄ π(π΄)
![Page 36: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/36.jpg)
Chain Rule of Probability
β’ The joint probability can be expressed as chain rule
![Page 37: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/37.jpg)
Mutually Exclusive
β’ π π΄π΅ = 0
β’ π π΄ βͺ π΅ = π π΄ + π π΅
![Page 38: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/38.jpg)
Independence of Events
β’ Two events A and B are said to be independent if the probability of their intersection is equal to the product of their individual probabilitiesβ π π΄π΅ = π π΄ π π΅
β π π΄|π΅ = π π΄
![Page 39: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/39.jpg)
Bayes Rule
π π΄|π΅ =π π΅|π΄ π(π΄)
π(π΅)
Proof:
Remember π π΄|π΅ = ππ΄π΅
π΅
So π π΄π΅ = π π΄|π΅ π π΅ = π π΅|π΄ π(π΄)Then Bayes π π΄|π΅ = π π΅|π΄ π(π΄)/π π΅
![Page 40: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/40.jpg)
NaΓ―ve Bayes Classifier
![Page 41: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/41.jpg)
NaΓ―ve = Assume All Features Independent
![Page 42: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/42.jpg)
Normal (Gaussian) Distributionβ’ One of the most important distributions
β’ Central limit theoremβ Averages of samples of observations of random variables independently drawn from
independent distributions converge to the normal distribution
![Page 43: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/43.jpg)
![Page 44: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/44.jpg)
Differentiation
OR
![Page 45: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/45.jpg)
Derivatives of Basic Function ππ¦
ππ₯
![Page 46: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/46.jpg)
Gradient of a Functionβ’ Gradient is a multi-variable generalization of the derivative
β’ Apply partial derivatives
β’ Example
![Page 47: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/47.jpg)
Chain Rule
47
![Page 48: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/48.jpg)
Maxima and Minima for Univariate Function
β’ If ππ(π₯)
ππ₯= 0, itβs a minima or a maxima point, then we study the
second derivative:
β If π2π(π₯)
ππ₯2< 0 => Maxima
β If π2π(π₯)
ππ₯2> 0 => Minima
β If π2π(π₯)
ππ₯2= 0 => Point of reflection
Minima
![Page 49: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/49.jpg)
Gradient Descent
![Page 50: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/50.jpg)
Gradient Descent along a 2D Surface
![Page 51: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/51.jpg)
Avoid Local Minimum using Momentum
![Page 52: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/52.jpg)
Optimization
https://en.wikipedia.org/wiki/Optimization_problem
![Page 53: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/53.jpg)
Principle Component Analysis (PCA)
β’ Assumptionsβ Linearity
β Mean and Variance are sufficient statistics
β The principal components are orthogonal
![Page 54: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/54.jpg)
Principle Component Analysis (PCA)
max. cov π, π
π . π. π‘ πTπ = π
![Page 55: Applied Math for Machine Learning - AIoT Lab](https://reader034.fdocuments.us/reader034/viewer/2022043013/626bba2cff5f0b098c0b05c4/html5/thumbnails/55.jpg)
References
β’ Francois Chollet, βDeep Learning with Python,β Chapter 2 βMathematical Building Blocks of Neural Networksβ
β’ Santanu Pattanayak, βPro Deep Learning with TensorFlow,β Apress, 2017
β’ Machine Learning Cheat Sheet
β’ https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
β’ https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when
β’ Wikipedia