Lecture 1 (math.pku.edu.cn)

Dynamical Systems and Machine Learning

Qianxiao Li (NUS, A*STAR)
Overview

• Part I: Dynamical Systems → Machine Learning
• Part II: Machine Learning → Dynamical Systems
Syllabus

• Introduction
• Part I: Dynamical Systems Approach to Deep Learning
  • Optimal control theory
  • Optimal control formulation of deep learning
  • Dynamics/control-inspired training algorithms
  • Dynamics/control-inspired network architectures
  • Mathematical results
• Part II: Data-Driven Dynamical Systems
  • PCA/SVD-based model reduction methods (DMD, POD)
  • Koopman operator methods
  • Data-driven control
Other Matters

• Class participation is encouraged!
  • Use the chat
  • TA will moderate
• You can email me at [email protected]
  • Questions about the material
  • Mistakes in lecture notes
  • Suggestions on lecture pace and style
  • Research opportunities
Supervised Learning: Basic Formulation

• Data: D = {(x_i, y_i)}_{i=1}^N
  • x_i ∈ R^d: inputs
  • y_i ∈ R^m: outputs (labels)
  • N: size of data
• Data distribution: x_i ~ μ, i.i.d.
• Target function:
  • Deterministic: y_i = f*(x_i)
  • Stochastic: y_i ~ P(·|x_i), e.g. y_i = f*(x_i) + ε_i
  • General: (x_i, y_i) ~ μ (a joint distribution)
• Output space:
  • Regression: y_i ∈ R
  • Classification: y_i ∈ {1, 2, …, C} ↪ R^C
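A minimal sketch of this setup in code (the Gaussian data distribution, the particular f*, and the noise level are all illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 100, 2                       # N: size of data; d: input dimension
X = rng.normal(size=(N, d))         # x_i ~ mu, i.i.d. (mu assumed standard Gaussian)

def f_star(x):                      # an assumed target function f* : R^d -> R
    return np.sin(x[:, 0]) + 0.5 * x[:, 1]

eps = 0.1 * rng.normal(size=N)      # noise eps_i
y = f_star(X) + eps                 # stochastic setting: y_i = f*(x_i) + eps_i

D = list(zip(X, y))                 # dataset D = {(x_i, y_i)}_{i=1}^N
```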
Goal of Supervised Learning

Given D, find F̂ ≈ f*.

Learn with risk minimization:
• Choose a hypothesis space H; the goal is to find F̂ ∈ H such that ||F̂ − f*|| is small.
• Define a loss function ℓ : R^m × R^m → R.
• Define the empirical risk
  R_emp(F) = (1/N) Σ_{i=1}^N ℓ(F(x_i), y_i).
• Empirical risk minimization:
  F̂ ← argmin_{F∈H} R_emp(F).
• Loss functions:
  • Mean-square loss: ℓ(y, y') = (1/2) ||y − y'||²
  • Zero-one loss: ℓ(y, y') = 1 if y ≠ y', 0 if y = y'
• Population (expected) risk minimization (PRM):
  F̂ ← argmin_{F∈H} E_{x~μ} ℓ(F(x), f*(x))   (deterministic)
  F̂ ← argmin_{F∈H} E_{(x,y)~μ} ℓ(F(x), y)   (general)
• If μ is the empirical measure (1/N) Σ_{i=1}^N δ_{x_i}, PRM reduces to ERM.
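The loss functions and the empirical risk can be sketched directly (the function names and toy data are illustrative):

```python
import numpy as np

def mse_loss(y_pred, y):
    # mean-square loss: l(y, y') = (1/2) ||y - y'||^2
    return 0.5 * np.sum((np.asarray(y_pred) - np.asarray(y)) ** 2)

def zero_one_loss(y_pred, y):
    # zero-one loss: 1 if y != y', else 0
    return 0.0 if np.array_equal(y_pred, y) else 1.0

def empirical_risk(F, X, y, loss):
    # R_emp(F) = (1/N) sum_{i=1}^N l(F(x_i), y_i)
    return np.mean([loss(F(xi), yi) for xi, yi in zip(X, y)])

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
risk = empirical_risk(lambda x: x[0], X, y, mse_loss)  # F(x) = x fits exactly
```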
Three Paradigms of Supervised Learning

[Figure: hypothesis space H and target function f*, illustrating the decomposition of the error into approximation, optimization, and generalization components]
Example: Linear Models

Regression: y ∈ R.

Linear model hypothesis space:
  H = { F : F(x) = Σ_{j=0}^{M−1} w_j φ_j(x), w_j ∈ R, j = 0, 1, …, M−1 }
where the φ_j : R^d → R are basis functions (feature maps).

Examples of basis functions (d = 1):
  • φ_0(x) = 1, φ_1(x) = x ⇒ F(x) = w_0 + w_1 x
  • Polynomial: φ_j(x) = x^j
  • RBF (Gaussian): φ_j(x) = exp(−(x − μ_j)² / (2 s_j²))
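A sketch of these d = 1 basis functions as feature maps (the sample points, centers, and shared width are arbitrary choices):

```python
import numpy as np

def poly_features(x, M):
    # phi_j(x) = x^j, j = 0, ..., M-1  (phi_0 = 1 gives the constant feature)
    return np.stack([x ** j for j in range(M)], axis=1)

def rbf_features(x, centers, s):
    # Gaussian RBF: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), with a shared width s
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))

x = np.array([0.0, 1.0, 2.0])
Phi_poly = poly_features(x, 3)                          # columns: 1, x, x^2
Phi_rbf = rbf_features(x, np.array([0.0, 1.0]), s=1.0)  # one column per center
```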
• Empirical risk minimization (mean-square loss):
  R_emp(w_0, …, w_{M−1}) = (1/N) Σ_{i=1}^N ℓ(F(x_i), y_i) = (1/N) Σ_{i=1}^N ( Σ_{j=0}^{M−1} w_j φ_j(x_i) − y_i )²

• Matrix form:
  R_emp(w) = (1/N) ||Φw − y||²
  w = (w_0, …, w_{M−1})ᵀ ∈ R^M,  y = (y_1, …, y_N)ᵀ ∈ R^N,  Φ_{ij} = φ_j(x_i),  Φ ∈ R^{N×M}

  min_w R_emp(w) → ŵ
Ordinary Least Squares Formula

Suppose ΦᵀΦ is invertible. Then
  ŵ = (ΦᵀΦ)^{−1} Φᵀ y.

Proof:
  R_emp(w) = (1/N) ||Φw − y||²
  ∇_w R_emp(w) = (2/N) Φᵀ(Φw − y)
Set ∇_w R_emp(ŵ) = 0:
  ⇒ ΦᵀΦ ŵ = Φᵀ y
  ⇒ ŵ = (ΦᵀΦ)^{−1} Φᵀ y.  ∎

Note ΦᵀΦ ∈ R^{M×M}. If M > N, ΦᵀΦ is not invertible (rank Φ ≤ N). Consider instead the regularized problem
  min_w ||Φw − y||² + λ||w||²   (λ > 0)
  ⇒ ŵ = (λI + ΦᵀΦ)^{−1} Φᵀ y.
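The closed-form solutions above can be checked numerically (the random design matrix and λ value are illustrative; np.linalg.solve is used rather than forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 4
Phi = rng.normal(size=(N, M))            # design matrix, Phi_ij = phi_j(x_i)
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = Phi @ w_true                         # noiseless targets

# OLS: w_hat = (Phi^T Phi)^{-1} Phi^T y
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# regularized: w_hat = (lam I + Phi^T Phi)^{-1} Phi^T y, usable even when M > N
lam = 1e-6
w_reg = np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ y)
```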
Symmetries of the target function f* : R^d → R^m:
Let G be a group of transformations, with group action x ↦ g(x).
  • Invariant: f*(g(x)) = f*(x)
  • Equivariant: f*(g(x)) = g(f*(x))

Generalization gap: typically,
  R_pop(F̂) − R_emp(F̂) ≤ C(F̂) / √N.
Neural Network Hypothesis Space

H_m = { F : F(x) = Σ_{j=1}^m v_j σ(w_jᵀx + b_j), w_j ∈ R^d, v_j, b_j ∈ R, j = 1, …, m }

• An "adaptive basis model": the basis functions φ_j(x) = σ(w_jᵀx + b_j) are themselves trainable.
• σ is the activation function, σ : R → R.

Examples:
  • ReLU (rectified linear unit): σ(z) = max(0, z)  (variant: leaky ReLU)
  • Sigmoid: σ(z) = 1 / (1 + e^{−z})
  • tanh

• Matrix form: F(x) = vᵀ σ(Wx + b), where σ acts element-wise: [σ(z)]_i = σ(z_i).
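A forward pass for this hypothesis space (the dimensions and random parameter initialization are illustrative choices):

```python
import numpy as np

def relu(z):
    # ReLU: sigma(z) = max(0, z)
    return np.maximum(0.0, z)

def sigmoid(z):
    # sigmoid: sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def shallow_net(x, W, b, v, sigma=relu):
    # F(x) = sum_j v_j sigma(w_j . x + b_j) = v^T sigma(W x + b)
    return v @ sigma(W @ x + b)

rng = np.random.default_rng(0)
d, m = 3, 5                          # input dimension, number of units
W = rng.normal(size=(m, d))
b = rng.normal(size=m)
v = rng.normal(size=m)
out = shallow_net(np.ones(d), W, b, v)
```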
Universal Approximation Theorem

Let K ⊂ R^d be compact and f* : K → R be continuous. Assume σ : R → R is sigmoidal, i.e. it is continuous and
  lim_{z→−∞} σ(z) = 0,   lim_{z→+∞} σ(z) = 1.

Then, for all ε > 0, there exists F ∈ ∪_{m≥1} H_m such that
  ||F − f*||_{C(K)} := max_{x∈K} |F(x) − f*(x)| ≤ ε.
Optimization / Training

• Parameterize H by θ in a Euclidean space: H = { F_θ : θ ∈ Θ }.
• ERM becomes min_{θ∈Θ} R(θ).
• If R is C¹, a necessary condition for optimality is
  ∇_θ R(θ) = 0.
• Gradient descent:
  θ_{k+1} = θ_k − η ∇R(θ_k),  k = 0, 1, …
  where η > 0 is the learning rate.
• Suppose θ_k → θ*; then (for continuous ∇R) ∇R(θ*) = 0.
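Gradient descent on a simple C¹ objective (the quadratic R and the step size are illustrative assumptions) converges to a point where the gradient vanishes:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])    # positive definite, so R is convex
b = np.array([2.0, 1.0])

def grad_R(theta):
    # gradient of R(theta) = (1/2) theta^T A theta - b^T theta
    return A @ theta - b

eta = 0.1                                  # learning rate eta > 0
theta = np.zeros(2)                        # theta_0
for k in range(500):
    theta = theta - eta * grad_R(theta)    # theta_{k+1} = theta_k - eta grad R(theta_k)
# the minimizer solves A theta = b, i.e. theta* = (1, 1)
```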
• Local vs Global Minima

θ* is a local minimum of R if there exists δ > 0 such that
  R(θ*) ≤ R(θ)  for all ||θ − θ*|| ≤ δ.

θ* is a global minimum if
  R(θ*) ≤ R(θ)  for all θ ∈ Θ.

The notion of local/global minima depends on the domain Θ:

[Figure: graph of R : Θ → R with two minima; one is only a local minimum, the other is both local and global, and which is which can change when Θ is restricted]
Deep Neural Networks

H = { F : F(x) = vᵀ x(K), v ∈ R^{d_K} }

where
  x(k+1) = σ( W(k) x(k) + b(k) ),  k = 0, 1, …, K−1
  x(0) = x  (input)
  W(k) ∈ R^{d_{k+1} × d_k},  b(k) ∈ R^{d_{k+1}}

  x(0) → σ(W(0)x(0)+b(0)) → x(1) → σ(W(1)x(1)+b(1)) → x(2) → ⋯ → x(K)

K layers.
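The layer recursion above in code (the layer widths, tanh activation, and random weights are illustrative):

```python
import numpy as np

def deep_net(x, Ws, bs, v, sigma=np.tanh):
    # x(k+1) = sigma(W(k) x(k) + b(k)), k = 0, ..., K-1;  F(x) = v^T x(K)
    for W, b in zip(Ws, bs):
        x = sigma(W @ x + b)
    return v @ x

rng = np.random.default_rng(0)
dims = [3, 4, 4, 2]                   # d_0, d_1, d_2, d_3  (K = 3 layers)
Ws = [rng.normal(size=(dims[k + 1], dims[k])) for k in range(len(dims) - 1)]
bs = [rng.normal(size=dims[k + 1]) for k in range(len(dims) - 1)]
v = rng.normal(size=dims[-1])         # readout vector v in R^{d_K}
out = deep_net(np.ones(dims[0]), Ws, bs, v)
```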
Residual Networks

  x(k+1) = x(k) + σ( W(k) x(k) + b(k) )

Generally, we can write
  x(k+1) = x(k) + f(k, x(k), θ(k)),  k = 0, 1, …, K−1

where x(k) is the hidden state, θ(k) are the trainable parameters (weights), k is the layer index, and K is the number of layers (depth);
  f : {0, 1, …, K−1} × R^d × Θ → R^d.
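A sketch of the residual recursion (tanh and small random weights are illustrative choices); note that when f vanishes the network reduces to the identity map, which is the point of the residual form:

```python
import numpy as np

def residual_net(x, Ws, bs, sigma=np.tanh):
    # x(k+1) = x(k) + sigma(W(k) x(k) + b(k)), k = 0, ..., K-1
    for W, b in zip(Ws, bs):
        x = x + sigma(W @ x + b)
    return x

rng = np.random.default_rng(0)
d, K = 3, 4                               # hidden dimension, depth
Ws = [0.1 * rng.normal(size=(d, d)) for _ in range(K)]
bs = [0.1 * rng.normal(size=d) for _ in range(K)]
xK = residual_net(np.ones(d), Ws, bs)

# with zero weights each update adds sigma(0) = 0, so the map is the identity
x_id = residual_net(np.ones(d), [np.zeros((d, d))] * K, [np.zeros(d)] * K)
```

The update x(k+1) = x(k) + f(k, x(k), θ(k)) has the form of a forward-Euler step of a differential equation, which foreshadows the dynamical-systems viewpoint developed in this course.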