Deep Learning: A Statistical Perspective
Myunghee Cho Paik
Guest lectures by Gisoo Kim, Yongchan Kwon, Young-geun Kim, Minjin Kim, and Wonyoung Kim
Seoul National University
September-December, 2019
Introduction
In 2009
August 5, 2009, The New York Times: “For Today’s Graduate, Just One Word: Statistics”
“....the sexy job in the next 10 years will be statisticians”
Attractive component of statistics
Deep impact
Computer vision, speech recognition, natural language processing...
AlphaGo, face recognition, autonomous driving, AI assistants...
Combining domains...
Figures from Kiros, Salakhutdinov, and Zemel (2014)
Deep impact on education
Medical school: Hinton said, “People should stop training radiologists now; it’s just completely obvious that in five years deep learning is going to do better than radiologists. It might be ten years.”
Should we learn foreign languages?
Deep impact continues
After training on the LaTeX files of an algebraic geometry textbook, a neural network can generate plausible-looking (fake) mathematics.
Disruption continues
Deep learning and statistical community
In a nutshell, a deep neural network model has the form (see the code sketch below):
E(y | x; θ) = f_k(· · · f_3(f_2(f_1(x; θ_1); θ_2); θ_3) · · · ; θ_k).
Is deep learning statistics?
Should it be taught in the department of statistics?
Whether it is statistics or not, deep learning is changing the way of living, and this disruption is likely to continue.
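To make the compositional form above concrete, here is a minimal sketch in PyTorch (one of the frameworks introduced later in the course); the input dimension, layer widths, and activation are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# E(y | x; θ) = f_k(... f_2(f_1(x; θ_1); θ_2) ...; θ_k):
# each f_i below is an affine map followed by a nonlinearity,
# and θ_i collects that layer's weight matrix and bias vector.
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),   # f_1(x; θ_1)
    nn.Linear(32, 32), nn.ReLU(),   # f_2(·; θ_2)
    nn.Linear(32, 1),               # f_3(·; θ_3), the output layer
)

x = torch.randn(4, 10)              # a batch of 4 inputs
print(model(x).shape)               # torch.Size([4, 1]): estimates of E(y | x)
```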
Deep learning and machine learning
All machine learning can be deep learning if the involved model is deep (compositional).
Table: Examples of deep vs. machine learning

              ML                    DL
Supervised    Logistic regression   E(y | x; θ) = DNN
Unsupervised  Dimension reduction   g(x) = DNN
Reinforced    Contextual bandit     E(r | x; θ) = DNN
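To illustrate the table’s first row, a sketch (with arbitrary widths) of how the “deep” version of logistic regression keeps the Bernoulli model for y given x but replaces the linear predictor with a DNN:

```python
import torch.nn as nn

# Logistic regression: E(y | x; θ) = sigmoid(θᵀx), a linear predictor.
logistic = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())

# Its deep counterpart keeps the Bernoulli model for y | x but makes the
# predictor compositional: E(y | x; θ) = DNN(x).
deep_logistic = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
```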
Deep learning: Theory vs. Practice
‘Deep, Deep Trouble’ by Michael Elad: https://sinews.siam.org/Details-Page/deep-deep-trouble-4
At NIPS 2017, Ali Rahimi’s speech: deep learning is alchemy.
LeCun’s Facebook response: ‘In the history of science and technology, the engineering artifacts have almost always preceded the theoretical understanding: the lens and the telescope preceded optics theory, the steam engine preceded thermodynamics, ..., the computer preceded computer science. Why? Because theorists will spontaneously study “simple” phenomena, and will not be enticed to study complex ones until there is a practical importance to it.’
Donoho in his lecture: polarization of the rationalists (Descartes, Spinoza, Leibniz) vs. the empiricists (Locke, Berkeley, Hume). Calls for engagement of theorists in experimental studies; urges hybridization of theory and empirical science so that each feeds the other.
Course
We cover some theoretical elements of machine learning topics in deep learning, for both supervised and unsupervised learning.
We cover a branch of reinforcement learning, the multi-armed bandit (not necessarily deep learning).
We do not cover computational issues such as parallel computing.
An intro to Python and frameworks (PyTorch and TensorFlow) will be provided to help you get started.
Course
The goals of the course are to (i) learn deep neural networks (DNNs), (ii) solve a real-life problem, including implementation, and (iii) gain understanding of, and explanations for, why some deep learning techniques work.
Evaluation will be based on a closed-book written exam and final projects.
Final projects could include (i) implementation of a real-data example, (ii) numerical experiments or theoretical exposition to explain unknown aspects of architecture or optimization, or (iii) a critical literature review of a topic.
A one- or two-page written proposal will be submitted and presented to evaluate its relevance to the course.
The final project consists of a publication-worthy paper and a presentation, with an explicit explanation of each team member’s contribution.
Contents
Introduction
Pre-deep learning: separate feature extraction and discrimination
Components of well-established machine learning tools (SVM, RKHS)
Brief overview of learning theory (concentration of measure, model complexity)
NN, multi-layer perceptron, backprop
CNN
Optimization and regularization
Visualization/Understanding
Python/Deep learning framework
RNN
Detection/Segmentation/Natural Language Processing
Contents-continued
Generative models: Autoencoder, Variational autoencoder, Generative adversarial network
Variational inference
Wasserstein distance
Multi-armed Bandit, Contextual Bandit
Resources
Lectures:
Stanford CS231 “Computer Vision”; CS224 “Natural Language Processing”
Stanford STAT385 “Theories of Deep Learning” (https://stats385.github.io/)
Stanford STAT311 “Information Theory and Statistics” (http://web.stanford.edu/class/stats311/lecture-notes.pdf)
HKUST “Deep Learning: Towards Deeper Understanding” (https://deeplearning-math.github.io/)
Berkeley STAT212b “Topics Course on Deep Learning” (http://joanbruna.github.io/stat212b/)
UCLA CS269 “Foundations of Deep Learning” (https://uclaml.github.io/CS269-Winter2019/)
Resources
Books:
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. www.deeplearningbook.org/
Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David. Cambridge University Press, 2014.
Pattern Recognition and Machine Learning by Christopher Bishop. Springer, 2006.
Image data: name (input dimension, training size, test size): MNIST (28x28, 60K, 10K), CIFAR10 (32x32, 50K, 10K), CIFAR100, ImageNet (256x256 after preprocessing, 14 million images)
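A minimal sketch of loading two of these benchmarks, assuming the torchvision package is available (the “./data” root directory is an illustrative choice):

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 28x28 grayscale digits, 60K training / 10K test images.
mnist_train = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST("./data", train=False, download=True, transform=to_tensor)

# CIFAR10: 32x32 color images, 50K training / 10K test images.
cifar_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)

print(len(mnist_train), len(mnist_test), len(cifar_train))  # 60000 10000 50000
```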
A Brief History of Deep Learning
What triggered the popularity of deep learning
It will take a story to explain ‘what’, but it is simple to pinpoint ‘when’: the turning point is the 2012 ImageNet competition.
Some people say deep learning made a ‘paradigm shift’.
A breakthrough model? No. New theory? No. Then what?
Biology of vision
Hubel and Wiesel (1959) inserted a microelectrode into the primary visual cortex of an anesthetized cat and found that some neurons fired rapidly when presented with lines at one angle, while others responded best to another angle. Some of these neurons responded to light patterns and dark patterns differently. Hubel and Wiesel called these neurons simple cells. Still other neurons, which they termed complex cells, detected edges and could preferentially detect motion in certain directions.
History: Emergence
1962 Frank Rosenblatt: the perceptron; 1969 Minsky & Papert: Perceptrons
1980 Fukushima proposed the Neocognitron, a hierarchical, multilayered artificial neural network
History: Emergence
1986 Rumelhart, Hinton, & Williams: backpropagation
1998 LeCun, Bottou, Bengio, & Haffner: LeNet-5, a deep neural network trained with the stochastic gradient descent of Robbins and Monro (1951)
History: Decline and resurgence
1990-2005: Bayesian approaches for neural nets were attempted. Support Vector Machines emerged. Except for a handful of researchers like Hinton, LeCun, and Bengio, people left neural nets; they had almost no presence at Neural Information Processing Systems (NeurIPS).
2005-2010: attempts to resurrect neural nets with unsupervised pretraining and alternative learning methods.
2012-present: most of the alternative techniques were discarded, and the neural nets of the 1980s were revived, with more data, computing power, and some tricks, under the new brand ‘deep learning’.
Massive data since 2010
Emergence of smartphone photos
Generation of huge data by smartphones and sensors
Cloud storage of data and photos
Fei-Fei Li organizes ImageNet: 14 million+ labeled images, 21,000+ classes; labeling took over a year of human labor. The ImageNet competition starts.
Computing power
The popularity of gaming and the emergence of GPUs propelled computational power.
Decades-old neural networks meet massive data and computing power
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton win the 2012 ImageNet competition by a big margin.
Since then, the reign of deep learning has begun.
Practical success, limited theoretical understanding
Deep learning culture and practical success
The common task framework, competition using publicly available datasets, became popular: the Netflix challenge (2009), Kaggle (2010).
Competitions became an important medium for fostering the development of deep learning.
Models winning competitions are not necessarily better models.
Practical success, but little understanding of this unreasonable effectiveness.
Even at the practical level, only point estimation is addressed; there are no inferential tools.
Theoretical challenges
Recall E(y | x; θ) = f_k(· · · f_3(f_2(f_1(x; θ_1); θ_2); θ_3) · · · ; θ_k).
Models are high-dimensional.
Training requires non-convex optimization and must deal with local minima and saddle points.
Black-box models, no interpretation.
Universal approximation
Theorem: Let ρ(·) be a bounded, non-constant continuous function. Let I_m denote the m-dimensional hypercube, and C(I_m) denote the space of continuous functions on I_m. Given any f ∈ C(I_m) and ε > 0, there exist N > 0 and v_i, w_i, b_i, i = 1, · · · , N, such that

F(x) = Σ_{i≤N} v_i ρ(w_i^T x + b_i)

satisfies sup_{x∈I_m} |f(x) − F(x)| < ε.
The theorem guarantees that even a single-hidden-layer network can represent any classification problem in which the boundary is locally linear.
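The theorem only asserts that such an F exists; it says nothing about how to find it. Still, a minimal numerical sketch can fit one by gradient descent (the width N, the target f, and the optimizer settings below are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

# F(x) = Σ_{i≤N} v_i ρ(w_i^T x + b_i), with ρ = sigmoid (bounded,
# non-constant, continuous, as the theorem requires) and N = 50.
N = 50
F = nn.Sequential(nn.Linear(1, N), nn.Sigmoid(), nn.Linear(N, 1, bias=False))

# An arbitrary continuous target f on I_1 = [0, 1].
x = torch.linspace(0, 1, 200).unsqueeze(1)
f = torch.sin(6 * x) + 0.5 * torch.cos(9 * x)

opt = torch.optim.Adam(F.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((F(x) - f) ** 2).mean()
    loss.backward()
    opt.step()

# The sup-norm error on the grid shrinks as N grows, echoing the theorem.
print((F(x) - f).abs().max().item())
```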
Mysteries of unreasonable effectiveness of DL applications
CNN models appear to capture high-level image properties more effectively than previous models.
They perform well in various applications: are they beating the curse of dimensionality?
Which architectural components are responsible for this effectiveness, mathematically?
Which optimization tools are responsible for effective learning? (Overcoming local minima and saddle points? Equivalence of local solutions? The role of stochastic optimization? The role of normalization?)
The universal approximation theorem does not explain why a deep model performs better in practice, nor how it relates to optimization.
Areas of theoretical studies related to DL
Why and when can deep networks lessen or break the curse of dimensionality?
Approximation theory: Expressive power of neural networks
Optimization: characterize the landscape of the loss function
Statistical learning: Generalization errors