Deep Learning: A Statistical Perspective (stat.snu.ac.kr/mcp/lecture_1.pdf) · All machine learning can be deep learning if the involved model is deep (compositional).

Deep Learning: A Statistical Perspective

Myunghee Cho Paik

Guest lectures by Gisoo Kim, Yongchan Kwon, Young-geun Kim, Minjin Kim, and Wonyoung Kim

Seoul National University

September-December, 2019

Seoul National University Deep Learning September-December, 2019 1 / 105

Introduction

In 2009

August 5, 2009: "For Today's Graduates, Just One Word: Statistics" ("... the sexy job in the next 10 years will be statisticians")


Attractive component of statistics


Deep impact

Computer vision, speech recognition, natural language processing, ...

AlphaGo, face recognition, autonomous driving, AI assistants, ...

Combining domains...

Figures from Kiros, Salakhutdinov, and Zemel (2014)


Deep impact on education

Medical school: Hinton said: "People should stop training radiologists now; it's just completely obvious that in five years deep learning is going to do better than radiologists. It might be ten years."

Should we learn foreign languages?


Deep impact continues

After training on the LaTeX source files of an algebraic geometry textbook, a character-level neural network can generate convincing-looking (if meaningless) mathematics.

Disruption continues


Deep learning and statistical community

In a nutshell, a deep neural network model has the form

E(y | x; θ) = f_k(··· f_3(f_2(f_1(x; θ_1); θ_2); θ_3) ···; θ_k).
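The compositional form above can be sketched in a few lines of NumPy (a minimal illustration; the layer sizes, tanh activations, and logistic output are made-up choices for the demo, not prescribed by the slides):

```python
import numpy as np

def layer(x, W, b, rho=np.tanh):
    # One composition step: f_i(x; theta_i) = rho(W x + b)
    return rho(W @ x + b)

rng = np.random.default_rng(0)
# Illustrative 3-layer network; theta_i = (W_i, b_i)
sizes = [4, 8, 8, 1]
thetas = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def E_y_given_x(x, thetas):
    # f_k(... f_2(f_1(x; theta_1); theta_2) ...; theta_k)
    h = x
    for W, b in thetas[:-1]:
        h = layer(h, W, b)
    W, b = thetas[-1]
    # Logistic output so the result can play the role of E(y|x) for binary y
    return 1.0 / (1.0 + np.exp(-(W @ h + b)))

x = rng.normal(size=4)
p = E_y_given_x(x, thetas)   # a value in (0, 1)
```

The model is literally a composition of simple functions; everything deep learning adds beyond this form concerns how θ is learned.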

Is deep learning statistics?

Should it be taught in the department of statistics?

Whether it is statistics or not, deep learning is changing the way of living, and this disruption is likely to continue.


Deep learning and machine learning

All machine learning can be deep learning if the involved model is deep (compositional).

Table: Examples of deep vs. machine learning

             | ML                  | DL
Supervised   | Logistic regression | E(y | x; θ) = DNN
Unsupervised | Dimension reduction | g(x) = DNN
Reinforced   | Contextual bandit   | E(r | x; θ) = DNN
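As the table's supervised row suggests, logistic regression is the depth-one special case of the compositional form: with k = 1 and ρ the sigmoid, E(y | x; θ) = ρ(wᵀx + b). A minimal sketch (the parameter values are arbitrary illustrations):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression as a one-layer "DNN": E(y | x; theta) = sigmoid(w.x + b)
w, b = [0.5, -1.0], 0.25          # illustrative parameters
x = [2.0, 1.0]
p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Replacing the single affine map with a composition of several turns this classical model into its DL counterpart from the table.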


Deep learning: Theory vs. Practice

'Deep Deep Trouble' by Michael Elad: https://sinews.siam.org/Details-Page/deep-deep-trouble-4

At NIPS 2017, Ali Rahimi's speech: deep learning is alchemy.

LeCun's Facebook response: 'In the history of science and technology, the engineering artifacts have almost always preceded the theoretical understanding: the lens and the telescope preceded optics, the steam engine preceded thermodynamics, ..., the computer preceded computer science. Why? Because theorists will spontaneously study simple phenomena, and will not be enticed to study complex ones until there is a practical importance to them.'

Donoho in his lecture: polarization of the rationalists (Descartes, Spinoza, Leibniz) vs. the empiricists (Locke, Berkeley, Hume). Calls for engagement of theorists in experimental studies; urges hybridization of theory and empirical science so that each feeds the other.


Course

We cover some theoretical elements of machine learning topics in deep learning, for both supervised and unsupervised learning.

We cover a branch of reinforcement learning, the multi-armed bandit (not necessarily deep learning).

We do not cover computational issues such as parallel computing.

An introduction to Python and deep learning frameworks (PyTorch and TensorFlow) will be provided to help you get started.


The goals of the course are to (i) learn deep neural networks (DNNs), (ii) solve a real-life problem, including implementation, and (iii) gain understanding of, and explanations for, why some deep learning techniques work.

Evaluation will be based on a closed-book written exam and final projects.

Final projects could include (i) implementation of a real-data example, (ii) numerical experiments or theoretical exposition to explain unknown aspects of architecture or optimization, or (iii) a critical literature review of a topic.

A one- or two-page written proposal will be submitted and presented to evaluate its relevance to the course.

The final deliverable is a publication-worthy paper and presentation, with an explicit explanation of the contribution of each member of a team.


Contents

Introduction

Pre-deep learning: separate feature extraction and discrimination

Components of well-established machine learning tools (SVM, RKHS)

Brief overview of learning theory (concentration of measure, model complexity)

NN, multi-layer perceptron, backpropagation

CNN

Optimization and regularization

Visualization/Understanding

Python/Deep learning framework

RNN

Detection/Segmentation/Natural Language Processing


Contents (continued)

Generative models: autoencoder, variational autoencoder, generative adversarial network

Variational inference

Wasserstein distance

Multi-armed Bandit, Contextual Bandit


Resources

Lectures:
Stanford CS231 "Computer Vision" and CS224 "Natural Language Processing"
Stanford STAT385 "Theories of Deep Learning" (https://stats385.github.io/)
Stanford STAT311 "Information Theory and Statistics" (http://web.stanford.edu/class/stats311/lecture-notes.pdf)
HKUST "Deep Learning: Towards Deeper Understanding" (https://deeplearning-math.github.io/)
Berkeley STAT212b "Topics Course on Deep Learning" (http://joanbruna.github.io/stat212b/)
UCLA CS269 "Foundations of Deep Learning" (https://uclaml.github.io/CS269-Winter2019/)


Books:
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (www.deeplearningbook.org)
Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David, Cambridge University Press, 2014
Pattern Recognition and Machine Learning by Christopher Bishop, Springer, 2006

Image datasets, listed as name (input dimension, training set size, test set size): MNIST (28×28, 60K, 10K), CIFAR-10 (32×32, 50K, 10K), CIFAR-100, ImageNet (256×256 after preprocessing, 14 million)

A Brief History of Deep Learning

What triggered the popularity of deep learning

It will take a story to explain 'what', but it is simple to pinpoint 'when': the turning point was the ImageNet competition of 2012.

Some people say deep learning caused a 'paradigm shift'.

A breakthrough model? No. A new theory? No. Then what?


Biology of vision

Hubel and Wiesel (1959) inserted a microelectrode into the primary visual cortex of an anesthetized cat and found that some neurons fired rapidly when presented with lines at one angle, while others responded best to another angle. Some of these neurons responded to light patterns and dark patterns differently. Hubel and Wiesel called these neurons simple cells. Still other neurons, which they termed complex cells, detected edges and could preferentially detect motion in certain directions.


History: Emergence

1962: Frank Rosenblatt's perceptron; 1969: Minsky & Papert, Perceptrons

1980: Fukushima proposed the Neocognitron, a hierarchical, multilayered artificial neural network


1986: Rumelhart, Hinton, & Williams: backpropagation

1998: LeCun, Bottou, Bengio, and Haffner: LeNet-5, a deep neural network trained with stochastic gradient descent (Robbins and Monro, 1951)


History: Decline and resurgence

1990-2005: Bayesian approaches for neural nets were attempted; support vector machines emerged. Except for a handful of researchers like Hinton, LeCun, and Bengio, people left neural nets, which had almost no presence at Neural Information Processing Systems (NeurIPS).

2005-2010: Attempts to resurrect neural nets with unsupervised pretraining and alternative learning methods.

2012-present: Most of the alternative techniques were discarded, and the neural nets of the 1980s were revived, with more data, more computing power, and some tricks, under the new brand 'deep learning'.


Massive data since 2010

Emergence of smartphone photos

Generation of huge amounts of data by smartphones and sensors

Cloud storage of data and photos

Fei-Fei Li organizes ImageNet: 14 million+ labeled images in 21,000+ classes; labeling took more than a year of human labor. The ImageNet competition starts.


Computing power

The popularity of gaming and the emergence of GPUs propelled computational power.


Decades-old neural networks meet massive data and computing power

Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton win the ImageNet 2012 competition by a big margin.

Since then, the reign of deep learning has continued.

Practical success, limited theoretical understanding

Deep learning culture and practical success

The common task framework, i.e., competitions using publicly available datasets, became popular: the Netflix challenge (2009), Kaggle (2010).

Competitions became an important medium for fostering the development of deep learning.

Models that win competitions are not necessarily better models.

Practical success, but little understanding of this unreasonable effectiveness.

Even at the practical level, only point estimation is addressed; there are no inferential tools.


Theoretical challenges

Recall E(y | x; θ) = f_k(··· f_3(f_2(f_1(x; θ_1); θ_2); θ_3) ···; θ_k).

Models are high-dimensional.

Fitting requires non-convex optimization; one must deal with local minima and saddle points.

Black-box models, with no interpretation.


Universal approximation

Theorem: Let ρ(·) be a bounded, non-constant continuous function. Let I_m denote the m-dimensional hypercube, and C(I_m) the space of continuous functions on I_m. Given any f ∈ C(I_m) and ε > 0, there exist N > 0 and v_i, w_i, b_i, i = 1, ..., N, such that

F(x) = Σ_{i ≤ N} v_i ρ(w_i^T x + b_i)

satisfies sup_{x ∈ I_m} |f(x) − F(x)| < ε.

It guarantees that even a single hidden-layer network can representany classification problem in which the boundary is locally linear.
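The theorem can be illustrated numerically with a sketch like the following (the target f(x) = sin(2πx), ρ = tanh, N = 200, and the random choice of w_i, b_i with least-squares fitting of the v_i are all assumptions made for this demo, not part of the theorem):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)          # a target f in C(I_1)
x = np.linspace(0.0, 1.0, 400)               # grid on the hypercube I_1 = [0, 1]

# Single-hidden-layer approximant F(x) = sum_i v_i * rho(w_i x + b_i)
N = 200
w = rng.normal(scale=10.0, size=N)           # random inner weights w_i
b = rng.normal(scale=10.0, size=N)           # random biases b_i
H = np.tanh(np.outer(x, w) + b)              # hidden features rho(w_i x + b_i)
v, *_ = np.linalg.lstsq(H, f(x), rcond=None) # fit outer weights v_i

F = H @ v
err = np.max(np.abs(f(x) - F))               # sup-norm error on the grid
```

With enough hidden units the sup-norm error on the grid becomes small, which is exactly the kind of ε-approximation the theorem guarantees exists.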


Mysteries of unreasonable effectiveness of DL applications

CNN models appear to capture high-level image properties more effectively than previous models.

They perform well in various applications: are they beating the curse of dimensionality?

Which architectural components are mathematically responsible for this effectiveness?

Which optimization tools are responsible for effective learning? (Overcoming problems of local minima or saddle points; equivalence of local solutions; the role of stochastic optimization; the role of normalization?)

The universal approximation theorem explains neither why a deep model performs better in practice nor how it relates to the optimization.


Areas of theoretical studies related to DL

Why and when can deep networks lessen or break the curse of dimensionality?

Approximation theory: Expressive power of neural networks

Optimization: Characterize the landscapes of loss function

Statistical learning: Generalization errors
