Statistical learning intro

Page 1: Statistical learning intro

Introduction to Machine/Statistical Learning

[email protected], Taipei Hackerspace, 2014.9.20

Page 2: Statistical learning intro

The purpose of this talk

• Not to develop a rigorous understanding of ML algorithms, nor to derive them

• But to provide a sufficient basis for applied predictive modeling

• Our goal is predictive modeling: building accurate models by applying statistical principles, doing feature engineering and model tuning, choosing appropriate ML algorithms, and performing error analysis

Page 3: Statistical learning intro

Preliminary outline

• Model purpose – for prediction, for explanation

• The basic study design of Machine learning
  – Model Representation
  – Classification vs. Regression Problems
  – Supervised vs. Unsupervised Learning

• Model Assessment & Selection
  – Interplay between Bias, Variance & Complexity
  – Cross Validation: The wrong/correct way of doing it

• The Single Algorithm Hypothesis & Deep Learning

Page 4: Statistical learning intro
Page 5: Statistical learning intro

Example: Models for Explanation

Wong, P. T. P. (2014). Viktor Frankl’s meaning-seeking model and positive psychology.

Page 6: Statistical learning intro

Coursera Course, Machine learning by Andrew Ng

Page 7: Statistical learning intro

Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning

Page 8: Statistical learning intro

Receptive fields in humans

Page 9: Statistical learning intro

Preliminary outline

• Model purpose – for prediction, for explanation

• The basic study design of Machine learning
  – Model Representation
  – Classification vs. Regression Problems
  – Supervised vs. Unsupervised Learning

• Model Assessment & Selection
  – Interplay between Bias, Variance & Complexity
  – Cross Validation: The wrong/correct way of doing it

• The Single Algorithm Hypothesis & Deep Learning

Pages 10–18: Statistical learning intro

Coursera Course, Machine learning by Andrew Ng

Page 19: Statistical learning intro

Independent Variables = Predictors = Features

Dependent Variables = Responses

Page 20: Statistical learning intro

Preliminary outline

• Model purpose – for prediction, for explanation

• The basic study design of Machine learning
  – Model Representation
  – Classification vs. Regression Problems
  – Supervised vs. Unsupervised Learning

• Model Assessment & Selection
  – Interplay between Bias, Variance & Complexity
  – Cross Validation: The wrong/correct way of doing it

• The Single Algorithm Hypothesis & Deep Learning

Pages 21–26: Statistical learning intro

Coursera Course, Machine learning by Andrew Ng

Page 27: Statistical learning intro

To recap: some definitions

• Variance – the amount by which the prediction would change if we estimated it using a different training data set

• Bias – the error that is introduced by approximating a real-life problem
  – more flexible methods result in less bias, but more variance (illustrated in the sketch below)

• Flexibility = degrees of freedom ~ Complexity
  – can be modified by the regularization parameter
  – or by increasing/reducing the number of features
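
A minimal sketch of this interplay, assuming scikit-learn and numpy (the sine-curve data and the polynomial degrees are illustrative choices, not from the talk):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 1, 100).reshape(-1, 1)                   # 100 points on [0, 1]
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 100) # noisy sine curve

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for degree in (1, 4, 15):   # increasing flexibility
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        print(degree,
              mean_squared_error(y_train, model.predict(X_train)),  # keeps falling
              mean_squared_error(y_test, model.predict(X_test)))    # traces a U-shape

As the degree grows, the training error keeps falling, while the test error traces the familiar U-shape: high bias on the left, high variance on the right.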

Page 28: Statistical learning intro

Study design – training/test sets

An Introduction to Statistical Learning, Ch 5 Resampling Methods

Page 29: Statistical learning intro

In practice – training/CV/test set

• Training set – used to fit the models

• Validation set – used to estimate prediction error for model selection

• Test set – used for assessment of the generalization error of the final chosen model (see the split sketch below)

The Elements of Statistical Learning ch7. Model Assessment and Selection
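
A minimal sketch of such a three-way split, assuming scikit-learn (the toy data and the 60/20/20 proportions are illustrative assumptions, not prescribed by the talk):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 5))   # toy predictors
    y = rng.normal(size=100)        # toy response

    # Hold out 20% as the final test set, touched only once at the very end.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    # Split the remainder into training (60% of total) and validation (20% of total).
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)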

Page 30: Statistical learning intro

Coursera Course, Machine learning by Andrew Ng

                  Source of parameters θ    Evaluated on (x(i), y(i))
   Training error  Training set              Training set
   CV error        Training set              CV set
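
Continuing the toy split above, the table maps onto code roughly as follows (a sketch, assuming scikit-learn; the course itself presents this in Octave notation):

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    model = LinearRegression().fit(X_train, y_train)  # θ comes from the training set
    train_error = mean_squared_error(y_train, model.predict(X_train))  # same set: optimistic
    cv_error = mean_squared_error(y_val, model.predict(X_val))         # held-out CV set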

Pages 31–32: Statistical learning intro

Coursera Course, Machine learning by Andrew Ng

Page 33: Statistical learning intro

The Bias-Variance Trade-Off

An Introduction to Statistical Learning, Ch 5 Resampling Methods

Page 34: Statistical learning intro

Cross validation – single split

An Introduction to Statistical Learning, Ch 5 Resampling Methods

Page 35: Statistical learning intro

Cross validation – k = 10 folds

An Introduction to Statistical Learning, Ch 5 Resampling Methods

Page 36: Statistical learning intro

K-fold cross validation gives a more reliable estimate of the test error than a single split
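
A minimal k-fold sketch, assuming scikit-learn (toy data; cross_val_score refits the model on each set of k−1 folds and scores it on the held-out fold):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 5))
    y = X[:, 0] + rng.normal(0, 0.5, 100)   # toy linear signal plus noise

    # 10-fold CV: each fold is held out once while the model is fit on the other 9.
    scores = cross_val_score(LinearRegression(), X, y, cv=10,
                             scoring="neg_mean_squared_error")
    cv_estimate = -scores.mean()            # averaging over folds stabilizes the
    print(cv_estimate)                      # test-error estimate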

Page 37: Statistical learning intro

Compare these two CV methods – what’s different, and what’s wrong? (A code sketch of the comparison follows the two lists.)

The wrong way:

1. Screen the predictors – find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels
2. Build a multivariate classifier using just this subset of predictors
3. Apply cross-validation – to estimate the unknown tuning parameters and to estimate the prediction error of the final model

The correct way:

1. Divide the samples into K cross-validation folds (groups) at random
2. For each fold k = 1, 2, ..., K:
   a. Find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels, using all of the samples except those in fold k
   b. Using just this subset of predictors, build a multivariate classifier, using all of the samples except those in fold k
   c. Use the classifier to predict the class labels for the samples in fold k
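
A minimal sketch of both procedures, assuming scikit-learn (the data here are pure noise with random labels, so honest CV should report roughly chance-level accuracy; wrapping the screening step in a Pipeline keeps it inside each training fold):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.RandomState(0)
    X = rng.normal(size=(50, 1000))   # many predictors, pure noise
    y = rng.randint(0, 2, 50)         # random class labels: true accuracy is ~50%

    # Correct: screening is inside the pipeline, so it is re-run on each
    # training fold and never sees the held-out samples.
    pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
    print(cross_val_score(pipe, X, y, cv=5).mean())   # roughly 0.5, as it should be

    # Wrong (for contrast): screen on ALL samples first, then cross-validate.
    X_screened = SelectKBest(f_classif, k=20).fit_transform(X, y)
    print(cross_val_score(LogisticRegression(), X_screened, y, cv=5).mean())

The second estimate comes out optimistically high because the screening step has already seen the held-out samples, which is exactly the flaw described on the next slide.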

Page 38: Statistical learning intro

The predictors chosen by the wrong method have an unfair advantage

• they were chosen in step (1) on the basis of all of the samples

• leaving samples out after the variables have been selected does not correctly mimic the application of the classifier to a completely independent test set

• these predictors “have already seen” the left-out samples

The Elements of Statistical Learning ch7. Model Assessment and Selection

Page 39: Statistical learning intro

Recap principles from Statistics – K-fold CV is a form of random sampling

Coursera Course, Data Analysis and Statistical Inference by Dr. Mine Çetinkaya-Rundel

Page 40: Statistical learning intro

ML algorithm performance is dependent on the underlying data

An Introduction to Statistical Learning, Ch 8 Tree-Based Methods

Page 41: Statistical learning intro

More issues to be covered in next talk

• Remedies for Severe Class Imbalance
• Measuring Predictor Importance
• Factors That Can Affect Model Performance

Page 42: Statistical learning intro

Preliminary outline

• Model purpose – for prediction, for explanation

• The basic study design of Machine learning
  – Model Representation
  – Classification vs. Regression Problems
  – Supervised vs. Unsupervised Learning

• Model Assessment & Selection
  – Interplay between Bias, Variance & Complexity
  – Cross Validation: The wrong/correct way of doing it

• The Single Algorithm Hypothesis & Deep Learning

Page 43: Statistical learning intro

Back then, the prevailing wisdom

• MIT’s Marvin Minsky – a “Society of Mind”
  – To achieve AI, it was believed, engineers would have to build and combine thousands of individual computing units or agents
  – One group of agents, or module, would handle vision, another language, and so on…

Page 44: Statistical learning intro

The Single Algorithm Hypothesis

• Human intelligence stems from a single learning algorithm
  – a 1978 paper by Vernon Mountcastle: An Organizing Principle for Cerebral Function
  – Jeff Hawkins’ “memory-prediction framework”

• Origin
  – neuroplasticity during brain development
  – potential of other cortical areas to take over functions lost after brain injury (e.g. stroke)

Page 45: Statistical learning intro

Deep Learning - 1

• Single Algorithm – neural networks to mimic human brain behavior
  – A basic layer of artificial neurons can detect simple things, like the edges of a particular shape
  – The next layer could then piece together these edges to identify the larger shape
  – Then the shapes could be strung together to understand an object

• Key: the software does all this on its own
  – give the system a lot of data, so it can discover by itself what some of the concepts in the world are (a toy layer-stack sketch follows the citation below)

The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI, Wired
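
A toy layer stack in plain numpy to make the composition concrete (an illustrative sketch, not the actual Google Brain system; the weights here are random, whereas in deep learning they are learned from data):

    import numpy as np

    def layer(x, w, b):
        # One layer of artificial neurons: linear combination + nonlinearity (ReLU)
        return np.maximum(0, x @ w + b)

    rng = np.random.RandomState(0)
    x = rng.normal(size=(1, 64))                               # a toy input (e.g. pixels)
    w1, b1 = rng.normal(size=(64, 32)) * 0.1, np.zeros(32)     # would learn "edges"
    w2, b2 = rng.normal(size=(32, 16)) * 0.1, np.zeros(16)     # would learn "shapes"
    w3, b3 = rng.normal(size=(16, 8)) * 0.1, np.zeros(8)       # would learn "objects"

    h1 = layer(x, w1, b1)    # simple features from the raw input
    h2 = layer(h1, w2, b2)   # combinations of simple features
    h3 = layer(h2, w3, b3)   # higher-level combinations of those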

Page 46: Statistical learning intro

Deep Learning - 2

• This approach is inspired by how scientists believe that humans learn
  – The algorithm didn’t know the word “cat” (Ng had to supply that), but over time it learned to identify the furry creatures we know as cats, all on its own
  – As babies, we watch our environments and start to understand the structure of the objects we encounter, but until a parent tells us what something is, we can’t put a name to it

• Building High-level Features Using Large Scale Unsupervised Learning

The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI, Wired
Building High-level Features Using Large Scale Unsupervised Learning, Q. V. Le et al.

Page 47: Statistical learning intro

References

Coursera Course, Machine Learning by Andrew Ng (Stanford)