Transcript of "Recent Advances of Machine Learning" (source: ranking.uos.ac.kr/material/presentation/overview.pdf)

Page 1

Recent Advances of Machine Learning

Jong-June Jeon

University of Seoul

September 7, 2017

Page 2

Learning based on data

Regularization method

Classification and deep learning

Page 3

Learning based on data

Page 4

Regression: a statistical method for learning the relation between two or more variables

Figure: Scatter plots of paired data

Page 5

Regression: a statistical method for learning the relation between two or more variables

Figure: Which one is better?

Page 6

Conventional interest in regression models

Is the detected signal statistically significant?

If the true model is linear, how can I measure the uncertainty of the estimated model?

What is the lower bound of the asymptotic variance of the estimated models?

What is the most efficient estimation method?

How can I select the true model under large samples?

Inference!

Page 7

Prediction

If we are only interested in prediction, there is another story.

Inference is out of interest (e.g. p-value, R², selection consistency).

Under the circumstances of accumulating data, the learned (estimated) model depends on the observed data.

Predictive performance should be assessed on future data.

The goal is to improve prediction accuracy!

Page 8

Change of paradigm

Past: data was a scarce resource. Once data was observed, the model was estimated based on that observed data.

Now: data is no longer a scarce resource. Whenever data is observed, the model is learned (estimated) based on the data.

Page 9

Linear model

Explanatory variable: x ∈ R^p

Response variable: y ∈ R

Linear predictor: y = x'β

β = (β1, · · · , βp)' ∈ R^p is called the regression coefficient.

Page 10

Linear model: estimation

Let the training set be Tr = {(yi, xi) : 1 ≤ i ≤ n}.

The least squares method is given by the minimizer of RSS(β), where

RSS(β) = ∑_{i=1}^n (yi − xi'β)²

This estimator is called the least squares estimator (LSE).
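As a sketch, the LSE can be computed with NumPy. The data below are hypothetical, simulated from a known linear model for illustration:

```python
import numpy as np

# Simulate a hypothetical linear model y_i = x_i' beta + eps_i.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# The LSE minimizes RSS(beta) = sum_i (y_i - x_i' beta)^2;
# lstsq solves the normal equations in a numerically stable way.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```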

Page 11

Figure: Estimated predictor: Y = 0.5274 + 1.0212X

Page 12

Linear model: estimation

Generative model assumption:

yi = xi'β* + εi,  εi ~iid (0, σ²)

If the true model is linear, it is known that the LSE β̂ converges to the true parameter in probability.

√n(β̂ − β*) converges weakly to a Gaussian distribution.

Page 13

Linear model: estimation

Note that the uncertainty of β̂ depends on the training set Tr.

Let x0 be an evaluation point; then the linear predictor is x0'β̂, and the risk defined through the l2 loss function is given by E_F(Y − x0'β̂)², where Y ~ F (the conditional distribution given X = x0).

Generally, the risk of the predicted model is given by

R(β̂) = E_F(Y − X'β̂)²

where (Y, X) ~ F.

Page 14

Linear model: estimation

Model bias: the discrepancy between the true model and the expectation of the estimated model.

Suppose that E(Y|X = x) ≠ x'β for every β ∈ R^p.

For example,

Yi = 3 + Xi² + εi where εi ~ N(0, 1)

Note that E(Yi|Xi = x) = 3 + x².

Try to fit a model in the linear model space.
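A small simulation illustrates this model bias: fitting a straight line to hypothetical data generated from 3 + x² recovers the best linear approximation, not the true curve. (The data and design range here are assumptions for illustration.)

```python
import numpy as np

# Hypothetical data from the quadratic model Y = 3 + X^2 + eps, eps ~ N(0, 1).
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-2, 2, size=n)
y = 3 + x**2 + rng.normal(size=n)

# Least squares fit in the linear model space {a + b*x}.
A = np.column_stack([np.ones(n), x])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)

# On a symmetric design the best linear fit has slope ~ 0 and intercept
# ~ 3 + E(X^2) = 3 + 4/3; the remaining gap to 3 + x^2 is the model bias.
print(a_hat, b_hat)
```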

Page 15

Linear model: estimation

Figure: Blue line denotes the estimated model in the linear model space

Page 16

Learning based on algorithm

K-nearest neighbors (K-NN) algorithm

Let (yi, xi) for i = 1, · · · , n be training samples.

Nk(x): the index set of the k samples closest to x.

The predictor is given by

Ŷ(x) = (1/k) ∑_{xi∈Nk(x)} yi
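A minimal sketch of this predictor on one-dimensional toy data (the helper name `knn_predict` and the data values are chosen here, not from the slides):

```python
import numpy as np

def knn_predict(x0, X, y, k):
    # Y_hat(x0): average the responses of the k training points nearest to x0.
    idx = np.argsort(np.abs(X - x0))[:k]   # indices of the k closest samples
    return y[idx].mean()

# Hypothetical training data with two clusters of x values.
X = np.array([0.0, 0.1, 0.2, 0.9, 1.0, 1.1])
y = np.array([1.0, 1.2, 0.8, 3.0, 3.2, 2.8])
print(knn_predict(0.05, X, y, k=3))   # averages the three leftmost responses
```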

Page 19

10-NN algorithm

Ŷ(0) = 2.446989

Page 20

10-NN algorithm

Ŷ(1) = 4.024909

Page 21

10-NN algorithm

Ŷ(−1) = 4.314436

Page 22

10-NN algorithm

Page 23

Linear model in classification problem

Figure: x = (x1, x2)' ∈ R²

Page 24

Linear model in classification problem

Page 25

Linear model in classification problem

Linear predictor: y = 0.62223 − 0.03565x1 − 0.05486x2

Decision boundary: {(x1, x2) ∈ R² : 0.62223 − 0.03565x1 − 0.05486x2 = 0.5}

Page 26

Linear model in classification problem

Figure: the decision boundary is given by a hyperplane

Page 27

Linear model in classification problem

Page 28

K-NN in the classification problem

Ŷ(x) = 1 if (1/k) ∑_{xi∈Nk(x)} yi ≥ 1/2, and 0 otherwise.
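A minimal sketch of this rule in two dimensions (the helper name `knn_classify` and the data points are hypothetical):

```python
import numpy as np

def knn_classify(x0, X, y, k):
    # Predict 1 when the local average of the 0/1 labels is at least 1/2.
    dist = np.linalg.norm(X - x0, axis=1)   # Euclidean distances to x0
    idx = np.argsort(dist)[:k]              # the k nearest neighbours
    return 1 if y[idx].mean() >= 0.5 else 0

# Hypothetical labelled points: a cluster of 1s near (0, 2), 0s near (3, 0).
X = np.array([[0.0, 2.0], [0.1, 2.1], [0.0, 1.9], [3.0, 0.0], [3.1, 0.1]])
y = np.array([1, 1, 1, 0, 0])
print(knn_classify(np.array([0.0, 2.0]), X, y, k=3))
```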

Page 30

K-NN in the classification problem

(1/k) ∑_{xi∈Nk(x)} yi = 9/11 ≥ 1/2 ⇒ Ŷ((0, 2)) = 1

Page 32

K-NN in the classification problem

(1/k) ∑_{xi∈Nk(x)} yi = 5/11 < 1/2 ⇒ Ŷ((0, 0)) = 0

Page 33

K-NN in the classification problem

A non-linear decision boundary is obtained by the K-NN algorithm.

Page 34

K-NN in the classification problem

Page 35

Model complexity

The selection of K is crucial to obtain the best predictive performance.

Page 36–39

Model complexity (figure-only slides)

Page 40

Idea of K-NN algorithm

The K-NN algorithm can produce a nonlinear model easily.

The idea of the K-NN algorithm is local approximation.

Ŷ(x) = (1/k) ∑_{xi∈Nk(x)} yi

For small K, fewer data points are used to estimate the local mean, so the model becomes more complex and the variance of the local mean increases.

For large K, vice versa.

Page 41

Bias-Variance trade-off

Prediction error = (Model bias)² + Model variance

Complex model: large variance and small bias

Simple model: small variance and large bias
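The trade-off can be illustrated with a small Monte Carlo sketch using K-NN regression at a single point (the model y = x² + noise, the sample sizes, and the helper name `knn_at_zero` are assumptions for illustration):

```python
import numpy as np

# Monte Carlo sketch of the bias-variance trade-off for K-NN at x0 = 0
# under the hypothetical model y = x^2 + eps, so E(Y | X = 0) = 0.
rng = np.random.default_rng(2)

def knn_at_zero(k, n=100):
    x = rng.uniform(-1, 1, size=n)
    y = x**2 + 0.3 * rng.normal(size=n)
    idx = np.argsort(np.abs(x))[:k]          # k nearest neighbours of 0
    return y[idx].mean()

results = {}
for k in (1, 50):
    preds = np.array([knn_at_zero(k) for _ in range(500)])
    # (squared bias, variance) of the K-NN estimate over repeated samples
    results[k] = ((preds.mean() - 0.0) ** 2, preds.var())
print(results)
```

Small K gives large variance and small bias; large K gives small variance and larger bias, matching the slide.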

Page 42

Selecting a model with moderate complexity is required to achieve the best predictive model under a restricted training sample.

Page 43

Ensemble method: a variance-reduction method that combines predicted models (weak learners).

We will see examples of the ensemble method applied to tree models.

Page 44

Tree model

A learning algorithm that splits the input space into regions

Page 45

Figure: data and the underlying function

Page 46

Single regression tree

Page 47

10 regression trees using randomly sampled data

Page 48

Average of the trees

Page 49

A hard problem for a classification tree

Page 50

single tree

Page 51

25 averaged trees

Page 52

25 voted trees

Page 53

Bagging: Bootstrap Aggregating

Bootstrapping: re-sample the data B times to obtain Tr^(b).

Learn a model f̂b using Tr^(b) for each b = 1, · · · , B.

Aggregating:

Regression: f̂bag(x) = (1/B) ∑_{b=1}^B f̂b(x)

Classification: f̂bag(x) = Voting{f̂b(x) : 1 ≤ b ≤ B}
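The two steps can be sketched as follows for regression; the choice of a one-split decision stump as the weak learner, the helper names, and the step-function data are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_stump(x, y):
    # One-split regression tree: choose the split s minimizing the RSS.
    best = None
    for s in np.unique(x):
        left, right = y[x <= s], y[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        rss = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or rss < best[0]:
            best = (rss, s, left.mean(), right.mean())
    _, s, ml, mr = best
    return lambda t: np.where(t <= s, ml, mr)

def bagging_predict(x, y, x_eval, B=25):
    n = len(x)
    preds = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)            # bootstrap resample T_r^(b)
        preds.append(fit_stump(x[idx], y[idx])(x_eval))
    return np.mean(preds, axis=0)                   # aggregate by averaging

# Hypothetical data: a noisy step function.
x = np.linspace(0, 1, 50)
y = (x > 0.5).astype(float) + 0.1 * rng.normal(size=50)
pred = bagging_predict(x, y, np.array([0.1, 0.9]))
print(pred)
```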

Page 54

Theoretical background of bagging

Variance reduction

Bias does not change

Note that "Prediction error = Bias² + Variance".

Page 55

Linear model

Why did so many statisticians and mathematicians study the linear model?

Page 56

High dimensional linear model

Suppose that Yi = f(Zi) + εi, where f is a smooth function.

For sufficiently large p, f(z) ≈ β0 + ∑_{j=1}^p βj z^j and

E(Yi|Zi) ≈ β0 + ∑_{j=1}^p βj Xij

where Xij = Zi^j.

Let Xi = (Xi1, · · · , Xip)' ∈ R^p and Yi ∈ R.

Linear model:

Yi = β0 + ∑_{j=1}^p βj Xij + εi

Page 57

High dimensional linear model

Even for a non-smooth function f,

E(Yi|Xi) ≈ β0 + ∑_{j=1}^p βj Bj(Xi)

by an appropriate selection of basis functions Bj.

How can we control model complexity in the high dimensional linear model?

Page 58

Regularization method

Page 59

Statistical learning by empirical risk minimization

zi = (yi, xi')' ~iid P

yi = g(xi) + εi for i = 1, · · · , n, for g ∈ G

Loss function: l : Z × G → R+

l2 loss function: l(zi, g) = (yi − g(xi))²

Risk function: R(g) = E_P l(z, g)

Empirical risk function: Rn(g) = ∑_{i=1}^n l(zi, g)/n

Statistical learning:

ĝ = argmin_{g∈G} Rn(g)

Page 60

Visualization of bias-variance trade-off

The regularization method utilizes the bias-variance trade-off by restricting the model space.

Page 61

Regularization

ĝ = argmin_{g∈G} Rn(g) subject to J(g) ≤ C,

where J : G → R+ is a penalty (regularization) function.

We can write the above optimization problem as follows:

ĝ = argmin_g Rn(g) + λJ(g)

for some λ ≥ 0.

Page 62

LSE

(yi, xi) for i = 1, · · · , n: pairs of response and explanatory variables (yi ∈ R and xi ∈ R^p)

β = (β1, · · · , βp)'

yi = xi'β + εi (εi ~iid (0, σ²))

LSE:

β̂ = argmin_β ∑_{i=1}^n (yi − xi'β)²

Page 63

LSE

The LSE is the best linear unbiased estimator (BLUE).

When p > n, the LSE is not unique.

When n ≈ p, the variance of the LSE is large.

Page 64

Ridge estimator

β̂λ = argmin_β ∑_{i=1}^n (yi − xi'β)² + λ ∑_{j=1}^p βj²  (the second term is the penalty function)

for λ > 0.

The ridge estimator always exists.

There exists a λ > 0 such that the ridge estimator corresponding to that λ has better predictive performance than the LSE.

When p > n, the consistency of β̂λ (convergence to the true parameter) is not guaranteed.
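One standard way to compute the ridge estimator is the closed form (X'X + λI)⁻¹X'y, which I use in this sketch; the data are hypothetical, with p > n so that the LSE is not unique but the ridge estimator still exists:

```python
import numpy as np

# Hypothetical high dimensional data: p = 50 coefficients, only n = 20 samples.
rng = np.random.default_rng(4)
n, p = 20, 50
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

def ridge(lam):
    # X'X + lam*I is positive definite for lam > 0, so the solve succeeds
    # even though X'X alone is singular here (rank <= n < p).
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_1 = ridge(1.0)
beta_100 = ridge(100.0)
# A larger lambda shrinks the estimator toward zero.
print(np.linalg.norm(beta_1), np.linalg.norm(beta_100))
```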

Page 65

LASSO estimator

β̂λ = argmin_β ∑_{i=1}^n (yi − xi'β)² + λ ∑_{j=1}^p |βj|  (the second term is the penalty function)

for λ > 0.

In the high dimensional problem (p ≈ exp(n)), the LASSO works well.

Under regularity conditions, the lasso estimator achieves the minimax optimal error bound.

But strong model conditions are required for selection consistency.
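One way to compute the LASSO is proximal gradient descent (ISTA) with soft-thresholding; this solver choice is an assumption, not from the slides, and the sparse data below are hypothetical:

```python
import numpy as np

# Hypothetical sparse regression: only the first 3 of 20 coefficients are nonzero.
rng = np.random.default_rng(5)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

# ISTA minimizes (1/2)||y - X beta||^2 + lam * ||beta||_1
# (the slide's objective with lambda rescaled by a factor of 2).
lam = 10.0
step = 1.0 / np.linalg.norm(X.T @ X, 2)     # 1/L, L = largest eigenvalue of X'X
beta = np.zeros(p)
for _ in range(500):
    grad = X.T @ (X @ beta - y)             # gradient of the smooth RSS term
    z = beta - step * grad
    # soft-thresholding: the proximal operator of the l1 penalty,
    # which sets small coefficients exactly to zero (sparsity)
    beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print(np.round(beta, 2))
```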

Page 66

Regularized estimator with nonconvex penalty

β̂λ = argmin_β ∑_{i=1}^n (yi − xi'β)² + ∑_{j=1}^p Gλ(|βj|)  (the second term is the penalty function)

for λ > 0.

When the signals are large (> O(1/√n)), the method can select the signal variables well (the oracle property).

But strong model conditions are required for minimax optimality.

Page 67

The regularization method provides a useful view of machine learning applications.

Page 68

Sparsity

From the non-differentiable points in the model constraints, we know that there is positive probability that the estimated model has exactly zero coefficients.

By introducing various types of constraints with non-differentiable points, structural learning is possible.

Page 69

Structural sparse modeling 1

yi = µi + εi for i = 1, · · · , n.

µ̂ = (µ̂1, · · · , µ̂n):

(µ̂1, · · · , µ̂n) = argmin_µ ∑_{i=1}^n (yi − µi)² + λ ∑_{j=1}^{n−1} |µ_{j+1} − µj|  (the second term is the penalty function)

Note that

λ = ∞: µ̂j = ȳ for all j.

λ = 0: µ̂j = yj for all j.

λ controls the number of change points.

Page 70

Signal approximator

Model:

yi = βi + εi

where εi ~iid N(0, σ²) for i = 1, · · · , n.

Figure: Signal approximator

Page 71

Structural sparse modeling 2

yi = x′iβ + εi for i = 1, · · · , n.

We obtain the following estimator:

β̂ = argmin_β ∑_{i=1}^n (yi − xi'β)² + λ ∑_{j≠k} |βj − βk|

This yields clustering of the estimated coefficients.

Page 72

Map of estimated trends in extreme precipitation

Page 73

Classification and deep learning

Page 74

Binary Classification

Response variable: y ∈ Y = {−1, 1}

Explanatory variable: x ∈ X

Classification function: C : X → Y

e.g.) Let y be a variable denoting disease or normal, and let x be a vector of diagnostic results. A doctor is a classification function mapping x to Y.

Page 75

Let P be the distribution of (y, x).

Misclassification error of C on the population:

P(C(x) ≠ y) = P(C(x) = −1, y = 1) + P(C(x) = 1, y = −1)

Bayes error:

min_C P(C(x) ≠ y)

Bayes classifier:

C* = argmin_C P(C(x) ≠ y)

Page 76

Note that the Bayes classifier is given by

C*(x) = 1 if P(y = 1|x) ≥ 0.5, and −1 otherwise.
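A minimal sketch, assuming the posterior probability P(y = 1|x) is known exactly; the logistic form of the posterior below is a hypothetical choice for illustration:

```python
import math

def p_y1_given_x(x):
    # Hypothetical known posterior probability: a logistic curve in x.
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def bayes_classifier(x):
    # The Bayes rule: predict 1 exactly when P(y = 1 | x) >= 0.5.
    return 1 if p_y1_given_x(x) >= 0.5 else -1

print(bayes_classifier(1.0), bayes_classifier(-1.0))  # → 1 -1
```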

Page 77

We assume that (yi,xi) for i = 1, · · · , n are iid random samples.

min_C ∑_{i=1}^n I(C(xi) ≠ yi)/n

The final goal is to estimate a classifier which minimizes the classification error.

The function class of C is too complex, so we adopt an alternative method to construct a classifier.

Page 78

Scoring function

f : X → R: a function that assigns a score to the observation with covariate x. Using this score, we construct a classifier as follows:

C(x; f) = 1 if f(x) ≥ 0, and −1 otherwise.

Page 79

Estimation and surrogate loss function

It is reasonable to find f minimizing

∑_{i=1}^n I(C(xi; f) ≠ yi)/n,

which is equivalently written as ∑_{i=1}^n I(yi f(xi) < 0)/n. Here yi f(xi) is called the margin.

cf) I(x > 0) = 1 if x > 0, and I(x > 0) = 0 otherwise.

Page 80

Convex surrogate loss function

However, this task requires heavy computation, so we cannot use this method directly. The complexity comes from the 0-1 loss function. We replace the 0-1 loss function with another convex loss function, called the surrogate loss function.

Page 81

Surrogate loss function

Page 82

Surrogate loss function

Finally, we minimize the surrogate risk function

L(f; φ) = ∑_{i=1}^n φ(yi f(xi))/n

The estimator in the classification problem is given by

f̂ = argmin_f L(f; φ)

Page 83

If the region f(x) = 0 is similar to the region P(Y = 1|x) = 0.5, then f(x) gives an approximate Bayes classifier:

C(x) = 1 if f(x) ≥ 0, and C(x) = −1 otherwise.

If f(x) is linear, the decision boundary f(x) = 0 is a hyperplane. If the Bayes classifier is not linear, a model bias in classification always exists.


Example: logistic regression

linear model: f(x) = x′β

logistic loss: φ(yf(x)) = log(1 + exp(−yf(x)))

estimator: argmin_β ∑_{i=1}^{n} φ(y_i x_i′β)

The estimator is equal to the maximum likelihood estimator in the logistic regression model.
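A sketch of this estimator (made-up data, illustrative learning rate): plain gradient descent on the empirical surrogate risk with the logistic loss.

```python
import numpy as np

# Hypothetical data generated from a logistic model with beta = (2, -1, 0.5)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.where(X @ np.array([2.0, -1.0, 0.5]) + rng.logistic(size=200) >= 0, 1, -1)

def risk(beta):
    # (1/n) sum_i phi(y_i x_i'beta) with phi(z) = log(1 + exp(-z))
    return np.mean(np.log1p(np.exp(-y * (X @ beta))))

beta = np.zeros(3)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(y * (X @ beta)))        # -phi'(y_i x_i'beta)
    grad = -np.mean((y * p)[:, None] * X, axis=0)   # gradient of the risk
    beta -= 0.5 * grad

print(risk(np.zeros(3)), risk(beta))                # risk drops below log 2
```

The minimizer coincides with the maximum likelihood estimator because φ is exactly the per-observation negative log-likelihood of the logistic model.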


How can we produce a complex model f?


Estimation of non-linear classifiers

Additive models: assume that f is an additive model of trees (stumps) or other simple classifiers. When φ(z) = exp(−z) and a ridge penalty (see regularization) is applied, the classification method is called AdaBoost.

Feature mapping: assume that the input space is projected onto a feature space. When φ(z) = (1 − z)_+ and a ridge penalty (see regularization) is applied, the classification method is called the support vector machine.
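The hinge-loss case can be sketched in its simplest primal form (made-up linearly separable data, hypothetical penalty weight and learning rate); this is a simplified stand-in for the SVM, which in practice is usually solved through the dual problem with kernels.

```python
import numpy as np

# Hypothetical data separable by the direction (1, -1)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] - X[:, 1] >= 0, 1, -1)

lam = 0.01                 # ridge penalty weight (illustrative choice)
beta = np.zeros(2)

def objective(b):
    # (1/n) sum_i (1 - y_i x_i'b)_+  +  lam * ||b||^2
    return np.mean(np.maximum(0.0, 1.0 - y * (X @ b))) + lam * np.sum(b ** 2)

for _ in range(300):
    margins = y * (X @ beta)
    active = margins < 1                              # where (1 - z)_+ has slope -1
    grad = -np.mean((y * active)[:, None] * X, axis=0) + 2 * lam * beta
    beta -= 0.1 * grad                                # subgradient step

print(objective(np.zeros(2)), objective(beta))
```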


Neural Network for classification

Composition

Input data is mapped onto a linear space.
Its image is transformed by a non-linear activation function (sigmoid, tanh, ReLU, ...).
The transformed data is mapped onto a linear space again.
· · ·
A feature of the input data is obtained by the above compositions.
Apply a conventional classification method.
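The steps above can be sketched with random (untrained) weight matrices; the shapes and the tanh activation are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def layer(W):
    # one composition step: a linear map followed by a non-linear activation
    return lambda h: np.tanh(h @ W)

x = rng.normal(size=(5, 4))                       # 5 inputs with 4 raw features
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))

feature = layer(W2)(layer(W1)(x))                 # h2(h1(x)): the composed feature
scores = feature @ rng.normal(size=3)             # conventional linear classifier on top
labels = np.where(scores >= 0, 1, -1)
print(labels)
```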


Neural Network for classification

Figure: Visualization of neural network


Neural Network with l2 loss for classification

Here we consider a single-layer neural network without a bias term for notational simplicity. The objective function is given by

L(β, α) = (1/n) ∑_{i=1}^{n} (1 − y_i ∑_{j=1}^{k} β_j σ(x_i′α_j))²,

where σ is the sigmoid function, β = (β_1, · · · , β_k) and α = (α_1, · · · , α_k).


Neural Network for classification

argmin_{β,α} L(β, α) = argmin_{β,α} ∑_{i=1}^{n} (1 − y_i ∑_{j=1}^{k} β_j σ(x_i′α_j))²

Note that the score function is given by f(x) = ∑_{j=1}^{k} β_j σ(x′α_j) and the surrogate loss function is φ(z) = (1 − z)².
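A minimal sketch of fitting this single-layer network by gradient descent on the squared surrogate (made-up XOR-like data; the width k = 8, learning rate, and iteration count are illustrative):

```python
import numpy as np

# Hypothetical non-linear target: y = sign(x1 * x2)
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] * X[:, 1] >= 0, 1, -1)

k = 8                                   # number of hidden units
alpha = rng.normal(size=(2, k))
beta = 0.1 * rng.normal(size=k)

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def risk():
    f = sigma(X @ alpha) @ beta         # f(x_i) = sum_j beta_j sigma(x_i'alpha_j)
    return np.mean((1.0 - y * f) ** 2)  # (1/n) sum_i (1 - y_i f(x_i))^2

risk_init = risk()
for _ in range(2000):
    H = sigma(X @ alpha)                              # n x k hidden activations
    r = -2.0 * y * (1.0 - y * (H @ beta)) / len(y)    # d risk / d f_i
    g_beta = H.T @ r                                  # gradient in beta
    g_alpha = X.T @ (np.outer(r, beta) * H * (1.0 - H))  # back-prop through sigma
    beta -= 0.05 * g_beta
    alpha -= 0.05 * g_alpha

print(risk_init, risk())
```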


It is known that all of these methods (additive models, feature mapping, composition) can achieve the ideal decision boundary as the number of data points goes to infinity.

The natural question is then: "what is the more efficient way to construct a complex model?" That is, which method requires fewer parameters to construct a nearly optimal decision function?

The answer is simple: it depends on the true model. But many difficult problems have been solved by deep neural networks (image and video data analysis). ⇒ High-order composition works! (deep learning)


Deep neural network

Construct a nonlinear model by compositions:

f(x) = h_k ∘ h_{k−1} ∘ · · · ∘ h_1(x)

Regularization (tuning the estimated model)

scheduling the learning rate
selection of the momentum parameter
drop-out rate
number of layers (depth)
· · ·

Computational issues

the number of parameters is very large (10^7 ∼)
development of parallel optimization algorithms
GPGPU computing
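The compositional construction can be sketched as follows (the layer sizes and tanh activation are illustrative; real deep networks have vastly larger parameter counts):

```python
import numpy as np
from functools import reduce

# f(x) = h_k o ... o h_1(x), each h_j a linear map followed by an activation
rng = np.random.default_rng(5)
dims = [4, 16, 16, 8, 1]                         # input dim, hidden widths, output dim

weights = [rng.normal(size=(d_in, d_out)) * d_in ** -0.5
           for d_in, d_out in zip(dims[:-1], dims[1:])]

def f(x):
    # compose h_1, ..., h_k from the inside out
    return reduce(lambda h, W: np.tanh(h @ W), weights, x)

x = rng.normal(size=(3, 4))
print(f(x).shape)                                # (3, 1)

# the parameter count grows with depth and width
n_params = sum(W.size for W in weights)
print(n_params)
```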


Recent advances in machine learning answer the question, "how can we efficiently construct a model space that fits the considered data well?"

Three approaches are competing: additive models, high-order feature mapping, and compositions.

Successes in engineering fields tell us that the complex model space induced by compositions is useful in many areas.

But regularizing the model is still a crucial task for selecting the best model.


Thank you
