
Advanced Lecture on

Neural Information Processing Systems

(Lecture 03)

Ichiro Takeuchi

Nagoya Institute of Technology


Nonlinear modeling

Consider training a model for the relationship between the elapsed time after a collision ($x$) and the passenger's head acceleration ($y$).


Linear modeling is not helpful here


We want something like this

[Figure: a smooth nonlinear fit to the collision data; x-axis: Time [ms], 0 to 60; y-axis: Acceleration [G], -150 to 100]

Which nonlinear model should we use?

Consider the single-input case $x \in \mathbb{R}$, $y \in \mathbb{R}$:

▶ $y = w_1 \log x$
▶ $y = w_1 \sqrt{x} + w_2 \exp(-x^2)$
▶ $y = w_1 \cos 2\pi x + w_2 \sin 2\pi x^2 + w_3 \frac{1}{x}$
▶ $y = \log(w_1 + w_2 x)$
▶ $y = w_1 + x \exp(-w_2 x^2)$
▶ $y = \sin 2\pi (w_1 + w_2 x) + \cos 2\pi (w_3 + w_4 x)$

What is the difference between the first 3 and the latter 3 models?


Basis function approach

▶ For the single-input case, i.e., when $x \in \mathbb{R}$, the basis function model is written as

$$y = f(x) = w_0 + w_1 h_1(x) + w_2 h_2(x) + \dots + w_q h_q(x),$$

where $h_k$, $k = 1, \dots, q$, are the basis functions.

▶ How can we estimate the parameters $w_0, w_1, \dots, w_q$ by the least squares method?

$$\min_{w_0 \in \mathbb{R},\, \boldsymbol{w} \in \mathbb{R}^q} \; \sum_{i=1}^{n} \left( y_i - \Big( w_0 + \sum_{j=1}^{q} w_j h_j(x_i) \Big) \right)^2$$


Basis function approach as linear models

▶ Original training set:

$$X_{n \times 1} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \boldsymbol{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

▶ Expanded training set:

$$X_{n \times (q+1)} = \begin{pmatrix} 1 & h_1(x_1) & h_2(x_1) & \cdots & h_q(x_1) \\ 1 & h_1(x_2) & h_2(x_2) & \cdots & h_q(x_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & h_1(x_n) & h_2(x_n) & \cdots & h_q(x_n) \end{pmatrix}, \quad \boldsymbol{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$
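As a concrete illustration, here is a minimal sketch of building the expanded design matrix and fitting the weights by least squares. This is hypothetical NumPy code, not from the lecture; the particular basis functions and the toy data are assumptions for illustration only.

    import numpy as np

    # hypothetical basis functions h_1, h_2, h_3 (any fixed nonlinear functions work)
    basis = [np.sin, np.cos, np.square]

    def expand(x):
        """Expanded design matrix [1, h_1(x_i), ..., h_q(x_i)] of shape n x (q+1)."""
        return np.column_stack([np.ones_like(x)] + [h(x) for h in basis])

    # toy data (assumed, for illustration)
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 50)
    y = np.sin(x) + 0.5 * x**2 + 0.1 * rng.standard_normal(50)

    X = expand(x)                              # the "expanded training set"
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimate of (w_0, ..., w_q)
    y_hat = X @ w                              # fitted values

Because the expansion is fixed before fitting, the problem stays linear in the weights, and ordinary least squares applies unchanged.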


Basis function approach and linear model

▶ Basis function approach:

$$y = f(x) = w_0 \cdot 1 + w_1 h_1(x) + w_2 h_2(x) + \dots + w_q h_q(x)$$

▶ Linear regression with multiple inputs:

$$y = f(x) = w_0 \cdot 1 + w_1 x_1 + w_2 x_2 + \dots + w_q x_q$$


Which basis functions should we use?

▶ Radial basis function:

$$h_k(x) = \exp\left( -\frac{(x - c_k)^2}{2\sigma^2} \right)$$

[Figure: basis function values $h_k(x)$ plotted against the input $x \in [0, 100]$]

How to determine $q$, $\{c_k\}_{k=1}^q$, $\sigma^2$ in RBF

▶ Approach 1:
  ▶ $q \leftarrow n$
  ▶ $c_k \leftarrow x_k$, $k = 1, \dots, q$
  ▶ $\sigma^2 \leftarrow$ cross validation (explained later)

[Figure: the resulting basis functions $h_k(x)$, one centered at each training input, over $x \in [0, 100]$]

How to determine $q$, $\{c_k\}_{k=1}^q$, $\sigma^2$ in RBF

▶ Approach 2 (see the sketch below):
  ▶ $q \leftarrow$ cross validation
  ▶ $c_k \leftarrow \left(\frac{k}{q}\right)$-th quantile of $\{x_i\}_{i=1}^n$
  ▶ $\sigma^2 \leftarrow$ cross validation

[Figure: $q$ basis functions $h_k(x)$ with centers at empirical quantiles of the inputs, over $x \in [0, 100]$]
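A minimal sketch of Approach 2 in hypothetical NumPy code (not from the lecture): centers are placed at empirical quantiles of the inputs, and $q$ and $\sigma$ are then chosen by validation error (cross-validation is covered a few slides below). The toy data and candidate grids are assumptions for illustration.

    import numpy as np

    def rbf_design(x, centers, sigma):
        """Design matrix [1, h_1(x), ..., h_q(x)] with h_k(x) = exp(-(x - c_k)^2 / (2 sigma^2))."""
        H = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))
        return np.hstack([np.ones((len(x), 1)), H])

    def fit_rbf(x, y, q, sigma):
        centers = np.quantile(x, [(k + 1) / q for k in range(q)])  # quantile-based centers
        w, *_ = np.linalg.lstsq(rbf_design(x, centers, sigma), y, rcond=None)
        return w, centers

    # toy data, split into training and validation parts (assumed)
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 100, 80)
    y = np.sin(x / 10.0) + 0.2 * rng.standard_normal(80)
    x_tr, y_tr, x_va, y_va = x[:60], y[:60], x[60:], y[60:]

    best, best_err = None, np.inf
    for q in [2, 5, 10, 20]:                  # candidate numbers of basis functions
        for sigma in [2.0, 5.0, 10.0]:        # candidate bandwidths
            w, centers = fit_rbf(x_tr, y_tr, q, sigma)
            err = np.mean((y_va - rbf_design(x_va, centers, sigma) @ w) ** 2)
            if err < best_err:
                best, best_err = (q, sigma), err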


RBF Approach for Collision Data

▶ If we select good hyper-parameters $(q, \{c_k\}_{k=1}^q, \sigma^2)$:

[Figure: RBF fit to the collision data; x-axis: Time [ms], 0 to 60; y-axis: Acceleration [G], -150 to 100]

Overfitting

▶ If we do not select good hyper-parameters $(q, \{c_k\}_{k=1}^q, \sigma^2)$, the fitted curve follows the noise in the training data.

Simulation Example for RBF

[Figure: four panels comparing the true function ("Truth") and the estimated RBF fit ("Estimated") over $x \in [-1, 1]$, $y \in [-2, 2]$, for $q = 1$, $q = 10$, $q = 20$, and $q = 50$]

Training Error and True Error

[Figure: training error and true error plotted against the number of basis functions $q \in \{1, 2, 4, 5, 10, 20, 40, 50\}$]

High dimensional problem

E.g., gene expression microarray data:

▶ $x_{ij}$: activity of the $j$-th gene for the $i$-th patient
▶ $y_i$: effectiveness of a medicine for the $i$-th patient

$$y_i = f(\boldsymbol{x}_i) = w_0 + w_1 x_{i1} + \dots + w_{10000} x_{i,10000}$$

Here the number of parameters ($d = 10{,}000$) far exceeds the typical number of patients $n$, so the model can fit the training data perfectly while generalizing poorly.
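A minimal numeric sketch (hypothetical NumPy code, not from the lecture) of why $n \ll d$ is dangerous: with more features than samples, least squares can drive the training error to zero even when the targets are pure noise.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                       # far more features than samples
    X = rng.standard_normal((n, d))      # random "gene activities" (assumed toy data)
    y = rng.standard_normal(n)           # pure noise: there is no real signal

    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm least squares solution
    print(np.max(np.abs(y - X @ w)))     # ~1e-14: training error is numerically zero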


How to avoid overfitting: Regularization

$$\min_{\boldsymbol{w} \in \mathbb{R}^d} \; \sum_{i=1}^{n} \left( y_i - \boldsymbol{w}^\top \boldsymbol{x}_i \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{d} w_j^2 \leq s$$

Shrinking the constraint radius $s$ restricts the model's flexibility; this constrained problem is equivalent to the penalized form on the next slide, with each $s$ corresponding to some $\lambda$.


Ridge regression

$$\boldsymbol{w}^*_\lambda = \arg\min_{\boldsymbol{w} \in \mathbb{R}^d} \; \sum_{i=1}^{n} \left( y_i - \boldsymbol{w}^\top \boldsymbol{x}_i \right)^2 + \lambda \sum_{j=1}^{d} w_j^2,$$

where $\lambda > 0$ is the regularization parameter.


Simulation Example for Ridge Regression

[Figure: four panels comparing the true function ("Truth") and the ridge-regularized RBF fit ("Estimated") with $q = 50$, for $\lambda = 0$, $\lambda = 1.0$, $\lambda = 10$, and $\lambda = 100$]

Solving Ridge regression

▶ Training data:

$$X_{n \times d} := \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nd} \end{pmatrix} = \begin{pmatrix} \boldsymbol{x}_1^\top \\ \boldsymbol{x}_2^\top \\ \vdots \\ \boldsymbol{x}_n^\top \end{pmatrix}, \quad \boldsymbol{y}_{n \times 1} := \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$

▶ Solution:

$$\boldsymbol{w}^*_\lambda = (X^\top X + \lambda I)^{-1} X^\top \boldsymbol{y}$$
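A minimal sketch of the closed-form solution in hypothetical NumPy code (not from the lecture); np.linalg.solve is used rather than an explicit matrix inverse, which is the numerically preferable way to evaluate this formula.

    import numpy as np

    def ridge(X, y, lam):
        """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # with lam > 0, X^T X + lam I is always invertible, even when d > n
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 100))   # assumed toy data with n << d
    y = rng.standard_normal(20)
    w = ridge(X, y, lam=1.0)             # unique, stable solution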


Model selection

▶ Example: how to select the regularization parameter $\lambda$.

▶ The training error cannot be used for model selection because it cannot detect overfitting (as we will see).


Training and validation data

[Figure: the simulated data over $x \in [-1, 1]$, split into two groups; •: training data, •: validation data]

Training and validation data

[Figure: training error, true error, and validation error plotted against the number of basis functions $q \in \{1, 2, 4, 5, 10, 20, 40, 50\}$]

▶ The training error decreases monotonically as the model becomes more flexible.

▶ The validation error can be used as a proxy for the true error.


Cross-validation

[Figure: 5-fold cross-validation; the data are partitioned into folds $R_1, \dots, R_5$, and each fold serves once as the validation data while the remaining folds form the training data]

▶ The model hyper-parameters ($q$, $\lambda$, etc.) are selected based on the average validation error over the folds (see the sketch below).
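A minimal sketch of $K$-fold cross-validation for selecting $\lambda$ in ridge regression, in hypothetical NumPy code (not from the lecture); the candidate grid and toy data are assumptions.

    import numpy as np

    def ridge(X, y, lam):
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    def cv_error(X, y, lam, K=5, seed=0):
        """Average validation error of ridge(lam) over folds R_1, ..., R_K."""
        idx = np.random.default_rng(seed).permutation(len(y))
        folds = np.array_split(idx, K)
        errs = []
        for k in range(K):
            va = folds[k]                                                 # fold R_k: validation
            tr = np.concatenate([folds[j] for j in range(K) if j != k])   # the rest: training
            w = ridge(X[tr], y[tr], lam)
            errs.append(np.mean((y[va] - X[va] @ w) ** 2))
        return np.mean(errs)

    # pick the lambda with the smallest average validation error
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 10))
    y = X[:, 0] + 0.5 * rng.standard_normal(100)
    lams = [0.01, 0.1, 1.0, 10.0, 100.0]
    best_lam = min(lams, key=lambda lam: cv_error(X, y, lam))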


Cross-validation example

[Figure: five rounds of 5-fold cross-validation on the simulated data (Rounds 1 to 5); in each round a different fold is held out as validation data and the rest is used for training]

Leave-one-out cross-validation (LOOCV)

[Figure: LOOCV diagram; the data are split into $n$ folds $R_1, \dots, R_n$ of one observation each, and every observation serves once as the validation data]
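LOOCV is $K$-fold cross-validation with $K = n$. A minimal sketch in hypothetical NumPy code (not from the lecture), applied to ridge regression on assumed toy data; following the final exercise below, the squared errors are summed rather than averaged.

    import numpy as np

    def loocv_error(X, y, lam):
        """Leave-one-out CV error of ridge regression: each point is held out once."""
        n, d = X.shape
        errs = []
        for i in range(n):
            keep = np.arange(n) != i                 # leave observation i out
            Xtr, ytr = X[keep], y[keep]
            w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
            errs.append((y[i] - X[i] @ w) ** 2)
        return np.sum(errs)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(30)
    print(loocv_error(X, y, lam=0.1))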


Final exercise I

Given the data $\{(x_i, y_i)\}_{i=1}^n$, consider a constant model that does not use the input $x$ (not useful in practice):

$$f(x) = w_0.$$

The parameter $w_0$ is estimated by solving the following minimization problem:

$$\arg\min_{w_0 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - f(x_i))^2 = \arg\min_{w_0 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - w_0)^2$$

▶ First, show that the optimal solution of the above problem is the sample mean, i.e.,

$$\arg\min_{w_0 \in \mathbb{R}} \sum_{i=1}^{n} (y_i - w_0)^2 = \frac{1}{n} \sum_{i=1}^{n} y_i$$


Final exercise II

▶ Next, confirm that the training error and the LOOCV error of the constant model are respectively written as

$$\mathrm{TrainEr} := \sum_{i=1}^{n} \left( y_i - \arg\min_{w_0 \in \mathbb{R}} \sum_{j=1}^{n} (y_j - w_0)^2 \right)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

$$\mathrm{LoocvEr} := \sum_{i=1}^{n} \left( y_i - \arg\min_{w_0 \in \mathbb{R}} \sum_{j \neq i} (y_j - w_0)^2 \right)^2 = \sum_{i=1}^{n} \left( y_i - \frac{1}{n-1} \sum_{j \neq i} y_j \right)^2,$$

where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ denotes the sample mean.


Final exercise III

▶ Finally, show that the relation between these two errors can be written as

$$\mathrm{LoocvEr} = \left( \frac{n}{n-1} \right)^2 \mathrm{TrainEr}.$$
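All three claims can be verified numerically. A quick sanity check in hypothetical NumPy code (not from the lecture); it is no substitute for the derivations, but it is useful for catching algebra mistakes.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.standard_normal(10)          # assumed toy data
    n = len(y)

    # Exercise I: the minimizer of sum_i (y_i - w0)^2 is the sample mean
    ybar = y.mean()

    # Exercise II: training error and LOOCV error of the constant model
    train_er = np.sum((y - ybar) ** 2)
    loo_means = (y.sum() - y) / (n - 1)  # leave-one-out means, one per observation
    loocv_er = np.sum((y - loo_means) ** 2)

    # Exercise III: LoocvEr = (n / (n-1))^2 * TrainEr
    print(loocv_er, (n / (n - 1)) ** 2 * train_er)   # the two values agree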
