Question about Gradient descent. Hung-yi Lee.

Page 1:

Question about Gradient descent

Hung-yi Lee

Page 2:

Larger gradient, larger steps?

$y = ax^2 + bx + c$ (with $a > 0$)

$\left|\frac{\partial y}{\partial x}\right| = |2ax + b|$

Minimum at $x = -\frac{b}{2a}$; current point $x_0$.

Distance from $x_0$ to the minimum: $\left|x_0 - \left(-\frac{b}{2a}\right)\right| = \left|x_0 + \frac{b}{2a}\right|$

Gradient magnitude at $x_0$: $|2ax_0 + b|$

Best step: $\left|x_0 + \frac{b}{2a}\right| = \frac{|2ax_0 + b|}{2a}$
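A minimal numeric check of the best-step formula, assuming a convex parabola ($a > 0$). The function name `best_step` and the sample coefficients are illustrative and do not come from the slides.

```python
def best_step(a: float, b: float, x0: float) -> float:
    """Distance from x0 to the minimizer of y = a*x^2 + b*x + c, assuming a > 0."""
    return abs(2 * a * x0 + b) / (2 * a)


# Illustrative coefficients (assumed, not from the slides).
a, b = 2.0, -4.0
x_min = -b / (2 * a)  # minimizer of the parabola

for x0 in (1.5, 3.0, -2.0):
    grad = 2 * a * x0 + b
    step = best_step(a, b, x0)
    # Move against the sign of the gradient by the best step.
    x_new = x0 - step * (1 if grad > 0 else -1)
    print(x0, abs(grad), step, x_new)
```

Within this single parabola, a larger gradient magnitude goes with a larger best step, and every starting point lands on `x_min` after one move.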

Page 3:

Contradiction

Original Gradient descent: larger gradient, larger step

$w^{t+1} \leftarrow w^t - \eta g^t$, where $g^t = \frac{\partial C(w^t)}{\partial w}$

Adagrad: divided by first derivative

$w^{t+1} \leftarrow w^t - \frac{\eta}{\sqrt{\sum_{i=0}^{t} (g^i)^2}} g^t$

RMSprop: divided by first derivative

$w^{t+1} \leftarrow w^t - \frac{\eta}{\sigma^t} g^t$, where $\sigma^t = \sqrt{\alpha (\sigma^{t-1})^2 + (1 - \alpha)(g^t)^2}$
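The three update rules can be sketched for a single scalar parameter as follows. This is a minimal sketch, not a library implementation: the function names, the test cost $C(w) = w^2$, and the initial value of $\sigma$ are assumptions. No small constant is added to the denominators, matching the formulas on the slide, so a zero gradient on the first step would divide by zero.

```python
import math


def gd_step(w, g, eta):
    """Original gradient descent: w <- w - eta * g."""
    return w - eta * g


def adagrad_step(w, g, eta, sum_sq):
    """Adagrad: eta is divided by the root of the sum of all squared gradients so far."""
    sum_sq = sum_sq + g * g
    return w - eta / math.sqrt(sum_sq) * g, sum_sq


def rmsprop_step(w, g, eta, sigma_prev, alpha):
    """RMSprop: sigma is an exponentially weighted root mean square of the gradients."""
    sigma = math.sqrt(alpha * sigma_prev ** 2 + (1 - alpha) * g * g)
    return w - eta / sigma * g, sigma


def grad(w):
    return 2.0 * w  # gradient of the assumed cost C(w) = w^2


w_gd = w_ada = w_rms = 5.0
sum_sq = 0.0
sigma = abs(grad(w_rms))  # assumed initialisation of sigma
for _ in range(3):
    w_gd = gd_step(w_gd, grad(w_gd), 0.1)
    w_ada, sum_sq = adagrad_step(w_ada, grad(w_ada), 0.1, sum_sq)
    w_rms, sigma = rmsprop_step(w_rms, grad(w_rms), 0.1, sigma, 0.9)
print(w_gd, w_ada, w_rms)
```

In plain gradient descent the step grows with the gradient, while in the other two rules the gradient is divided by a quantity built from gradients, which is the apparent contradiction the slide points at.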

Page 4:

Second Derivative

$y = ax^2 + bx + c$ (with $a > 0$)

$\left|\frac{\partial y}{\partial x}\right| = |2ax + b|$

Minimum at $x = -\frac{b}{2a}$; current point $x_0$.

Best step: $\left|x_0 + \frac{b}{2a}\right| = \frac{|2ax_0 + b|}{2a}$

$\frac{\partial^2 y}{\partial x^2} = 2a$

The best step is $\frac{|\text{First derivative}|}{\text{Second derivative}}$
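The rule can be checked directly on the parabola $y = ax^2 + bx + c$. The short derivation below assumes $a > 0$, so that the second derivative is positive, and writes $y'$ and $y''$ for the first and second derivatives.

```latex
\begin{aligned}
x_0 - \frac{y'(x_0)}{y''(x_0)}
  &= x_0 - \frac{2ax_0 + b}{2a} \\
  &= x_0 - x_0 - \frac{b}{2a} \\
  &= -\frac{b}{2a}
\end{aligned}
```

A step of length $|y'(x_0)| / y''(x_0)$, taken against the sign of the first derivative, therefore lands exactly on the minimum. This is the same update as Newton's method in one dimension.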

Page 5:

More than one parameter

The best step is $\frac{|\text{First derivative}|}{\text{Second derivative}}$

[Figure: error curves along $w_1$ and $w_2$ with marked points a, b, c, d. One curve is labeled "larger second derivative", the other "smaller second derivative". Comparisons shown: a > b, c > d, c < a.]
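A small sketch of why gradient magnitudes should not be compared across parameters. It assumes a separable quadratic cost $C(w_1, w_2) = a_1 w_1^2 + a_2 w_2^2$ with its minimum at the origin; the cost, the curvatures, and the sample point are illustrative and are not read off the slide's figure.

```python
# Assumed cost: C(w1, w2) = a1 * w1**2 + a2 * w2**2, minimised at (0, 0).
curvature = {"w1": 0.5, "w2": 10.0}  # w2 has the larger second derivative
point = {"w1": 4.0, "w2": 0.5}       # w1 is farther from its minimum

report = {}
for name in ("w1", "w2"):
    a, w = curvature[name], point[name]
    first = abs(2 * a * w)  # |first derivative| along this parameter
    second = 2 * a          # second derivative along this parameter
    report[name] = (first, second, first / second)
    print(name, report[name])
```

Here `w2` has the larger first derivative even though `w1` is farther from its minimum. Dividing by the second derivative recovers the true distance along each parameter.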

Page 6:

What to do with Adagrad and RMSprop?

The best step is $\frac{|\text{First derivative}|}{\text{Second derivative}}$

Use first derivative to estimate second derivative: $\sqrt{(\text{first derivative})^2}$

[Figure: error curves along $w_1$ and $w_2$, one labeled "larger second derivative" and the other "smaller second derivative".]
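One way to see why squared first derivatives can stand in for the second derivative is to sample both curves at the same points. The sketch below assumes two parabolas $y = ax^2$ and an evenly spaced sample range; both are illustrative choices. Adagrad and RMSprop accumulate gradients along the optimisation path rather than over a fixed grid, so this is only an analogy.

```python
import math


def rms_first_derivative(a: float, xs) -> float:
    """Root mean square of dy/dx = 2*a*x over the points xs, for y = a*x^2."""
    return math.sqrt(sum((2 * a * x) ** 2 for x in xs) / len(xs))


# Same evenly spaced sample points for both curves (assumed range [-2, 2]).
xs = [-2 + 4 * i / 100 for i in range(101)]

sharp = rms_first_derivative(5.0, xs)  # second derivative 2a = 10
flat = rms_first_derivative(0.5, xs)   # second derivative 2a = 1
print(sharp, flat, sharp / flat)
```

For these parabolas the root mean square of the first derivative scales linearly with $a$, so the curve with the larger second derivative also shows the larger value.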

Page 7:

Acknowledgement

• This question is raised by 李廣和

Page 8:

Thanks for your attention!