Master Defense Slides (translated)

These are the slides from my master's defense (17 April 2003). Subject: "High capacity neural network optimization problems: study & solutions exploration".

Transcript of Master Defense Slides (translated)

  • 1. High capacity neural network optimization problems: study & solutions exploration. Francis Piraut, eng., M.A.Sc, [email_address], http://fraka6.blogspot.com/

2. Plan

  • Context: learning a language model with a neural network (NN)
  • High capacity NN optimization inefficiency (number of errors & CPU time)
  • Is this normal?
  • Various optimization problems
  • Some solutions & results
  • Contributions
  • Future work
  • Conclusion

3. Learning algorithm: Neural Network

  • Problem:
  • Find P(c_i | x_1, x_2, …) from samples of P(x_1, x_2, … | c_i)
  • No a priori distribution
  • Complex relationships (non-linear)
  • A solution = Neural Network
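To make the problem concrete, here is a minimal sketch (mine, not the thesis code) of a one-hidden-layer network whose softmax outputs approximate the class posteriors P(c_i | x); the names follow the x → y → z notation of the diagrams on the next slides.

```python
# Minimal sketch (not the original thesis code): a one-hidden-layer network
# whose softmax outputs approximate the class posteriors P(c_i | x).
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_out):
    """Small random weights; shapes follow the x -> y -> z layout of the slides."""
    return {
        "W1": rng.normal(0, 0.1, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_out, n_hidden)), "b2": np.zeros(n_out),
    }

def forward(net, x):
    """Return hidden activations y_j and output probabilities z_k for one input x."""
    y = np.tanh(net["W1"] @ x + net["b1"])     # hidden units y_1..y_N
    a = net["W2"] @ y + net["b2"]              # output pre-activations
    z = np.exp(a - a.max()); z /= z.sum()      # softmax: estimates of P(c_i | x)
    return y, z
```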

4. Neural networks and capacity (figure: network diagram with inputs x_1…x_D, hidden units y_1…y_N, outputs z_1…z_k, weights w_ij and w_kj, targets t_1…t_k; the outputs estimate P(c_i | x))
5. (figure)
6. High/huge capacity neural network (figure)
7. Constraints

  • First-order stochastic gradient
  • Standard architecture
  • One learning rate
  • Overfitting is neglected
  • Database: Letters (26 classes / 16 inputs / 20,000 examples)
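A sketch of what these constraints amount to in practice (my own reconstruction, reusing the forward() helper above, and assuming the standard UCI Letter Recognition data): plain first-order stochastic gradient, a single fixed learning rate, one example at a time, and no measure against overfitting.

```python
# Reconstruction of the training protocol under the stated constraints
# (assumed, not the original code): plain stochastic gradient descent with a
# single fixed learning rate, visiting one example at a time.
import numpy as np

def sgd_epoch(net, X, T, lr=0.01):
    """One pass over the data; X is (20000, 16) inputs, T is one-hot (20000, 26)."""
    for i in np.random.permutation(len(X)):
        x, t = X[i], T[i]
        y, z = forward(net, x)                   # forward() from the sketch above
        delta_out = z - t                        # softmax + cross-entropy error signal
        delta_hid = (net["W2"].T @ delta_out) * (1 - y**2)   # backprop through tanh
        net["W2"] -= lr * np.outer(delta_out, y)
        net["b2"] -= lr * delta_out
        net["W1"] -= lr * np.outer(delta_hid, x)
        net["b1"] -= lr * delta_hid
    return net
```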

8. Errors: optimization inefficiency of high capacity neural networks (figure)
9. CPU time: optimization inefficiency of high capacity neural networks (figure)
10. Is this inefficiency normal?

  • Hypothesis: No
  • The inefficiency is created by the increase of optimization problems inherent to stochastic backpropagation
    • Linear solutions vs. non-linear solutions
    • Solution space
  • Solution = reduce or eliminate the problems related to backpropagation

11. Neural networks and equations (figure: network diagram with inputs x_1…x_D, hidden units y_1…y_N, outputs z_1…z_k, weights w_ij and w_kj, targets t_1…t_k)
12. The learning process slows down for non-linear relationships
13. Solution space of an N-neuron network vs. solution space of an (N+K)-neuron network (figure)
14. Similar solutions: initial state example, 3 vs. 5 iterations (figure)
15. Optimization problems

  • Moving target problem
  • Attenuation and gradient dilution (sketch after this list)
  • No specialization mechanism (e.g. boosting)
  • Opposing gradients (classification)
  • Symmetry problem
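A toy illustration of the attenuation/dilution item above (my reading of the problem, not taken from the slides): with small weights and saturating units, the error signal shrinks as it is backpropagated, so layers far from the output, and very wide layers where the signal is spread over many units, receive tiny updates.

```python
# Toy illustration (assumption about what "attenuation and gradient dilution"
# refers to): the backpropagated error signal shrinks layer after layer when
# weights are small and tanh units saturate.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_layers = 50, 5
delta = np.ones(n_units)                         # error signal at the output side

for layer in range(1, n_layers + 1):
    W = rng.normal(0, 0.1, (n_units, n_units))   # small initial weights
    y = np.tanh(rng.normal(0, 1.0, n_units))     # activations of the layer below
    delta = (W.T @ delta) * (1 - y**2)           # one backprop step: W^T delta * f'(a)
    print(f"{layer} layer(s) below the output: mean |delta| = {np.abs(delta).mean():.5f}")
# With this setup the mean magnitude drops at every layer, so the lower
# layers learn far more slowly than the layer next to the output.
```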

16. Neural network optimization problems (figure: the problems located on the network diagram)

  • Moving target problem
  • Attenuation and gradient dilution
  • No specialization mechanism (e.g. boosting)
  • Opposing gradients (classification) (sketch after this list)
  • Symmetry problem
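A toy illustration of the opposing-gradients item (again my reading of the slide, not the original code): in classification, examples from different classes can send nearly opposite error signals through a shared hidden unit, so their weight-update contributions largely cancel.

```python
# Toy illustration (assumption about what "opposing gradients" means here):
# two classes push a shared hidden unit's input weights in opposite directions,
# so the summed update is much smaller than either contribution alone.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 16)                    # one input pattern (16 features, as in Letters)

# Gradient contribution of one example on the shared unit's input weights
# is (error signal reaching the unit) * x.
g_class_a = (+0.8) * x                       # class A wants the unit more active
g_class_b = (-0.7) * x                       # class B wants it less active

summed = g_class_a + g_class_b
print(f"|grad A| = {np.linalg.norm(g_class_a):.3f}, "
      f"|grad B| = {np.linalg.norm(g_class_b):.3f}, "
      f"|A + B| = {np.linalg.norm(summed):.3f}")   # the sum is much smaller than either part
```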

17. Explored solutions

  • Incremental neural networks (sketch after this list)
  • Uncoupled (decoupled) architecture
  • Neural network with partial parameter optimization
  • Etc.
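A rough sketch of the incremental idea as I read it from the slides that follow (grow the hidden layer during training and optionally freeze the weights already learned); the function name and the net dictionary reuse the earlier sketches and are mine, not the thesis code.

```python
# Rough sketch (assumption: "incremental" = grow the hidden layer during
# training, optionally freezing the already-trained weights as in the
# "fixed weights optimization" variant on the next slides).
import numpy as np

rng = np.random.default_rng(0)

def add_hidden_units(net, n_new, freeze_old=True):
    """Return a larger net: old weights copied, new rows/columns start small."""
    n_old, n_in = net["W1"].shape
    n_out = net["W2"].shape[0]
    grown = {
        "W1": np.vstack([net["W1"], rng.normal(0, 0.1, (n_new, n_in))]),
        "b1": np.concatenate([net["b1"], np.zeros(n_new)]),
        "W2": np.hstack([net["W2"], rng.normal(0, 0.1, (n_out, n_new))]),
        "b2": net["b2"].copy(),
    }
    # Indices of the hidden units the training loop is allowed to update.
    grown["trainable"] = (np.arange(n_old, n_old + n_new) if freeze_old
                          else np.arange(n_old + n_new))
    return grown
```

Training would then alternate between running stochastic gradient until progress stalls and calling add_hidden_units(); the second approach on the later slides adds whole hidden layers instead of individual units.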

18. Incremental neural networks: first approach (figure)
19. Incremental neural networks: first approach (fixed weights optimization) (figure)
20. Hypothesis: incremental NN (table mapping the problems — symmetry, gradient dilution, specialization mechanism, opposing gradients, moving target — to whether the incremental NN addresses them)
21. Incremental neural networks (1): results (figure)
22. Why doesn't it work? (critical points)
23. (figure)
24. (figure)
25. Incremental neural network: second approach, adding hidden layers (figure: inputs x_1, x_2, hidden units y_1…y_4, outputs z_1, z_2)
26. Cost function curve shape (figure)
27. Hypothesis: incremental NN with added layers (same problems-vs-solution table)
28. Incremental neural network (2): results (figure)
29. Uncoupled architecture (figure)
30. Hypothesis: uncoupled architecture (same problems-vs-solution table; some problems removed)
31. Inefficiency of high capacity neural networks, CPU time (figure)
32. Efficiency of high capacity neural networks: decoupled architecture (figure)
33. Hypothesis: partial parameter optimization (same problems-vs-solution table)
34. Neural networks with partial parameter optimization: results (figure comparing all-parameter optimization vs. maximum-sensitivity optimization)
35. Why predict parameters? (observation) (figure: parameter values vs. epoch)
36. Hypothesis: parameter prediction. Benefit: reduce the number of iterations by predicting values based on their history (same problems-vs-solution table)
37. Prediction: quadratic extrapolation (figure)
38. Prediction: learning rate increase (figure)
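For the parameter-prediction slides (36-38), here is a minimal sketch of what quadratic extrapolation could look like, assuming it means fitting a quadratic to each parameter's recent trajectory and jumping several epochs ahead; the function name and details are mine, not the original code.

```python
# Minimal sketch of quadratic-extrapolation parameter prediction (my
# interpretation of slide 37, not the original code): fit w(t) = a t^2 + b t + c
# to each parameter's recorded trajectory and evaluate it a few epochs ahead.
import numpy as np

def predict_parameters(history, steps_ahead=5):
    """history: (n_epochs, n_params) array of parameter values recorded each epoch."""
    n_epochs = history.shape[0]
    t = np.arange(n_epochs)
    coeffs = np.polyfit(t, history, deg=2)       # shape (3, n_params), one fit per parameter
    t_future = (n_epochs - 1) + steps_ahead
    return coeffs[0] * t_future**2 + coeffs[1] * t_future + coeffs[2]
```

The predicted values would replace the current parameters before ordinary stochastic gradient resumes; slide 38's "learning rate increase" appears to be a simpler way of taking a larger step along the same trend.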

39. Contributions

  • Experimental indication of optimization problems for high capacity NNs
  • At equal capacity, adding a hidden layer:
    • Speeds up learning
    • Gives a better error rate
  • Presentation of a solution whose speed is not reduced when capacity increases (decoupled architecture / opposing gradients)

40. Future work

  • Can the optimization inefficiency of high capacity networks be generalized? (more datasets)
  • For classification tasks, is the decoupled architecture a better choice from a generalization point of view?
  • Is the critical-point hypothesis applicable in the context of incremental neural networks?
  • Adding hidden layers: why doesn't it work for successive layers?
  • Partial parameter optimization
    • Better understanding of the results
    • Which parameter-selection algorithm is best?
  • Is there an efficient parameter-prediction technique?

41. Conclusion

  • Partially documented, experimentally, the inefficiency of high capacity neural networks (CPU time / number of errors)
  • Various problems
  • Explored solutions:
    • Incremental approach
    • Decoupled architecture
    • Partial parameter optimization
    • Parameter prediction

42. Any questions?
43. Example: linear solution (figure)
44. Example: highly non-linear solution (figure)
45. Selection of the connections that most influence the cost (figure)
46. Selection of the connections that most influence the error (figure; values shown: T = 1, S = 0; T = 0, S = 1; T = 0, S = 0.1; T = 0, S = 0.1)
47. Observation: idealized behavior of the time ratio (figure)