Master Defense Slides (translated)

These are the slides from my master's defense (17 April 2003). Subject: "High capacity neural network optimization problems: study & solutions exploration".

Transcript of Master Defense Slides (translated)

  • 1. High capacity neural network optimization problems: study & solutions exploration. Francis Piraut, eng., M.A.Sc, [email_address], http://fraka6.blogspot.com/

2. Plan

  • Context: learning a language model with a neural network (NN)
  • High capacity NN optimization inefficiency (number of errors & CPU time)
  • Is this normal?
  • Various optimization problems
  • Some solutions & results
  • Contributions
  • Future work
  • Conclusion

3. Learning algorithm: Neural Network

  • Problem:
  • Find P(c_i | x_1, x_2, …) from samples of P(x_1, x_2, … | c_i)
  • No a priori distribution
  • Complex relationships (non-linear)
  • A solution = Neural Network
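To make the problem concrete, here is a minimal sketch (mine, not the thesis code) of a one-hidden-layer network whose softmax outputs approximate the class posteriors P(c_i | x); the names follow the x → y → z notation of the diagrams on the next slides.

```python
# Minimal sketch (not the original thesis code): a one-hidden-layer network
# whose softmax outputs approximate the class posteriors P(c_i | x).
import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_out):
    """Small random weights; shapes follow the x -> y -> z layout of the slides."""
    return {
        "W1": rng.normal(0, 0.1, (n_hidden, n_in)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_out, n_hidden)), "b2": np.zeros(n_out),
    }

def forward(net, x):
    """Return hidden activations y_j and output probabilities z_k for one input x."""
    y = np.tanh(net["W1"] @ x + net["b1"])     # hidden units y_1..y_N
    a = net["W2"] @ y + net["b2"]              # output pre-activations
    z = np.exp(a - a.max()); z /= z.sum()      # softmax: estimates of P(c_i | x)
    return y, z
```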

4. Neural networks and capacity (figure: network diagram with inputs x_1…x_D, hidden units y_1…y_N, outputs z_1…z_k, weights w_ij and w_kj, targets t_1…t_k; the outputs estimate P(c_i | x))
5. (figure)
6. High/huge capacity neural network (figure)
7. Constraints

  • First-order stochastic gradient
  • Standard architecture
  • One learning rate
  • Overfitting is neglected
  • Database: Letters (26 classes / 16 inputs / 20,000 examples)
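A sketch of what these constraints amount to in practice (my own reconstruction, reusing the forward() helper above, and assuming the standard UCI Letter Recognition data): plain first-order stochastic gradient, a single fixed learning rate, one example at a time, and no measure against overfitting.

```python
# Reconstruction of the training protocol under the stated constraints
# (assumed, not the original code): plain stochastic gradient descent with a
# single fixed learning rate, visiting one example at a time.
import numpy as np

def sgd_epoch(net, X, T, lr=0.01):
    """One pass over the data; X is (20000, 16) inputs, T is one-hot (20000, 26)."""
    for i in np.random.permutation(len(X)):
        x, t = X[i], T[i]
        y, z = forward(net, x)                   # forward() from the sketch above
        delta_out = z - t                        # softmax + cross-entropy error signal
        delta_hid = (net["W2"].T @ delta_out) * (1 - y**2)   # backprop through tanh
        net["W2"] -= lr * np.outer(delta_out, y)
        net["b2"] -= lr * delta_out
        net["W1"] -= lr * np.outer(delta_hid, x)
        net["b1"] -= lr * delta_hid
    return net
```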

8. Errors: optimization inefficiency of high capacity neural networks (figure)
9. CPU time: optimization inefficiency of high capacity neural networks (figure)
10. Is this inefficiency normal?

  • Hypothesis: No
  • The inefficiency is created by the increase of optimization problems inherent to stochastic backpropagation
    • Linear solutions vs. non-linear solutions
    • Solution space
  • Solution = reduce or eliminate the problems related to backpropagation

11. Neural networks and equations (figure: network diagram with inputs x_1…x_D, hidden units y_1…y_N, outputs z_1…z_k, weights w_ij and w_kj, targets t_1…t_k)
12. The learning process slows down for non-linear relationships
13. Solution space of an N-neuron network vs. solution space of an (N+K)-neuron network (figure)
14. Similar solutions: initial state example, 3 vs. 5 iterations (figure)
15. Optimization problems

  • Moving target problem
  • Attenuation and gradient dilution (sketch after this list)
  • No specialization mechanism (e.g. boosting)
  • Opposing gradients (classification)
  • Symmetry problem
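A toy illustration of the attenuation/dilution item above (my reading of the problem, not taken from the slides): with small weights and saturating units, the error signal shrinks as it is backpropagated, so layers far from the output, and very wide layers where the signal is spread over many units, receive tiny updates.

```python
# Toy illustration (assumption about what "attenuation and gradient dilution"
# refers to): the backpropagated error signal shrinks layer after layer when
# weights are small and tanh units saturate.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_layers = 50, 5
delta = np.ones(n_units)                         # error signal at the output side

for layer in range(1, n_layers + 1):
    W = rng.normal(0, 0.1, (n_units, n_units))   # small initial weights
    y = np.tanh(rng.normal(0, 1.0, n_units))     # activations of the layer below
    delta = (W.T @ delta) * (1 - y**2)           # one backprop step: W^T delta * f'(a)
    print(f"{layer} layer(s) below the output: mean |delta| = {np.abs(delta).mean():.5f}")
# With this setup the mean magnitude drops at every layer, so the lower
# layers learn far more slowly than the layer next to the output.
```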

16. Neural network optimization problems (figure: the problems located on the network diagram)

  • Moving target problem
  • Attenuation and gradient dilution
  • No specialization mechanism (e.g. boosting)
  • Opposing gradients (classification) (sketch after this list)
  • Symmetry problem
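A toy illustration of the opposing-gradients item (again my reading of the slide, not the original code): in classification, examples from different classes can send nearly opposite error signals through a shared hidden unit, so their weight-update contributions largely cancel.

```python
# Toy illustration (assumption about what "opposing gradients" means here):
# two classes push a shared hidden unit's input weights in opposite directions,
# so the summed update is much smaller than either contribution alone.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 16)                    # one input pattern (16 features, as in Letters)

# Gradient contribution of one example on the shared unit's input weights
# is (error signal reaching the unit) * x.
g_class_a = (+0.8) * x                       # class A wants the unit more active
g_class_b = (-0.7) * x                       # class B wants it less active

summed = g_class_a + g_class_b
print(f"|grad A| = {np.linalg.norm(g_class_a):.3f}, "
      f"|grad B| = {np.linalg.norm(g_class_b):.3f}, "
      f"|A + B| = {np.linalg.norm(summed):.3f}")   # the sum is much smaller than either part
```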

17. Explored solutions

  • Incremental neural networks (sketch after this list)
  • Uncoupled (decoupled) architecture
  • Neural network with partial parameter optimization
  • Etc.
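A rough sketch of the incremental idea as I read it from the slides that follow (grow the hidden layer during training and optionally freeze the weights already learned); the function name and the net dictionary reuse the earlier sketches and are mine, not the thesis code.

```python
# Rough sketch (assumption: "incremental" = grow the hidden layer during
# training, optionally freezing the already-trained weights as in the
# "fixed weights optimization" variant on the next slides).
import numpy as np

rng = np.random.default_rng(0)

def add_hidden_units(net, n_new, freeze_old=True):
    """Return a larger net: old weights copied, new rows/columns start small."""
    n_old, n_in = net["W1"].shape
    n_out = net["W2"].shape[0]
    grown = {
        "W1": np.vstack([net["W1"], rng.normal(0, 0.1, (n_new, n_in))]),
        "b1": np.concatenate([net["b1"], np.zeros(n_new)]),
        "W2": np.hstack([net["W2"], rng.normal(0, 0.1, (n_out, n_new))]),
        "b2": net["b2"].copy(),
    }
    # Indices of the hidden units the training loop is allowed to update.
    grown["trainable"] = (np.arange(n_old, n_old + n_new) if freeze_old
                          else np.arange(n_old + n_new))
    return grown
```

Training would then alternate between running stochastic gradient until progress stalls and calling add_hidden_units(); the second approach on the later slides adds whole hidden layers instead of individual units.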

18. Incremental neural networks: first approach (figure)
19. Incremental neural networks: first approach (fixed weights optimization) (figure)
20. Hypothesis: incremental NN (table mapping the problems — symmetry, gradient dilution, specialization mechanism, opposing gradients, moving target — to whether the incremental NN addresses them)
21. Incremental neural networks (1): results (figure)
22. Why doesn't it work? (critical points)
23. (figure)
24. (figure)
25. Incremental neural network: second approach, adding hidden layers (figure: inputs x_1, x_2, hidden units y_1…y_4, outputs z_1, z_2)
26. Cost function curve shape (figure)
27. Hypothesis: incremental NN with added layers (same problems-vs-solution table)
28. Incremental neural network (2): results (figure)
29. Uncoupled architecture (figure)
30. Hypothesis: uncoupled architecture (same problems-vs-solution table; some problems removed)
31. Inefficiency of high capacity neural networks, CPU time (figure)
32. Efficiency of high capacity neural networks: decoupled architecture (figure)
33. Hypothesis: partial parameter optimization (same problems-vs-solution table)
34. Neural networks with partial parameter optimization: results (figure comparing all-parameter optimization vs. maximum-sensitivity optimization)
35. Why predict parameters? (observation) (figure: parameter values vs. epoch)
36. Hypothesis: parameter prediction. Benefit: reduce the number of iterations by predicting values based on their history (same problems-vs-solution table)
37. Prediction: quadratic extrapolation (figure)
38. Prediction: learning rate increase (figure)
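For the parameter-prediction slides (36-38), here is a minimal sketch of what quadratic extrapolation could look like, assuming it means fitting a quadratic to each parameter's recent trajectory and jumping several epochs ahead; the function name and details are mine, not the original code.

```python
# Minimal sketch of quadratic-extrapolation parameter prediction (my
# interpretation of slide 37, not the original code): fit w(t) = a t^2 + b t + c
# to each parameter's recorded trajectory and evaluate it a few epochs ahead.
import numpy as np

def predict_parameters(history, steps_ahead=5):
    """history: (n_epochs, n_params) array of parameter values recorded each epoch."""
    n_epochs = history.shape[0]
    t = np.arange(n_epochs)
    coeffs = np.polyfit(t, history, deg=2)       # shape (3, n_params), one fit per parameter
    t_future = (n_epochs - 1) + steps_ahead
    return coeffs[0] * t_future**2 + coeffs[1] * t_future + coeffs[2]
```

The predicted values would replace the current parameters before ordinary stochastic gradient resumes; slide 38's "learning rate increase" appears to be a simpler way of taking a larger step along the same trend.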

39. Contributions

  • Experimental indication of optimization problems for high capacity NNs
  • At equal capacity, adding a hidden layer:
    • Speeds up learning
    • Gives a better error rate
  • Presentation of a solution whose speed is not reduced when capacity increases (decoupled architecture / opposing gradients)

40. Future work

  • Can the optimization inefficiency of high capacity networks be generalized? (more datasets)
  • For classification tasks, is the decoupled architecture a better choice from a generalization point of view?
  • Is the critical-point hypothesis applicable in the context of incremental neural networks?
  • Adding hidden layers: why doesn't it work for successive layers?
  • Partial parameter optimization
    • Better understanding of the results
    • Which parameter-selection algorithm is best?
  • Is there an efficient parameter-prediction technique?

41. Conclusion

  • Partially documented, experimentally, the inefficiency of high capacity neural networks (CPU time / number of errors)
  • Various problems
  • Explored solutions:
    • Incremental approach
    • Decoupled architecture
    • Partial parameter optimization
    • Parameter prediction

42. Any questions?
43. Example: linear solution (figure)
44. Example: highly non-linear solution (figure)
45. Selection of the connections that most influence the cost (figure)
46. Selection of the connections that most influence the error (figure; values shown: T = 1, S = 0; T = 0, S = 1; T = 0, S = 0.1; T = 0, S = 0.1)
47. Observation: idealized behavior of the time ratio (figure)