Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003.
Brain Damage: Algorithms for Network Pruning
Andrew Yip
HMC Fall 2003
The Idea
• Networks with excessive weights "over-train" on the data and, as a result, generalize poorly.
• Goal: a technique that can effectively reduce the size of the network without reducing validation accuracy.
• Hopefully, by reducing complexity, network pruning can increase the generalization capability of the net.
History
• Removing a weight means setting it to zero and freezing it there
• The first attempts at network pruning removed the weights of smallest magnitude
• An alternative: minimize a cost function composed of both the training error and a measure of network complexity
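That earliest magnitude-based approach can be sketched in a few lines (illustrative Python; `magnitude_prune` and the example weights are hypothetical, not from the original papers):

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with smallest magnitude.

    Pruned weights would then be frozen at zero during further training.
    Returns the pruned weights and a boolean keep-mask.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Prune the half of these weights with smallest magnitude
w = np.array([[0.05, -2.0], [0.8, -0.01]])
pruned, mask = magnitude_prune(w, 0.5)
```

The weakness motivating the rest of the talk: a small weight is not necessarily an unimportant one.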
LeCun's Take
• Derive a more theoretically sound weight-removal order using a second-order Taylor expansion of the error function:

$$\delta E = \sum_i g_i\,\delta u_i + \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta u_i\,\delta u_j + O(\|\delta U\|^3)$$

• At a local minimum ($g_i = 0$), with the diagonal approximation ($h_{ij} = 0$ for $i \neq j$) and the higher-order terms dropped, this reduces to:

$$\delta E \approx \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2$$
Computing the 2nd Derivatives
• Network expressed as:

$$x_i = f(a_i), \qquad a_i = \sum_j w_{ij}\,x_j$$

• Diagonals of the Hessian (index $k$ ranges over connections, $k \leftrightarrow (i,j)$):

$$h_{kk} = \frac{\partial^2 E}{\partial w_{ij}^2} = \frac{\partial^2 E}{\partial a_i^2}\,x_j^2$$

• Second derivatives, backpropagated from the outputs:

$$\frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2\,\frac{\partial^2 E}{\partial a_l^2} + f''(a_i)\,\frac{\partial E}{\partial x_i}$$

with boundary condition at the output units (for $E = \sum_i (d_i - x_i)^2$):

$$\frac{\partial^2 E}{\partial a_i^2} = 2 f'(a_i)^2 - 2\,(d_i - x_i)\,f''(a_i)$$
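The second-derivative backprop can be traced on a tiny network (illustrative Python sketch; the 2-3-1 sigmoid architecture, weights, input, and target are arbitrary choices for demonstration):

```python
import numpy as np

def f(a):   # sigmoid activation
    return 1.0 / (1.0 + np.exp(-a))

def f1(a):  # first derivative f'(a)
    s = f(a)
    return s * (1.0 - s)

def f2(a):  # second derivative f''(a)
    s = f(a)
    return s * (1.0 - s) * (1.0 - 2.0 * s)

# Hypothetical 2-3-1 network
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # hidden <- input
W2 = rng.normal(size=(1, 3))   # output <- hidden
x0 = np.array([0.5, -1.0])     # input
d = np.array([1.0])            # target

# Forward pass: a = W x, x = f(a)
a1 = W1 @ x0; x1 = f(a1)
a2 = W2 @ x1; x2 = f(a2)

# Output boundary condition: d2E/da2 = 2 f'(a)^2 - 2 (d - x) f''(a)
h_a2 = 2.0 * f1(a2) ** 2 - 2.0 * (d - x2) * f2(a2)

# Ordinary backprop of dE/dx to the hidden layer (E = sum (d - x)^2)
dE_da2 = -2.0 * (d - x2) * f1(a2)
dE_dx1 = W2.T @ dE_da2

# Hidden recurrence:
# d2E/da_i^2 = f'(a_i)^2 * sum_l w_li^2 * d2E/da_l^2 + f''(a_i) * dE/dx_i
h_a1 = f1(a1) ** 2 * ((W2 ** 2).T @ h_a2) + f2(a1) * dE_dx1

# Hessian diagonal per weight: d2E/dw_ij^2 = d2E/da_i^2 * x_j^2
h_W2 = np.outer(h_a2, x1 ** 2)
h_W1 = np.outer(h_a1, x0 ** 2)
```

For the output-layer weights the diagonal entries are exact, so they can be checked against a finite-difference second derivative of the error.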
The Recipe
• Train the network until local minimum is obtained
• Compute the second derivatives for each parameter
• Compute the saliencies
• Delete the low-saliency parameters
• Iterate
$$s_k = \frac{h_{kk}\,u_k^2}{2}$$
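One round of the recipe can be sketched as follows (illustrative Python; `obd_prune` and the example values are hypothetical, and computing the Hessian diagonal is assumed done elsewhere):

```python
import numpy as np

def obd_prune(weights, hessian_diag, num_to_prune):
    """One round of Optimal Brain Damage on a flat parameter vector.

    Saliency of parameter k is s_k = h_kk * u_k^2 / 2; the
    lowest-saliency parameters are set to zero (and would stay
    frozen during any retraining).
    """
    saliency = hessian_diag * weights ** 2 / 2.0
    prune_idx = np.argsort(saliency)[:num_to_prune]  # lowest saliency first
    mask = np.ones_like(weights, dtype=bool)
    mask[prune_idx] = False
    return weights * mask, mask

# Example: prune the 2 least salient of 4 parameters
u = np.array([0.1, -1.5, 0.9, 0.01])
h = np.array([4.0, 0.2, 1.0, 10.0])
pruned, mask = obd_prune(u, h, 2)
```

Note that a large weight with a small Hessian entry (like the second parameter here) survives, while a small weight can also survive if its curvature is high; this is exactly where OBD diverges from magnitude pruning.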
Results
Results of OBD compared to magnitude-based pruning
Results Continued
Comparison of MSE with retraining versus without retraining
LeCun's Conclusions
• Optimal Brain Damage reduced the number of parameters by up to a factor of four; general recognition accuracy increased.
• OBD can be used either as an automatic pruning tool or an interactive one.
Babak Hassibi: Return of LeCun
• Several problems arise from LeCun's simplifying assumptions
• For smaller networks, OBD can choose the wrong parameter to delete
• It is possible to recursively calculate the Hessian, yielding a more accurate approximation.
**Insert Math Here**
(I have no idea what I’m talking about)
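For reference, the derivation the slide elides comes from the Hassibi–Stork paper listed at the end. Optimal Brain Surgeon (OBS) minimizes the quadratic increase in error subject to the constraint that weight $w_q$ is driven exactly to zero:

$$\min_q \; \min_{\delta \mathbf{w}} \; \tfrac{1}{2}\,\delta\mathbf{w}^\top H\,\delta\mathbf{w} \quad \text{s.t.} \quad \mathbf{e}_q^\top \delta\mathbf{w} + w_q = 0$$

Solving with a Lagrange multiplier gives the saliency of weight $q$ and the compensating update applied to all remaining weights:

$$L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad \delta\mathbf{w} = -\frac{w_q}{[H^{-1}]_{qq}}\,H^{-1}\mathbf{e}_q$$

Unlike OBD, this uses the full inverse Hessian (built up recursively) rather than just its diagonal, and it adjusts the surviving weights instead of requiring retraining.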
The MONK’s Problems
• Set of problems involving classifying artificial robots based on six discrete-valued attributes
• Binary Decision Problems: (head_shape = body_shape)
• Study performed in 1991; Back-propagation with weight decay found to be most accurate solution at the time.
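The binary decision rules are simple predicates over the attributes; MONK-1's full target concept is (head_shape = body_shape) ∨ (jacket_color = red). A minimal sketch (illustrative Python; the example robots are made up):

```python
# MONK-1 target concept: (head_shape == body_shape) or (jacket_color == "red")
def monk1_label(head_shape, body_shape, jacket_color):
    """Return True if the robot belongs to the MONK-1 class."""
    return head_shape == body_shape or jacket_color == "red"

# Hypothetical example robots (attribute values follow the MONK encoding)
examples = [
    ("round", "round", "blue"),      # shapes match -> in class
    ("round", "square", "red"),      # red jacket   -> in class
    ("octagon", "square", "green"),  # neither      -> out of class
]
labels = [monk1_label(*e) for e in examples]
```

The interest for pruning: a network that has learned such a rule needs far fewer weights than a fully connected net provides, so the excess weights are exactly what OBD/OBS should remove.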
Results: Hassibi Wins
| Problem | Method | Training (%) | Test (%) | # weights |
|---------|--------|--------------|----------|-----------|
| MONK1 | BPWD | 100 | 100 | 58 |
| MONK1 | OBS | 100 | 100 | 14 |
| MONK2 | BPWD | 100 | 100 | 39 |
| MONK2 | OBS | 100 | 100 | 15 |
| MONK3 | BPWD | 93.4 | 97.2 | 39 |
| MONK3 | OBS | 93.4 | 97.2 | 4 |
References
• LeCun, Yann. "Optimal Brain Damage." AT&T Bell Laboratories, 1990.
• Hassibi, Babak, and Stork, David. "Optimal Brain Surgeon and General Network Pruning." Ricoh California Research Center, 1993.
• Thrun, S.B. “The MONK’s Problems”. CMU. 1991.
Questions?
(Brain Background Courtesy Brainburst.com)