Brain Damage: Algorithms for Network Pruning Andrew Yip HMC Fall 2003.
Brain Damage: Algorithms for Network Pruning
Andrew Yip
HMC Fall 2003
The Idea
• Networks with excessive weights "over-train" on the data and, as a result, generalize poorly.
• Goal: a technique that can effectively reduce the size of the network without reducing validation accuracy.
• Hopefully, by reducing complexity, network pruning can increase the generalization capability of the net.
History
• Removing a weight means setting it to zero and freezing it there
• The first attempts at network pruning removed the weights of smallest magnitude
• An alternative: minimize a cost function composed of both the training error and a measure of network complexity
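That earliest magnitude-based approach can be sketched in a few lines (illustrative Python; `magnitude_prune` and the example weights are hypothetical, not from the original papers):

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with smallest magnitude.

    Pruned weights would then be frozen at zero during further training.
    Returns the pruned weights and a boolean keep-mask.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Prune the half of these weights with smallest magnitude
w = np.array([[0.05, -2.0], [0.8, -0.01]])
pruned, mask = magnitude_prune(w, 0.5)
```

The weakness motivating the rest of the talk: a small weight is not necessarily an unimportant one.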
LeCun's Take
• Derive a more theoretically sound weight-removal order using a second-order Taylor expansion of the error function:

$$\delta E = \sum_i g_i\,\delta u_i + \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta u_i\,\delta u_j + O(\|\delta U\|^3)$$

• At a local minimum ($g_i = 0$), with the diagonal approximation ($h_{ij} = 0$ for $i \neq j$) and the higher-order terms dropped, this reduces to:

$$\delta E \approx \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2$$
Computing the 2nd Derivatives
• Network expressed as:

$$x_i = f(a_i), \qquad a_i = \sum_j w_{ij}\,x_j$$

• Diagonals of the Hessian (index $k$ ranges over connections, $k \leftrightarrow (i,j)$):

$$h_{kk} = \frac{\partial^2 E}{\partial w_{ij}^2} = \frac{\partial^2 E}{\partial a_i^2}\,x_j^2$$

• Second derivatives, backpropagated from the outputs:

$$\frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2\,\frac{\partial^2 E}{\partial a_l^2} + f''(a_i)\,\frac{\partial E}{\partial x_i}$$

with boundary condition at the output units (for $E = \sum_i (d_i - x_i)^2$):

$$\frac{\partial^2 E}{\partial a_i^2} = 2 f'(a_i)^2 - 2\,(d_i - x_i)\,f''(a_i)$$
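The second-derivative backprop can be traced on a tiny network (illustrative Python sketch; the 2-3-1 sigmoid architecture, weights, input, and target are arbitrary choices for demonstration):

```python
import numpy as np

def f(a):   # sigmoid activation
    return 1.0 / (1.0 + np.exp(-a))

def f1(a):  # first derivative f'(a)
    s = f(a)
    return s * (1.0 - s)

def f2(a):  # second derivative f''(a)
    s = f(a)
    return s * (1.0 - s) * (1.0 - 2.0 * s)

# Hypothetical 2-3-1 network
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # hidden <- input
W2 = rng.normal(size=(1, 3))   # output <- hidden
x0 = np.array([0.5, -1.0])     # input
d = np.array([1.0])            # target

# Forward pass: a = W x, x = f(a)
a1 = W1 @ x0; x1 = f(a1)
a2 = W2 @ x1; x2 = f(a2)

# Output boundary condition: d2E/da2 = 2 f'(a)^2 - 2 (d - x) f''(a)
h_a2 = 2.0 * f1(a2) ** 2 - 2.0 * (d - x2) * f2(a2)

# Ordinary backprop of dE/dx to the hidden layer (E = sum (d - x)^2)
dE_da2 = -2.0 * (d - x2) * f1(a2)
dE_dx1 = W2.T @ dE_da2

# Hidden recurrence:
# d2E/da_i^2 = f'(a_i)^2 * sum_l w_li^2 * d2E/da_l^2 + f''(a_i) * dE/dx_i
h_a1 = f1(a1) ** 2 * ((W2 ** 2).T @ h_a2) + f2(a1) * dE_dx1

# Hessian diagonal per weight: d2E/dw_ij^2 = d2E/da_i^2 * x_j^2
h_W2 = np.outer(h_a2, x1 ** 2)
h_W1 = np.outer(h_a1, x0 ** 2)
```

For the output-layer weights the diagonal entries are exact, so they can be checked against a finite-difference second derivative of the error.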
The Recipe
• Train the network until local minimum is obtained
• Compute the second derivatives for each parameter
• Compute the saliencies
• Delete the low-saliency parameters
• Iterate
$$s_k = \frac{h_{kk}\,u_k^2}{2}$$
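One round of the recipe can be sketched as follows (illustrative Python; `obd_prune` and the example values are hypothetical, and computing the Hessian diagonal is assumed done elsewhere):

```python
import numpy as np

def obd_prune(weights, hessian_diag, num_to_prune):
    """One round of Optimal Brain Damage on a flat parameter vector.

    Saliency of parameter k is s_k = h_kk * u_k^2 / 2; the
    lowest-saliency parameters are set to zero (and would stay
    frozen during any retraining).
    """
    saliency = hessian_diag * weights ** 2 / 2.0
    prune_idx = np.argsort(saliency)[:num_to_prune]  # lowest saliency first
    mask = np.ones_like(weights, dtype=bool)
    mask[prune_idx] = False
    return weights * mask, mask

# Example: prune the 2 least salient of 4 parameters
u = np.array([0.1, -1.5, 0.9, 0.01])
h = np.array([4.0, 0.2, 1.0, 10.0])
pruned, mask = obd_prune(u, h, 2)
```

Note that a large weight with a small Hessian entry (like the second parameter here) survives, while a small weight can also survive if its curvature is high; this is exactly where OBD diverges from magnitude pruning.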
Results
Results of OBD compared to magnitude-based pruning
Results Continued
Comparison of MSE with retraining versus without retraining
LeCun's Conclusions
• Optimal Brain Damage reduced the number of parameters by up to a factor of four; general recognition accuracy increased.
• OBD can be used either as an automatic pruning tool or an interactive one.
Babak Hassibi: Return of LeCun
• Several problems arise from LeCun's simplifying assumptions
• For smaller networks, OBD can choose the wrong parameter to delete
• It is possible to recursively calculate the Hessian, yielding a more accurate approximation.
**Insert Math Here**
(I have no idea what I’m talking about)
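For reference, the derivation the slide elides comes from the Hassibi–Stork paper listed at the end. Optimal Brain Surgeon (OBS) minimizes the quadratic increase in error subject to the constraint that weight $w_q$ is driven exactly to zero:

$$\min_q \; \min_{\delta \mathbf{w}} \; \tfrac{1}{2}\,\delta\mathbf{w}^\top H\,\delta\mathbf{w} \quad \text{s.t.} \quad \mathbf{e}_q^\top \delta\mathbf{w} + w_q = 0$$

Solving with a Lagrange multiplier gives the saliency of weight $q$ and the compensating update applied to all remaining weights:

$$L_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad \delta\mathbf{w} = -\frac{w_q}{[H^{-1}]_{qq}}\,H^{-1}\mathbf{e}_q$$

Unlike OBD, this uses the full inverse Hessian (built up recursively) rather than just its diagonal, and it adjusts the surviving weights instead of requiring retraining.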
The MONK’s Problems
• Set of problems involving classifying artificial robots based on six discrete-valued attributes
• Binary Decision Problems: (head_shape = body_shape)
• Study performed in 1991; Back-propagation with weight decay found to be most accurate solution at the time.
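The binary decision rules are simple predicates over the attributes; MONK-1's full target concept is (head_shape = body_shape) ∨ (jacket_color = red). A minimal sketch (illustrative Python; the example robots are made up):

```python
# MONK-1 target concept: (head_shape == body_shape) or (jacket_color == "red")
def monk1_label(head_shape, body_shape, jacket_color):
    """Return True if the robot belongs to the MONK-1 class."""
    return head_shape == body_shape or jacket_color == "red"

# Hypothetical example robots (attribute values follow the MONK encoding)
examples = [
    ("round", "round", "blue"),      # shapes match -> in class
    ("round", "square", "red"),      # red jacket   -> in class
    ("octagon", "square", "green"),  # neither      -> out of class
]
labels = [monk1_label(*e) for e in examples]
```

The interest for pruning: a network that has learned such a rule needs far fewer weights than a fully connected net provides, so the excess weights are exactly what OBD/OBS should remove.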
Results: Hassibi Wins
| Problem | Method | Training (%) | Test (%) | # weights |
|---------|--------|--------------|----------|-----------|
| MONK1 | BPWD | 100 | 100 | 58 |
| MONK1 | OBS | 100 | 100 | 14 |
| MONK2 | BPWD | 100 | 100 | 39 |
| MONK2 | OBS | 100 | 100 | 15 |
| MONK3 | BPWD | 93.4 | 97.2 | 39 |
| MONK3 | OBS | 93.4 | 97.2 | 4 |
References
• LeCun, Yann. "Optimal Brain Damage." AT&T Bell Laboratories, 1990.
• Hassibi, Babak, and Stork, David. "Optimal Brain Surgeon and General Network Pruning." Ricoh California Research Center, 1993.
• Thrun, S.B. “The MONK’s Problems”. CMU. 1991.
Questions?
(Brain Background Courtesy Brainburst.com)