CNN-Based Incremental Learning Strategies for Face Recognition



Vincenzo Lomonaco and Davide Maltoni

Biometric System Lab – DISI, University of Bologna

Contacts

Emails: {vincenzo.lomonaco, davide.maltoni}@unibo.it

Websites: www.vincenzolomonaco.com, http://bias.csr.unibo.it/maltoni/

References

[1] Franco, A., Maio, D., Maltoni, D.: The Big Brother Database: Evaluating Face Recognition in Smart Home Environments. pp. 142–150 (2009).

[2] Franco, A., Maio, D., Maltoni, D.: Incremental template updating for face recognition in home environments. Pattern Recognit. 43, 2891–2903 (2010).

[3] Maltoni, D., Lomonaco, V.: Semi-supervised Tuning from Temporal Coherence. Tech. Report, DISI - University of Bologna. http://arxiv.org/pdf/1511.03163v3.pdf. pp. 1–14 (2015).

Abstract

In the last decade, Convolutional Neural Networks (CNNs) have been shown to perform remarkably well on face recognition tasks, coping with large occlusions, extremely low resolutions, strong illumination variations, etc. However, partly because of their complex training and tricky hyper-parameter tuning, CNNs have been scarcely studied in the context of incremental learning. In this work we compare different incremental learning strategies for CNN-based architectures in the context of face recognition.

Incremental Learning Strategies

One possible approach to this incremental scenario is to store all previously seen data and retrain the model from scratch as soon as a new batch of data becomes available. However, this solution is often impractical for real-world systems where memory and computational resources are subject to stiff constraints.

A different approach is to update the model based only on the newly available batch of data. We compare three strategies of this kind: an ad-hoc CNN trained from scratch (LeNet7), a pre-trained CNN used as a fixed feature extractor feeding an SVM (CaffeNet + SVM, VGG + SVM), and a pre-trained CNN fine-tuned on the new data (CaffeNet + FT, VGG + FT). A minimal sketch contrasting the two update regimes is given below.
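To make the trade-off concrete, here is a minimal sketch of the two regimes. Everything in it is an illustrative stand-in (the name make_day_batch, the synthetic features, and a linear SGD classifier in place of the CNN-based models), not the authors' pipeline:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.arange(7)  # 7 subjects, as in SETB
n_days, n_feat = 5, 64

def make_day_batch(day):
    """Stand-in for one day of the updating set (random features here)."""
    X = rng.normal(size=(30, n_feat))
    y = rng.choice(classes, size=30)
    return X, y

# Regime 1: cumulative retraining -- store everything seen so far and refit
# from scratch at each new batch (accurate, but memory/compute hungry).
seen_X, seen_y = [], []
for day in range(n_days):
    X, y = make_day_batch(day)
    seen_X.append(X)
    seen_y.append(y)
    cumulative = SGDClassifier().fit(np.vstack(seen_X), np.concatenate(seen_y))

# Regime 2: incremental update -- touch only the newest batch
# (cheap, but exposed to forgetting).
incremental = SGDClassifier()
for day in range(n_days):
    X, y = make_day_batch(day)
    incremental.partial_fit(X, y, classes=classes)  # classes required on 1st call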

Big Brother dataset

The BigBrother dataset (SETB) [1] was created from 2 DVDs made commercially available at the end of the 2006 edition of the "Big Brother" reality show produced for Italian TV. It consists of 14,675 gray-scale face images (70×70 pixels) belonging to 7 subjects, often characterized by bad lighting, poor focus, occlusions, and non-frontal pose. In addition to the typical training and test sets, it provides a large additional set of images called the "updating set", split into 54 days, for incremental learning/tuning purposes.

Figure 1. The seven subjects of the Big Brother dataset (SETB).

Experiments and results

Figure 2. Accuracy of the different strategies tested on the SETB of the Big Brother dataset.

Figure 3. The impact of the learning rate on forgetting.

Final Acc. %        LeNet7   CaffeNet + SVM   VGG + SVM   CaffeNet + FT   VGG + FT
34 Days Split       82.35%   80.10%           96.96%      73.23%          91.39%
Orig. Days Split    75.33%   75.13%           96.73%      70.23%          89.58%
Cumulative Days     90.50%   86.79%           97.65%      84.26%          95.51%
Gain                +7.03%   +4.97%           +0.23%      +3.00%          +1.81%
Loss                -8.15%   -6.69%           -0.69%      -11.03%         -4.12%

Table 1. Accuracy gain and loss of the 34 Days split with respect to the Original and Cumulative days splits, respectively (Gain = 34 Days − Orig. Days; Loss = 34 Days − Cumulative Days).

Conclusions

• Forgetting can be a very detrimental issue: hence, when possible (i.e., transfer learning from the same domain), it is preferable to use the CNN as a fixed feature extractor feeding an incremental classifier (see the sketch below).

• If the features are not optimized (transfer learning from a different domain), tuning the low-level layers may be preferable, and the learning strength (i.e., learning rate, number of iterations, etc.) can be used to control forgetting.

• Training a CNN from scratch can be advantageous if the problem patterns (and feature invariances) are highly specific and a sufficient number of samples is available.
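As a companion to the first two points, here is a minimal sketch of the "pre-trained CNN + SVM" and "pre-trained CNN + FT" strategies. Everything in it is an illustrative stand-in (a tiny random convnet in place of CaffeNet/VGG, synthetic 70×70 batches in place of the SETB days, scikit-learn's SGDClassifier in place of the SVM), not the authors' actual setup:

import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import SGDClassifier

# Stand-in for a pre-trained CNN body: 70x70 grayscale faces -> 128-d features.
extractor = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten())

# --- "Pre-trained CNN + SVM": freeze the CNN, update only the classifier. ---
for p in extractor.parameters():
    p.requires_grad_(False)  # frozen features: the representation cannot forget
classes = np.arange(7)       # 7 subjects, as in SETB
clf = SGDClassifier()        # incremental linear classifier (SVM stand-in)

rng = np.random.default_rng(0)
for day in range(5):         # stand-in for the day-batches of the updating set
    imgs = torch.from_numpy(rng.normal(size=(30, 1, 70, 70)).astype("float32"))
    labels = rng.choice(classes, size=30)
    with torch.no_grad():
        feats = extractor(imgs).numpy()
    clf.partial_fit(feats, labels, classes=classes)  # update on the new batch only

# --- "Pre-trained CNN + FT": unfreeze and fine-tune on the new batch. ---
for p in extractor.parameters():
    p.requires_grad_(True)
head = nn.Linear(8 * 4 * 4, len(classes))
opt = torch.optim.SGD([
    {"params": extractor.parameters(), "lr": 1e-4},  # low LR: adapt slowly, forget less
    {"params": head.parameters(), "lr": 1e-2},
])
loss_fn = nn.CrossEntropyLoss()
imgs = torch.from_numpy(rng.normal(size=(30, 1, 70, 70)).astype("float32"))
labels = torch.from_numpy(rng.choice(classes, size=30))
loss = loss_fn(head(extractor(imgs)), labels)
opt.zero_grad()
loss.backward()
opt.step()  # one tuning step; learning rate / #iterations control forgetting

Freezing the extractor rules out forgetting in the representation by construction; in the fine-tuning variant, the per-group learning rates are the "learning strength" knob whose effect on forgetting Figure 3 illustrates.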
