Transcript of "Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of Massive Media Archives", Christian Kehl and Ana Lucia Varbanescu, University of Amsterdam – Informatics Institute

Page 1: Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of Massive Media Archives

Christian Kehl and Ana Lucia Varbanescu
University of Amsterdam – Informatics Institute

Page 2

Motivation and Focus

Object Recognition Challenges

Real Media Archives

- image set: static vs. dynamically increasing
- tag set: static vs. dynamically increasing

Page 3

Contributions

• A concept for parameterizing visual object classifiers based on data-sampling strategies

• An initial, small-scale study of accelerator performance impact using different sampling schemes

• Side contribution: a prototypical implementation of sampling-based CNN parameterization

Page 4

Approach - Theory

• 2 key ideas:
– use a pre-trained network with successive refinement
– even untrained, randomly-initialised models can be used [Jarret2009]
– more re-training -> longer computation, higher accuracy
– CNNs learn “from the [input] data” => controlling the data means controlling the model [Zeiler2014]
– CNN parameter tuning: demands knowledge of the process
– CNN input-data “tuning”: demands knowledge of the data (e.g. images)

(figure: filter responses, from [Jarret2009])
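The refinement idea above can be read as last-layer retraining: the pre-trained convolutional layers stay frozen and only a softmax classifier on top of their features is re-fit. A minimal NumPy sketch under that assumption (the prototype in the slides uses pylearn2; the function name and training loop here are purely illustrative):

```python
import numpy as np

def last_layer_retrain(features, labels, n_classes, epochs=100, lr=0.1):
    """Retrain only a softmax classifier on frozen CNN features.

    `features` stands in for the output of the frozen convolutional
    layers; in the full-retraining variant those layers would be
    updated as well. Hypothetical helper, not from the slides.
    """
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(features.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(labels)         # softmax cross-entropy gradient
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b
```

Because only the final weight matrix is updated, a refresh after a data update costs one pass over cached features instead of a full model recomputation, which is the trade-off the slides study.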

Page 5

Approach - Practice

Visual Object Classification:
- CNN
- Bag of Words
=> white-box model; interchangeable

Images (database) and tags (thesaurus) are sampled according to operator input. The VOC white box takes the training sample and generates a classification model. After passing quality checks, the archive is re-classified (a cheap operation) to generate a score matrix for querying.

(figure: state chart for indexing, showing data (boxes), states (upper ellipses) and transition functions (lower ellipses))
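The state chart can be read as a three-step pipeline: sample the archive and thesaurus, train the white-box VOC, then re-classify everything into a score matrix. A toy sketch of that flow, where every function name is hypothetical and the "classifier" is reduced to a trivial string-matching stand-in for the real CNN / Bag-of-Words block:

```python
def sample_training_set(archive, thesaurus, images_per_tag):
    """Operator-controlled sampling of images (database) per tag (thesaurus)."""
    return {tag: archive.get(tag, [])[:images_per_tag] for tag in thesaurus}

def train_voc(training_sample):
    """White-box VOC stand-in (the slides use a CNN or Bag of Words);
    this toy 'model' just remembers one prototype image ID per tag."""
    return {tag: images[0] if images else "" for tag, images in training_sample.items()}

def reclassify(model, all_images):
    """Cheap re-classification of the whole archive into a
    tags x images score matrix (higher score = better match)."""
    def score(proto, img):
        # toy similarity: count of matching characters by position
        return sum(a == b for a, b in zip(proto, img))
    return {tag: [score(proto, img) for img in all_images]
            for tag, proto in model.items()}
```

The point of the sketch is the data flow, not the scoring: the expensive training step sees only the operator's sample, while the full archive is touched only in the cheap re-classification pass.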

Page 6

Experiments

• VOC block: ConvNet [Krizhevsky2012]

• Prototype with model pre-training in pylearn2

• Focus on precision influence:
– sample rate (images/tag)
– tag generalisation
– full retraining vs. pre-trained refinement (last-layer retraining)

• Hardware:
– Intel Xeon E5-1650 v3 in SMP (8 PEs used)
– dedicated graphics adapter (NVIDIA Quadro K4200)
– Intel Xeon E5-2620 + 1x NVIDIA GTX 680
– Intel Xeon E5-2620 + 1x NVIDIA Tesla C2050
– Intel Xeon E5-2620 in SMP (16 PEs used)

• Software:
– pylearn2 + hardware-accelerated numpy

• Datasets:
– start-off: CIFAR-10 – 10 tags; 50,000 training images; 10,000 test images
– source of additional content: CIFAR-100 – 100 classes; 50,000 training images; 10,000 test images
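The "sample rate (images/tag)" knob above can be expressed as a per-tag sampler over a tagged dataset. A hedged sketch (the function name and dict-of-lists dataset layout are illustrative, not the pylearn2 API used in the prototype):

```python
import random

def sample_per_tag(dataset, images_per_tag, seed=0):
    """Draw up to `images_per_tag` images for each tag; this is the
    'sample rate' studied in the experiments (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    sample = {}
    for tag, images in dataset.items():
        k = min(images_per_tag, len(images))
        sample[tag] = rng.sample(images, k)
    return sample
```

Varying `images_per_tag` (and, for tag generalisation, merging CIFAR-100 classes into coarser tags) then changes the training input without touching any CNN hyperparameter, which is exactly the sampling-based parameterization the slides propose.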

Page 7

Results

(figure: error rates of a significantly overfitted network)

Page 8

Discussion

• Full model recomputation and last-layer re-training: comparable precision
• re-training is a valid alternative for dynamic data updates
• the ratio “# tag samples : # images” has a visible precision impact
=> input sampling does parameterize the model
• tag generalisation improves precision
• CNN computation times benefit more from accelerator usage than from algorithmic tuning (for small examples)
• technical note: the NVIDIA Quadro K4200 was not utilised by pylearn2 as an accelerator

Page 9

Upcoming Research

• experiments with ILSVRC 2010 + ImageNet (reduced overfitting)
• use distributed workflow environments (WS-VLAM)
• scaling on network nodes
• impact of retraining at different NN layers
• research how user feedback can control classification
• improve score matrix evaluation on query for growing archives (sparse matrix access patterns)
• research “steering via sample”
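For growing archives, the tags x images score matrix gets ever wider and mostly sparse, and a tag query reduces to slicing one row. A sketch of that access pattern with SciPy's CSR format (the layout and function name are assumptions for illustration):

```python
import numpy as np
from scipy.sparse import csr_matrix

def query_top_k(scores, tag_index, k=5):
    """Rank archive images for one tag from a sparse score matrix.

    `scores` is a tags x images CSR matrix; slicing one row touches
    only that tag's stored scores, which is the access pattern that
    matters once the archive (the column dimension) keeps growing.
    """
    row = scores.getrow(tag_index)               # cheap: one CSR row
    order = np.argsort(row.toarray().ravel())[::-1]  # descending by score
    return order[:k]
```

Row-major (CSR) storage fits per-tag queries; a per-image lookup across all tags would instead favour the column-major (CSC) layout.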

Page 10

Acknowledgements

• Lorentz Centrum, for the project initialization

• NWO and STW, for the KIEM project grant

• Roeland Ordelman and “Beeld en Geluid” (NISV), for the close project collaboration

• Adam Belloum and Thomas Mensink (UvA – IvI), for the discussions on Big Data workflow packages and Visual Indexing strategies

References:

[Jarret2009] K. Jarret, K. Kavukcuoglu, M. Ranzato and Y. LeCun, “What is the best multi-stage architecture for object recognition?”, in IEEE 12th International Conference on Computer Vision, 2009, pp. 2146-2153.

[Zeiler2014] M.D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks”, in Computer Vision – ECCV, Springer, 2014, pp. 818-833.

[Krizhevsky2012] A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.