Towards Distributed, Semi-Automatic Content-Based Visual Information Retrieval (CBVIR) of
Massive Media Archives
Christian Kehl and Ana Lucia Varbanescu – University of Amsterdam, Informatics Institute
Motivation and Focus
Object Recognition Challenges
Real Media Archives:
- image set: static → dynamically increasing
- tag set: static → dynamically increasing
Contributions
• A concept for parameterizing Visual Object classifiers, based on data sampling strategies
• An initial, small-scale study of accelerator performance impact using different sampling schemes
• Side contribution: a prototypical implementation of sampling-based CNN parameterization
Approach - Theory
• 2 key ideas:
– use a pre-trained network with successive refinement
– even untrained, randomly-initialised models can be used [Jarret2009]
– more re-training -> longer computation, higher accuracy
– CNNs learn “from the [input] data” => controlling the data means controlling the model [Zeiler2014]
– CNN parameter tuning demands knowledge of the process
– CNN input-data “tuning” demands knowledge of the data (e.g. images)
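The refinement idea can be sketched in a few lines of numpy: a frozen, randomly-initialised feature extractor (in the spirit of [Jarret2009]) stands in for the pre-trained CNN, and only the final classifier layer is retrained. The toy data, feature width, and learning rate below are illustrative assumptions, not values from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained CNN: a fixed (frozen) random projection
# followed by a ReLU; only the layer after it will be retrained.
W_frozen = rng.normal(size=(8, 32))

def features(x):
    """Frozen feature extractor (not updated during retraining)."""
    return np.maximum(x @ W_frozen, 0.0)

# Synthetic 2-class data (illustrative only).
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Last-layer retraining: logistic regression on the frozen features.
w = np.zeros(32)
b = 0.0
lr = 0.1
for _ in range(500):
    F = features(X)
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid output
    grad_w = F.T @ (p - y) / len(y)          # log-loss gradient w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean((1.0 / (1.0 + np.exp(-(features(X) @ w + b))) > 0.5) == y)
```

Retraining only `w` and `b` touches far fewer parameters than full model recomputation, which is the trade-off the slides compare.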
(figure: filter responses, from [Jarret2009])
Approach - Practice
Visual Object Classification:
- CNN
- Bag of Words
=> white-box model; interchangeable
Images (database) and tags (thesaurus) are sampled according to operator input. The VOC white box takes the training sample to generate a classification model. After passing quality checks, the archive is re-classified (a cheap operation) to generate a score matrix for querying.
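This indexing loop can be sketched in a few lines of Python; the archive contents, sampling rate, and the dummy classifier standing in for the trained white-box model are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical archive: image ids mapped to their tags (thesaurus entries).
archive = {f"img{i:03d}": ["boat" if i % 2 else "car"] for i in range(40)}
thesaurus = ["boat", "car"]

def sample_training_set(archive, tags, images_per_tag, rng):
    """Operator-controlled sampling: pick `images_per_tag` images per tag."""
    sample = {}
    for tag in tags:
        candidates = [im for im, ts in archive.items() if tag in ts]
        chosen = rng.choice(candidates,
                            size=min(images_per_tag, len(candidates)),
                            replace=False)
        sample[tag] = list(chosen)
    return sample

# ... train the white-box VOC model on the sample (omitted), then re-classify:
def score_matrix(archive, tags, classify):
    """Archive-wide re-classification: rows = images, columns = tags."""
    S = np.zeros((len(archive), len(tags)))
    for i, im in enumerate(sorted(archive)):
        for j, tag in enumerate(tags):
            S[i, j] = classify(im, tag)
    return S

# Dummy classifier standing in for the trained model.
S = score_matrix(archive, thesaurus, lambda im, tag: rng.random())
```

Because only `sample_training_set` changes when the operator adjusts the sampling, the trained model is parameterized through the data rather than through CNN hyperparameters.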
(figure: state chart for indexing, showing data as boxes, states as upper ellipses, and transition functions as lower ellipses)
Experiments
• VOC block: ConvNet [Krizhevsky2012]
• Prototype with model pre-training in pylearn2
• Focus on precision influence:
– sample rate (images/tag)
– tag generalisation
– full retraining vs. pre-trained refinement (last-layer retraining)
• Hardware:
– Intel Xeon E5-1650 v3 in SMP (8 PEs used)
– dedicated graphics adapter (Quadro K4200)
– Intel E5-2620 + 1x NVIDIA GTX680
– Intel E5-2620 + 1x NVIDIA Tesla C2050
– Intel E5-2620 in SMP (16 PEs used)
• Software:
– pylearn2 + hardware-accelerated numpy
• Datasets:
– starting point: CIFAR-10 – 10 tags; 50,000 training images; 10,000 test images
– source of additional content: CIFAR-100 – 100 classes; 50,000 training images; 10,000 test images
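The tag generalisation experiment can be illustrated with a small sketch: fine-grained labels are folded into coarser thesaurus entries before sampling, so each training class receives more images. The mapping below is a made-up fragment, not the actual CIFAR-100 superclass list used in the experiments.

```python
# Hypothetical generalisation mapping: fine tag -> coarser thesaurus entry.
GENERALISE = {
    "maple": "tree", "oak": "tree", "pine": "tree",
    "hamster": "rodent", "mouse": "rodent", "squirrel": "rodent",
}

def generalise_tags(labelled_images, mapping):
    """Relabel each image with its coarser tag where a mapping exists;
    tags without a mapping are kept unchanged."""
    return {img: mapping.get(tag, tag) for img, tag in labelled_images.items()}

fine = {"img0": "oak", "img1": "pine", "img2": "hamster", "img3": "boat"}
coarse = generalise_tags(fine, GENERALISE)
```

After generalisation, "tree" has two training images instead of one per fine class, which is the mechanism by which coarser tags can improve per-class precision.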
Results
(figure: error rates, showing a significantly overfitted network)
Discussion
• Full model recomputation and last-layer re-training yield comparable precision
• re-training is a valid alternative for dynamic data updates
• the ratio of tag samples to the number of images has a visible precision impact => input sampling does parameterize the model
• tag generalisation improves precision
• CNN computation times benefit more from accelerator usage than from algorithmic tuning (for small examples)
• technical note: the NVIDIA Quadro K4200 was not utilised by pylearn2 as an accelerator
Upcoming Research
• experiments on ILSVRC 2010 + ImageNet (reduced overfitting)
• use distributed workflow environments (WS-VLAM)
• scaling on network nodes
• impact of retraining at different NN layers
• research how user feedback can control classification
• improve score matrix evaluation on query for growing archives (sparse matrix access patterns)
• research “steering via sample”
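One way the sparse access pattern could look (a sketch under assumed design choices, not the project's actual implementation): keep only above-threshold scores per tag, so a query reads a single sparse column even as the archive grows.

```python
import heapq

# Sparse, tag-major score store: only scores above a threshold are kept.
# The layout, contents, and threshold are illustrative assumptions.
scores = {
    "boat": {"img001": 0.91, "img007": 0.66, "img012": 0.88},
    "car":  {"img002": 0.95, "img007": 0.41},
}

def query(tag, k=2):
    """Top-k images for a tag: reads one sparse column, not the full matrix."""
    col = scores.get(tag, {})
    return heapq.nlargest(k, col.items(), key=lambda kv: kv[1])

def update(tag, image, score, threshold=0.3):
    """Archive growth: store a new score only if it clears the threshold."""
    if score >= threshold:
        scores.setdefault(tag, {})[image] = score

update("boat", "img020", 0.97)
print(query("boat"))  # -> [('img020', 0.97), ('img001', 0.91)]
```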
Acknowledgements
• Lorentz Centrum, for the project initialization
• NWO and STW, for the KIEM project grant
• Roeland Ordelman and “Beeld en Geluid” (NISV), for the close project collaboration
• Adam Belloum and Thomas Mensink (UvA – IvI), for the discussions on Big Data workflow packages and Visual Indexing strategies

References:
[Jarret2009] K. Jarret, K. Kavukcuoglu, M. Ranzato and Y. LeCun, “What is the best multi-stage architecture for object recognition?”, in IEEE 12th International Conference on Computer Vision, 2009, pp. 2146-2153.
[Zeiler2014] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks”, in Computer Vision – ECCV, Springer, 2014, pp. 818-833.
[Krizhevsky2012] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.