Recurrent Image Annotator
Transcript of Recurrent Image Annotator
Recurrent Image Annotator for Arbitrary Length Image Tagging
Jiren Jin, Nakayama Lab
1. Introduction to Automatic Image Annotation
Automatic Image Annotation (AIA)
Difficulties of the Task
Most previous work focuses on several problems:
• label sparsity
• label imbalance
• incorrect/incomplete labels
The basic approach is to utilize:
• image-to-tag correlation
• tag-to-tag correlation
Existing Methods
• generative models (modeling the distribution over image features and annotation tags), Yu et al.
• discriminatively trained classifiers, Claudio et al.
• K-nearest-neighbor (KNN) based methods, Guillaumin et al.
• object detection based methods, Song et al.
2. The Missing Part: Annotation Length
Missing Part: Annotation Length
Conventional evaluation uses a fixed annotation length:
• annotate the k most relevant keywords
• evaluate retrieval performance per keyword
• average over keywords
• typical k values are 5 or 3
Why did previous work do this?
• for ease of comparison with previous results
• most existing methods cannot trivially predict the proper number of tags
Why Annotation Length Matters
A fixed annotation length is:
• not the natural way that we humans annotate images
• not how realistic images are actually labeled
Problem to solve: predict annotation results with arbitrary length.
(Figure legend: AL = arbitrary length, T5 = top-5, GT = ground truth)
3. Our Solution: Recurrent Image Annotator
Natural Way for Arbitrary Length Outputs
Sequence generation:
• just output tags one by one -> arbitrary annotation length
• previous outputs influence the current output -> tag-to-tag correlation
Inspired by machine translation and image captioning:
• the image or a sentence in language A is encoded
• the image description or a sentence in language B is decoded
Karpathy et al. (2014); Vinyals, Oriol, et al. (2014)
What Else We Need
An order of the tags:
• Both image captioning and machine translation aim to generate sentences, which have a natural order.
• Unfortunately, in the image annotation task, such an order is not available.
• We have to choose or learn an order.
Points for a useful order rule:
• it should be based on semantic image and tag information
• tag sequences in all training examples should be sorted by the same rule
• it should be easy to learn and good for generation
Contributions
1. Analyze the insufficiency of existing methods:
◦ unable to generate an image-dependent number of tags
2. First to formulate image annotation as a sequence generation problem:
◦ propose a novel RNN-based model, the Recurrent Image Annotator
3. Propose and evaluate several orders for sorting the tag inputs:
◦ show the importance of tag order in the tag sequence generation problem
Recurrent Image Annotator (RIA)
4. Submodules of Recurrent Image Annotator
Neural Networks
Hidden layer: linear transformation + nonlinear activation function (e.g., the sigmoid function)
(Simple fully-connected network, figure from Wikipedia)
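The hidden-layer computation described above (linear transformation followed by a nonlinearity) can be sketched in a few lines of NumPy; the layer sizes and random weights here are illustrative, not from the talk:

```python
import numpy as np

def sigmoid(z):
    # Nonlinear activation squashing values into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b):
    # Hidden layer = linear transformation W @ x + b, then a nonlinearity.
    return sigmoid(W @ x + b)

# Toy example: 3 inputs -> 2 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = np.zeros(2)
h = dense_layer(np.array([1.0, 0.5, -0.5]), W, b)
```

Stacking several such layers (each one fully connected to the previous) gives the simple feed-forward network shown on the slide.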
Convolutional Neural Networks
• local connectivity
• shared weights
• 3D volumes of neurons
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems, 2012.
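A minimal sketch of the first two ideas, assuming a single 2D kernel in plain NumPy (real CNNs apply many kernels over 3D volumes of activations):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # One kernel slides over the image: local connectivity (each output
    # depends only on a small patch) plus shared weights (the same kernel
    # is reused at every position).
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

edge = np.array([[1.0, -1.0]])                 # horizontal difference filter
img = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))    # 4x4 image with a vertical edge
resp = conv2d_valid(img, edge)                 # responds only at the edge
```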
Recurrent Neural Networks
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
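A vanilla RNN step can be sketched as follows; the hidden state is what lets earlier outputs influence later ones (sizes and weights are illustrative):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    # The hidden state h carries information from previous time steps,
    # so the current output depends on the whole sequence so far.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

# Unroll over a short sequence of 5 inputs of dimension 3.
rng = np.random.default_rng(1)
Wx = rng.normal(scale=0.1, size=(4, 3))
Wh = rng.normal(scale=0.1, size=(4, 4))
b = np.zeros(4)
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = rnn_step(x_t, h, Wx, Wh, b)
```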
Long Short-Term Memory Networks
An improved version of the RNN:
• remembers information for long periods of time
• uses gating units to control information flow through time steps
S. Hochreiter and J. Schmidhuber, 1997
Core idea of LSTM: the cell state, along which it is easy for information to flow unchanged.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
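A minimal, illustrative LSTM step, making the gates and the mostly-additive cell-state update explicit (the stacked weight layout here is one common convention, not necessarily the talk's exact formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    # Gates control what enters and leaves the cell state c, which can
    # pass through time steps almost unchanged (only scaled/added to).
    z = W @ np.concatenate([x, h])
    H = h.shape[0]
    f = sigmoid(z[0:H])            # forget gate
    i = sigmoid(z[H:2 * H])        # input gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate values
    c_new = f * c + i * g          # additive cell-state update
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(2)
H, X = 4, 3
W = rng.normal(scale=0.1, size=(4 * H, X + H))
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=X), h, c, W)
```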
5. Experimentation
Dataset 1: Corel 5K
Vocabulary size: 260
Number of images: 4,493
Words per image: 3.4 (maximum is 5)
Images per word: 58.6 (maximum is 1,004)
Dataset 2: ESP Game
Vocabulary size: 269
Number of images: 18,689
Words per image: 4.7 (maximum is 15)
Images per word: 362.7 (maximum is 4,553)
Dataset 3: IAPR-TC12
Vocabulary size: 291
Number of images: 17,665
Words per image: 5.7 (maximum is 23)
Images per word: 347.7 (maximum is 4,999)
Evaluation Measures
• precision, P (averaged over classes)
• recall, R (averaged over classes)
• f-measure, F (averaged over classes)
• N+: the number of classes with non-zero recall
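These per-class measures can be sketched as follows; the helper `per_class_scores` and the toy tag sets are hypothetical, for illustration only (classes with no predictions contribute zero precision to the average):

```python
from collections import defaultdict

def per_class_scores(predictions, ground_truth, vocabulary):
    # predictions / ground_truth: one set of tags per image.
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for pred, gt in zip(predictions, ground_truth):
        for t in pred & gt: tp[t] += 1
        for t in pred - gt: fp[t] += 1
        for t in gt - pred: fn[t] += 1
    # Average precision and recall over classes, not over images.
    P = sum(tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
            for t in vocabulary) / len(vocabulary)
    R = sum(tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
            for t in vocabulary) / len(vocabulary)
    F = 2 * P * R / (P + R) if P + R else 0.0
    n_plus = sum(1 for t in vocabulary if tp[t] > 0)  # classes with recall > 0
    return P, R, F, n_plus

preds = [{"sky", "sea"}, {"cat"}]
gts = [{"sky"}, {"cat", "grass"}]
P, R, F, n_plus = per_class_scores(preds, gts, ["sky", "sea", "cat", "grass"])
```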
Different Orders for Tag Sequences
• dictionary order: sort tags alphabetically
• random order: randomly shuffle tags in each training example
• frequent-first order: put frequent tags ahead of rare tags
• rare-first order: put rare tags ahead of frequent tags
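The four orders can be sketched as follows; the helper `sort_tags` and the toy training set are hypothetical, but each rule sorts every example consistently, as the talk requires:

```python
import random
from collections import Counter

def sort_tags(tag_lists, order="rare-first"):
    # Count tag frequency over the whole training set, then sort the
    # tags of every example by the same rule.
    freq = Counter(t for tags in tag_lists for t in tags)
    if order == "dictionary":
        key = lambda t: t                      # alphabetical
    elif order == "rare-first":
        key = lambda t: (freq[t], t)           # rare tags first
    elif order == "frequent-first":
        key = lambda t: (-freq[t], t)          # frequent tags first
    elif order == "random":
        rng = random.Random(0)                 # fixed seed for repeatability
        return [rng.sample(tags, len(tags)) for tags in tag_lists]
    else:
        raise ValueError(f"unknown order: {order}")
    return [sorted(tags, key=key) for tags in tag_lists]

train = [["sky", "sea", "sun"], ["sky", "sun"], ["sky"]]
```

Here "sky" appears 3 times, "sun" twice, and "sea" once, so the rare-first and frequent-first orders reverse each other on the first example.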
6. Analysis and Conclusion
Arbitrary Length Annotation (1)
Arbitrary Length Annotation (2)
Arbitrary Length Annotation (3)
Compare Influence of Different Orders
P: precision, R: recall, F: f-measure, N+: number of classes with non-zero recall. Larger values represent better performance.
Analysis of Results for Different Orders
Why rare-first outperforms frequent-first:
• "rare" means rare in the dataset; for a single image, a rare tag may carry more importance
• frequent tags are naturally easier to predict than rare tags, and the frequent-first order makes the easy task easier but the difficult task more difficult
• correctly predicting rare tags matters more under per-class evaluation measures
Top-5 Annotation
P: precision, R: recall, F: f-measure, N+: number of classes with non-zero recall
Much faster testing speed: constant time (5 ms) per test image, instead of O(N) in KNN-based methods (N: number of training images).
Conclusion
• transform image annotation into a sequence generation problem
• achieve performance comparable to state-of-the-art methods
• decide the appropriate annotation length automatically
• obtain a much faster testing speed
• confirm the importance of a proper tag sequence order
Output of This Work
1. Accepted by the International Conference on Pattern Recognition (ICPR) 2016 (oral)
2. Web demo for RIA: www.nlab.ci.i.u-tokyo.ac.jp/annotator
Future Work
Improve the strategy for obtaining the tag sequence order
• e.g., use reinforcement learning to learn the order automatically
Extend to personal preference annotation
• consider eye-catching effects, etc.