Learnable pooling with Context Gating for Video Classification
Antoine Miech, Ivan Laptev, Josef Sivic
Goal: Multi-modal feature pooling
POOLING MODULE
CLASSIFICATION
Recurrent model (e.g. LSTM)
Problem?
▹ Slow for inference/training
▹ This is NOT a sequential problem
▹ Needs lots of data for training
▹ How about very long videos?
But surprisingly good results!
Our approach
Traditional pooling
▹ Bag-of-visual-words [Sivic and Zisserman, 2003][Csurka et al., 2004]
▹ VLAD [Jégou et al., 2010]
▹ Fisher Vector [Perronnin et al., 2007]
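As a rough illustration of the traditional pooling methods listed above, the following NumPy sketch computes a bag-of-visual-words histogram and a VLAD vector with hard assignment to a fixed dictionary. The function names, the toy data, and the random dictionary are assumptions made for illustration; in practice the dictionary would typically come from k-means, and published variants use additional normalization steps that are omitted here.

```python
import numpy as np

def hard_assign(features, centers):
    """Index of the nearest dictionary center for each descriptor."""
    # Squared Euclidean distances between descriptors and centers, shape (N, K)
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bag_of_words(features, centers):
    """Normalized histogram of visual-word counts, shape (K,)."""
    idx = hard_assign(features, centers)
    hist = np.bincount(idx, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)

def vlad(features, centers):
    """Per-center sum of residuals, flattened and L2-normalized, shape (K*D,)."""
    idx = hard_assign(features, centers)
    out = np.zeros_like(centers, dtype=float)
    for k in range(len(centers)):
        members = features[idx == k]
        if len(members):
            out[k] = (members - centers[k]).sum(axis=0)
    out = out.ravel()
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

# Toy data (assumption): 30 frame descriptors of dimension 8, dictionary of 4 words
rng = np.random.default_rng(0)
frames = rng.normal(size=(30, 8))
centers = rng.normal(size=(4, 8))  # stand-in for a k-means dictionary
bow_vec = bag_of_words(frames, centers)
vlad_vec = vlad(frames, centers)
```

Both outputs have a fixed size regardless of the number of frames, which is what lets a standard classifier be trained on top of them.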
Traditional Pipeline
Unsupervised learning of a dictionary
Encoding and unsupervised dimension reduction
Learning a supervised classifier
VS
End-to-end Pipeline
Supervised learning of dictionary
+ Supervised dimension reduction
+ Learning a supervised classifier
Learnable pooling
Unsupervised                                      End-to-End
Bag-of-visual-words [Sivic and Zisserman, 2003]   Soft-DBoW
VLAD [Jégou et al., 2010]                         NetVLAD [Arandjelović et al., CVPR 2016]
Fisher Vector [Perronnin et al., 2007]            NetFV
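To make the step from unsupervised to end-to-end pooling concrete, here is a minimal NumPy sketch of a NetVLAD-style forward pass: hard assignment is replaced by a softmax over a linear map of each descriptor, so the assignment weights and the cluster centers can all be trained by backpropagation. The function and parameter names (`netvlad_forward`, `W`, `b`, `centers`) and the random toy values are assumptions for illustration. Only the forward computation is shown, not the training loop or the LOUPE implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def netvlad_forward(features, W, b, centers):
    """features: (N, D); W: (D, K); b: (K,); centers: (K, D). Returns (K*D,)."""
    assign = softmax(features @ W + b, axis=1)   # (N, K) soft assignments
    # Sum over descriptors of a_k(x_i) * (x_i - c_k), computed in two parts
    weighted_sum = assign.T @ features           # (K, D)
    mass = assign.sum(axis=0)[:, None]           # (K, 1) total weight per cluster
    v = weighted_sum - mass * centers            # (K, D)
    # Intra-normalization per cluster, then global L2 normalization
    v = v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)
    v = v.ravel()
    return v / np.maximum(np.linalg.norm(v), 1e-12)

# Toy parameters (assumption): in an end-to-end model these would be learned
rng = np.random.default_rng(0)
N, D, K = 30, 8, 4
frames = rng.normal(size=(N, D))
W = rng.normal(size=(D, K))
b = np.zeros(K)
centers = rng.normal(size=(K, D))
pooled = netvlad_forward(frames, W, b, centers)
```

Because every operation is differentiable, the dictionary, any subsequent dimension reduction, and the classifier can be optimized jointly.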
Model overview
Context Gating
Equation: Y = σ(WX + b) ∘ X
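A minimal NumPy sketch of Context Gating, assuming the form given in the accompanying paper: the input feature vector X is re-weighted element-wise by a sigmoid gate computed from an affine map of X itself, with a trainable matrix W and bias b. The function name and the toy values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def context_gating(x, W, b):
    """x: (n,), W: (n, n), b: (n,). Returns the gated vector of shape (n,)."""
    gates = sigmoid(W @ x + b)  # each gate lies in (0, 1)
    return gates * x            # element-wise re-weighting of the input

# Toy example (assumption): zero weights give gates of exactly 0.5
x = np.array([1.0, -2.0, 0.5])
W = np.zeros((3, 3))
b = np.zeros(3)
y = context_gating(x, W, b)
```

With zero weights every gate equals 0.5, so the output is simply half the input. With learned weights, each gate depends on the whole input vector, which lets the layer suppress or emphasize individual dimensions in context.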
Model overview
Results
Generalization
Bonus: Winning the Kaggle competition
Effects of ensembling
LOUPE TensorFlow toolbox
GitHub LOUPE repo: github.com/antoine77340/LOUPE
GitHub Kaggle code repo: github.com/antoine77340/Youtube-8M-WILLOW
Questions?