Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

62
@DocXavi Deep Learning for Computer Vision Video Analytics 5 May 2016 Xavier Giró-i-Nieto Master en Creació Multimedia

Transcript of Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Page 1: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

@DocXavi

Deep Learning for Computer VisionVideo Analytics 5 May 2016

Xavier Giró-i-Nieto

Master en Creació Multimedia

Page 2: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Outline

1. Recognition2. Optical Flow3. Object Tracking

2

Page 3: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Recognition

Demo: Clarifai

MIT Technology Review : “A start-up’s Neural Network Can Understand Video” (3/2/2015)3

Page 4: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Figure: Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

4

Recognition

Page 5: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

5

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Page 6: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

6

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Previous lectures

Page 7: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

7

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Page 8: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.

Slides extracted from ReadCV seminar by Victor Campos 8

Recognition: DeepVideo

Page 9: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 9

Recognition: DeepVideo: Demo

Page 10: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 10

Recognition: DeepVideo: Architectures

Page 11: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 11

Unsupervised learning [Le at al’11] Supervised learning [Karpathy et al’14]

Recognition: DeepVideo: Features

Page 12: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 12

Recognition: DeepVideo: Multiscale

Page 13: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 13

Recognition: DeepVideo: Results

Page 14: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

14

Recognition

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Page 15: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

15

Recognition: C3D

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Page 16: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

16Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: Demo

Page 17: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

17K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015.

Recognition: C3D: Spatial dimensionSpatial dimensions (XY) of the used kernels are fixed to 3x3, following Symonian & Zisserman (ICLR 2015).

Page 18: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

18Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: Temporal dimension3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

Temporal depth

2D ConvNets

Page 19: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

19Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Recognition: C3D: Temporal dimension

Page 20: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

20Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

No gain when varying the temporal depth across layers.

Recognition: C3D: Temporal dimension

Page 21: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

21Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: Architecture

Featurevector

Page 22: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

22Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: Feature vector

Video sequence

16 frames-long clips

8 frames-long overlap

Page 23: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

23Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: Feature vector

16-frame clip

16-frame clip

16-frame clip

16-frame clip

...

Average

4096

-dim

vid

eo d

escr

ipto

r

4096

-dim

vid

eo d

escr

ipto

r

L2 norm

Page 24: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

24Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: VisualizationBased on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details.

Page 25: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

25Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: Compactness

Page 26: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

26Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks

Recognition: C3D: Performance

Page 27: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

27Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Recognition: C3D: SoftwareImplementation by Michael Gygli (GitHub)

Page 28: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

28Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694-4702. 2015.

Recognition:

Page 29: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

29

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Page 30: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

30

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Page 31: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

31

Recognition: ImageNet Video

[ILSVRC 2015 Slides and videos]

Page 32: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

32

Recognition: ImageNet Video

Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Page 33: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

33

Recognition: ImageNet Video

Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Page 34: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

34

Recognition: ImageNet Video

Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Page 35: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

35

Recognition: ImageNet Video

Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]

Page 36: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 36

Page 37: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 37

Page 38: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 38

End to end supervised learning of optical flow.

Page 39: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet (contracting)

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 39

Option A: Stack both input images together and feed them through a generic network.

Page 40: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet (contracting)

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 40

Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage.

Page 41: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet (contracting)

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 41

Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage.

Correlation layer: Convolution of data patches from the layers to combine.

Page 42: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet (expanding)

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 42

Upconvolutional layers: Unpooling features maps + convolution.Upconvolutioned feature maps are concatenated with the corresponding map from the contractive part.

Page 43: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Optical Flow: FlowNet

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015 43

Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Dataset is generated… and augmented (translation, rotation, scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color).

Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.

Data augmentation

Page 44: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 44

Optical Flow: FlowNet

Page 45: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: MDNet

45Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)

Page 46: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: MDNet

46Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)

Page 47: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: MDNet: Architecture

47Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)

Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Page 48: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: MDNet: Online update

48Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)

MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.

Page 49: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT

49Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV 2015 [code]

Page 50: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT

50Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Page 51: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT: Specialization

51Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.

Page 52: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT: Localization

52Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

Although trained for image classification, feature maps in conv5-3 enable object localization…...but is not discriminative enough to different objects of the same category.

Page 53: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: Localization

53Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene cnns." ICLR 2015.

Other works have shown how features maps in convolutional layers allow object localization.

Page 54: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT: Localization

54Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

conv4-3 conv5-3

Page 55: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT: Architecture

55Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Page 56: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: FCNT: Results

56Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

Page 57: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: DeepTracking

57P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]

Page 58: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: DeepTracking

58P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]

Page 59: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: DeepTracking

59P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]

Page 60: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: DeepTracking

60P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]

Page 61: Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

Object tracking: Compact features

61Wang, Naiyan, and Dit-Yan Yeung. "Learning a deep compact image representation for visual tracking." NIPS 2013 [Project page with code]