Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights...

25
©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader for deep learning https://cvpr20-video.mxnet.io/ Yi Zhu and Zhi Zhang 06/14/2020

Transcript of Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights...

Page 1: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Tutorial on Video ModelingDecord: an efficient video reader for deep learning

https://cvpr20-video.mxnet.io/

Yi Zhu and Zhi Zhang06/14/2020

Page 2: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Motivation

Ø Videos have redundant frames, need video reader

Videos ---> Raw frames ----> Network training

Page 3: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Motivation

Ø Videos have redundant frames, need video reader

videos450G

frames6.8T

Ø Pre-processing takes timeØ Data storage is hugeØ IO bottleneck during training

Videos ---> Network training

Videos ---> Raw frames ----> Network training

Page 4: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Motivation

Ø Slowness in random access

Page 5: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Motivation

Ø Slowness in random access

Wang etal, Temporal Segment Networks: Towards Good Practices for Deep Action Recognition, ECCV 2016

Segment1: index 9Segment2: index 51Segment3: index 102

Random access > sequential read

Page 6: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Motivation

Ø Lack of flexibility or good user experience in terms of video handling

Decord

frame = vr[99]

OpenCV

Page 7: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Decord

Decord: provide smooth experiences similar to random image loader for deep learning.

Page 8: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Installation

Supports Windows/Mac/Linux

Need to build from source to enable GPU support

Page 9: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Ease of Usage

Page 10: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Pythonic interface

Easy to get video duration

Direct access to any frames by list indexing

Page 11: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Batch read frames

Page 12: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Segment1: index 9Segment2: index 51Segment3: index 102vrames = vr.get_batch([9, 51, 102])

Page 13: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

3D CNNs, loading clips instead of frames

Segment1: index [1,2,3,4,5,6,7,8,9,10,11,12]Segment2: index [5,6,7,8,9,10,11,12,13,14,15,16,17]Segment3: index [9,10,11,12,13,14,15,16,17,18,19,20,21]

Duplication! (OpenCV -> slow, Lintel -> X)

Page 14: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Batch read frames

Efficient handling of duplication

Page 15: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Drop-in replacement of OpenCV

Page 16: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Resize videos while video reading

Page 17: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Batch reading using range, reduce python overhead

Page 18: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Usage

Get all the key frames

Page 19: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Deep learning framework

Page 20: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Efficiency Comparison

Page 21: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

2x faster than OpenCV

Page 22: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Speed Comparison

Page 23: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Ø OpenCV and PyAVSlow in random access pattern

Ø Lintel (https://github.com/dukebw/lintel)can’t handle duplication, no key_frame handling

Ø DALI (https://github.com/NVIDIA/DALI)complicated pipeline and usage, has to use Nvidia GPU

We will provide interface to other video readers!

Comparison to other video readers

Page 24: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Previewed Features

Ø GPU decoding and data augmentationcomplete, but needs further optimization

Ø Video Loaderan all-in-one solutionhttps://github.com/dmlc/decord

Page 25: Tutorial on Video Modeling · ©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark Tutorial on Video Modeling Decord: an efficient video reader

©2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Trademark

Conclusion

Ø Easy to use and flexiblepythonic

Ø Efficient

Ø Notebook(https://github.com/dmlc/decord/blob/master/examples/video_reader.ipynb)

Ø Please try Decord at https://github.com/dmlc/decord