What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
De-An Huang¹, Vignesh Ramanathan², Dhruv Mahajan², Lorenzo Torresani², Manohar Paluri², Li Fei-Fei¹, Juan Carlos Niebles¹
¹Stanford University, ²Facebook
Motivation
- Videos contain much more than just their individual images.
- An explicit analysis of the temporal information in videos is still missing.
- Analyze a video model trained on a dataset (with its weights fixed).
- Propose three frameworks to ablate temporal information from the test video.
- A single frame is just an image and contains no temporal information.
[Figure: (a) original video; (b) a video matching the C3D deep features of (a).]
Approach Overview
[Figure: approach overview. A test video goes through subsampling, the frame selector (selected frame), and the temporal generator (generated video), which is then fed to C3D trained on UCF101 (Conv1 to Conv5). A chart compares "Original Video" and "No Temporal" on a 0 to 90 scale, annotated "6%".]
Class-Agnostic Temporal Generator
- Temporal distribution shift: the model has not seen "static videos" during training.
- Generate a video from the frame to bridge this distribution shift, but without using any "real" temporal information.
- Learning the temporal generator: to the model, the video generated from the image should be perceptually similar to the original video.
- Key frame: a frame from which we can recognize the action without temporal information.
- $s(\cdot)$: estimate of frame quality.
[Figure: naïve subsampling. The middle frame of the test video is replicated, and the replicated frames are fed to C3D trained on UCF101 (Conv1 to Conv5).]
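The replication step of naïve subsampling can be sketched as follows. This is a minimal sketch assuming a video is any sequence of frames (for example, decoded image arrays) and that the model expects a clip of the original length; the function name `replicate_middle_frame` is hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def replicate_middle_frame(video: Sequence[Frame]) -> List[Frame]:
    """Build a "static video" by repeating the middle frame of `video`.

    The output keeps the input length, so a clip-based model such as C3D
    receives an input of the size it expects, but with no temporal change.
    """
    if not video:
        raise ValueError("video must contain at least one frame")
    middle = video[len(video) // 2]
    # Every position references the same frame object; fine for read-only use.
    return [middle] * len(video)
```

For example, `replicate_middle_frame(["f0", "f1", "f2", "f3"])` returns `["f2", "f2", "f2", "f2"]`.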
[Figure: temporal generator. The middle frame of the test video is passed to the temporal generator, and the generated video is fed to C3D trained on UCF101 (Conv1 to Conv5).]
Naïve Subsampling
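[Figure: learning the temporal generator. A frame selected by subsampling the input video is passed to the temporal generator; the generated video and the input video are both fed to the video model (C3D), with losses $\ell_1, \dots, \ell_5$.]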
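A minimal sketch of a per-layer feature-matching objective for the temporal generator follows. It assumes each $\ell_k$ is a mean squared error between the video model's activations for the generated video and for the original video at one layer, and that the total is a weighted sum; the MSE form, the weights, and the function names are assumptions for illustration, not the loss stated on the poster.

```python
from typing import Optional, Sequence


def layer_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two flattened activation vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def perceptual_loss(
    gen_feats: Sequence[Sequence[float]],
    orig_feats: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-layer losses l_k (assumed MSE).

    gen_feats[k] / orig_feats[k]: activations of layer k for the generated
    video and the original video (hypothetical inputs).
    """
    if len(gen_feats) != len(orig_feats):
        raise ValueError("need the same layers for both videos")
    if weights is None:
        weights = [1.0] * len(gen_feats)  # equal weights assumed by default
    if len(weights) != len(gen_feats):
        raise ValueError("one weight per layer is required")
    return sum(w * layer_mse(g, o) for w, g, o in zip(weights, gen_feats, orig_feats))
```

With real models, each inner list would be the flattened activations of one layer, and the loss would be minimized over the generator's parameters with an autodiff framework while the video model stays fixed; this standard-library version only shows the arithmetic.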
Motion-Invariant Frame Selector
$s(x_i) = \max_c p_c(x_i)$, where $p_c(x_i)$ is the score of class $c$.
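The frame-quality score and the argmax selection can be sketched as follows. This assumes the candidates are taken with a uniform stride and that, for each candidate $x_i$, the class scores $p_c(x_i)$ are already available as a list of floats; the stride rule and all function names are hypothetical.

```python
from typing import List, Sequence


def subsample_indices(num_frames: int, num_candidates: int) -> List[int]:
    """Evenly spaced candidate frame indices (uniform stride is an assumption)."""
    if num_frames <= 0 or num_candidates <= 0:
        raise ValueError("num_frames and num_candidates must be positive")
    num_candidates = min(num_candidates, num_frames)
    step = num_frames / num_candidates
    return [int(i * step) for i in range(num_candidates)]


def frame_quality(class_scores: Sequence[float]) -> float:
    """s(x_i) = max_c p_c(x_i): the highest class score for one candidate."""
    if not class_scores:
        raise ValueError("class_scores must be non-empty")
    return max(class_scores)


def select_key_frame(candidate_scores: Sequence[Sequence[float]]) -> int:
    """Position of the candidate with the largest s(x_i) (first one on ties)."""
    if not candidate_scores:
        raise ValueError("need at least one candidate")
    qualities = [frame_quality(scores) for scores in candidate_scores]
    return max(range(len(qualities)), key=qualities.__getitem__)
```

In a full pipeline, `candidate_scores[k]` would hold the model's class scores for the frame at `subsample_indices(...)[k]`, and the selected frame is the one at the returned position.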
[Figure: the input video is subsampled into frame candidates $x_1, \dots, x_i, \dots, x_N$; each candidate is scored with $s(\cdot)$, and the argmax is selected.]
- Oracle key frames (upper bound): select the frames that give the correct prediction.
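The oracle upper bound can be sketched as follows, assuming per-candidate class scores and an integer ground-truth label are available, and that a video counts as correct under the oracle when at least one candidate frame is predicted correctly. That counting rule and the function names are assumptions for illustration.

```python
from typing import List, Sequence


def predicted_class(class_scores: Sequence[float]) -> int:
    """Index of the highest-scoring class (first one on ties)."""
    return max(range(len(class_scores)), key=lambda c: class_scores[c])


def oracle_key_frames(
    candidate_scores: Sequence[Sequence[float]], true_label: int
) -> List[int]:
    """Positions of candidate frames whose prediction matches the ground truth."""
    return [
        k for k, scores in enumerate(candidate_scores)
        if predicted_class(scores) == true_label
    ]


def oracle_correct(candidate_scores: Sequence[Sequence[float]], true_label: int) -> bool:
    """Assumed rule: the video is correct if any candidate frame is correct."""
    return len(oracle_key_frames(candidate_scores, true_label)) > 0
```

Because the oracle uses the ground-truth label to pick frames, it is not a deployable selector; it only bounds what any frame selector could achieve on the same candidates.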
Analysis
- Analyzing motion information
  - 40% of UCF101 classes and 35% of Kinetics classes do not need motion.
  - Temporal generator:
  - Frame selection:
  - Oracle frame selection:
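One hypothetical way to count classes that "do not need motion" is sketched below: a class qualifies if its accuracy with temporal information removed is at most `tol` below its accuracy on the original videos. This criterion, the tolerance, and the function names are assumptions for illustration; the poster does not state the exact rule behind its 40% and 35% figures, and the accuracies in the usage example are made up.

```python
from typing import Dict, List


def classes_not_needing_motion(
    orig_acc: Dict[str, float],
    no_temporal_acc: Dict[str, float],
    tol: float = 0.0,
) -> List[str]:
    """Classes whose accuracy drops by at most `tol` without temporal info."""
    return sorted(
        c for c, acc in orig_acc.items() if no_temporal_acc[c] >= acc - tol
    )


def fraction_not_needing_motion(
    orig_acc: Dict[str, float],
    no_temporal_acc: Dict[str, float],
    tol: float = 0.0,
) -> float:
    """Share of classes meeting the (hypothetical) criterion above."""
    if not orig_acc:
        raise ValueError("need at least one class")
    kept = classes_not_needing_motion(orig_acc, no_temporal_acc, tol)
    return len(kept) / len(orig_acc)
```

For example, with made-up accuracies `{"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}` originally and `{"a": 0.95, "b": 0.5, "c": 0.7, "d": 0.58}` without temporal information, classes `a` and `c` qualify at `tol=0.0`, giving a fraction of 0.5.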
[Figure: examples for JuggleBalls and PlayFlute (original video vs. temporal generator output), and a chart with the classes Sled Dog Racing, Ice Skating, Boxing Speed Bag, and Ski Jumping.]