Weakly-Supervised Action Segmentation with Iterative Soft...
Transcript of Weakly-Supervised Action Segmentation with Iterative Soft...
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
Li Ding (MIT) Chenliang Xu (University of Rochester)[email protected] [email protected]
take bowl pour cereals pour milk stir cereals
take bowl pour cereals pour milk stir cereals
take bowl pour cereals pour milk stir cereals
take bowl pour cereals pour milk stir cerealspour cereals pour cereals pour milk
take bowl pour cereals pour milk stir cerealspour cereals pour cereals pour milk
training
video
transcript
initialize target
inference
assignment
new target
… iterate till meeting stop criteria
take bowl pour cereals pour milk stir cereals
ground truth
training
inference
squeeze orange pour juice
transcript
initialize target
SIL SIL
inference
Δ > 𝜌
Δ > 𝜌 Δ < 𝜌
new transcript
squeeze orange pour juiceSIL SILsqueeze orangeSIL
frame (t)
probability
video feature
1D Conv + BN + ReLU + Pool
feature
1D Conv + BN + ReLU + Pool
feature
feature
feature
Upsample
+
feature
1x1 Conv
1x1 Conv
Upsample
+1x1 Conv
(Frame-wise) FC + Softmax
prediction
Encoder Decoder
1D Conv
1D Conv
1D Conv
Overview of the Weakly-Supervised Action Segmentation System
Iterative Soft Boundary Assignment (ISBA) Temporal Convolutional Feature Pyramid Network (TCFPN)
Experiment
Conclusion
• We iteratively train a temporal segmentation network with
target generated from action transcript, and refine the action
transcript based on the inference of current network.
Code
About
• We propose ISBA as a novel training strategy for weakly-
supervised sequence learning, and TCFPN as an
advanced temporal convolutional network for supervised
action recognition.
• The whole system TCFPN+ISBA outperforms state-of-the-
art on both action segmentation task and alignment tasks.
• Our training strategy for weakly-supervised sequence
learning is general and can be extended to other tasks,
such as speech recognition, video segmentation, etc.
• ISBA uses a simple-yet-
effective algorithm, trying
to generate a reasonable
probabilistic distribution of
different actions through
numbers of iterations.