Post on 26-Sep-2020
Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones
Why Action Recognition from Drones?
Jinwoo Choi1,2 Jia-Bin Huang1Dataset available at:http://bit.ly/NEC-Drone
Experimental ResultsSame source & target label sets
Gaurav Sharma2 Manmohan Chandraker2
Challenges
1Virginia Tech 2NEC Labs America
Rare viewpoint Continuous motion & blur Expensive Annotation
...
Unlabeled Target
Cross-entropy loss
Action classifier
Sharedweights
Gradient reversal
Feature extractor
Adversarial lossMLP
Labeled Source
‘run’
‘hug’
...
Feature extractor
Domain classifier
FC
Action prediction
Domain prediction
Sourcefeature
Targetfeature
Sport Analytics Crime Detection Disaster Monitoring
Method AccuracySource only 13.6% (reference)Video-based DA* 27.2% (+13.6)Instance-based DA* 29.4% (+15.8)Both video & instance DA* 32.0% (+18.4)Fully supervised on target 69.3%
Method AccuracySource only 8.2% (reference)Video-based DA* 18.7% (+10.5)Instance-based DA* 21.6% (+13.4)Both video & instance DA* 22.5% (+14.3)Fully supervised on target 68.3%
Proposed Method
......
Detection & tracking
Different source & target label sets
Cross-entropy loss
Action classifier
Sharedweights
Gradient reversal
Feature extractor
Adversarial lossMLP
Feature extractor
Domain classifier
FC
Action prediction
Domain prediction
Sourcefeature
Targetfeature
...
Unlabeled Target
Tripletloss
Sharedweights
Gradient reversal
Feature extractor
Adversarial loss
Labeled Source
Feature extractor
a
p
n
Embedding space
sim
ilar
diss
imila
r
a
p
n
...
Domain classifier
MLP
Average
Feature extractor
FC
Action classifier
Action prediction
Instancefeature
Action classifier
Feature extractor
FC
Action prediction
Videofeature
Detection & tracking
Target test video
Target support set(Few labeled target examples)
Feature extractor
Support setfeature
Feature extractor
Testfeature
Sharedweights
k-NN in embedding space
‘Hug’
‘Run’
‘Phone’
Same source & target label sets Different source & target label sets
Instance-level training
Video-level training
Testing
Video-level training
Testing
NEC-Drone Dataset● 5,250 full HD videos
○ 1,188 train○ 437 val○ 454 test○ 3,161 unlabeled
● Captured by two drones● 16 action categories● 19 actors
m = 0 m = 3 m = 5 m = 10 m = 20
Instance-based w/o. DA 12.6 31.1 35.4 43.2 49.5
Instance-based w. DA 16.5 31.6 39.3 41.3 52.4
Video-baseid w. DA 13.6 24.3 35.4 53.9 52.9
Both video & instance DA 15.1 32.0 41.8 54.9 58.3
m = 3 m = 5 m = 10 m = 20Instance-based w. DA 15.9 21.6 31.3 34.6
Video-baseid w. DA 12.8 18.7 29.1 34.4
Both video & instance DA 18.1 22.5 36.1 39.7
‘Run’ ‘Hug’
Ablation: # of target labeled examples per class used
Ablation: # of target labeled examples per class used
*Using five target labeled examples per class
*Using five target labeled examples per class
Main results
Main results