Transferable Dictionary Pair based Cross-view Action Recognition


Page 1: Transferable Dictionary Pair based Cross-view Action Recognition

Transferable Dictionary Pair based Cross-view Action Recognition

Lin Hong

Page 2:

Outline

• Research Background
• Method
• Experiment

Page 3:

Research Background

• Cross-view action recognition:
– Automatically analyze ongoing activities from an unknown video;
– Recognize actions from different views, i.e. be robust to viewpoint variation;
– Essential for human-computer interaction and video retrieval, especially for activity monitoring in surveillance scenarios.

• Challenges:
– The same action looks quite different from different viewpoints;
– Action models learned from one view become less discriminative for recognition in a very different view.

Page 4:

Research Background

• Transfer learning approaches for cross-view action recognition:
1. Split-based features method [1];
2. Bilingual-words based method [2];
3. Transferable dictionary pair (TDP) based method [3].

[1] Ali Farhadi and Mostafa Kamali Tabrizi. Learning to recognize activities from the wrong view point. In ECCV, 2008.

[2] Jingen Liu, Mubarak Shah, Benjamin Kuipers, and Silvio Savarese. Cross-view action recognition via view knowledge transfer. In CVPR, 2011.

[3] J. Zheng, Z. Jiang, J. Phillips, and R. Chellappa. Cross-view action recognition via a transferable dictionary pair. In BMVC, 2012.

Page 5:

Research Background

• Motivation:
1. Split-based features method: exploits the frame-to-frame correspondence in pairs of videos taken from two views of the same action by transferring the split-based features of video frames in the source view to the corresponding video frames in the target view. Defect: computing the frame-to-frame correspondence is computationally expensive.

2. Bilingual-words based method: exploits the correspondence between the view-dependent codebooks constructed by k-means clustering on the videos in each view. Defect: the codebook-to-codebook correspondence is not accurate enough to guarantee that a pair of videos observed in the source and target views will have similar feature representations.

Page 6:

Research Background

• Motivation:
3. Transferable dictionary pair (TDP) based method: currently the most encouraging method. It learns the dictionaries of the source and target views simultaneously, ensuring that the same action has the same representation in both views. Defect: although this transfer learning algorithm achieves good performance, it remains hard to transfer action models across views when the top view is involved.

Page 7:

Objective: force two sets of videos of shared actions in two views to have the same sparse representations. The action model learned in the source view can then be directly applied to classify test videos in the target view.

Method: TDP based method

Flowchart of cross-view action recognition framework

Page 8:

Spatio-temporal interest point (STIP) based feature:
Advantages:
• Captures local salient characteristics of appearance and motion;
• Robust to spatio-temporal shifts and scales, background clutter, and multiple motions.

Local space-time feature extraction:
• Detector: selects spatio-temporal interest points in a video by maximizing specific saliency functions;
• Descriptor: captures shape and motion in the neighborhoods of the selected points using image measurements.

Method

Page 9:

Method

Bag of Words (BoW) feature: STIP features are first quantized into visual words, and a video is then represented as the frequency histogram over the visual words.

[Diagram: for each view (view1, view2), STIP features are quantized with a K-means codebook to produce the BoW feature.]
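The BoW pipeline (STIP descriptors clustered into a codebook by K-means, then histogrammed per video) can be sketched in numpy. This is a toy illustration: the tiny k-means and the random "descriptors" are stand-ins for a real STIP extractor and the actual codebook sizes.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Toy k-means: cluster local descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def bow_histogram(descriptors, codebook):
    """Quantize descriptors to their nearest word, build a normalized histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# toy usage: 200 random 5-D "STIP descriptors", 8-word codebook
rng = np.random.default_rng(1)
desc = rng.normal(size=(200, 5))
cb = build_codebook(desc, k=8)
h = bow_histogram(desc, cb)
print(h.shape, round(h.sum(), 6))   # → (8,) 1.0
```

In the actual experiments a much larger codebook is used (1000 local plus 500 global words), but the quantize-and-histogram mechanics are the same.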

Page 10:

Method

Sparse coding and dictionary learning
The K-SVD algorithm is well known for efficiently learning a dictionary from a set of training signals. It solves the following optimization problem:

    min_{D,X} ||Y - D X||_F^2   s.t.   ||x_i||_0 <= T0  for all i

where Y contains the training signals, D is the learned dictionary, and X = [x_1, ..., x_N] holds the sparse representations of the input signals.
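The K-SVD alternation (sparse coding, here via Orthogonal Matching Pursuit, followed by SVD-based updates of each atom and its coefficients) can be sketched in numpy. This is a simplified toy version for illustration, not the implementation used in the paper:

```python
import numpy as np

def omp(D, y, T0):
    """Orthogonal Matching Pursuit: code y over dictionary D with at most T0 atoms."""
    residual, idx = y.copy(), []
    for _ in range(T0):
        j = int(np.abs(D.T @ residual).argmax())   # most correlated atom
        idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef            # re-fit, update residual
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def ksvd(Y, k, T0, iters=10, seed=0):
    """Toy K-SVD: alternate OMP sparse coding and rank-1 SVD atom updates."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(Y.shape[0], k))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        X = np.column_stack([omp(D, y, T0) for y in Y.T])
        for j in range(k):
            users = np.flatnonzero(X[j])           # signals using atom j
            if users.size == 0:
                continue
            # error matrix with atom j's contribution removed
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                      # best rank-1 atom update
            X[j, users] = S[0] * Vt[0]             # and its coefficients
    return D, X

# toy usage: signals that are exact 3-sparse combinations of 12 atoms
rng = np.random.default_rng(3)
Dtrue = rng.normal(size=(10, 12))
Dtrue /= np.linalg.norm(Dtrue, axis=0)
Xtrue = np.zeros((12, 40))
for i in range(40):
    Xtrue[rng.choice(12, 3, replace=False), i] = rng.normal(size=3)
Y = Dtrue @ Xtrue
D, X = ksvd(Y, k=12, T0=3, iters=15)
print(np.linalg.norm(Y - D @ X) / np.linalg.norm(Y))  # small relative error
```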

Page 11:

Method
• View-invariant action recognition
Objective: recognize an unknown action from an unseen (target) view using training data taken from other (source) views.
Method: simultaneously learn the source and target dictionaries by forcing the shared videos taken from the two views to have the same sparse representations.

[Diagram: the shared videos Ys (source view) and Yt (target view) are encoded as Ys ≈ Ds Xs and Yt ≈ Dt Xt, with the constraint Xs = Xt.]

Page 12:

Method

• Objective function:

    min_{Ds,Dt,X} ||Ys - Ds X||_F^2 + ||Yt - Dt X||_F^2   s.t.   ||x_i||_0 <= T0  for all i

Ds and Dt are learned by forcing the two sets of videos of shared actions in the two views to have the same sparse representations X. Stacking Y = [Ys; Yt] and D = [Ds; Dt] rewrites the objective in the standard form ||Y - D X||_F^2, so {Ds, Dt} can be efficiently learned using the K-SVD algorithm. With such sparse view-invariant representations, we can learn an action model for orphan actions in the source view and test it in the target view.
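The stacking trick can be sketched in numpy. This is a toy illustration only: plain alternating least squares stands in for K-SVD, the sparsity constraint is omitted, and random features stand in for real BoW vectors; it just demonstrates how learning on the stacked matrix yields one shared code matrix X and a dictionary pair {Ds, Dt}.

```python
import numpy as np

# Toy setup: n shared videos seen in both views, d-dim features each.
rng = np.random.default_rng(0)
n, d, k = 30, 20, 15
Ys = rng.normal(size=(d, n))        # source-view features
Yt = rng.normal(size=(d, n))        # target-view features

# Stack the two views: Y = [Ys; Yt], D = [Ds; Dt]. Any dictionary learner
# applied to Y then produces ONE shared code matrix X for both views.
Y = np.vstack([Ys, Yt])

# Stand-in learner: unconstrained alternating least squares (not K-SVD).
D = rng.normal(size=(2 * d, k))
for _ in range(50):
    X, *_ = np.linalg.lstsq(D, Y, rcond=None)        # shared codes
    Dupd, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None) # dictionary update
    D = Dupd.T
X, *_ = np.linalg.lstsq(D, Y, rcond=None)            # final codes

Ds, Dt = D[:d], D[d:]               # split back into the dictionary pair

# Both views share the same X: Ys ≈ Ds @ X and Yt ≈ Dt @ X.
print(Ds.shape, Dt.shape)           # → (20, 15) (20, 15)
```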

Page 13:

Experiment

• Protocol: leave-one-action-class-out. Each time we consider only one action class for testing in the target view; this action class is not used to construct the transferable dictionary pair.

• Dataset: we test the approach on the IXMAS multi-view dataset. Website: http://4drepository.inrialpes.fr/public/viewgroup/6

Page 14:

Experiment: dataset

The most popular multi-view dataset:
• Five views: four side views and one top view;
• 11 actions, each performed 3 times by 10 actors;
• Each view contains 330 action videos;
• Each action class contains 30 samples per view;
• The actors freely choose their position and orientation.

Page 15:

[Figure: IXMAS camera setup, cam0 to cam3; the four arrows indicate the directions actors may face.]

Experiment: dataset

Page 16:

[Figure: IXMAS dataset exemplar frames from cam0 to cam4 at three time instants.]

Experiment: dataset

Page 17:

Experiment
• Each time, we select one action class (30 samples) for testing in the target view. Except for this action class, the remaining samples (300) in both the source and target views are used for dictionary pair learning.

• The classification accuracy is averaged over all possible choices of the test action class.
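The leave-one-action-class-out split described above can be sketched as follows (toy labels stand in for the 11 IXMAS action classes):

```python
import numpy as np

# Leave-one-action-class-out: each round holds out ONE action class; the
# dictionary pair is learned on the remaining classes, and the held-out
# class is tested in the target view.
labels = np.repeat(np.arange(11), 30)          # 11 classes x 30 samples = 330

rounds = []
for held_out in np.unique(labels):
    train_mask = labels != held_out            # used to learn {Ds, Dt}
    test_mask = labels == held_out             # tested in the target view
    rounds.append((int(train_mask.sum()), int(test_mask.sum())))

print(len(rounds), rounds[0])   # → 11 (300, 30)
```

Each of the 11 rounds thus trains on 300 samples per view and tests on the 30 held-out samples, matching the split used in the experiments.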

Page 18:

Experiment: STIP feature
• Local feature: Cuboid detector & descriptor [4]
• Global feature: shape-flow descriptor [5]
• BoW feature: 1000-dimensional local BoW feature + 500-dimensional global BoW feature.

Finally, each action video is represented by a 1500-dimensional BoW feature.

[4] P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS, 2005.
[5] D. Tran and A. Sorokin. Human activity recognition with metric learning. In ECCV, 2008.

Page 19:

Experiment: result

• Cuboid + KNN

Cross-view recognition accuracy (%), unsupervised, 1500-dimensional BoW feature (dictionary size 50, sparsity 30). Rows: source view; columns: target view. Each cell gives two numbers: result reported in [1] / our result.

| Source | C0          | C1          | C2          | C3          | C4          |
|--------|-------------|-------------|-------------|-------------|-------------|
| C0     | -           | 96.7 / 96.7 | 98.4 / 97.9 | 97.9 / 97.6 | 84.6 / 84.9 |
| C1     | 99.1 / 97.3 | -           | 96.7 / 96.4 | 97.0 / 89.7 | 76.1 / 81.2 |
| C2     | 96.7 / 92.1 | 92.4 / 89.7 | -           | 97.6 / 94.9 | 88.2 / 89.1 |
| C3     | 95.8 / 97.0 | 96.4 / 94.2 | 97.0 / 96.7 | -           | 81.2 / 83.9 |
| C4     | 81.5 / 83.0 | 69.7 / 70.6 | 90.0 / 89.7 | 84.2 / 83.7 | -           |
| Ave.   | 93.3 / 92.4 | 88.8 / 87.8 | 95.5 / 95.2 | 94.2 / 91.5 | 82.5 / 84.8 |

The first number in each pair is the recognition result reported in [1]; the second is our result with the same method. The experimental setting follows [1], using the same feature and classifier. The close match between the two shows that our experiment is correct, i.e. that we successfully reproduced the method of [1].

[1] J. Zheng, Z. Jiang, J. Phillips, and R. Chellappa. Cross-view action recognition via a transferable dictionary pair. In BMVC, 2012.