Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data Xu Chen University of Illinois at Chicago Electrical and Computer Engineering March/01/2010


Transcript of Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Page 1: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Xu Chen

University of Illinois at Chicago

Electrical and Computer Engineering

March/01/2010

Page 2: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Outline

Background and Motivation

Related Work

Problem Statement

Expected Contributions

Null Space Invariants

Tensor Null Space

Localized Null Space

Non-linear Kernel Space Invariants

Bilinear Invariants

Page 3: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Background

Within the last several years, object motion trajectory-based recognition has gained significant interest in diverse application areas including:

sign language gesture recognition, Global Positioning System (GPS), Car Navigation System (CNS), animal mobility experiments, sports video trajectory analysis, and automatic video surveillance.

Page 4: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Motivation

Accurate activity classification and recognition across multiple views is an extremely challenging task.

Object trajectories captured from different viewpoints lead to completely different representations, which can be approximately modeled by affine transformations.

To get a view independent representation, the trajectory data is represented in an affine invariant feature space.

Page 5: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Related Work

[Stiller, IJCV, 1994]: mathematical formulation of NSI

[Bashir et al., ACM Multimedia, 2006]: curvature scale space (CSS) and centroid distance function (CDF) representations; only work with small camera motions

[Chellappa et al., TIP, 2006]: PCNSA for activity recognition

[Huang et al., TIP, 2008]: correlation tensor analysis

[Chang et al., PAMI, 2008]: kernel methods with multilevel temporal alignment; not view invariant

Page 6: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Problem Statement and Approach

Development of efficient view invariant representation, indexing/retrieval, and classification techniques for motion based events

Null Space in a particular basis is invariant in the presence of arbitrary affine transformations.

Demonstration of enormous potential in computer vision, especially in motion event and activity recognition and retrieval.

Page 7: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Null Space Invariants

Let each 2-D point of the trajectory be indexed by i = 0, 1, …, n-1. A motion trajectory of n 2-D points can be represented as a matrix M:

null space H:

where q is an n by 1 vector and H is the n by (n-3) matrix whose columns are the linearly independent basis vectors q spanning the null space of M.
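A minimal sketch of the null space computation, assuming M stacks the x coordinates, the y coordinates, and a row of ones (a homogeneous-coordinate layout that is an assumption here, chosen because it is consistent with the n by (n-3) size of H). The function names are illustrative and not taken from the slides. Because a null space basis is only unique up to a change of basis, the invariance check compares the projector H H^T rather than H itself.

```python
import numpy as np

def trajectory_matrix(points):
    """Stack n 2-D points into a 3 x n matrix [x; y; 1] (assumed layout)."""
    points = np.asarray(points, dtype=float)
    return np.vstack([points.T, np.ones(len(points))])

def null_space_basis(M, tol=1e-10):
    """Return an orthonormal basis (as columns) of the null space of M via SVD."""
    _, s, vt = np.linalg.svd(M)          # vt is n x n for a 3 x n input
    rank = int(np.sum(s > tol * s[0]))   # numerical rank (3 for generic data)
    return vt[rank:].T                   # shape: n x (n - rank)

def null_space_projector(points):
    """Basis-independent representation of the null space: P = H H^T."""
    H = null_space_basis(trajectory_matrix(points))
    return H @ H.T

rng = np.random.default_rng(0)
traj = rng.normal(size=(8, 2))           # hypothetical trajectory, n = 8

# Affine transformation: invertible 2 x 2 linear part plus a translation
A = np.array([[1.2, -0.4], [0.3, 0.9]])
t = np.array([5.0, -2.0])
traj_affine = traj @ A.T + t

H = null_space_basis(trajectory_matrix(traj))
print(H.shape)
print(np.allclose(null_space_projector(traj), null_space_projector(traj_affine)))
```

The affine map acts on M from the left as an invertible 3 by 3 matrix, so every q with Mq = 0 also satisfies (TM)q = 0, which is what the final check exercises numerically.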

Page 8: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Null Space Invariants (NSI)

Typically, each element in H is given by:

Page 9: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data
Page 10: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Null Space based Classification/Retrieval Algorithm

1. Normalize the length of the trajectories: take the 2D FFT, select the N largest coefficients, and then take the 2D IFFT.

2. Compute the NSI for the normalized raw data and vectorize it: once we obtain the n by (n-3) NSI H, we convert H into an n(n-3) by 1 vector.

3. Apply Principal Component Null Space Analysis (PCNSA) to the vectorized NSI.

Various other classification and retrieval algorithms could also be applied to the NSI.

Page 11: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Normalization Example to 25 samples

Page 12: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Details of PCNSA

1. Obtain PCA space: evaluate the total covariance matrix, then apply PCA to the total covariance matrix to find W(pca), whose columns are the L leading eigenvectors.

2. Project the data vectors, class means, and class covariance matrices into the PCA space.

3. Obtain ANS: for each class i, find the approximate null space (ANS) by choosing the eigenvectors corresponding to the M(i) smallest eigenvalues.

Page 13: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Details of PCNSA

4. Obtain valid classification directions in ANS: any direction e(i) that satisfies the validity condition is said to be a valid direction and is used to build the valid ANS, W(NSA, i).

5. Classification: PCNSA finds the distances from a query trajectory X to all classes:

d(X, i) = ||W(NSA, i)(X - m(i))||, where m(i) is the mean of class i. X is assigned to the class with the smallest distance.

6. Retrieval: we compute the distance from the query trajectory Y to any other trajectory X(i) by d(X, i) = ||W(NSA, i)(X(i) - Y)||.
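The PCNSA steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the same ANS dimension M for every class (the slides allow a per-class M(i)), it assumes integer class labels, and it skips step 4 because the validity condition is not given in the transcript. The names `fit_pcnsa` and `classify` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pcnsa(X, y, L, M):
    """Simplified PCNSA fit. X: (samples, dim); y: integer labels;
    L: PCA dimensions kept; M: ANS dimensions per class (assumed equal)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    # Step 1: PCA on the total covariance; eigh returns ascending eigenvalues
    _, vecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    W_pca = vecs[:, ::-1][:, :L]                 # L leading eigenvectors
    # Step 2: project the data into the PCA space
    Z = (X - mean) @ W_pca
    classes = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        m_c = Zc.mean(axis=0)                    # class mean in PCA space
        # Step 3: ANS = eigenvectors of the M smallest class-covariance eigenvalues
        _, cvecs = np.linalg.eigh(np.cov(Zc, rowvar=False))
        classes[int(c)] = (m_c, cvecs[:, :M])
    return {"mean": mean, "W_pca": W_pca, "classes": classes}

def classify(model, x):
    """Step 5: d(x, i) = ||W_i^T (z - m_i)||; return the closest class."""
    z = (np.asarray(x, dtype=float) - model["mean"]) @ model["W_pca"]
    dists = {c: float(np.linalg.norm(W.T @ (z - m)))
             for c, (m, W) in model["classes"].items()}
    return min(dists, key=dists.get), dists

# Synthetic data: class 0 spreads along axis 0, class 1 along axis 1
rng = np.random.default_rng(0)
X0 = rng.normal(size=(50, 3)) * [3.0, 0.1, 0.1]
X1 = rng.normal(size=(50, 3)) * [0.1, 3.0, 0.1] + [5.0, 5.0, 5.0]
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

model = fit_pcnsa(X, y, L=3, M=2)
label, dists = classify(model, X0.mean(axis=0))
print(label, dists)
```

Each class is scored only along the directions in which it varies least, so a query that strays from a class mean along those directions is penalized heavily, while variation along the class's dominant directions is ignored.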

Page 14: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Classification Performance

We plot the classification accuracy versus the number of classes, with 20 trajectories in each class (up to 40 classes).

Page 15: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Classification Performance

We plot the classification accuracy versus the number of trajectories in each class (up to 40 trajectories in each class).

Page 16: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Retrieval Performance

Figure: precision versus recall.

To further demonstrate the view-invariant nature of our system, we populate the CAVIAR dataset with 5 rotated versions of each trajectory in the class, obtained by rotating the trajectories by -60, -30, 0, 30, and 60 degrees.

Methods compared: applying PCNSA on NSI; directly using PCA on NSI.
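The rotation-based augmentation described on this slide could look like the sketch below. It assumes rotation about the coordinate origin, which the slides do not specify, and the helper names and sample trajectory are hypothetical.

```python
import numpy as np

def rotate_trajectory(points, degrees):
    """Rotate (n, 2) points counter-clockwise about the origin (assumed center)."""
    theta = np.deg2rad(degrees)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Points are stored as rows, so apply R on the right as its transpose
    return np.asarray(points, dtype=float) @ R.T

def rotated_versions(points, angles=(-60, -30, 0, 30, 60)):
    """Return one rotated copy of the trajectory per angle."""
    return [rotate_trajectory(points, a) for a in angles]

traj = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.5], [3.0, 3.0]])
versions = rotated_versions(traj)
print(len(versions))
```

A rotation is a special case of an affine transformation, so a view-invariant representation should assign all five versions to the same class.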

Page 17: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Visual illustration of retrieval results for 20 classes of motion trajectories from the CAVIAR dataset, for the motion events "chase" and "shopping and leave", with fixed cameras at unknown views (query and top 2 retrievals).

Page 18: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Applications of NSI in image retrieval

Facial recognition

Extract SIFT keypoints as feature points

The raw data matrix is not necessarily of the size 3 by n.

Page 19: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Image retrieval results

Page 20: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Perturbation Analysis

So the ratio of the output error (error on null space) and input error (error on the raw data) is:

Z is the noise matrix on the raw data.

Page 21: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

SNR

The ratio of the energy of the signal for NSI and the energy of the noise on NSI.

Page 22: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal Sampling

Given the perturbation, design an optimal sampling strategy.

Uniform sampling and Poisson sampling are utilized.

Arbitrary trajectories in x and y directions:

x=f(t), y=g(t)

Expanding the trajectory in a Maclaurin series.
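To make the two sampling schemes concrete, here is a small sketch that samples a parametric trajectory x = f(t), y = g(t) at uniform times and at the arrival times of a homogeneous Poisson process. The choice of f and g, the duration, and the rate are hypothetical values for illustration only; the slides do not give them at this point.

```python
import numpy as np

def uniform_sample_times(num_samples, duration):
    """Evenly spaced sample times on [0, duration]."""
    return np.linspace(0.0, duration, num_samples)

def poisson_sample_times(rate, duration, rng):
    """Arrival times of a homogeneous Poisson process with the given rate,
    generated from exponential inter-arrival times with mean 1 / rate."""
    times = []
    t = rng.exponential(1.0 / rate)
    while t <= duration:
        times.append(t)
        t += rng.exponential(1.0 / rate)
    return np.array(times)

# Hypothetical trajectory: x = f(t), y = g(t)
f = np.cos
g = np.sin

rng = np.random.default_rng(0)
t_uniform = uniform_sample_times(25, 10.0)
t_poisson = poisson_sample_times(rate=2.5, duration=10.0, rng=rng)

samples_uniform = np.column_stack([f(t_uniform), g(t_uniform)])
samples_poisson = np.column_stack([f(t_poisson), g(t_poisson)])
print(samples_uniform.shape, samples_poisson.shape)
```

With Poisson sampling the number of samples is itself random, with expected value rate times duration, so a fixed-length representation would still need a length normalization step afterwards.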

Page 23: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal Sampling

Property 2: The rate parameter λ = O(N) should be chosen for Poisson sampling to guarantee the convergence of the error ratio, where N is the total number of samples.

Page 24: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal Sampling

Page 25: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal Sampling

In our framework, the density corresponds to the average number of samples per unit-length; i.e.

Page 26: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Arbitrary Moving Cameras and Segmented NSI

Fixed cameras from unknown views: all the feature points undergo the same global affine transformation.

Arbitrary moving cameras: the classification and retrieval problem is further compounded, since the feature points can undergo different affine transformations.

Computing the null space of segmented trajectories yields higher accuracy: the orientations and translations of adjacent points are very close, so they have more similar null space representations locally.

Overlapping segmentation and non-overlapping segmentation. (Assumption)
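The two segmentation modes can be illustrated with a single windowing helper, where the step between window starts decides whether segments overlap. This is a sketch with hypothetical parameter names; the slides do not specify segment lengths beyond the examples shown.

```python
import numpy as np

def segment_trajectory(points, length, step):
    """Split an (n, 2) trajectory into windows of `length` points.
    step == length gives non-overlapping segments;
    step < length gives segments overlapping by (length - step) points."""
    points = np.asarray(points)
    starts = range(0, len(points) - length + 1, step)
    return [points[s:s + length] for s in starts]

traj = np.arange(40, dtype=float).reshape(20, 2)   # hypothetical 20-point trajectory

non_overlapping = segment_trajectory(traj, length=10, step=10)
overlapping = segment_trajectory(traj, length=10, step=5)   # overlap by 5 points
print(len(non_overlapping), len(overlapping))
```

Each segment would then get its own null space representation, so a segment needs more than 3 points for its null space to be non-trivial.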

Page 27: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Figure: retrieval results (query, rank 1, rank 2) for the event "Entering the shop", comparing the global NSI with overlapping segments (overlap by 5, first 16 NSI).

Page 28: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal sampling

Figure panels: without Poisson sampling; with Poisson sampling.

The same trajectory has different representations due to camera motion.

Page 29: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Example of the trajectory "all" and its affine versions, with and without Poisson sampling (lambda = 0.8). The NS representations with Poisson sampling (on the right) are more similar to one another than those without sampling (on the left). Poisson sampling greatly attenuates the effects of noise.

Page 30: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Classification Accuracy

Page 31: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Retrieval Time (sec)

Page 32: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Comparison

ASL dataset, 20 classes with 40 trajectories in each class

Page 33: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Tensor Null Space (TNSI)

Fundamental mathematical framework for tensor NSI

View-invariant classification and retrieval of multiple motion trajectories.

Page 34: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Definition of Tensor Null Space

Conditions for rotational invariance:

Applying affine transformation T (m) on the mth unfolding of the multi-dimensional data M, if the resulting tensor null space Q is invariant in the mth dimension, then it is referred to as mode-m invariant.

M(1), M(2), and M(3) are the unfoldings of the three-dimensional tensor along its different dimensions: M(1) is I1 by I2I3, M(2) is I2 by I1I3, and M(3) is I3 by I1I2.
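The unfoldings can be sketched with NumPy as below. Note two assumptions: modes are indexed from 0 in the code (so mode 0 corresponds to M(1)), and the column ordering within each unfolding follows NumPy's row-major reshape, which may differ from the ordering convention used in the original work. Only the matrix shapes are claimed to match the slide.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-m unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# Hypothetical tensor with I1 = 2, I2 = 3, I3 = 4
T = np.arange(2 * 3 * 4).reshape(2, 3, 4)
M1, M2, M3 = (unfold(T, m) for m in range(3))
print(M1.shape, M2.shape, M3.shape)
```

An affine transformation applied along one dimension of the tensor acts as a left multiplication on the corresponding unfolding, which is why the invariance conditions are stated per mode.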

Page 35: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Definition of Tensor Null Space

Conditions for translation invariance for tensor null space:

Due to the invariance of rotation,

Page 36: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Motion Event Tensor

We align each trajectory as two rows in a matrix according to its x and y coordinates, and the number of rows of the matrix is set to twice the number of objects in the motion event under analysis.

P: the length of normalized trajectories

J: twice the number of trajectories

K: Number of video samples
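A sketch of how such a motion event tensor might be assembled, assuming every trajectory has already been normalized to P points and that the tensor axes are ordered J by P by K (the axis order is an assumption; the slide only names the three sizes). The function names and random data are hypothetical.

```python
import numpy as np

def clip_matrix(trajectories):
    """Build the J x P matrix of one clip: each (P, 2) trajectory
    contributes an x row followed by a y row."""
    rows = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        rows.append(traj[:, 0])   # x coordinates
        rows.append(traj[:, 1])   # y coordinates
    return np.vstack(rows)

def motion_event_tensor(clips):
    """Stack the per-clip matrices along a third axis: shape J x P x K."""
    return np.stack([clip_matrix(c) for c in clips], axis=2)

rng = np.random.default_rng(0)
num_clips, num_objects, P = 4, 3, 18
clips = [[rng.normal(size=(P, 2)) for _ in range(num_objects)]
         for _ in range(num_clips)]

tensor = motion_event_tensor(clips)
print(tensor.shape)
```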

Page 37: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Simulation results for TNSI

The accuracy of the proposed classification system versus the number of classes. There are 20 tensors in each class. Simulation results show that our system preserves its efficiency even for a higher number of classes (J = 3, three trajectories in each clip; P = 18, length of trajectories; K = 20, video clips; unfolding in K).

Page 38: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Accuracy values versus increase in the number of tensors within a class. There are 20 classes in the system.

Page 39: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Localized Null Space

Consider view-invariant video classification and retrieval with partial queries on a dynamic video database.

Efficient updating and downdating procedures for the representation of dynamic video databases.

Localized Null Space is one of the ways to solve the problem.

Page 40: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Localized Null Space (LNS)

Localized Null Space relies on different key points in different segments.

Page 41: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Localized Null Space

Page 42: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Structure of Localized Null Space

Illustration of the structure of the traditional Null Space and the proposed Localized Null Space.

Figure: the traditional Null Space (N by N-3) next to the proposed Localized Null Space, whose non-zero elements are confined to the blocks for W1 (K-3 non-zero elements, K) and W2 (N-K-3 non-zero elements, N-K); all remaining entries are zero elements.

Page 43: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Splitting of Raw Data Space

Page 44: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Splitting of Raw Data Space

Deterministic splitting: the length of the feature vector and the key points are known to the users. LNS provides a perfect solution.

Random splitting: the length of the feature vector and the key points are not available to the users. The splitting and the key points must be estimated.

Page 45: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal Splitting

where D is the distortion for random splitting given by

and P(L) is the distribution of the segment with length L, and K is the optimal segmentation length. Solving the minimization problem, we obtain

Page 46: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Optimal Key Points Selection within Each Segment

where C is the probability that all the key points are in the range .

Page 47: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Benefits of LNS

The localized null space can be viewed as consisting of multiple subspaces and can therefore be dynamically split for retrieval of partial queries.

Localized Null Space can be used to merge multiple Null Spaces into an integrated Null Space.

Localized Null Space has the same complexity as the traditional null space.
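One way to picture the splitting and merging property is a block-diagonal null space built from two segments of the raw data, sketched below. This is only one possible reading of the structure: it assumes the 3 by n homogeneous layout [x; y; 1] for the raw data and yields N-6 columns, whereas the slides' exact column layout and key-point handling are not recoverable from the transcript. All names are hypothetical.

```python
import numpy as np

def null_space_basis(points, tol=1e-10):
    """Orthonormal null space basis of the assumed 3 x n matrix [x; y; 1]."""
    points = np.asarray(points, dtype=float)
    M = np.vstack([points.T, np.ones(len(points))])
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol * s[0]))
    return vt[rank:].T

def localized_null_space(points, K):
    """Merge the null spaces of the first K and the last N-K points
    into one block-diagonal matrix with zeros elsewhere."""
    H1 = null_space_basis(points[:K])    # K x (K-3)
    H2 = null_space_basis(points[K:])    # (N-K) x (N-K-3)
    N = len(points)
    H = np.zeros((N, H1.shape[1] + H2.shape[1]))
    H[:K, :H1.shape[1]] = H1             # block for the first segment
    H[K:, H1.shape[1]:] = H2             # block for the second segment
    return H, H1, H2

rng = np.random.default_rng(0)
points = rng.normal(size=(12, 2))        # hypothetical data, N = 12
H, H1, H2 = localized_null_space(points, K=5)

M_full = np.vstack([points.T, np.ones(len(points))])
print(H.shape, np.allclose(M_full @ H, 0.0))
```

Because each block depends only on its own segment, a partial query covering the first K points can be matched against the upper-left block alone, and adding or removing a segment only adds or removes one block.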

Page 48: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

LNS Example

Visual illustration of the facial image B and part of the rotated image A with identical localized null space representations.

Page 49: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Non-linear Kernel Space Invariants (NKSI)

Invariance to non-linear transformations

Relying on Taylor expansions to approximate the non-linear transformations with linear transformations

Application: Standard Perspective Transformation

Page 50: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Non-linear Kernel Space Invariants(NKSI)

When k=2

Page 51: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Non-linear Kernel Space Invariants(NKSI)

Page 52: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Standard Perspective Transformation

Page 53: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Unequal multiple trajectory representation

Page 54: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Bilinear Invariants

AXB=0, where A and B are raw data matrices, X is the invariant basis.

When A and B are subject to different linear transformations from the left and right side respectively, X is invariant.
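The invariance claim can be checked numerically with the sketch below, which finds all solutions X of AXB = 0 through the standard identity vec(AXB) = (B^T kron A) vec(X) with column-major vectorization. The matrix sizes, the random data, and the transformations T and S are hypothetical; how the original work selects a particular basis X is not stated in the transcript.

```python
import numpy as np

def bilinear_invariant_basis(A, B, tol=1e-10):
    """Basis of all X with A X B = 0, using vec(A X B) = (B^T kron A) vec(X)."""
    p, q = A.shape[1], B.shape[0]        # X has shape p x q
    K = np.kron(B.T, A)
    _, s, vt = np.linalg.svd(K)
    rank = int(np.sum(s > tol * s[0]))
    # Each null vector is vec(X) in column-major order
    return [v.reshape((p, q), order="F") for v in vt[rank:]]

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))
B = rng.normal(size=(4, 2))
basis = bilinear_invariant_basis(A, B)

# Different invertible linear transformations on the left of A and the right of B
T = np.array([[2.0, 1.0], [0.5, 1.5]])
S = np.array([[1.0, -0.7], [0.3, 2.0]])

X = basis[0]
print(np.allclose(A @ X @ B, 0.0), np.allclose((T @ A) @ X @ (B @ S), 0.0))
```

The second check holds because (TA)X(BS) = T(AXB)S, which is zero whenever AXB is zero. A and B may have different sizes, which is what makes this formulation suitable for features of unequal length.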

Page 55: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Bilinear Invariants

Page 56: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Retrieval of unequal multiple trajectories

Page 57: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Conclusion

Null Space: an effective and robust tool for classification and retrieval of motion events; segmentation of the null space can further improve the performance for arbitrary moving cameras

Tensor Null Space: high-order data

Page 58: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Conclusion

Localized Null Space: dynamic updating of the database; partial query, splitting and merging of null space

Non-linear Kernel Space Invariants: invariance to non-linear transformations

Bilinear Invariants: suitable for different lengths of features and different dimensions of raw data

Page 59: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Publications

Journal Papers:

1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Localization and trajectory estimation of mobile object using minimum samples,'' IEEE Transactions on Vehicular Technology (TVT), volume 8, issue 9, 2009, pp 4439-4446.

2. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, “Null Space Invariants: Part I: View Invariant Motion Trajectory Analysis and Image Classification and Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), revised, 2009.

3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, “Null Space Invariants: Part II: Localized Null Space Representation for Dynamic Image and Video Databases," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), submitted, 2009.

Page 60: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Publications

Conference Papers:

1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Localization and trajectory estimation of mobile object with a single sensor,'' IEEE Statistical Signal Processing Workshop (SSP'07), Madison, Wisconsin, 2007.

2. Eser Ustunel, Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval,''IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'08), Las Vegas, Nevada, 2008.

3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Robust closed-form localization of mobile targets using a single sensor based on a non-linear measurement model,'' IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC'08), Recife, Pernambuco, Brazil, 2008.

4. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Robust null space representation and sampling for view invariant motion trajectory analysis,'' IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'08), Anchorage, Alaska, 2008.

5. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Robust multi-dimensional null space representation for image retrieval and classification,'' IEEE Conference on Image Processing (ICIP'08), San Diego, California, 2008.

6. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''View-Invariant Tensor Null Space Representation For Multiple Motion Trajectory Retrieval and Classification,'' IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09) (Invited Paper), Taipei, Taiwan, 2009.

Page 61: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Publications

Conference Papers:

7. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Localized Null-Space representation for Dynamic Updating and Downdating in Image and Video Databases,'' IEEE International Conference on Image Processing (ICIP'09), Cairo, Egypt, 2009.

8. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval,''IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP‘10), Dallas, Texas, 2010.

Page 62: Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data

Thanks!

Questions?