Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data


Xu Chen

University of Illinois at Chicago

Electrical and Computer Engineering

March 1, 2010

Outline

Background and Motivation

Related Work

Problem Statement

Expected Contributions

Null Space Invariants

Tensor Null Space

Localized Null Space

Non-linear Kernel Space Invariants

Bilinear Invariants

Background

Within the last several years, object motion trajectory-based recognition has gained significant interest in diverse application areas, including:

sign language gesture recognition, Global Positioning System (GPS), Car Navigation System (CNS), animal mobility experiments, sports video trajectory analysis, and automatic video surveillance.

Motivation

Accurate activity classification and recognition across multiple views is an extremely challenging task.

Object trajectories captured from different viewpoints lead to completely different representations, which can be modeled approximately by affine transformations.

To obtain a view-independent representation, the trajectory data is represented in an affine-invariant feature space.

Related Work

[Stiller, IJCV, 1994]: mathematical formulation of NSI

[Bashir et al., ACM Multimedia, 2006]: curvature scale space (CSS) and centroid distance function (CDF) representations; only works with small camera motions

[Chellappa et al., TIP, 2006]: PCNSA for activity recognition

[Huang et al., TIP, 2008]: correlation tensor analysis

[Chang et al., PAMI, 2008]: kernel methods with multilevel temporal alignment; not view invariant

Problem Statement and Approach

Development of efficient view invariant representation, indexing/retrieval, and classification techniques for motion based events

Null Space in a particular basis is invariant in the presence of arbitrary affine transformations.

Demonstration of enormous potential in computer vision, especially in motion event, activity recognition and retrieval.

Null Space Invariants

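Let $p_i = (x_i, y_i)$ be a single 2-D point, $i = 0, 1, \dots, n-1$. A motion trajectory of $n$ 2-D points can be represented by the matrix $M$:

$$M = \begin{bmatrix} x_0 & x_1 & \cdots & x_{n-1} \\ y_0 & y_1 & \cdots & y_{n-1} \\ 1 & 1 & \cdots & 1 \end{bmatrix}$$

The null space $H$ is defined by:

$$M q = 0$$

where $q$ is an $n \times 1$ vector and $H$ is the $n \times (n-3)$ matrix whose columns are linearly independent basis vectors spanning the solutions $q$.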
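
The definition above can be checked numerically. The following is a minimal sketch, assuming the trajectory is stacked as a 3 x n homogeneous matrix (rows x, y, 1) and that an orthonormal basis from the SVD is an acceptable choice of H; the function names are illustrative and do not come from the slides. Because a basis of a subspace is not unique, the sketch compares the projector H H^T, which depends only on the null space itself.

```python
import numpy as np

def trajectory_matrix(points):
    """Stack n 2-D points into the 3 x n homogeneous matrix M (assumed layout)."""
    pts = np.asarray(points, dtype=float)          # shape (n, 2)
    return np.vstack([pts.T, np.ones(len(pts))])   # rows: x, y, 1

def null_space_basis(M, tol=1e-10):
    """Orthonormal basis H of {q : M q = 0}; size n x (n - rank(M))."""
    _, s, vt = np.linalg.svd(M)                    # full SVD, vt is n x n
    rank = int(np.sum(s > tol * s[0]))
    return vt[rank:].T

def null_space_projector(M):
    """H H^T depends only on the subspace, not on the basis chosen for it."""
    H = null_space_basis(M)
    return H @ H.T

# Example: a short trajectory and an affine-transformed copy of it.
t = np.linspace(0.0, 1.0, 8)
traj = np.column_stack([np.cos(2 * t), t ** 2])
A = np.array([[1.2, -0.4], [0.3, 0.9]])            # invertible linear part
b = np.array([5.0, -2.0])                          # translation
traj_affine = traj @ A.T + b

M = trajectory_matrix(traj)
H = null_space_basis(M)                            # 8 x 5 for this example
P = null_space_projector(M)
P_affine = null_space_projector(trajectory_matrix(traj_affine))
print(np.allclose(P, P_affine))                    # same null space in both views
```

An invertible affine map multiplies M from the left by an invertible 3 x 3 matrix, which leaves the set of solutions of M q = 0 unchanged; this is why the two projectors agree.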

Null Space Invariants (NSI)

Typically, each element in H is given by:

Null Space based Classification/Retrieval Algorithm

Normalize the length of the trajectories: take the 2-D FFT, select the N largest coefficients, and then take the 2-D IFFT.

Compute the NSI for the normalized raw data and vectorize it: once we obtain the n by (n-3) NSI H, we convert H into an n(n-3) by 1 vector.

Apply Principal Component Null Space Analysis (PCNSA) to the vectorized NSI.

Various other classification and retrieval algorithms could also be applied to the NSI.
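
Two of the steps above can be sketched in a few lines. This is only one possible reading: "largest" is taken to mean largest in magnitude, the trajectory is assumed to be stored as a 2 x L array, and resampling to a common length (for example 25 samples) is not shown because the slides do not say how it is done. The function names are hypothetical.

```python
import numpy as np

def keep_largest_fft_coeffs(traj, num_coeffs):
    """Keep the num_coeffs largest-magnitude 2-D FFT coefficients of a 2 x L array."""
    traj = np.asarray(traj, dtype=float)
    F = np.fft.fft2(traj)
    order = np.argsort(np.abs(F).ravel())[::-1]    # indices, largest magnitude first
    mask = np.zeros(F.size, dtype=bool)
    mask[order[:num_coeffs]] = True
    F_kept = np.where(mask.reshape(F.shape), F, 0.0)
    # The real part is taken because the kept set need not be conjugate-symmetric.
    return np.real(np.fft.ifft2(F_kept))

def vectorize_nsi(H):
    """Convert the n x (n-3) NSI matrix into an n(n-3) x 1 column vector."""
    return np.asarray(H).reshape(-1, 1)

# Example: smooth a 16-sample trajectory by keeping 6 coefficients.
u = np.linspace(0.0, 3.0, 16)
example_traj = np.vstack([u / 3.0, np.sin(u)])
smoothed = keep_largest_fft_coeffs(example_traj, 6)
```

Keeping every coefficient reproduces the input exactly, so the number of retained coefficients controls how much detail is discarded.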

Normalization Example to 25 samples

Details of PCNSA

1. Obtain PCA Space: Evaluate the total covariance matrix, then apply PCA to the total covariance matrix to find W(pca), whose columns are the L leading eigenvectors.

2. Project the data vectors, class means, and class covariance matrices into the PCA space.

3. Obtain ANS: Find the approximate null space for each class i by choosing the eigenvectors corresponding to the M(i) smallest eigenvalues.

Details of PCNSA

4. Obtain Valid Classification Directions in ANS: If a direction e(i) satisfies the validity condition, it is called a valid direction and is used to build the valid ANS, W(NSA, i).

5. Classification: PCNSA finds the distances from a query trajectory X to all classes:

d(X, i) = ||W(NSA, i)(X - m(i))||, where m(i) is the mean of class i. We classify X into the class with the smallest distance.

6. Retrieval: We compute the distance from the query trajectory Y to any other trajectory X(i) by d(X, i) = ||W(NSA, i)(X(i) - Y)||.
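
The steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the valid-direction test of step 4 is omitted because its condition is not given here, every class uses the same number of ANS directions, and the names `fit_pcnsa` and `classify` are hypothetical. The directions are stored as columns, so the distance is computed with the transpose.

```python
import numpy as np

def fit_pcnsa(X, y, num_pca, num_ans):
    """Simplified PCNSA fit: PCA space, then a per-class approximate null space."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    # Step 1: PCA space from the total covariance matrix.
    _, evecs = np.linalg.eigh(np.cov(X, rowvar=False))   # ascending eigenvalues
    W_pca = evecs[:, ::-1][:, :num_pca]                  # leading eigenvectors
    # Step 2: project the data into the PCA space.
    Z = (X - mean) @ W_pca
    classes = {}
    for label in np.unique(y).tolist():
        Zc = Z[y == label]
        # Step 3: ANS = eigenvectors of the class covariance with the
        # smallest eigenvalues (first columns, since eigh sorts ascending).
        _, vecs = np.linalg.eigh(np.cov(Zc, rowvar=False))
        classes[label] = (Zc.mean(axis=0), vecs[:, :num_ans])
    return mean, W_pca, classes

def classify(x, mean, W_pca, classes):
    """Step 5: choose the class with the smallest distance in its ANS."""
    z = (np.asarray(x, dtype=float) - mean) @ W_pca
    dists = {c: float(np.linalg.norm(W.T @ (z - m))) for c, (m, W) in classes.items()}
    return min(dists, key=dists.get), dists

# Synthetic example: each class is tightly concentrated along a different axis.
rng = np.random.default_rng(0)
class0 = rng.normal(size=(50, 3)) * np.array([3.0, 0.01, 1.0])
class1 = rng.normal(size=(50, 3)) * np.array([0.01, 3.0, 1.0]) + np.array([4.0, 0.0, 0.0])
X_train = np.vstack([class0, class1])
y_train = np.array([0] * 50 + [1] * 50)

model = fit_pcnsa(X_train, y_train, num_pca=3, num_ans=1)
label_a, _ = classify([0.5, 0.0, 0.2], *model)
label_b, _ = classify([4.0, 2.0, 0.0], *model)
```

The synthetic data is chosen so that each class has one direction of very small variance; a query is assigned to the class whose low-variance direction it deviates from least.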

Classification Performance

We plot the classification accuracy versus the number of classes, with 20 trajectories in each class (up to 40 classes).

Classification Performance

We plot the classification accuracy versus the number of trajectories in each class (up to 40 trajectories per class).

Retrieval Performance

[Figure: precision-recall curves]

To further demonstrate the view-invariant nature of our system, we populate the CAVIAR dataset with 5 rotated versions of each trajectory in a class, obtained by rotating the trajectory by -60, -30, 0, 30, and 60 degrees.
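
A minimal sketch of this augmentation is shown below. The slides do not state the center of rotation, so rotation about the origin is assumed, and the function names are illustrative.

```python
import numpy as np

def rotate_trajectory(points, degrees):
    """Rotate an (n, 2) trajectory counter-clockwise about the origin (assumed center)."""
    theta = np.deg2rad(degrees)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return np.asarray(points, dtype=float) @ R.T

def rotated_versions(points, angles=(-60, -30, 0, 30, 60)):
    """Return one rotated copy of the trajectory per angle (in degrees)."""
    return [rotate_trajectory(points, a) for a in angles]

# Example: five views of one short trajectory.
base = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.5], [3.0, 1.0]])
views = rotated_versions(base)
```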

Apply PCNSA on NSI;

Directly using PCA on NSI

Visual illustration of retrieval results with 20 classes of motion trajectories from the CAVIAR dataset, for the motion events "chase" and "shopping and leave", for fixed cameras from unknown views (query and top 2 retrieval results).

Applications of NSI in image retrieval

Facial recognition

Extract SIFT as feature points

The raw data matrix is not necessarily of the size 3 by n.

Image retrieval results

Perturbation Analysis

So the ratio of the output error (error on null space) and input error (error on the raw data) is:

Z the noise matrix on the raw data

SNR

The ratio of the energy of the signal for NSI and the energy of the noise on NSI.

Optimal Sampling

Given the perturbation, design an optimal sampling strategy.

Uniform sampling and Poisson sampling are utilized.

Arbitrary trajectories in x and y directions:

x=f(t), y=g(t)

Expanding the trajectory in a Maclaurin series.

Optimal Sampling

Property 2: The rate parameter λ = O(N) should be chosen for Poisson sampling to guarantee the convergence of the error ratio, where N is the total number of samples.
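
The two sampling schemes can be sketched as below for a trajectory x = f(t), y = g(t). Poisson sampling is modeled here by exponential gaps between sample times with rate λ, and the choice λ = N / duration is only an assumption that grows linearly with N; the function names and the example curve are illustrative.

```python
import numpy as np

def uniform_sample_times(num_samples, duration):
    """Equally spaced sample times on [0, duration]."""
    return np.linspace(0.0, duration, num_samples)

def poisson_sample_times(num_samples, rate, seed=0):
    """Sample times of a Poisson process: cumulative sums of exponential gaps."""
    rng = np.random.default_rng(seed)
    gaps = rng.exponential(scale=1.0 / rate, size=num_samples)
    return np.cumsum(gaps)

def sample_trajectory(f, g, times):
    """Evaluate x = f(t), y = g(t) at the given times; returns an (n, 2) array."""
    return np.column_stack([f(times), g(times)])

# Example with N = 32 samples and a rate proportional to N (assumed choice).
N, duration = 32, 1.0
t_uniform = uniform_sample_times(N, duration)
t_poisson = poisson_sample_times(N, rate=N / duration)
samples = sample_trajectory(np.cos, lambda t: t ** 2, t_poisson)
```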

Optimal Sampling

In our framework, the density corresponds to the average number of samples per unit-length; i.e.

Arbitrary Moving Cameras and Segmented NSI

Fixed cameras from unknown views: all the feature points undergo the same global affine transformation.

Arbitrary moving cameras: the classification and retrieval problem is further compounded, because the feature points can undergo different affine transformations.

Computing null space of segmented trajectories yields higher accuracy since the orientations and the translations for adjacent points are very close, therefore they have more similar null space representation locally.

Overlapping segmentation and non-overlapping segmentation. (Assumption)
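
A sketch of the two segmentation schemes, under stated assumptions: segments have a fixed length of at least 4 points (so that each 3 x L segment matrix has a non-trivial null space), trailing points that do not fill a segment are dropped, and each segment is represented by the projector onto its null space. The names `segment_bounds` and `segmented_nsi` are hypothetical.

```python
import numpy as np

def segment_bounds(n, seg_len, overlap=0):
    """Start/stop indices of fixed-length segments; overlap=0 gives disjoint segments."""
    step = seg_len - overlap
    if seg_len < 4 or step <= 0:
        raise ValueError("need seg_len >= 4 and overlap < seg_len")
    return [(s, s + seg_len) for s in range(0, n - seg_len + 1, step)]

def segmented_nsi(points, seg_len, overlap=0):
    """Null-space projector of every segment of an (n, 2) trajectory."""
    pts = np.asarray(points, dtype=float)
    projectors = []
    for start, stop in segment_bounds(len(pts), seg_len, overlap):
        M = np.vstack([pts[start:stop].T, np.ones(stop - start)])  # 3 x seg_len
        _, s, vt = np.linalg.svd(M)
        rank = int(np.sum(s > 1e-10 * s[0]))
        H = vt[rank:].T                       # basis of the segment's null space
        projectors.append(H @ H.T)
    return projectors

# Example: 20 points, segments of 10 points overlapping by 5.
t = np.linspace(0.0, 2.0, 20)
trajectory = np.column_stack([np.cos(t), np.sin(2 * t)])
local_nsi = segmented_nsi(trajectory, seg_len=10, overlap=5)
```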

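[Figure: retrieval example for the event "Entering the shop", showing the query with the rank 1 and rank 2 results for the global NSI and for segments overlapping by 5.]

[Figure: first 16 NSI, without Poisson sampling and with Poisson sampling.]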

The same trajectory with different representation due to camera motions

An example of the trajectory "all" and its affine versions, without and with Poisson sampling (lambda = 0.8). The null space representations with Poisson sampling (on the right) are more similar to one another than those without sampling (on the left). Poisson sampling greatly attenuates the effects of noise.

[Table: classification accuracy and retrieval time (sec), without sampling and with Poisson sampling.]

Comparison

ASL dataset, 20 classes with 40 trajectories in each class

Tensor Null Space (TNSI)

Fundamental mathematical framework for tensor NSI

View-invariant classification and retrieval of multiple motion trajectories.

Definition of Tensor Null Space

Conditions for rotational invariance:

Applying affine transformation T (m) on the mth unfolding of the multi-dimensional data M, if the resulting tensor null space Q is invariant in the mth dimension, then it is referred to as mode-m invariant.

M(1), M(2), and M(3) are the unfoldings of the three-dimensional tensor along its different dimensions:

M(1): I1 by I2I3

M(2): I2 by I1I3

M(3): I3 by I1I2
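
The unfoldings can be sketched with NumPy as below. Modes are numbered from 0 in the code, so M(1) corresponds to `unfold(T, 0)`; the ordering of the columns within an unfolding varies between conventions, and the one used here is simply what `reshape` produces.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode unfolding: move the chosen axis to the front and flatten the rest."""
    T = np.asarray(tensor)
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Example: a tensor with I1 = 2, I2 = 3, I3 = 4.
T = np.arange(24).reshape(2, 3, 4)
M1, M2, M3 = unfold(T, 0), unfold(T, 1), unfold(T, 2)
print(M1.shape, M2.shape, M3.shape)
```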

Definition of Tensor Null Space

Conditions for translation invariance for tensor null space:

Due to the invariance of rotation,

Motion Event Tensor

We align each trajectory as two rows in a matrix according to its x and y coordinates, and the number of rows of the matrix is set to twice the number of objects in the motion event under analysis.

P: the length of normalized trajectories

J: Twice the number of trajectories

K: Number of video samples
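
A minimal sketch of how such a tensor could be assembled, assuming every trajectory has already been normalized to P samples and every clip contains the same number of objects. The axis order J x P x K and the function names are assumptions made for illustration.

```python
import numpy as np

def event_matrix(trajectories):
    """Stack each (P, 2) trajectory as an x row and a y row: result is J x P."""
    rows = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        rows.append(traj[:, 0])               # x coordinates
        rows.append(traj[:, 1])               # y coordinates
    return np.vstack(rows)

def event_tensor(clips):
    """Stack the J x P matrices of K video clips along a third axis: J x P x K."""
    return np.stack([event_matrix(c) for c in clips], axis=-1)

# Example: K = 4 clips, 3 trajectories per clip, P = 18 samples each.
rng = np.random.default_rng(0)
clips = [[rng.normal(size=(18, 2)) for _ in range(3)] for _ in range(4)]
tensor = event_tensor(clips)
```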

Simulation results for TNSI

Accuracy of the proposed classification system versus the number of classes. There are 20 tensors in each class. Simulation results show that our system preserves its efficiency even for a larger number of classes (J (three trajectories in each clip) = 3, P (length of trajectories) = 18, K (video clips) = 20; unfolding in K).

Accuracy values versus increase in the number of tensors within a class. There are 20 classes in the system.

Localized Null Space

Consider view-invariant video classification and retrieval with partial queries and dynamic video databases.

Efficient updating and downdating procedures for the representation of dynamic video databases.

Localized Null Space is one of the ways to solve the problem.

Localized Null Space (LNS)

Localized Null Space relies on different key points in different segments.

Localized Null Space

[Figure: the proposed Localized Null Space (with blocks of zero elements) beside the traditional Null Space; dimension labels N and N-3.]

Structure of Localized Null Space

Illustration of the structure of the traditional Null Space and the proposed Localized Null Space.

[Figure labels: zero elements; 3 non-zero elements; K-3 non-zero elements for W1; N-K-3 non-zero elements for W2; dimension labels N-3, K, N-K, and 3.]

Splitting of Raw Data Space

Deterministic splitting: The length of the feature vector and the key points are known to the users. LNS provides a perfect solution.

Random splitting: The length of the feature vector and the key points are not available to the users. Splitting and key points must be estimated.

Optimal Splitting

where D is the distortion for random splitting given by

and P(L) is the distribution of the segment with length L, and K is the optimal segmentation length. Solving the minimization problem, we obtain

Optimal Key Points Selection within Each Segment

where C is the probability that all the key points are in the range .

Benefits of LNS

The localized null space can be viewed as consisting of multiple subspaces and can therefore be dynamically split for retrieval of partial queries.

Localized Null Space can be used to merge multiple Null Spaces into an integrated Null Space.

Localized Null Space has the same complexity as the traditional null space.
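
The splitting and merging idea can be illustrated with a simplified block-diagonal construction. This is an assumption rather than the construction from the slides, whose exact form is not recoverable here: the null spaces of two consecutive segments are placed on the diagonal of a larger matrix, which yields N-6 columns instead of the N-3 shown in the structure figure. All function names are hypothetical.

```python
import numpy as np

def homogeneous(points):
    """3 x n matrix with rows x, y, 1 (assumed raw data layout)."""
    pts = np.asarray(points, dtype=float)
    return np.vstack([pts.T, np.ones(len(pts))])

def nsi(points, tol=1e-10):
    """Orthonormal null-space basis of the 3 x n matrix of an (n, 2) point set."""
    _, s, vt = np.linalg.svd(homogeneous(points))
    rank = int(np.sum(s > tol * s[0]))
    return vt[rank:].T

def merge_null_spaces(H1, H2):
    """Place two segment null spaces on the diagonal of one larger matrix."""
    n1, c1 = H1.shape
    n2, c2 = H2.shape
    merged = np.zeros((n1 + n2, c1 + c2))
    merged[:n1, :c1] = H1
    merged[n1:, c1:] = H2
    return merged

def split_null_space(merged, n1, c1):
    """Recover the two diagonal blocks, e.g. to answer a partial query."""
    return merged[:n1, :c1], merged[n1:, c1:]

# Example: N = 12 points split after K = 7 points.
t = np.linspace(0.0, 2.0, 12)
pts = np.column_stack([np.cos(t), np.sin(2 * t)])
H1, H2 = nsi(pts[:7]), nsi(pts[7:])
L = merge_null_spaces(H1, H2)
```

Every column of the merged matrix is still annihilated by the full 3 x N data matrix, so the merged representation stays inside the traditional null space while keeping the segments separable.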


Visual illustration of the facial image B and part of the rotated image A, which have identical localized null space representations.

LNS Example

Non-linear Kernel Space Invariants (NKSI)

Invariance to non-linear transformation

Relying on Taylor expansions to approximate the non-linear transformations with linear transformations

Application: Standard Perspective Transformation


When k=2

Standard Perspective Transformation

Unequal multiple trajectory representation

Bilinear Invariants

AXB=0, where A and B are raw data matrices, X is the invariant basis.

When A and B are subject to different linear transformations from the left and right side respectively, X is invariant.
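
The invariance can be verified numerically. The sketch below finds solutions of AXB = 0 through the identity vec(AXB) = (B^T ⊗ A) vec(X) with column-major vectorization, which is one standard way to solve the equation and is not necessarily the method used in the slides. The matrix sizes, the random test matrices, and the function names are illustrative assumptions.

```python
import numpy as np

def bilinear_invariant_basis(A, B, tol=1e-10):
    """Columns span {vec(X) : A X B = 0}, via vec(A X B) = (B^T kron A) vec(X)."""
    K = np.kron(B.T, A)
    _, s, vt = np.linalg.svd(K)               # full SVD; null vectors are trailing rows
    rank = int(np.sum(s > tol * s[0]))
    return vt[rank:].T

def unvec(v, shape):
    """Undo column-major vectorization."""
    return v.reshape(shape, order="F")

# Example: A is 3 x 6, B is 5 x 2, so X is 6 x 5.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 6))
B = rng.normal(size=(5, 2))
basis = bilinear_invariant_basis(A, B)
X = unvec(basis[:, 0], (6, 5))

# Transform A from the left and B from the right with different linear maps.
T1 = rng.normal(size=(3, 3))
T2 = rng.normal(size=(2, 2))
residual = (T1 @ A) @ X @ (B @ T2)
print(np.allclose(residual, 0.0))
```

Since (T1 A) X (B T2) = T1 (A X B) T2, any X that satisfies AXB = 0 also satisfies the transformed equation, which is what the residual check confirms.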

Bilinear Invariants

Retrieval of unequal multiple trajectories

Conclusion

Null Space: an effective and robust tool for classification and retrieval of motion events; segmentation of the null space can further improve the performance for arbitrary moving cameras.

Tensor Null Space: high-order data.

Conclusion

Localized Null Space: dynamic updating of the database; partial query, splitting, and merging of null space.

Non-linear Kernel Space Invariants: invariance to non-linear transformation.

Bilinear Invariants: suitable for different lengths of features and different dimensions of raw data.

Publications

Journal Papers:

1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Localization and trajectory estimation of mobile object using minimum samples," IEEE Transactions on Vehicular Technology (TVT), volume 8, issue 9, 2009, pp. 4439-4446.

2. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null Space Invariants: Part I: View Invariant Motion Trajectory Analysis and Image Classification and Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), revised, 2009.

3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null Space Invariants: Part II: Localized Null Space Representation for Dynamic Image and Video Databases," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), submitted, 2009.

Publications

Conference Papers:

1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Localization and trajectory estimation of mobile object with a single sensor," IEEE Statistical Signal Processing Workshop (SSP'07), Madison, Wisconsin, 2007.

2. Eser Ustunel, Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'08), Las Vegas, Nevada, 2008.

3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Robust closed-form localization of mobile targets using a single sensor based on a non-linear measurement model," IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC'08), Recife, Pernambuco, Brazil, 2008.

4. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Robust null space representation and sampling for view invariant motion trajectory analysis," IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'08), Anchorage, Alaska, 2008.

5. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Robust multi-dimensional null space representation for image retrieval and classification," IEEE Conference on Image Processing (ICIP'08), San Diego, California, 2008.

6. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "View-Invariant Tensor Null Space Representation For Multiple Motion Trajectory Retrieval and Classification," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09) (Invited Paper), Taipei, Taiwan, 2009.

Publications

Conference Papers:

7. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Localized Null-Space representation for Dynamic Updating and Downdating in Image and Video Databases," IEEE International Conference on Image Processing (ICIP'09), Cairo, Egypt, 2009.

8. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, "Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'10), Dallas, Texas, 2010.

Thanks !

Questions ?
