Robust View-Invariant Representation for Classification and Retrieval of Image and Video Data
Xu Chen
University of Illinois at Chicago, Electrical and Computer Engineering
March 1, 2010
Outline
Background and Motivation
Related Work
Problem Statement
Expected Contributions
Null Space Invariants
Tensor Null Space
Localized Null Space
Non-linear Kernel Space Invariants
Bilinear Invariants
Background
Within the last several years, object motion trajectory-based recognition has gained significant interest in diverse application areas including:
sign language gesture recognition, Global Positioning System (GPS) and Car Navigation System (CNS) applications, animal mobility experiments, sports video trajectory analysis, and automatic video surveillance.
Motivation
Accurate activity classification and recognition across multiple views is an extremely challenging task.
Object trajectories captured from different viewpoints lead to completely different representations, which can be approximately modeled by affine transformations.
To obtain a view-independent representation, the trajectory data is represented in an affine-invariant feature space.
Related Work
[Stiller, IJCV, 1994] mathematical formulation of NSI
[Bashir et al., ACM Multimedia, 2006] curvature scale space (CSS) and centroid distance function (CDF) representations; only work with small camera motions
[Chellappa et al., TIP, 2006] PCNSA for activity recognition
[Huang et al., TIP, 2008] correlation tensor analysis
[Chang et al., PAMI, 2008] kernel methods with multilevel temporal alignment; not view invariant
Problem Statement and Approach
Development of efficient view invariant representation, indexing/retrieval, and classification techniques for motion based events
The null space in a particular basis is invariant in the presence of arbitrary affine transformations.
Demonstration of its enormous potential in computer vision, especially in motion event and activity recognition and retrieval.
Null Space Invariants
Let p_i = (x_i, y_i) be a single 2-D point, i = 0, 1, ..., n-1. A motion trajectory of n 2-D points can be represented in the matrix M:

M = \begin{bmatrix} x_0 & x_1 & \cdots & x_{n-1} \\ y_0 & y_1 & \cdots & y_{n-1} \\ 1 & 1 & \cdots & 1 \end{bmatrix}

The null space H satisfies M H = 0, i.e., M q = 0 for each basis vector q, where q is an n by 1 vector and H is the matrix spanned by the linearly independent basis vectors, of size n by (n-3).
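The invariance claim can be checked numerically. The sketch below (an illustration, not the authors' code) builds the 3 by n homogeneous trajectory matrix, applies a random affine map, and verifies that both matrices share the same null space:

```python
# A minimal sketch of the null-space invariance property: the null space of
# the homogeneous trajectory matrix M is unchanged by an affine map of the
# 2-D points, since M' = A M with A invertible implies M' H = A M H = 0.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
n = 10
x, y = rng.standard_normal(n), rng.standard_normal(n)
M = np.vstack([x, y, np.ones(n)])          # 3 x n raw data matrix

# Arbitrary affine map p -> A p + t, written as a 3 x 3 homogeneous matrix.
A = np.array([[1.2, -0.3, 2.0],
              [0.5,  0.9, -1.0],
              [0.0,  0.0,  1.0]])
M_affine = A @ M                           # transformed trajectory

H = null_space(M)                          # n x (n-3) basis of the null space
H_affine = null_space(M_affine)

# The bases may differ, but they span the same subspace: compare the
# orthogonal projectors H H^T.
P1, P2 = H @ H.T, H_affine @ H_affine.T
print(np.allclose(P1, P2))                 # True
```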
Null Space Invariants (NSI)
Typically, each element in H is given by:
Null Space-based Classification/Retrieval Algorithm
1. Normalize the length of the trajectories: take the 2-D FFT, select the N largest coefficients, and take the 2-D IFFT.
2. Compute the NSI of the normalized raw data and vectorize it: given the n by (n-3) NSI H, convert H into an n(n-3) by 1 vector.
3. Apply Principal Component Null Space Analysis (PCNSA) to the vectorized NSI.
There are various classification and retrieval algorithms we could apply to the NSI.
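The steps above can be sketched as follows; the resampling routine and parameter choices (25 samples, 10 retained FFT coefficients) are illustrative assumptions, not the authors' exact settings:

```python
# A sketch of the normalization + NSI pipeline: resample a trajectory to a
# fixed length, smooth it by keeping the largest 2-D FFT coefficients, then
# compute and vectorize the null-space invariant.
import numpy as np
from scipy.linalg import null_space

def normalize_trajectory(traj, n=25, n_coeffs=10):
    """Resample a 2 x m trajectory to n samples, then keep only the
    n_coeffs largest-magnitude 2-D FFT coefficients."""
    m = traj.shape[1]
    t_old, t_new = np.linspace(0, 1, m), np.linspace(0, 1, n)
    resampled = np.vstack([np.interp(t_new, t_old, traj[0]),
                           np.interp(t_new, t_old, traj[1])])
    F = np.fft.fft2(resampled)
    idx = np.argsort(np.abs(F), axis=None)[:-n_coeffs]  # all but the largest
    F_flat = F.flatten()
    F_flat[idx] = 0
    return np.real(np.fft.ifft2(F_flat.reshape(F.shape)))

def nsi_vector(traj):
    """Vectorize the n x (n-3) null-space invariant into n(n-3) entries."""
    n = traj.shape[1]
    M = np.vstack([traj, np.ones(n)])      # 3 x n homogeneous matrix
    H = null_space(M)                      # n x (n-3)
    return H.flatten()

traj = np.vstack([np.linspace(0, 1, 40), np.sin(np.linspace(0, 3, 40))])
v = nsi_vector(normalize_trajectory(traj))
print(v.shape)                             # (550,) = 25 * (25 - 3)
```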
Normalization example: trajectories normalized to 25 samples.
Details of PCNSA
1. Obtain the PCA space: evaluate the total covariance matrix, then apply PCA to it to find W(pca), whose columns are the L leading eigenvectors.
2. Project the data vectors, class means, and class covariance matrices into the corresponding data vectors, class means, and class covariance matrices in the PCA space.
3. Obtain the ANS: find the approximate null space for each class i by choosing the eigenvectors corresponding to the M(i) smallest eigenvalues.
Details of PCNSA
4. Obtain valid classification directions in the ANS: if a direction e(i) satisfies the validity condition, it is said to be a valid direction and is used to build the valid ANS, W(NSA, i).
5. Classification: PCNSA finds the distance from a query trajectory X to each class: d(X, i) = ||W(NSA, i)(X - m(i))||, where m(i) is the mean of class i. We assign X to the class with the smallest distance.
6. Retrieval: we compute the distance from the query trajectory Y to any other trajectory X(i) by d(X, i) = ||W(NSA, i)(X(i) - Y)||.
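Steps 5 and 6 can be illustrated on toy data. The sketch below builds each class's approximate null space directly from the eigenvectors of the class covariance; the PCA projection and validity test of steps 1-4 are omitted for brevity, and all data and parameters are illustrative:

```python
# Toy sketch of the PCNSA classification rule d(X, i) = ||W_i^T (X - m_i)||.
import numpy as np

rng = np.random.default_rng(1)
d, n_per_class, n_dirs = 6, 50, 2
classes = []
for c in range(3):
    mean = rng.standard_normal(d) * 5
    cov = np.diag(rng.uniform(0.1, 3.0, d))
    classes.append(rng.multivariate_normal(mean, cov, n_per_class))

means, W = [], []
for Xc in classes:
    m = Xc.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc.T))  # ascending eigenvalues
    means.append(m)
    W.append(evecs[:, :n_dirs])  # smallest-eigenvalue eigenvectors: the
                                 # approximate null space (ANS) of the class

def classify(x):
    """Assign x to the class with the smallest ANS-projected distance."""
    dists = [np.linalg.norm(Wi.T @ (x - mi)) for Wi, mi in zip(W, means)]
    return int(np.argmin(dists))

acc = np.mean([classify(x) == c for c, Xc in enumerate(classes) for x in Xc])
print(acc)
```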
Classification Performance
We plot the classification accuracy versus the number of classes, with 20 trajectories in each class (up to 40 classes).
Classification Performance
We plot the classification accuracy versus the number of trajectories per class (up to 40 trajectories in each class).
Retrieval Performance
Precision vs. recall curves for retrieval.
To further demonstrate the view-invariant nature of our system, we populate the CAVIAR dataset with 5 rotated versions of each trajectory in the class, rotating the trajectories by -60, -30, 0, 30, and 60 degrees.
Two variants are compared: applying PCNSA on the NSI, and directly applying PCA on the NSI.
Visual illustration of retrieval results with 20 classes of motion trajectories from the CAVIAR dataset for the motion events "chase" and "shopping and leave", for fixed cameras from unknown views (query and top-2 retrievals).
Applications of NSI in image retrieval
Facial recognition
Extract SIFT features as feature points.
The raw data matrix is not necessarily of the size 3 by n.
Image retrieval results
Perturbation Analysis
The ratio of the output error (the error on the null space) to the input error (the error on the raw data) is derived, where Z is the noise matrix on the raw data.
SNR: the ratio of the energy of the signal for the NSI to the energy of the noise on the NSI.
Optimal Sampling
Given the perturbation analysis, we design an optimal sampling strategy. Uniform sampling and Poisson sampling are utilized.
Arbitrary trajectories in x and y directions:
x=f(t), y=g(t)
Expanding the trajectory in a Maclaurin series.
Optimal Sampling
Property 2: The rate parameter λ = O(N) should be chosen for Poisson sampling to guarantee convergence of the error ratio, where N is the total number of samples.
Optimal Sampling
In our framework, the density corresponds to the average number of samples per unit length.
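Poisson sampling of a trajectory x = f(t), y = g(t) can be sketched as follows; the helper name and rate value are illustrative assumptions, not the authors' implementation:

```python
# A minimal sketch of Poisson sampling along a trajectory: sample times are
# generated by a homogeneous Poisson process (i.i.d. exponential gaps with
# rate lambda), and the trajectory is evaluated at the arrival times.
import numpy as np

rng = np.random.default_rng(2)

def poisson_sample(f, g, t_max=1.0, rate=40.0):
    """Return trajectory points at Poisson arrival times on [0, t_max]."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)   # exponential inter-arrival gaps
        if t > t_max:
            break
        times.append(t)
    times = np.array(times)
    return np.vstack([f(times), g(times)])

traj = poisson_sample(np.cos, np.sin)      # about `rate` samples on average
print(traj.shape)
```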
Arbitrary Moving Cameras and Segmented NSI
Fixed cameras from unknown views: all feature points undergo the same global affine transformation.
With arbitrary moving cameras, the classification and retrieval problem is further compounded: the feature points can undergo different affine transformations.
Computing the null space of segmented trajectories yields higher accuracy: the orientations and translations of adjacent points are very close, so they have more similar local null-space representations.
Overlapping segmentation and non-overlapping segmentation. (Assumption)
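Segmented NSI with both non-overlapping and overlapping segmentation can be sketched as below; the segment length and overlap values are illustrative, not the authors' settings:

```python
# A sketch of segmented NSI: cut the trajectory into fixed-length segments,
# optionally overlapping, and compute a local null space for each segment.
import numpy as np
from scipy.linalg import null_space

def segmented_nsi(traj, seg_len=8, overlap=0):
    """Return a list of local null-space bases, one per segment."""
    step = seg_len - overlap
    n = traj.shape[1]
    spaces = []
    for start in range(0, n - seg_len + 1, step):
        seg = traj[:, start:start + seg_len]
        M = np.vstack([seg, np.ones(seg_len)])   # 3 x seg_len
        spaces.append(null_space(M))             # seg_len x (seg_len - 3)
    return spaces

t = np.linspace(0, 2 * np.pi, 32)
traj = np.vstack([np.cos(t), np.sin(2 * t)])
non_overlap = segmented_nsi(traj, seg_len=8, overlap=0)   # 4 segments
overlapping = segmented_nsi(traj, seg_len=8, overlap=4)   # 7 segments
print(len(non_overlap), len(overlapping))
```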
Retrieval example for the event "Entering the shop": query with rank-1 and rank-2 results, comparing the global NSI with overlapping segmentation by 5 (first 16 NSI).
Optimal sampling: with vs. without Poisson sampling.
The same trajectory has different representations due to camera motions.
Example of the trajectory "all" and its affine versions, with and without Poisson sampling (λ = 0.8). The NS representations with Poisson sampling (right) are more similar than those without sampling (left): Poisson sampling greatly attenuates the noise effects.
Comparison on the ASL dataset: 20 classes with 40 trajectories in each class. Plots show classification accuracy and retrieval time (sec).
Tensor Null Space (TNSI)
Fundamental mathematical framework for tensor NSI
View-invariant classification and retrieval of multiple motion trajectories.
Definition of Tensor Null Space
Conditions for rotational invariance:
Applying an affine transformation T(m) to the mth unfolding of the multi-dimensional data M: if the resulting tensor null space Q is invariant in the mth dimension, it is referred to as mode-m invariant.
M(1), M(2), M(3) are the unfoldings of the three-dimensional tensor along its different dimensions: M(1) is I1 by (I2 I3), M(2) is I2 by (I1 I3), and M(3) is I3 by (I1 I2).
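The unfoldings can be expressed compactly in code; this sketch assumes the standard mode-m unfolding convention (move mode m to the front, then flatten the remaining modes):

```python
# Mode-m unfolding of a tensor: an I1 x I2 x I3 tensor maps to matrices of
# size I1 x (I2*I3), I2 x (I1*I3), and I3 x (I1*I2) for modes 0, 1, 2.
import numpy as np

def unfold(T, mode):
    """Move axis `mode` first, then flatten the remaining axes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

T = np.arange(2 * 3 * 4).reshape(2, 3, 4)
print(unfold(T, 0).shape)   # (2, 12)
print(unfold(T, 1).shape)   # (3, 8)
print(unfold(T, 2).shape)   # (4, 6)
```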
Definition of Tensor Null Space
Conditions for translation invariance for tensor null space:
Due to the invariance of rotation,
Motion Event Tensor
We align each trajectory as two rows in a matrix according to its x and y coordinates, so the number of rows of the matrix is set to twice the number of objects in the motion event under analysis.
P: the length of normalized trajectories
J: twice the number of trajectories
K: Number of video samples
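Assembling the motion event tensor described above might look like this sketch (function and variable names are illustrative): each trajectory contributes two rows (x and y), each clip gives one J by P matrix, and K clips are stacked into a J by P by K tensor:

```python
# A sketch of building the motion event tensor from per-clip trajectories.
import numpy as np

def event_tensor(clips):
    """clips: list of K clips; each clip is a list of (2, P) trajectories.
    Returns a (J, P, K) tensor with J = 2 * (number of objects)."""
    mats = [np.vstack(clip) for clip in clips]   # each clip -> (J, P) matrix
    return np.stack(mats, axis=-1)               # stack clips -> (J, P, K)

P, n_objects, K = 18, 3, 5
rng = np.random.default_rng(3)
clips = [[rng.standard_normal((2, P)) for _ in range(n_objects)]
         for _ in range(K)]
T = event_tensor(clips)
print(T.shape)   # (6, 18, 5)
```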
Simulation results for TNSI
The accuracy of the proposed classification system versus the number of classes, with 20 tensors in each class. Simulation results show that our system preserves its efficiency even for a higher number of classes (J = 3, three trajectories in each clip; P = 18, the length of the trajectories; K = 20 video clips; unfolding in K).
Accuracy versus the number of tensors within a class, with 20 classes in the system.
Localized Null Space
Consider view-invariant video classification and retrieval with partial queries over a dynamic video database.
Efficient updating and downdating procedures for the representation of dynamic video databases are required.
The Localized Null Space is one way to solve this problem.
Localized Null Space (LNS)
Localized Null Space relies on different key points in different segments.
Structure of the Localized Null Space
Illustration of the structure of the traditional Null Space and the proposed Localized Null Space:
Traditional Null Space: an N by (N-3) matrix in which each basis vector has 3 non-zero elements and zero elements elsewhere.
Proposed Localized Null Space: a block structure over two segments of lengths K and N-K, with K-3 non-zero elements for W1, N-K-3 non-zero elements for W2, and zero elements elsewhere.
Splitting of Raw Data Space
Deterministic splitting: the length of the feature vector and the key points are known to the users. LNS provides a perfect solution.
Random splitting: the length of the feature vector and the key points are not available to the users. The splitting and key points must be estimated.
Optimal Splitting
where D is the distortion for random splitting, P(L) is the distribution of segments of length L, and K is the optimal segmentation length. Solving the minimization problem, we obtain the optimal K.
Optimal Key Points Selection within Each Segment
where C is the probability that all the key points are in the range.
Benefits of LNS
The Localized Null Space can be viewed as consisting of multiple subspaces and can therefore be dynamically split for the retrieval of partial queries.
The Localized Null Space can be used to merge multiple Null Spaces into an integrated Null Space.
The Localized Null Space has the same complexity as the traditional null space.
LNS Example
Visual illustration of the facial image B and a part of the rotated image A with identical localized null space representations.
Non-linear Kernel Space Invariants (NKSI)
Invariance to non-linear transformations.
Relying on Taylor expansions to approximate the non-linear transformations with linear transformations.
Application: Standard Perspective Transformation
Non-linear Kernel Space Invariants (NKSI)
When k=2
Non-linear Kernel Space Invariants (NKSI)
Standard Perspective Transformation
Unequal multiple trajectory representation
Bilinear Invariants
AXB = 0, where A and B are raw data matrices and X is the invariant basis.
When A and B are subject to different linear transformations from the left and right sides respectively, X remains invariant.
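This invariance can be verified numerically via the vec identity vec(AXB) = (B^T ⊗ A) vec(X); the sketch below is an illustration, not the authors' implementation:

```python
# Solve A X B = 0 through the Kronecker-product null space, then check that
# the same X still satisfies the equation after A is multiplied from the
# left and B from the right: (L A) X (B R) = L (A X B) R = 0.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))        # left raw-data matrix
B = rng.standard_normal((4, 3))        # right raw-data matrix

K = np.kron(B.T, A)                    # vec(A X B) = (B^T kron A) vec(X)
basis = null_space(K)                  # each column is a vec(X)
X = basis[:, 0].reshape(5, 4, order="F")  # column-major (Fortran) un-vec
print(np.allclose(A @ X @ B, 0))       # X solves A X B = 0

L = rng.standard_normal((3, 3))        # left transformation of A
R = rng.standard_normal((3, 3))        # right transformation of B
print(np.allclose((L @ A) @ X @ (B @ R), 0))  # same X remains a solution
```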
Bilinear Invariants
Retrieval of unequal multiple trajectories
Conclusion
Null Space: an effective and robust tool for classification and retrieval of motion events.
Segmentation of the null space can further improve performance for arbitrary moving cameras.
Tensor Null Space: extends the framework to high-order data.
Conclusion
Localized Null Space: dynamic updating of the database; partial queries; splitting and merging of Null Spaces.
Non-linear Kernel Space Invariants: invariance to non-linear transformations.
Bilinear Invariants: suitable for different lengths of features and different dimensions of raw data.
Publications
Journal Papers:
1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Localization and trajectory estimation of mobile object using minimum samples,'' IEEE Transactions on Vehicular Technology (TVT), volume 8, issue 9, 2009, pp. 4439-4446.
2. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, “Null Space Invariants: Part I: View Invariant Motion Trajectory Analysis and Image Classification and Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), revised, 2009.
3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, “Null Space Invariants: Part II: Localized Null Space Representation for Dynamic Image and Video Databases," IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), submitted, 2009.
Publications
Conference Papers:
1. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Localization and trajectory estimation of mobile object with a single sensor,'' IEEE Statistical Signal Processing Workshop (SSP'07), Madison, Wisconsin, 2007.
2. Eser Ustunel, Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval,'' IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'08), Las Vegas, Nevada, 2008.
3. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Robust closed-form localization of mobile targets using a single sensor based on a non-linear measurement model,'' IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC'08), Recife, Pernambuco, Brazil, 2008.
4. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Robust null space representation and sampling for view invariant motion trajectory analysis,'' IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'08), Anchorage, Alaska, 2008.
5. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Robust multi-dimensional null space representation for image retrieval and classification,'' IEEE Conference on Image Processing (ICIP'08), San Diego, California, 2008.
6. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''View-Invariant Tensor Null Space Representation For Multiple Motion Trajectory Retrieval and Classification,'' IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09) (Invited Paper), Taipei, Taiwan, 2009.
Publications
Conference Papers (continued):
7. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Localized Null-Space Representation for Dynamic Updating and Downdating in Image and Video Databases,'' IEEE International Conference on Image Processing (ICIP'09), Cairo, Egypt, 2009.
8. Xu Chen, Dan Schonfeld and Ashfaq Khokhar, ''Null space representation for view-invariant motion trajectory classification-recognition and indexing-retrieval,'' IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'10), Dallas, Texas, 2010.
Thanks!
Questions?