Shifted Subspaces Tracking on Sparse Outlier for Motion … · 2014-09-30 · Shifted Subspaces...

7
Shifted Subspaces Tracking on Sparse Outlier for Motion Segmentation Tianyi Zhou and Dacheng Tao Centre for Quantum Computation & Intelligent Systems University of Technology Sydney, NSW 2007, Australia [email protected]; [email protected] Abstract In low-rank & sparse matrix decomposition, the en- tries of the sparse part are often assumed to be i.i.d. sampled from a random distribution. But the struc- ture of sparse part, as the central interest of many problems, has been rarely studied. One motivat- ing problem is tracking multiple sparse object flows (motions) in video. We introduce “shifted sub- spaces tracking (SST)” to segment the motions and recover their trajectories by exploring the low-rank property of background and the shifted subspace property of each motion. SST is composed of two steps, background modeling and flow tracking. In step 1, we propose “semi-soft GoDec” to separate all the motions from the low-rank background L as a sparse outlier S. Its soft-thresholding in updat- ing S significantly speeds up GoDec and facilitates the parameter tuning. In step 2, we update X as S obtained in step 1 and develop “SST algorithm” further decomposing X as X = k i=1 L(i) τ (i)+ S +G, wherein L(i) is a low-rank matrix storing the i th flow after transformation τ (i). SST algorithm solves k sub-problems in sequel by alternating min- imization, each of which recovers one L(i) and its τ (i) by randomized method. Sparsity of L(i) and between-frame affinity are leveraged to save com- putations. We justify the effectiveness of SST on surveillance video sequences. 1 Introduction In video sequences, an object flow is composed of multiple motions or moving objects with the identical trajectory. An- alyzing object flows is more frequently preferred than ana- lyzing the motion of a single object, because the flows can provide more semantic clues for the crowd behavior [Ali and Shah, 2007]. Tracking multiple object flows and motion seg- mentation [Wu et al., 2011][Galasso et al., 2012] in complex scenes is a vital and challenging problem in a variety of com- puter vision tasks such as surveillance, robotics, augmented reality, medical imaging and human-computer interactions. The challenges mainly come from the complex background, occlusion, illumination variation, noise, overlapping, and in- tertwined trajectories of different flows [Fragkiadaki and Shi, 2011]. Some of these problems have been well studied in the previous research works, but some are typical features of the flow tracking or motion segmentation and thus have not been fully tackled before. In this paper, we construct a model that considers all the miscellaneous problems above, reduce the motion detection, tracking and segmentation to a simple ma- trix factorization, and then develop an efficient optimization algorithm to achieve the factorization result. 1.1 Related Work To begin with, we can roughly categorize existing tracking approaches into two groups, i.e., generative model [Doucet et al., 2001][Comaniciu et al., 2003] and discriminative method [Hess and Fern, 2009][Hong et al., 2012]. Generative model formulates tracking as estimation of the state within a time series space state space model, and search the regions of the highest likelihood. Early works such as Kalman filter and its variants have been demonstrated to be optimal for linear Gaussian model. Another representative generative model for tracking is particle filter [Doucet et al., 2001], which approx- imates the posterior distribution of the state space by Monte Carlo integration. Single appearance model or multiple ap- pearance models [Yu et al., 2008] are used in these genera- tive models. Different from generative models, discriminative classifies cast the tracking problem into a classification task whose goal is to distinguish the target object from the back- ground. In the training stage, each pixel is represented by a feature vector, and the pixels belonging to the same target object is assigned to the same class. Hybrid approaches [Yu et al., 2008] combining both generative models and discrim- inative classifiers have also been proposed for solving visual tracking under significant occlusions. Recent advances in matrix completion [Cand` es and Recht, 2008][Ji and Ye, 2009] and robust principle component anal- ysis (RPCA) [Chandrasekaran et al., 2009][Cand` es et al., 2009][Chen et al., 2010][F. Nie, 2011] lead us to a new per- spective for analyzing large-scale video data, which is explor- ing the inherent low-rank structures of background and mo- tions. As an extension of matrix completion who recovers a low-rank matrix from a small portion of its entries, RPCA ex- actly recovers a low-rank matrix L from collected data matrix X = L + S, where S is sparse outlier of large magnitude on random support. RPCA decomposes X by minimizing the convex surrogates of L’s rank and S’s cardinality, i.e., the Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence 1946

Transcript of Shifted Subspaces Tracking on Sparse Outlier for Motion … · 2014-09-30 · Shifted Subspaces...

Shifted Subspaces Tracking on Sparse Outlier for Motion Segmentation

Tianyi Zhou and Dacheng TaoCentre for Quantum Computation & Intelligent SystemsUniversity of Technology Sydney, NSW 2007, [email protected]; [email protected]

AbstractIn low-rank & sparse matrix decomposition, the en-tries of the sparse part are often assumed to be i.i.d.sampled from a random distribution. But the struc-ture of sparse part, as the central interest of manyproblems, has been rarely studied. One motivat-ing problem is tracking multiple sparse object flows(motions) in video. We introduce “shifted sub-spaces tracking (SST)” to segment the motions andrecover their trajectories by exploring the low-rankproperty of background and the shifted subspaceproperty of each motion. SST is composed of twosteps, background modeling and flow tracking. Instep 1, we propose “semi-soft GoDec” to separateall the motions from the low-rank background L asa sparse outlier S. Its soft-thresholding in updat-ing S significantly speeds up GoDec and facilitatesthe parameter tuning. In step 2, we update X asS obtained in step 1 and develop “SST algorithm”further decomposingX asX =

∑ki=1 L(i)◦τ(i)+

S+G, whereinL(i) is a low-rank matrix storing theith flow after transformation τ(i). SST algorithmsolves k sub-problems in sequel by alternating min-imization, each of which recovers one L(i) and itsτ(i) by randomized method. Sparsity of L(i) andbetween-frame affinity are leveraged to save com-putations. We justify the effectiveness of SST onsurveillance video sequences.

1 IntroductionIn video sequences, an object flow is composed of multiplemotions or moving objects with the identical trajectory. An-alyzing object flows is more frequently preferred than ana-lyzing the motion of a single object, because the flows canprovide more semantic clues for the crowd behavior [Ali andShah, 2007]. Tracking multiple object flows and motion seg-mentation [Wu et al., 2011][Galasso et al., 2012] in complexscenes is a vital and challenging problem in a variety of com-puter vision tasks such as surveillance, robotics, augmentedreality, medical imaging and human-computer interactions.The challenges mainly come from the complex background,occlusion, illumination variation, noise, overlapping, and in-tertwined trajectories of different flows [Fragkiadaki and Shi,

2011]. Some of these problems have been well studied in theprevious research works, but some are typical features of theflow tracking or motion segmentation and thus have not beenfully tackled before. In this paper, we construct a model thatconsiders all the miscellaneous problems above, reduce themotion detection, tracking and segmentation to a simple ma-trix factorization, and then develop an efficient optimizationalgorithm to achieve the factorization result.

1.1 Related WorkTo begin with, we can roughly categorize existing trackingapproaches into two groups, i.e., generative model [Doucet etal., 2001][Comaniciu et al., 2003] and discriminative method[Hess and Fern, 2009][Hong et al., 2012]. Generative modelformulates tracking as estimation of the state within a timeseries space state space model, and search the regions of thehighest likelihood. Early works such as Kalman filter andits variants have been demonstrated to be optimal for linearGaussian model. Another representative generative model fortracking is particle filter [Doucet et al., 2001], which approx-imates the posterior distribution of the state space by MonteCarlo integration. Single appearance model or multiple ap-pearance models [Yu et al., 2008] are used in these genera-tive models. Different from generative models, discriminativeclassifies cast the tracking problem into a classification taskwhose goal is to distinguish the target object from the back-ground. In the training stage, each pixel is represented bya feature vector, and the pixels belonging to the same targetobject is assigned to the same class. Hybrid approaches [Yuet al., 2008] combining both generative models and discrim-inative classifiers have also been proposed for solving visualtracking under significant occlusions.

Recent advances in matrix completion [Candes and Recht,2008][Ji and Ye, 2009] and robust principle component anal-ysis (RPCA) [Chandrasekaran et al., 2009][Candes et al.,2009][Chen et al., 2010][F. Nie, 2011] lead us to a new per-spective for analyzing large-scale video data, which is explor-ing the inherent low-rank structures of background and mo-tions. As an extension of matrix completion who recovers alow-rank matrix from a small portion of its entries, RPCA ex-actly recovers a low-rank matrix L from collected data matrixX = L+ S, where S is sparse outlier of large magnitude onrandom support. RPCA decomposes X by minimizing theconvex surrogates of L’s rank and S’s cardinality, i.e., the

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence

1946

trace norm of L and the `1 norm of S,

minL,S

‖L‖∗ + λ‖S‖1s.t. X = L+ S.

(1)

Above convex optimization ensures exact or adequately pre-cise recovery of L and S under mild identifiability conditionssuch as rank-sparsity coherence, on the premise that exact de-composition X = L + S does exist. However, this premisecannot be always fulfilled for data collected from real appli-cations. Moreover, the rank and cardinality cannot be fixedwithin a small range while the error is still guaranteed to besmall. This uncertainty brings unpredictable computationalcosts in optimization.

RPCA has been successfully applied to background mod-eling in video surveillance [Candes et al., 2009], where thebackgrounds in all frames compose the rows of low-rank ma-trix L, whilst the moving objects or motions are captured bythe sparse outlier S. However, most existing methods simplytreat S as random noise and lack further study of the obtainedsparse outlier, which in fact contains substantial rich infor-mation about the motions of object flows. In addition, theextra dense noise caused by camera lens or environment illu-mination changes make the exact decomposition assumptionX = L + S cannot be satisfied in practices. Hence severalrecent approaches [Zhou et al., 2010][Hsu et al., 2011][Zhouand Tao, 2013] aim to obtain the approximated RPCA decom-position X = L+ S +G, where G is the dense noise. Sincemost RPCA algorithms rely on repeating time consuming sin-gular value thresholding, randomized method [Zhou and Tao,2012] was introduced for acceleration.

In order to overcome the shortcomings of existing RPCAapproaches, a novel method GoDec [Zhou and Tao, 2011]is proposed. GoDec imposes hard constraints to the rank ofL and the cardinality of S [Xiong et al., 2010], and addi-tionally accelerates the noisy decomposition with the help ofbilateral random projections (BRP) based low-rank approxi-mation [Zhou and Tao, 2012] and controllable rank of L.

minL,S

‖X − L− S‖2Fs.t. rank(L) ≤ r, card(S) ≤ k.

(2)

Although linear convergence of GoDec can be proved by theframework “alternating projections on two manifolds” [Lewisand Malick, 2008], the upper bound for cardinality of S hasto be carefully chosen in GoDec. Otherwise some portionof S’s entries will be contaminated or lost. Moreover, thehard thresholding towards S requires sorting all its entries’magnitudes and thus is time consuming.

1.2 Overview of SSTA significant open problem left by the mentioned RPCA ap-proaches is how to further analyze the rich structure of thesparse part. In many practical applications, the sparse part hasstructures that can contribute more useful information thanthe low-rank part. An example in point is that the sparse partof video sequence data is comprised of the motions of multi-ple object flows. In this paper, we consider decomposing thesparse part as the sum of several shifted low-rank matrices,

each of which corresponds to objects sharing one motion tra-jectory. This can be viewed as a novel structured sparsity. Itis also worthwhile to point out that although the main focusof this paper is object flow tracking and motion segmentation,the SST framework allows other forms of nonlinear transfor-mation and thus can be directly applied to other problems.

We develop an efficient unsupervised framework to detect,track and segment multiple motions in complex scenes viasolving a matrix factorization model. This framework in-vokes a sequence of matrix decompositions as subroutines,which can be summarized in two steps, i.e., backgroundmodeling and flow tracking. For the first step, we proposes“semi-soft GoDec” which replaces the cardinality constraintin GoDec with an `1 penalty. This small change greatly short-ens the computational time and facilitates the parameter tun-ing. For the second step, as the main contribution of thispaper, we present a novel insight that each flow can be de-picted by a low rank matrix after certain geometric transfor-mation sequence to video frames (which are stored as rowsof a matrix), and develop a matrix factorization method torecover both the low-rank matrices and the transformationsequences. If we treat the sparse outliers attained by semi-soft GoDec as the new data matrix X , our flow tracking ap-proach “shifted subspaces tracking (SST)” decomposes X asX =

∑ki=1 L(i)◦ τ(i)+S+G, where L(i) stands for a low-

rank matrix and τ(i) stands for a transformation sequence,both correspond to the motion of the ith object flow. SSTreduces the matrix factorization to a sequence of alternatingoptimizations in a similar manner with semi-soft GoDec, andefficiently solves them by taking advantages of the between-frame affinity and the motion sparsity of L(i). Besides, BRP[Zhou and Tao, 2012] is invoked to speed up the update ofL(i). The low-rank patterns, together with their transforma-tion sequences, reveals unexplored structure of the sparse partand rich information of segmented motion in complex scenes.

2 Problem SetupWe consider the problem of tracking object flows and seg-menting their motions from the raw video data. Given a datamatrix X ∈ Rn×p that stores a video sequence of n frames,each of which has w × h = p pixels and reshaped as a rowvector inX , the goal of SST framework is to separate the mo-tions of the object flows, recover both their low-rank patternsand geometric transformation sequences. This task is decom-posed as two steps in SST, i.e., background modeling thatseparating all the moving objects from the static background,and flow tracking that recovers the information of each mo-tion. In this paper, ·i stands for the ith entry of a vector or theith row of a matrix, while ·i,j signifies the entry at the ith rowand the jth column of a matrix.

2.1 Background modelingAlthough previous RPCA approaches provides several effec-tive matrix decomposition formulations for background mod-eling, some problems still arise in real applications. In SST,we consider the low-rank and sparse matrix decomposition innoisy case, which has been adopted by several recent models

1947

due to its robustness and adaptiveness to the real data,

X = L+ S +G, rank(L) ≤ r, card(S) ≤ s, (3)

whereL describes the background and S denotes the motions.In SST, a new formulation for background modeling is

adopted to overcome the shortcomings of both RPCA andGoDec,

minL,S

‖X − L− S‖2F + λ‖S‖1s.t. rank (L) ≤ r.

(4)

Compared with RPCA, (4) does not require the exact decom-position X = L + S. Furthermore, the rank of L is control-lable and the update of L can be randomized as in GoDec,thus the time cost can be largely decreased. Compared withGoDec, (4) replaces the “hard” cardinality constraint with a“soft” `1 regularization. Tuning soft threshold λ is much eas-ier than determining s in (3), because the resulting decompo-sition error is more robust to the change of λ. Later, we willdevelop “semi-soft GoDec” to solve the optimization (4).

2.2 Flow trackingAfter obtaining the sparse outliers storing motions by back-ground modeling, SST treats the sparse matrix S as the newdata matrixX , and decomposes it asX =

∑ki=1 L(i)+S+G,

wherein L(i) denotes the ith object flow, S stands for thesparse outliers and G stands for the Gaussian noise.

The matrix decomposition model for flow tracking in SSTis based on an observation to the implicit structures of thesparse matrix L(i). If the trajectory of the object flow L(i) isknown and each frame (row) in L(i) is shifted to the positionof a reference frame, due to the limited number of poses forthe same object flow in different frames, it is reasonable toassume that the rows of the shifted L(i) exist in a subspace.In other words, L(i) after inverse geometric transformationis low-rank. Hence the sparse motion matrix L(i) has thefollowing structured representation

L(i) =

L(i)1 ◦ τ(i)1...

L(i)n ◦ τ(i)n

= L(i) ◦ τ(i). (5)

The invertible transformation τ(i)j : R2 → R2 denotes the2-D geometric transformation (to the reference frame) asso-ciated with the ith object flow in the jth frame, which is rep-resented by L(i)j . To be specific, the jth row in L(i) is L(i)jafter certain permutation of its entries. The permutation re-sults from applying the nonlinear transformation τ(i)j to eachnonzero pixel in L(i)j such that,

τ(i)j(x, y) = (u, v), (6)

where τ(i)j could be one of the five geometric transforma-tions [Prince, 2011], i.e., translation, Euclidean, similarity,affine and homography, which are able to be represented by2, 3, 4, 6 and 9 free parameters, respectively. For example,affine transformation is defined as[

uv

]=

[ρ cos θ ρ sin θ−ρ sin θ ρ cos θ

] [xy

]+

[txty

], (7)

wherein θ is the rotation angle, tx and ty are the two trans-lations and ρ is the scaling ratio. It is worth to point out thatτ(i)j can be any other transformation beyond the geometricgroup. So SST can be applied to sparse structure in other ap-plications if parametric form of τ(i)j is known. We definethe nonlinear operator ◦ as

L(i)j,u+(v−1)h = (L(i)j ◦ τ(i)j)u+(v−1)h

= L(i)j,x+(y−1)h. (8)

Therefore, the flow tracking in SST aims at decomposing thesparse matrix X (S obtained in the background modeling) as

X =k∑i=1

L(i) ◦ τ(i) + S +G,

rank (L(i)) ≤ ri, card(S) ≤ s.(9)

In SST, we iteratively invoke k times of the following matrixdecomposition to greedily construct the decomposition in (9):

X = L ◦ τ + S +G, rank (L) ≤ r, card(S) ≤ s. (10)

In each time of the matrix decomposition above, the data ma-trix X is S obtained by former decomposition. In order tosave the computation and facilitate the parameter tuning, wecast the decomposition (10) into an optimization similar to(4),

minL,τ,S

‖X − L ◦ τ − S‖2F + λ‖S‖1s.t. rank (L) ≤ r,

(11)

To summarize, the optimization scheme of SST frameworkis given in Algorithm 1.

Algorithm 1 SST frameworkInput: X , r, λ, ri, λi(i = 1, · · · , k),Output: L, L(i), τ(i)(i = 1, · · · , k), S(L?, S?) = arg min

rank(L)≤r,S‖X − L− S‖2F + λ‖S‖1.

X := S?, L := L?.for i = 1→ k do

(L?, τ?, S?) =arg min

rank(L)≤ri,τ,S‖X − L ◦ τ − S‖2F + λi‖S‖1.

X := S?, L(i) := L?, τ(i) = τ?;end forS := X .

3 SST frameworkIn this section, we develop efficient algorithms to solve thetwo types of minimization in Algorithm 1, both of which canbe addressed by alternating minimization. In our algorithm,we speed up the update of low-rank part L in (4) and the low-rank motion patterns L(i) in (11) by bilateral random pro-jection based low-rank approximation [Zhou and Tao, 2012].In ST algorithm for flow tracking and motion segmentation,piece-wise linear approximation method is used to approx-imate the nonlinear operator ◦ when updating τ . Both thesparsity of L(i) and the between-frame affinity are leveragedto save computations. We also establish a rule to update thereference frame, which decides the uniqueness of τ(i) andavoids missing any object flow.

1948

3.1 Semi-Soft GoDec for Background ModelingWe propose semi-soft GoDec for optimization (4), which canbe solved by alternatively solving the following two subprob-lems until convergence: Lt = arg min

rank(L)≤r‖X − L− St−1‖2F

St = arg minS‖X − Lt − S‖2F + λ‖S‖1

(12)

The subproblems have global solutions Lt and St that can beobtained via closed-forms.

In particular, the two subproblems in (12) can be solved byupdating Lt via singular value hard thresholding ofX−St−1and updating St via soft thresholding ofX−Lt, respectively. Lt =

r∑i=1

λiUiVTi , svd

(X − St−1

)= UΛV T

St = Pλ (X − Lt) ,Pλ(x) = sign(x) max (|x| − λ, 0)

According to the bilateral random projection (BRP) basedlow-rank approximation [Zhou and Tao, 2012], the costlytruncated SVDs in updating Lt can be approximated by

BRP(Lt) = Y1(AT2 Y1

)−1Y T2 (13)

or its power scheme modification, wherein Y1 ∈ Rn×r andY2 ∈ Rp×r are the left and right random projections and A2

is the left random projection matrix. The obtained semi-softGoDec algorithm is given in Algorithm 2.

Algorithm 2 Semi-soft GoDecInput: X , r, λ, ε, qOutput: L, SInitialize: L0 := X , S0 := 0, t := 0while ‖X − Lt − St‖2F /‖X‖2F > ε dot := t+ 1;L =

[(X − St−1

) (X − St−1

)T ]q (X − St−1

);

Y1 = LA1, A2 = Y1;Y2 = LTY1 = Q2R2, Y1 = LY2 = Q1R1;If rank

(AT2 Y1

)< r then r := rank

(AT2 Y1

), go to the

first step; end;

Lt = Q1

[R1

(AT2 Y1

)−1RT2

]1/(2q+1)

QT2 ;

St = Pλ (X − Lt),where Pλ(x) = sign(x) max (|x| − λ, 0);

end while

Here q is the power parameter. When q > 0, the algorithmadopts a power-scheme modification of BRP for improvingapproximation accuracy [Zhou and Tao, 2011]. When q = 0,for denseX , (13) is applied. In the latter case, the QR decom-position of Y1 and Y2 in Algorithm 2 are not performed, andLt is updated as Lt = Y1

(AT2 Y1

)−1Y T2 . Actually, q = 0 is

sufficient to produce satisfying accuracy in most visual appli-cations. The key difference of Semi-soft GoDec comparingto the ordinary GoDec is the soft-thresholding of S, whichrequires merely np subtractions, while the hard-thresholdingof the largest entries in ordinary GoDec needs sorting of np

values. This improvement further speedup the computationof background modeling. The linear convergence of L stillholds and can be proved by following the same procedure in[Zhou and Tao, 2011].

3.2 SST algorithm for Object Flow TrackingFlow tracking in SST solves a sequence of optimization prob-lem of type (11). Thus we firstly apply alternating minimiza-tion to (11). This results in iterative update of the solutions tothe following three subproblems,

τ t = arg minτ‖X − Lt−1 ◦ τ − St−1‖2F ;

Lt = arg minrank(L)≤r

‖X − L ◦ τ t − St−1‖2F ;

St = arg minS‖X − Lt ◦ τ t − S‖2F + λ‖S‖1.

(14)

InitializationUnfortunately, alternating solving the 3 sub-problems mightnot guarantee to produce a global solution or even a stableone, unless an appropriate initialization is adopted. In partic-ular, in the case when the solutions to L ◦ τ and S in (11) areunique, the pair (L, τ) may not be unique. This is becausethat we can choose arbitrary frame as the reference frame andtransform the object flow in all the other frames of L ◦ τ totheir positions in the reference frame, while the low-rankL◦τkeeps the same. To avoid such trivial multiple solutions of τ ,we pre-define a template frame s and do not update its trans-formation τs during the tracking (w.l.o.g., we fix all the pa-rameters of τs to be zeros). Then the object flow in each otherframe of L ◦ τ are transformed to the position of the objectflow in frame s via the inverse transform of τ , and thus theuniqueness of L and τ can be guaranteed. In this paper, wechoose the template frame s as the one with the largest cardi-nality, which implies that almost all the objects are includedin the frame,

s = arg maxi

card (Xi) . (15)

We then set the rows of the low-rank pattern L as the dupli-cates ofXs, and initialize both the entries of the sparse outlierS as well as the parameters of τ to be zeros,

L = [Xs; · · · ;Xs] , S = 0, τ =−→0 . (16)

We now start to solve the three subproblems in (14).

Update of τThe first subproblem aims at solving the following series ofnonlinear equations of τj ,

Lt−1j ◦ τj = Xj − St−1j , j = 1, · · · , n. (17)

Albeit directly solving the above equation is difficult due toits strong nonlinearity, we can approximate the geometrictransformation Lt−1j ◦ τj by using piece-wise linear transfor-mations, where each piece corresponds to a small change ofτj defined by ∆τj . Thus the solution of (17) can be approx-imated by accumulating a series of ∆τj . This can be viewedas an inner loop included in the update of τ . Thus we havelinear approximation

Lt−1j ◦ (τj + ∆τj) ≈ Lt−1j ◦ τj + ∆τjJj , (18)

1949

where Jj is the Jacobian of Lt−1j ◦τj with respect to the trans-formation parameters in τj . Therefore, by substituting (18)into (17), ∆τj in each linear piece can be solved as

∆τj =(Xj − St−1j − Lt−1j ◦ τj

)(Jj)

†. (19)

The update of τj starts from some initial τj , and iterativelysolves the overdetermined linear equation (19) with updateτj := τj + ∆τj until the difference between the left handside and the right hand side of (17) is sufficiently small. Itis critical to emphasize that a well selected initial value ofτj can significantly save computational time. Based on thebetween-frame affinity, we initialize τj by the transformationof its adjacent frame that is closer to the template frame s,

τj :=

{τj+1, j < s;τj−1, j > s. (20)

Another important support set constraint, supp(L ◦ τ) ⊆supp(X), needs to be considered in calculating Lt−1j ◦ τjduring the update of τ . This constraint ensures that the objectflows or segmented motions obtained by SST always belongto the sparse part achieved from the background modeling,and thus rules out the noise in background. Hence, supposethe complement set of supp(Xj) to be suppc(Xj), each cal-culation of Lt−1j ◦ τj follows a screening such that,(

Lt−1j ◦ τj)suppc(Xj)

=−→0 . (21)

Update of LThe second subproblem has the following global solution thatcan be updated by BRP based low-rank approximation (13)and its power scheme modification,

Lt =r∑i=1

λiUiVTi , svd

((X − St−1

)◦ τ−1

)= UΛV T ,

(22)wherein τ−1 denotes the inverse transformation towards τ .The SVDs can be accelerated by BRP based low-rank approx-imation (4). Another acceleration trick is based on the factthat most columns of

(X − St−1

)◦ τ−1 are nearly all-zeros.

This is because the object flow or motion after transformationoccupies a very small area of the whole frame. Therefore,The update of Lt can be reduced to low-rank approximationof a submatrix of

(X − St−1

)◦ τ−1 that only includes dense

columns. Since the number of dense columns is far less thanp, the update of Lt can become much faster.

Update of SThe third subproblem has a global solution that can be ob-tained via soft-thresholding Pλ(·) similar to the update of Sin semi-soft GoDec,

St = Pλ(X − Lt ◦ τ t

). (23)

A support set constraint supp(S) ⊆ supp(X) should be con-sidered in the update of S as well. Hence the above updatefollows a postprocessing,

Stj,suppc(Xj)=−→0 , j = 1, · · · , n. (24)

Note the transformation computation ◦ in the update can beaccelerated by leveraging the sparsity of the motions. Specif-ically, the sparsity allows SST to only compute the trans-formed positions of the nonzero pixels. We summarize theSST algorithm in Algorithm 3.

Algorithm 3 SST AlgorithmInput: X , ri, λi(i = 1, · · · , n), kOutput: Li(i = 1, · · · , n), Sfor i = 1→ k do

Initialize: s = arg maxi

card (Xi),

L = [Xs; · · · ;Xs], S = 0, τ =−→0

while not converge dofor j = s− 1 : −1 : 1 doτj := τj+1.while not converge doLt−1j = Lt−1j ◦ τj , Lt−1j,suppc(Xj)

=−→0 .

τj := τj +(Xj − St−1j − Lt−1j

)(Jj)

†.end while

end forfor j = s+ 1 : 1 : n doτj := τj−1.while not converge doLt−1j = Lt−1j ◦ τj , Lt−1j,suppc(Xj)

=−→0 .

τj := τj +(Xj − St−1j − Lt−1j

)(Jj)

†.end while

end forτ t = τ .Lt = BRP

((X − St−1

)◦ τ−1

).

St = Pλ (X − Lt ◦ τ t) , Stj,suppc(Xj)=−→0 .

end whileX := St, L(i) := Lt, τ(i) = τ t.

end for

4 Experiments on Surveillance VideosThis section justifies both the effectiveness and the efficiencyof the SST framework, which includes semi-soft GoDec andSST algorithm as its two steps, via tracking object flows infour surveillance video sequences 1. We run all the exper-iments by MATLAB on a server with dual quad-core 3.33GHz Intel Xeon processors and 32 GB RAM. In the experi-ments, the type of geometric transformation τ is simply se-lected as translation. The detection, tracking and segmen-tation results as well as associated time costs are shown inFigure 1 and Figure 2.

The results show SST can successfully separate the mov-ing objects from the background via semi-soft GoDec, andpromisingly recover both the low-rank patterns and the as-sociated geometric transformations for motions of multipleobject flows. The detection, tracking and segmentation areseamlessly unified in a matrix factorization framework andachieved with high accuracy. Moreover, it also verifies thatSST performs significantly robust on complicated motionsin complex scenes. This is attributed to their distinguishingshifted low-rank patterns, because different object flows canhardly share a subspace after the same geometric transforma-tion. Since SST show stable and appealing performance inmotion detection, tracking and segmentation for either crowd

1http://perception.i2r.a-star.edu.sg/bk model/bk index.html

1950

X L L(1) L(2)

2.85s 46.32s 41.08s

Figure 1: Background modeling and object flow tracking re-sults of a 50-frame surveillance video sequence from Halldataset with resolution 144× 176.

or individual, it provides a more semantic and intelligent anal-ysis to the video content than existing methods.

Table 1: CPU seconds of GoDec and semi-soft GoDec (SS-GoDec) on 200-frame videos.

Pixels 25344 20480 19200 81920

GoDec 47.38 39.75 36.84 203.72SSGoDec 8.75 6.52 6.23 24.15

Besides the precise video content analysis, another advan-tage of SST is its fast speed. In Table 1, we compare semi-softGoDec with original GoDec on the 4 video sequences used in[Zhou and Tao, 2011] for background modeling task. Semi-soft GoDec can processes a 200 frame video with 144× 176resolution in 9 seconds, which is substantially faster than allthe published results of RPCA approaches. In Table 2, we listthe corresponding choices of cardinality k in GoDec and soft-threshold λ in semi-soft GoDec. Their resulting relative er-rors for reconstruction are both within [10−3, 10−2]. It showsthat for GoDec we should carefully select k, otherwise thenoise will leak into the sparse part or some meaningful sparseoutlier will be lost. But we can easily tune λ in semi-soft

X L L(1) L(2)

16.52s 74.14s 79.07s

Figure 2: Background modeling and object flow tracking re-sults of a 50-frame surveillance video sequence from Shop-pingmall dataset with resolution 256× 320.

GoDec (we fix it to 8 in all experiments). For flow tracking,we speed up SST algorithm by leveraging the motion spar-sity, between-frame affinity, and randomized approximation.Therefore, SST is very competitive in big data problems.

Table 2: Sparsity parameters of GoDec and semi-soft GoDec(SSGoDec) on 200-frame videos.

Pixels 25344 20480 19200 81920

GoDec (k) 310000 65000 540000 1800000SSGoDec (λ) 8 8 8 8

AcknowledgementsWe would like to thank all the anonymous reviewers for theirconstructive comments on improving this paper. This work issupported by Australian Research Council Discovery Projectwith number ARC DP-120103730.

References[Ali and Shah, 2007] S. Ali and M. Shah. A lagrangian par-

ticle dynamics approach for crowd flow segmentation and

1951

stability analysis. In IEEE Conference on Computer Visionand Pattern Recognition, 2007.

[Candes and Recht, 2008] Emmanuel J. Candes and Ben-jamin Recht. Exact matrix completion via convex op-timization. Foundations of Computational Mathematics,9:717–772, 2008.

[Candes et al., 2009] E. J. Candes, X. Li, Y. Ma, andJ. Wright. Robust principal component analysis? Jour-nal of the ACM, 2009.

[Chandrasekaran et al., 2009] V. Chandrasekaran, S. Sang-havi, S. A. Parrilo, and A. S. Willsky. Rank-sparsity incoherence for matrix decomposition. arXiv:0906.2220, 2009.

[Chen et al., 2010] J. Chen, J. Liu, and J. Ye. Learning inco-herent sparse and low-rank patterns from multiple tasks. InACM SIGKDD International Conference On KnowledgeDiscovery and Data Mining, 2010.

[Comaniciu et al., 2003] D. Comaniciu, V. Ramesh, andP. Meer. Kernel-based object tracking. IEEE Transac-tions on Pattern Analysis and Machine Intelligence, 25(5),2003.

[Doucet et al., 2001] A. Doucet, N. de Freitas, and N. Gor-don. Sequential Monte Carlo Methods in Practice.Springer-Verlag, New York, 2001.

[F. Nie, 2011] C. Ding D. Luo H. Wang F. Nie, H. Huang.Robust principal component analysis with non-greedy l1-norm maximization. In International Joint Conference onArtificial Intelligence, 2011.

[Fragkiadaki and Shi, 2011] K. Fragkiadaki and J. Shi. De-tection free tracking: Exploiting motion and topology forsegmenting and tracking under entanglement. In IEEEConference on Computer Vision and Pattern Recognition,pages 2073–2080, 2011.

[Galasso et al., 2012] F. Galasso, R. Cipolla, and B. Schiele.Video segmentation with superpixels. In Asian Conferenceon Computer Vision, 2012.

[Hess and Fern, 2009] R. Hess and A. Fern. Discrimina-tively trained particle filters for complex multi-objecttracking. In IEEE Conference on Computer Vision andPattern Recognition, 2009.

[Hong et al., 2012] Z. Hong, X. Mei, and D. Tao. Dual-forcemetric learning for robust distracter-resistant tracker. InEuropean Conference on Computer Vision, 2012.

[Hsu et al., 2011] D. Hsu, S. Kakade, and T. Zhang. Ro-bust matrix decomposition with sparse corruptions. IEEETransactions on Information Theory, 2011.

[Ji and Ye, 2009] Shuiwang Ji and Jieping Ye. An accel-erated gradient method for trace norm minimization. InThe 26th International Conference on Machine Learning(ICML), pages 457–464, 2009.

[Lewis and Malick, 2008] A. S. Lewis and J. Malick. Alter-nating projections on manifolds. Mathematics of Opera-tions Research, 33(1):216–234, 2008.

[Prince, 2011] Simon J.D. Prince. Computer vision: mod-els, learning and inference. Cambridge University Press,2011.

[Wu et al., 2011] S. Wu, O. Oreifej, and M. Shah. Actionrecognition in videos acquired by a moving camera usingmotion decomposition of lagrangian particle trajectories.In International Conference on Computer Vision, pages1419–1426, 2011.

[Xiong et al., 2010] Liang Xiong, Xi Chen, and Jeff Schnei-der. Direct robust matrix factorization for anomaly detec-tion. In The 11th International Conference on Data Mining(ICDM), 2010.

[Yu et al., 2008] Q. Yu, T. B. Dinh, and G. Medioni. Onlinetracking and reacquistion using co-trained generative anddiscriminative trackers. In European Conference on Com-puter Vision, pages 678–691, 2008.

[Zhou and Tao, 2011] T. Zhou and D. Tao. Godec: Ran-domized low-rank & sparse matrix decomposition in noisycase. In International Conference on Machine Learning,2011.

[Zhou and Tao, 2012] T. Zhou and D. Tao. Bilateral randomprojections. In International Symposium on InformationTheory, 2012.

[Zhou and Tao, 2013] T. Zhou and D. Tao. Greedy bilateralsketch, completion & smoothing. In International Confer-ence on Artificial Intelligence and Statistics, 2013.

[Zhou et al., 2010] Z. Zhou, X. Li, J. Wright, E. J. Candes,and Y. Ma. Stable principal component pursuit. In IEEEInternational Symposium on Information Theory, 2010.

1952