N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today:...

14
N. Benjamin Erichson · Steven L. Brunton · J. Nathan Kutz Compressed Dynamic Mode Decomposition for Background Modeling Abstract We introduce the method of compressed dy- namic mode decomposition (cDMD) for background modeling. The dynamic mode decomposition (DMD) is a regression technique that integrates two of the leading data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed sensing and matrix sketching, cDMD eases the computational workload of high resolution video pro- cessing. The key principal of cDMD is to obtain the de- composition on a (small) compressed matrix representa- tion of the video feed. Hence, the cDMD algorithm scales with the intrinsic rank of the matrix, rather then the size of the actual video (data) matrix. Selection of the opti- mal modes characterizing the background is formulated as a sparsity-constrained sparse coding problem. Our re- sults show, that the quality of the resulting background model is competitive, quantified by the F-measure, Re- call and Precision. A GPU (graphics processing unit) ac- celerated implementation is also presented which further boosts the computational efficiency of the algorithm. Keywords dynamic mode decomposition; background modeling; matrix sketching; sparse coding; GPU- accelerated computing. N. Benjamin Erichson School of Mathematics and Statistics University of St Andrews St Andrews, United Kingdom E-mail: [email protected] Steven L. Brunton Department of Mechanical Engineering University of Washington Seattle, WA 98195 J. Nathan Kutz Department of Applied Mathematics University of Washington Seattle, WA 98195-2420 1 Introduction One of the fundamental computer vision objectives is to detect moving objects in a given video stream. At the most basic level, moving objects can be found in a video by removing the background. However, this is a challeng- ing task in practice, since the true background is often unknown. Algorithms for background modeling are re- quired to be both robust and adaptive. Indeed, the list of challenges is significant and includes camera jitter, il- lumination changes, shadows and dynamic backgrounds. There is no single method currently available that is ca- pable of handling all the challenges in real-time with- out suffering performance failures. Moreover, one of the great challenges in this field is to efficiently process high- resolution video streams, a task that is at the edge of per- formance limits for state-of-the-art algorithms. Given the importance of background modeling, a variety of math- ematical methods and algorithms have been developed over the past decade. Comprehensive overviews of tra- ditional and state-of-the art methods are provided by Bouwmans [1] or Sobral and Vacavant [2]. Motivation. This work advocates the method of dy- namic mode decomposition (DMD), which enables the decomposition of spatio-temporal grid data in both space and time. The DMD has been successfully applied to videos [3, 4, 5], however the computational costs are dominated by the singular value decomposition (SVD). Even with the aid of recent innovations around random- ized algorithms for computing the SVD [6], the com- putational costs remain expensive for high resolution videos. Importantly, we build on the recently introduced compressed dynamic mode decomposition (cDMD) al- gorithm, which integrates DMD with ideas from com- pressed sensing and matrix sketching [7]. Hence, instead of computing the DMD on the full-resolution video data, we show that an accurate decomposition can be obtained from a compressed representation of the video in a frac- tion of the time. The optimal mode selection for back- ground modeling is formulated as a sparsity-constrained arXiv:1512.04205v2 [cs.CV] 25 Aug 2016

Transcript of N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today:...

Page 1: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

N. Benjamin Erichson · Steven L. Brunton · J. Nathan Kutz

Compressed Dynamic Mode Decomposition forBackground Modeling

Abstract We introduce the method of compressed dy-namic mode decomposition (cDMD) for backgroundmodeling. The dynamic mode decomposition (DMD) isa regression technique that integrates two of the leadingdata analysis methods in use today: Fourier transformsand singular value decomposition. Borrowing ideas fromcompressed sensing and matrix sketching, cDMD easesthe computational workload of high resolution video pro-cessing. The key principal of cDMD is to obtain the de-composition on a (small) compressed matrix representa-tion of the video feed. Hence, the cDMD algorithm scaleswith the intrinsic rank of the matrix, rather then the sizeof the actual video (data) matrix. Selection of the opti-mal modes characterizing the background is formulatedas a sparsity-constrained sparse coding problem. Our re-sults show, that the quality of the resulting backgroundmodel is competitive, quantified by the F-measure, Re-call and Precision. A GPU (graphics processing unit) ac-celerated implementation is also presented which furtherboosts the computational efficiency of the algorithm.

Keywords dynamic mode decomposition; backgroundmodeling; matrix sketching; sparse coding; GPU-accelerated computing.

N. Benjamin ErichsonSchool of Mathematics and StatisticsUniversity of St AndrewsSt Andrews, United KingdomE-mail: [email protected]

Steven L. BruntonDepartment of Mechanical EngineeringUniversity of WashingtonSeattle, WA 98195

J. Nathan KutzDepartment of Applied MathematicsUniversity of WashingtonSeattle, WA 98195-2420

1 Introduction

One of the fundamental computer vision objectives is todetect moving objects in a given video stream. At themost basic level, moving objects can be found in a videoby removing the background. However, this is a challeng-ing task in practice, since the true background is oftenunknown. Algorithms for background modeling are re-quired to be both robust and adaptive. Indeed, the listof challenges is significant and includes camera jitter, il-lumination changes, shadows and dynamic backgrounds.There is no single method currently available that is ca-pable of handling all the challenges in real-time with-out suffering performance failures. Moreover, one of thegreat challenges in this field is to efficiently process high-resolution video streams, a task that is at the edge of per-formance limits for state-of-the-art algorithms. Given theimportance of background modeling, a variety of math-ematical methods and algorithms have been developedover the past decade. Comprehensive overviews of tra-ditional and state-of-the art methods are provided byBouwmans [1] or Sobral and Vacavant [2].

Motivation. This work advocates the method of dy-namic mode decomposition (DMD), which enables thedecomposition of spatio-temporal grid data in both spaceand time. The DMD has been successfully applied tovideos [3, 4, 5], however the computational costs aredominated by the singular value decomposition (SVD).Even with the aid of recent innovations around random-ized algorithms for computing the SVD [6], the com-putational costs remain expensive for high resolutionvideos. Importantly, we build on the recently introducedcompressed dynamic mode decomposition (cDMD) al-gorithm, which integrates DMD with ideas from com-pressed sensing and matrix sketching [7]. Hence, insteadof computing the DMD on the full-resolution video data,we show that an accurate decomposition can be obtainedfrom a compressed representation of the video in a frac-tion of the time. The optimal mode selection for back-ground modeling is formulated as a sparsity-constrained

arX

iv:1

512.

0420

5v2

[cs

.CV

] 2

5 A

ug 2

016

Page 2: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

2

sparse coding problem, which can be efficiently approx-imated using the greedy orthogonal matching pursuitmethod. The performance gains in computation timeare significant, even competitive with Gaussian mixture-models. Moreover, the performance evaluation on real-videos shows that the detection accuracy is competitivecompared to leading robust principal component analy-sis (RPCA) algorithms.

Organization. The rest of this paper is organized as fol-lows. Section 2 presents a brief introduction to the dy-namic mode decomposition and its application to videoand background modeling. Section 3 presents the com-pressed DMD algorithm and different measurement ma-trices to construct the compressed video matrix. A GPUaccelerated implementation is also outlined. Finally a de-tailed evaluation of the algorithm is presented in section4. Concluding remarks and further research directionsare given in section 5. Appendix A gives an overview ofnotation.

2 DMD for Video Processing

2.1 The Dynamic Mode Decomposition

The dynamic mode decomposition is an equation-free,data-driven matrix decomposition that is capable of pro-viding accurate reconstructions of spatio-temporal co-herent structures arising in nonlinear dynamical systems,or short-time future estimates of such systems. DMD wasoriginally introduced in the fluid mechanics communityby Schmid [8] and Rowley et al. [9]. A surveillance videosequence offers an appropriate application for DMD be-cause the frames of the video are, by nature, equallyspaced in time, and the pixel data, collected in everysnapshot, can readily be vectorized. The dynamic modedecomposition is illustrated for videos in Figure 1. Forcomputational convenience the flattened grayscale videoframes (snapshots) of a given video stream are stored,ordered in time, as column vectors x1,x2, . . . ,xm of amatrix. Hence, we obtain a 2-dimensional Rn×m spatio-temporal grid, where n denotes the number of pixels perframe, m is the number of video frames taken, and thematrix elements xit correspond to a pixel intensity inspace and time. The video frames can be thought ofas snapshots of some underlying dynamics. Each videoframe (snapshot) xt+1 at time t + 1 is assumed to beconnected to the previous frame xt by a linear mapA : Rn → Rn. Mathematically, the linear map A is atime-independent operator which constructs the approx-imate linear evolution

xt+1 = Axt. (1)

The objective of dynamic mode decomposition is to findan estimate for the matrix A and its eigenvalue decom-position that characterize the system dynamics. At its

core, dynamic mode decomposition is a regression algo-rithm. First, the spatio-temporal grid is separated intotwo overlapping sets of data, called the left and rightsnapshot sequences

X=

x1 x2 · · · xm−1

, X′=

x2 x3 · · · xm

. (2)

Equation (1) is reformulated in matrix notation

X′ = AX. (3)

In order to find an estimate for the matrix A we face thefollowing least-squares problem

A = argminA

‖X′ −AX‖2F , (4)

where ‖ · ‖F denotes the Frobenius norm. This is a well-studied problem, and an estimate of the linear operatorA is given by

A = X′X†, (5)

where † denotes the Moore-Penrose pseudoinverse, whichproduces a regression that is optimal in a least-squaresense. The DMD modes Φ = W, containing the spatialinformation, are then obtained as eigenvectors of the ma-trix A

AW = WΛ, (6)

where columns of W are eigenvectors φj and Λ is a di-agonal matrix containing the corresponding eigenvaluesλj . In practice, when the dimension n is large, the matrix

A ∈ Rn×n may be intractable to estimate and to ana-lyze directly. DMD circumvents the computation of Aby considering a rank-reduced representation A ∈ Rk×k.This is achieved by using the similarity transform, i.e.,projecting A on the left singular vectors. Moreover, theDMD typically makes use of low-rank structure so thatthe total number of modes, k ≤ min(n,m), allows for di-mensionality reduction of the video stream. Hence, onlythe relatively small A ∈ Rk×k matrix needs to be es-timated and analyzed (see Section 3 for more details).The dynamic mode decomposition yields then the fol-lowing low-rank factorization of a given spatio-temporalgrid (video stream):

ΦBV =

φ11 φ1p · · · φ1k

......

. . ....

φi1 φip · · · φik...

.... . .

...φn1 φnp · · · φnk

b1

. . .bp

. . .bk

1 λ1 · · · λm−11...

.... . .

...1 λp · · · λm−1p...

.... . .

...1 λk · · · λm−1k

(7)

where the diagonal matrix B ∈ Ck×k has the amplitudesas entries and V ∈ Ck×m is the Vandermonde matrixdescribing the temporal evolution of the DMD modesΦ ∈ Cn×k.

Page 3: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

3

Fig. 1: Illustration of the dynamic mode decomposition for video applications. Given a video stream, the first stepinvolves reshaping the grayscale video frames into a 2-dimensional spatio-temporal grid. The DMD then creates adecomposition in space and time in which DMD modes contain spatial structure.

2.2 DMD for Foreground/Background Separation

The DMD method can attempt to reconstruct any givenframe, or even possibly future frames. The validity ofthe reconstruction thereby depends on how well the spe-cific video sequence meets the assumptions and criteriaof the DMD method. Specifically, a video frame xt attime points t ∈ 1, ...,m is approximately reconstructedas follows

xt =

k∑j=1

bjφjλt−1j . (8)

Notice that the DMD mode φj is a n× 1 vector contain-ing the spatial structure of the decomposition, while theeigenvalue λt−1j describes the temporal evolution. Thescalar bj is the amplitude of the corresponding DMDmode. At time t = 1, equation (8) reduces to x1 =∑k

j=1 bjφj . Since the amplitude is time-independent, bjcan be obtained by solving the following least-squareproblem using the video frame x1 as initial condition

b = argminb‖x1 −Φb‖2F . (9)

It becomes apparent that any portion of the first videoframe that does not change in time, or changes veryslowly in time, must have an associated continuous-timeeigenvalue

ωj =log(λj)

∆t(10)

that is located near the origin in complex space: |ωj | ≈0 or equivalent |λj | ≈ 1. This fact becomes the keyprinciple to separate foreground elements (approximatesparse) from background (approximate low-rank) infor-mation. Figure 2 shows the dominant continuous-time

eigenvalues for a video sequence. Subplot (a) shows threesample frames from this video sequence that includes acanoe. Here the foreground object (canoe) is not presentat the beginning and the end for the video sequence. Thedynamic mode decomposition factorizes this sequenceinto modes describing the different dynamics present.The analysis of the continuous-time eigenvalue ωj andthe amplitudes over time BV (the amplitudes multipliedby the Vandermonde matrix) can provide interesting in-sights, shown in subplot (b) and (c). First, the ampli-tude for the prominent zero mode (background) is con-stant over time, indicating that this mode is capturingthe dominant (static) content of the video sequence, i.e,the background. The next pair of modes correspond tothe canoe, a foreground object slowly moving over time.The amplitude reveals the presence of this object. Specif-ically, the amplitude reaches its maximum at about theframe index 150, when the canoe is in the center of thevideo frame. At the beginning and end of the video thecanoe is not present, indicated by the negative values ofthe amplitude. The subsequent modes describe other dy-namics in the video sequence e.g., the movements of thecanoeist and the waves. For instance, the modes describ-ing the waves have high frequency and small amplitudes(not shown here). Hence, a theoretical viewpoint we willbuild upon with the DMD methodology centers aroundthe recent idea of low-rank and sparse matrix decompo-sitions. Following this approach, background modelingcan be formulated as a matrix separation problem intolow-rank (background) and sparse (foreground) compo-nents. This viewpoint has been advocated, for instance,by Candes et al. [10] in the framework of robust principalcomponent analysis (RPCA). For a thorough discussionof such methods used for background modeling, we re-

Page 4: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

4

0 50 100 150 200 250 300

0

50

100

150

200

0 50 100 150 200 250 300

0

50

100

150

200

0 50 100 150 200 250 300

0

50

100

150

200

(a) Sample frames (t = 0, 150, 300) of video sequence.

Background

mode

Slow varying foreground objects

Other dynamics

real

ima

gin

ary

0.000-0.005 0.005

-0.1

0.1

0.0

(b) Dominant continuous-time eigenvalues ωj .

Other dynamics

Slow varying

foreground

objects

Time (frame index)

Background mode

am

pli

tud

es

0 50 100 150 200 250 300

-2000

2000

0

4000

(c) Amplitudes over time.

Fig. 2: Results of the dynamic mode decomposition for the ChangeDetection.net video sequence ‘canoe’. Subplot (a)shows three samples frames of the video sequence. Subplot (b) and (c) show the the continuous-time eigenvalues andthe temporal evolution of the amplitudes. The modes corresponding to the amplitudes with the highest variance arecapturing the dominant foreground object (canoe), while the zero mode is capturing the dominant structure of thebackground. Modes corresponding to high frequency amplitudes capturing other dynamics in the video sequence,e.g., waves, etc.

fer to Bouwmans et al. [11, 12]. The connection betweenDMD and RPCA was first established by Grosek andKutz [3]. Assume the set of background modes {ωp} sat-isfies |ωp| ≈ 0. The DMD expansion of equation (8) thenyields

XDMD = L + S

=∑p

bpφpλt−1p︸ ︷︷ ︸

Background Video

+∑j 6=p

bjφjλt−1j︸ ︷︷ ︸

Foreground Video

(11)

where t = [1, ...,m] is a 1×m time vector and XDMD ∈Cn×m.1 Specifically, DMD provides a matrix decompo-sition of the form XDMD = L + S, where the low-rankmatrix L will render the video of just the background,and the sparse matrix S will render the complementaryvideo of the moving foreground objects. We can interpretthese DMD results as follows: stationary background ob-jects translate into highly correlated pixel regions from

1 Note that by construction XDMD is complex, while pixelintensities of the original video stream are real-valued. Hence,only the the real part is considered in the following.

one frame to the next, which suggests a low-rank struc-ture within the video data. Thus the DMD algorithmcan be thought of as an RPCA method. The advantageof the DMD method and its sparse/low-rank separationis the computational efficiency of achieving (11), espe-cially when compared to the optimization methods ofRPCA. The analysis of the time evolving amplitudes pro-vide interesting opportunities. Specifically, learning theamplitudes’ profiles for different foreground objects al-lows automatic separation of video feeds into differentcomponents. For instance, it could be of interest to dis-criminate between cars and pedestrians in a given videosequence.

2.3 DMD for Real-Time Background Modeling

When dealing with high-resolution videos, the standardDMD approach is expensive in terms of computationaltime and memory, because the whole video sequence isreconstructed. Instead a ‘good’ static background modelis often sufficient for background subtraction. This is be-

Page 5: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

5

cause background dynamics can be filtered out or thresh-olded. The challenge remains to automatically select themodes best describing the background. This is essentiallya bias-variance trade-off. Using just the zero mode (back-ground) leads to an under-fit background model, whilea large set of modes tend to overfit. Motivated, by thesparsity-promoting variant of the standard DMD algo-rithm introduced by Jovanovic et al. [13], we formulatea sparsity-constrained sparse coding problem for modeselection. The idea is to augment equation (9) by anadditional term that penalizes the number of non-zeroelements in the vector b

β = argminβ‖x1 −Φβ‖2F such that ‖β‖0 < K, (12)

where β is the sparse representation of b, and ‖ · ‖0is the `0 pseudo norm which counts the non-zero ele-ments in β. Solving this sparsity problem exactly is NP-hard. However, the problem in Eq. 12 can be efficientlysolved using greedy approximation methods. Specifically,we utilize orthogonal matching pursuit (OMP) [14, 15].A highly computationally efficient algorithm is proposedby Rubinstein et al. [16] as implemented in the scikit-learn software package [17]. The greedy OMP algorithmworks iteratively, selecting at each step the mode withthe highest correlation to the current residual. Once amode is selected the initial condition x1 is orthogonallyprojected on the span of the previously selected set ofmodes. Then the residual is recomputed and the pro-cess is repeated until K non-zero entries are obtained.If no priors are available, the optimal number of modesK can be determined using cross-validation. Finally, thebackground model is computed as

xBG = Φβ. (13)

3 Compressed DMD (cDMD)

Compressed DMD provides a computationally efficientframework to compute the dynamic mode decomposi-tion on massively under-sampled or compressed data [7].The method was originally devised to reconstruct high-dimensional, full-resolution DMD modes from sparse,spatially under-resolved measurements by leveragingcompressed sensing. However, it was quickly realizedthat if full-state measurements are available, many ofthe computationally expensive steps in DMD may becomputed on a compressed representation of the data,providing dramatic computational savings. The first ap-proach, where DMD is computed on sparse measure-ments without access to full data, is referred to as com-pressed sensing DMD. The second approach, where DMDis accelerated using a combination of calculations oncompressed data and full data, is referred to as com-pressed DMD (cDMD); this is depicted schematically inFig. 3. For the applications explored in this work, we

use compressed DMD, since full image data is availableand reducing algorithm run-time is critical for real-timeperformance.

X,X′ Φ,Λ

Y,Y′ ΦY,ΛY

DMD

cDMD

C Eq. (24)

Data Dynamic Modes

Fu

llC

om

pre

ssed

Fig. 3: Schematic of the compressed dynamic mode de-composition architecture. The data (video stream) is firstcompressed via left multiplication by a measurement ma-trix C. DMD is then performed on the compressed rep-resentation of the data. Finally, the full DMD modes Φare reconstructed from the compressed modes ΦY by theexpression in Eq. (24).

3.1 Compressed Sensing and Matrix Sketching

Compression algorithms are at the core of modern video,image and audio processing software such as MPEG,JPEG and MP3. In our mathematical infrastructure ofcompressed DMD, we consider the theory of compressedsensing and matrix sketching.

Compressed sensing demonstrates that instead of mea-suring the high-dimensional signal, or pixel space rep-resentation of a single frame x, we can measure in-stead a low-dimensional subsample y and approxi-mate/reconstruct the full state space x with this sig-nificantly smaller measurement [18, 19, 20]. Specifically,compressed sensing assumes the data being measured iscompressible in some basis, which is certainly the casefor video. Thus the video can be represented in a smallnumber of elements of that basis, i.e. we only need tosolve for the few non-zero coefficients in the transformbasis. For instance, consider the measurements y ∈ Rp,with k < p� n:

y = Cx. (14)

If x is sparse in Ψ, then we may solve the underdeter-mined system of equations

y = CΨs (15)

Page 6: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

6

for s and then reconstruct x. Since there are infinitelymany solutions to this system of equations, we seek thesparsest solution s. However, it is well known from thecompressed sensing literature that solving for the spars-est solution formally involves an `0 optimization thatis NP-hard. The success of compressed sensing is thatit ultimately engineered a solution around this issue byshowing that one can instead, under certain conditionson the measurement matrix C, trade the infeasible `0optimization for a convex `1-minimization [18]:

s = argmins′

‖s′‖1, such that y = CΨs′. (16)

Thus the `1-norm acts as a proxy for sparsity promotingsolutions of s. To guarantee that the compressed sensingarchitecture will almost certainly work in a probabilis-tic sense, the measurement matrix C and sparse basisΨ must be incoherent, meaning that the rows of C areuncorrelated with the columns of Ψ. This is discussed inmore detail in [7]. Given that we are considering videoframes, it is easy to suggest the use of generic basis func-tions such as Fourier or wavelets in order to represent thesparse signal s. Indeed, wavelets are already the standardfor image compression architectures such as JPEG-2000.As for the Fourier transform basis, it is particularly at-tractive for many engineering purposes since single-pixelmeasurements are clearly incoherent given that it excitesbroadband frequency content.

Matrix sketching is another prominent framework in or-der to obtain a similar compressed representation ofa massive data matrix [21, 22]. The advantage of thisapproach are the less restrictive assumptions and thestraight forward generalization from vectors to matrices.Hence, Eq. 14 can be reformulated in matrix notation

Y = CX, (17)

where again C denotes a suitable measurement matrix.Matrix sketching comes with interesting error boundsand is applicable whenever the data matrix X has low-rank structure. For instance, it has been successfullydemonstrated that the singular values and right singu-lar vectors can be approximated from such a compressedmatrix representation [23].

3.2 Algorithm

The compressed DMD algorithm proceeds similarly tothe standard DMD algorithm [24] at nearly every stepuntil the computation of the DMD modes. The key dif-ference is that we first compute a compressed represen-tation of the video sequence, as illustrated in Figure 4.Hence the algorithm starts by generating the measure-ment matrix C ∈ Rp×n in order to compresses or sketchthe data matrices as in Eq. (2):

Y = CX, Y′ = CX′. (18)

Fig. 4: Video compression using a sparse measurementmatrix. The compressed matrix faithfully captures theessential spectral information of the video.

where p is denoting the number of samples or measure-ments. There is a fundamental assumption that the in-put data are low-rank. This is satisfied for video data,because each of the columns of X and X′ ∈ Rn×m−1 aresparse in some transform basis Ψ. Thus, for sufficientlymany incoherent measurements, the compressed matri-ces Y and Y′ ∈ Rp×m−1 have similar correlation struc-tures to their high-dimensional counterparts. Then, com-pressed DMD approximates the eigenvalues and eigen-vectors of the linear map AY, where the estimator isdefined as:

AY = Y′Y† (19a)

= Y′VYS−1Y UY∗, (19b)

where ∗ denotes the conjugate transpose. The pseudo-inverse Y† is computed using the SVD:

Y = UYSYVY∗, (20)

where the matrices U ∈ Rp×k, and V ∈ Rm−1×k are thetruncated left and right singular vectors. The diagonalmatrix S ∈ Rk×k has the corresponding singular valuesas entries. Here k is the target-rank of the truncatedSVD approximation to Y. Note that the subscript Y isincluded to explicitly denote computations involving thecompressed data Y. As in the standard DMD algorithm,we typically do not compute the large matrix AY, butinstead compute the low-dimensional model projectedonto the left singular vectors:

AY = UY∗AYUY (21a)

= UY∗Y′VYS−1Y . (21b)

Since this is a similarity transform, the eigenvectors andeigenvalues can be obtained from the eigendecompositionof AY

AYWY = WYΛY, (22)

where columns of WY are eigenvectors φj and ΛY is adiagonal matrix containing the corresponding eigenval-ues λj . The similarity transform implies that Λ ≈ ΛY .The compressed DMD modes are consequently given by

ΦY = Y′VYS−1Y WY. (23)

Page 7: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

7

Finally, the full DMD modes are recovered using

Φ = X′VYS−1Y WY. (24)

Note that the compressed DMD modes in Eq. (24) makeuse of the full data X′ as well as the linear transfor-mations obtained using the compressed data Y and Y′.The expensive SVD on X is bypassed, and it is insteadperformed on Y. Depending on the compression ratio,this may provide significant computational savings. Thecomputational steps are summarized in Algorithm 1 andfurther numerical details are presented in [7].

Remark 1 The computational performance heavily de-pends on the measurement matrix used to construct thecompressed matrix, as described in the next section. Fora practical implementation sparse or single pixel mea-surements (random row sampling) are favored. The lat-ter most memory efficient methods avoids the generationof a large number of random numbers and the expensivematrix-matrix multiplication in step 3.

Remark 2 One alternative to the predefined target-rankk is the recent hard-thresholding algorithm of Gavish andDonoho [25]. This method can can be combined with step4 to automatically determine the optimal target-rank.

Remark 3 As described in Section 2.3 step 9 can be re-placed by the orthogonal matching pursuit algorithm,in order to obtain a sparsity-constrained solution: b =omp(Φ,x1). Computing the OMP solution is in generalextremely fast, but if it comes to high resolution videostreams this step can become computationally expensive.However, instead of computing the amplitudes basedon the the full-state dynamic modes Φ the compressedDMD modes ΦY can be used. Hence, Eq. 12 can be re-formulated as

β = argminβ‖y1 −ΦYβ‖2F such that ‖β‖0 < K, (25)

where y1 is the first compressed video frame. Then step9 can be replaced by: beta = omp(ΦY,y1).

3.3 Measurement Matrices

A basic sensing matrix C can be constructed by draw-ing p × n independent random samples from a Gaus-sian, Uniform or a sub Gaussian, e.g., Bernoulli distri-bution. It can be shown that these measurement matriceshave optimal theoretical properties, however for practicallarge-scale applications they are often not feasible. Thisis because generating a large number of random numberscan be expensive and computing (18) using unstructureddense matrices has a time complexity of O(pnm). Froma computational perspective it is favorable to build astructured random sensing matrix which is memory ef-ficient, and enables the execution of fast matrix-matrixmultiplications. For instance, Woolfe et al. [26] showed

that the costs can be reduced to O(log(p)nm) using asubsampled random Fourier transform (SRFT) sensingmatrix

C = RFD, (26)

where R ∈ Cp×n draws p random rows (without replace-ment) from the identity matrix I ∈ Cn×n. F ∈ Cn×n

is the unnormalized discrete Fourier transform with thefollowing entries F(j, k) = exp(−2πi(j − 1)(k − 1)/m)and D ∈ Cn×n is a diagonal matrix with independentrandom diagonal elements uniformly distributed on thecomplex unit circle. While the SRFT sensing matrixhas nice theoretical properties, the improvement fromO(pnm) to O(log(p)nm) is not necessarily significant. Inpractice it is often sufficient to construct even simplersensing matrices. An interesting approach making thematrix-matrix multiplication (18) redundant is to usesingle-pixel measurements (random row-sampling)

C = R. (27)

In a practical implementation this allows constructionof the compressed matrix Y from choosing p randomrows without replacement from X. Hence, only p ran-dom numbers need to be generated and no memory isrequired for storing a sensing matrix C. A different ap-proach is the method of sparse random projections [27].The idea is to construct a sensing matrix C with identi-cal independent distributed entries as follows

cij =

1 with prob. 12s

0 with prob. 1− 1s ,

-1 with prob. 12s

(28)

where the parameter s controls the sparsity. WhileAchlioptas [27] has proposed the values s = 1, 2, Li et al.[28] showed that also very sparse (aggressive) samplingrates like s = n/log(n) achieve accurate results. Modernsparse matrix packages allow rapid execution of (18).

3.4 GPU Accelerated Implementation

While most current desktop computers allow multi-threading and also multiprocessing, using a graphics pro-cessing unit (GPU) enables massive parallel processing.The paradigm of parallel computing becomes more im-portant as larger amounts of data stagnate CPU clockspeeds. The architecture of a modern CPU and GPUis illustrated in Figure 5. The key difference betweenthese architectures is that the CPU consists of fewarithmetic logic units (ALU) and is highly optimizedfor low-latency access to cached data sets, while theGPU is optimized for data-parallel, throughput compu-tations. This is achieved by the large number of smallarithmetic logic units (ALU). Traditionally this archi-tecture was designed for the real-time creation of high-definition 2D/3D graphics. However, NVIDIA’s pro-gramming model for parallel computing CUDA opens up

Page 8: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

8

Algorithm 1 Compressed Dynamic Mode Decomposition. Given a matrix D ∈ Rn×m containing the flattenedvideo frames, this procedure computes the approximate dynamic mode decomposition, where Φ ∈ Cn×k are theDMD modes, b ∈ Ck are the amplitudes, and V ∈ Ck×m is the Vandermonde matrix describing the temporalevolution. The procedure can be controlled by the two parameters k and p, the target rank and the number ofsamples respectively. It is required that n ≥ m, integer k, p ≥ 1 and k � n and p ≥ k.

function [Φ,b,V] = cdmd(D, k, p)

(1) X,X′ = D Left/right snapshot sequence.

(2) C = rand(p,m) Draw p×m sensing matrix.

(3) Y,Y′ = C ∗D Compress input matrix.

(4) U,S,V = svd(Y, k) Truncated SVD.

(6) A = U∗ ∗Y′ ∗V ∗ S−1 Least squares fit.

(7) W,Λ = eig(A) Eigenvalue decomposition.

(8) Φ← X′ ∗V ∗ S−1 ∗W Compute full-state modes Φ.

(9) b = lstsq(Φ,x1) Compute amplitudes using x1 as intial condition.

(10) V = vander(diag(Λ)) Vandermonde matrix (optional).

ALU ALU

ALU ALU

Control

L2

DRAM

(a) CPU

L2

DRAM

(b) GPU

Fig. 5: Illustration of the CPU and GPU architecture.

Fig. 6: Illustration of the data parallelism in matrix-matrix multiplications.

the GPU as a general parallel computing device [29]. Us-ing high-performance linear algebra libraries, e.g. CULA[30], can help to accelerate comparable CPU implemen-tations substantially. Take for instance the matrix mul-tiplication of two n × n square matrices, illustrated inFigure 6. The computation involves the evaluation ofn2 dot products.2 The data parallelism therein is thateach dot-product can be computed independently. Withenough ALUs the computational time can be substan-tially accelerated. This parallelism applies readily to thegeneration of random numbers and many other linearalgebra routines.

Relatively few GPU accelerated background subtrac-tion methods have been proposed [31, 32, 33]. The au-thors achieve considerable speedups compared to thecorresponding CPU implementations. However, the pro-posed methods barely exceed 25 frames per second forhigh definition videos. This is mainly due to the factthat many statistical methods do not fully benefit fromthe GPU architecture. In contrast, linear algebra basedmethods can substantially benefit from parallel comput-ing. An analysis of Algorithm 1 reveals that generatingrandom numbers in line 2 and the dot products in lines3, 6, and 8 are particularly suitable for parallel process-ing. But also the computation of the deterministic SVD,the eigenvalue decomposition and the least-square solvercan benefit from the GPU architecture. Overall the GPUaccelerated DMD implementation is substantially fasterthan the MKL (Intel Math Kernel Library) acceleratedroutine. The disadvantage of current GPUs is the rather

2 Modern efficient matrix-matrix multiplications are basedon block matrix decomposition or other computational tricks,and do not actually compute n2 dot products. However theconcept of parallelism remains the same.

Page 9: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

9

limited bandwidth, i.e., the amount of data which canbe exchanged per unit of time, between CPU and GPUmemory. However, this overhead can be mitigated usingasynchronous memory operations.

4 Results

In this section we evaluate the computational perfor-mance and the suitability of compressed DMD for ob-ject detection. To evaluate the detection performance,a foreground mask X is computed by thresholding thedifference between the true frame and the reconstructedbackground. A standard method is to use the Euclideandistance, leading to the following binary classificationproblem

Xt(j) =

{1 if ‖xjt − xj‖ > τ,0 otherwise

(29)

where xjt denotes the j-th pixel of the t-th video frameand xj denotes the corresponding pixel of the modeledbackground. Pixels belonging to foreground objects areset to 1 and 0 otherwise. Access to the true foregroundmask allows the computation of several statistical mea-sures. For instance, common evaluation measures in thebackground subtraction literature are recall, precisionand the F-measure. While recall measures the ability tocorrectly detect pixels belonging to moving objects, pre-cision measures how many predicted foreground pixelsare actually correct, i.e., false alarm rate. The F-measurecombines both measures by their harmonic mean. Aworkstation (Intel Xeon CPU E5-2620 2.4GHz, 32GBDDR3 memory and NVIDIA GeForce GTX 970) wasused for all following computations.

4.1 Evaluation on Real Videos

We have evaluated the performance of compressed DMDfor object detection using the CD (ChangeDetection.net)and BMC (Background Models Challenge) benchmarkdataset [34, 35]. Figure 7 illustrates the 9 real videosof the latter dataset, posing many common challengesfaced in outdoor video surveillance scenarios. Mainly, thefollowing complex situations are encountered:

– Illumination changes: Gradual illuminationchanges caused by fog or sun.

– Low illumination: Bad light conditions, e.g., nightvideos.

– Bad weather: Introduced noise (small objects) byweather conditions, e.g., snow or rain.

– Dynamic backgrounds: Moving objects belongingto the background, e.g. waving trees or clouds.

– Sleeping foreground objects: Former foregroundobjects that becoming motionless and moving againat a later point in time.

(001)Boringparking

(002) Bigtrucks

(003)Wanderingstudents

(004)Rabbit inthe night

(005) SnowyChristmas

(006)Beware ofthe trains

(007) Trainin thetunnel

(008) Trafficduring

windy day

(009) Onerainy hour

Fig. 7: BMC dataset: Example frames of the 9 realvideos.

Evaluation settings. In order to obtain reproducible re-sults the following settings have been used. For a givenvideo sequence, the low-rank dynamic mode decomposi-tion is computed using a very sparse measurement ma-trix with a sparsity factor s = n/log(n) and p = 1000measurements. While, we use here a fixed number ofsamples, the choice can be guided by the formula p >k · log(n/k). The target-rank k is automatically deter-mined via the optimal hard-threshold for singular val-ues [25]. Once the dynamic mode decomposition is ob-tained, the optimal set of modes is selected using theorthogonal matching pursuit method. In general the useof K = 10 non-zero entries achieves good results. In-stead of using a predefined value for K, cross-validationcan be used to determine the optimal number of non-zero entries. Further, the dynamic mode decompositionas presented here is formulated as a batch algorithm, inwhich a given long video sequence is split into batches of200 consecutive frames. The decomposition is then com-puted for each batch independently.

The CD dataset. First, six CD video sequences are usedto contextualize the background modeling quality usingthe sparse-coding approach. This is compared to usingthe zero (static background) mode only. Figure 8 showsthe evaluation results of one batch by plotting the F-measure against the threshold for background classifica-tion. In fife out of the six examples the sparse-coding ap-proach (cDMD k=opt) dominates. In particular, signif-icant improvements are achieved for the dynamic back-

Page 10: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

10

20 30 40 50 60 70 80 90 100Threshold

0.50

0.55

0.60

0.65

0.70

0.75

0.80

0.85

0.90

F­M

easure

DMD (k=0) F=0.733cDMD (k=0) F=0.738cDMD (k=opt) F=0.768

(a) Highway

10 20 30 40 50 60Threshold

0.60

0.65

0.70

0.75

0.80

0.85

0.90

0.95

1.00

F­M

easure

DMD (k=0) F=0.860cDMD (k=0) F=0.890cDMD (k=opt) F=0.901

(b) Blizzard

0 50 100 150 200 250Threshold

0.00.10.20.30.40.50.60.70.80.9

F­M

easure

DMD (k=0) F=0.572cDMD (k=0) F=0.520cDMD (k=opt) F=0.623

(c) Canoe

0 50 100 150 200 250Threshold

0.0

0.1

0.2

0.3

0.4

0.5

0.6

F­M

easu

re

DMD (k=0) F=0.397cDMD (k=0) F=0.399cDMD (k=opt) F=0.457

(d) Fountain02

0 50 100 150 200Threshold

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7F­M

easu

reDMD (k=0) F=0.677cDMD (k=0) F=0.675cDMD (k=opt) F=0.554

(e) Park

0 50 100 150 200 250 300 350Threshold

0.2

0.4

0.6

0.8

1.0

1.2

F­M

easure

DMD (k=0) F=0.951cDMD (k=0) F=0.977cDMD (k=opt) F=0.988

(f) Library

Fig. 8: The F-measure for varying thresholds is indicating the dominant background modeling performance of thesparsity-promoting compressed DMD algorithm. In particular, the performance gain (over using the zero modeonly) is substantial for the dynamic background scenes ‘Canoe’ and ‘Fountain02’.

ground video sequences ‘Canoe’ and ‘Fountain02’. Onlyin case of the ‘Park’ video sequence the method tends toover-fit. Interestingly, the performance of the compressedalgorithm is slightly better then the exact DMD algo-rithm, overall. This is due to the implicit regularizationof randomized algorithms [36, 37].

The BMC dataset. In order to compare the cDMD al-gorithm with other RPCA algorithms the BMC datasethas been used. Table 1 shows the evaluation results com-puted with the BMC wizard for all 9 videos. An individ-ual threshold value has been selected for each video tocompute the foreground mask. For comparison the eval-uation results of 3 other RPCA methods are shown [12].Overall cDMD achieves an average F-value of about0.648. This is slightly better then the performance ofGoDec [38] and nearly as good as LSADM [39]. How-ever, it is lower then the F-measure achieved with theRSL method [40]. Figure 9 presents visual results for ex-ample frames across 5 videos. The last row shows thesmoothed (median filtered) foreground mask.

Discussion. The results reveal some of the strengths andlimitations of the compressed DMD algorithm. First, be-cause cDMD is presented here as a batch algorithm, de-tecting sleeping foreground objects as they occur in video001 is difficult. Another weakness is the limited capabil-ity of dealing with non-periodic dynamic backgrounds,e.g., big waving trees and moving clouds as occurringin the videos 001, 005, 008 and 009. On the other handgood results are achieved for the videos 002, 003, 004

Fig. 9: Visual evaluation results for 5 example framescorresponding to the BMC Videos: 002, 003, 006, 007and 009. The top row shows the original grayscale im-ages (moving objects are highlighted). The second rowshows the differencing between the reconstructed cDMDbackground and the original frame. Row three shows thethresholded and row four the in addition median filteredforeground mask.

and 007, showing that DMD can deal with large movingobjects and low illumination conditions. The integrationof compressed DMD into a video system can overcomesome of these initial issues. Hence, instead of discardingthe previous modeled background frames, a backgroundmaintenance framework can be used to incrementally up-date the model. In particular, this allows to deal betterwith sleeping foreground objects. Further, simple post-

Page 11: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

11

processing techniques (e.g. median filter or morphologytransformations) can substantially reduce the false pos-itive rate.

4.2 Computational Performance

Figure 12 shows the average frames per seconds (fps) raterequired to obtain the foreground mask for varying videoresolutions. The results illustrate the substantial com-putational advantage of the cDMD algorithm over thestandard DMD. The computational savings are mainlyachieved by avoiding the expensive computation of thesingular value decomposition. Specifically, the compres-sion step reduces the time complexity from O(knm) toO(kpm). The computation of the full modes Φ in Eq. 24remain the only computational expensive step of the al-gorithm. However, this step is embarrassingly paralleland the computational time can be further reduced us-ing a GPU accelerated implementation. The decompo-sition of a HD 1280 × 720 videos feed using the GPUaccelerated implementation achieves a speedup of about4 and 21 compared to the corresponding CPU cDMDand (exact) DMD implementations. The speedup of theGPU implementation can even further be increased usingsparse or single pixel (sPixel) measurement matrices.

Figure 10 investigates the performance of the differ-ent measurement matrices in more detail. Therefor, thefps rate and the F-measure is plotted for a varying num-ber of samples p. Gaussian measurements achieves thebest accuracy in terms of the F-measure, but the com-putational costs become increasingly expensive. Singlepixel measurements (sPixel) is the most computationallyefficient method. The primary advantages of single pixelmeasurements are the memory efficiency and the simpleimplementation. Sparse sensing matrices offer the besttrade-off between computational time and accuracy, butrequire access to sparse matrix packages.

It is important to stress that randomized sensing ma-trices cause random fluctuations influencing the back-ground model quality, illustrated in Figure 11. The boot-strap confidence intervals show that sparse measure-ments have lower dispersion than single pixel measure-ments. This is, because single pixel measurements dis-card more information than sparse and Gaussian sensingmatrices.

5 Conclusion and Outlook

We have introduced the compressed dynamic mode de-composition as a novel algorithm for video backgroundmodeling. Although many techniques have been devel-oped in the last decade and a half to accomplish thistask, significant challenges remain for the computer vi-sion community when fast processing of high-definitionvideo is required. Indeed, real-time HD video analysis

0 500 1000 1500

0

100

200

300

400

500

600

Frames per second (fps)

Exact DMD

Gaussian cDMD

Sparse cDMD

sPixel cDMD

0 500 1000 1500Number of samples, p

0.72

0.74

0.76

F-measure

Fig. 10: Algorithms runtime (excluding computation ofthe foreground mask) and accuracy for a varying numberof samples p. Here a 720× 480 video sequence with 200frames is used.

Fig. 11: Bootstrap 95%-confidence intervals of the F-measure computed using both sparse and single pixelmeasurements.

Page 12: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

12

Measure BMC real videos Average

001 002 003 004 005 006 007 008 009

RSLDe La Torre et al. [40]

Recall 0.800 0.689 0.840 0.872 0.861 0.823 0.658 0.589 0.690 -Precision 0.732 0.808 0.804 0.585 0.598 0.713 0.636 0.526 0.625 -F-Measure 0.765 0.744 0.821 0.700 0.706 0.764 0.647 0.556 0.656 0.707

LSADMGoldfarb et al. [39]

Recall 0.693 0.535 0.784 0.721 0.643 0.656 0.449 0.621 0.701 -Precision 0.511 0.724 0.802 0.729 0.475 0.655 0.693 0.633 0.809 -F-Measure 0.591 0.618 0.793 0.725 0.549 0.656 0.551 0.627 0.752 0.650

GoDecZhou and Tao [38]

Recall 0.684 0.552 0.761 0.709 0.621 0.670 0.465 0.598 0.700 -Precision 0.444 0.682 0.808 0.728 0.462 0.636 0.626 0.601 0.747 -F-Measure 0.544 0.611 0.784 0.718 0.533 0.653 0.536 0.600 0.723 0.632

cDMDRecall 0.552 0.697 0.778 0.693 0.611 0.700 0.720 0.515 0.566 -Precision 0.581 0.675 0.773 0.770 0.541 0.602 0.823 0.510 0.574 -F-Measure 0.566 0.686 0.776 0.730 0.574 0.647 0.768 0.512 0.570 0.648

Table 1: Evaluation results of nine real videos from the BMC dataset. For comparison, the results of three otherleading robust PCA algorihtms are presented, adapted from [12].

Fig. 12: CPU and GPU algorithms runtime (including the computation of the foreground mask) for varying videoresolutions (200 frames). The optimal target rank is automatically determined and p = 1000 samples are used.

remains one of the grand challenges of the field. OurcDMD method provides compelling evidence that it is aviable candidate for meeting this grand challenge, evenon standard CPU computing platforms. The frame rateper second is highly competitive compared to other stat-of-the-art algorithms, e.g. Gaussian mixture-based algo-rithms. Compared to current robust principal componentanalysis based algorithm the increase in speed is evenmore substantial. In particular, the GPU accelerated im-plementation substantially improves the computationaltime.

Despite the significant computational savings, thecDMD remains competitive with other leading algo-rithms in the quality of the decomposition itself. Ourresults show, that for both standard and challenging en-vironments, the cDMD’s object detection accuracy interms of the F-measure is competitive to leading RPCAbased algorithms [12]. Though, the algorithm cannotcompete, in terms of the F-measure, with highly special-ized algorithms, e.g. optimized Gaussian mixture-basedalgorithms for background modeling [2]. The main dif-ficulties arise when video feeds are heavily crowded or

dominated by non-periodic dynamic background objects.Overall, the trade-off between speed and accuracy ofcompressed DMD is compelling.

Future work will aim to improve the background sub-traction quality as well as to integrate a number of inno-vative techniques. One technique that is particularly use-ful for object tracking is the multi-resolution DMD [41].This algorithm has been shown to be a potential methodfor target tracking applications. Thus one can envisionthe integration of multi-resolution ideas with cDMD, i.e.a multi-resolution compressed DMD method, in orderto separate the foreground video into different dynamictargets when necessary.

Acknowledgements We would like to express our grati-tude to E. R. Davies, K. Manohar and the three anonymousreviewers for many helpful comments on an earlier version ofthis paper.

JNK acknowledges support from Air Force Officeof Scientific Research (FA9500-15-C-0039). SLB acknowl-edges support from the Department of Energy underaward DE-EE0006785. NBE acknowledges support from theUK Engineering and Physical Sciences Research Council(EP/L505079/1).

Page 13: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

13

A Notation

Scalarsk Number of modes (target-rank)p Number of samples (measurements)s Number of sparse samplesK Number of non-zero amplitudesn Number of pixels per video framem Number of video framesλ Eigenvalueω Continuous-time eigenvalue

Vectorsx ∈ Rn Flattened video framey ∈ Rp Compressed video frameφ ∈ Rn DMD modeb ∈ Rk Amplitudesβ ∈ Rk Sparsity-constrained amplitudes

MatricesX,X′ ∈ Rn×m−1 Left and right snapshot sequenceY,Y′ ∈ Rp×m−1 Compressed left/right snapshot sequenceC ∈ Rp×n Measurement matrixA ∈ Rn×n Linear mapA ∈ Rk×k Rank-reduced linear mapΦ ∈ Rn×k DMD modesΦY ∈ Rp×k Compressed DMD modesW,WY ∈ Rk×k Rank-reduced eigenvectorsΛ,ΛY ∈ Rk×k Rank-reduced eigenvalues (diagonal matrix)B ∈ Rk×k Amplitudes (diagonal matrix)V ∈ Rk×m Vandermonde matrixUY ∈ Rp×k Truncated compressed left singular vectorsVY ∈ Rk×m−1 Truncated compressed right singular vectorsSY ∈ Rk×k Truncated compressed singular values

References

1. T. Bouwmans, Traditional and recent approaches inbackground modeling for foreground detection: Anoverview, Computer Science Review 11-12 (2014)31–66. doi:10.1016/j.cosrev.2014.04.001.

2. A. Sobral, A. Vacavant, A comprehensive review ofbackground subtraction algorithms evaluated withsynthetic and real videos, Computer Vision and Im-age Understanding 122 (2014) 4–21. doi:10.1016/j.cviu.2013.12.005.

3. J. Grosek, J. N. Kutz, Dynamic mode decompositionfor real-time background/foreground separation invideo (2014). arXiv:1404.7592.

4. N. B. Erichson, C. Donovan, Randomized low-rankdynamic mode decomposition for motion detection,Computer Vision and Image Understanding 146(2016) 40–50. doi:10.1016/j.cviu.2016.02.005.

5. J. N. Kutz, X. Fu, S. L. Brunton, N. B. Erichson,Multi-resolution dynamic mode decomposition forforeground/background separation and object track-ing, in: 2015 IEEE International Conference on Com-puter Vision Workshop (ICCVW), 2015, pp. 921–929. doi:10.1109/ICCVW.2015.122.

6. N. Halko, P. G. Martinsson, J. A. Tropp, Find-ing structure with randomness: Probabilistic algo-rithms for constructing approximate matrix decom-positions, SIAM Review 53 (2) (2011) 217–288. doi:10.1137/090771806.

7. S. L. Brunton, J. L. Proctor, J. H. Tu, J. N. Kutz,Compressed sensing and dynamic mode decompo-sition, Journal of Computational Dynamics 2 (2)(2015) 165–191. doi:10.3934/jcd.2015002.

8. P. Schmid, Dynamic mode decomposition of nu-merical and experimental data, Journal of FluidMechanics 656 (2010) 5–28. doi:10.1017/S0022112010001217.

9. C. Rowley, I. Mezic, S. Bagheri, P. Schlatter, D. Hen-ningson, Spectral analysis of nonlinear flows, Journalof Fluid Mechanics 641 (2009) 115–127.

10. E. J. Candes, X. Li, Y. Ma, J. Wright, Robust princi-pal component analysis?, Journal of the ACM 58 (3)(2011) 1–37. doi:10.1145/1970392.1970395.

11. T. Bouwmans, E. H. Zahzah, Robust PCA via prin-cipal component pursuit: A review for a compara-tive evaluation in video surveillance, Computer Vi-sion and Image Understanding 122 (2014) 22–34.doi:10.1016/j.cviu.2013.11.009.

12. T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, E.-H.Zahzah, Decomposition into low-rank plus additivematrices for background/foreground separation: Areview for a comparative evaluation with a large-scale dataset (2015). arXiv:1511.01245.

13. M. R. Jovanovic, P. J. Schmid, J. W. Nichols,Sparsity-promoting dynamic mode decomposition,Physics of Fluids (1994-present) 26 (2) (2014)024103.

14. S. G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on signalprocessing 41 (12) (1993) 3397–3415.

15. J. A. Tropp, A. C. Gilbert, Signal recovery from ran-dom measurements via orthogonal matching pursuit,IEEE Transactions on information theory 53 (12)(2007) 4655–4666.

16. R. Rubinstein, M. Zibulevsky, M. Elad, Efficient im-plementation of the K-SVD algorithm using batchorthogonal matching pursuit, CS Technion 40 (8)(2008) 1–15.

17. F. Pedregosa, G. Varoquaux, A. Gramfort,V. Michel, B. Thirion, O. Grisel, M. Blondel,P. Prettenhofer, R. Weiss, V. Dubourg, J. Van-derplas, A. Passos, D. Cournapeau, M. Brucher,M. Perrot, E. Duchesnay, Scikit-learn: Machinelearning in Python, Journal of Machine LearningResearch 12 (2011) 2825–2830.

18. D. L. Donoho, Compressed sensing, IEEE Transac-tions on Information Theory 52 (4) (2006) 1289–1306. doi:10.1109/TIT.2006.871582.

19. E. J. Candes, M. B. Wakin, An introduction to com-pressive sampling, IEEE Signal Processing Maga-zine 25 (2) (2008) 21–30. doi:10.1109/MSP.2007.

Page 14: N. Benjamin Erichson Steven L. Brunton J. Nathan Kutz ... · data analysis methods in use today: Fourier transforms and singular value decomposition. Borrowing ideas from compressed

14

914731.20. R. G. Baraniuk, Compressive sensing, IEEE Signal

Processing Magazine 24 (4) (2007) 118–120.21. E. Liberty, Simple and deterministic matrix sketch-

ing, in: Proceedings of the 19th ACM SIGKDD In-ternational Conference on Knowledge Discovery andData Mining, ACM, 2013, pp. 581–588.

22. D. P. Woodruff, Sketching as a tool for numeri-cal linear algebra, Foundations and Trends in The-oretical Computer Science 10 (1-2) (2014) 1–157.doi:10.1561/0400000060.

23. A. C. Gilbert, J. Y. Park, M. B. Wakin, SketchedSVD: Recovering spectral features from compres-sive measurements, arXiv preprint arXiv:1211.0361(2012) 1–10.

24. J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L.Brunton, J. N. Kutz, On dynamic mode decom-position: Theory and applications (2013). arXiv:1312.0041.

25. M. Gavish, D. Donoho, The optimal hard thresh-old for singular values is 4/

√3, Information The-

ory, IEEE Transactions on 60 (8) (2014) 5040–5053.doi:10.1109/TIT.2014.2323359.

26. F. Woolfe, E. Liberty, V. Rokhlin, M. Tygert, A fastrandomized algorithm for the approximation of ma-trices, Applied and Computational Harmonic Anal-ysis 25 (3) (2008) 335–366.

27. D. Achlioptas, Database-friendly random projec-tions: Johnson-Lindenstrauss with binary coins,Journal of computer and System Sciences 66 (4)(2003) 671–687.

28. P. Li, T. J. Hastie, K. W. Church, Very sparse ran-dom projections, in: Proceedings of the 12th ACMSIGKDD international conference on Knowledge dis-covery and data mining, ACM, 2006, pp. 287–296.

29. J. Nickolls, I. Buck, M. Garland, K. Skadron, Scal-able parallel programming with CUDA, Queue 6 (2)(2008) 40–53. doi:10.1145/1365490.1365500.

30. J. R. Humphrey, D. K. Price, K. E. Spagnoli, A. L.Paolini, E. J. Kelmelis, CULA: Hybrid GPU accel-erated linear algebra routines (2010). doi:10.1117/12.850538.

31. P. Carr, GPU accelerated multimodal backgroundsubtraction, in: Digital Image Computing: Tech-niques and Applications, IEEE, 2008, pp. 279–286.

32. V. Pham, P. Vo, V. T. Hung, et al., GPU implemen-tation of extended gaussian mixture model for back-ground subtraction, in: IEEE International Confer-ence on Computing and Communication Technolo-gies, Research, Innovation, and Vision for the Fu-ture, 2010, pp. 1–4.

33. Q. Lixia, S. Bin, L. Weiyao, W. Wen, S. Ruimin,GPU-accelerated video background subtraction us-ing Gabor detector, Journal of Visual Communica-tion and Image Representation 32 (2015) 1–9. doi:10.1016/j.jvcir.2015.07.010.

34. Y. Wang, P.-M. Jodoin, F. Porikli, J. Konrad,Y. Benezeth, P. Ishwar, CDnet 2014: An expandedchange detection benchmark dataset, in: IEEEWorkshop on Computer Vision and Pattern Recog-nition, IEEE, 2014, pp. 393–400.

35. A. Vacavant, T. Chateau, A. Wilhelm, L. Lequievre,A benchmark dataset for outdoor fore-ground/background extraction, in: ComputerVision–ACCV 2012 Workshops, Springer, 2013, pp.291–300.

36. M. W. Mahoney, Randomized algorithms for ma-trices and data, Foundations and Trends in Ma-chine Learning 3 (2) (2011) 123–224. doi:10.1561/2200000035.

37. N. B. Erichson, S. Voronin, S. L. Brunton, J. N.Kutz, Randomized matrix decompositions using R(2016). arXiv:1608.02148.

38. T. Zhou, D. Tao, Godec: Randomized low-rank &sparse matrix decomposition in noisy case, in: In-ternational Conference on Machine Learning, ICML,2011, pp. 1–8.

39. D. Goldfarb, S. Ma, K. Scheinberg, Fast alternat-ing linearization methods for minimizing the sumof two convex functions, Mathematical Program-ming 141 (1-2) (2013) 349–382. doi:10.1007/s10107-012-0530-2.

40. F. D. la Torre, M. Black, A framework for robustsubspace learning, International Journal of Com-puter Vision 54 (1-3) (2003) 117–142.

41. J. N. Kutz, X. Fu, S. L. Brunton, Multiresolutiondynamic mode decomposition, SIAM Journal on Ap-plied Dynamical Systems 15 (2) (2016) 713–735.