

Crowd Abnormality Detection and Localization - A Matrix Decomposition Approach

Vikas Gupta
Department of Electrical Engineering
Indian Institute of Science, Bangalore, India
[email protected]

Soma Biswas
Department of Electrical Engineering
Indian Institute of Science, Bangalore, India
[email protected]

ABSTRACT
Abnormality detection in crowded scenes plays a very important role in automatic monitoring of surveillance feeds. Here we present a novel framework for abnormality detection in crowd videos. The key idea of the approach is that rarely or sparsely occurring events correspond to abnormal activities while the commonly occurring events correspond to the normal activities. Given an input video, multiple feature matrices are computed which are decomposed into their low-rank and sparse components, out of which the sparse components correspond to the abnormal activities. The approach does not require any explicit modeling of crowd behavior or training. Localization of the anomalies is obtained as a by-product of the proposed approach by doing an inverse mapping between the entries of the matrix and the pixels in the video frames. The method is very general: it can be applied to both sparsely and densely crowded scenes, and it can be used to detect both global and local abnormalities. Experimental evaluation on two widely used datasets as well as some dense crowd videos downloaded from the web shows the effectiveness of the proposed approach. Comparison with several state-of-the-art crowd abnormality detection approaches shows that the proposed method compares well with them.

Keywords
crowd, abnormality, sparse component

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICVGIP ’14, December 14 - 18 2014, Bangalore, India
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-3061-9/14/12 ...$15.00.
http://dx.doi.org/10.1145/2683483.2683505.

1. INTRODUCTION
Anomaly detection in crowded scenes has become an important topic of research in the Computer Vision community over the last few years. It is of considerable practical interest in monitoring multiple surveillance feeds, allowing human security personnel to focus only on the abnormal ones, thus saving manpower. The varying density of objects in crowded scenes, inter-object occlusion, and the low resolution of surveillance videos, which leaves only a few pixels per object, make the task extremely challenging. Traditional methods based on object detection, tracking of individuals, etc., suffer due to the aforementioned problems. Further, anomalous events can be either local or global. In local abnormal events, local regions of the video behave differently from their neighbors. For example, in Figure 1 (left column) the local anomaly is a cart on a pedestrian walkway. Global abnormal events are those in which the whole frame is abnormal, for example in a panic situation as in Figure 1 (right column). Various approaches have been proposed over the last few years based on optical flow [1][10][19], trajectory modeling [2][11], spatial-temporal context [3][4][16][17], mixtures of dynamic textures [5], sparse representation [6], social force model [7][13], etc. In most of the above approaches, the general idea is to use training data to model the normal crowd behavior, and any deviation from normal crowd behavior is termed an anomaly.

Figure 1: Example of normal (top row) and abnormal activities (bottom row). Left column: the local anomaly is a cart on a pedestrian walkway. Right column: a global anomaly where the whole frame is anomalous, as in a panic situation.

In this work, we present a novel method for abnormality detection in crowded videos based on extracting the sparse component of the video. The approach is based on the intuition that rarely or sparsely occurring events correspond to abnormal activities while the commonly occurring events correspond to the normal activities. The entire video is divided into spatio-temporal blocks and optical flow features are computed from each block. The features for the entire video are arranged in matrix form, and each matrix is decomposed into its low-rank and sparse components, out of which the sparse component corresponds to the abnormal activity. The proposed approach does not involve any training or modeling of normal crowd behavior. Extensive experimental evaluation on two publicly available datasets, UMN [8] and UCSD [9], and on videos downloaded from the web shows the effectiveness of the proposed approach. On all the datasets, our approach performs better than or comparably to the state-of-the-art methods.

In the next section, we discuss the related work. Section 3 gives details of the proposed approach. Experimental evaluation is given in Section 4. The paper ends with directions for future research and conclusions.

2. RELATED WORK
In this section, we give pointers to the related work in the literature. The task of anomaly detection is to detect deviations from normal crowd behavior. Motion based methods typically find the motion information within spatio-temporal volumes and then model the crowd behavior. Adam et al. [1] compute histograms of optical flow over local patches to assign probabilities to new patches. Kim et al. [10] use a mixture of probabilistic principal component analyzers to learn patterns of local optical flow to arrive at the model and then use a Markov Random Field to perform inference. Kratz et al. [3] use spatio-temporal gradients over local space-time volumes to model crowd behavior and use an HMM for anomaly detection. Sun et al. [4] use a combination of spatio-temporal saliency and motion vectors over spatio-temporal volumes to arrive at a descriptor called attractive motion disorder, which measures the global intensity of anomalies in the scene and allows anomalies to be inferred directly without any training. Cong et al. [6] use a multi-scale histogram of optical flow as the feature descriptor and as a basis for sparse representation; given a new frame, its sparse reconstruction cost is computed to infer whether the frame is normal.

Trajectory based methods like Ali et al. [2] compare dense coherent crowd flow to fluid flow. They compute the flow maps using particle advection over the entire frame and model the dense crowd using Lagrangian coherent structures for segmentation and anomaly detection; however, this approach fails to model incoherent crowd motion. Wu et al. [11] extend this method using chaotic invariants to overcome this problem. Mehran et al. [7] use the idea of social force [12] to model crowd behavior by computing the interaction force between particles and use Latent Dirichlet Allocation for anomaly detection. Raghavendra et al. [13] extend the idea of social force and suggest a new method for particle advection using particle swarm optimization for detecting global anomalies. Recently, dynamic texture based methods have been used, in which a video is modeled as a collection of spatio-temporal patches which are taken as samples from a mixture of dynamic textures. Mahadevan et al. [5] use a mixture of dynamic textures to compute spatial and temporal anomaly maps which are combined to give the overall anomaly map used to detect anomalies. Wang et al. [15] use clustering to group pixels into activities and short video clips into different interactions, and propose different hierarchical Bayesian models such as LDA, HDP and dual-HDP for learning crowd behavior. Benezeth et al. [16] estimate a co-occurrence matrix and an associated Markov Random Field across spatio-temporal volumes from a training video sequence, and classify events as normal or abnormal using a likelihood ratio test. Boiman and Irani [17] extract 3D bricks to represent the spatio-temporal neighborhood as a feature and use dynamic programming to classify frames as normal or abnormal. Spatio-temporal shapes combined with a flow descriptor are also used as features for event detection [18], followed by parts-based matching for recognizing events. Andrade et al. [19] compute the optical flow fields and use principal component analysis to arrive at the features used for learning the HMMs. A video segment is classified as normal or abnormal using the likelihood of the current observations. Most of the approaches discussed above have been applied to sparse crowds, and whether they will perform well on both sparse and dense crowds is yet to be verified.

3. PROPOSED APPROACH
Here we describe in detail the proposed approach for abnormality detection and localization in video. The proposed approach is based on the idea that rarely or sparsely occurring events correspond to abnormal activities while the commonly occurring events correspond to the normal activities. In this work, we represent a video using a collection of feature matrices, and the sparse components of these matrices indicate the abnormal events. The overall workflow of our approach is given in Figure 3. Details of the feature computation and the decomposition of each matrix into low-rank and sparse components are given below.

3.1 Feature Computation
Optical Flow (OF) is the distribution of the apparent velocities of objects in an image, and the velocities of objects in a video can be measured by estimating the OF between the video frames. The OF can be computed by minimizing the following objective function [20]

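$$
E(\mathbf{u},\mathbf{v}) = \sum_{i,j} \Big\{ \rho_D\big(I_1(i,j) - I_2(i+u_{i,j},\, j+v_{i,j})\big) + \gamma \big[ \rho_S(u_{i,j}-u_{i+1,j}) + \rho_S(u_{i,j}-u_{i,j+1}) + \rho_S(v_{i,j}-v_{i+1,j}) + \rho_S(v_{i,j}-v_{i,j+1}) \big] \Big\} \tag{1}
$$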

Figure 2: Sample frames (left) and their optical flow maps (right).


Figure 3: Overview of our method: (a) input video frames, (b) optical flow maps, (c) computed feature matrices for the entire video, (d) extracted sparse components from the feature matrices using matrix decomposition, (e) final combined sparse matrix used for detecting and localizing anomalies, (f) detected local abnormal events shown using bounding boxes for a frame from the UMN dataset.

Here $\mathbf{u}$ and $\mathbf{v}$ are the horizontal and vertical components of the OF, which is estimated from the frames $I_1$ and $I_2$, $\gamma$ is a regularization parameter, and $\rho_D$ and $\rho_S$ are the data and spatial penalty functions. More details of the implementation can be found in [20]. Figure 2 shows two frames from two different videos (left) and the corresponding optical flow maps (right).
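To make the roles of the data and smoothness terms in (1) concrete, the following is a minimal sketch that evaluates the objective for a given flow field. It is not the estimator of [20]: the function name `flow_energy`, the quadratic default penalties, the nearest-pixel warp and the border clamping are all assumptions chosen for brevity. Images are stored as arrays indexed `[j, i]` (row, column), so the horizontal flow `u` displaces the column index.

```python
import numpy as np

def flow_energy(I1, I2, u, v, gamma=1.0, rho_d=np.square, rho_s=np.square):
    """Evaluate objective (1) for a candidate flow (u, v).

    Assumptions (not from the paper): quadratic penalties by default,
    nearest-pixel warping, and clamping of warped coordinates at the border.
    """
    rows, cols = I1.shape
    jj, ii = np.mgrid[0:rows, 0:cols]
    # Warped sample positions in I2: (i + u, j + v), rounded and clamped.
    i2 = np.clip(np.rint(ii + u).astype(int), 0, cols - 1)
    j2 = np.clip(np.rint(jj + v).astype(int), 0, rows - 1)
    data = rho_d(I1 - I2[j2, i2]).sum()
    # Differences between horizontal neighbours (i, i+1) and vertical
    # neighbours (j, j+1), for both flow components.
    smooth = (rho_s(u[:, :-1] - u[:, 1:]).sum() + rho_s(u[:-1, :] - u[1:, :]).sum()
              + rho_s(v[:, :-1] - v[:, 1:]).sum() + rho_s(v[:-1, :] - v[1:, :]).sum())
    return float(data + gamma * smooth)
```

For an image pair related by a one-pixel horizontal shift, the constant flow `u = 1, v = 0` has zero smoothness cost and a much smaller data cost than the zero flow, which is the behaviour the minimization in [20] exploits.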

In this work, we use two OF features computed from the video, namely OF magnitude and direction. The input video is divided into blocks of size $M \times N$. Temporal smoothing over a few frames is applied in order to smooth out spurious variations in the OF. For each feature (i.e. magnitude and direction), one feature matrix is computed. First, for each frame, the mean optical flow magnitude of each block is computed, and these values are stacked together to form one column of the feature matrix $F_1$. The same procedure is repeated for all the frames, thus populating the feature matrix $F_1$, where each column corresponds to one frame of the video. Each row of the feature matrix corresponds to one spatial location of the video across all the frames. For the feature matrix $F_2$ corresponding to the optical flow direction, we compute the dominant direction of each block using principal component analysis, which forms the corresponding entry of the feature matrix. Here also, frame $k$ of the video corresponds to the $k$-th column of the matrix. Each row shows how the dominant direction at that spatial location varies across the frames. Other features can also be seamlessly included in the proposed framework.
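The construction of the two feature matrices can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the names `block_features` and `dominant_direction` are hypothetical, incomplete border blocks are dropped, blocks are stacked in row-major order, temporal smoothing is omitted, and the "dominant direction" is taken to be the principal eigenvector of the uncentred second-moment matrix of the flow vectors, oriented along the mean flow. All of these are assumptions.

```python
import numpy as np

def dominant_direction(bu, bv):
    """Angle (radians) of the principal axis of the flow vectors in one block.

    Assumption: PCA is applied to the uncentred flow vectors and the axis is
    oriented along the mean flow. For an all-zero block the angle is arbitrary.
    """
    vecs = np.stack([bu.ravel(), bv.ravel()], axis=1)
    _, eigvecs = np.linalg.eigh(vecs.T @ vecs)  # eigenvalues in ascending order
    p = eigvecs[:, -1]
    if p @ vecs.mean(axis=0) < 0:
        p = -p
    return float(np.arctan2(p[1], p[0]))

def block_features(u, v, block=20):
    """Build F1 (mean magnitude) and F2 (dominant direction).

    u, v: arrays of shape (T, H, W) holding the flow for T frames.
    Returns two matrices of shape (num_blocks, T): one row per block,
    one column per frame.
    """
    T, H, W = u.shape
    nby, nbx = H // block, W // block  # partial border blocks are dropped
    F1 = np.zeros((nby * nbx, T))
    F2 = np.zeros((nby * nbx, T))
    mag = np.hypot(u, v)
    for t in range(T):
        for by in range(nby):
            for bx in range(nbx):
                ys = slice(by * block, (by + 1) * block)
                xs = slice(bx * block, (bx + 1) * block)
                r = by * nbx + bx  # row-major stacking order (assumption)
                F1[r, t] = mag[t, ys, xs].mean()
                F2[r, t] = dominant_direction(u[t, ys, xs], v[t, ys, xs])
    return F1, F2
```

With this layout, a block that suddenly moves differently from its usual pattern shows up as an isolated entry in one row of $F_1$ or $F_2$, which is what the sparse component is meant to capture.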

3.2 Anomaly Detection
Given the feature matrices $F_i$, $i = 1, 2$, our goal is to extract the sparse component, which will indicate the abnormal activities in the video. It has been shown [21][22] that if a matrix $F$ can be written as $F = A + S$, where $A$ is a low-rank matrix and the matrix $S$ is sufficiently sparse (relative to the rank of $A$), the low-rank matrix $A$ can be exactly recovered by solving the following optimization problem [22]:

$$
\min_{A,S} \; \|A\|_* + \lambda \|S\|_1, \quad \text{where } F = A + S \tag{2}
$$

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, which is the sum of its singular values, $\|\cdot\|_1$ denotes the sum of the absolute values of the matrix entries, and $\lambda$ is a positive weighting parameter. Different algorithms for solving the matrix recovery problem, like the Iterative Thresholding Approach, the Accelerated Proximal Gradient Approach, the Dual Approach, Augmented Lagrange Multipliers (ALM), etc., have been proposed [22]. The ALM method has some advantages over the other methods in terms of superior convergence, easier parameter tuning, etc. [22]. In this work, we use the inexact ALM method given in Figure 4. Please refer to [22] for more details of the algorithm.
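Since the algorithm of Figure 4 is not reproduced in the text, the following is a minimal sketch of an inexact ALM solver for (2), written in the form the method is commonly presented. The function name `rpca_inexact_alm` and the default settings ($\lambda = 1/\sqrt{\max(m, n)}$, initial penalty $1.25 / \|F\|_2$, growth factor 1.5, stopping tolerance) are assumptions, not values confirmed by this paper.

```python
import numpy as np

def rpca_inexact_alm(F, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split F into low-rank A and sparse S with F = A + S (sketch of inexact ALM)."""
    F = np.asarray(F, dtype=float)
    m, n = F.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default, an assumption here
    norm_F = np.linalg.norm(F, "fro")
    if norm_F == 0.0:
        return np.zeros_like(F), np.zeros_like(F)
    spec = np.linalg.norm(F, 2)
    Y = F / max(spec, np.abs(F).max() / lam)  # Lagrange multiplier initialisation
    mu = 1.25 / spec
    mu_max = mu * 1e7
    A = np.zeros_like(F)
    S = np.zeros_like(F)
    for _ in range(max_iter):
        # A-step: singular value thresholding with threshold 1/mu.
        U, sig, Vt = np.linalg.svd(F - S + Y / mu, full_matrices=False)
        A = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft thresholding with threshold lam/mu.
        R = F - A + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Multiplier and penalty updates.
        Z = F - A - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") <= tol * norm_F:
            break
    return A, S
```

Calling `A, S = rpca_inexact_alm(F1)` on a feature matrix returns the two components; increasing `lam` makes `S` sparser, which corresponds to the tuning of $\lambda$ discussed in the text.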

Matrix decomposition is applied to the two feature matrices $F_i$, $i = 1, 2$, separately, yielding the low-rank component $A_i$ and the sparse component $S_i$ of each matrix. The regularization parameter $\lambda$ can be varied to change the sparsity of the sparse component if required for modeling different scenes. Given the sparse components of the two feature matrices, we use a simple sum rule to obtain the combined sparse matrix $S = |S_1| + |S_2|$. More sophisticated ways of combining them may lead to improved performance.

We use the following rule to detect which frames are abnormal.

$$
\text{Frame} =
\begin{cases}
\text{Abnormal}, & \text{if } \exists \text{ a block such that } S(\text{block}, \text{frame}) > T, \\
\text{Normal}, & \text{otherwise,}
\end{cases} \tag{3}
$$

i.e., for any frame, if there exists any block for which the sparse component is above the threshold $T$, the frame is considered abnormal; otherwise, the frame is considered normal.
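The sum rule and the decision rule (3) amount to a few array operations. The sketch below assumes the sparse components are stored with one row per block and one column per frame; the function name `detect_abnormal_frames` is hypothetical.

```python
import numpy as np

def detect_abnormal_frames(S1, S2, T):
    """Combine the sparse components and apply rule (3).

    S1, S2: sparse components of shape (num_blocks, num_frames).
    Returns the combined matrix S = |S1| + |S2| and a boolean vector that is
    True for every frame containing at least one block with S > T.
    """
    S = np.abs(S1) + np.abs(S2)
    return S, (S > T).any(axis=0)
```

The threshold `T` is the quantity that is swept to trade off detections against false alarms.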

Figure 4: Inexact ALM Method for recovering the Sparse Component of a Matrix [22].

3.3 Abnormality Localization
An advantage of the proposed approach is that, in addition to indicating which frames are abnormal, it also localizes the abnormality. Each row of the feature matrix corresponds to a block, i.e. a spatial location in the input video, and each column corresponds to a frame. For the abnormal frames, we can find the abnormal blocks in a straightforward manner by inverting the mapping between matrix rows and pixel blocks. Hence the localization of anomalies can be done easily and efficiently.
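The inverse mapping from matrix rows back to pixel regions can be sketched as follows. The function name `localize` is hypothetical, and the sketch assumes that blocks were stacked in row-major (raster) order when the feature matrix was built; a different stacking order would need the matching inverse.

```python
import numpy as np

def localize(S, frame, T, frame_width, block=20):
    """Return pixel boxes (x0, y0, x1, y1) of the abnormal blocks in one frame.

    Assumes row r of S corresponds to block (by, bx) = divmod(r, blocks_per_row).
    """
    blocks_per_row = frame_width // block
    boxes = []
    for r in np.flatnonzero(S[:, frame] > T):
        by, bx = divmod(int(r), blocks_per_row)
        boxes.append((bx * block, by * block, (bx + 1) * block, (by + 1) * block))
    return boxes
```

Each returned box can be drawn directly on the frame, which is how block-level detections translate into the highlighted regions shown in the result figures.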

4. EXPERIMENTAL RESULTS
We evaluate the performance of the proposed approach on two publicly available datasets, the UMN dataset [8] and the UCSD dataset [9]. We also perform experiments on two videos downloaded from the web to evaluate our method on dense crowds. We show that the method performs very well for both sparsely and densely crowded scenes.

4.1 Experimental Details
Given the input videos, first the feature matrices are computed. For computing a feature matrix, the video is divided into spatio-temporal blocks of size 20 × 20 × 3 and the OF is computed for every block. We use the implementation given by Sun et al. [20] for computing the OF. For each block, the mean magnitude of the OF and the dominant OF direction are computed. For each frame of the video, the features of the blocks are stacked to form one column of the corresponding feature matrix. Thus each column of the feature matrix represents the features of one frame of the input video. The matrix is decomposed into its low-rank and sparse components using the inexact ALM method in Figure 4. The sparse components for the two features are combined using the simple sum rule, but better ways of combining them can be considered in future. For a 200-frame video of resolution 238 × 158, computing the optical flow takes 16 seconds per frame, while computing the feature matrices and decomposing them into their two components takes 6 seconds, using unoptimized MATLAB code on a laptop. The majority of the time is therefore spent on optical flow computation, so a faster way of computing the optical flow can lead to a significant decrease in the computational time of the proposed approach.
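As a rough worked check of the timing figures above, assuming about one optical flow computation per frame (an assumption; the exact number of frame pairs processed is not stated):

$$
t_{\text{OF}} \approx 200 \times 16\,\text{s} = 3200\,\text{s} \approx 53\,\text{min}, \qquad \frac{t_{\text{OF}}}{t_{\text{OF}} + 6\,\text{s}} = \frac{3200}{3206} \approx 0.998
$$

Under this assumption, optical flow accounts for well over 99% of the total run time, and the feature extraction and decomposition steps are negligible by comparison.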

The performance of the different approaches is measured using the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR), defined as

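$$
\text{TPR} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}} \tag{4}
$$

$$
\text{FPR} = \frac{\text{False positive}}{\text{False positive} + \text{True negative}} \tag{5}
$$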

where a true positive is an abnormal event correctly labeled as abnormal, a false negative is an abnormal event incorrectly labeled as normal, a false positive is a normal event incorrectly labeled as abnormal, and a true negative is a normal event correctly labeled as normal. The ROC curve for evaluating the proposed approach is generated by gradually varying the threshold $T$ for the sparse matrix in (3) and plotting the corresponding TPR and FPR values. We compute the area under the ROC curve (AUC) and the Equal Error Rate (EER) for the different approaches to compare their performance on the different datasets.
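The evaluation procedure can be sketched as follows. Because rule (3) flags a frame when any block exceeds $T$, each frame can be summarized by the maximum of its column of $S$, and sweeping $T$ over those scores traces the ROC curve. The function names, the trapezoidal AUC and the nearest-point EER approximation are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def roc_from_scores(scores, labels):
    """ROC points obtained by sweeping the threshold T in rule (3).

    scores: per-frame score, e.g. S.max(axis=0); labels: True for abnormal frames.
    Returns (fpr, tpr) as arrays ordered from the strictest to the loosest threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1], [-np.inf]))
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [], []
    for T in thresholds:
        pred = scores > T
        tpr.append((pred & labels).sum() / n_pos)   # equation (4)
        fpr.append((pred & ~labels).sum() / n_neg)  # equation (5)
    return np.array(fpr), np.array(tpr)

def auc_and_eer(fpr, tpr):
    """Trapezoidal AUC and an approximate EER (closest point to FPR = 1 - TPR)."""
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    k = int(np.argmin(np.abs(fpr - (1.0 - tpr))))
    eer = float((fpr[k] + 1.0 - tpr[k]) / 2.0)
    return auc, eer
```

A perfectly separating score gives an AUC of 1 and an EER of 0, while a score with no discriminative power gives an AUC near 0.5.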

4.2 UMN Dataset
The UMN dataset [8] from the University of Minnesota consists of 11 video sequences of three different crowded scenarios, including one indoor and two outdoor scenes. Each sequence begins with normal behavior followed by abnormal crowd panic. Figure 5 shows some sample abnormal frames from the three scenes.

Figure 6: ROC curves for the three different scenes of the UMN dataset for the proposed approach and different state-of-the-art approaches.

The whole frame contains abnormal activity, so we use this dataset for global abnormality detection. There are a total of 7740 frames (1450, 4415 and 2145 for scenes 1, 2 and 3 respectively) and the frames are of size 240 × 320.

We have compared the performance of the proposed approach with several state-of-the-art approaches for crowd abnormality detection. Specifically, we have compared with the Social Force Model (SFM) [7], Chaotic Invariants of Lagrangian Particle Trajectories (Chaotic Invariants) [11], the Mixture of Dynamic Textures Model (MDT) [5] and Sparse Reconstruction Cost (SRC) [6]. Figure 6 shows the ROC curves for the 3 different scenes for the proposed approach and all the compared approaches. Table 1 shows the AUC obtained for all the approaches. All the results for the other approaches have been taken from [6]. We see from the above results that the proposed approach performs favorably compared with the state-of-the-art approaches. Figure 5 shows the result of abnormality localization in abnormal frames (using red blocks) taken from the three different scenes. We see that, in addition to being able to correctly classify whole frames as normal or abnormal, the proposed approach is able to localize the abnormality quite well.

Figure 5: Three abnormal frames from the three scenes of the UMN dataset. The red blocks indicate the abnormal regions detected by the proposed approach.

Table 1: Result on UMN Dataset

Method                              Area Under ROC
Optical Flow [7]                    0.84
Social Force Model [7]              0.96
Mixture of Dynamic Textures [5]     0.9965
Chaotic Invariants [11]             0.99
SRC [6] Scene 1                     0.995
SRC [6] Scene 2                     0.975
SRC [6] Scene 3                     0.964
Ours Scene 1                        0.9997
Ours Scene 2                        0.971
Ours Scene 3                        0.9834

4.3 UCSD Dataset
We now evaluate the performance of the proposed approach on the UCSD dataset [9]. The UCSD dataset was acquired by a stationary camera mounted at an elevation, overlooking pedestrian walkways. The density of crowds varies from sparse to very crowded. It is subdivided into two sets: "Ped1" and "Ped2". The Ped1 dataset consists of groups of people walking towards and away from the camera. It has 34 training clips and 36 testing clips, and each clip has 200 frames of resolution 238 × 158. The Ped2 dataset is of a scene where most pedestrians move horizontally. It has 16 training clips and 12 testing clips with a varying number of frames ranging from 120 to 180. The frames are of size 360 × 240. All the clips are annotated with frame-level ground truth. The UCSD dataset has localized anomalies, and the common anomalies are bikers, skaters, small carts, etc. So, we use this dataset to evaluate the performance of the proposed approach for detecting local abnormalities.

Currently the proposed approach does not require any training, so we use only the testing clips. Figure 7 (left) and (right) show the ROC curves for Ped1 and Ped2 respectively for the proposed method compared with other approaches. In addition to the approaches compared with on the UMN dataset, here we also compare with the approach using multiple fixed-location monitors (MFLM) [1], mixture of optical flow models (MPPCA) [10] and the Attractive Motion Disorder Descriptor (AMD) [4]. We see that the proposed approach compares well with the other approaches. For Ped1 data, it is second to SRC in terms of EER, while for Ped2 data, it performs significantly better than all the other approaches for both the performance metrics considered. The EER and the AUC of all the approaches are given in Table 2. Results for all the other approaches have been taken from [5]. The proposed approach also performs quite well in localizing the abnormalities, as shown in Figure 8 for a few frames of the UCSD dataset. The detected abnormalities are shown using bounding boxes and correspond quite well with the actual abnormalities in the frames (shown in blue).

4.4 Dense Crowd Data
We also test the performance of our algorithm on two video clips of dense crowds downloaded from the web. The first video is of a marathon in which people are running straight and the second video is that of a U-turn; one frame from each is shown in Figure 9. In both the videos, we modified a few frames to introduce an anomaly. Since the people are running in both videos, one anomaly can be that some people suddenly stop running, and we model this scenario using the synthetic anomaly. The clips contain 144 and 252 frames in total for the two videos respectively. We take a 40 × 40 block in both the videos (shown using the blocks in Figure 9) and fix it for the last 50 frames of both videos, thus modeling a group of people who suddenly stop running, creating a local anomaly. Figure 10 shows the ROC for the two videos using the proposed approach. The AUC for the two videos is also shown in Figure 10. We see that the algorithm performs well in both the real videos.
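The synthetic anomaly described above can be reproduced along the following lines. The sketch interprets "fixing" the block as holding its pixel content constant from the first of the last 50 frames onward; that interpretation, the function name `freeze_block` and the block position arguments are assumptions.

```python
import numpy as np

def freeze_block(frames, top, left, size=40, last=50):
    """Hold a size x size block constant over the last `last` frames.

    frames: array of shape (T, H, W) or (T, H, W, C). Returns a modified copy;
    the block content of frame T - last is repeated in all later frames.
    """
    out = frames.copy()
    start = out.shape[0] - last
    patch = out[start, top:top + size, left:left + size].copy()
    out[start:, top:top + size, left:left + size] = patch  # broadcast over time
    return out
```

Inside the frozen block the optical flow drops to roughly zero while the surrounding crowd keeps moving, which is what makes the region stand out in the sparse component.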

Figure 7: ROC curves for UCSD Ped 1 (left) and Ped 2 (right) datasets respectively. The proposed approach is compared with several state-of-the-art approaches for crowd abnormality detection.

Table 2: Result on UCSD Dataset

Method                              EER Ped 1   EER Ped 2   EER Average   AUC Ped 1   AUC Ped 2
MFLM [1]                            38 %        42 %        40 %          64.90 %     63.05 %
Social Force Model [7]              31 %        42 %        37 %          76.65 %     63.30 %
MPPCA [10]                          40 %        30 %        35 %          66.70 %     76.90 %
Mixture of Dynamic Textures [5]     25 %        25 %        25 %          83.35 %     84.65 %
Sparse [6]                          19 %        NA          19 %          89.60 %     NA
AMD [4]                             27 %        NA          27 %          79.19 %     NA
Ours                                24 %        8 %         16 %          81.67 %     96.9 %

Figure 8: Detected anomalies shown using red bounding boxes for a few frames of the UCSD data. The ground truth abnormalities are shaded in blue (best viewed in color).

Figure 9: Two dense crowd videos downloaded from the web in which a synthetic anomaly is introduced (in the block marked), modeling a group of people who suddenly stop running.

5. FUTURE DIRECTIONS AND CONCLUSIONS
In this work, we have proposed an approach for abnormality detection and localization in crowded scenes. The method is general: it can be applied to sparsely as well as densely crowded scenarios, and it can be used to detect both local and global abnormal events. Experimental evaluation on three datasets and comparison with state-of-the-art approaches for abnormality detection demonstrate the effectiveness of the proposed approach. Except for the feature computation, the other steps of the proposed approach, namely computing the sparse component and localizing the abnormality, are quite fast.

The proposed approach can be extended in different ways. Currently, the approach uses just two OF features, namely magnitude and direction. As a result, the method fails to detect the abnormalities in videos like those in Figure 11, where the person walking on the grass is the abnormality. In future, we plan to incorporate more features like visual saliency into the proposed framework. Currently, we do not use any training data. In future, we plan to incorporate training data to learn the optimal value of the sparsity parameter $\lambda$. We also plan to explore accurate and fast methods of computing optical flow so that the proposed approach can be made faster and can be employed in real-time scenarios.

Figure 10: ROC curves for the dense crowd videos. The AUC values for the two ROC curves are also shown.

Figure 11: Sample frames from two video clips of UCSD data on which the proposed approach fails to detect the abnormality (person walking on grass).

6. REFERENCES
[1] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, “Robust real-time unusual event detection using multiple fixed-location monitors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555–560, 2008.

[2] S. Ali and M. Shah, “A lagrangian particle dynamics approach for crowd flow segmentation and stability analysis,” in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–6.

[3] L. Kratz and K. Nishino, “Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1446–1453.

[4] X. Sun, H. Yao, R. Ji, X. Liu, and P. Xu, “Unsupervised fast anomaly detection in crowds,” in ACM international conference on Multimedia, 2011, pp. 1469–1472.

[5] W. Li, V. Mahadevan, and N. Vasconcelos, “Anomaly detection and localization in crowded scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 1, pp. 18–32, 2014.

[6] Y. Cong, J. Yuan, and J. Liu, “Abnormal event detection in crowded scenes using sparse representation,” Pattern Recognition, vol. 46, no. 7, pp. 1851–1864, 2013.

[7] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 935–942.

[8] “Unusual crowd activity dataset of University of Minnesota,” available at http://mha.cs.umn.edu/Movies/Crowd-Activity-All.avi.

[9] “UCSD Dataset,” obtained from http://www.svcl.ucsd.edu/projects/anomaly/dataset.html.

[10] J. Kim and K. Grauman, “Observe locally, infer globally: a space-time mrf for detecting abnormal activities with incremental updates,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2921–2928.

[11] S. Wu, B. E. Moore, and M. Shah, “Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2054–2060.

[12] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical review E, vol. 51, no. 5, p. 4282, 1995.

[13] R. Raghavendra, A. Del Bue, M. Cristani, and V. Murino, “Optimizing interaction force for global anomaly detection in crowded scenes,” in IEEE Conference on Computer Vision Workshops, 2011, pp. 136–143.

[14] A. B. Chan and N. Vasconcelos, “Mixtures of dynamic textures,” in IEEE Conference on Computer Vision, vol. 1, 2005, pp. 641–647.

[15] X. Wang, X. Ma, and W. E. L. Grimson, “Unsupervised activity perception in crowded and complicated scenes using hierarchical bayesian models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 539–555, 2009.

[16] Y. Benezeth, P.-M. Jodoin, V. Saligrama, and C. Rosenberger, “Abnormal events detection based on spatio-temporal co-occurences,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2458–2465.

[17] O. Boiman and M. Irani, “Detecting irregularities in images and in video,” International Journal of Computer Vision, vol. 74, no. 1, pp. 17–31, 2007.

[18] Y. Ke, R. Sukthankar, and M. Hebert, “Event detection in crowded videos,” in IEEE International Conference on Computer Vision, 2007, pp. 1–8.

[19] E. L. Andrade, S. Blunsden, and R. B. Fisher, “Modelling crowd scenes for event detection,” in International Conference on Pattern Recognition, vol. 1, 2006, pp. 175–178.

[20] D. Sun, S. Roth, and M. J. Black, “Secrets of optical flow estimation and their principles,” in IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2432–2439.

[21] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM, vol. 58, no. 3, p. 11, 2011.

[22] Z. Lin, M. Chen, and Y. Ma, “The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices,” UIUC technical Report, UILU-ENG-09-2215, 2010.