
A BACKGROUND SUBTRACTION MODEL ADAPTED TO ILLUMINATION CHANGES

Julio Cezar Silveira Jacques Jr., Claudio Rosito Jung and Soraia Raupp Musse

Universidade do Vale do Rio dos Sinos - Graduate School of Applied Computing, Av. Unisinos, 950, São Leopoldo, RS, Brasil, 93022-000

ABSTRACT

This paper presents a new adaptive background model for grayscale video sequences that includes shadow and highlight detection. In the training period, statistics are computed for each image pixel to obtain the initial background model and an estimate of the global image noise, even in the presence of several moving objects. Each new frame is then compared to this background model, and spatio-temporal features are used to obtain foreground pixels. Local statistics are then used to detect shadows and highlights, and pixels that are detected as either shadow or highlight for a certain number of frames are adapted to become part of the background. Experimental results indicate that the proposed algorithm can effectively detect shadows and highlights, adapting the background with respect to illumination changes.

Index Terms: Tracking, background subtraction

1. INTRODUCTION

Object tracking is an active research subject in computer vision, with applications ranging from sports performance evaluation to video surveillance. When static cameras are used, a popular approach is background subtraction, which consists of obtaining a mathematical model of the static background and comparing it with every new frame of the video sequence. However, illumination changes (e.g. a shadow cast by a cloud) may cause the detection of numerous false foreground pixels, and the background model should be adapted accordingly.

Several different background subtraction techniques have been proposed in the past years, some of them suitable for grayscale images, and others for color images. This work proposes a new adaptive background subtraction algorithm for monochromatic video sequences that is suitable for outdoor applications subject to illumination changes.

2. RELATED WORK

A great variety of techniques for background subtraction and shadow detection have been proposed in the past years. In particular, most shadow detection algorithms rely on color cues, and just a few are suitable for monochromatic video sequences. Next, some techniques for grayscale video sequences are described.

E-mails: {julioj,crjung,soraiarm}@unisinos.br. This work was developed in collaboration with HP Brazil R&D and the Brazilian research agency CNPq.

Rosin and Ellis [1] used a temporal median filter to obtain the background model, and a hysteresis-based threshold to obtain foreground objects. They assume a roughly constant pixel ratio in shadowed regions, and apply a region growing algorithm to detect object shadows. The background model used in [1] is not adaptive, and may fail if several moving objects are present in the training stage.

In their W4 system, Haritaoglu and collaborators [2] used grayscale images to build a background model, representing each pixel by three values: its minimum intensity value, its maximum intensity value, and the maximum intensity difference between consecutive frames observed during the training period. W4 includes a background adaptation procedure, but it does not deal with shadows.

Elgammal et al. [3] used a nonparametric background model relying on kernel-based estimators, which can be applied to color or grayscale images. However, for shadow detection, they need color information (the normalized rgb color space).

Chien et al. [4] built a background image from the accumulated frame difference information, and computed a morphological gradient of foreground objects. Then, regions with small gradients were characterized as shadows. As stated by the authors, the shadow detection stage may face limitations when strongly textured background regions are present. Xu et al. [5] used the same background model as the one described in [4], and proposed to solve the problem of textured backgrounds by using Canny edge maps, shadow region detection by multi-frame integration, edge matching and conditional dilation. Despite the improvement for textured shadows, this technique faces limitations with relatively strong shadows in outdoor applications.

Wang et al. [6] proposed a probabilistic approach for background subtraction and shadow removal. In their approach, a combined intensity and edge measure is used for shadow detection, and temporal continuity is used to improve detection results. The results shown by the authors are good, but the determination of several parameters needed by their model increases the computational cost of the method.

Tian et al. [7] used an adaptive background model based on Gaussian mixtures, a local normalized cross-correlation metric to detect shadows, and a texture similarity metric to detect illumination changes. Although illumination changes seem to be effectively detected by this method, such information is not used to update the background model.

3. THE PROPOSED MODEL

3.1. Initial Background Model

Let us denote by $I_t(x,y)$ the intensity of pixel $(x,y)$ at frame $t$. The first stage of our method is to build an initial background model during a training period. Let $\{I_1(x,y), \ldots, I_N(x,y)\}$ be $N$ image frames used in the training period, and let $\lambda(x,y)$ and $\sigma(x,y)$ denote the median and standard deviation of each pixel $(x,y)$. Also, let us assume that noise introduced by the image sensor is independent for each pixel, but has the same statistical distribution.

In the training period, there may be moving objects that interfere with the computed median and standard deviation. In several applications, the number of pixels affected by object motion is much smaller than the number of stationary pixels, and the median of $\sigma(x,y)$ could be used as a good estimate of the standard deviation of stationary background pixels, as noticed in [1]. However, there are situations where the number of moving objects is considerably large (e.g. traffic monitoring of a crowded highway), and the median would provide a biased result. For example, let us consider the video sequence Highway II (available at http://cvrr.ucsd.edu/aton/shadow/data/highwayII-raw.avi). The standard deviation image $\sigma(x,y)$ was obtained with a training period of 100 frames, with several cars moving along the road. One frame of this video sequence is shown in Figure 1(a), and the histogram of $\sigma(x,y)$ is shown in Figure 1(b). This histogram shows a multimodal distribution, where the first peak is related to the standard deviation of background noise (stationary pixels), and the others are due to moving objects. Clearly, the median value (about 20) is far from the actual value of the average background noise (about 1.9).


Fig. 1. (a) One frame of the sequence Highway II. (b) The histogram of the standard deviation image $\sigma(x,y)$.

To obtain a better estimate of the background noise even in video sequences containing a large number of moving objects, we compute the histogram of $\sigma(x,y)$ in the training period, and retrieve the position $\sigma_m$ of its first peak. Then, a simple test to detect if a certain pixel $I_t(x,y)$ belongs to the foreground would be:

$$I_t(x,y) \in \text{foreground} \quad \text{if} \quad |I_t(x,y) - \lambda(x,y)| > k\,\sigma_m, \tag{1}$$

where $k$ is a parameter. In fact, a similar test was used in the W4 system [2], but their approach requires the minimum and maximum values in the training period, as well as the maximum interframe differences. Although such a test is relatively efficient, it does not consider two important facts when detecting foreground objects:

1. Spatial consistency: typically, foreground pixels do not appear isolated. In fact, they tend to appear in sets of connected points, representing valid foreground objects.

2. Temporal consistency: as foreground objects move continuously, they tend to occupy roughly the same portion of space in adjacent frames. Hence, if a pixel $(x,y)$ is a valid foreground pixel at frame $t$, it is likely to appear as a foreground pixel at frame $t+1$.
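Before these two constraints are incorporated, a minimal sketch of the training stage and of the pointwise test of Eq. (1) could look as follows; the function names, the bin count, and the first-peak heuristic are illustrative assumptions on our part, not details specified in the paper:

```python
import numpy as np

def train_background(frames, bins=1000):
    # Per-pixel median lambda(x, y) and standard deviation sigma(x, y)
    # over the N training frames (Section 3.1).
    stack = np.stack(frames).astype(np.float64)   # shape (N, H, W)
    median = np.median(stack, axis=0)
    sigma = np.std(stack, axis=0)

    # Histogram of sigma(x, y): its first peak sigma_m estimates the
    # noise level of stationary background pixels, even when later
    # peaks caused by moving objects would bias the median of sigma.
    hist, edges = np.histogram(sigma, bins=bins)
    peak = next((i for i in range(len(hist) - 1)
                 if hist[i] > 0 and hist[i] > hist[i + 1]),
                int(np.argmax(hist)))             # fallback: global maximum
    sigma_m = 0.5 * (edges[peak] + edges[peak + 1])
    return median, sigma_m

def foreground_naive(frame, median, sigma_m, k=6.0):
    # Pointwise test of Eq. (1), before any spatio-temporal constraint.
    return np.abs(frame.astype(np.float64) - median) > k * sigma_m
```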

To contemplate hypothesis 1, a weighted average of the values $I_t(x,y) - \lambda(x,y)$ is computed in a $3 \times 3$ neighborhood centered at pixel $(x,y)$. Such an average is computed through the following 2D convolution:

$$D_t(x,y) = \big[I_t(x,y) - \lambda(x,y)\big] * A, \qquad A = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}, \tag{2}$$

It should be noticed that $D_t(x,y)$ provides an average deviation of the values $I_t(x,y)$ with respect to the median values $\lambda(x,y)$ in a small neighborhood centered at $(x,y)$. To incorporate the temporal constraint stated in hypothesis 2, a simple temporal averaging procedure is applied:

$$ND_t(x,y) = \frac{1}{2}\big[D_t(x,y) + D_{t-1}(x,y)\big]. \tag{3}$$

Finally, the proposed test for a foreground pixel is:

$$I_t(x,y) \in \text{foreground} \quad \text{if} \quad |ND_t(x,y)| > k\,\sigma_m. \tag{4}$$

An experimental evaluation indicated that $k = 6$ provides a good compromise between misdetection and overdetection of foreground regions. Figure 2 illustrates a comparison of the proposed technique with the W4 background subtraction algorithm [2] for the Highway II sequence, using 100 frames in the training stage for both methods. Since there are many moving objects in the training period, the classification of stationary pixels in W4 does not work very well, and several objects (or parts of objects) are misclassified as background. The proposed algorithm shows better performance, but the temporal continuity may "enlarge" detected foreground objects along their direction of motion, as illustrated in the middle image of the bottom row.
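Assuming the kernel of Eq. (2) and SciPy's default border handling, the spatio-temporal test of Eqs. (2)-(4) can be sketched as below; the function name and the feedback of $D_t$ between frames are our assumptions:

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 kernel A of Eq. (2); its weights sum to one.
A = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=np.float64) / 16.0

def foreground_mask(frame, prev_D, median, sigma_m, k=6.0):
    # Eq. (2): weighted local average of the deviations from the median.
    D = convolve(frame.astype(np.float64) - median, A)
    # Eq. (3): temporal average with the previous frame's deviations.
    ND = 0.5 * (D + prev_D)
    # Eq. (4): threshold against the global noise estimate sigma_m.
    mask = np.abs(ND) > k * sigma_m
    return mask, D   # D is fed back as prev_D at frame t + 1
```

At the very first frame, prev_D can simply be initialized with $D_t$ itself, so that Eq. (3) reduces to $D_t$.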




Fig. 2. Top: some frames extracted from the Highway II video sequence. Middle: results using W4. Bottom: results using the proposed approach.

3.2. Shadow and Highlight Detection

The detection of cast shadows as foreground objects is very common in background subtraction algorithms, producing undesirable consequences. For example, shadows can connect different people walking separately in a group, generating a single object (typically called a blob) as the output of background subtraction. In such cases, it is more difficult to isolate and track each person in the group. Also, global illumination changes (e.g. a cloud covering the sunlight) may produce a large number of erroneous foreground pixels. In this work, we detect shadows and highlights by computing the local variance of pixel ratios, as described next.

In shadowed regions, it is expected that a certain fraction $\alpha$ of the incoming light is blocked [3]. Although several factors may influence the intensity of a pixel in shadow, we assume that the observed intensity of shadow pixels is directly proportional to the incident light; consequently, shadowed pixels are scaled (darker) versions of the corresponding pixels in the background model. An analogous reasoning can be applied to highlighted regions.

Similarly to the approach described in [8], $I_t(x,y)/\lambda(x,y)$ is computed in a neighborhood around each foreground pixel $(x,y)$, and the standard deviation of $I_t(x,y)/\lambda(x,y)$ within this neighborhood is obtained. Let $R(x,y)$ denote a $3 \times 3$ region centered at each foreground pixel $(x,y)$. Such a pixel is classified as a shadowed pixel if:

$$\operatorname{std}_R\!\left(\frac{I_t(x,y)}{\lambda(x,y)}\right) < L_{std} \quad \text{and} \quad L_{low} < \frac{I_t(x,y)}{\lambda(x,y)} < 1, \tag{5}$$

where $\operatorname{std}_R(I_t(x,y)/\lambda(x,y))$ is the standard deviation of the quantities $I_t(x,y)/\lambda(x,y)$ over the region $R$, and $L_{std}$, $L_{low}$ are thresholds. More precisely, $L_{std}$ controls the maximum deviation within the neighborhood being analyzed, and $L_{low}$ prevents the misclassification of dark objects with very low pixel intensities as shadowed pixels. A similar comparison is applied to detect highlight pixels:

$$\operatorname{std}_R\!\left(\frac{I_t(x,y)}{\lambda(x,y)}\right) < L_{std} \quad \text{and} \quad 1 < \frac{I_t(x,y)}{\lambda(x,y)} < L_{high}, \tag{6}$$

where $L_{high}$ prevents the misclassification of very bright objects as highlight pixels. $L_{std}$, $L_{low}$ and $L_{high}$ were determined experimentally, by analyzing several video sequences with shadows and varying illumination conditions, obtaining $L_{std} = 0.05$, $L_{low} = 0.5$ and $L_{high} = 1.3$.
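A sketch of the shadow/highlight tests of Eqs. (5) and (6), computing the local standard deviation of the pixel ratios from first and second moments over each $3 \times 3$ region (the function name and the division guard are our additions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def classify_shadows_highlights(frame, median, fg_mask,
                                L_std=0.05, L_low=0.5, L_high=1.3):
    # Ratio between current intensities and the background median;
    # the small floor guards against division by zero.
    ratio = frame.astype(np.float64) / np.maximum(median, 1e-6)

    # Standard deviation of the ratio over each 3x3 region R,
    # computed from the local first and second moments.
    m1 = uniform_filter(ratio, size=3)
    m2 = uniform_filter(ratio * ratio, size=3)
    local_std = np.sqrt(np.maximum(m2 - m1 * m1, 0.0))

    low_var = local_std < L_std
    shadow = fg_mask & low_var & (ratio > L_low) & (ratio < 1.0)      # Eq. (5)
    highlight = fg_mask & low_var & (ratio > 1.0) & (ratio < L_high)  # Eq. (6)
    return shadow, highlight
```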

3.3. Background Adaptation

A common approach in background subtraction algorithms is to detect foreground pixels that remain static for some time, and to incorporate them into the background (e.g. an object that is dropped in the scene). However, such an approach can be misleading in surveillance applications, since a still person would become part of the background (and hence would no longer be detected).

In this work, we only apply background adaptation to those pixels that are classified as shadow or highlight for a sufficiently long period of time. Let $C_t(x,y)$ be a binary mask that returns 1 if pixel $(x,y)$ is classified as shadow (or highlight) at frame $t$, and 0 otherwise. Also, let $M$ be the adaptation time (i.e. the number of frames used in the background adaptation procedure). For each frame $t$, we compute:

$$A_t(x,y) = \sum_{i=t-M+1}^{t} C_i(x,y), \tag{7}$$

which represents the number of times that pixel $(x,y)$ was detected as shadow/highlight in the previous $M$ frames. The background model is recomputed at pixel $(x,y)$ if:

$$A_t(x,y) > p\,M, \tag{8}$$

where $0 < p < 1$ represents the minimum fraction of the adaptation period required for background adaptation. In this work, we used $p = 0.8$ and $M = 50$ frames.
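The counter of Eq. (7) can be maintained incrementally with a sliding window of the last $M$ masks. The sketch below assumes that "recomputing" the background at a pixel amounts to copying the current intensity into the median image, which is one plausible reading of the update rule; the class and attribute names are ours:

```python
import numpy as np
from collections import deque

class BackgroundAdapter:
    def __init__(self, shape, M=50, p=0.8):
        self.M, self.p = M, p
        self.history = deque()                    # last M masks C_i(x, y)
        self.A = np.zeros(shape, dtype=np.int32)  # counter A_t(x, y)

    def update(self, shadow_or_highlight, frame, median):
        # Add the current shadow/highlight mask C_t and drop C_{t-M},
        # so that self.A holds the sliding-window sum of Eq. (7).
        C = shadow_or_highlight.astype(np.int32)
        self.history.append(C)
        self.A += C
        if len(self.history) > self.M:
            self.A -= self.history.popleft()

        # Eq. (8): adapt pixels flagged for more than a fraction p of M.
        adapt = self.A > self.p * self.M
        median[adapt] = frame[adapt]              # recompute background there
        return adapt
```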

4. EXPERIMENTAL RESULTS

This section presents some experimental results of our adaptive background subtraction algorithm with shadow/highlight suppression. The example shown in Figure 3 illustrates three frames of a video sequence containing shadows resulting from direct and indirect blocking of light. The second row shows detected foreground objects (black), as well as pixels detected as shadows (blue) or highlights (red). As can be observed, shadows were correctly identified in these frames, and just a few isolated pixels were misclassified as shadows or highlights. In this video sequence, illumination changes are local and transient (i.e. shadows due to moving objects), and the background is not updated.

Fig. 3. Top: Frames extracted from a video sequence. Bottom: Foreground objects (black), shadow pixels (blue) and highlight pixels (red).

Figure 4 illustrates a video sequence shot on a partially cloudy day. During some frames, sunlight incidence was more intense, resulting in brighter pixels (which are classified as foreground). As can be observed, when the illumination is increasing, several foreground pixels are correctly classified as highlights. After a few frames, such highlight pixels are incorporated into the background, and are not detected as foreground pixels in the subsequent frames. However, it can be observed that some dark background regions were not classified as highlights after the illumination increase, because the ratio $I_t(x,y)/\lambda(x,y)$ may exceed the threshold $L_{high} = 1.3$.


Fig. 4. Top: Frames extracted from a video sequence. Bottom: Detected foreground pixels and background adaptation.

5. CONCLUSIONS

This work presented a novel adaptive background subtraction method for grayscale video sequences. A simple temporal median filter is used to generate an initial background model, and a global noise estimate is obtained requiring just a few frames. Each new frame of the video sequence is compared to the background model, and spatio-temporal constraints are used to detect foreground objects. A test based on the standard deviation of pixel ratios is then applied to detect shadowed or highlighted pixels with respect to the background. Finally, pixels that are classified as shadow or highlight for a sufficiently large number of frames are adapted to become part of the background model.

Experimental results indicate that the proposed algorithm tends to produce foreground blobs with few holes, and is able to deal with local and global illumination changes. It is important to notice that, in most adaptive background models, foreground pixels that remain unchanged for some time eventually become part of the background (e.g. a box dropped on the floor). In this work, new objects inserted in the scene are not absorbed by the background; only pixels that suffered illumination changes are considered in the background adaptation procedure.

6. REFERENCES

[1] P. L. Rosin and T. Ellis, "Image difference threshold strategies and shadow detection," in Proceedings of the 6th British Machine Vision Conference, Birmingham, 1995, pp. 347-356.

[2] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809-830, August 2000.

[3] A. M. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proceedings of the IEEE, vol. 90, no. 7, pp. 1151-1163, 2002.

[4] S.-Y. Chien, S.-Y. Ma, and L.-G. Chen, "Efficient moving object segmentation algorithm using background registration technique," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 7, pp. 577-586, 2002.

[5] D. Xu, X. Li, Z. Liu, and Y. Yuan, "Cast shadow detection in video segmentation," Pattern Recognition Letters, vol. 26, no. 1, pp. 5-26, 2005.

[6] Y. Wang, T. Tan, K. F. Loe, and J. K. Wu, "A probabilistic approach for foreground and shadow segmentation in monocular image sequences," Pattern Recognition, vol. 38, no. 11, pp. 1937-1946, November 2005.

[7] Y. L. Tian, M. Lu, and A. Hampapur, "Robust and efficient foreground analysis for real-time video surveillance," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 1182-1187.

[8] J. C. S. Jacques Jr., C. R. Jung, and S. R. Musse, "Background subtraction and shadow detection in grayscale video sequences," in Proceedings of SIBGRAPI, Natal, Brazil, October 2005, pp. 189-196, IEEE Press.
