
Transcript of [IEEE 2011 International Conference on Communications and Signal Processing (ICCSP) - Kerala, India...

Page 1: [IEEE 2011 International Conference on Communications and Signal Processing (ICCSP) - Kerala, India (2011.02.10-2011.02.12)] 2011 International Conference on Communications and Signal

Robust Multiple Target Tracking Under Occlusion Using Fragmented Mean Shift

and Kalman Filter

Gargi Phadke

Indian Institute of Technology Bombay, Mumbai, India 400076

Email: [email protected]

Abstract- Object tracking is critical to visual surveillance and activity analysis. The major issue in multiple visual target tracking is occlusion handling. In this paper, we investigate how to improve the robustness of visual tracking for multiple targets under occlusion. We propose a weighted fragment-based mean shift tracker combined with a Kalman filter, using the color features of the target. The discrete wavelet transform is used to detect the target automatically: the inter-frame difference of the LL subband detects the target, and fragments are obtained automatically from the mean and standard deviation of the detected target. The fragment weights are derived from the likelihood function of the foreground and background of each fragment using color histograms. The output of the weighted fragmented mean shift is updated with the help of a Kalman filter. The proposed tracking algorithm has been tested on several challenging videos of different situations and compared with the mean shift method using Bhattacharyya coefficients and Bhattacharyya distance. Extensive experiments confirm the robustness and reliability of the proposed method.

I. INTRODUCTION

Object detection and tracking are important in any vision-based surveillance system. Tracking is among the most demanding research areas in computer vision, with applications such as video surveillance, robot localization and driver assistance, and with challenges such as fast object motion, appearance change and clutter. Occlusion makes tracking an object in video a challenging task. Generally, tracking algorithms can be categorized into two major groups: mean shift trackers [1] and particle filtering based trackers. Although particle filter based tracking is robust, its high computational cost restricts its application in real-time scenarios. The mean shift tracker proposed by Comaniciu et al. [1] has the advantages of low complexity, robustness and invariance to object deformations.

Various approaches have been proposed to overcome the drawbacks of the mean shift method, which fails under fast motion, illumination changes and occlusion. Yilmaz et al. [16] gave a detailed literature survey of video tracking methods. Yang presented a new similarity measure between the target and the target candidate in place of the Bhattacharyya metric [2]. [3] and [4] introduced modifications of mean shift to improve video tracking; this approach accounts for partial occlusion and pose changes by representing the target in multiple fragments. Jeyakar et al. also used fragmentation [6], [7] and multiple features [13]. [12] used mean shift with initialization by particle filtering for multiple target tracking.

In the following section, we describe the method for automatic detection of the target. In Section III, the target appearance model is explained in detail. In Section IV, foreground separation is done using a likelihood ratio, followed by a detailed description of the fragmented weighted mean shift and the Kalman filter used to update the target center. The last sections deal with results and discussion; the conclusion and future work are given in Section VIII.

II. TARGET IDENTIFICATION

Initialization of the target is an important step in tracking. Most of the time it is done manually; to make the system automatic, the target should be detected automatically. Here we have used the wavelet-based method described in [9], with a proposed modification to reduce computational cost. The Discrete Wavelet Transform (DWT) is adopted to detect the moving target: most of the unwanted motion in the background is decomposed into the higher-frequency subbands. We decompose the image into sub-images using the two-dimensional DWT up to level 3. The low-frequency subband (approximate band, or LL band) is used for further processing to reduce computing cost. This also makes the approach less susceptible to noise, as the approximate band has less noise than the original image. The following subsection describes the detection method.

A. Inter Frame Difference

Let Diff(x,y) be the inter-frame difference between the approximate bands of two neighboring frames, where (x,y) is the position of the wavelet coefficient, as in equation 1. D(x,y) is calculated using equation 2, generating a binary image D. Diff(x,y) will have a value less than the threshold when the object is not moving.

Diff(x,y) = \left| LL^{3}_{n}(x,y) - LL^{3}_{n-1}(x,y) \right|   (1)

D(x,y) = \begin{cases} 1, & \text{if } Diff(x,y) > T \\ 0, & \text{otherwise} \end{cases}   (2)

where LL^{3}_{n}(x,y) is the wavelet coefficient of the LL band of frame n at position (x,y). The value of the threshold T is found empirically.

The value D(x,y) = 1 indicates motion; this helps in identifying the moving target automatically, as the target will

978-1-4244-9799-7/11/$26.00 ©2011 IEEE


Page 2:

Fig. 1. The 8 neighbors of pixel 'p'

be 'white' in the binary image D. But the value of a pixel near the center of the moving object is almost zero, giving 'black' pixels in the center of the target like a 'hole' in the binary image D. We fill this hole using a morphological closing operation. In the resulting binary image, the region belonging to the target is 'white', as shown in Figure 2(a).
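As a rough illustration, the LL-band differencing of equations 1 and 2 can be sketched as follows. This is a minimal sketch, not the paper's implementation: repeated 2x2 block averaging stands in for the Haar LL subband (the true DWT coefficients differ by a scale factor), and the function names and the default threshold are our own.

```python
import numpy as np

def ll_band(frame, levels=3):
    """Approximate the level-`levels` LL subband of a Haar DWT by
    repeated 2x2 block averaging (the approximation band, up to scale)."""
    band = frame.astype(float)
    for _ in range(levels):
        h, w = band.shape
        band = band[: h - h % 2, : w - w % 2]          # make dims even
        band = (band[0::2, 0::2] + band[0::2, 1::2] +
                band[1::2, 0::2] + band[1::2, 1::2]) / 4.0
    return band

def motion_mask(frame_prev, frame_curr, T=10.0, levels=3):
    """Equations (1)-(2): threshold the inter-frame difference of LL bands."""
    diff = np.abs(ll_band(frame_curr, levels) - ll_band(frame_prev, levels))
    return (diff > T).astype(np.uint8)                 # binary image D
```

For example, two 32x32 frames differing in one quadrant yield a 4x4 binary map D with motion marked only in that quadrant.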

B. Image Labeling

Scan the binary image D(x,y) pixel by pixel, from left to right and top to bottom. Let 'p' denote the pixel at any step of the scanning process. Out of the 8 neighboring pixels shown in Figure 1, we consider only four pixels at a time. This is useful for finding the extreme left and right corners of the target. The labeling procedure is as described in [19].

The extreme left corner is detected by considering pixel neighbors 1, 2, 3, 4 as shown in Figure 1; the values at all these points should be zero. Similarly, for the extreme right corner, we consider pixel neighbors 0, 7, 6, 5, whose values should also be zero. In this way we can find multiple targets, as given in equations 3 and 4. Using the inter-band spatial relationship of the discrete wavelet transform, the extreme points can be determined in the original image.

B^{i}_{min} = (x^{i}_{min}, y^{i}_{min})   (3)

B^{i}_{max} = (x^{i}_{max}, y^{i}_{max})   (4)

Here i indexes the moving objects; B^{i}_{min} is the left-top corner and B^{i}_{max} is the right-bottom corner coordinate of moving object i. Figure 2(b) shows the extracted targets with bounding boxes.
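The corner extraction of equations 3 and 4 amounts to finding, for each labeled white region, its minimum and maximum coordinates. A minimal sketch follows, using a generic 8-connected flood fill instead of the exact four-neighbor scan of [19]; the function name and BFS traversal are our own choices.

```python
import numpy as np
from collections import deque

def target_boxes(D):
    """Find B_min / B_max (eqs. 3-4) for each connected white region of
    the binary motion image D via an 8-connected flood fill."""
    labels = np.zeros_like(D, dtype=int)
    boxes, current = [], 0
    H, W = D.shape
    for y in range(H):
        for x in range(W):
            if D[y, x] and not labels[y, x]:
                current += 1
                labels[y, x] = current
                q = deque([(y, x)])
                ymin = ymax = y
                xmin = xmax = x
                while q:
                    cy, cx = q.popleft()
                    ymin, ymax = min(ymin, cy), max(ymax, cy)
                    xmin, xmax = min(xmin, cx), max(xmax, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < H and 0 <= nx < W
                                    and D[ny, nx] and not labels[ny, nx]):
                                labels[ny, nx] = current
                                q.append((ny, nx))
                # B_min = left-top corner, B_max = right-bottom corner
                boxes.append(((xmin, ymin), (xmax, ymax)))
    return boxes
```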

III. TARGET APPEARANCE MODELING

The previous section described the automatic detection of the target; in this section we describe the development of the target model. To improve the performance of the mean shift tracker, a fragmented model is developed. Here the fragmentation is done automatically.

Fig. 2. (a) Inter-frame difference image (b) detected target with bounding box (c) fragmented target (d) target with foreground and background windows

A. Fragmentation

Fragmentation is an important means of improving the robustness of trackers. The choice of fragments is not specially restricted, but a less distinctive fragment may not reflect the true motion of the target. To get proper fragments, we use a vertical projection method. The mean of each row of the extracted target region is calculated to form a vector v = (v_1, v_2, v_3, ..., v_n) [7], where n is the height of the target. Then, using the mean vector v, we find a new vector SV as the difference between the next and previous mean values, as in equation 5.

SV_i = \left| v_{i+1} - v_{i-1} \right|   (5)

Using equation 5 we get the vector SV = (SV_1, SV_2, ..., SV_n). Fragments are extracted using the threshold value in equation 6.

T = mean(SV) + a \cdot std(SV)   (6)

Here a is a constant (we have taken a = 2) and std is the standard deviation. Points exceeding this threshold segment the target vertically; the result is shown in Figure 2(c). The minimum number of fragments should be four to achieve robustness. Once the target is extracted and fragmented, the target model is developed using the color histogram feature as given in [1].
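The fragmentation steps above (row means, the SV gradient of equation 5, and the threshold of equation 6) can be sketched as follows; the function name and the return convention (a list of row ranges) are illustrative.

```python
import numpy as np

def fragment_rows(target, a=2.0):
    """Split a target region into horizontal fragments (eqs. 5-6):
    row means v, gradient SV_i = |v_{i+1} - v_{i-1}|, and threshold
    T = mean(SV) + a * std(SV). Rows where SV exceeds T become cuts."""
    v = target.mean(axis=1)                 # mean of each row
    sv = np.abs(v[2:] - v[:-2])             # SV_i for i = 1 .. n-2
    T = sv.mean() + a * sv.std()
    cuts = [0] + [i + 1 for i in np.flatnonzero(sv > T)] + [len(v)]
    return [(cuts[k], cuts[k + 1])
            for k in range(len(cuts) - 1) if cuts[k + 1] > cuts[k]]
```

Each returned (start, end) pair is one horizontal fragment of the target region.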

IV. WEIGHTED MEAN SHIFT TRACKER

The basic mean shift tracker considers only foreground features of the target and the target candidate [1]. In the proposed method, both the foreground and the background of each fragment are used for feature extraction.

A. Foreground Feature Extraction

The likelihood of a color being found in the foreground of the region of interest is calculated [6]. Two windows, i.e., two areas, are considered. One is the foreground window, which represents the target. The other is the background window, whose area is twice that of the foreground (target) window; that is, we also consider the area around the target, as shown in Figure 2(d). The joint

518

Page 3: [IEEE 2011 International Conference on Communications and Signal Processing (ICCSP) - Kerala, India (2011.02.10-2011.02.12)] 2011 International Conference on Communications and Signal

color histograms h_{ob} and h_{bg} in RGB space are calculated over the target and background windows. Here h_{ob} is the histogram of the target and h_{bg} is the histogram of the background. L(x_i), the likelihood that a particular pixel belongs to the foreground, is given as

L(x_i) = \log \frac{\max(h_{ob}[b(x_i)], \epsilon)}{\max(h_{bg}[b(x_i)], \epsilon)}   (7)

where b(x_i) is the color bin of the pixel at x_i, and \epsilon is included in the equation to avoid numerical instability. The likelihood is thresholded to decide whether the pixel belongs to the foreground, as given in equation 8.

T(x_i) = \begin{cases} 1, & \text{if } L(x_i) > Th \\ 0, & \text{otherwise} \end{cases}   (8)

This likelihood is integrated with the basic mean shift target model. The threshold Th used in our experiments is 0.8.
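Equations 7 and 8 reduce to a vectorized log-ratio of two histograms. A minimal sketch, assuming pixels have already been mapped to their histogram bin indices b(x_i); the function name and defaults are illustrative:

```python
import numpy as np

def foreground_mask(patch_bins, hist_fg, hist_bg, eps=1e-6, Th=0.8):
    """Eqs. (7)-(8): per-pixel log-likelihood ratio of foreground vs
    background histograms, thresholded to a binary mask. `patch_bins`
    holds the histogram bin index b(x_i) of every pixel."""
    L = np.log(np.maximum(hist_fg[patch_bins], eps) /
               np.maximum(hist_bg[patch_bins], eps))
    return (L > Th).astype(np.uint8)
```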

B. Likelihood Weight Calculation

Unlike the basic mean shift tracker, here we add a weight [6]. Let L_u be the likelihood calculated for the u-th histogram bin; it gives a measure of probability for that particular color. Since L_u may be positive or negative, we map it through the sigmoid function A_u given in equation 9.

A_u = \max\left( \frac{1}{1 + \exp[-(L_u - a)/b]},\; 0.1 \right)   (9)

Here a is based on the foreground region and b controls the slope of the mapping. We have taken (a, b) = (1, 1).
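Reading equation 9 as a clipped logistic mapping (the printed formula is partly illegible, so the exact form here is our reconstruction), the per-bin weight can be sketched as:

```python
import math

def bin_weight(L_u, a=1.0, b=1.0, floor=0.1):
    """Eq. (9) read as a clipped logistic map of the bin likelihood L_u:
    strongly foreground-like bins get a weight near 1, while
    background-like bins are floored at 0.1 (a reconstruction, since
    the printed equation is garbled)."""
    return max(1.0 / (1.0 + math.exp(-(L_u - a) / b)), floor)
```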

C. Mean Shift Vector

The target model is modified as in equation 10, using the weight computed in equation 9:

\hat{q}_u = C \sum_{i=1}^{m} k(\|x_i\|^2) \, A_u \, \delta[b(x_i) - u]   (10)

and the target candidate at location y, considering color, is computed as

\hat{p}_u(y) = C_h \sum_{i=1}^{m} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) A_u \, \delta[b(x_i) - u]   (11)

Unlike the basic mean shift tracker, we consider the fragments of each individual target, and mean shift tracking is applied separately to each fragment. Since the vectors \hat{p} and \hat{q} have the same length, the Bhattacharyya distance is still a valid metric. Hence we use the Bhattacharyya coefficient \rho as given in equation 12 and the Bhattacharyya distance as given in equation 13.

\rho = \sum_{u=1}^{m} \sqrt{\hat{p}_u \hat{q}_u}   (12)

d = \sqrt{1 - \rho}   (13)
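Equations 12 and 13 can be sketched for two normalized histograms; the function name is illustrative.

```python
import math
import numpy as np

def bhattacharyya(p, q):
    """Eqs. (12)-(13): coefficient rho = sum_u sqrt(p_u * q_u) and
    distance d = sqrt(1 - rho) between two normalized histograms."""
    rho = float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))
    return rho, math.sqrt(max(1.0 - rho, 0.0))
```

For identical histograms rho = 1 and d = 0; for disjoint histograms rho = 0 and d = 1.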

The calculation of the mean shift vector and the tracking are done as in [1]. The new center for each fragment of the target is given by equation 14:

y_1 = \frac{\sum_{i=1}^{n} x_i \, w_i \, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i \, g\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}   (14)

V. FRAGMENT BASED WEIGHTED MEAN SHIFT TRACKER

The previous section explained how to improve the basic mean shift tracker using the color features of the entire target. Fragmented mean shift trackers have been used [6], [7], [11] to handle partial occlusion, but they consider only the fragment with the maximum Bhattacharyya coefficient when updating the center. In contrast, we consider all fragment centers when finding the new center of the whole target, as given in equation 15:

center = \frac{\sum_{i=1}^{f} \rho_i (y_i - d_i)}{\sum_{i=1}^{f} \rho_i}   (15)

Here i is the fragment number, f is the total number of fragments, and d_i is the offset of fragment i from the center of the target; this offset always remains the same. \rho_i is the Bhattacharyya coefficient of fragment i. This new center is refined using Kalman filtering, as explained in the next section. The final center is then used to re-initialize the centers of all fragments for the next frame. With y_{j0} the updated center and d_j the offset of fragment j from the target center, the new fragment centers are given by equation 16:

y_j = y_{j0} + d_j   (16)
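The fusion of equation 15 and the re-seeding of equation 16 can be sketched as follows, assuming fixed per-fragment offsets d_i from the target center (the function name and array conventions are our own):

```python
import numpy as np

def fuse_fragment_centers(centers, offsets, rhos):
    """Eq. (15): Bhattacharyya-weighted mean of fragment estimates,
    where each fragment votes (y_i - d_i) for the whole-target center;
    eq. (16) then re-seeds every fragment at center + d_j."""
    centers = np.asarray(centers, float)   # per-fragment mean shift outputs
    offsets = np.asarray(offsets, float)   # fixed offsets d_i
    rhos = np.asarray(rhos, float)         # per-fragment Bhattacharyya coeffs
    center = (rhos[:, None] * (centers - offsets)).sum(axis=0) / rhos.sum()
    new_fragment_centers = center + offsets            # eq. (16)
    return center, new_fragment_centers
```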

VI. UPDATION USING KALMAN FILTER

We decided to use the Kalman filter because it uses a motion model, which adds robustness to the tracker. The Kalman filter can be described by two equations, the state equation and the measurement equation [15]:

x(k) = A(k-1) \, x(k-1) + B(k) \, w(k)   (17)

z(k) = C(k) \, x(k) + v(k)   (18)

x(k) is the state vector and z(k) is the measured value at time k. A(k-1) is the state transition matrix and B(k) is the control matrix; in this paper A is a constant-velocity transition matrix. C(k) is the measurement matrix, and w(k) and v(k) are the process and measurement noise, assumed Gaussian with zero mean. In this paper, the state vector is

x = [x, x', y, y']^T   (19)

x and y represent the x and y coordinates of the target's central position, respectively; x' and y' represent the corresponding velocity components. The measured value is z(k):

z(k) = [x_c, y_c]^T   (20)

Here the measured value for the Kalman filter is the updated center obtained from the weighted mean shift tracker, with x_c the x coordinate and y_c the y coordinate of the center. We update the final center position of the target using the Kalman filter to handle occlusion and the motion of the target.
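A minimal constant-velocity predict/update cycle implementing equations 17-20 can be sketched as follows. The noise magnitudes q and r are illustrative defaults, not values from the paper.

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of eqs. (17)-(20) for the
    constant-velocity state x = [x, x', y, y']^T, measuring the
    mean shift center z = [x_c, y_c]^T."""
    A = np.array([[1, dt, 0, 0],        # constant-velocity transition
                  [0, 1,  0, 0],
                  [0, 0,  1, dt],
                  [0, 0,  0, 1]], float)
    C = np.array([[1, 0, 0, 0],         # measure positions only
                  [0, 0, 1, 0]], float)
    Q = q * np.eye(4)                   # process noise cov (w(k))
    R = r * np.eye(2)                   # measurement noise cov (v(k))
    # predict
    x = A @ x
    P = A @ P @ A.T + Q
    # update with the tracker's measured center
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x = x + K @ (z - C @ x)
    P = (np.eye(4) - K @ C) @ P
    return x, P
```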


Page 4:


Fig. 3. Failure of the basic mean shift tracker for challenge 1: (a) Frame 379 (b) Frame 419 (c) Frame 435 (d) Frame 455

Fig. 4. Output of Proposed Method for challenge 1: (a) Frame 379 (b) Frame 419 (c) Frame 435 (d) Frame 455

VII. RESULTS AND DISCUSSION

In this section, we provide implementation details and show results on a number of challenging sequences. After detection, the target is automatically fragmented based on the mean and standard deviation of the target, and the Kalman filter is initialized. The background window is taken as twice the size of the foreground (target) window.

First, tracking using the basic mean shift method of [1] is shown in Figure 3. The basic mean shift tracker fails for the challenge-one video, where two persons cross each other at the same distance from the camera.

Figure 4 shows that the proposed method tracks the same sequence successfully; it handles occlusion and motion, unlike the basic mean shift tracker.

For comparison we use the Bhattacharyya coefficient and distance as given in equations 12 and 13, shown graphically in Figure 5. When the tracker works properly, the Bhattacharyya coefficient is maximal and the distance is small. Here we have used 100 frames; the target sizes are 30 x 100 (white-shirt man) and 24 x 80 (yellow-shirt man).

To check the robustness of the proposed method, we considered different challenging video sequences. The next sequence is taken from the CAVIAR data [17] as challenge

Fig. 5. Comparison of the tracking performance of the two algorithms for challenge 1: (a) Bhattacharyya coefficients (b) Bhattacharyya distance


Fig. 6. Output of Proposed Method for Caviar Video Database Challenge 2: (a) Frame 1937 (b) Frame 2006 (c) Frame 2030 (d) Frame 2036

two. Here three persons are walking, and the method handles the double occlusion of the man wearing the colorful shirt. The proposed method works properly, as shown in Figure 6.

Figure 7 shows the comparison of the Bhattacharyya coefficients and distances for the above video. Here we use 120 frames; the target sizes are 24 x 80 (woman) and 28 x 100 (man).

The sequence taken from the PETS2006 database [18] is challenge three. In it, one person comes towards the camera while another moves away, and in between they overlap each other. This is where mean shift fails, but the proposed method tracks properly, as shown in Figure 8.

Figure 9 gives the comparison between the proposed method and the basic mean shift method for the above sequence, considering the Bhattacharyya coefficient and distance. There are 50 frames in total; the target sizes are 12 x 40 (yellow-shirt man) and 14 x 50 (black-shirt man).

VIII. CONCLUSION AND FUTURE WORK

In this work, we have proposed a simple but effective method for handling occlusion and motion. We have considered different sequences with different challenges. Though the conditions are challenging, the proposed method performs successfully. It worked for multiple occlusion in


Page 5:

Fig. 7. Comparison of the tracking performance of the two algorithms for the CAVIAR video database: (a) Bhattacharyya coefficients (b) Bhattacharyya distance

Fig. 8. Output Of Proposed Method For PETS2006 as challenge 3: (a) Frame 2417 (b) Frame 2447 (c) Frame 2450 (d) Frame 2460

Fig. 9. Comparison of the tracking performance of the two algorithms for PETS2006 as challenge 3: (a) Bhattacharyya coefficients (b) Bhattacharyya distance

challenge two and for total occlusion in challenge three.

Some limitations of the proposed method are that it is not adaptive to scaling and orientation changes, and illumination parameters are not considered. We plan to address these issues to make the method more effective and robust in the future.

REFERENCES

[1] D. Comaniciu, V. Ramesh and P. Meer, "Real-time tracking of non-rigid objects using mean shift," Proc. Conference on Computer Vision and Pattern Recognition (CVPR), pp. 142-149, June 2000.
[2] C. Yang, R. Duraiswami and L. Davis, "Efficient mean-shift tracking via a new similarity measure," Proc. Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[3] J. Wu and Wang, "A spatial color mean shift object tracking algorithm with scale and orientation estimation," Pattern Recognition Letters, 2008.
[4] A. Adam, E. Rivlin and I. Shimshoni, "Robust fragments-based tracking using the integral histogram," IEEE Conference on Computer Vision and Pattern Recognition, pp. 798-805, 2006.
[5] K. Lee and Youn-Mi Lee, "Tracking multi-person to illumination changes and occlusions," ICAT, 2004.
[6] J. Jeyakar, R. V. Babu and K. R. Ramakrishnan, "Robust object tracking with background-weighted local kernels," Computer Vision and Image Understanding, pp. 296-309, 2008.
[7] F. Wang, S. Yu and J. Yang, "Robust and efficient fragments-based tracking using mean shift," Journal of Electronics and Communications, 2009.
[8] M. Khansari and H. Rabiee, "Occlusion handling for object tracking in crowded video scenes based on the undecimated wavelet features," IEEE, 2007.
[9] F. H. Chang and Y. L. Chen, "Real-time multiple objects tracking and identification based on discrete wavelet transform," Pattern Recognition, 2005.
[10] J. Zhao et al., "An approach based on mean shift and Kalman filter for target tracking under occlusion," International Conference on Machine Learning and Cybernetics, Baoding, July 2009.
[11] V. Srikrishnan, T. Nagaraj and S. Chaudhuri, "Fragment based tracking for scale and orientation adaptation," Indian Conference on Computer Vision, Graphics and Image Processing, 2008.
[12] Satoshi Y., "Multiple tracking using mean shift with particle filter based initialization," International Conference on Information Visualization, 2008.
[13] A. Babaeian and S. Rastegar, "Mean shift-based object tracking with multiple features," Southeastern Symposium on System Theory, University of Tennessee Space Institute, 2009.
[14] A. Miller and A. Basharat, "Person and vehicle tracking in surveillance video," Springer-Verlag, Berlin Heidelberg, pp. 174-178, 2008.
[15] Y. Bar-Shalom and X.-R. Li, "Estimation with Applications to Tracking and Navigation," John Wiley and Sons, New York, USA, 2001.
[16] A. Yilmaz, O. Javed and M. Shah, "Object tracking: A survey," ACM Computing Surveys, vol. 38, 2006.
[17] CAVIAR video database: http://groups.inf.ed.ac.uk/vision/caviar/caviardata
[18] PETS2006 database: http://www.pets2006.net
[19] R. C. Gonzalez and R. E. Woods, "Digital Image Processing," New Jersey, USA: Prentice Hall, 2005.
