
Page 1: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

Oytun Akman

Page 2: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

2

Overview

Surveillance Systems Single Camera Configuration

Moving Object Detection Tracking Event Recognition

Multi-camera Configuration Moving Object Detection Occlusion Handling Tracking Event Recognition

Page 3: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

3

Single Camera Configuration

Moving Object Detection (MOD) Tracking Event Recognition

Page 4: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

4

Single Camera Configuration

Moving Object Detection (MOD)

Input Image - Background Image = Foreground Mask

Page 5: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

5

Single Camera Configuration

Moving Object Detection (MOD)

Frame Differencing (M. Piccardi, 1996)

Eigenbackground Subtraction (N. Oliver, 1999)

Parzen Window (KDE) Based MOD (A. Elgammal, 1999)

Mixture of Gaussians Based MOD (W. E. Grimson, 1999)

Page 6: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

6

Single Camera Configuration

MOD – Frame Differencing

Foreground mask detection

Background model update

$$BD(x, y, t) = \big| I(x, y, t) - BM(x, y, t-1) \big|$$

$$FM(x, y, t) = \begin{cases} 1, & BD(x, y, t) \geq threshold \\ 0, & BD(x, y, t) < threshold \end{cases}$$

$$BM(x, y, t) = \alpha\, I(x, y, t) + (1 - \alpha)\, BM(x, y, t-1)$$
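As a rough illustration of the update equations above, a minimal OpenCV/NumPy sketch of frame-differencing MOD; the learning rate, threshold and input file name are placeholder assumptions.

```python
import cv2
import numpy as np

ALPHA = 0.05        # background learning rate (assumed value)
THRESHOLD = 30      # foreground threshold on the absolute difference (assumed value)

cap = cv2.VideoCapture("traffic.avi")   # hypothetical input video
ok, frame = cap.read()
bg = frame.astype(np.float32)           # initialize background model BM with the first frame

while True:
    ok, frame = cap.read()
    if not ok:
        break
    f = frame.astype(np.float32)
    # BD(x, y, t) = |I(x, y, t) - BM(x, y, t-1)|
    diff = cv2.absdiff(f, bg)
    # FM(x, y, t) = 1 where BD exceeds the threshold
    fg_mask = (diff.max(axis=2) > THRESHOLD).astype(np.uint8) * 255
    # BM(x, y, t) = alpha * I + (1 - alpha) * BM(x, y, t-1)
    bg = ALPHA * f + (1.0 - ALPHA) * bg
    cv2.imshow("foreground mask", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break
```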

Page 7: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

7

Single Camera Configuration

MOD – Eigenbackground Subtraction

Principal Component Analysis (PCA): reduce the data dimension and capture the major variance; the reduced data represents the background model

(http://web.media.mit.edu/~tristan/phd/dissertation/chapter5.html)
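A minimal sketch of the eigenbackground idea, assuming grayscale training frames and placeholder parameter values: PCA is taken via an SVD of the stacked frames, and a new frame is classified by its reconstruction error in the reduced eigenspace.

```python
import numpy as np

def eigenbackground(frames, n_components=10, threshold=30.0):
    """Eigenbackground subtraction sketch: PCA on a set of background frames
    (flattened to vectors); a new frame is projected onto the eigenspace and
    reconstructed, and pixels far from the reconstruction are foreground."""
    X = np.stack([f.ravel().astype(np.float64) for f in frames])   # N x D data matrix
    mean = X.mean(axis=0)
    # principal components of the background via SVD
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_components]                                      # reduced background model

    def foreground_mask(frame):
        x = frame.ravel().astype(np.float64) - mean
        recon = basis.T @ (basis @ x)          # project onto the eigenspace and back
        diff = np.abs(x - recon).reshape(frame.shape)
        return (diff > threshold).astype(np.uint8)

    return foreground_mask
```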

Page 8: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

8

Single Camera Configuration

MOD – Parzen Window Based

Nonparametrically estimating the probability of observing pixel intensity values, based on the sample intensities
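A small NumPy sketch of this nonparametric (Parzen-window) estimate, assuming grayscale frames, a Gaussian kernel, and placeholder bandwidth and probability-threshold values.

```python
import numpy as np

def kde_foreground(samples, frame, sigma=15.0, p_threshold=1e-4):
    """Parzen-window (KDE) MOD sketch: estimate p(intensity) per pixel from N
    recent background samples with Gaussian kernels; low-probability pixels
    are foreground.  `samples` has shape (N, H, W), `frame` has shape (H, W)."""
    diffs = frame[None, :, :].astype(np.float64) - samples.astype(np.float64)
    kernels = np.exp(-0.5 * (diffs / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    prob = kernels.mean(axis=0)             # Parzen estimate of p(x) at each pixel
    return (prob < p_threshold).astype(np.uint8)
```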

Page 9: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

9

Single Camera Configuration

MOD – Mixture of Gaussians Based

Based on modeling each pixel by mixture of K Gaussian distributions

Probability of observing pixel value $x_N$ at time $N$:

$$p(x_N) = \sum_{k=1}^{K} w_k\, \frac{1}{(2\pi)^{D/2} \lvert \Sigma_k \rvert^{1/2}}\, e^{-\frac{1}{2} (x_N - \mu_k)^T \Sigma_k^{-1} (x_N - \mu_k)}$$

where, assuming that R, G and B are independent,

$$\Sigma_k = \sigma_k^2 I$$
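A per-pixel mixture-of-Gaussians model in this spirit is available in OpenCV as BackgroundSubtractorMOG2; a minimal usage sketch (the file name and parameter values are placeholders, not the thesis implementation):

```python
import cv2

# OpenCV's MOG2 implements per-pixel mixture-of-Gaussians background modeling
# along the lines of the approach described above.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

cap = cv2.VideoCapture("traffic.avi")       # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)       # 255 = foreground, 0 = background
    cv2.imshow("MoG foreground", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:
        break
```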

Page 10: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

10

Single Camera Configuration

MOD - Simulation Results

Page 11: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

11

Single Camera Configuration

Tracking

Object Association Mean-shift Tracker (D. Comaniciu, 2003) Cam-shift Tracker (G. R. Bradski, 1998) Pyramidal Kanade-Lucas-Tomasi Tracker (KLT) (J. Y. Bouguet, 1999)

(A constant velocity Kalman filter is associated with each tracker)

Page 12: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

12

Single Camera Configuration

Tracking – Object Association

$O_i(t) = O_j(t+1)$ if the bounding boxes overlap and $D(O_i(t), O_j(t+1)) < threshold_{md}$

$D(\cdot,\cdot)$ is a distance metric between the color histograms of the objects:

Kullback-Leibler divergence:

$$D(h_1, h_2) = \sum_k h_1(k) \log\frac{h_1(k)}{h_2(k)} + \sum_k h_2(k) \log\frac{h_2(k)}{h_1(k)}$$

Bhattacharyya coefficient:

$$D(h_1, h_2) = 1 - \sum_k \sqrt{h_1(k)\, h_2(k)}$$

[Figure: Object1 and Object2 at frames t and t+1, associated across frames.]
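A minimal sketch of the association test, with a hypothetical color_histogram helper and an assumed threshold value; both distances above are implemented literally.

```python
import numpy as np

def color_histogram(image_bgr, bins=16):
    """Normalized joint color histogram of an object patch (hypothetical helper)."""
    hist, _ = np.histogramdd(image_bgr.reshape(-1, 3),
                             bins=(bins, bins, bins), range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def bhattacharyya_distance(h1, h2):
    # D(h1, h2) = 1 - sum_k sqrt(h1_k * h2_k)
    return 1.0 - np.sum(np.sqrt(h1 * h2))

def symmetric_kl(h1, h2, eps=1e-10):
    # D(h1, h2) = sum h1 log(h1/h2) + sum h2 log(h2/h1)
    h1, h2 = h1 + eps, h2 + eps
    return np.sum(h1 * np.log(h1 / h2)) + np.sum(h2 * np.log(h2 / h1))

def associated(patch_t, patch_t1, threshold_md=0.3):
    """O_i(t) = O_j(t+1) if the boxes overlap (checked elsewhere) and the
    histogram distance is below threshold_md (placeholder value)."""
    return bhattacharyya_distance(color_histogram(patch_t),
                                  color_histogram(patch_t1)) < threshold_md
```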

Page 13: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

13

Single Camera Configuration

Tracking – Mean-shift Tracker

Similarity function between the target model $q$ and the candidate model $p(y)$:

$$d(y) = \sqrt{1 - \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}}$$

where $p$ and $q$ are $m$-bin color histograms and $\sum_{u=1}^{m}\sqrt{p_u(y)\, q_u}$ is the Bhattacharyya coefficient.

(http://www.lisif.jussieu.fr/~belaroussi/face_track/CamshiftApproach.htm)

Page 14: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

14

Single Camera Configuration

Tracking - Mean-shift Tracker - Simulation Result

Page 15: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

15

Single Camera Configuration

Tracking – Cam-shift Tracker

Backprojection image (probability distribution image) calculated

Mean-shift algorithm is used to find mode of probability distribution image around the previous target location
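A minimal OpenCV sketch of this backprojection-plus-CamShift loop; the initial window, the hue-only histogram and the file name follow the common OpenCV example and are assumptions rather than the exact setup used here.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.avi")          # hypothetical input video
ok, frame = cap.read()
track_win = (200, 150, 60, 60)                 # assumed initial object window (x, y, w, h)
x, y, w, h = track_win

# hue histogram of the target region -> model used for backprojection
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # backprojection image = probability distribution image
    backproj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # CamShift runs mean-shift on the backprojection around the previous location
    rot_box, track_win = cv2.CamShift(backproj, track_win, term_crit)
    pts = cv2.boxPoints(rot_box).astype(np.int32)
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("cam-shift", frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break
```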

Page 16: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

16

Single Camera Configuration

Tracking – Cam-shift Tracker - Simulation Result

Page 17: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

17

Single Camera Configuration

Tracking – Pyramidal KLT

Optical flow d=[dx dy] of the good feature point (corner) is found by minimizing the error function

$$\epsilon(d) = \sum_{x = u_x - w_x}^{u_x + w_x} \; \sum_{y = u_y - w_y}^{u_y + w_y} \big( I(x, y) - J(x + d_x,\, y + d_y) \big)^2$$

(http://www.suri.it.okayama-u.ac.jp/research/2001/s-takahashi/s-takahashi.html)
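Pyramidal KLT is exposed in OpenCV as calcOpticalFlowPyrLK; a minimal tracking sketch (feature-detection parameters and file name are placeholders):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.avi")              # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# "good features to track" = corner points used as feature points
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=7)

lk_params = dict(winSize=(15, 15), maxLevel=3,     # window (w_x, w_y) and pyramid depth
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # minimizes the windowed SSD error for each point, coarse-to-fine over the pyramid
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None, **lk_params)
    good_new = new_pts[status.ravel() == 1]
    for p in good_new:
        px, py = p.ravel()
        cv2.circle(frame, (int(px), int(py)), 3, (0, 255, 0), -1)
    cv2.imshow("pyramidal KLT", frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break
    prev_gray, pts = gray, good_new.reshape(-1, 1, 2)
```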

Page 18: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

18

Single Camera Configuration

Tracking - Pyramidal KLT - Simulation Results

Page 19: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

19

Single Camera Configuration

Event Recognition - Hidden Markov Models (HMM)

GM-HMMs, trained on proper (normal) object trajectories, are used to model the traffic flow (F. Porikli, 2004) (F. Bashir, 2005)

$$tr(i) = \big[\, (x_m^{i}, y_m^{i}),\; (x_{m+1}^{i}, y_{m+1}^{i}),\; \ldots,\; (x_n^{i}, y_n^{i}) \,\big]$$

m: starting frame number, in which the object enters the FOV

n: end frame number, in which the object leaves the FOV
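A rough sketch of trajectory-based event modeling with a Gaussian-emission HMM; hmmlearn's GaussianHMM is used here as a stand-in for the GM-HMM above, and the training trajectories and threshold are placeholder data.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # assumed stand-in for the GM-HMM

def stack_trajectories(trajectories):
    """Concatenate tr(i) = [(x_m, y_m), ..., (x_n, y_n)] sequences for hmmlearn."""
    X = np.vstack(trajectories)
    lengths = [len(t) for t in trajectories]
    return X, lengths

# train on normal (proper) trajectories, each of shape (n_frames, 2); placeholder data
normal_trajs = [np.cumsum(np.random.randn(50, 2), axis=0) for _ in range(20)]
X, lengths = stack_trajectories(normal_trajs)
model = GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

def is_abnormal(traj, threshold=-200.0):
    """Flag a trajectory whose log-likelihood under the traffic-flow model is low
    (threshold is a placeholder; the thesis uses Viterbi distances instead)."""
    return model.score(traj) < threshold
```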

Page 20: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

20

Single Camera Configuration

Event Recognition – Simulation Result

Page 21: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

21

Multi-camera Configuration

Background Modeling Occlusion Handling Tracking Event Recognition

Page 22: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

22

Multi-camera Configuration

Background Modeling Three background modeling algorithms

Foreground Detection by Unanimity Foreground Detection by Weighted Voting Mixture of Multivariate Gaussians Background Model

Page 23: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

23

Multi-camera Configuration

Background Modeling: A common field-of-view must be defined to specify the region in which the cameras will cooperate

Page 24: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

24

Multi-camera Configuration

Background Modeling - Unanimity: If (x is foreground) && (x′ is foreground) → foreground, where x and x′ are corresponding pixels in the two views
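A minimal sketch of the unanimity rule; warping the second camera's mask into the first view with a homography H21 is an assumed way of obtaining the corresponding pixels x and x′.

```python
import cv2
import numpy as np

def unanimity_mask(fg1, fg2, H21, common_fov_mask):
    """fg1, fg2: binary foreground masks (uint8, 0/255) of camera 1 and camera 2.
    H21: 3x3 homography mapping camera-2 coordinates into camera-1 coordinates
    (assumed available from calibration).  common_fov_mask: binary mask of the
    common field of view in camera-1 coordinates."""
    h, w = fg1.shape
    fg2_in_view1 = cv2.warpPerspective(fg2, H21, (w, h))   # bring x' next to x
    both = cv2.bitwise_and(fg1, fg2_in_view1)              # foreground only if both agree
    return cv2.bitwise_and(both, common_fov_mask)
```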

Page 25: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

25

Multi-camera Configuration

Background Modeling – Weighted Voting

$$BD(x, x', t) = \alpha\, \big| I_1(x, t) - BM_1(x, t-1) \big| + \beta\, \big| I_2(x', t) - BM_2(x', t-1) \big|$$

$$FM(x, y, t) = \begin{cases} 1, & BD(x, x', t) \geq threshold \\ 0, & BD(x, x', t) < threshold \end{cases}$$

$\alpha$ and $\beta$ are the coefficients that adjust the contributions of the cameras. Generally the contribution of the first camera (the reference camera, with better positioning) is larger than that of the second one, i.e. $\alpha > \beta$.
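A sketch of the weighted-voting combination following the equation above; the homography warp, the coefficient values and the threshold are assumptions.

```python
import cv2
import numpy as np

def weighted_voting_mask(img1, bg1, img2, bg2, H21, alpha=0.7, beta=0.3, threshold=40.0):
    """BD = alpha*|I1 - BM1| + beta*|I2 - BM2|, where the camera-2 difference is
    warped into camera-1 coordinates via H21; alpha > beta gives the reference
    camera more weight (all values here are placeholders)."""
    h, w = img1.shape[:2]
    d1 = cv2.absdiff(img1, bg1).astype(np.float32)
    d2 = cv2.absdiff(img2, bg2).astype(np.float32)
    d2_in_view1 = cv2.warpPerspective(d2, H21, (w, h))
    bd = alpha * d1 + beta * d2_in_view1
    if bd.ndim == 3:                     # combine color channels if present
        bd = bd.max(axis=2)
    return (bd > threshold).astype(np.uint8) * 255
```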

Page 26: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

26

Multi-camera Configuration

Background Modeling – Weighted Voting

Page 27: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

27

Multi-camera Configuration

Background Modeling – Mixture of Multivariate Gaussians

Each pixel modeled by mixture of K multivariate Gaussian distributions

$$p(X_N) = \sum_{k=1}^{K} w_k\, \frac{1}{(2\pi)^{D/2} \lvert \Sigma_k \rvert^{1/2}}\, e^{-\frac{1}{2} (X_N - \mu_k)^T \Sigma_k^{-1} (X_N - \mu_k)}$$

where $X_N = \big[\, x_N^1 \;\; x_N^2 \,\big]$ stacks the observation $x_N^1$ of a pixel in the first view and the observation $x_N^2$ of the corresponding pixel $H_{12}\, x_N^1$ in the second view.
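A per-pixel sketch of the multivariate mixture idea: the color samples of a pixel and of its corresponding pixel in the second view are stacked into one 6-D vector and a K-component Gaussian mixture is fitted; scikit-learn's GaussianMixture stands in for the online update actually used.

```python
import numpy as np
from sklearn.mixture import GaussianMixture   # stand-in for the online MoG update

def fit_pixel_model(samples_cam1, samples_cam2, K=3):
    """samples_cam1/2: (N, 3) arrays of R,G,B values observed over time at a pixel
    and at its corresponding pixel in the second view (via the homography)."""
    X = np.hstack([samples_cam1, samples_cam2])        # X_N = [x_N^1, x_N^2], 6-D
    return GaussianMixture(n_components=K, covariance_type="full").fit(X)

def is_foreground(model, x1, x2, log_threshold=-25.0):
    """Low likelihood under the learned mixture -> foreground (placeholder threshold)."""
    X = np.hstack([x1, x2]).reshape(1, -1)
    return model.score_samples(X)[0] < log_threshold
```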

Page 28: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

28

Multi-camera Configuration

Background Modeling – Mixture of Multivariate Gaussians

Input image

Mixture of Multivariate Gaussians

Single camera MOG

Page 29: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

29

Multi-camera Configuration

Background Modeling - Conclusions

Projection errors due to the planar-object assumption → erroneous foreground masks → false segmentation results

Cameras must be mounted at high altitudes compared to the object heights

Background modeling by unanimity: false segmented regions are eliminated, but any camera failure → failure in the final mask (solved by weighted voting)

In the multivariate MOG method, vehicles missed by the single-camera MOG method can be segmented

Page 30: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

30

Multi-camera Configuration

Occlusion Handling

Occlusion Primary issue of surveillance systems

False foreground segmentation results Tracking failures

Difficult to solve by using single-camera configuration

Occlusion-free view generation by using multiple cameras Utilization of 3D information Presence of different points of views

Page 31: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

31

Multi-camera Configuration

Occlusion Handling – Block Diagram

[Block diagram: Input image 1 and Input image 2 → Background Model 1 / 2 → Background Subtraction → Foreground Masks → Segmentation Module → Segments → Epipolar Matching → matched segment centers → Point Transfer (Trifocal Tensor) → Top-view points → Graph Based Clustering → Individual objects]

Page 32: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

32

Multi-camera Configuration

Occlusion Handling - Background Subtraction

Foreground masks are obtained using background subtraction

Page 33: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

33

Multi-camera Configuration

Occlusion Handling – Oversegmentation

Foreground mask is oversegmented using “Recursive Shortest Spanning Tree” (RSST) and K-means algorithms

RSST K-means

Page 34: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

34

Multi-camera Configuration

Occlusion Handling – Top-view Generation

Page 35: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

35

Multi-camera Configuration

Occlusion Handling – Top-view Generation

Corresponding match of a segment is found by comparing the color histograms of the target segment and candidate segments on the epipolar line

RSST K-means
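A minimal sketch of this matching step, assuming the fundamental matrix F is known from calibration: the epipolar line of the target segment center is computed in the other view and the candidate with the most similar color histogram near that line is selected.

```python
import cv2
import numpy as np

def match_segment(center1, hist1, candidates, F, line_dist_thresh=5.0):
    """center1: (x, y) of the target segment center in view 1; hist1: its color
    histogram (cv2.calcHist format).  candidates: list of (center2, hist2) segments
    in view 2.  F: fundamental matrix from view 1 to view 2 (assumed available)."""
    pt = np.float32([[center1]])                               # shape (1, 1, 2)
    a, b, c = cv2.computeCorrespondEpilines(pt, 1, F)[0, 0]    # epipolar line ax+by+c=0
    best, best_score = None, np.inf
    for center2, hist2 in candidates:
        x, y = center2
        dist = abs(a * x + b * y + c) / np.hypot(a, b)         # distance to the line
        if dist > line_dist_thresh:
            continue
        score = cv2.compareHist(hist1, hist2, cv2.HISTCMP_BHATTACHARYYA)
        if score < best_score:
            best, best_score = center2, score
    return best
```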

Page 36: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

36

Multi-camera Configuration

Occlusion Handling – Clustering

Segments are grouped using “shortest spanning tree” algorithm using the

weight function

RSST K-means

$$w_{ij} = H_{diff} + D_{diff}$$
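A sketch of the clustering, reading w_ij as a combination of the color-histogram difference and the top-view distance between segments (an interpretation of H_diff and D_diff): build the graph, take its shortest (minimum) spanning tree with SciPy, and cut edges above a threshold so the connected components become the individual objects.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def cluster_segments(centers, hists, hist_diff, weight_threshold=1.0):
    """centers: (N, 2) top-view segment centers; hists: list of color histograms;
    hist_diff: hypothetical function returning H_diff between two histograms;
    weight_threshold is a placeholder value."""
    n = len(centers)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_diff = np.linalg.norm(centers[i] - centers[j])        # D_diff
            W[i, j] = W[j, i] = hist_diff(hists[i], hists[j]) + d_diff
    mst = minimum_spanning_tree(W).toarray()        # shortest spanning tree
    mst[mst > weight_threshold] = 0                 # cut the heavy edges
    n_objects, labels = connected_components(mst, directed=False)
    return n_objects, labels                        # labels[i] = object id of segment i
```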

Page 37: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

37

Multi-camera Configuration

Occlusion Handling – Clustering

After cutting the edges greater than a certain threshold

RSST K-means

Page 38: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

38

Multi-camera Configuration

Occlusion Handling – Conclusions

Successful results for partially occluded objects

Under strong occlusion: epipolar matching fails and objects are oversegmented or undersegmented; the problem is solved if one of the cameras can see the object without occlusion

RSST and K-means have similar results; K-means has better real-time performance

Page 39: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

39

Multi-camera Configuration

Tracking – Kalman Filters

Advantage: continuous and correct tracking as long as one of the cameras is able to view the object

Tracking is performed in both of the views by using Kalman filters

2D state model: $X_i = \big[\, x \;\; y \;\; v_x \;\; v_y \,\big]^T$

State transition model:

$$A_i = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Observation model:

$$O_i = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
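A minimal NumPy sketch of one predict/update cycle of this constant-velocity Kalman filter; the noise covariances are placeholder values.

```python
import numpy as np

A = np.array([[1, 0, 1, 0],     # state transition: x += vx, y += vy
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
O = np.array([[1, 0, 0, 0],     # observation: only the position is measured
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2            # process noise (assumed)
R = np.eye(2) * 1.0             # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; x = [x, y, vx, vy], z = measured (x, y)."""
    # predict
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # update
    S = O @ P_pred @ O.T + R
    K = P_pred @ O.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - O @ x_pred)
    P_new = (np.eye(4) - K @ O) @ P_pred
    return x_new, P_new
```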

Page 40: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

40

Multi-camera Configuration

Tracking – Object Matching

Objects in different views are related to each other via homography

$$d = \lVert x' - H x \rVert, \qquad d' = \lVert x - H^{-1} x' \rVert$$

[Figure: a point x in camera view 1 is projected to Hx in camera view 2 and a point x′ in view 2 is projected back to H⁻¹x′; d and d′ are the distances between the projected points and the observed objects.]
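A sketch of the cross-view matching test using the homography (for example estimated from ground-plane correspondences with cv2.findHomography); the distance threshold is a placeholder.

```python
import cv2
import numpy as np

def match_across_views(pt_view1, pt_view2, H, dist_threshold=15.0):
    """Accept the match if the view-2 point lies close to H*x and the view-1 point
    lies close to H^-1*x' (both distances checked, as in the diagram above)."""
    x = np.float32([[pt_view1]])                 # shape (1, 1, 2) for perspectiveTransform
    x_p = np.float32([[pt_view2]])
    Hx = cv2.perspectiveTransform(x, H)[0, 0]
    Hinv_xp = cv2.perspectiveTransform(x_p, np.linalg.inv(H))[0, 0]
    d = np.linalg.norm(np.asarray(pt_view2) - Hx)
    d_p = np.linalg.norm(np.asarray(pt_view1) - Hinv_xp)
    return d < dist_threshold and d_p < dist_threshold
```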

Page 41: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

41

Multi-camera Configuration

Tracking - Example

[Figure: objects a, b, c in Camera 1 and a′, b′, c′ in Camera 2 with their Kalman predictions P1a, P1b, P1c and P1a′, P1b′, P1c′; the predictions are transferred between the views by homographic projection, giving P2a, P2b, P2c and P2a′, P2b′, P2c′.]

Page 42: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

42

Multi-camera Configuration

Tracking - Example

[Figure: the tracking example continued, with predictions P3b, P3c, P3b′, P3c′ and P4b, P4c, P4b′, P4c′ in Camera 1 and Camera 2.]

Page 43: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

43

Multi-camera Configuration

Tracking – Simulation Results

Multi-camera Tracking

Page 44: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

44

Multi-camera Configuration

Tracking – Simulation Results

Single-camera Tracking Single-camera Tracking

Page 45: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

45

Multi-camera Configuration

Tracking – Simulation Results

Multi-camera Tracking

Page 46: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

46

Multi-camera Configuration

Tracking – Simulation Results

Single-camera Tracking Single-camera Tracking

Page 47: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

47

Multi-camera Configuration

Event Recognition - Trajectories

Extracted trajectories from both of the views are concatenated to obtain a multi-view trajectory

$$tr_1(i) = \big[\, (x_m^{i1}, y_m^{i1}),\; (x_{m+1}^{i1}, y_{m+1}^{i1}),\; \ldots,\; (x_n^{i1}, y_n^{i1}) \,\big]$$

$$tr_2(i) = \big[\, (x_m^{i2}, y_m^{i2}),\; (x_{m+1}^{i2}, y_{m+1}^{i2}),\; \ldots,\; (x_n^{i2}, y_n^{i2}) \,\big]$$

$$tr_3(i) = \big[\, tr_1(i) \;\big|\; tr_2(i) \,\big] = \big[\, (x_m^{i1}, y_m^{i1}, x_m^{i2}, y_m^{i2}),\; \ldots,\; (x_n^{i1}, y_n^{i1}, x_n^{i2}, y_n^{i2}) \,\big]$$
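The concatenation itself is just per-frame stacking of the two single-view trajectories; a short NumPy sketch:

```python
import numpy as np

def concatenate_trajectories(tr1, tr2):
    """tr1, tr2: (n_frames, 2) arrays of (x, y) positions of the same object in the
    two views; the multi-view trajectory has one (x1, y1, x2, y2) row per frame."""
    return np.hstack([tr1, tr2])
```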

Page 48: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

48

Multi-camera Configuration

Event Recognition – Training

Page 49: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

49

Multi-camera Configuration Event Recognition – Viterbi Distances of Training Samples

Object ID | Viterbi Distance to GM_HMM_1 | Viterbi Distance to GM_HMM_2 | Viterbi Distance to GM_HMM_1+2

1 10.0797 9.90183 19.7285

2 10.2908 10.1049 20.1867

3 10.2266 10.1233 20.1006

4 10.577 10.6716 21.2304

5 9.99018 9.84572 19.6763

6 10.0584 9.85901 19.6572

7 10.0608 9.88434 19.7496

8 10.2821 10.2472 20.3949

9 10.0773 9.8764 19.7181

10 10.3629 10.2508 20.3399

11 10.0322 9.86696 19.6382

12 10.0695 9.92222 19.7072

13 10.1321 9.95447 19.7818

14 10.2666 10.139 20.2119

15 10.2661 10.0629 20.0147

16 10.038 9.92932 19.6548

17 10.126 9.98202 19.7991

18 10.2134 10.108 19.9983

19 10.8046 10.5149 21.3008

20 10.4454 10.2919 20.333

21 10.111 9.90018 19.6983

22 10.1791 9.9294 19.9025

23 10.0511 10.0658 20.1564

24 10.1007 10.2248 20.248

25 10.3782 9.8986 19.9865

26 10.0308 9.9264 19.8682

27 10.0816 10.2139 20.1286

Page 50: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

50

Multi-camera Configuration

Event Recognition – Simulation Results with Abnormal Data

Average distance to GM_HMM_1 : 10.20 Average distance to GM_HMM_2 : 10.06 Average distance to GM_HMM_1+2: 20.04

Object ID Viterbi Distance to GM_HMM_1

Viterbi Distance to GM_HMM_2

Viterbi Distance to GM_HMM_1+2

28 20.4058 19.9481 45.1818

29 21.2409 19.7736 45.034

30 26.9917 24.7016 55.2278

31 10.7213 10.5773 21.2099

32 10.4648 10.5105 22.1852

33 10.1611 9.97222 19.7785

Page 51: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

51

Multi-camera Configuration

Tracking & Event Recognition - Conclusions

Tracking: successful results given correct initial segmentation; other tracker algorithms can be used

Event recognition: GM_HMM_1+2 classifies the test data better

Page 52: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

52

Thank you...

Page 53: Multi-camera Video Surveillance: Detection, Occlusion Handling, Tracking and Event Recognition

53

Summary - Surveillance Single camera configuration

Moving object detection Frame differencing Eigenbackground Parzen window (KDE) Mixture of Gaussians

Tracking Object association Mean-shift tracker Cam-shift tracker Pyramidal Kanade-Lucas-Tomasi tracker (KLT)

Event recognition

Multi-camera configuration Background modeling

Foreground detection by unanimity Foreground detection by weighted voting Mixture of multivariate Gaussian distributions

Occlusion handling Tracking Event recognition