Talk 2009-monash-seminar-perception

Transcript of Talk 2009-monash-seminar-perception (44 pages)

Page 1

Mahfuzul Haque, Manzur Murshed, Manoranjan Paul

Object Detection Based on Human Visual Perception

Page 2

Object Detection

Real-time surveillance applications. [Figure: input video frame and detected-object output]

Page 3

Object Detection: Applications

• Intelligent visual surveillance
– Event Detection
– Tracking
– Behaviour Analysis
– Activity Recognition
• Remote sensing
• Traffic monitoring
• Context-aware applications

[Figure: pipeline of Object Detection → Feature Extraction → Behaviour Analysis]

Page 4

Object Detection: How?

Basic Background Subtraction (BBS):

Current frame − Background = Detected object

A background modelling step maintains the Background Model that is subtracted from the current frame.

Challenges with BBS (not a practical approach on its own):
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
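The BBS rule above can be sketched in a few lines (a minimal sketch, not the talk's implementation; the frame contents and the difference threshold of 25 are illustrative):

```python
import numpy as np

def bbs(frame, background, threshold=25):
    """Basic Background Subtraction: a pixel is foreground when its
    absolute difference from the background model exceeds a threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

background = np.full((4, 4), 100, dtype=np.uint8)  # flat background model
frame = background.copy()
frame[1:3, 1:3] = 200                              # a bright object appears
mask = bbs(frame, background)
print(int(mask.sum()))                             # 4 foreground pixels
```

Each challenge listed above (illumination change, local background motion, shadow) violates the fixed-background assumption this subtraction relies on, which is why BBS alone is not practical.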

Page 5

Typical Surveillance Setup

[Figure: Surveillance Video Stream → frame-size and frame-rate reduction → Object Detection (with Background model) → Feature Extraction → Object Tracking → Behaviour Analysis]

Page 6

State-of-the-art

Approach families in the literature: region and texture-based approaches, shape-based approaches, predictive modelling, model initialization approaches, nonparametric background modelling, stationary foreground detection, pixel-based approaches, environment modelling.

The task itself goes by many names: background subtraction, background modelling, background maintenance, foreground detection, moving foreground detection, object detection, moving object detection.

Page 7

State-of-the-art

Hierarchical (Zhong et al., ICPR, 2008)

Type-2 Fuzzy MOG (Baf et al., LNCS, 2008)

Cascaded Classifiers (Chen et al., WMVS, 2007)

Gaussian Mixture Model with SVM (Zhang et al., THS, 2007)

Generalized Gaussian Mixture Model (Allili et al., CRV, 2007)

Bayesian Formation (Lee, PAMI, 2005)

Gaussian Mixture Model (Stauffer et al., PAMI, 2000)

Gaussian Mixture Model (Stauffer et al., CVPR, 1999)

Single Gaussian Model (Wren et al., PAMI, 1997)

Pixel-based Background Modelling

Page 8

Background Modelling

Example scenes: sky, cloud, leaf, and a moving person; road, shadow, and a moving car; floor, shadow, and walking people.

[Figure: each scene element (sky, cloud, person, leaf) is modelled by its own Gaussian P(x) with mean µ and variance σ² over pixel intensity x]

Page 9

Background Modelling

[Figure: the same pixel observed from Frame 1 (road, shadow) to Frame N (road, shadow, car); Current frame, Background Model, Detected object]

The pixel's history is modelled by a mixture of Gaussians, here three models (ωᵢ, µᵢ, σᵢ²): road (ω₁ = 65%), shadow (ω₂ = 20%), and car (ω₃ = 15%). Models are ordered by ω/σ.

How to identify the background models? Context information (T): the top-ranked models whose weights accumulate to the background data proportion T are treated as background.
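The ω/σ ordering and the role of the context parameter T can be sketched as follows (the weights mirror the slide's road/shadow/car example; the means, variances, and function name are illustrative assumptions):

```python
import math

# Sketch of Stauffer-Grimson style model ranking and background selection.
models = [
    {"name": "road",   "w": 0.65, "mu": 110.0, "var": 25.0},
    {"name": "shadow", "w": 0.20, "mu": 60.0,  "var": 36.0},
    {"name": "car",    "w": 0.15, "mu": 180.0, "var": 49.0},
]

def background_models(models, T):
    """Order models by w/sigma, then keep the top-ranked ones until their
    cumulative weight reaches the background data proportion T."""
    ranked = sorted(models, key=lambda m: m["w"] / math.sqrt(m["var"]),
                    reverse=True)
    chosen, total = [], 0.0
    for m in ranked:
        chosen.append(m)
        total += m["w"]
        if total >= T:
            break
    return chosen

print([m["name"] for m in background_models(models, 0.6)])  # ['road']
print([m["name"] for m in background_models(models, 0.8)])  # ['road', 'shadow']
```

With T = 0.6 the road model alone covers enough weight; raising T to 0.8 pulls the shadow model into the background as well, which is exactly the context dependence the talk later argues against.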

Page 10

Typical Surveillance Setup

[Figure: the pipeline of Page 5 — Surveillance Video Stream → frame-size and frame-rate reduction → Object Detection (with Background model) → Feature Extraction → Object Tracking → Behaviour Analysis]

Model adaptability: learning rate (α)

Page 11

Scenario 1

Test sequence: PETS2001_D1TeC2

[Figure: first frame, test frame, ground truth, and detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]

α = Learning rate; T = Background data proportion

Page 12

Scenario 2

Test sequence: VSSN06_camera1

[Figure: first frame, test frame, ground truth, and detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]

α = Learning rate; T = Background data proportion

Page 13

Scenario 3

Test sequence: CAVIAR_EnterExitCrossingPaths2cor

[Figure: first frame, test frame, ground truth, and detection results for T ∈ {0.4, 0.6, 0.8} and α ∈ {0.1, 0.01, 0.001}]

α = Learning rate; T = Background data proportion

Page 14

Observation Summary

• A slow learning rate (α) is not preferable (ghost or back-out).
• Simple post-processing will not improve the detection quality at a fast learning rate (α).
• The context behaviour needs to be known in advance.

Page 15

How can we detect abnormal situations?

“Hey, a mob will be approaching soon, and background will be visible only 10% of that duration. Please set T = 0.1”

Page 16

Research Goals

• A new object detection technique for unconstrained environments, i.e., no context-dependent information (no T)
• Better detection quality at a fast learning rate (α)
• Better stability across learning rates (α)

Page 17

The New Technique

• Pixel-based
• MOG for environment modelling
• Incorporating human perceptual characteristics in the underlying background model:
– Model Reference Point
– Model Extent

Page 18

Model Reference Point

[Figure: the three-model mixture from Page 9 — road (ω₁, µ₁, σ₁², 65%), shadow (ω₂, µ₂, σ₂², 20%), car (ω₃, µ₃, σ₃², 15%), ordered by ω/σ. Conventionally, an observation x is compared against the model mean µ; new in this technique, it is compared against a model reference point b.]

Page 19

Model Reference Point

Using the reference point b instead of the mean µ:
• Higher agility than using the mean
• Not tied to the learning rate
• Realistic: b is an actual intensity value, with no artificial value due to the mean

[Figure: over time, the reference point b tracks the observations directly while the mean µ lags]
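The agility of b over µ can be shown with a toy update loop (a sketch, not the authors' code; α and the intensity values are invented):

```python
# mu is the running Gaussian mean updated with learning rate alpha;
# b is the model reference point, i.e. the most recent matched observation.
alpha = 0.01
mu = 100.0
b = 100.0

# The pixel sits at intensity 100, then steps to 120 for five frames.
for x in [100] * 50 + [120] * 5:
    mu = (1 - alpha) * mu + alpha * x   # mean adapts slowly, tied to alpha
    b = float(x)                        # reference point jumps immediately

print(round(mu, 2), b)   # mu still lags near 100; b is already 120.0
```

The mean only closes the gap at a rate set by α, while b reflects the latest matched intensity regardless of α, which is why the reference point is "not tied to the learning rate".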

Page 20

Model Reference Point

[Figure: each model now carries a reference point alongside its Gaussian parameters — road (ω₁, µ₁, σ₁², b₁, 65%), shadow (ω₂, µ₂, σ₂², b₂, 20%), car (ω₃, µ₃, σ₃², b₃, 15%); models are still ordered by ω/σ]

Page 21

Model Extent

[Figure: conventionally, an observation matches a model when it lies within an extent x = Kσ of the mean µ; with the reference point b in place of µ, the extent is unknown: x = ? The three-model example (road, shadow, car with their ω, µ, σ², b; weights 65%, 20%, 15%; ordered by ω/σ) repeats from Page 20.]

Page 22

Model Extent

Problems with the conventional extent x = Kσ around the mean µ:
• Depends on the model standard deviation
• The model standard deviation is in turn tied to the learning rate
• Low detection sensitivity during the initial age of the model
• High detection sensitivity in stationary regions
• Adverse consequences:
– Redundant models introduced
– Precious models dropped

Page 23

Model Extent

x = ? How is x related to b? [Figure: a low extent vs. a high extent around the reference point b]

Page 24

Human Visual Perception

Why are images distorted? Acquisition, compression, processing, transmission, and reproduction all introduce distortion. [Figure: a Reference Image passes through System 1 and System 2, yielding Distorted Images]

How is distortion measured?

PSNR = 20 log₁₀ (255 / RMSE)

If two distorted images measure x dB and y dB with |x − y| < 0.5 dB, the difference is not perceivable by the human visual system.
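The PSNR measure and the 0.5 dB just-noticeable difference can be written down directly (the RMSE values below are illustrative):

```python
import math

def psnr(rmse, peak=255.0):
    """PSNR = 20 * log10(255 / RMSE), as on the slide."""
    return 20.0 * math.log10(peak / rmse)

def perceptually_identical(rmse_a, rmse_b, jnd_db=0.5):
    """Two distortion levels are indistinguishable to the human visual
    system when their PSNR values differ by less than 0.5 dB."""
    return abs(psnr(rmse_a) - psnr(rmse_b)) < jnd_db

print(round(psnr(5.0), 2))                  # 34.15 dB
print(perceptually_identical(20.0, 21.0))   # True: gap ~0.42 dB
print(perceptually_identical(2.0, 3.0))     # False: gap ~3.52 dB
```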

Page 25

Human Visual Perception

Our problem: treat the reference point b as the reference image and an observation x as the distorted one. [Figure: Reference Image → System 1 / System 2 → Distorted Images; distribution P(x) with reference point b and unknown extent x = ?]

How is the range determined? Compare the two PSNR values

x dB = 20 log₁₀ (255 / |b − x|)
y dB = 20 log₁₀ (255 / (|b − x| + 1))

with |x − y| < 0.5 dB meaning the step is not perceivable by the human visual system.
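Under one reading of the formulas above — deviations d and d + 1 from the reference point b are compared until their PSNR gap falls below the perceptual threshold — the model extent comes out as a fixed constant. This is a hedged sketch of that reading, not the authors' derivation:

```python
import math

def perceptual_extent(jnd_db=0.5, peak=255.0):
    """Assumed reading of the slide: deviations d and d + 1 from the
    reference point are distinguishable while their PSNR gap,
    20 * log10((d + 1) / d), is at least the threshold (0.5 dB).
    The extent is the first deviation where the gap becomes
    imperceptible. `peak` is unused here but kept for symmetry with
    the PSNR definition."""
    d = 1
    while 20.0 * math.log10((d + 1) / d) >= jnd_db:
        d += 1
    return d

print(perceptual_extent(0.5))   # 17 intensity levels
print(perceptual_extent(2.0))   # 4 (a larger threshold shrinks the extent)
```

Whatever the exact constant, the key property is that the extent depends only on the perceptual threshold, not on the model variance or the learning rate.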

Page 26

Are we designing an artificial human eye?

• It’s a computer/machine vision application.
• Isn’t 0.5 dB too sensitive to envelop shadow, reflection, and noise?

Page 27

Impact of Human Perceptual Threshold

[Figure: first frame, test frame, ground truth, and detection results at 0.5 dB, 0.75 dB, 1.0 dB, and 2.0 dB]

Page 28

Summary of the technique

• Pixel-based
• Environment modelling: MOG
• New variable in MOG: most recent observation
• Detection phase:
– The most recent observation, not the Gaussian mean, is the reference
– The model extent is based on the human perceivable threshold, not the Gaussian variance

Page 29

Experiments

Test sequences: 50 in total, from 8 different sources.
Scenario distribution: indoor, outdoor, multimodal, shadow and reflection, low background-foreground contrast.
Evaluation: qualitative and quantitative, against Stauffer and Grimson (PAMI, 2000) and Lee (PAMI, 2005).
Metrics: False Positive (FP), False Negative (FN), False Classification.
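The evaluation metrics just listed can be computed per frame against a ground-truth mask (FP and FN are standard pixel counts; treating false classification as FP + FN is an assumption here, not stated on the slide):

```python
import numpy as np

truth = np.array([[0, 1], [1, 0]], dtype=bool)      # ground-truth mask
detected = np.array([[1, 1], [0, 0]], dtype=bool)   # detector output

fp = int(np.logical_and(detected, ~truth).sum())    # detected but not true
fn = int(np.logical_and(~detected, truth).sum())    # true but missed
print(fp, fn, fp + fn)   # 1 1 2
```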

Page 30

Test Sequences

PETS (9) Wallflower (7) UCF (7) IBM (11) CAVIAR (7) VSSN06 (7) Other (2)

Page 31

Visual Comparison

Page 32

Quantitative Analysis

Page 33

ROC: S&G

Page 34

ROC: Lee

Page 35

ROC: Proposed Technique

Page 36

PDR: S&G vs. Proposed Technique (α = 0.1)

Page 37

PDR: Proposed Technique

Page 38

PDR: S&G (T = 0.6)

Page 39

Instability (ALL)

Page 40

Performance Matrix (ALL)

Page 41

Performance Matrix (ALL)

Page 42

Research Summary

• A new object detection technique
• Context independent
• Higher stability
• Higher agility (fast learning rate)
• Future directions:
– Multimodal scenarios
– Even higher detection quality via a multilevel approach

Page 43

Q&A

[email protected]
http://www.mahfuzulhaque.com

Thanks!

Page 44

Image Source

http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg