Talk 2009-monash-seminar-perception
Mahfuzul Haque
Manzur Murshed
Manoranjan Paul
Object Detection Based on Human Visual Perception
Object Detection for Real-time Surveillance Applications
[Figure: input video frame → detected object output]
Object Detection: Applications
• Intelligent visual surveillance
– Event Detection
– Tracking
– Behaviour Analysis
– Activity Recognition
• Remote sensing
• Traffic monitoring
• Context-aware applications
[Pipeline: Object Detection → Feature Extraction → Behaviour Analysis]
Object Detection: How?
Background Modelling

Basic Background Subtraction (BBS)
Current frame − Background = Detected object
[Figure: current frame and background model → detected object]

Challenges with BBS
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
⇒ Not a practical approach
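The BBS equation above amounts to a per-pixel difference and threshold. A minimal numpy sketch, with illustrative frame values and threshold:

```python
import numpy as np

# Toy 4x4 grayscale background and current frame (illustrative values).
background = np.full((4, 4), 100, dtype=np.int16)
frame = background.copy()
frame[1:3, 1:3] = 180  # a bright "object" enters the scene

# BBS: pixels whose absolute difference from the background
# exceeds a fixed threshold are labelled foreground.
THRESHOLD = 30
foreground_mask = np.abs(frame - background) > THRESHOLD

print(foreground_mask.sum())  # 4 object pixels detected
```

Every challenge listed above (illumination change, background motion, camera shake, shadows) shifts pixel values without an object being present, which is why a single static background frame is not practical.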
Typical Surveillance Setup
[Pipeline: surveillance video stream → object detection → feature extraction → object tracking → behaviour analysis; with frame-size reduction, frame-rate reduction, and a background model]
State-of-the-art
• Region- and texture-based approaches
• Shape-based approaches
• Predictive modelling
• Model initialization approaches
• Nonparametric background modelling
• Stationary foreground detection
• Pixel-based approaches
Terminology used in the literature: environment modelling, background subtraction, background modelling, background maintenance, foreground detection, moving foreground detection, object detection, moving object detection
State-of-the-art
• Hierarchical (Zhong et al., ICPR, 2008)
• Type-2 Fuzzy MOG (Baf et al., LNCS, 2008)
• Cascaded Classifiers (Chen et al., WMVS, 2007)
• Gaussian Mixture Model with SVM (Zhang et al., THS, 2007)
• Generalized Gaussian Mixture Model (Allili et al., CRV, 2007)
• Bayesian Formation (Lee, PAMI, 2005)
• Gaussian Mixture Model (Stauffer et al., PAMI, 2000)
• Gaussian Mixture Model (Stauffer et al., CVPR, 1999)
• Single Gaussian Model (Wren et al., PAMI, 1997)
Pixel-based Background Modelling
[Figure: example surveillance scenes with labelled processes: sky, cloud, leaf, moving person, road, shadow, moving car, floor, walking people]
[Figure: per-pixel intensity distributions P(x), each a Gaussian with mean µ and variance σ²; one mode per process (sky, cloud, person, leaf) along the pixel-intensity axis x]
Background Modelling
[Figure: frames 1 … N of a scene in which road, shadow, and car appear over time; current frame and background model → detected object]
How do we identify the background models?
Per-pixel mixture of three Gaussians (illustrative):
• Mode 1 (road): ω₁ = 65%, µ₁, σ₁²
• Mode 2 (shadow): ω₂ = 20%, µ₂, σ₂²
• Mode 3 (car): ω₃ = 15%, µ₃, σ₃²
Models are ordered by ω/σ
Context information (T): the proportion of observed data assumed to be background
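The ω/σ ordering and the role of T can be sketched as follows. The weights follow the slide's 65/20/15 example; the standard deviations are assumed for illustration:

```python
import numpy as np

# Three Gaussian modes for one pixel: weights from the slide's example,
# standard deviations chosen for illustration (road, shadow, car).
weights = np.array([0.65, 0.20, 0.15])
sigmas  = np.array([4.0, 6.0, 12.0])

# Modes are ranked by w/sigma: high weight and low variance
# indicate a stable background process.
order = np.argsort(weights / sigmas)[::-1]

# The first B top-ranked modes whose cumulative weight reaches T
# are treated as background; the rest are foreground candidates.
T = 0.6
cum = np.cumsum(weights[order])
B = int(np.searchsorted(cum, T)) + 1
print(order[:B])  # indices of the modes treated as background
```

With T = 0.6 only the dominant "road" mode counts as background; raising T to 0.8 would also admit the "shadow" mode, which is exactly why T encodes context.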
Typical Surveillance Setup
[Pipeline: surveillance video stream → object detection → feature extraction → object tracking → behaviour analysis; with frame-size reduction, frame-rate reduction, and a background model]
Model adaptability: learning rate (α)
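A minimal sketch of the exponential update behind the learning rate α (the standard MOG-style mean update; the observation values are illustrative):

```python
# Exponential update of a Gaussian mode's mean with learning rate alpha:
# mu <- (1 - alpha) * mu + alpha * x
def adapt(mu, observations, alpha):
    for x in observations:
        mu = (1 - alpha) * mu + alpha * x
    return mu

# The pixel jumps from 100 to 150 and stays there for 50 frames.
obs = [150] * 50
print(adapt(100.0, obs, 0.1))    # fast rate: mean nearly reaches 150
print(adapt(100.0, obs, 0.001))  # slow rate: mean lags far behind
```

The slow rate leaves the model stuck near the old background, which is the ghosting behaviour examined in the scenarios below.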
Scenario 1
Test Sequence: PETS2001_D1TeC2
[Result grid: detections for T ∈ {0.4, 0.6, 0.8} × α ∈ {0.1, 0.01, 0.001}, alongside the first frame, test frame, and ground truth]
α = learning rate, T = background data proportion
Scenario 2
Test Sequence: VSSN06_camera1
[Result grid: detections for T ∈ {0.4, 0.6, 0.8} × α ∈ {0.1, 0.01, 0.001}, alongside the first frame, test frame, and ground truth]
α = learning rate, T = background data proportion
Scenario 3
Test Sequence: CAVIAR_EnterExitCrossingPaths2cor
[Result grid: detections for T ∈ {0.4, 0.6, 0.8} × α ∈ {0.1, 0.01, 0.001}, alongside the first frame, test frame, and ground truth]
α = learning rate, T = background data proportion
Observation Summary
• A slow learning rate (α) is not preferable (ghost or black-out effects).
• Simple post-processing will not improve detection quality at a fast learning rate (α).
• The context behaviour needs to be known in advance.
How can we detect abnormal situations?
“Hey, a mob will be approaching soon, and the background will be visible for only 10% of that duration. Please set T = 0.1”
Research Goals
• A new object detection technique for unconstrained environments, i.e., no context-dependent information (no T)
• Better detection quality at a fast learning rate (α)
• Better stability across learning rates (α)
The New Technique
• Pixel-based
• MOG for environment modelling
• Incorporating human perceptual characteristics
in the underlying background model
– Model Reference Point
– Model Extent
Model Reference Point
Per-pixel mixture (as before): road (ω₁, µ₁, σ₁²) at 65%, shadow (ω₂, µ₂, σ₂²) at 20%, car (ω₃, µ₃, σ₃²) at 15%; models are ordered by ω/σ
[Figure: classical reference point, the Gaussian mean µ, on P(x)]
[Figure: new reference point b, the most recent observation, on P(x)]
Model Reference Point
[Figure: P(x) with reference point b]
• Higher agility than using the mean
• Not tied to the learning rate
• Realistic: b is an actual observed intensity value, not an artificial value produced by averaging
[Figure: over time, the mean µ lags while b tracks the most recent observation]
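The agility claim can be illustrated numerically: under a gradual intensity drift, the exponentially updated mean lags while b, the most recent matched observation, does not. The drift rate and α below are illustrative:

```python
# Classical reference (Gaussian mean) vs. the proposed reference point b
# on a pixel whose intensity drifts upward (e.g. gradual illumination change).
alpha = 0.01
mu, b = 100.0, 100.0
for t in range(100):
    x = 100.0 + 0.5 * t                 # intensity drifts by 0.5 per frame
    mu = (1 - alpha) * mu + alpha * x   # the mean trails the drift
    b = x                               # b tracks the latest observation exactly

print(b - mu)  # the mean lags b by tens of intensity levels
```

This is why b is not tied to the learning rate: it is replaced, not averaged.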
Model Reference Point
Per-pixel mixture augmented with the new reference point: road (ω₁, µ₁, σ₁², b₁) at 65%, shadow (ω₂, µ₂, σ₂², b₂) at 20%, car (ω₃, µ₃, σ₃², b₃) at 15%; models are ordered by ω/σ
[Figure: each mode's distribution P(x) shown with its reference point b]
Model Extent
[Figure: classical extent around the mean µ, x = Kσ, on P(x)]
[Figure: extent around the reference point b, x = ?, on P(x)]
Per-pixel mixture: road (ω₁, µ₁, σ₁², b₁) at 65%, shadow (ω₂, µ₂, σ₂², b₂) at 20%, car (ω₃, µ₃, σ₃², b₃) at 15%; models are ordered by ω/σ
Model Extent
[Figure: classical extent x = Kσ around the mean µ on P(x)]
• Depends on the model standard deviation
• The model standard deviation is in turn tied to the learning rate
• Low detection sensitivity during the initial age of a model
• High detection sensitivity in stationary regions
• Adverse consequences:
  – Redundant models introduced
  – Precious models dropped
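A sketch of the sensitivity problem with the classical extent: as the variance of a mode in a stationary region decays, the Kσ extent collapses and ordinary noise falls outside it. K, α, and the values below are illustrative:

```python
import math

# Classical MOG match test: observation x matches a mode when it lies
# within K standard deviations of the mean (K = 2.5 is a common choice).
def matches(x, mu, sigma, K=2.5):
    return abs(x - mu) < K * sigma

# In a stationary region the squared residuals are near zero, so the
# variance update sigma^2 <- (1-alpha)*sigma^2 + alpha*residual^2 decays.
sigma = 8.0
alpha = 0.05
for _ in range(200):
    sigma = math.sqrt((1 - alpha) * sigma**2 + alpha * 0.0)

# A 3-level noise blip matched the young model but not the aged one.
print(matches(103, 100, 8.0), matches(103, 100, sigma))
```

Once noise is flagged as foreground, a redundant mode is spawned for it, potentially evicting a precious existing mode.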
Model Extent
[Figure: extent x = ? around the reference point b on P(x)]
How is x related to b?
[Figure: low x vs. high x]
Human Visual Perception
Acquisition → Compression → Processing → Transmission → Reproduction
Why are images distorted?
[Figure: System 1 and System 2 produce distorted images from a reference image]
How is distortion measured?
PSNR = 20 log₁₀(255 / RMSE) dB
[Figure: two distorted outputs with quality x dB and y dB]
If |x − y| < 0.5 dB, the difference is not perceivable by the human visual system.
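The PSNR formula and the 0.5 dB rule in code (the RMSE values are illustrative):

```python
import math

# PSNR of an 8-bit image given its RMSE: PSNR = 20 * log10(255 / RMSE).
def psnr(rmse):
    return 20 * math.log10(255 / rmse)

# Two distorted images whose PSNRs differ by less than 0.5 dB are,
# per the threshold used in the talk, indistinguishable to the eye.
x, y = psnr(5.0), psnr(5.2)
print(abs(x - y) < 0.5)  # True: below the 0.5 dB perceptual threshold
```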
Human Visual Perception
Our Problem
[Figure: System 1 and System 2 produce distorted images from a reference image]
How is the range determined?
Treating b as the reference and x as the observation, with RMSE = 1 as the minimal distortion:
|20 log₁₀(255/1) − 20 log₁₀(255/|x − b|)| < 0.5 dB
⇒ the difference is not perceivable by the human visual system
[Figure: P(x) with reference point b and extent x = ?]
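A sketch of the resulting match test, under the assumption (read from the slide's formula) that the reference term uses the minimal distortion RMSE = 1; the function name is hypothetical:

```python
import math

def perceptual_match(x, b, threshold_db=0.5):
    """Is observation x within the perceptual extent of reference point b?
    Compares the PSNR of distortion |x - b| against that of the minimal
    distortion (RMSE = 1); a gap below threshold_db is not perceivable."""
    d = max(abs(x - b), 1)  # clamp to the minimal distortion, avoids log(0)
    ref_db = 20 * math.log10(255 / 1)
    obs_db = 20 * math.log10(255 / d)
    return abs(ref_db - obs_db) < threshold_db

print(perceptual_match(101, 100), perceptual_match(110, 100))
```

Note that under these assumptions the admissible deviation is only 10^(h/20) intensity levels for a threshold of h dB, i.e. very tight, which is precisely the sensitivity concern raised on the next slide.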
Are we designing an artificial human eye?
• It’s a computer/machine vision application.
• Isn’t 0.5 dB too sensitive to envelope shadow, reflection, and noise?

Impact of Human Perceptual Threshold
[Result grid: first frame, test frame, ground truth, and detections at thresholds of 0.5 dB, 0.75 dB, 1.0 dB, and 2.0 dB]
Summary of the Technique
• Pixel-based
• Environment modelling: MOG
• New variable in MOG: the most recent observation
• Detection phase:
  – No Gaussian mean as the reference; the most recent observation is the reference
  – No Gaussian variance for computing the model extent; the extent is based on a human-perceivable threshold
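Putting the detection phase together as a sketch; the mode structure, names, and values here are assumptions for illustration, not the authors' implementation:

```python
import math

# One pixel's mixture modes: (weight, reference point b). The detection
# phase uses b, not the Gaussian mean, and a perceptual threshold in dB
# instead of a K*sigma extent.
modes = [
    {"w": 0.65, "b": 100},  # road
    {"w": 0.20, "b": 60},   # shadow
    {"w": 0.15, "b": 180},  # car
]

def within_extent(x, b, threshold_db=0.5):
    # Perceptually indistinguishable from the reference point b?
    d = max(abs(x - b), 1)
    return abs(20 * math.log10(255 / 1) - 20 * math.log10(255 / d)) < threshold_db

def is_foreground(x, modes):
    # Foreground when the observation matches no existing mode's extent.
    return not any(within_extent(x, m["b"]) for m in modes)

print(is_foreground(100, modes), is_foreground(140, modes))  # False True
```

No T appears anywhere in the test, which is the context-independence claimed in the research goals.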
Experiments
Test Sequences: 50 in total, from 8 different sources
PETS (9), Wallflower (7), UCF (7), IBM (11), CAVIAR (7), VSSN06 (7), Other (2)
Scenario distribution: indoor, outdoor, multimodal, shadow and reflection, low background-foreground contrast
Evaluation: qualitative and quantitative, against Stauffer and Grimson (PAMI, 2000) and Lee (PAMI, 2005)
Metrics: False Positive (FP), False Negative (FN), False Classification
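The FP/FN metrics can be computed per pixel against the ground-truth mask; a small sketch with toy masks:

```python
import numpy as np

# Pixel-level evaluation against ground truth: a false positive is a
# background pixel labelled foreground, a false negative the reverse.
detected = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0]], dtype=bool)
truth    = np.array([[1, 1, 1, 0],
                     [0, 1, 0, 0]], dtype=bool)

fp = int(np.sum(detected & ~truth))   # spurious foreground pixels
fn = int(np.sum(~detected & truth))   # missed foreground pixels
false_classification = fp + fn        # total misclassified pixels
print(fp, fn, false_classification)   # 1 1 2
```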
Visual Comparison
Quantitative Analysis
ROC: S&G
ROC: Lee
ROC: Proposed Technique
PDR: S&G vs. Proposed Technique (α = 0.1)
PDR: Proposed Technique
PDR: S&G (T = 0.6)
Instability (ALL)
Performance Matrix (ALL)
Research Summary
• A new object detection technique
• Context-independent
• Higher stability
• Higher agility (fast learning rate)
• Future directions:
  – Multimodal scenarios
  – Even higher detection quality via a multilevel approach
Image Source
http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg