Talk 2009-monash-seminar-perception
Mahfuzul Haque
Manzur Murshed
Manoranjan Paul
Object Detection Based on Human Visual Perception
Object Detection for Real-time Surveillance Applications
[Figure: input video frame → detected object output]
Object Detection: Applications
• Intelligent visual surveillance
– Event Detection
– Tracking
– Behaviour Analysis
– Activity Recognition
• Remote sensing
• Traffic monitoring
• Context-aware applications
[Pipeline: Object Detection → Feature Extraction → Behaviour Analysis]
Object Detection: How?
Background Modelling

Basic Background Subtraction (BBS)
Current frame − Background = Detected object
[Figure: current frame and background model → detected object]

Challenges with BBS
• Illumination variation
• Local background motion
• Camera displacement
• Shadow and reflection
⇒ Not a practical approach
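The BBS equation above amounts to a per-pixel difference and threshold. A minimal numpy sketch, with illustrative frame values and threshold:

```python
import numpy as np

# Toy 4x4 grayscale background and current frame (illustrative values).
background = np.full((4, 4), 100, dtype=np.int16)
frame = background.copy()
frame[1:3, 1:3] = 180  # a bright "object" enters the scene

# BBS: pixels whose absolute difference from the background
# exceeds a fixed threshold are labelled foreground.
THRESHOLD = 30
foreground_mask = np.abs(frame - background) > THRESHOLD

print(foreground_mask.sum())  # 4 object pixels detected
```

Every challenge listed above (illumination change, background motion, camera shake, shadows) shifts pixel values without an object being present, which is why a single static background frame is not practical.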
Typical Surveillance Setup
[Pipeline: surveillance video stream → object detection → feature extraction → object tracking → behaviour analysis; with frame-size reduction, frame-rate reduction, and a background model]
State-of-the-art
• Region- and texture-based approaches
• Shape-based approaches
• Predictive modelling
• Model initialization approaches
• Nonparametric background modelling
• Stationary foreground detection
• Pixel-based approaches
Terminology used in the literature: environment modelling, background subtraction, background modelling, background maintenance, foreground detection, moving foreground detection, object detection, moving object detection
State-of-the-art
• Hierarchical (Zhong et al., ICPR, 2008)
• Type-2 Fuzzy MOG (Baf et al., LNCS, 2008)
• Cascaded Classifiers (Chen et al., WMVS, 2007)
• Gaussian Mixture Model with SVM (Zhang et al., THS, 2007)
• Generalized Gaussian Mixture Model (Allili et al., CRV, 2007)
• Bayesian Formation (Lee, PAMI, 2005)
• Gaussian Mixture Model (Stauffer et al., PAMI, 2000)
• Gaussian Mixture Model (Stauffer et al., CVPR, 1999)
• Single Gaussian Model (Wren et al., PAMI, 1997)
Pixel-based Background Modelling
[Figure: example surveillance scenes with labelled processes: sky, cloud, leaf, moving person, road, shadow, moving car, floor, walking people]
[Figure: per-pixel intensity distributions P(x), each a Gaussian with mean µ and variance σ²; one mode per process (sky, cloud, person, leaf) along the pixel-intensity axis x]
Background Modelling
[Figure: frames 1 … N of a scene in which road, shadow, and car appear over time; current frame and background model → detected object]
How do we identify the background models?
Per-pixel mixture of three Gaussians (illustrative):
• Mode 1 (road): ω₁ = 65%, µ₁, σ₁²
• Mode 2 (shadow): ω₂ = 20%, µ₂, σ₂²
• Mode 3 (car): ω₃ = 15%, µ₃, σ₃²
Models are ordered by ω/σ
Context information (T): the proportion of observed data assumed to be background
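The ω/σ ordering and the role of T can be sketched as follows. The weights follow the slide's 65/20/15 example; the standard deviations are assumed for illustration:

```python
import numpy as np

# Three Gaussian modes for one pixel: weights from the slide's example,
# standard deviations chosen for illustration (road, shadow, car).
weights = np.array([0.65, 0.20, 0.15])
sigmas  = np.array([4.0, 6.0, 12.0])

# Modes are ranked by w/sigma: high weight and low variance
# indicate a stable background process.
order = np.argsort(weights / sigmas)[::-1]

# The first B top-ranked modes whose cumulative weight reaches T
# are treated as background; the rest are foreground candidates.
T = 0.6
cum = np.cumsum(weights[order])
B = int(np.searchsorted(cum, T)) + 1
print(order[:B])  # indices of the modes treated as background
```

With T = 0.6 only the dominant "road" mode counts as background; raising T to 0.8 would also admit the "shadow" mode, which is exactly why T encodes context.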
Typical Surveillance Setup
[Pipeline: surveillance video stream → object detection → feature extraction → object tracking → behaviour analysis; with frame-size reduction, frame-rate reduction, and a background model]
Model adaptability: learning rate (α)
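A minimal sketch of the exponential update behind the learning rate α (the standard MOG-style mean update; the observation values are illustrative):

```python
# Exponential update of a Gaussian mode's mean with learning rate alpha:
# mu <- (1 - alpha) * mu + alpha * x
def adapt(mu, observations, alpha):
    for x in observations:
        mu = (1 - alpha) * mu + alpha * x
    return mu

# The pixel jumps from 100 to 150 and stays there for 50 frames.
obs = [150] * 50
print(adapt(100.0, obs, 0.1))    # fast rate: mean nearly reaches 150
print(adapt(100.0, obs, 0.001))  # slow rate: mean lags far behind
```

The slow rate leaves the model stuck near the old background, which is the ghosting behaviour examined in the scenarios below.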
Scenario 1
Test Sequence: PETS2001_D1TeC2
[Result grid: detections for T ∈ {0.4, 0.6, 0.8} × α ∈ {0.1, 0.01, 0.001}, alongside the first frame, test frame, and ground truth]
α = learning rate, T = background data proportion
Scenario 2
Test Sequence: VSSN06_camera1
[Result grid: detections for T ∈ {0.4, 0.6, 0.8} × α ∈ {0.1, 0.01, 0.001}, alongside the first frame, test frame, and ground truth]
α = learning rate, T = background data proportion
Scenario 3
Test Sequence: CAVIAR_EnterExitCrossingPaths2cor
[Result grid: detections for T ∈ {0.4, 0.6, 0.8} × α ∈ {0.1, 0.01, 0.001}, alongside the first frame, test frame, and ground truth]
α = learning rate, T = background data proportion
Observation Summary
• A slow learning rate (α) is not preferable (ghost or black-out effects).
• Simple post-processing will not improve detection quality at a fast learning rate (α).
• The context behaviour needs to be known in advance.
How can we detect abnormal situations?
“Hey, a mob will be approaching soon, and the background will be visible for only 10% of that duration. Please set T = 0.1”
Research Goals
• A new object detection technique for unconstrained environments, i.e., no context-dependent information (no T)
• Better detection quality at a fast learning rate (α)
• Better stability across learning rates (α)
The New Technique
• Pixel-based
• MOG for environment modelling
• Incorporating human perceptual characteristics
in the underlying background model
– Model Reference Point
– Model Extent
Model Reference Point
Per-pixel mixture (as before): road (ω₁, µ₁, σ₁²) at 65%, shadow (ω₂, µ₂, σ₂²) at 20%, car (ω₃, µ₃, σ₃²) at 15%; models are ordered by ω/σ
[Figure: classical reference point, the Gaussian mean µ, on P(x)]
[Figure: new reference point b, the most recent observation, on P(x)]
Model Reference Point
[Figure: P(x) with reference point b]
• Higher agility than using the mean
• Not tied to the learning rate
• Realistic: b is an actual observed intensity value, not an artificial value produced by averaging
[Figure: over time, the mean µ lags while b tracks the most recent observation]
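The agility claim can be illustrated numerically: under a gradual intensity drift, the exponentially updated mean lags while b, the most recent matched observation, does not. The drift rate and α below are illustrative:

```python
# Classical reference (Gaussian mean) vs. the proposed reference point b
# on a pixel whose intensity drifts upward (e.g. gradual illumination change).
alpha = 0.01
mu, b = 100.0, 100.0
for t in range(100):
    x = 100.0 + 0.5 * t                 # intensity drifts by 0.5 per frame
    mu = (1 - alpha) * mu + alpha * x   # the mean trails the drift
    b = x                               # b tracks the latest observation exactly

print(b - mu)  # the mean lags b by tens of intensity levels
```

This is why b is not tied to the learning rate: it is replaced, not averaged.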
Model Reference Point
Per-pixel mixture augmented with the new reference point: road (ω₁, µ₁, σ₁², b₁) at 65%, shadow (ω₂, µ₂, σ₂², b₂) at 20%, car (ω₃, µ₃, σ₃², b₃) at 15%; models are ordered by ω/σ
[Figure: each mode's distribution P(x) shown with its reference point b]
Model Extent
[Figure: classical extent around the mean µ, x = Kσ, on P(x)]
[Figure: extent around the reference point b, x = ?, on P(x)]
Per-pixel mixture: road (ω₁, µ₁, σ₁², b₁) at 65%, shadow (ω₂, µ₂, σ₂², b₂) at 20%, car (ω₃, µ₃, σ₃², b₃) at 15%; models are ordered by ω/σ
Model Extent
[Figure: classical extent x = Kσ around the mean µ on P(x)]
• Depends on the model standard deviation
• The model standard deviation is in turn tied to the learning rate
• Low detection sensitivity during the initial age of a model
• High detection sensitivity in stationary regions
• Adverse consequences:
  – Redundant models introduced
  – Precious models dropped
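A sketch of the sensitivity problem with the classical extent: as the variance of a mode in a stationary region decays, the Kσ extent collapses and ordinary noise falls outside it. K, α, and the values below are illustrative:

```python
import math

# Classical MOG match test: observation x matches a mode when it lies
# within K standard deviations of the mean (K = 2.5 is a common choice).
def matches(x, mu, sigma, K=2.5):
    return abs(x - mu) < K * sigma

# In a stationary region the squared residuals are near zero, so the
# variance update sigma^2 <- (1-alpha)*sigma^2 + alpha*residual^2 decays.
sigma = 8.0
alpha = 0.05
for _ in range(200):
    sigma = math.sqrt((1 - alpha) * sigma**2 + alpha * 0.0)

# A 3-level noise blip matched the young model but not the aged one.
print(matches(103, 100, 8.0), matches(103, 100, sigma))
```

Once noise is flagged as foreground, a redundant mode is spawned for it, potentially evicting a precious existing mode.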
Model Extent
[Figure: extent x = ? around the reference point b on P(x)]
How is x related to b?
[Figure: low x vs. high x]
Human Visual Perception
Acquisition → Compression → Processing → Transmission → Reproduction
Why are images distorted?
[Figure: System 1 and System 2 produce distorted images from a reference image]
How is distortion measured?
PSNR = 20 log₁₀(255 / RMSE) dB
[Figure: two distorted outputs with quality x dB and y dB]
If |x − y| < 0.5 dB, the difference is not perceivable by the human visual system.
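The PSNR formula and the 0.5 dB rule in code (the RMSE values are illustrative):

```python
import math

# PSNR of an 8-bit image given its RMSE: PSNR = 20 * log10(255 / RMSE).
def psnr(rmse):
    return 20 * math.log10(255 / rmse)

# Two distorted images whose PSNRs differ by less than 0.5 dB are,
# per the threshold used in the talk, indistinguishable to the eye.
x, y = psnr(5.0), psnr(5.2)
print(abs(x - y) < 0.5)  # True: below the 0.5 dB perceptual threshold
```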
Human Visual Perception
Our Problem
[Figure: System 1 and System 2 produce distorted images from a reference image]
How is the range determined?
Treating b as the reference and x as the observation, with RMSE = 1 as the minimal distortion:
|20 log₁₀(255/1) − 20 log₁₀(255/|x − b|)| < 0.5 dB
⇒ the difference is not perceivable by the human visual system
[Figure: P(x) with reference point b and extent x = ?]
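A sketch of the resulting match test, under the assumption (read from the slide's formula) that the reference term uses the minimal distortion RMSE = 1; the function name is hypothetical:

```python
import math

def perceptual_match(x, b, threshold_db=0.5):
    """Is observation x within the perceptual extent of reference point b?
    Compares the PSNR of distortion |x - b| against that of the minimal
    distortion (RMSE = 1); a gap below threshold_db is not perceivable."""
    d = max(abs(x - b), 1)  # clamp to the minimal distortion, avoids log(0)
    ref_db = 20 * math.log10(255 / 1)
    obs_db = 20 * math.log10(255 / d)
    return abs(ref_db - obs_db) < threshold_db

print(perceptual_match(101, 100), perceptual_match(110, 100))
```

Note that under these assumptions the admissible deviation is only 10^(h/20) intensity levels for a threshold of h dB, i.e. very tight, which is precisely the sensitivity concern raised on the next slide.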
Are we designing an artificial human eye?
• It’s a computer/machine vision application.
• Isn’t 0.5 dB too sensitive to envelope shadow, reflection, and noise?

Impact of Human Perceptual Threshold
[Result grid: first frame, test frame, ground truth, and detections at thresholds of 0.5 dB, 0.75 dB, 1.0 dB, and 2.0 dB]
Summary of the Technique
• Pixel-based
• Environment modelling: MOG
• New variable in MOG: the most recent observation
• Detection phase:
  – No Gaussian mean as the reference; the most recent observation is the reference
  – No Gaussian variance for computing the model extent; the extent is based on a human-perceivable threshold
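Putting the detection phase together as a sketch; the mode structure, names, and values here are assumptions for illustration, not the authors' implementation:

```python
import math

# One pixel's mixture modes: (weight, reference point b). The detection
# phase uses b, not the Gaussian mean, and a perceptual threshold in dB
# instead of a K*sigma extent.
modes = [
    {"w": 0.65, "b": 100},  # road
    {"w": 0.20, "b": 60},   # shadow
    {"w": 0.15, "b": 180},  # car
]

def within_extent(x, b, threshold_db=0.5):
    # Perceptually indistinguishable from the reference point b?
    d = max(abs(x - b), 1)
    return abs(20 * math.log10(255 / 1) - 20 * math.log10(255 / d)) < threshold_db

def is_foreground(x, modes):
    # Foreground when the observation matches no existing mode's extent.
    return not any(within_extent(x, m["b"]) for m in modes)

print(is_foreground(100, modes), is_foreground(140, modes))  # False True
```

No T appears anywhere in the test, which is the context-independence claimed in the research goals.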
Experiments
Test Sequences: 50 in total, from 8 different sources
PETS (9), Wallflower (7), UCF (7), IBM (11), CAVIAR (7), VSSN06 (7), Other (2)
Scenario distribution: indoor, outdoor, multimodal, shadow and reflection, low background-foreground contrast
Evaluation: qualitative and quantitative, against Stauffer and Grimson (PAMI, 2000) and Lee (PAMI, 2005)
Metrics: False Positive (FP), False Negative (FN), False Classification
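The FP/FN metrics can be computed per pixel against the ground-truth mask; a small sketch with toy masks:

```python
import numpy as np

# Pixel-level evaluation against ground truth: a false positive is a
# background pixel labelled foreground, a false negative the reverse.
detected = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0]], dtype=bool)
truth    = np.array([[1, 1, 1, 0],
                     [0, 1, 0, 0]], dtype=bool)

fp = int(np.sum(detected & ~truth))   # spurious foreground pixels
fn = int(np.sum(~detected & truth))   # missed foreground pixels
false_classification = fp + fn        # total misclassified pixels
print(fp, fn, false_classification)   # 1 1 2
```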
Visual Comparison
Quantitative Analysis
ROC: S&G
ROC: Lee
ROC: Proposed Technique
PDR: S&G vs. Proposed Technique (α = 0.1)
PDR: Proposed Technique
PDR: S&G (T = 0.6)
Instability (ALL)
Performance Matrix (ALL)
Research Summary
• A new object detection technique
• Context-independent
• Higher stability
• Higher agility (fast learning rate)
• Future directions:
  – Multimodal scenarios
  – Even higher detection quality via a multilevel approach
Image Source
http://www.inkycircus.com/photos/uncategorized/2007/04/25/eye.jpg