Distributed Vision Processing in Smart Camera Networks
CVPR-07
Hamid Aghajan, Stanford University, USA
François Berry, Univ. Blaise Pascal, France
Horst Bischof, TU Graz, Austria
Richard Kleihorst, NXP Research, Netherlands
Bernhard Rinner, Klagenfurt University, Austria
Wayne Wolf, Princeton University, USA
March 18, 2007
Minneapolis, USA
Course Website – http://wsnl.stanford.edu/cvpr07/index.php
CVPR 2007 Short Course: Distributed Vision Processing in Smart Camera Networks
Outline
I. Introduction
II. Smart Camera Architectures
   1. Wireless Smart Camera
   2. Smart Camera for Active Vision
III. Distributed Vision Algorithms
   1. Fusion Mechanisms
   2. Vision Network Algorithms
IV. Requirements and Case Studies
V. Outlook
CHAPTER III:
Distributed Vision Algorithms
Hamid Aghajan, Wayne Wolf
Fusion Mechanisms in Distributed Vision Networks
Contact: Hamid Aghajan
hamid <AT> WSNL.Stanford.Edu
Joint work with: Chen Wu, Chung-Ching Chang
Smart Camera Networks: Fusion Dimensions
[Figure: three fusion axes: time, space (views), and feature levels]
Space (views)
• Overcome ambiguities, occlusions
• Enhance estimate robustness
Time
• Increase confidence level of estimates
• Detection of key frames
Feature levels
• Exchange of features with other nodes across algorithmic layers
Fusion Mechanisms
Feature fusion:
• Use of multiple, complementary features within a camera node
Spatial fusion:
• Localization, epipolar geometry, ROI and feature matching
• Validation of estimates by checking consistency, outlier removal
• 3D reconstruction
Temporal fusion:
• Local interpolation / smoothing of estimates
• Exchange of updates via spatial fusion
• Spatiotemporal estimate smoothing and prediction
Model-based fusion:
• 3D human body reconstruction, human gesture analysis
• Feedback to in-node feature extraction
Decision fusion:
• Estimates based on soft decisions
• Adequate features in own observations
• Cost, latency of communication
Key features and key frames:
• Information assisting other nodes
Layered Spatial Collaboration
Case Study: Human Gesture Analysis
Description layers:
• Description Layer 1: Images (R1, R2, R3)
• Description Layer 2: Features (F1, F2, F3, with per-camera features f11, f12, f21, f22, f31, f32)
• Description Layer 3: Gesture Elements (E1, E2, E3)
• Description Layer 4: Gestures (G)
Decision layers:
• Decision Layer 1: within a single camera
• Decision Layer 2: collaboration between cameras
• Decision Layer 3: collaboration between cameras
Opportunistic data fusion:
• In-node feature extraction
• Fusion of features within a single camera
• Fusion based on collaboration among multiple cameras
• Feature-based fusion
• Soft decision fusion
• Final decision
Applications: Security, Gaming, Smart Presentation, Accident Detection
• Mutual reasoning:
  – Joint estimation
• Assisted reasoning:
  – Estimate validation
  – Key feature exchange
• Self reasoning:
  – In-node feature extraction
Fusion Mechanisms
• Mutual reasoning: spatial fusion, spatiotemporal fusion
• Self reasoning: feature fusion, temporal fusion
• Assisted reasoning: exchange of key features
• Model-based fusion, active vision, feedback
The Big Picture
Model-based Fusion
[Figure: a human model (kinematics, attributes, states) linked to vision reasoning / interpretations]
Motivation to build a human model:
• A concise reference for merging information from cameras
• Offers flexibility for interpretation in different applications
  – Various gesture interpretation applications
• Allows recreation of body gesture in the virtual domain
  – Viewing angles to the body not available from any of the cameras
• Allows employment of active vision methods
  – Focus on what is important
  – Develop more detail in time
• Helps address privacy concerns in various applications
Model-based Fusion
Approach:
• Exchange segments and attributes, combine to reconstruct a 3D model
• Subject's information mapped and maintained in the model:
  – Geometric configuration: dimensions, lengths, angles
  – Color / texture / motion of different segments
[Figure: 3D skeleton model in the (x, y, z) frame with origin O and joint angles θ1 to θ4 and ϕ1 to ϕ4, built from ellipses extracted at CAM1, CAM2, and CAM3]
Advantages:
• Employ higher level of in-node processing
• Exchange descriptions only relevant to model
• Affordable communication for multi-camera collaboration
• Initialization for active vision in nodes:
  – Provides color (other feature) distributions for rough segmentation
  – Helps with body part tracking (motion flow)
  – Offers hint on what is important to look for in images
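As a rough illustration of why exchanging ellipse descriptions is cheap, the sketch below summarizes a segmented body part by five numbers computed from second-order pixel moments. This is one common way to fit an ellipse to a region and is an assumption here, not necessarily the fitting method used by the authors; the function name `fit_ellipse` and the dictionary layout are hypothetical.

```python
import math

def fit_ellipse(pixels):
    """Summarize a segment (list of (x, y) pixels) as an ellipse via image moments.

    Hypothetical helper: the slides do not specify the fitting method.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Central second-order moments (covariance of the pixel coordinates).
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix give the spread along each axis.
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = (sxx + syy) / 2 + spread
    lam_minor = max((sxx + syy) / 2 - spread, 0.0)
    # For a uniformly filled ellipse, variance along an axis is (semi-axis)^2 / 4.
    return {
        "center": (cx, cy),
        "semi_major": 2 * math.sqrt(lam_major),
        "semi_minor": 2 * math.sqrt(lam_minor),
        "angle": 0.5 * math.atan2(2 * sxy, sxx - syy),  # radians from the x axis
    }

# Example: a horizontal bar of pixels, 21 wide and 5 tall.
bar = [(x, y) for x in range(21) for y in range(5)]
descriptor = fit_ellipse(bar)
```

Each segment is then represented by a center, two axis lengths, and an orientation, which is far smaller than the segment's pixel mask; color or motion attributes could be attached to the same record.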
Data Flow
[Figure: Cam 1, Cam 2, and Cam 3 each run local processing routines behind an in/out interface and send 2D attribute descriptions to the collaboration routine, which outputs 3D model parameters]
Use of Feedback
[Figure: information from CAM 1, CAM 2, ..., CAM N is merged into a 3D description, from which gestures are extracted; the 3D description is projected / decomposed to each image plane again (e1, e2, e3) and fed back to CAM 1, CAM 2, ..., CAM N]
Feedback:
• Initialize in-node feature extraction
• Active vision (focus on what is important)
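The "project to each image plane" step can be sketched with an ideal pinhole camera model. The snippet below is a minimal illustration assuming known camera extrinsics (rotation and translation) and intrinsics (focal length in pixels, principal point); lens distortion is ignored, and the function name `project_point` is hypothetical.

```python
def project_point(point_3d, rotation, translation, focal_length, principal_point):
    """Project a 3D world point into one camera using an ideal pinhole model.

    rotation is a 3x3 nested list, translation a length-3 list (world -> camera).
    Returns (u, v) in pixels, or None if the point lies behind the camera.
    """
    # Camera coordinates: Xc = R * X + t
    xc = [
        sum(rotation[i][j] * point_3d[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if xc[2] <= 0:
        return None
    u = focal_length * xc[0] / xc[2] + principal_point[0]
    v = focal_length * xc[1] / xc[2] + principal_point[1]
    return (u, v)

# Example: a model joint 4 units in front of a camera aligned with the world frame.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point([1.0, 2.0, 4.0], identity, [0.0, 0.0, 0.0], 100.0, (160.0, 120.0))
```

A node receiving the projected joint positions could use them to place a region of interest before running its own feature extraction, which is one way to read "focus on what is important".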
Feature Fusion: Optical Flow and Color
• Use of complementary features
  – Edge and color
  – Color and motion
• Combine pixel-based and region-based methods
Model-based Human Posture Reconstruction
Segmentation function (single camera): color segmentation and ellipse fitting in local processing
1. Background subtraction
2. Rough segmentation (from model, using the previous color distribution, or via k-means)
3. EM: refine color models (Perceptually Organized)
4. Watershed segmentation (or other morphological method with constraints)
5. Ellipse fitting (concise description of segments)
Model fitting function (collaborative): combine 3 views to get 3D skeleton geometric configuration
1. Generate test configurations (from previous geometric configuration and motion), e.g. parameters for the upper body (arms)
2. Score test configurations (projection on image planes; goodness of ellipse fits to segments), using local processing from other cameras
3. Update each test configuration using PSO
4. Check stop criteria (N: repeat; Y: continue)
5. Update 3D model (color/texture, motion)
3D human body model:
• Maintain current model
• Feedback to the segmentation function
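The generate / score / update loop above can be sketched with a basic particle swarm optimizer (PSO). The code below is a generic textbook PSO under stated assumptions: the cost function stands in for the "goodness of ellipse fits" score, the inertia and acceleration constants are common default choices, and a fixed iteration count replaces the unspecified stop criteria. None of these values come from the slides.

```python
import random

def pso_minimize(cost, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize cost(params) over box bounds with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Generate test configurations uniformly inside the search space.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]  # score each test configuration
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    for _ in range(n_iters):  # fixed budget stands in for the stop criteria
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost

# Example: recover two hypothetical joint angles (radians) from a toy cost.
target = (0.3, -0.5)
toy_cost = lambda p: (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
angles, err = pso_minimize(toy_cost, [(-3.0, 3.0), (-3.0, 3.0)])
```

In the collaborative setting, the cost would be evaluated by projecting each candidate 3D configuration onto the image planes and comparing against the ellipses reported by each camera; the toy quadratic here only demonstrates the search itself.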
In-Node Feature Fusion for Segmentation
Collaborative Model Fitting
[Figure: model fitting results at frames 1, 28, 70, 81, 105, and 148]
Collaborative Model Fitting
[Figure: model fitting result at frame 105]
Spatial Fusion
• Geometric fusion
  – Making correspondences
  – Tracking
  – Reconstruction of 3D models
  – Camera network calibration
  – Use of epipolar geometry for:
    • Feature matching
    • Outlier removal
    • ROI mapping between camera views
• Mutual reasoning
  – Joint estimation
  – Joint refinement
  – Decision fusion
• Assisted reasoning
  – Estimate validation
  – Key frame exchange
Face Orientation Estimation
• Color and geometry-based method
• Spatial / temporal validation method
[Figure: face images from Camera 1 (training set) and Camera 3 (test set) mapped to an ellipsoid, shown in X, Y, Z coordinates]
Color and Geometry Fusion
Face orientation analysis
• In-node feature extraction by fusion of color and geometry
• Apply position constraints for the eyes when thresholding Cb/Cr
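A minimal sketch of combining a chrominance threshold with a position constraint follows. The RGB to Cb/Cr conversion uses the standard full-range BT.601 (JPEG) coefficients; the skin ranges for Cb and Cr are commonly quoted illustrative values, and the rule "eyes are non-skin pixels in the upper half of the face box" is an assumed stand-in for the position constraints mentioned above. The function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Full-range BT.601 chrominance (as used in JPEG) for 8-bit RGB."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Illustrative skin test; the ranges are assumptions, not from the slides."""
    cb, cr = rgb_to_cbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]

def eye_candidates(image, face_box):
    """Return non-skin pixels (x, y) in the upper half of face_box.

    image is indexed as image[y][x] -> (r, g, b); face_box is (x0, y0, x1, y1)
    with exclusive upper bounds.
    """
    x0, y0, x1, y1 = face_box
    y_mid = (y0 + y1) // 2  # position constraint: eyes lie above the midline
    return [
        (x, y)
        for y in range(y0, y_mid)
        for x in range(x0, x1)
        if not is_skin(*image[y][x])
    ]

# Example: a 4x4 skin-colored patch with one dark pixel above the midline
# and one below it; only the upper one is kept as an eye candidate.
skin, dark = (200, 150, 120), (20, 20, 20)
image = [[skin] * 4 for _ in range(4)]
image[1][1] = dark
image[3][2] = dark
candidates = eye_candidates(image, (0, 0, 4, 4))
```

The geometric constraint removes the dark pixel in the lower half of the face, which a color threshold alone would have accepted.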
Color and Geometry Fusion
Face orientation analysis: feature matching with epipolar geometry (an example of mutual reasoning)
• Use geometry of cameras to:
  – Match features
  – Remove false feature candidates
[Figure: camera images showing a false face candidate, false eye candidates, and the epipolar lines for the false candidates]
[Figure: epipolar geometry: a 3D point X projects to x1 and x2 in two views joined by the baseline; l' is the epipolar line for x1 in the second view]
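The epipolar check can be written compactly: for a fundamental matrix F relating view 1 to view 2, a point x1 in view 1 maps to the line l' = F x1 in view 2, and a true match x2 satisfies x2ᵀ F x1 = 0. The sketch below assumes F is already known from calibration; the distance threshold `max_dist` and the function names are hypothetical.

```python
import math

def epipolar_line(F, x1):
    """Return l' = F * x1 as (a, b, c), the line a*u + b*v + c = 0 in view 2."""
    p = (x1[0], x1[1], 1.0)  # homogeneous pixel coordinates
    return tuple(sum(F[i][j] * p[j] for j in range(3)) for i in range(3))

def distance_to_line(line, x2):
    """Perpendicular pixel distance from point x2 to the line (a, b, c)."""
    a, b, c = line
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)

def filter_candidates(F, x1, candidates, max_dist=2.0):
    """Keep only view-2 candidates lying near the epipolar line of x1."""
    line = epipolar_line(F, x1)
    return [x2 for x2 in candidates if distance_to_line(line, x2) <= max_dist]

# Example: a rectified pair (pure horizontal translation), for which matching
# points share the same image row.
F_rectified = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
kept = filter_candidates(F_rectified, (50.0, 40.0), [(70.0, 40.5), (30.0, 90.0)])
```

In the example, the candidate half a pixel off the line survives and the candidate 50 pixels away is discarded as a false feature.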
Feature Fusion
• Level of features for fusion between cameras?
  – Features are typically dense fields
    • Edge points, motion vectors
  – They are locally fused to derive descriptions (sparse)
    • Descriptions are exchanged
  – Valuable features may be exchanged as dense descriptors
    • Communication cost issues need to be considered
• Key features and key frames allow selective sharing of dense features
[Figure: processing within a single camera goes from low-level features to high-level features; collaboration between cameras goes from low-level descriptions to high-level descriptions. Axes: features (single camera) or descriptions (shared); dense to sparse]
Key Frames
• Frames with high confidence estimates
  – Node with key frame observation broadcasts derived information
  – Other nodes use them to refine their local estimates
Temporal Fusion
• Use key frames to re-initialize local face angle estimate
  – Use angle estimates close to zero (frontal view)
• Aims to limit error propagation in time
  – Use optical flow to locally track angle changes between frames
  – Interpolate between two key frames to limit optical flow error propagation
• Interpolate orientation estimation between key frames by optical flow estimation with a forward-backward probabilistic model
[Figure: timeline with two key frames; cameras initialize face angles at each key frame and interpolate face angles between key frames using local optical flow, which is used to track face angle between key frames]
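The idea of anchoring the estimate at both key frames can be sketched as follows. This is a deliberately simplified stand-in for the forward-backward probabilistic model named above: it integrates the per-frame angle changes forward from the first key frame and backward from the second, then blends the two tracks with linear weights. The linear weighting and the function name are assumptions.

```python
def interpolate_between_key_frames(angle_start, angle_end, flow_deltas):
    """Blend forward and backward angle tracks between two key frames.

    flow_deltas[k] is the angle change (degrees) estimated by optical flow
    from frame k to frame k + 1, so n deltas span n + 1 frames.
    """
    n = len(flow_deltas)
    if n == 0:
        raise ValueError("need at least one inter-frame delta")
    # Forward pass: accumulate deltas starting from the first key frame.
    forward = [angle_start]
    for d in flow_deltas:
        forward.append(forward[-1] + d)
    # Backward pass: subtract deltas starting from the second key frame.
    backward = [angle_end]
    for d in reversed(flow_deltas):
        backward.append(backward[-1] - d)
    backward.reverse()
    # Trust the forward track near the first key frame, the backward track
    # near the second; the result matches both key frames exactly.
    return [((n - k) * forward[k] + k * backward[k]) / n for k in range(n + 1)]

# Example: optical flow overestimates each step (3 degrees instead of 2.5),
# so pure forward tracking would drift to 12 degrees instead of 10.
angles = interpolate_between_key_frames(0.0, 10.0, [3.0, 3.0, 3.0, 3.0])
```

With both ends pinned, the accumulated optical flow drift is spread across the interval instead of growing toward the end.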
Spatial / Temporal Validation
• Estimates between key frames are corrected by:
  – Temporal smoothing (one camera)
  – Outlier removal (multiple cameras)
• Can this be done more effectively? Spatiotemporal filtering
[Figure: key frames, temporal smoothing, spatial smoothing]
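The two corrections can be sketched separately: a moving average along time within one camera, and a median-based outlier rejection across cameras at one time instant. Both are generic choices assumed for illustration; the window size, the deviation threshold `max_dev`, and the assumption that all cameras report angles in a common reference frame (with no wrap-around handling) are not from the slides.

```python
import statistics

def temporal_smooth(estimates, window=3):
    """Moving average over one camera's angle estimates (shorter at the ends)."""
    half = window // 2
    smoothed = []
    for k in range(len(estimates)):
        lo, hi = max(0, k - half), min(len(estimates), k + half + 1)
        smoothed.append(sum(estimates[lo:hi]) / (hi - lo))
    return smoothed

def spatial_fuse(camera_estimates, max_dev=15.0):
    """Average the cameras' estimates after dropping those far from the median."""
    med = statistics.median(camera_estimates)
    inliers = [a for a in camera_estimates if abs(a - med) <= max_dev]
    if not inliers:  # cameras disagree completely: fall back to the median
        return med
    return sum(inliers) / len(inliers)

# Example: one camera's track, and three cameras observing the same frame
# where the third reports an outlier.
track = temporal_smooth([0.0, 3.0, 6.0])
fused = spatial_fuse([10.0, 12.0, 80.0])
```

Running the two steps independently is the baseline; a joint spatiotemporal filter would replace both with a single estimator.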
Spatiotemporal Fusion
Opportunistic creation of face profile
[Figure: estimated and true face orientation (degrees) versus frame; face images labeled A to Z indexed by orientation angle from -140 to 130 degrees, mapped to an ellipsoid]
Query result examples: side profiles (right profile, left profile)
Decision Fusion
• Smart home care network for fall detection
Joint work with: Arezou Keshavarz, Ali Maleki-Tabar
[Figure: an accelerometer signal classifier produces State0 and triggers image analysis; Camera 1, Camera 2, and Camera 3 each run an analysis producing State1, State2, and State3; all states feed the decision making process]
[Figure: "Accelerometer Signal for Falling and Sitting Down": signal level versus time (s) for the x, y, and z axes, with the fall and sitting-down events marked]
[Figure: "Falling versus Sitting Down": accelerometer amplitude change versus duration (s), with regions labeled State0=1 and State0=3]
States are combined as soft decisions to create a report:
• No report (safe)
• Report all useful data (possible hazard)
• Report (hazard)
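One way to picture combining the states as soft decisions is a weighted average of per-sensor hazard scores followed by two thresholds that select among the three report types. The score encoding (a value in [0, 1] per state), the weights, and the thresholds below are all assumptions for illustration; the slides do not specify the actual combination rule.

```python
def fuse_soft_decisions(states, weights=None, low=0.3, high=0.7):
    """Combine per-sensor hazard scores in [0, 1] into one of three reports.

    states: e.g. [State0 (accelerometer), State1, State2, State3 (cameras)],
    each encoded here as a hazard likelihood (an assumed encoding).
    """
    if weights is None:
        weights = [1.0] * len(states)
    score = sum(w * s for w, s in zip(weights, states)) / sum(weights)
    if score >= high:
        return "report (hazard)", score
    if score >= low:
        return "report all useful data (possible hazard)", score
    return "no report (safe)", score

# Example: the accelerometer strongly suggests a fall but the cameras are
# less certain, so the fused decision defers to a human reviewer.
decision, score = fuse_soft_decisions([0.9, 0.2, 0.5, 0.4])
```

Keeping the intermediate "possible hazard" outcome means ambiguous cases are escalated with their supporting data rather than forced into a hard yes or no.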
Decision Fusion
• Wait for more observations
• Initialize search space for 3D model
• Validate model
Virtual Placement
[Figure labels: collaborative face analysis; feature fusion; ellipse fitting; in-node processing; model-based; spatiotemporal fusion]
Event Interpretation
[Figure: vision processing maintains a human model (kinematics, attributes, states); AI reasoning produces vision reasoning / interpretations and returns feedback (queries, context, persistence, behavior attributes) to vision processing]
Interpretation levels:
• Low-level features
• Model parameters
• Posture / attributes
• Instantaneous action
• Behavior analysis
References
• C. Wu and H. Aghajan, "Model-based Human Posture Estimation for Gesture Analysis in an Opportunistic Fusion Smart Camera Network", Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
• C. Chang and H. Aghajan, "A LQR Spatiotemporal Fusion Technique for Face Profile Collection in Smart Camera Surveillance", Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
• C. Chang and H. Aghajan, "Spatiotemporal Fusion Framework for Multi-Camera Face Orientation Analysis", Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.
• C. Wu and H. Aghajan, "Model-based Image Segmentation for Multi-View Human Gesture Analysis", Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.
• H. Aghajan and C. Wu, "Layered and Collaborative Gesture Analysis in Multi-Camera Networks", Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2007.
• C. Wu and H. Aghajan, "Opportunistic Feature Fusion-based Segmentation for Human Gesture Analysis in Vision Networks", IEEE SPS-DARTS, March 2007.
• C. Wu and H. Aghajan, "Collaborative Gesture Analysis in Multi-Camera Networks", ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
• C. Chang and H. Aghajan, "Collaborative Face Orientation Detection in Wireless Image Sensor Networks", ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
• A. Maleki-Tabar, A. Keshavarz, H. Aghajan, "Smart Home Care Network using Sensor Fusion and Distributed Vision-Based Reasoning", ACM Multimedia Workshop on Video Surveillance and Sensor Networks (VSSN), Oct. 2006.
• A. Keshavarz, A. Maleki-Tabar, H. Aghajan, "Distributed Vision-Based Reasoning for Smart Home Care", ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
http://wsnl.stanford.edu/publications.php