Download - CVPR2007 DistVisionProc Chapter3 1 - pervasive.uni-klu.ac.at · Wireless Smart Camera 2. Smart Camera for Active Vision III. Distributed Vision Algorithms 1. ... Localization, epipolar

1

Distributed Vision Processing Distributed Vision Processing in Smart Camera Networksin Smart Camera Networks

CVPRCVPR--0707

Hamid Aghajan, Stanford University, USAFrançois Berry, Univ. Blaise Pascal, FranceHorst Bischof, TU Graz, AustriaRichard Kleihorst, NXP Research, NetherlandsBernhard Rinner, Klagenfurt University, AustriaWayne Wolf, Princeton University, USA

March 18, 2007Minneapolis, USA

Course Website – http://wsnl.stanford.edu/cvpr07/index.php

CVPR 2007 Short Course 2Distributed Vision Processing in Smart Camera Networks

OutlineOutline

I. IntroductionII. Smart Camera Architectures

1. Wireless Smart Camera2. Smart Camera for Active Vision

III. Distributed Vision Algorithms1. Fusion Mechanisms2. Vision Network Algorithms

IV. Requirements and Case StudiesV. Outlook

2

Distributed Vision Processing Distributed Vision Processing in Smart Camera Networksin Smart Camera Networks

CVPRCVPR--0707

CHAPTER III:

Distributed Vision Algorithms

Hamid Aghajan, Wayne Wolf


Fusion MechanismsFusion Mechanismsin Distributed Vision Networksin Distributed Vision Networks

Contact: Hamid Aghajan

hamid <AT> WSNL.Stanford.EduJoint work with: Chen Wu, Chung-Ching Chang

3


Smart Camera NetworksSmart Camera NetworksTime

Space (Views)

Feature Levels

Time

Space (Views)

Feature Levels

Space (views)• Overcome ambiguities, occlusions• Enhance estimate robustness

Time• Increase confidence level of estimates• Detection of key frames

Feature levels• Exchange of features with other nodes across algorithmic layers

Fusion Dimensions


Fusion MechanismsFusion MechanismsFeature fusion:

Use of multiple, complementary features within a camera node

Spatial fusion:Localization, epipolar geometry, ROI and feature matchingValidation of estimates by checking consistency, outlier removal3D reconstructionTemporal fusion:

Local interpolation / smoothing of estimatesExchange of updates via spatial fusionSpatiotemporal estimate smoothing and prediction

Model-based fusion:3D human body reconstruction, human gesture analysisFeedback to in-node feature extraction

Decision fusion:Estimates based on soft decisionsAdequate features in own observationsCost, latency of communication

Key features and key frames:

Information assisting other nodes

4


Layered Spatial CollaborationLayered Spatial Collaboration

Description Layer 1 :Images

Description Layer 2 :Features

Description Layer 3 : Gesture Elements

Description Layer 4 : Gestures

Decision Layer 1 : within a single camera

Decision Layer 2 : collaboration between

cameras


cameras

R1 R2 R3

F1 F2 F3f12f11 f21

f22f31

f32

E1 E2 E3

G

Description Layers Decision Layers

Opportunistic data fusion

Fusion of features within a single camera

Fusion based on collaboration among multiple cameras

Feature-based fusion

Soft decision fusion

Final decision

Case Study: Human Gesture Analysis

In-node feature extraction

Security

Gaming

Smart Presentation

Accident Detection


Layered Spatial CollaborationLayered Spatial Collaboration

Description Layer 1 :Images

Description Layer 2 :Features

Description Layer 3 : Gesture Elements

Description Layer 4 : Gestures

Decision Layer 1 : within a single camera


cameras


cameras

R1 R2 R3

F1 F2 F3f12f11 f21

f22f31

f32

E1 E2 E3

G

Description Layers Decision Layers

Opportunistic data fusion

Fusion of features within a single camera

Fusion based on collaboration among multiple cameras

Feature-based fusion

Soft decision fusion

Final decision

Case Study: Human Gesture Analysis

In-node feature extraction

Security

Gaming

Smart Presentation

Accident Detection

• Mutual reasoning:- Joint estimation

• Assisted reasoning:- Estimate validation- Key feature exchange

• Self reasoning:- In-node feature extraction

5


Fusion MechanismsFusion Mechanisms

Mutual Reasoning

• Spatial fusion• Spatiotemporal fusion

• Model-based• Active vision• Feedback

• Feature fusion• Temporal fusion

Self Reasoning

Assisted ReasoningExchange of key features


The Big PictureThe Big Picture

6


ModelModel--based Fusionbased Fusion

Vision Reasoning /Interpretations

Human Model

KinematicsAttributesStates

• Motivation to build a human model:A concise reference for merging information from camerasOffers flexibility for interpretation in different applications:

• Various gesture interpretation applicationsAllows recreation of body gesture in virtual domain

• Viewing angles to body not available from any of the camerasAllows employment of active vision methods:

• Focus on what is important• Develop more detail in time

Helps address privacy concerns in various applications


ModelModel--based Fusionbased Fusion•Approach:

Exchange segments and attributes, combine to reconstruct a 3D modelSubject’s information mapped and maintained in the model:•Geometric configuration: dimensions, lengths, angles•Color / texture / motion of different segments

1θ

2θ

3θ

4θ1ϕ

2ϕ

3ϕ

4ϕ

x

y

z

O

ellipses ellipses ellipses

CAM1 CAM2 CAM3

•Advantages:Employ higher level of in-node processingExchange descriptions only relevant to modelAffordable communication for multi-camera collaborationInitialization for active vision in nodes:•Provides color (other feature) distributions for rough segmentation•Helps with body part tracking (motion flow)•Offers hint on what is important to look for in images

7


Data FlowData Flow

local processing routines

interface

in out

Cam 1


interface

in out

Cam 2


interface

in out

Cam 3

2D attribute descriptions



The collaboration routine

in out

3D model parameters


Use of FeedbackUse of Feedback

CAM 1

CAM 2

CAM N

Merge informationProject / decompose to each image plane again

CAM 1

CAM 2

CAM N

3D description

Gesture extraction

Gestures

1e

2e

3e

CAM 1

CAM 2

CAM N

Feedback• Initialize in-node feature extraction• Active vision (focus on what is important)

8


Feature Fusion: Optical Flow and ColorFeature Fusion: Optical Flow and ColorUse of complementary features

• Edge and color• Color and motion

Combine pixel-based and region-based methods


ModelModel--based Human Posture Reconstructionbased Human Posture Reconstruction

Background subtraction

Rough segmentation

EM: refine color models

Watershed segmentation

Color segmentation and ellipse fitting in local processing

Ellipse fitting

Generate test configurations

Score test configurations

Update each test configuration using

PSO

Combine 3 views to get 3D skeleton geometric configuration

Update 3D model (color/texture,

motion)

3D human body model

Maintain current model

Previous color distribution Previous geometric

configuration and motion

Local processing from other cameras

Check stop criteria

N

Y

Segmentation function: Single camera

Model fitting function: Collaborative

From model, or via k-means

Refine color models (Perceptually Organized)

Or other morphological method with constraints

Concise description of segments

Goodness of ellipse fits to segments

Projection on image planes

E.g. parameters for the upper body (arms)

Feedback

9


InIn--Node Feature Fusion for SegmentationNode Feature Fusion for Segmentation


Collaborative Model FittingCollaborative Model Fitting

Frame 1

Frame 28

Frame 70

Frame 81

Frame 105

Frame 148

10


Collaborative Model FittingCollaborative Model FittingFrame 105


Collaborative Model FittingCollaborative Model Fitting

11


Spatial FusionSpatial Fusion• Geometric fusion

• Mutual reasoning– Joint estimation– Joint refinement– Decision fusion

• Assisted reasoning– Estimate validation– Key frame exchange

– Making correspondences– Tracking– Reconstruction of 3D models– Camera network calibration – Use of epipolar geometry to:

•Feature matching•Outlier removal•ROI mapping between camera views

-100-50

0

X

Y

Z

Y

Z

X

Z

Y

X

Mapped to an ellipsoidCamera 1

(Training set)Camera 3(Test set)

Face Orientation Estimation• Color and geometry-based method• Spatial / temporal validation method


Color and Geometry FusionColor and Geometry FusionFace orientation analysis

In-node feature extraction by fusion of Color and GeometryApply position constraints for the eyes when thresholding Cb/Cr

12


Color and Geometry FusionColor and Geometry Fusion

100 200 300 100 200 300 100 200 300

100 200 300

50

00

50

00

100 200 300

50

100

150

200

100 200 300

50

100

150

200

50

00

50

00

50

100

150

200

50

100

150

200

Fals

e fa

ce

cand

idat

eFa

lse

eye

cand

idat

es

Epipolar lines for false candidates

• Use geometry of cameras to:– Match features– Remove false feature candidates

Face orientation analysisFeature matching with epipolar geometry

x1 x2

X

baseline

Epipolarline for x1

l’x1 x2

X

baseline

Epipolarline for x1

l’

An Example of Mutual Reasoning


Feature FusionFeature Fusion• Level of features for fusion between cameras?

– Features are typically dense fields• Edge points, motion vectors

– They are locally fused to derive descriptions (sparse)• Descriptions are exchanged

– Valuable features may be exchanged as dense descriptors• Communication cost issues need to be considered

• Key features and key frames allow selective sharing of dense features

Low-level features

High-level features

Low-level descriptions

High-level descriptions

Processing within a single camera

Collaboration between cameras

Features (single camera) or descriptions (shared)

Dense

Sparse

13


Key FramesKey Frames• Frames with high confidence estimates

– Node with key frame observation broadcasts derived information

– Other nodes use them to refine their local estimates


Temporal FusionTemporal Fusion• Use key frames to re-initialize local face angle estimate

– Use angle estimates close to zero (frontal view)

• Aims to limit error propagation in time– Use optical flow to locally track angle changes between frames– Interpolate between two key frames to limit optical flow error propagation

Key frame Key frame

Interpolate orientation estimation between key frames by optical

flow estimation with a forward-backward probabilistic model

Cameras initialize face angles

Cameras initialize face angles

Cameras interpolate face angles between key frames using local optical flow

Key frames

Local optical flow is used to track face angle between key frames

14


Spatial / Temporal ValidationSpatial / Temporal Validation• Estimates between key frames are

corrected by:• Temporal smoothing (one camera)• Outlier removal (multiple cameras)

• Can this be done more effectively?Spatiotemporal filtering

Key frames

Temporal smoothing

Spatial smoothing


Spatiotemporal FusionSpatiotemporal Fusion

Opportunistic creation of face profile

0 5 10 15 20 25 30 35 40-200

0

200

degr

ee

0 10 20 5045 62 70 80 85 110 120 130-10-20-27-40-50-65-70-80-95-100-105-125-140 30

B C D E F G H I J KA M N O P Q R S T U VL X Y ZW

0 5 10 15 20 25 30 35 40-150

-100

-50

0

50

100

150True orientation

frame

degr

ee

Right Profile

Left Profile

Query result examples: side profiles

Mapped to ellipsoid

15


Spatiotemporal FusionSpatiotemporal Fusion


Decision FusionDecision Fusion• Smart home care network for fall detection

Accelerometer Signal Classifier

State1 State2 State3

Decision Making Process

State0Trigger Image Analysis

Camera 1 Camera 2 Camera 3

Analysis Analysis Analysis

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80

20

40

60

80

100

120

140

160

180

Acce

lero

met

er A

mpl

itude

Cha

nge

Duration (s)

Falling versus Sitting Down

Sitting DownFalling

State0=1State0=3

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80

20

40

60

80

100

120

140

160

180

Acce

lero

met

er A

mpl

itude

Cha

nge

Duration (s)

Falling versus Sitting Down

Sitting DownFalling

State0=1State0=3

No Report (Safe)

Report All Useful Data

(Possible Hazard)

Report (Hazard)

50 100 150 200 250 3000

50

100

150

200

250Accelerometer Signal for Falling and Sitting Down

Time (s)

Sign

al L

evel

x axisy axisz axis

FallSitting Down

50 100 150 200 250 3000

50

100

150

200

250Accelerometer Signal for Falling and Sitting Down

Time (s)

Sign

al L

evel

x axisy axisz axis

FallSitting Down

States are combined as soft decisions to create a report

Joint work with: Arezou Keshavarz, Ali Maleki-Tabar

16


Decision FusionDecision FusionW

ait for More O

bservations

-Initialize Search Space for 3D

Model

-Validate Model


Decision FusionDecision Fusion

17





18


Virtual PlacementVirtual PlacementCollaborativeFace Analysis

Feature Fusion Ellipse Fitting

In-node processing Model-based Spatiotemporal

Fusion


Event InterpretationEvent InterpretationVision Reasoning /

Interpretations

Human Model

KinematicsAttributesStates

Interpretation levels

Behavior analysis

Instantaneous action

Low-levelfeatures

AI reasoning

Posture / attributes

Model parameters

AI

Vision Processing

QueriesContextPersistenceBehavior attributes

Feedback

19


Event InterpretationEvent Interpretation


ReferencesReferences• C. Wu and H. Aghajan, Model-based Human Posture Estimation for Gesture Analysis in an Opportunistic

Fusion Smart Camera Network, Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.

• C. Chang and H. Aghajan, A LQR Spatiotemporal Fusion Technique for Face Profile Collection in Smart Camera Surveillance, Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.

• C. Chang and H. Aghajan, Spatiotemporal Fusion Framework for Multi-Camera Face Orientation Analysis, Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.

• C. Wu and H. Aghajan, Model-based Image Segmentation for Multi-View Human Gesture Analysis, Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.

• Hamid Aghajan and Chen Wu, Layered and Collaborative Gesture Analysis in Multi-Camera Networks, Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2007.

• Chen Wu and Hamid Aghajan, Opportunistic Feature Fusion-based Segmentation for Human Gesture Analysis in Vision Networks, IEEE SPS-DARTS, March 2007.

• Chen Wu and Hamid Aghajan, Collaborative Gesture Analysis in Multi-Camera Networks, ACM SenSysWorkshop on Distributed Smart Cameras (DSC), Oct. 2006.

• Chung-Ching Chang and Hamid Aghajan, Collaborative Face Orientation Detection in Wireless Image Sensor Networks, ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.

• A. Maleki-Tabar, A. Keshavarz, H. Aghajan, “Smart Home Care Network using Sensor Fusion and Distributed Vision-Based Reasoning”, ACM Multimedia Workshop On Video Surveillance and Sensor Networks (VSSN), Oct. 2006.

• A. Keshavarz, A. Maleki-Tabar, H. Aghajan, “Distributed Vision-Based Reasoning for Smart Home Care”, ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.

http://wsnl.stanford.edu/publications.php

20


OutlineOutline

I. IntroductionII. Smart Camera Architectures

1. Wireless Smart Camera2. Smart Camera for Active Vision

III. Distributed Vision Algorithms1. Fusion Mechanisms2. Vision Network Algorithms

IV. Requirements and Case StudiesV. Outlook