Human Gesture Recognition by Mohamed Bécha Kaâniche
1
Human Gesture Recognition
by
Mohamed Bécha Kaâniche, 11/02/2009
2
Outline
Introduction
State of the art
Proposed Method
Human Gesture Descriptor
Human Gesture Learning and Classification
Preliminary results
Conclusion
3
Introduction
Human Gesture Recognition ??
Human Gesture ??
Gesture Recognition ??
4
Introduction (2)
What is a Gesture ?
Any meaningful movement of the human body !
To convey information or to interact with environment !
[Pei 1984] identifies 700 000 non-verbal signals !
[Birdwhistell 1963] estimates 250 000 Face expressions !
[Krout 1935] identifies 5000 Hand gestures !
Gesture signification differs widely from one culture to another !
Synchronous with speech, gaze, expressions !
According to [Hall 1973] 65% of communication is non-verbal !
Non-verbal: gesture, appearance, voice, chronemics, haptics !
5
Introduction (3)
Gesture
  Dynamics
    Conscious: Emblems, Illustrators, Affect displays, Regulators
    Unconscious: Adaptors
  Statics
6
Introduction (4)
What kind of gesture recognition ?
Identify, and possibly interpret, human gestures automatically !
Use a set of sensors and electronic processing units !
According to the type of sensors we distinguish:
Pen-based gesture recognition
Multi-touch surface based gesture recognition
Tracker-based gesture recognition
Instrumented gloves, Wii remote control,…
Body suits.
Vision-based gesture recognition
7
Introduction (5)
Vision-based gesture recognition ?
Advantages: Passive, non-obtrusive and « low-cost ».
Challenges:
Efficiency: real-time constraints.
Robustness: background/foreground changes.
Occlusion: Change of the point of view, self-occlusion,…
Categories:
Head/Face gesture recognition
Hand/arm gesture recognition
Body gesture recognition
8
State of the art
Vision-based Gesture Recognition System
Sensor Processing → Feature Extraction → Gesture Classification → Recognized Gesture
(the Gesture Classification stage queries a Gesture Database)
9
State of the art (2)
Issues:
Number of cameras: mono/multi cameras, stereo/multi-view ?
Speed and latency: fast enough, with low enough latency, for interaction.
Structured environment: background, lighting, motion speed.
User requirements: clothes, body markers, glasses, beard,…
Primary features: edges, regions, silhouettes, moments, histograms.
2D/3D representation.
Time representation: how the temporal aspect of gestures is represented.
10
State of the art (3)
Gesture Models
3D models: 3D textured volumetric models, 3D geometric models, 3D skeleton models
Appearance models: color based models, shape geometry based models, 2D deformable template models, motion based models
11
State of the art (4)
Techniques of gesture recognition
For postures:
Linear classifiers (e.g. k-means)
Non-linear classifiers (e.g. neural networks)
For gestures:
Hidden Markov Models
Dynamic Time Warping
Time Delay Neural Networks
Finite State Machines, Dynamic Bayesian Networks, PNF
12
State of the art (5)
About motion-model-based approaches:
Automata-based recognition:
A very complex and difficult process !
Unreliable in a monocular environment !
Computationally expensive !
Motion models integrate the time aspect in the gesture model. (A single model!)
Techniques dedicated to posture recognition can be used.
Early Methods: Optical flow, Motion History (MHI, 3D-MHM)
[Calderara 2008] proposes a global descriptor: action
signature.
[Liu 2008] proposes local descriptors: cuboids.
13
Proposed Method
Hypotheses
Monocular environment.
Dedicated to isolated individuals. (For implementation reasons)
No restrictions on the environment or the clothes of targets.
Distinguishable body parts: targets far from the camera are not handled.
Sensor processing algorithms are assumed to be provided !
Availability of a segmentation algorithm.
Availability of a people classifier.
Availability of a people tracker. (For implementation reasons)
14
Proposed Method (2)
Type of gestures and actions to recognize
15
Proposed Method (3)
Method Overview
Sensor Processing → Local Motion Descriptors Extraction → Gesture Classification → Recognized Gesture
(the Gesture Classification stage queries the Gesture Codebook)
16
Proposed Method (4)
Local Motion Descriptors Extraction
From sensor processing → Corners Extraction → 2D HoG Descriptors Computation → 2D HoG Descriptors → Tracker → Local Motion Descriptors
17
Proposed Method (5)
Gesture Codebook Learning
Training video sequences → Sensor Processing → Local Motion Descriptors Extraction → Clustering → Code-words → Gesture Codebook
(sequence annotations label the training sequences)
18
Human Gesture Descriptor
Steps for Human Gesture Descriptor Generation:
Corners Detection
Find interest points where the motion can be easily tracked.
Ensure a uniform distribution of features across the body.
2D HoG Descriptors Extraction
For each interest point compute a 2D HoG descriptor.
Local Motion Descriptors Computation
Track 2D HoG descriptors to build local motion descriptors.
Gesture descriptor computation: match the local motion descriptors with the learned code-words.
19
Human Gesture Descriptor (2)
Corners detection:
Shi-Tomasi features:
Given an image I and its gradients g_x and g_y along the x axis and the y axis,
the Harris matrix for an image pixel, summed over a window of size (u, v), is:

  H(u, v) = Σ_{u,v} [ g_x²     g_x·g_y ]
                    [ g_x·g_y  g_y²    ]

[Shi 1994] show that min(λ₁, λ₂) is a better measure of corner strength than
the measure proposed by the Harris detector, where λ₁ and λ₂ are the
eigenvalues of the Harris matrix.
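As a minimal sketch (not from the slides), the Shi-Tomasi measure can be computed directly: build the 2×2 Harris matrix from a window's gradients, then return the smaller eigenvalue. The function name and the sample gradient values are illustrative.

```python
import math

def shi_tomasi_score(gx, gy):
    """Shi-Tomasi corner strength min(lambda1, lambda2) for one window.

    gx, gy: lists of x- and y-gradients over the pixels of the window.
    """
    a = sum(g * g for g in gx)               # sum of gx^2
    b = sum(x * y for x, y in zip(gx, gy))   # sum of gx*gy
    c = sum(g * g for g in gy)               # sum of gy^2
    # Eigenvalues of the symmetric 2x2 Harris matrix [[a, b], [b, c]]
    mean = (a + c) / 2.0
    delta = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return min(mean + delta, mean - delta)

# A window with strong gradients in both directions scores high (corner-like),
# while gradients along a single direction score 0 (edge-like).
corner = shi_tomasi_score([1, -1, 2, -2], [2, 1, -1, -2])  # 7.0
edge = shi_tomasi_score([1, 2, 1, 2], [0, 0, 0, 0])        # 0.0
```

This is exactly why the minimum eigenvalue is a good corner measure: an edge yields one large and one near-zero eigenvalue, so min(λ₁, λ₂) rejects it.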
20
Human Gesture Descriptor (3)
Corners detection (cont’d):
FAST features (Features from Accelerated Segment Test) :
21
Human Gesture Descriptor (4)
2D HoG Descriptor:
Descriptor block (3×3 cells), built around the corner point.
Each cell is 5×5 or 7×7 pixels.
22
Human Gesture Descriptor (5)
2D HoG Descriptor (cont’d):
For each pixel (u, v) in the descriptor block we compute the gradient
magnitude and orientation:

  g(u, v) = √( g_x(u, v)² + g_y(u, v)² )   and   θ(u, v) = tan⁻¹( g_y(u, v) / g_x(u, v) )

For each cell c_ij in the descriptor block we compute the orientation
histogram:

  f_ij = [ f_ij(β) ], β = 1..K   with   f_ij(β) = Σ_{(u,v) ∈ c_ij, bin(θ(u,v)) = β} g(u, v)

where K is the number of orientation bins.
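A minimal sketch of the per-cell histogram above, assuming gradients are given as (g_x, g_y) pairs and orientations are binned uniformly over [0, 2π); the function name is illustrative:

```python
import math

def cell_histogram(grads, K):
    """Orientation histogram f_ij for one HoG cell.

    grads: list of (gx, gy) gradient pairs, one per pixel of the cell.
    K: number of orientation bins over [0, 2*pi).
    """
    f = [0.0] * K
    for gx, gy in grads:
        mag = math.sqrt(gx * gx + gy * gy)           # g(u, v)
        theta = math.atan2(gy, gx) % (2 * math.pi)   # theta(u, v) in [0, 2*pi)
        b = min(int(theta / (2 * math.pi / K)), K - 1)  # bin(theta(u, v))
        f[b] += mag   # magnitude-weighted vote
    return f

# Two unit gradients, one pointing along x and one along y, fall into
# two different bins of a 4-bin histogram.
hist = cell_histogram([(1, 0), (0, 1)], 4)  # [1.0, 1.0, 0.0, 0.0]
```

Weighting each vote by the gradient magnitude, as in the formula, makes strong edges dominate the histogram while near-flat pixels contribute little.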
23
Human Gesture Descriptor (6)
2D HoG Descriptor (cont’d):
The 2D HoG descriptor associated with the descriptor block is:

  d = [ n_ij ], i, j = 1..3   where   n_ij = f_ij / ε

and ε is a normalisation coefficient defined as:

  ε = Σ_{i=1}^{3} Σ_{j=1}^{3} Σ_{β=1}^{K} f_ij(β)

The dimension of d is 9 × K and its component values are in [0, 1].
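The normalisation step can be sketched as follows (illustrative helper, assuming the 3×3 cell histograms are already computed): dividing every bin by the block-wide sum ε guarantees all 9×K components lie in [0, 1] and sum to 1.

```python
def hog_descriptor(cell_hists):
    """2D HoG descriptor for a 3x3 block of cell histograms.

    cell_hists: 3x3 nested list of K-bin histograms f_ij.
    Returns the flattened, normalised descriptor d (9*K values in [0, 1]).
    """
    # epsilon: sum of all bin values over the whole block
    eps = sum(v for row in cell_hists for f in row for v in f)
    if eps == 0:  # flat block: no gradient energy anywhere
        return [0.0 for row in cell_hists for f in row for _ in f]
    return [v / eps for row in cell_hists for f in row for v in f]

# Nine identical 2-bin histograms produce an 18-dimensional descriptor
# whose components sum to 1.
d = hog_descriptor([[[1.0, 1.0] for _ in range(3)] for _ in range(3)])
```

Global (block-level) normalisation like this makes the descriptor invariant to uniform changes in gradient magnitude, e.g. global illumination scaling.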
24
Human Gesture Descriptor (7)
Local Motion Descriptor:
Track each 2D HoG descriptor with the least-squares method using a
Kalman filter:
1. Initialization (t = 0, first frame):
Compute new 2D HoG descriptors.
For each of them, associate its position « x₀ » and initialize
the error tolerance « P₀ » (2×2 covariance matrix).
2. Prediction (t > 0):
For each 2D HoG descriptor in the last frame,
use the Kalman filter to predict the position of the
descriptor « x̂ₜ⁻ », which is taken as the search center.
25
Human Gesture Descriptor (8)
Local Motion Descriptor (cont’d):
3. Correction (t > 0): Locate the 2D HoG descriptor in the current frame (in the
neighborhood of the predicted position x̂ₜ⁻) by using its real position
(measurement, obtained by minimizing the squared error) to carry out the position
correction with the Kalman filter, yielding the final estimate x̂ₜ.
Steps 2 and 3 are repeated while the tracking runs.
For a 2D HoG descriptor tracked successfully over a
temporal window, the Local Motion Descriptor is the concatenation
of all the values of the descriptors in this temporal window.
26
Human Gesture Descriptor (9)
Local Motion Descriptor (cont’d):
Time Update (Predict):
(1) Project the position ahead:        x̂ₜ⁻ = f(x̂ₜ₋₁, vₜ)
(2) Project the error covariance:      Pₜ⁻ = Fₜ,ₜ₋₁ · Pₜ₋₁ · Fₜ,ₜ₋₁ᵀ + Qₜ

Measurement Update (Correct):
(1) Compute the Kalman gain:           Gₜ = Pₜ⁻ · Hᵀ · (H · Pₜ⁻ · Hᵀ + Rₜ)⁻¹
(2) Update estimate with measurement:  x̂ₜ = x̂ₜ⁻ + Gₜ · (yₜ − h(x̂ₜ⁻, dₜ))
(3) Update the error covariance:       Pₜ = (I − Gₜ · H) · Pₜ⁻

with the process and measurement models:
  xₜ = f(xₜ₋₁, vₜ) + qₜ₋₁
  yₜ = h(xₜ, dₜ) + rₜ
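The predict/correct cycle above can be sketched with a minimal scalar Kalman filter (the slides track a 2-D position with a 2×2 covariance P; the 1-D constant-position reduction below, with illustrative noise values q and r, keeps the same structure):

```python
def kalman_step(x, P, y, q=1e-3, r=1e-1):
    """One predict/correct cycle of a minimal 1-D Kalman filter.

    x, P: previous position estimate and error variance.
    y:    new measurement (the matched descriptor's real position).
    q, r: process and measurement noise variances (illustrative values).
    """
    # Time update (predict): constant-position model, so F = H = 1
    x_pred = x          # project the position ahead
    P_pred = P + q      # project the error covariance
    # Measurement update (correct)
    G = P_pred / (P_pred + r)          # Kalman gain
    x_new = x_pred + G * (y - x_pred)  # update estimate with measurement
    P_new = (1 - G) * P_pred           # update the error covariance
    return x_new, P_new

# Starting from a very uncertain estimate (P = 1), one measurement at 1.0
# pulls the estimate most of the way there and shrinks the variance.
x, P = kalman_step(0.0, 1.0, 1.0)
```

With high prior uncertainty the gain G is close to 1, so the filter trusts the measurement; as P shrinks over repeated steps, the prediction is trusted more, which is what smooths the tracked descriptor trajectories.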
27
Human Gesture Learning and Classification
Gesture Learning
Training video sequences → Sensor Processing → Local Motion Descriptors Extraction → K-means Clustering → Code-words → Gesture Codebook
(sequence annotations label the training sequences)
28
Human Gesture Learning and Classification (2)
Gesture Learning (cont’d):
k-means: classify the generated local descriptors (for all gestures) into « k » clusters.
Let « n » be the number of generated local descriptors,
and « m » the number of gestures in the training set; then:

  T · m ≤ k ≤ n

where « T » is a parameter (a strictly positive integer) which can be fixed empirically or learned with an Expectation-Maximization (EM) algorithm.
Minimize the total intra-cluster variance (the squared error function):

  V = Σ_{i=1}^{k} Σ_{x_j ∈ c_i} (x_j − μ_i)²
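A minimal k-means sketch, assuming scalar descriptors for brevity (the real code-words are 9×K-dimensional HoG vectors; the function and parameter names are illustrative):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on 1-D points: alternate assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # initialise centers from the data
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        # Update step: each center moves to its cluster's mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two well-separated groups of descriptors recover two code-words
# near 0.1 and 10.1.
codewords = kmeans([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], 2)
```

Each iteration can only decrease the intra-cluster variance V defined above, so the alternation converges (to a local minimum, which is why the choice of k via T matters).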
29
Human Gesture Learning and Classification (3)
Gesture Classification:
The k-nearest neighbors algorithm:
Given a gesture codebook database {(code-word, gesture)}
and an input {code-word}:
For each code-word in the input,
select the k nearest code-words in the database using the Euclidean distance.
For each corresponding output gesture,
vote for that gesture.
Select the gesture that wins the vote.
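The voting scheme above can be sketched as follows (illustrative names; code-words are scalar here for brevity, whereas the real ones are descriptor vectors):

```python
from collections import Counter

def classify(codebook, query_words, k=3):
    """Vote-based gesture classification with k-nearest neighbors.

    codebook: list of (code_word, gesture) pairs.
    query_words: code-words extracted from the input sequence.
    """
    votes = Counter()
    for w in query_words:
        # k nearest code-words by squared (Euclidean) distance
        nearest = sorted(codebook, key=lambda e: (e[0] - w) ** 2)[:k]
        for _, gesture in nearest:
            votes[gesture] += 1
    return votes.most_common(1)[0][0]   # gesture that wins the vote

# Query code-words near the 'walk' cluster out-vote the 'box' entries.
cb = [(0.0, 'walk'), (0.1, 'walk'), (0.2, 'walk'),
      (10.0, 'box'), (10.1, 'box')]
result = classify(cb, [0.05, 0.15], k=2)  # 'walk'
```

Pooling votes over all input code-words makes the decision robust to a few mismatched local descriptors, since no single neighbor query decides the gesture.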
30
Preliminary Results
Current progress:
Evaluate Local Motion Descriptors generation.
Training gestures from the KTH and IXMAS databases.
(Example sequences: Walking, Boxing)
31
Conclusion
Contributions:
Local Motion Descriptors for gesture representation.
Tracking of local texture-based descriptors.
Future Work:
Add likelihood information by using a Maximization of Mutual Information algorithm for the gesture learning process.
Evaluate an SVM classifier and compare its results to the k-nearest neighbors algorithm.
32
Thank you for your attention !