Post on 12-Jul-2015
Context - Introduction
DOCTORAT DE L’UNIVERSITÉ DE TOULOUSE
Université Toulouse III - Paul Sabatier
Embedded systems and robotics
Real-time human posture detection with multiple depth sensors
JURY
Paul CHECCHIN, Reviewer
Alberto IZAGUIRRE, Reviewer
Mohamed AKIL, Examiner
Michel DEVY, Examiner
Frédéric LERASLE, Thesis supervisor
Jean-Louis BOIZARD, Thesis supervisor
RAP group - N2IS group
Wassim FILALI, 7 November 2014
Thesis background
Human posture recognition
Data acquisition
Learning
Real time reconstruction
Hardware integration
Evaluations
Multiple depth sensors
Body parts detection
Plan
Kinect, Kinect 2
Mono-sensor RGB-D, multi-sensor RGB-D
Mono sensor RGB
Multi sensor RGB
Introduction - History
Depth sensor technology
Diffractive optical element
Active RGB-D camera
PrimeSense - Patent
Context - Application
Video games, video surveillance (health/office)
Mono/multi-sensor RGB approaches
Human model [Sundaresan et al. 2005]
Model
• Geometrical shapes adjustment
• Full model adjustment
Appearance methods
• Image projection
• Adjusting the posture
3D reconstruction methods
• Voxelization
• 3D reconstruction
[Sigal et al. 2004]
Deformable surface [Li et al. 2011]
Mono RGB-D approaches: advantages and disadvantages
Resolution
Random error for depth estimation
Scale
• Compensated by processing to avoid overfitting
Orientation
• Relative to the sensor
• Has an impact on learning
Self-occlusions
• No solution
Precision
• Limits the field of view
[Shotton et al. 2011a]
[Khoshelham et al. 2012]
2.5D descriptor
Multi-Kinect approaches
[Zhang et al. 2012]
[Berger et al. 2011]
Particle filtering
Model adjustment
• Few examples of multi RGB-D in the literature
• No learning approaches
Multi RGB-D approaches – advantages and disadvantages
Advantages
Disadvantages
Avoiding interference
Temporal multiplexing
Vibration
Correction
[Maimone et al. 2012]
Our work on the algorithmic level
Our contributions
3D descriptor for body parts labeling
Free parameters
Database
Hardware architecture
New descriptor
Investigations on their influence
Learning
Evaluations
Platform
Example
Plan
MOCAP at LAAS
Number of Hawk cameras: 4
Hawk resolution: 640 x 480
Number of Eagle cameras: 6
Eagle camera resolution: 2352 x 1728
Frequency: 200
MOCAP system operation
Temporal synchronisation
1) Chessboard for image calibration
2) Active camera
3) MOCAP
4) MOCAP calibration square
Database - Recorded sequences
                     NSC13          IRSS35
Color views          3              3
Depth views          3              3
MOCAP cameras        10             10
MOCAP markers        13             35
Frequency            5 images/s     20 images/s
Nb sequences         5              8
Total nb postures    1 951          21 569
Sequences (NSC13): M2, M3, M4, M5, M6
T-pose, arm and leg movements, walking, running, jumping, push-ups, break dance, swimming (arms), squatting, backward fall, forward fall, balance, ping-pong, volleyball, weightlifting, tennis
Sequences (IRSS35): C1, C2, C3, C4, C5, C6, C7, C8, C9
T-pose, arm, leg and knee movements, squatting, tilting, weightlifting, tennis, volleyball, ping-pong, swimming (arms), pétanque, shot put, volleyball, pétanque, walking, running, sitting and standing, sitting on the floor, jumping, balance, stretching, boxing, bowling, dancing, forward fall, backward fall, driving, moving a chair, sitting down, sweeping while seated, moving furniture, moving and filming, playing with balls, karate, warm-up, jump rope
Evaluation criteria
Recorded sequences - Illustrations
MOCAP
Depth
Intermediate body parts
Central body parts (defined by MOCAP)
Centers of body parts
Application
Plan
Our approach
Our approach (BPR) vs. [Shotton et al. 2011]
Segmentation
Random forest 2.5D
Mean shift 2D
Mean shift 3D
Real dataset MOCAP
Synthetic dataset for learning
Random forest 3D
Free parameters study
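The pipeline names a 3D mean shift step after labeling but gives no detail on it. As a rough illustration only, the sketch below runs a flat-kernel mean shift over the 3D points carrying one body-part label and returns the mode it converges to. The function name `mean_shift_mode`, the flat kernel, the bandwidth and the toy points are all assumptions made for this example, not the settings used in the thesis.

```python
import math

def mean_shift_mode(points, start, bandwidth, max_iter=50, tol=1e-6):
    """Flat-kernel mean shift: move `start` to the mean of its neighbours until stable."""
    center = start
    for _ in range(max_iter):
        # Keep only the points inside the current window.
        near = [p for p in points if math.dist(p, center) <= bandwidth]
        if not near:
            break
        new_center = tuple(sum(axis) / len(near) for axis in zip(*near))
        moved = math.dist(new_center, center)
        center = new_center
        if moved < tol:
            break
    return center

# Hypothetical voxel centers (metres) labeled as one body part, plus one outlier.
hand_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
               (0.1, 0.1, 0.0), (5.0, 5.0, 5.0)]
hand_center = mean_shift_mode(hand_points, start=(0.0, 0.0, 0.0), bandwidth=0.5)
```

With a window of 0.5 m the outlier never enters the neighbourhood, so the estimate stays on the dense cluster, which is the usual reason for preferring a mode over a plain average of mislabeled voxels.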
Voxelization
Our 3D descriptor
Axes: X, Y, Z
(x1,y1,z1), (x2,y2,z2), (x3,y3,z3), (x4,y4,z4), (x5,y5,z5)
(0,0,0,1,1)
(1,0,1,0,1)
1 posture: 70K voxels
T1(x1,y1,z1), T2(x2,y2,z2), T3(x3,y3,z3), T4(x4,y4,z4), T5(x5,y5,z5)
Traversing the decision tree
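The slide shows offset vectors, binary tuples and tree tests T1 to T5, but does not state how each bit is computed. The sketch below assumes one plausible reading: each bit is the occupancy of the voxel reached by adding an offset vector to the current voxel, and each tree node tests one such bit. The `Node` class, the labels and the toy grid are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Voxel = Tuple[int, int, int]

@dataclass
class Node:
    # Internal node: tests occupancy at (voxel + offset). Leaf: `label` is set.
    offset: Optional[Voxel] = None
    if_empty: Optional["Node"] = None
    if_full: Optional["Node"] = None
    label: Optional[str] = None

def occupied(grid: Set[Voxel], voxel: Voxel, offset: Voxel) -> int:
    """Return 1 if the voxel shifted by `offset` is full, else 0 (assumed test)."""
    shifted = (voxel[0] + offset[0], voxel[1] + offset[1], voxel[2] + offset[2])
    return 1 if shifted in grid else 0

def descriptor(grid, voxel, offsets):
    """Binary descriptor: one occupancy bit per offset vector."""
    return tuple(occupied(grid, voxel, o) for o in offsets)

def classify(tree: Node, grid, voxel) -> str:
    """Walk from the root to a leaf, following the outcome of each test."""
    node = tree
    while node.label is None:
        node = node.if_full if occupied(grid, voxel, node.offset) else node.if_empty
    return node.label

# Toy occupancy grid (set of full voxels) and a hand-built two-level tree.
grid = {(0, 0, 0), (0, 1, 0), (0, 2, 0), (1, 2, 0)}
offsets = [(0, 1, 0), (1, 0, 0), (0, -1, 0)]
tree = Node(offset=(0, 1, 0),
            if_empty=Node(label="head"),
            if_full=Node(offset=(0, -1, 0),
                         if_empty=Node(label="foot"),
                         if_full=Node(label="torso")))

labels = {v: classify(tree, grid, v) for v in sorted(grid)}
```

Storing the grid as a set of full voxels keeps each test a constant-time lookup, which matters when a single posture holds tens of thousands of voxels.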
Decision Tree generation
T1(x1,y1,z1), T2(x2,y2,z2), T3(x3,y3,z3), T4(x4,y4,z4), T5(x5,y5,z5)
Φ: set of candidate vectors
75M, 90K
αS0
Sampled descriptors
Decision forest
x log(x)
Entropy
Information gain
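The entropy and information gain named above are the standard criteria for choosing a split when growing a decision tree. The sketch below computes Shannon entropy as the negated sum of x log(x) terms over label frequencies, and the gain of a binary split. The base-2 logarithm and the toy labels are assumptions; the slides do not specify them.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical voxel labels reaching one node, and two candidate splits.
parent = ["arm", "arm", "leg", "leg"]
gain_good = information_gain(parent, ["arm", "arm"], ["leg", "leg"])
gain_bad = information_gain(parent, ["arm", "leg"], ["arm", "leg"])
```

During training, the candidate test with the highest gain is kept at each node; here the first split separates the classes perfectly and the second one brings nothing.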
Trees Forest
Weighting
Vote
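Combining trees into a forest by weighting and voting can be sketched as follows. The example assumes each tree returns a probability per body-part label and that the forest sums these, optionally weighted per tree. The weighting scheme actually used in the thesis is not given on the slide, so `forest_vote` and its `weights` argument are illustrative only.

```python
from collections import defaultdict

def forest_vote(tree_outputs, weights=None):
    """tree_outputs: one dict {label: probability} per tree. Returns (best label, scores)."""
    if weights is None:
        weights = [1.0] * len(tree_outputs)
    scores = defaultdict(float)
    for distribution, weight in zip(tree_outputs, weights):
        for label, probability in distribution.items():
            scores[label] += weight * probability
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no votes to combine")
    normalized = {label: s / total for label, s in scores.items()}
    best = max(normalized, key=normalized.get)
    return best, normalized

# Three hypothetical trees voting on the label of one voxel.
outputs = [{"arm": 0.9, "leg": 0.1},
           {"arm": 0.4, "leg": 0.6},
           {"arm": 0.3, "leg": 0.7}]
plain_label, _ = forest_vote(outputs)
weighted_label, _ = forest_vote(outputs, weights=[1.0, 2.0, 2.0])
```

The same three trees give a different winner once the last two are weighted more heavily, which shows why the weighting choice is a free parameter worth studying.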
Plan
Descriptors size
Descriptor vector window size - UniNorm
Maximum vector norm (m)    0.1      0.2      0.4      0.7      1        1.5      2
mAP                        0.453    0.666    0.768    0.800    0.786    0.786    0.777
Classification             55.1%    68.1%    73.1%    74.2%    73.3%    72.1%    71.3%
Number of trees
[Figure: mean average precision (mAP) and classification rate versus number of trees N, for N = 1, 2, 3, 4, 5, 7, 9, 12, 16, 20. Values labeled on the plot: mAP 0.792, 0.836, 0.902; classification 73.5%, 88.3%.]
Quantitative Evaluations
[Figure: BPR vs. OpenNI comparison (sequence: IRSS35-C3). Mean average precision versus the distance threshold used to compute it (m), for thresholds <0.01, <0.02, <0.03, <0.04, <0.05, <0.10, <0.15, <0.20, <0.30, <0.50. Series: BPR, ONI0, ONI1, ONI2. Values labeled on the plot: 0.875, 0.39, 0.161, 0.159.]
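The comparison above reports a precision score as a function of a distance threshold in metres. The exact definition is not given on the slide. The sketch below shows one plausible reading: a predicted body-part center counts as correct when it lies within the threshold of the MOCAP ground truth, and the per-part precisions are then averaged. All names and data are hypothetical.

```python
import math

def precision_at_threshold(predicted, truth, threshold_m):
    """Fraction of predicted centers closer than threshold_m to the ground truth."""
    hits = sum(1 for p, t in zip(predicted, truth) if math.dist(p, t) < threshold_m)
    return hits / len(truth)

def mean_precision(pred_by_part, truth_by_part, threshold_m):
    """Average of the per-part precisions (assumed aggregation)."""
    parts = list(truth_by_part)
    return sum(precision_at_threshold(pred_by_part[k], truth_by_part[k], threshold_m)
               for k in parts) / len(parts)

# Toy ground truth and predictions (metres): two frames for the head, one for the hand.
truth_by_part = {"head": [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], "hand": [(1.0, 0.0, 0.0)]}
pred_by_part = {"head": [(0.005, 0.0, 0.0), (0.2, 0.0, 0.0)], "hand": [(1.04, 0.0, 0.0)]}

curve = {t: mean_precision(pred_by_part, truth_by_part, t) for t in (0.01, 0.05, 0.50)}
```

Sweeping the threshold in this way yields a curve that rises towards 1 as the tolerance grows, which is the shape such a plot is expected to have.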
Qualitative Evaluations
Plan
Our work on the Hardware level
Analysis of requirements
Architectural exploration
GPU
FPGA
Comparative evaluation
Conclusion
Functional analysis
CPU
GPU
FPGA
CPU
Solutions catalogue
Functional analysis – SysML modeling
640x480x16bit
Box: 500K voxels, 100K full voxels
1000 postures, 25M voxels, tree of 700K nodes
Voxelization
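To make the data sizes above concrete, here is a minimal single-sensor voxelization sketch: each valid depth pixel is back-projected with a pinhole model and quantized into a voxel index inside a bounding box. The intrinsics `fx`, `fy`, `cx`, `cy`, the box limits and the voxel size are placeholder values. A multi-sensor setup would also need each sensor's extrinsic calibration to bring points into a common frame, which is left out here.

```python
def voxelize(depth_mm, fx, fy, cx, cy, box_min, box_max, voxel_size):
    """Return the set of full voxel indices for one depth image.

    depth_mm: rows of depth values in millimetres, 0 meaning no measurement.
    box_min, box_max: corners of the working volume in metres (sensor frame).
    """
    full = set()
    for v, row in enumerate(depth_mm):
        for u, d in enumerate(row):
            if d == 0:
                continue  # invalid pixel
            z = d / 1000.0
            x = (u - cx) * z / fx  # pinhole back-projection
            y = (v - cy) * z / fy
            point = (x, y, z)
            inside = all(lo <= c < hi for c, lo, hi in zip(point, box_min, box_max))
            if inside:
                full.add(tuple(int((c - lo) // voxel_size)
                               for c, lo in zip(point, box_min)))
    return full

# Tiny 2x2 depth image with made-up intrinsics.
depth = [[0, 2000],
         [2000, 5000]]
voxels = voxelize(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5,
                  box_min=(-1.0, -1.0, 0.0), box_max=(1.0, 1.0, 4.0),
                  voxel_size=0.5)
```

Only the full voxels are stored, in line with the gap the slide reports between the size of the box and the number of full voxels.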
Hardware solution catalog
PC
GP GPU
Embedded
Processors
Servers
Microcontrollers
ARM
Dedicated
DSP
FPGA
ASIC
Cloud
PIC12F / 8 bits / 30 MHz / 2 mW / $1
i7-5960X / 8 cores / 3.5 GHz / 140 W / $1000
Tesla K40 / 2880 cores / 235 W / $5500
100 x (16 cores / 104 GB) => $140/h
Virtex-7 / 2M LC / 6.8 BT / 20-40 W / $17K-$40K
Architecture evaluation on the background detection function - principle
Image Background
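The principle of background detection on depth images can be sketched as a per-pixel comparison between the current frame and a stored background image. The rule below (a pixel is foreground when it is valid and closer than the background by more than a tolerance) and the 50 mm tolerance are assumptions made for illustration; the slides do not give the exact test.

```python
def foreground_mask(frame, background, tolerance_mm=50):
    """Return a 0/1 mask: 1 where the frame is valid and clearly in front of the background.

    frame, background: rows of depth values in millimetres, 0 meaning no measurement.
    """
    mask = []
    for frame_row, background_row in zip(frame, background):
        mask.append([
            1 if d != 0 and (b == 0 or b - d > tolerance_mm) else 0
            for d, b in zip(frame_row, background_row)
        ])
    return mask

# Toy 2x3 depth images (millimetres).
background = [[3000, 3000, 0],
              [3000, 3000, 3000]]
frame = [[3000, 1500, 1200],
         [0, 2980, 2900]]
mask = foreground_mask(frame, background)
```

Each output pixel depends only on the two input pixels at the same position, which is why this function maps well onto highly parallel targets and serves as a convenient benchmark across CPU, GPU and FPGA.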
CPU - Platform
Xtion Pro Live
Server HP Z800
• Display
• Calibration
Multi-threaded capture
Background detection
3D geometry
• Cameras
• Rays
• Voxelization, …
Decision forest
BPR
Capture platform
Benefits
Algorithms evaluation platform
ASIC PS-1080
Performance - 10 to 30 ms
Learning time : 1h to 10h
Prediction time of one full posture 70 ms
GPU – Background detection
Advantages
• Relatively quick to get started
• Parallelisation / acceleration x30
Disadvantages
• High power consumption
• CPU dependency
• Host/GPU memory copies
Performance - 1 to 2 ms
FPGA – Components
Demosaicing: line FIFO
Start of Packet / End of Packet generation
I2C control
Frame Writer: FIFO, counter, @Data, memory write
Frame Reader: FIFO, counter, @Data, memory read
Pixel Fetcher: FIFO, Data In = @Fifo, Data Out, memory read
Reusable components library
Benefits: distortion correction, rotation, image fusion, homography
FPGA – Background detection
Hardware block for background detection
Optimised model
Background image
FPGA – Integration in the SOPC
Resource                              Usage      Usage %
Logic elements                        7 619      11%
Total logic registers                 5 218      8%
Total LAB                             630        15%
Total internal memory usage (bits)    739 840    64%
Total memory block usage              188        75%
PLLs                                  2          50%
Global clocks                         16         100%
Performance - 3 ms
Altera Cyclone IV 115K
Plan
Architecture comparison
CPU / GPU / FPGA
Runtime
• CPU (Xeon, one thread): 10 ms to 30 ms
• GPU (Quadro FX4800): 1 to 2 ms
• FPGA (Altera Cyclone IV): 3 ms
Details
• Depends on the number of pixels to process
• 4 ms for 4 channels
• Time to read the image from memory. Can be chained with other functions.
Advantages
• CPU: flexibility, development platform
• GPU: average learning curve
• FPGA: highly parallel architecture, reduced processing time, reduced consumption
Disadvantages
• CPU: processing time, processing / power
• GPU: high consumption, CPU dependency, bottlenecks
• FPGA: long learning curve, significant development time, limited precision processing (fixed/floating point)
Distribution of functions
Functions: capture, depth processing, background detection, blob selection, voxelization, labeling, mean shift
Solution / Resource
Console
Kinect – Sensor
Kinect – PS1080
Console – Processor
Console – GPU
PC
Xtion – Sensor
Xtion – PS1080
Processor
GPU
FPGA
External Sensor
Specific Module
Soft-core
Plan
Conclusions
Perspectives
Temporal filtering
Multi Kinect: fusion of reconstructions
Synthetic dataset
Enrich the database
Learning algorithm parallelisation
Servers/ Cloud / GPU Learn bigger database
Enhance labeling quality
Hardware integration
Integrate all functionalities
Compact low-power prototype
Mono Kinect: pixel labeling
Fall detection
Human activities recognition
Human machine interaction
Thanks