Multi-Angle Hand Posture Recognition Based on Hierarchical Temporal Memory
(基於階層式時序記憶的多角度手勢辨識方法)
Department: Master's Program, Department of Computer Science and Information Engineering
Student ID / Name: M09602060, 王勻駿
Advisor: Dr. 黃雅軒
February 2012
ABSTRACT
In the field of pattern recognition, angle variation plays an important role in producing
effective recognition results. To overcome the angle variation problems, this thesis adopts
the Hierarchical Temporal Memory (HTM). Based on the inherent property of the HTM
algorithm which applies temporal information to organize the continuous change in time of
image features in constructing their respective “invariant features”, a multi-angle hand
posture recognition method is hence proposed in this thesis.
We first obtain input images from a webcam. The input images will then be individually
processed by skin detection, background segmentation, and edge detection. The processed
results are next combined with a voting method to acquire the correct hand posture region.
If a forearm exists, a forearm segmentation step will be executed; otherwise it will be
skipped. After normalization of the output images through the forearm segmentation step,
the images are forwarded to HTM for learning and training the classifier model. Our
experimental results show that, when using the same set of training and test data, the
proposed multi-angle hand posture recognition method achieves a 92.5% recognition rate,
which is higher than the 86.8% obtained by AdaBoost and the 85.5% obtained by SVM.











Toshiba's Qosmio G55 notebook [4], equipped with the SpursEngine processor, supports gesture-based control. In 2010, Microsoft released the Kinect sensor for the Xbox-360 [7] (Figure 1-2), which lets users control games with body movements and hand gestures, demonstrating the growing importance of vision-based gesture interfaces.


Skin color detection is usually carried out in color spaces such as HSV [11] or YCbCr [12]. Pixel-based skin classification methods [13][14] include simple threshold values [11][15], the Bayesian classifier with the histogram technique [12][14], the Gaussian classifier [16], the Gaussian mixture model [17], and the multilayer perceptron [18].




For foreground extraction, background subtraction techniques [19][20] are widely used; adaptive background models based on the Kalman filter [21][22], temporal differencing [23], and optical flow fields [24] have also been proposed.

For hand gesture recognition, Chen et al. [25] trained a cascade of weak classifiers on Haar-like features. Chen and Tseng [26] applied histogram equalization and a support vector machine (SVM)
to multiple-angle hand gesture recognition. Yin and Xie [27] used a model-based approach with an
RCE (Restricted Coulomb Energy) neural network for hand posture recognition.



Liu and Fujimura [28] recognized hand gestures from depth data, using a 3D
distance transform for hand segmentation and a Hidden Markov Model (HMM) for classification.
L. Bretzner et al. [29] detected hands with multi-scale colour blob features obtained by
scale-space analysis in the Iuv color space, combined the resulting likelihood map with
hierarchical hand models and particle filtering, while other systems combine
AdaBoost-based hand detection with mean-shift tracking.
Y. Sato et al. [31] proposed a real-time fingertip tracking and gesture recognition system.




Unlike these approaches, this thesis adopts the Hierarchical Temporal Memory (HTM) framework, whose temporal learning mechanism is used to build invariant representations of hand postures observed from multiple angles.







Skin detection is performed with a histogram-based Bayesian classifier [33]. Skin and non-skin color histograms are first built from labeled training pixels (Figure 3-2(b)). For each pixel of the RGB input image with color value c, the two class-conditional probabilities P(c | skin) and P(c | non-skin) are looked up from the histograms, and the pixel is classified as skin when the likelihood ratio P(c | skin) / P(c | non-skin) exceeds 0.06.
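To illustrate the decision rule, the following is a minimal Python/NumPy sketch of a histogram-based likelihood-ratio skin classifier. The 32-bin quantization and the use of a two-channel chroma representation are assumptions made for this sketch; only the 0.06 threshold comes from the text.

```python
import numpy as np

# Minimal sketch of a histogram-based likelihood-ratio skin classifier.
# BINS = 32 and the two-channel chroma input are illustrative assumptions.
BINS = 32

def build_histogram(samples):
    """samples: (N, 2) array of two-channel color values in [0, 255]."""
    idx = np.asarray(samples, dtype=np.int64) * BINS // 256
    hist = np.zeros((BINS, BINS), dtype=np.float64)
    np.add.at(hist, (idx[:, 0], idx[:, 1]), 1.0)
    return hist / max(hist.sum(), 1.0)          # normalize to a probability table

def skin_mask(chroma, h_skin, h_nonskin, threshold=0.06):
    """chroma: (H, W, 2) array of the same two channels; returns a boolean mask."""
    idx = np.asarray(chroma, dtype=np.int64) * BINS // 256
    p_skin = h_skin[idx[..., 0], idx[..., 1]]
    p_nonskin = h_nonskin[idx[..., 0], idx[..., 1]]
    # a pixel is skin when P(c | skin) / P(c | non-skin) exceeds the threshold
    return p_skin > threshold * (p_nonskin + 1e-9)
```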
Background subtraction is performed with the codebook model [34], which applies vector quantization (VQ) to build a compact background model: the color values observed at each pixel during training are quantized (clustered) into a small set of codewords, and the resulting codebook represents the background appearance of that pixel.

For each pixel, the codebook is $C = \{c_1, c_2, c_3, \ldots, c_L\}$, where $L$ is the number of codewords. Each codeword $c_i$ ($i = 1, \ldots, L$) consists of an RGB vector $v_i = (\bar{R}_i, \bar{G}_i, \bar{B}_i)$ and a six-tuple $aux_i = (\check{I}_i, \hat{I}_i, f_i, \lambda_i, p_i, q_i)$, where $\check{I}_i$ and $\hat{I}_i$ are the minimum and maximum brightness of the samples assigned to the codeword, $f_i$ is the frequency with which the codeword has occurred, $\lambda_i$ is the Maximum Negative Run-Length (MNRL), i.e. the longest interval during the training period in which the codeword did not recur, and $p_i$ and $q_i$ are the first and last access times of the codeword.

Let $X = \{x_1, x_2, x_3, \ldots, x_N\}$ be the sequence of $N$ RGB training samples observed at the pixel. The codebook is built by processing the samples one at a time and matching each sample $x_t$ against the existing codewords, as follows.
Step 1. Initialize $L \leftarrow 0$ and $C \leftarrow \emptyset$ (an empty codebook).
Step 2. For $t = 1$ to $N$:
  a. $x_t = (R, G, B)$, $I \leftarrow \sqrt{R^2 + G^2 + B^2}$.
  b. Find the codeword $c_m$ in $C = \{c_i \mid 1 \le i \le L\}$ that matches $x_t$, based on two conditions:
     1. $colordist(x_t, v_m) \le \epsilon_1$
     2. $brightness(I, \langle \check{I}_m, \hat{I}_m \rangle) = \text{true}$
  c. If $C = \emptyset$ or no match is found, set $L \leftarrow L + 1$ and create a new codeword $c_L$ with
     1. $v_L = (R, G, B)$
     2. $aux_L = (I, I, 1, t - 1, t, t)$
  d. Otherwise, update the matched codeword $c_m$, consisting of $v_m = (\bar{R}_m, \bar{G}_m, \bar{B}_m)$ and $aux_m = (\check{I}_m, \hat{I}_m, f_m, \lambda_m, p_m, q_m)$, by setting
     1. $v_m = \left( \frac{f_m \bar{R}_m + R}{f_m + 1}, \frac{f_m \bar{G}_m + G}{f_m + 1}, \frac{f_m \bar{B}_m + B}{f_m + 1} \right)$
     2. $aux_m = \left( \min\{I, \check{I}_m\}, \max\{I, \hat{I}_m\}, f_m + 1, \max\{\lambda_m, t - q_m\}, p_m, t \right)$
Step 3. For each codeword $c_i$ ($i = 1, \ldots, L$), wrap the MNRL around the end of the training sequence:
     $\lambda_i = \max\{\lambda_i, (N - q_i + p_i - 1)\}$
The two matching functions $colordist(x_t, v_i)$ and $brightness(I, \langle \check{I}_i, \hat{I}_i \rangle)$ measure how well an RGB sample matches a codeword, as illustrated in Figure 3-4. Writing the sample as $x_t = (R, G, B)$ and the codeword vector as $v_i = (\bar{R}_i, \bar{G}_i, \bar{B}_i)$, the squared length of the projection of $x_t$ onto $v_i$ is

$p^2 = \|x_t\|^2 \cos^2\theta = \frac{\langle x_t, v_i \rangle^2}{\|v_i\|^2}$   (3-4)

$\|x_t\|^2 = R^2 + G^2 + B^2, \quad \|v_i\|^2 = \bar{R}_i^2 + \bar{G}_i^2 + \bar{B}_i^2, \quad \langle x_t, v_i \rangle^2 = (\bar{R}_i R + \bar{G}_i G + \bar{B}_i B)^2$   (3-5)

and the color distortion is the distance from $x_t$ to that projection:

$colordist(x_t, v_i) = \delta = \sqrt{\|x_t\|^2 - p^2}$   (3-6)

The brightness function tests whether the sample brightness $I$ lies within the brightness range stored in the codeword:

$brightness(I, \langle \check{I}_i, \hat{I}_i \rangle) = \begin{cases} \text{true} & \text{if } I_{low} \le I \le I_{hi} \\ \text{false} & \text{otherwise} \end{cases}$   (3-7)

where $I_{low} = \alpha \hat{I}_i$ and $I_{hi} = \min\{\beta \hat{I}_i, \check{I}_i / \alpha\}$, with $\alpha < 1$ and $\beta > 1$; typical values are $\alpha = 0.4{\sim}0.7$ and $\beta = 1.1{\sim}1.5$.

After training, temporal filtering based on the MNRL removes codewords that represent moving foreground objects rather than the background: a codeword is kept only if $\lambda_i \le N/2$, i.e. if it recurred at least once within every half of the training period.

After the background codebook has been constructed, each incoming pixel $x$ is classified as background or foreground as follows:

Step 1. Compute $x = (R, G, B)$ and $I = \sqrt{R^2 + G^2 + B^2}$.
Step 2. Search the codebook for a codeword $c_m$ that matches $x$, i.e.
  a. $colordist(x, v_m) \le \epsilon_2$
  b. $brightness(I, \langle \check{I}_m, \hat{I}_m \rangle) = \text{true}$
Step 3. If no codeword in the codebook satisfies Step 2, the pixel is classified as foreground; otherwise it is classified as background and the matched codeword is updated as in the construction algorithm.
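To make the per-pixel codebook procedure concrete, the following Python sketch implements the construction and detection steps described above for a single pixel. The class layout, the parameter values eps1, eps2, alpha and beta, and the list-based codebook are illustrative assumptions, not the settings used in the thesis.

```python
import math

# Minimal per-pixel sketch of the codebook model described above.
# eps1, eps2, alpha and beta are illustrative values only.

class Codeword:
    def __init__(self, rgb, brightness, t):
        self.v = list(rgb)                    # mean RGB vector (R̄, Ḡ, B̄)
        self.i_min = self.i_max = brightness  # Ǐ, Î
        self.f = 1                            # frequency
        self.mnrl = t - 1                     # λ, maximum negative run-length
        self.p, self.q = t, t                 # first / last access time

def colordist(x, v):
    xx = sum(c * c for c in x)
    vv = sum(c * c for c in v) or 1e-9
    xv = sum(a * b for a, b in zip(x, v))
    p2 = xv * xv / vv                         # squared projection length (Eq. 3-4)
    return math.sqrt(max(xx - p2, 0.0))       # color distortion (Eq. 3-6)

def brightness_ok(i, cw, alpha=0.5, beta=1.3):
    i_low, i_hi = alpha * cw.i_max, min(beta * cw.i_max, cw.i_min / alpha)
    return i_low <= i <= i_hi                 # Eq. 3-7

def train_codebook(samples, eps1=10.0):
    """samples: list of (R, G, B) tuples observed at one pixel."""
    book = []
    for t, x in enumerate(samples, start=1):
        i = math.sqrt(sum(c * c for c in x))
        match = next((cw for cw in book
                      if colordist(x, cw.v) <= eps1 and brightness_ok(i, cw)), None)
        if match is None:
            book.append(Codeword(x, i, t))
        else:
            f = match.f
            match.v = [(f * m + c) / (f + 1) for m, c in zip(match.v, x)]
            match.i_min, match.i_max = min(i, match.i_min), max(i, match.i_max)
            match.f, match.mnrl = f + 1, max(match.mnrl, t - match.q)
            match.q = t
    n = len(samples)
    for cw in book:                           # wrap-around MNRL (Step 3)
        cw.mnrl = max(cw.mnrl, n - cw.q + cw.p - 1)
    return [cw for cw in book if cw.mnrl <= n // 2]   # temporal filtering

def is_foreground(x, book, eps2=10.0):
    i = math.sqrt(sum(c * c for c in x))
    return not any(colordist(x, cw.v) <= eps2 and brightness_ok(i, cw) for cw in book)
```

In practice one such codebook is maintained for every pixel of the input image.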
Edge detection is performed with the Sobel operator. Let $z_1, \ldots, z_9$ denote the pixel values of a 3×3 neighborhood, arranged row by row as shown in Figure 3-6. The horizontal gradient, vertical gradient, and gradient magnitude are

$G_x = (z_3 + 2 z_6 + z_9) - (z_1 + 2 z_4 + z_7)$   (3-8)

$G_y = (z_7 + 2 z_8 + z_9) - (z_1 + 2 z_2 + z_3)$   (3-9)

$G = |G_x| + |G_y|$   (3-10)

where $G_x$ and $G_y$ correspond to convolving the neighborhood with the Sobel masks in the x and y directions shown in Figure 3-7. A pixel is marked as an edge point when its gradient magnitude $G$ exceeds a threshold.
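The gradient computation of Eqs. (3-8) to (3-10) can be written compactly with array slicing; the sketch below is a minimal Python/NumPy version, and the edge threshold of 100 is an illustrative assumption.

```python
import numpy as np

# Minimal sketch of the Sobel gradient magnitude (Eqs. 3-8 to 3-10).
def sobel_edges(gray, threshold=100):
    """gray: 2-D array; returns a boolean edge map."""
    z = np.pad(gray.astype(np.float64), 1, mode="edge")
    # z1..z9 are the 3x3 neighbours of each pixel (row-major order)
    gx = (z[:-2, 2:] + 2 * z[1:-1, 2:] + z[2:, 2:]) \
       - (z[:-2, :-2] + 2 * z[1:-1, :-2] + z[2:, :-2])
    gy = (z[2:, :-2] + 2 * z[2:, 1:-1] + z[2:, 2:]) \
       - (z[:-2, :-2] + 2 * z[:-2, 1:-1] + z[:-2, 2:])
    g = np.abs(gx) + np.abs(gy)                     # Eq. 3-10
    return g > threshold
```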


The binary detection results are further cleaned up with connected-component analysis and morphological operations: small connected components are discarded as noise, and opening and closing are applied to smooth the remaining regions and fill small holes:

Opening:  $A \circ B = (A \ominus B) \oplus B$
Closing:  $A \bullet B = (A \oplus B) \ominus B$

where $\ominus$ denotes erosion and $\oplus$ denotes dilation of the region $A$ by the structuring element $B$.
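A minimal OpenCV sketch of this clean-up step is shown below; the 5x5 elliptical structuring element and the minimum component area are illustrative choices, not the thesis settings.

```python
import numpy as np
import cv2

def clean_mask(mask, min_area=200):
    """mask: uint8 binary image (0 or 255); returns the cleaned mask."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # (A ⊖ B) ⊕ B
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel) # (A ⊕ B) ⊖ B
    # drop small connected components
    n, labels, stats, _ = cv2.connectedComponentsWithStats(closed, connectivity=8)
    out = np.zeros_like(closed)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 255
    return out
```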

The skin detection, background subtraction, and edge detection results are then combined by a weighted vote (Eq. 3-12): the three binary decisions of each pixel are summed with weights of 0.3, 0.1, and 0.6, and a pixel is kept as part of the hand region when its weighted sum exceeds a decision threshold.

If the extracted region contains a forearm, a forearm segmentation step is executed; whether a forearm is present is decided by testing whether the aspect ratio of the region exceeds 1.1 (Figure 3-8(c)). The remaining hand region is finally normalized to 128*128 pixels before being passed to the HTM.
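The voting step above can be sketched in a few lines of Python; the assignment of the three weights to the individual detectors and the 0.5 decision threshold below are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of the weighted voting step (Eq. 3-12).
def vote(skin_mask, fg_mask, edge_mask, weights=(0.3, 0.1, 0.6), threshold=0.5):
    """All masks are boolean arrays of the same shape; returns the hand-region mask."""
    votes = (weights[0] * skin_mask.astype(float)
             + weights[1] * fg_mask.astype(float)
             + weights[2] * edge_mask.astype(float))
    return votes > threshold
```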
To reduce the sensitivity of the normalized hand image to rotation and scale changes, the Log-Polar Transform (LPT), introduced by Schwartz in 1980 [35], is applied. The LPT maps the Cartesian coordinates $(x, y)$, measured relative to the image center, to log-polar coordinates $(\rho, \theta)$:

$\rho = \log\sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}(y / x)$

Under this mapping, a rotation or a scaling of the hand in the Cartesian image becomes a translation along the $\theta$ or $\rho$ axis of the log-polar image.
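A minimal NumPy sketch of the log-polar transform is given below, using nearest-neighbour sampling. The 128x128 output size and the choice of the image centre as the mapping origin are assumptions for this sketch.

```python
import numpy as np

def log_polar(img, out_size=(128, 128)):
    """img: 2-D array; returns its log-polar representation (rho on axis 0)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    n_rho, n_theta = out_size
    max_r = np.hypot(cy, cx)
    # sample rho logarithmically and theta uniformly over [0, 2*pi)
    rho = np.exp(np.linspace(0, np.log(max_r), n_rho))
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rho, theta, indexing="ij")
    x = np.clip(np.round(cx + r * np.cos(t)).astype(int), 0, w - 1)
    y = np.clip(np.round(cy + r * np.sin(t)).astype(int), 0, h - 1)
    return img[y, x]
```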

This chapter introduces the Hierarchical Temporal Memory (HTM) framework and how it is used in this thesis. HTM is based on the memory-prediction model of the neocortex described by Jeff Hawkins in On Intelligence in 2004 [37]. Its mathematical formulation [38] is closely related to Bayesian networks [39], and it learns invariant representations from continuously changing sensory input in an unsupervised manner.

An HTM network is a tree-shaped hierarchy of HTM nodes: each node receives input from a small region of the level below, and nodes at higher levels cover increasingly larger portions of the input. In addition to the HTM nodes, the network contains a category node, which receives the class label of each training image during supervised training, and an output node, which produces the final classification result. Figure 4-1 shows the network used in this thesis: the input image feeds the lowest level of HTM nodes, the intermediate HTM levels successively aggregate the outputs of their child nodes, and the category node (5) and the output node (6) sit at the top of the hierarchy.

Every HTM node operates in the same way. During learning it memorizes the frequently occurring feature patterns of its input and records how these patterns follow one another in time; during inference it converts its current input into a belief over the learned patterns and sends the result to its parent node.
Internally, each HTM node consists of a spatial pooler (Spatial Pooler, SP) and a temporal pooler (Temporal Pooler, TP), as shown in Figures 4-3 and 4-4. The SP quantizes the node's raw input patterns, and the TP groups the quantized patterns according to their temporal behavior.

During learning, the SP maintains a set of stored input patterns called coincidences. For each incoming pattern $x$, the SP computes the distance $d(x, w)$ between $x$ and every stored coincidence $w$. If the smallest distance exceeds a threshold $D$, $x$ is memorized as a new coincidence; otherwise $x$ is treated as another occurrence of its nearest coincidence.

The TP learns the temporal relations between the coincidences by accumulating a time-adjacency matrix $T$: whenever coincidence $i$ is active at time $t-1$ and coincidence $j$ is active at time $t$, the entry $T(i, j)$ is increased by one. For example, in the matrix shown below, the entry at position (6, 8) is 12, meaning that coincidence 6 at time $t-1$ was followed by coincidence 8 at time $t$ twelve times during training. The normalized matrix can be interpreted as a Markov graph over the coincidences, and the TP uses it to partition the coincidences of the HTM node into temporal groups.
Example time-adjacency matrix for eight coincidences (rows: coincidence active at time t-1; columns: coincidence active at time t):

       #1  #2  #3  #4  #5  #6  #7  #8
  #1    4   2  13   1   3   2   3   1
  #2    0   1   0   9   2   3  14   2
  #3    8   2   2   7   5   0   4   1
  #4    0   8   4   3   1   3   2   9
  #5    2   1   7   1   3   0   2   1
  #6    1   4   1   2   1   5   0  12
  #7    1   9   4   0   1   3   4   2
  #8    3   1   3  11   0   8   3   1
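The following Python sketch shows how the coincidences and a matrix like the one above can be accumulated during learning. The Euclidean distance, the single-winner update, and the matrix growth strategy are simplifying assumptions about the spatial pooler described in the text.

```python
import numpy as np

# Simplified sketch of one HTM node's learning stage: the spatial pooler
# memorizes coincidences with a distance threshold D, and the temporal pooler
# accumulates a time-adjacency matrix T.

class HTMNodeLearner:
    def __init__(self, max_distance):
        self.max_distance = max_distance   # threshold D of the spatial pooler
        self.coincidences = []             # stored coincidence vectors
        self.T = np.zeros((0, 0))          # time-adjacency matrix
        self.prev = None                   # index of the previous winning coincidence

    def _winner(self, x):
        """Return (index, distance) of the nearest stored coincidence."""
        d = [np.linalg.norm(x - c) for c in self.coincidences]
        i = int(np.argmin(d))
        return i, d[i]

    def learn(self, x):
        x = np.asarray(x, dtype=float)
        if not self.coincidences:
            self.coincidences.append(x)
            self.T = np.zeros((1, 1))
            win = 0
        else:
            win, dist = self._winner(x)
            if dist > self.max_distance:          # memorize a new coincidence
                self.coincidences.append(x)
                n = len(self.coincidences)
                grown = np.zeros((n, n))
                grown[:n - 1, :n - 1] = self.T
                self.T = grown
                win = n - 1
        if self.prev is not None:
            self.T[self.prev, win] += 1           # coincidence prev at t-1 -> win at t
        self.prev = win
```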

Figure 4-6 shows the temporal groups formed from this matrix: coincidences that frequently follow one another in time are collected into the same group, so that the different appearances produced by a continuously moving hand end up in a common group.

During inference, the SP compares the current input with every stored coincidence and produces a belief over the coincidences; the TP then converts this belief into an output vector $y$ over the temporal groups, and $y$ is passed upward as the input of the parent node. A simple grouping sketch is given below.
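The thesis's exact grouping criterion is not reproduced here; the sketch below forms temporal groups with a simple greedy procedure (grow each group from the most active remaining coincidence by adding its strongest temporal neighbours), which is only one plausible way to partition the matrix.

```python
import numpy as np

# Illustrative greedy temporal grouping from a time-adjacency matrix.
def temporal_groups(T, top_n=2, max_group_size=4):
    """T: (n, n) numpy time-adjacency matrix; returns a list of index groups."""
    sym = T + T.T                      # treat "i before j" and "j before i" alike
    unassigned = set(range(T.shape[0]))
    groups = []
    while unassigned:
        seed = max(unassigned, key=lambda i: sym[i].sum())   # most active coincidence
        group, frontier = {seed}, [seed]
        unassigned.remove(seed)
        while frontier and len(group) < max_group_size:
            i = frontier.pop(0)
            # strongest unassigned temporal neighbours of coincidence i
            for j in sorted(unassigned, key=lambda j: sym[i, j], reverse=True)[:top_n]:
                if sym[i, j] > 0 and len(group) < max_group_size:
                    group.add(j)
                    unassigned.discard(j)
                    frontier.append(j)
        groups.append(sorted(group))
    return groups

# Example: temporal_groups(np.array(T_example)), where T_example is the
# 8x8 matrix listed above.
```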

The inference computation of an HTM node follows the Bayesian formulation of George [38]. Consider a node $k$ whose child nodes $m_1$ and $m_2$ send it their output vectors (Figure 4-9), and let $e_k^-$ denote this evidence arriving from below. For every stored coincidence $c_i$, node $k$ first computes the likelihood $P(e_k^- \mid c_i)$ of the evidence given that coincidence, by multiplying together the components of the children's outputs that the coincidence selects. These likelihoods can be normalized into a belief $P(c_i \mid e_k^-)$ over the coincidences. The output that node $k$ sends to its parent is the belief over its temporal groups, obtained by combining $P(e_k^- \mid c_i)$ over the coincidences $c_i$ belonging to each group.
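A minimal sketch of this computation is given below, using the simple "flash inference" form in which a group's belief is the sum of the likelihoods of its member coincidences; the data layout and the sum rule are assumptions for illustration.

```python
import numpy as np

# Simplified sketch of HTM node inference: each coincidence selects one group
# index per child, and a group's belief is the sum of its coincidences' likelihoods.
def node_inference(child_outputs, coincidences, group_of):
    """
    child_outputs: list of 1-D arrays, one belief vector per child node.
    coincidences:  list of tuples; coincidences[i][c] is the group index of
                   child c selected by coincidence i.
    group_of:      list mapping each coincidence index to its temporal group.
    Returns the node's output: a normalized belief vector over groups.
    """
    n_groups = max(group_of) + 1
    out = np.zeros(n_groups)
    for i, coinc in enumerate(coincidences):
        # likelihood of the evidence from below given coincidence i
        likelihood = np.prod([child_outputs[c][g] for c, g in enumerate(coinc)])
        out[group_of[i]] += likelihood
    total = out.sum()
    return out / total if total > 0 else out
```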


5.2 Experimental Data

The hand posture images were captured with a webcam at a resolution of 640*480 from three camera angles: 60 degrees (Figure 5-2(a)), 45 degrees (Figure 5-2(b)), and 30 degrees (Figure 5-2(c)). The collected data comprise 4242, 3841, and 4097 images for the three capture conditions.

5.3 Experimental Results

The recognition results are summarized in Tables 5-2, 5-4, and 5-6. Table 5-2 lists the recognition rates obtained with different numbers of training and test images:

    Training images   Test images   Recognition rate
          150             1000          91.7%
          150              200          81.5%
          150              200          89.5%
          150              200          86%

For comparison, an SVM classifier trained with 2500 images and tested on 1000 images was also evaluated against the proposed method trained with 150 images and tested on 1000 images; the detailed comparison is given in Table 5-4.





REFERENCES

[1] F. Echtler, T. Pototschnig, and G. Klinker, “An LED-Based Multitouch Sensor for
LCD Screens,” in Proc. ACM Tangible and Embedded Interaction Conf., pp. 227-230,
2010.
[2] L. Xie and Z. Liu, “Realistic Mouth-Synching for Speech-Driven Talking Face Using
Articulatory Modelling,” IEEE Transactions on Multimedia, Vol. 9, No. 3, pp.500-510,
April 2007.
[3] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, “Detecting Moving Objects, Ghosts,
and Shadows in Video Streams,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 25, No. 10, pp. 1337-1342, Oct. 2003.
[4] http://chinese.engadget.com/2008/06/14/toshiba-qosmio-g55-features-spursengine-vis
[5] http://chinese.engadget.com/2008/09/02/toshibas-cambridge-research-lab-shows-off-g
[7] http://chinese.engadget.com/2010/06/14/microsoft-kinect-gets-official/, Microsoft
Kinect, 2010.
[8] J. Park, J. Seo, A. Dongun, and S. Chung, “Detection of Human Faces Using Skin
Color and Eyes,” in Proc. IEEE International Conference on Multimedia and Expo,
Vol. 1, pp. 133-136, 2000.
[9] J. L. Crowley and J. Coutaz, “Vision for Man Machine Interaction,” Robotics and
Autonomous Systems, pp.347-358, 1997.
[10] M. Soriano, S. Huovinen, B. Martinkauppi, and M. Laaksonen, “Using the Skin Locus
to Cope with Changing Illumination Conditions in Color-Based Face Tracking,” in
Proc. IEEE Nordic Signal Processing Symposium, pp. 383-386, 2000.
[11] Y. Wang and B. Yuan, “A Novel Approach for Human Face Detection from Color
Images under Complex Background,” Pattern Recognition, Vol. 34, pp. 1983-1992,
2001.
[12] H. Wang and S. F. Chang, “A Highly Efficient System for Automatic Face Detection
in Mpeg Video,” IEEE Transactions on Circuits and Systems for Video Technology,
Vol. 7, No. 4, pp. 615-628, Aug. 1997.
[13] S. L. Phung, A. Bouzerdoum, and D. Chai, “Skin Segmentation Using Color Pixel
Classification: Analysis and Comparison,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, Vol. 27, No.1, Jan. 2005.
[14] V. Vezhnevets, V. Sazonov, and A. Andreeva, “A Survey on Pixel-Based Skin Color
Detection Techniques,” in Proc. Graphicon-2003, Moscow, Russia, pp. 85-92, Sept.
2003.
[15] P. S. Hiremath and A. Danti, “Detection of Multiple Faces in an Image Using Skin
Color Information and Lines-of-Separability Face Model,” International Journal of
Pattern Recognition and Artificial Intelligence, Vol. 20, No. 1, pp. 39-61, Feb. 2006.
[16] M. H. Yang and N. Ahuja, “Gaussian Mixture Model for Human Skin Color and Its
Applications in Image and Video Databases,” in Proc. SPIE Storage and Retrieval for
Image and Video Databases, Vol. 3656, pp. 458-466, Jan. 1999.
[17] M. H. Yang and N. Ahuja, “Gaussian Mixture Model for Human Skin Color and Its
Application in Image and Video Databases,” in Proc. SPIE Storage and Retrieval for
Image and Video Databases VII, Vol. 3656, pp. 458-466, 1999.
[18] S. L. Phung, D. Chai, and A. Bouzerdoum, “A Universal and Robust Human Skin
Color Model Using Neural Networks,” in Proc. INNS-IEEE International Joint
Conference of Neural Networks, Vol. 4, pp. 2844-2849, Jul. 2001.
[19] Alan M. McIvor, “Background Subtraction Techniques,” in Proc. Image and Vision
Computing, Auckland, New Zealand, Nov. 2000.
[20] M. Piccardi, “Background Subtraction Techniques: A Review,” in Proc. IEEE
International Conference on Systems, Man and Cybernetics, Vol. 4, pp. 3099-3104,
2004.
[21] Brian D. O. Anderson and John B. Moore, “Optimal Filtering,” in Thomas Kailath
(editor), Information and System Science Series, Prentice-Hall, Inc., Englewood Cliffs,
New Jersey, USA, pp.1-61, 1979.
[22] C. Ridder, O. Munkelt, and H. Kirchner, “Adaptive Background Estimation and
Foreground Detection using Kalman-Filter,” in Proc. International Conference on
Recent Advances in Mechatronics, pp. 193-199, 1995.
[23] S. Murali and R. Girisha, “Segmentation of Motion Objects from Surveillance Video
Sequences using Temporal Differencing Combined with Multiple Correlation,” in
Proc. IEEE International Conference on Advanced Video and Signal Based
Surveillance, pp. 472-477, 2009.
[24] S. Sun, D. Haynor and Y. Kim, “Motion Estimation Based on Optical Flow with
Adaptive Gradients,” in Proc. IEEE International Conference on Image Processing,
Vol. 1, pp. 852-855, 2000.
[25] Q. Chen, N.D. Georganas and E.M. Petriu, “Hand Gesture Recognition Using
Haar-Like Features and a Stochastic Context-Free Grammar,” IEEE Transactions on
Instrumentation and Measurement, Vol. 57, No. 8, pp. 1562-1571, Aug. 2008.
[26] Y. T. Chen and K. T. Tseng, “Developing a Multiple-Angle Hand Gesture Recognition
System for Human Machine Interactions,” in Proc. IEEE Industrial Electronics
Society Conference (IECON), pp. 489-492, Taipei, Taiwan, Nov. 2007.
[27] X. Yin and M. Xie, “Finger Identification and Hand Posture Recognition for Human–
Robot Interaction,” Image and Vision Computing, pp. 1291-1300, 2007.
[28] X. Liu and K. Fujimura, “Hand Gesture Recognition Using Depth Data,” in Proc.
IEEE International Conference on Automatic Face and Gesture Recognition, pp.
529-534, 2004.
[29] L. Bretzner, I. Laptev, and T. Lindeberg. “Hand Gesture Recognition Using
Multi-Scale Colour Features, Hierarchical Models and Particle Filtering,” in Proc.
Face and Gesture, pp. 423-428, 2002.
[30] Z. Z. Bien, K. H. Park, J. W. Jung, and J. H. Do, “Intention Reading is Essential in
Human-Friendly Interfaces for the Elderly and the Handicapped,” IEEE
Transactions on Industrial Electronics, Vol. 52, No. 6, pp. 1500-1505, Dec. 2005.
[31] K. Oka, Y. Sato, and H. Koike, “Real-Time Fingertip Tracking and Gesture
Recognition,” IEEE Computer Graphics and Applications, pp. 64-71, Dec. 2002.
[32] , July 2005.
[33] D. Chai and A. Bouzerdoum, “A Bayesian Approach to Skin Color Classification
inYCbCr Color Space,” in Proc. IEEE Region Ten Conference, Kuala Lumpur,
Malaysia, Vol.2 , pp 421-424, 2000.
[34] K. Kim, T.H. Chalidabhongse, D. Harwood, and L. Davis, “Real-Time
Foreground-Background Segmentation using Codebook Model,” Real-Time Imaging,
Vol. 11, No. 3, pp. 172-185, Jun. 2005.
[35] Eric L. Schwartz, “Computational anatomy and functional architecture of striate
cortex: a spatial mapping approach to perceptual coding,” Vision Research,
Vol. 20, pp. 645-699, 1980.
[36] J. Hawkins and D. George, “Hierarchical Temporal Memory Concepts, Theory, and
Terminology,” Numenta Inc., Available: http://www.numenta.com/, 2006.
[37] J. Hawkins and S. Blakeslee, “On Intelligence,” New York: Owl Books, 2004.
[38] D. George, “How the Brain Might Work: A Hierarchical and Temporal Model for
Learning and Recognition,” Numenta Inc, Available: http://www.numenta.com/, 2008.
[39] K. P. Murphy, “Dynamic Bayesian Networks: Representation, Inference and Learning,”
PhD thesis, University of California, Berkeley, Computer Science Division, 2002.
[40] Stephen C. Johnson, “Hierarchical Clustering Schemes,” Psychometrika, Vol. 32, No.
3, pp. 241-254, Sept. 1967.
[41] L. I. Kuncheva and L. C. Jain, “Nearest Neighbor Classifier: Simultaneous Editing
and Feature Selection,” Pattern Recognition Letters, Vol.20, No. 11-13, pp.
1149-1156, Nov. 1999.