Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta,...

26
Universit¨ at Hamburg MIN-Fakult¨ at Fachbereich Informatik Hand Gesture Recognition Vision-based Hand Gesture Recognition Sebastian Springenberg Universit¨ at Hamburg Fakult¨ at f¨ ur Mathematik, Informatik und Naturwissenschaften Fachbereich Informatik Technische Aspekte Multimodaler Systeme December 14, 2015 S. Springenberg 1

Transcript of Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta,...

Page 1: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition

Vision-based Hand Gesture Recognition

Sebastian Springenberg

Universitat HamburgFakultat fur Mathematik, Informatik und NaturwissenschaftenFachbereich Informatik

Technische Aspekte Multimodaler Systeme

December 14, 2015

S. Springenberg 1

Page 2: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition

Outline

1. IntroductionMotivationProblems / Challenges

2. Hand Gesture RecognitionConventional ApproachesCNN Approaches

CNN BasicsRGB CNN3D RGB-D CNN

3. Comparison RGB / 3D RGB-D CNN

4. Evaluation: CNNs for HGR

S. Springenberg 2

Page 3: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Introduction - Motivation Hand Gesture Recognition

Motivation

I Hand Gestures are natural and intuitive

I Little cognitive load on the user when interacting with thesystem

I Various applications in HRI and HCI: Sign languageinterpretation, virtual reality, gesture based interaction withsoftware / robots . . .

I Cost-e�cient and portable digital cameras can be used

S. Springenberg 3

Page 4: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Introduction - Problems / Challenges Hand Gesture Recognition

Problems / Challenges

I System has to cope with complex scenes:Di↵erent backgrounds, variable lightning conditions, di↵erentgesture positions / orientations, occlusion

How to deal with it?

I Hand-segmentation vs. minimal preprocessing

I CNNs for robust real-time processing

I Static or continuous gestures?

I RGB or RGB-D information?RGB-D more precise but computationally more expensive

S. Springenberg 4

Page 5: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - Conventional Approaches Hand Gesture Recognition

Conventional Approaches

Hand gesture recognition: A classification task

I Extract relevant features from images

I Pass features on to a classifier (e.g. SVM)

I Conventionally, hand tailored features are extracted

! Classification performance highly dependent on task specificpreprocessing

I Conventional feature detection methods include:HOG (histogram of oriented gradients), FFT, Hu InvariantMoments

S. Springenberg 5

Page 6: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - CNN Basics Hand Gesture Recognition

CNNs

I Biologically inspired approach on image processing

I Instead of exhaustive preprocessing, let system extract featureson its own

I Similar to basic neural networks but with additional properties:I Weight sharingI Receptive fieldsI Subsampling

I Layers involved:I Input layerI Convolutional layerI Subsampling layer (i.e. max-pooling)I Output layer (usually fully connected)

S. Springenberg 6

Page 7: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - CNN Basics Hand Gesture Recognition

2D CNNs

http://cs231n.github.io/convolutional-networks/

[2]S. Springenberg 7

Page 8: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - CNN Basics Hand Gesture Recognition

3D convolution

I Convolution both spatiallyand temporally for actionrecognition

[1]

S. Springenberg 8

Page 9: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

RGB CNN

Nagi et al. use a CNN to classify hand gestures

I Gestures presented with coloured gloves.I RGB images obtained by foot-bot robots as input to the

network

[5]S. Springenberg 9

Page 10: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

RGB CNN cont.

I 6 gestures based on count offingers

I 6000 images of size512⇥ 384 at 6 di↵erentdistances to robot

[5]

S. Springenberg 10

Page 11: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

RGB CNN cont.

PreprocessingI Transform RGB image to YCbCr (chrominance) color spaceI Segment Gloves using a Single Gaussian Model:

Model glove color using mean and covariance of chrominantcolor with bivariate Gaussian

[5]

S. Springenberg 11

Page 12: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

RGB CNN cont.

Colour Segmentation

[5]

S. Springenberg 12

Page 13: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

MPCNN

[5]

S. Springenberg 13

Page 14: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

RGB CNN cont.

Convolutional layer:I

M maps of equal size (Mx ,My)I Kernel of size (Kx ,Ky ) shifted over imageI Neurons in a map share weights but have di↵erent input fields

Max-pooling layer:I Maximum activation over non-overlapping rectangular regions

of size (Kx ,Ky )I Method of downsampling for more invariant features in higher

layers

Classification layer:I MLP for final classificationI One neuron per class with softmax activation function

S. Springenberg 14

Page 15: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - RGB CNN Hand Gesture Recognition

RGB CNN cont.

Results

[5]

S. Springenberg 15

Page 16: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

3D RGB-D CNN

Molchanov et al. use interleaved RGB-D information as input forrobust gesture recognition

I 3D CNN to process multiple frames at once

I Two subnetworks that give a fused prediction of gestureclassification:High-resolution network (HRN) and low-resolution network(LRN)

I Spatio-temporal data augmentation to omit overfitting on thetraining data

S. Springenberg 16

Page 17: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

3D RGB-D CNN

The VIVA challenge Dataset

I 885 intensity and depth video sequences recorded with Kinectcamera

I 19 di↵erent gestures to be classified

I 8 subjects, varying lightning conditions

S. Springenberg 17

Page 18: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

[4]

S. Springenberg 18

Page 19: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

3D RGB-D CNN cont.

Only little preprocessing:

I Align temporal length of video sequences with nearestneighbour interpolation

I Downsample the original image data by a factor of 2

I Compute gradients from intensity channel and interleave imagegradients with depth values

Classification:

I Combine class member-ship probabilities P(C |x ,WH) andP(C |x ,WL) from HRN and LRN:P(C |x) = P(C |x ,WH) ⇤ P(C |x ,WH)

S. Springenberg 19

Page 20: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

3D RGB-D CNN cont.

Training:

I Minimize cost function:

L(W ,D) = � 1|D|

|D|Pi=0

log(P(C (i)|x (i),W ))

S. Springenberg 20

Page 21: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

Data Augmentation

[4]

S. Springenberg 21

Page 22: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Hand Gesture Recognition - CNN Approaches - 3D RGB-D CNN Hand Gesture Recognition

3D RGB-D CNN cont.

Results

[4]

S. Springenberg 22

Page 23: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Comparison RGB / 3D RGB-D CNN Hand Gesture Recognition

Comparison RGB / 3D RGB-D CNN

RGB CNN (Nagi et al.)

+ Fast and e�cient gesture recognition

+ Runs on a mobile robot

- Requires some preprocessing / hand segmentation (see also [3])

- Gloves are used to facilitate hand segmentation

- Only possible to recognize static gestures

S. Springenberg 23

Page 24: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Comparison RGB / 3D RGB-D CNN Hand Gesture Recognition

Comparison RGB / 3D RGB-D CNN cont.

3D RGB-D CNN (Molchanov et al.)

+ No need for extensive hand segmentation and preprocessing

+ Robust classification, no over-fitting (data augmentation)

+ Can recognize gestures involving motion

- Computationally more expensive than 2D RGB CNN

S. Springenberg 24

Page 25: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Evaluation: CNNs for HGR Hand Gesture Recognition

Evaluation: CNNs for HGR

+ Less preprocessing than hand-tailored feature extraction

+ Robust to distortion

+ Good generalization: Not particularly sensible to individual userproperties (hand appearance, style of gesture execution),lighting conditions, background

- While application is possible in real-time, training can take long

- Training computationally expensive (parallelization on a fastGPU)

- Su�cient training data required for good generalization and toavoid overfitting

S. Springenberg 25

Page 26: Vision-based Hand Gesture Recognition - uni-hamburg.de · [4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara. Hand Gesture Recognition with 3D Convolutional

Universitat Hamburg

MIN-FakultatFachbereich Informatik

Evaluation: CNNs for HGR Hand Gesture Recognition

References

[1] S Ji, M Yang, and K Yu.3D convolutional neural networks for human action recognition.IEEE Transactions on Pattern Analysis & Machine Intelligence, 35(1):221–231, 2013.

[2] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Ha↵ner.Gradient-based learning applied to document recognition.Proceedings of the IEEE, 86(11):2278–2323, 1998.

[3] Hsien-I Lin, Ming-Hsiang Hsu, and Wei-Kai Chen.Human hand gesture recognition using a convolution neural network.Automation Science and Engineering (CASE), 2014 IEEE International Conference on, pages 1038–1043, 2014.

[4] Pavlo Molchanov, Shalini Gupta, Kihwan Kim, Jan Kautz, and Santa Clara.Hand Gesture Recognition with 3D Convolutional Neural Networks.pages 1–7, 2015.

[5] Jawad Nagi and Frederick Ducatelle.Max-pooling convolutional neural networks for vision-based hand gesture recognition.2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pages 342–347,2011.

S. Springenberg 26