
Hidden Markov Model based Automatic Arabic Sign Language Translator using Kinect

Omar Amin†, Hazem Said‡, Ahmed Samy†, Hoda El Korashy¥

†Teaching Assistant, Computer Engineering Department, Ain Shams University.
†Software Developer at Robovics.
‡Assistant Professor, Computer Engineering Department, Ain Shams University.
¥Professor, Computer Engineering Department, Ain Shams University.

Outline

• Introduction
• Problem Statement
• Related Work
• Proposed System
• System Description
• Experimental Work
• Conclusion

2

Problem Introduction

Source: http://wfdeaf.org/human-rights/crpd/sign-language

3

• There are about 70 million deaf people who use sign language as their first language or mother tongue

Research Effort

4

• Data Source
  • Sensor Based Systems
  • Camera Based Systems

• Research Focus
  • Isolated SLR (Sign Language Recognition)
  • Continuous SLR
  • Scalable SLR
  • Signer Independence
  • Posture Recognition

Sensor Based Systems

5

• Using electromyography-based sensors to measure the electrical activity of muscles at rest and during contraction; these measurements are then used to detect the sign being performed.

Sensor Based Systems

6

• Using data gloves (e.g. the CyberGlove) to capture finger positions and orientations, which are then used to recognize hand shapes and signs.

Camera Based Systems

7

• Normal RGB Camera (Usually using colored gloves)

• Stereo System (2 RGB Cameras)

• Kinect Sensor

• Algorithms used
  • Hidden Markov Models
  • Conditional Random Fields
  • Dynamic Time Warping
  • Recurrent Neural Networks

Research Effort

8

Proposed System Block Diagram

9

Kinect

10

Kinect

11

A Kinect sensor (also called a Kinect) is a physical device that contains cameras, a microphone array, and an accelerometer as well as a software pipeline that processes color, depth, and skeleton data.

Kinect Skeleton Tracking

12

Kinect provides data for 20 different skeleton joints, including:

• An accurate 3-D position for each joint.

• Joint orientations.
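For illustration, the sketch below shows one way to represent a single skeleton frame in Python, keeping only the three joints used later in the pipeline (both hands and the hip center). This representation is an assumption for the later sketches, not the authors' data structure.

    # Hypothetical representation of one Kinect skeleton frame (not the authors' code).
    # The skeleton stream reports 20 joints; only the three used later are shown.
    # Each joint is an (x, y, z) position in meters, where z is the depth from the sensor.
    skeleton_frame = {
        "HandRight": (0.32, 0.10, 1.85),
        "HandLeft":  (-0.28, 0.05, 1.90),
        "HipCenter": (0.01, -0.35, 2.00),
    }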

Go-Stop Detector

13

Go-Stop Detector

14

• Detects the start and end of each sign, using a threshold to differentiate between the signing space and the non-signing space.

(Figure: the signing space.)

Go-Stop Detector

15

• A threshold is chosen to differentiate between the signing space and the non-signing space, based on the hands' 3-D positions.

• Three consecutive frames in the signing space (or in the non-signing space) are required to flag the start or end of a sign; see the sketch below.
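A minimal sketch of this logic in Python is shown below. The signing-space test used here (a hand raised above the hip center by a fixed margin) and the threshold value are assumptions; the slides only state that a threshold on the hands' 3-D positions is used, together with the three-frame rule.

    # Sketch of the go-stop detector: position threshold + three-frame confirmation.
    # Frames are dicts mapping joint names to (x, y, z) tuples, as in the earlier sketch.

    HEIGHT_THRESHOLD = 0.15   # meters above the hip center (assumed value)
    CONFIRM_FRAMES = 3        # three consecutive frames confirm a start or an end

    def in_signing_space(frame):
        """True if either hand is raised above the (assumed) height threshold."""
        hip_y = frame["HipCenter"][1]
        return any(frame[hand][1] - hip_y > HEIGHT_THRESHOLD
                   for hand in ("HandRight", "HandLeft"))

    class GoStopDetector:
        def __init__(self):
            self.signing = False   # currently inside a sign?
            self.counter = 0       # consecutive frames disagreeing with the current state

        def update(self, frame):
            """Feed one skeleton frame; return 'start', 'end', or None."""
            if in_signing_space(frame) != self.signing:
                self.counter += 1
                if self.counter >= CONFIRM_FRAMES:   # confirmed transition
                    self.signing = not self.signing
                    self.counter = 0
                    return "start" if self.signing else "end"
            else:
                self.counter = 0
            return None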

Go-Stop Detector

16

Sign Recorder

17

Preprocessing System

18

Preprocessing System

19

Feature Extraction

20

• Features captured from the skeleton stream:

1. Right hand joint x, y, and depth.

2. Left hand joint x, y, and depth.

3. Hip Center joint x, y, and depth.

Feature Vector

21

• The feature vector consists of 6 values per skeleton frame.

Feature Number Feature Value

1 Right Hand x – Hip Center x

2 Right Hand y – Hip Center y

3 Right Hand depth – Hip Center depth

4 Left Hand x – Hip Center x

5 Left Hand y – Hip Center y

6 Left Hand depth – Hip Center depth

We need the hip-center joint to calculate the hand positions relative to a fixed body point, which compensates for the signer's position in front of the Kinect; a minimal computation sketch follows.
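The sketch below computes the 6-D feature vector of one skeleton frame exactly as listed in the table above; the frame layout (joint name mapped to an (x, y, z) tuple) follows the earlier illustrative sketch and is an assumption, not the authors' code.

    # Features 1-6: hand positions expressed relative to the hip center.
    def feature_vector(frame):
        hx, hy, hz = frame["HipCenter"]
        rx, ry, rz = frame["HandRight"]
        lx, ly, lz = frame["HandLeft"]
        return [rx - hx, ry - hy, rz - hz,   # 1-3: right hand relative to hip center
                lx - hx, ly - hy, lz - hz]   # 4-6: left hand relative to hip center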

Linear Resampling

22

The Kinect camera records the skeleton at a rate of 30 frames/second. However, this is only the average rate: in practice, the time measured between two consecutive samples varies from 30 ms to 100 ms, so the trajectories are linearly resampled onto a uniform time grid.
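A minimal resampling sketch, assuming each frame carries a capture timestamp and that each feature component is interpolated onto a uniform 30 Hz grid; the target rate and the use of numpy are assumptions, while the linear interpolation itself follows the slide title.

    import numpy as np

    def resample_uniform(timestamps_ms, values, period_ms=1000.0 / 30):
        """Linearly resample one feature trajectory onto a uniform time grid.

        timestamps_ms : per-frame capture times in milliseconds (possibly irregular)
        values        : corresponding values of one component of the 6-D feature vector
        period_ms     : target sampling period (1000/30 ms assumes the nominal 30 fps rate)
        """
        t = np.asarray(timestamps_ms, dtype=float)
        v = np.asarray(values, dtype=float)
        uniform_t = np.arange(t[0], t[-1], period_ms)   # uniform grid over the recording
        return np.interp(uniform_t, t, v)               # piecewise-linear interpolation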

Trajectory Smoothing

23

• To decrease the effect of noisy sensor measurements (spikes); a smoothing sketch is given below.

• Next slide: a demo of the trajectory smoothing for one feature component.
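The slides do not specify which filter is used, so the sketch below uses a simple centered moving average as one common choice for suppressing isolated spikes; the window size is an assumed value.

    import numpy as np

    def smooth_trajectory(values, window=5):
        """Smooth one feature component with a centered moving average (assumed filter)."""
        v = np.asarray(values, dtype=float)
        kernel = np.ones(window) / window           # uniform averaging window
        return np.convolve(v, kernel, mode="same")  # output has the same length as the input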

Trajectory Smoothing

24

Hidden Markov Model Classifier

25

Hidden Markov Model

26

Hidden Markov Model

27

• To build a hidden Markov model we need: a set of hidden states, the state transition probabilities, the emission probability distribution of each state, and the initial state distribution.

Hidden Markov Model

28

• Each hidden Markov model has a topology.

Hidden Markov Model

29

• Our hidden-state emission probability distribution function is a 6-D Gaussian distribution, written out below.
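For reference, the emission density of hidden state j is the standard multivariate normal in six dimensions; the symbols μ_j and Σ_j denote the usual per-state mean vector and covariance matrix and are not notation taken from the slides.

$$
b_j(\mathbf{o}) \;=\; \frac{1}{(2\pi)^{3}\,\lvert\boldsymbol{\Sigma}_j\rvert^{1/2}}
\exp\!\Big(-\tfrac{1}{2}\,(\mathbf{o}-\boldsymbol{\mu}_j)^{\top}\boldsymbol{\Sigma}_j^{-1}(\mathbf{o}-\boldsymbol{\mu}_j)\Big)
$$

where o is the 6-D feature vector of one skeleton frame.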

Training Set Generation

30

• For each of the 40 signs, a long video containing 60 samples was recorded and segmented by the go-stop detector into 60 annotated samples per sign, which form the training and test sets.

• These annotated samples are used as observation sequences from which the HMMs are trained with the Baum-Welch algorithm; a minimal training sketch follows.
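The sketch below trains one HMM per sign with the hmmlearn library, whose GaussianHMM.fit runs Baum-Welch re-estimation; the library choice, the number of hidden states, and the covariance type are assumptions rather than details given in the slides.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM   # assumed library; fit() performs Baum-Welch

    def train_sign_models(training_set, n_states=5):
        """Train one HMM per sign.

        training_set : dict mapping a sign label to a list of samples, each sample being
                       an array of shape (n_frames, 6) of per-frame feature vectors
                       produced by the preprocessing stages above
        n_states     : number of hidden states (assumed value; the slides report an
                       experiment on this count but not a final figure)
        """
        models = {}
        for sign, samples in training_set.items():
            X = np.concatenate(samples)          # stack all frames of all samples
            lengths = [len(s) for s in samples]  # per-sample frame counts
            model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
            model.fit(X, lengths)                # Baum-Welch re-estimation
            models[sign] = model
        return models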

Hidden Markov Model

31

• In the sign-language context, the observations are the hands' positions in 3-D space:

Observation: a single skeleton frame.

Hidden state emission: a 6-D Gaussian distribution.

Hidden Markov Model

32

• Evaluation Algorithm

Hidden Markov Model Classifier

33

Experimental Results

34

• Go-Stop Detector
  • Reliable segmentation of long videos.
  • Minimum transition time: 300 ms.

Experimental Results

35

• Hidden Markov Model Classifier performance (online mode)

Person | Test Set Size per Sign | Classification Accuracy
Original Signer | 20 | 95.125%
Different Signer | 20 | 92.5%

• Hidden Markov Model Classifier performance (offline mode)

Person | Test Set Size per Sign | Classification Accuracy
Original Signer | 20 | 99.25%

Experimental Results

36

• Hidden Markov Model Classifier timing performance.

• The algorithm used for classification is the Forward-Backward algorithm; a minimal scoring sketch follows the timing table below.

Sign Timing | Time Needed to Classify (ms)
Average sign time | 12.68
Maximum | 20.2
Minimum | 8.75
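For context, the sketch below shows one way to classify a segmented sign: every sign's trained HMM scores the observation sequence and the best-scoring sign is returned. It assumes the hmmlearn models from the earlier training sketch, whose score method evaluates the sequence log-likelihood with the forward pass.

    def classify_sign(models, observation_seq):
        """Return the sign whose HMM assigns the highest log-likelihood to the sample.

        models          : dict mapping sign labels to trained GaussianHMM models
        observation_seq : array of shape (n_frames, 6) for one segmented sign
        """
        # model.score() evaluates the forward-pass log-likelihood of the sequence.
        return max(models, key=lambda sign: models[sign].score(observation_seq))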

Experimental Results

37

• Hidden Markov Model hidden-state count.

Conclusion

38

• A system has been developed that automatically segments a live video stream into isolated signs using Kinect and translates these signs into text.

• Signer-dependent performance is 95.125%, and signer-independent performance is 92.5%.

Thank you!

39