
PROJECT REPORT ON

Gesture recognition by a 3d camera and user impersonation via a

humanoid

DEPARTMENT OF ELECTRONICS AND INSTRUMENTATION ENGINEERING

INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH SIKSHA ‘O’ ANUSANDHAN UNIVERSITY, ODISHA

SESSION: 2008-12


PROJECT REPORT ON

GESTURE RECOGNITION BY A 3D CAMERA AND USER

IMPERSONATION VIA A HUMANOID

SUBMITTED IN PARTIAL FULFILLMENT FOR THE AWARD OF BACHELOR’S DEGREE IN ELECTRICAL AND ELECTRONICS ENGINEERING

SUBMITTED BY PIYUSH ROUTRAY 0811015112

SUBRAT RATH 0811014022 SANKALP MOHANTY 0811014025 GHANSHYAM BHUTRA 0811014039

UNDER THE GUIDANCE OF

ASST. PROF. FARIDA ASHRAF ALI ASST. PROF. RESHMA BHOI

DEPARTMENT OF ELECTRONICS AND INSTRUMENTATION ENGINEERING

INSTITUTE OF TECHNICAL EDUCATION AND RESEARCH SIKSHA ‘O’ ANUSANDHAN UNIVERSITY, ODISHA

SESSION: 2008-12


DEPARTMENT OF ELECTRONICS AND INSTRUMENTATION

ENGINEERING

Certificate

This is to certify that PIYUSH ROUTRAY has completed his major project, entitled "Gesture Recognition by a 3D Camera and User Impersonation via a Humanoid", in the academic year 2011-2012, in partial fulfillment of the Bachelor of Technology degree under Siksha 'O' Anusandhan University, in the Department of Electronics and Instrumentation Engineering, Institute of Technical Education & Research, Bhubaneswar.

Date: 16th April, 2012

Prof. G.S.C. Mishra Asst. Prof F. A. Ali

HOD (E&I) Project Guide


INSTITUTE OF TECHNICAL EDUCATION AND

RESEARCH BHUBANESWAR

SIKSHA ‘O’ ANUSANDHAN UNIVERSITY BHUBANESWAR

DECLARATION

WE DECLARE THAT EVERY PART OF THE PROJECT REPORT SUBMITTED IS

GENUINELY OUR OWN WORK AND HAS NOT BEEN SUBMITTED FOR THE AWARD

OF ANY OTHER DEGREE. WE ACKNOWLEDGE THAT, IF ANY SORT OF

MALPRACTICE IS DETECTED IN RELATION TO THIS REPORT, WE SHALL BE HELD

LIABLE FOR IT.

PIYUSH ROUTRAY

0811015112

SUBRAT RATH

0811014022

SANKALP MOHANTY

0811014025

GHANSHYAM BHUTRA

0811014039


INSTITUTE OF TECHNICAL EDUCATION AND

RESEARCH BHUBANESWAR

ACKNOWLEDGEMENT

WE WISH TO IMPART OUR HEARTFELT GRATITUDE AND SINCEREST

APPRECIATION TO THE FOLLOWING PERSONS WHO IN ONE WAY OR ANOTHER

HAVE CONTRIBUTED AND SHARED THEIR IMMEASURABLE ASSISTANCE TO THE

SUCCESSFUL COMPLETION OF THIS PROJECT REPORT:

TO Dr. G.S.C. MISHRA, HOD, E&I DEPARTMENT;

TO Mrs. FARIDA ASHRAF ALI, PROF., PROJECT GUIDE, E&I DEPARTMENT;

ABOVE ALL, TO ALL THE FACULTY MEMBERS OF OUR DEPARTMENT AND

OUR FRIENDS, FOR THEIR SUPPORT.

PIYUSH ROUTRAY

0811015112

SUBRAT RATH

0811014022

SANKALP MOHANTY

0811014025

GHANSHYAM BHUTRA

0811014039


LIST OF FIGURES

Fig No. Description Page No.

1.1 Process Overview 6

1.2 Complete Work Flow 6

3.1 Difference between Legged and Wheeled Robot 12

3.2 SolidWorks Software 14

3.3 Individual structure of Humanoid 15

3.4 Complete Humanoid 16

3.5, 3.6 Adams Software Model 17

3.7 Basic Components used 18

3.8 Arm with Gripper 18

3.9 Complete Upper Body 19

3.10 Two Arms with Circuit Boards 19

3.11 Actual Model vs. Virtual Model 20

4.1, 4.2 KINECT Sensor 22, 23

4.3 Pin Diagram of ATMEGA 16 27

4.4 AVR with USART 29

4.5 XBEE Pinouts 30

4.6 XBEE Specification 31

5.1 Simulink Model 37

5.2 Pulse for Servo 40

6.1 RGB Image 42


6.2 IR Image 43

6.3 Skeletal Image 45

6.4 Calculation of Link Length 47

6.5 Calculation of Angles 49

6.6, 6.7, 6.8 Graphs for Torque Profile of Motors 50

6.9 Synchronization between User and Virtual Model 51


ABSTRACT

ON

GESTURE RECOGNITION BY A 3D CAMERA AND USER

IMPERSONATION VIA A HUMANOID

The central idea is to analyze 3D images in the real-time domain and to apply techniques already available for 2D images, such as face recognition and gesture recognition. The idea has many potential real-life applications in entertainment, domestic and industrial settings. We aim to create a virtual model and a hardware model with several practical engineering applications.

The working principle of the project is to first build a complete mechanical model and its assembly. The model is then integrated with multidisciplinary variables such as motion, acceleration, torque and friction. Finally, the model is controlled via real-time parameters supplied by the sensor. The final output is a complete simulation of the virtual model and the humanoid following the gesture movements of the user.

Name: Subrat Rath - 0811014022

Sankalp Mohanty - 0811014025

Ghanshyam Bhutra - 0811014039

Piyush Routray - 0811015112


CONTENT

1. INTRODUCTION....................................................................................................... 1

1.1 Humanoid……………………..................................................................................... 2

1.2 Communication............................................................................................................ 3

1.3 KINECT Sensor........................................................................................................... 3

1.4 Servo Motor ………………........................................................................................ 4

1.5 Aim of the Project…………........................................................................................ 5

2. LITERATURE REVIEW…….………………………………………… …............. 7

3. DESIGN OF HUMANOID..….………………………………………… ….............11

3.1 Why Humanoid?……………..................................................................................... 12

3.2 Calculation of Dimensions...................................................................................... 13

3.3 Software used for Designing - SolidWorks................................................................ 14

3.4 Virtual Design in SolidWorks.....................................................................................14

3.5 MSC – Adams……………….....................................................................................17

3.6 Hardware Model……………..................................................................................... 18

3.7 Actual (Hardware) Model vs. Virtual (Software) Model…....................................... 20

4. CONTROL ARCHITECTURE………………………………………… ….............21

4.1 KINECT Sensor ……………..................................................................................... 22

4.1.1 Using Kinect................................................................................................ 24

4.1.2 Kinect Drivers.............................................................................................. 24

4.2 ATMEL AVR ……………........................................................................................ 25


4.2.1 Atmega 16..................................................................................................27

4.2.2 Pin Description.......................................................................................... 28

4.2.3 Using the USART of Atmega 16.....................................................................28

4.3 ZIGBEE Communication…...................................................................................... 30

4.3.1 XBEE……..................................................................................................30

4.3.2 XBEE Working Principle...........................................................................31

5. PROCESS INVOLVED……………….……………………………...................... 32

5.1 Data Points from KINECT…................................................................................... 33

5.1.1 Calculation of Joint Angles....................................................................... 34

5.1.2 Calculation of Link Lengths...................................................................... 34

5.2 Denavit-Hartenberg Methodology............................................................................ 35

5.2.1 Simulink Model……………..................................................................... 37

5.3 Modes of Operation of ZIGBEE...............................................................................38

5.4 Memory Addressing Technique................................................................................39

5.5 Controlling the Servo Motor….................................................................................40

6. CODES IMPLEMENTED……………………………………….……......………..41

6.1 Coding in KINECT………………………………………………………………….42

6.1.1 RGB Image……………………………………………………………. ….42

6.1.2 IR Image……………………..................................................................….43

6.1.3 Skeletal Image……………………………………………………………..44

6.1.4 Finding out the link lengths………………………………………………..45


6.1.5 Finding out the angles…..………………………………………………..47

6.2 Coding in ADAMS – MATLAB….……………………………………….………. 50

6.3 Coding in Microcontrollers…………………………………………………………52

7. CONCLUSION ………………………………………….………………………...55

8. BIBLIOGRAPHY..……………………………………….………………………...56


CHAPTER 1

INTRODUCTION

Going by the present scenario, it is easy to see that in the not-so-distant future humanoids may become the best alternative to human labour. The economics of labour around the globe is pushing this issue at a tremendous rate. The introduction and development of robots such as NAO and ASIMO shows the rapid progress in the field of humanoid robotics. Machines are devices that make human work easier; ranging from large-scale machines like car crushers to small-scale ones like staplers, they are classified into diverse groups. Similarly, humanoids are machines built and programmed for a dedicated purpose, the only difference being that they resemble a human being.


1.1 Humanoid

Going by the dictionary meaning, humanoid means "having an appearance or character resembling that of a human". Humanoids can be described as complex machines resembling a human in terms of size, shape, degrees of freedom, etc. The boundary of application for humanoids is shared by humans themselves. The motivation for studying humanoid robots in particular arises from diverse sociological and commercial interests, ranging from the desire to replace humans in hazardous occupations (de-mining, nuclear power plant inspection, military interventions, etc.) to the restoration of motion in the disabled (dynamically controlled lower-limb prostheses, rehabilitation robotics and functional neural stimulation). With the advent of new methods, humanoids can adapt to more and more complicated environments and may even surpass humans in speed and accuracy.

Locomotion, the ability of a body to move from one place to another, is a defining characteristic of animal life. It is accomplished by manipulating the body with respect to the environment. In the natural setting, locomotion takes on many forms, whether it is the swimming of amoebas, the flying of birds, or the walking of humans. The diversity of animal locomotion is truly astounding and surprisingly complex. The same is true of objects crafted by man: airplanes have wings that create lift for flight, tanks have tracks for traversing uneven terrain, and automobiles have wheels for rolling efficiently; and robots are now walking on their own two legs!

Present-day humanoids move and work with a predefined set of instructions and motions to perform a specific task. An important current topic of discussion revolves around gesture imitation. Imitation refers to the act of impersonating a user's actions, be it hand movement or leg movement. Achieving perfect imitation of a human requires tremendous calculation and the harmonious working of many different aspects of the system. As more and more complicated control algorithms are developed, better and more efficient functionalities can be realised. Building, designing and working with practical robots requires knowledge in three major engineering fields: mechanical, electrical and software. These days, mobile robots are being called on to perform more and more complex and practical tasks. Autonomous robots place special demands on their mobility system because of the unstructured and highly varied environment the robot might drive through, and the fact that even the best sensors are poor in comparison to a human's ability to see, feel and balance. This means that the mobility system of a robot that relies on those sensors will have much less information about the environment and will encounter obstacles that it must deal with on its own. The best mobility system for a robot is one that effectively accomplishes the required task, without regard to how well a human could use it. A common form of mobile robot today is semiautonomous, where the robot has some sensors and acts partially on its own, but there is always a human in the control loop through a communication link, i.e. telerobotics. We are using the technique for a teleoperated robot, where there are no sensors on the bot that it uses to make decisions.

1.2 Communication

The next topic of focus is communication. It is a process by which two or more entities exchange or share information. Communication takes place through a channel or path which may or may not have a physical existence. Communication between two or more points which are not physically connected is called wireless communication. Wireless links can span a few metres, as in remote control operation, or millions of kilometres, as in deep space radio communication. Wireless communication is generally done via radio frequency, microwave or infrared, and can be point-to-point, point-to-multipoint or broadcast. In this project, ZigBee and XBee are used for wireless communication. ZigBee is a wireless technology developed mostly for low-cost, low-power wireless sensor networks. It follows the ZigBee protocol, which carries all the benefits of the 802.15.4 protocol with added networking functionality. It can operate in the frequency ranges of 2.4-2.484 GHz, 902-928 MHz and 868-868.6 MHz. The ZigBee protocol was developed by the ZigBee Alliance, and its key feature is support for mesh networks. The hardware devices used for ZigBee communication are XBee transceivers. They work within a range of 300 metres, require a supply voltage of 2.8-3.4 volts, and are cheap, robust and consume very little power.


1.3 KINECT Sensor

A sensor is a converter that measures a physical quantity and converts it into a signal which can be read by an observer or an instrument. Here the sensor used is a camera. A camera is a device that records and stores images, working in the visible part of the EM spectrum. Traditional cameras operate in a two-dimensional space, i.e. images are a planar representation of the space we are in. With the advent of technology, however, there are now techniques for 3D imaging that create and enhance the illusion of depth in an image. Using image processing and colour perception, sensors which can provide a 3D experience have been developed. A significant development is the KINECT sensor, a motion sensing device developed by Microsoft for the XBOX 360 gaming console, which is used in this project. It employs the principle of gesture recognition of the user's movements. A gesture is simply any posture of the user made with some intention; it is far easier to point to an object than to verbally describe its exact location. Gestures are recognized by mapping individual camera images into MATLAB and comparing them with predefined conditions.

1.4 Servo Motor

For the realization of gestures via user impersonation, a model is required. The model's major component is the servo motor. A servo motor forms part of a servomechanism, a device that uses error-sensing negative feedback to correct the performance of a mechanism. The motor is paired with an encoder to provide position/speed feedback. Servos are controlled by sending a PWM (Pulse Width Modulation) signal, i.e. a series of repeating pulses of variable width. Generally servos are connected through a standard three-wire connection: two wires for the DC power supply and one control wire carrying the pulses. The signal is easily generated by simple electronics or by microcontrollers. The angle of movement is determined by the duration of the pulse applied to the control wire. When commanded to move, a servo will move to the commanded position and hold it; if an external force pushes against the servo while it is holding its position, it will resist moving from that position. The maximum amount of force the servo can exert depends on its torque rating. Moving on to the model's description, there are 18 DOF (Degrees of Freedom), the complete description of which is illustrated in the following pages.
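Since a servo's angle is set purely by the width of the repeating control pulse described above, a minimal sketch illustrates the idea. This is only an assumption-laden example (an ATmega16-class controller at 16 MHz with the servo signal on the OC1A/PD5 pin), not the project's actual firmware:

/* Minimal servo-PWM sketch (assumed 16 MHz ATmega16, servo on OC1A/PD5).
   Timer1 in Fast PWM mode with ICR1 as TOP gives a 50 Hz frame; OCR1A sets
   the 1-2 ms pulse that selects the servo angle. */
#include <avr/io.h>

void servo_init(void)
{
    DDRD  |= (1 << PD5);                               /* OC1A as output          */
    TCCR1A = (1 << COM1A1) | (1 << WGM11);             /* non-inverting, mode 14  */
    TCCR1B = (1 << WGM13) | (1 << WGM12) | (1 << CS11);/* prescaler 8             */
    ICR1   = 39999;                                    /* 16 MHz/8/40000 = 50 Hz  */
}

void servo_set_angle(uint8_t deg)                      /* 0..180 degrees          */
{
    /* 1 ms (2000 ticks) .. 2 ms (4000 ticks) pulse width */
    OCR1A = 2000 + ((uint32_t)deg * 2000) / 180;
}

Calling servo_init() once and servo_set_angle() whenever a new joint value arrives is all that is needed; the servo itself holds the commanded position between updates.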

1.5 Aim of the Project:

Model a humanoid (Software and Hardware).

Embed the characteristics of a human being into it.

Take input from a 3D camera.

Control the humanoid with the Real Time input variables from the camera.


Figure 1.1

Figure 1.2


CHAPTER 2

LITERATURE REVIEW

Presented in this chapter are reviews about some of the related scientific papers,

journals and published IEEE papers.

Vitor M. F. Santos, Filipe M. T. Silva. "Engineering Solutions to Build an Inexpensive Humanoid Robot Based on a Distributed Control Architecture". 2005, 5th IEEE-RAS International Conference on Humanoid Robots.

This paper presents the main steps in designing a low-cost, fully autonomous humanoid platform and the set of solutions proposed, and is thus close to most of our mechanical modeling requirements. The main scope of the project behind this paper is to carry out research on control, navigation and perception, whilst offering opportunities for under- and post-graduate students to apply engineering methods and techniques. The main features of the 22 degrees-of-freedom robot include a distributed control architecture based on a CAN bus and modularity at the system level. Ours being an 18 DOF robot, it draws considerable inspiration from this project. Although some issues are yet to be addressed, the stage of development is already mature enough for practical experiments and for first conclusions on the potential of the proposed solutions.


Milton Ruas, Filipe M. T. Silva, Vitor M. F. Santos. "A Low-Level Control Architecture for a Humanoid Robot".

This paper proposes a general low-level control architecture for a small-size humanoid robot using off-the-shelf technologies. The main features of this implementation are the distributed control approach and the relevance given to sensorial information. Some practical issues on servomotor control are discussed, since addressing them turned out to be necessary before entering higher levels of control. Particular attention is given to the low-level control of RC servomotors and the enhanced performance achieved by software compensation. The distributed set of controller units is the key element of a control system that compensates for large changes in reflected inertia and provides position and velocity control. Furthermore, a kind of intermediate-level control is implemented as a local controller based on force sensing, providing robust and adaptive behavior to changes in a sloped surface.

Elvedin Kljuno, Robert L. Williams II. "Humanoid Walking Robot: Modeling, Inverse Dynamics, and Gain Scheduling Control". Hindawi Publishing Corporation, Journal of Robotics.

This article presents a reference-model-based control design for a 10 degree-of-freedom bipedal walking robot, using nonlinear gain scheduling. The main goal is to show that concentrated mass models can be used to predict the required joint torques for a bipedal walking robot. The relatively complicated architecture, high DOF, and balancing requirements make the control task of these robots difficult. Although linear control techniques can be used to control bipedal robots, nonlinear control is necessary for better performance.

walking model with concentrated mass at the center of gravity, which removes the problems

related to design of a pseudo-inverse system. Another significance of this approach is the

reduced calculation requirements due to the simplified procedure of nominal joint torques

calculation. Kinematic and dynamic analysis is discussed including results for joint torques

and ground force necessary to implement a prescribed walking motion. An inverse plant and

a tracking error linearization based controller design approach are described. A novel

combination of a nonlinear gain scheduling with a concentrated mass model for the MIMO

bipedal robot system is proposed.

James Kuffner, Koichi Nishiwaki, Satoshi Kagami, Masayuki Inaba, Hirochika Inoue.

“Motion Planning for Humanoid Robots”. 11th Int‟l Symp. of Robotics Research

(ISRR 2003)

This paper dwelled upon the prospects of biped humanoids as far back as 2003. In order to

improve the autonomy and overall functionality of these robots, reliable sensors, safety

mechanisms, and general integrated software tools and techniques are needed. We believe

that the development of practical motion planning algorithms and obstacle avoidance

software for humanoid robots represents an important enabling technology. This paper gives

an overview of some of our recent efforts to develop motion planning methods for humanoid

robots for application tasks involving navigation, object grasping and manipulation, footstep


placement, and dynamically-stable full-body motions. Most primitive techniques for

humanoid robot development are discussed in this paper.

Stefan Waldherr, Roseli Romero, Sebastian Thrun. "A Gesture Based Interface for Human-Robot Interaction". 2000, Kluwer Academic Publishers.

This work deals with gesture recognition for service robots. It does not discuss the mechanical structure much, concentrating instead on image processing. It specifically divides the job into 'tracking', 'gesture recognition' and 'integration'. Tracking involves following the operator through the office or hall and characterising the person by face and shirt colour. The shirt colour is assumed uniform, and chromatic colour tracking is performed to minimise the colour variation in the operator's skin; this also prevents any influence from variation in lighting conditions. Gesture recognition, or pose analysis, involves acquiring arm poses from the mounted camera and then analysing those poses to infer meaning, using temporal template matching. The temporal template matcher integrates multiple images and compares them with its database. The dynamic and static response of the process is verified through rigorous experimentation. Integration deals with ordering the robot to do the task, in this case cleaning the office; different arm poses tell the robot to perform the intended task.

Ralf Salomon, John Weissmann. “Gesture Recognition for Virtual Reality

Applications Using Data Gloves and Neural Networks”.

This paper does not involve image processing. It instead uses electronic signals generated from a data glove to interpret what is expected of the robot. The basic idea is to use the different permutations of angles reported by 18 sensor points on a single hand: the angles of the two joints in each finger, the angles between fingers, thumb rotation, palm arch, wrist pitch and wrist yaw form the nodes. During processing, the nodes are interconnected and compared with previously stored hand geometry data. However, the paper presents very few experimental results, so the technique still has to be verified.

David Kortenkamp, Eric Hubber, R. Peter Bonasso. “Recognising and interpreting

gestures on mobile Robot”. Robotics and Automation Group, NASA

The paper presents work very similar to ours. The authors describe a real-time 3D gesture recognition system mounted on a mobile robot. However, the robot is not a humanoid but a wheeled system, and it is intended to interpret orders rather than just follow them. The pattern understanding mechanism adopted is the proximity space method: the processor searches for the head of the operator and then links the shoulder, upper arm and forearm to the head via three links. The movement vectors thus produced are analysed and the proximity space is inferred. The method acquires and reacquires images, overriding actions for inconsistent movement vectors. Reactive action packages specify how and when to carry out routine procedures through conditional sequencing. The use of 3D images helps to perform the exact task out of a wide range of possible interpretations of the corresponding images.


Mark Becker, Efthimia Kefalea, Eric Mael, Christoph von der Malsburg, Mike Pagel, J. Triesch, Jan C. Vorbrüggen, Rolf P. Würtz, Stefan Zadel. GripSee: "A Gesture-controlled Robot for Object Perception and Manipulation"

GripSee is a small step towards the bigger goal of building service robots, dealing mainly with improving the perception quality of the present robot system. The paper, however, restricts its work to a particular environment, and high spatial precision is required to meet the high-resolution colour sensitivity demands. Developed in C++ with FLAVOR, the program directs one dedicated computer for image processing and another for arm movement and functioning. A particular two-finger signal directs the robot to the object it has to grip. The image of the object is analysed to find the best possible surface to grasp using the robot's two gripping mechanisms. It follows a predefined hierarchy of hand tracking, hand posture analysis, object fixation, object recognition and grip planning before trajectory planning and gripping the object. Similarly, it tracks the hand, analyses the posture(s) and then drops the object.

Christian Plagemann, Varun Ganapathi, Daphne Koller, Sebastian Thrun. "Real-time Identification and Localization of Body Parts from Depth Images". 2010, www.hindawi.com

In this research, after acquiring the image through depth-sensing stereo vision, the processor further processes the image to determine the action. The interest point method identifies geodesic extrema on the surface mesh, after which 3D vector orientation takes place. The range data thus acquired facilitates segmentation of the human body from the background and also disambiguates visually similar poses. A real-time, high frame rate calculation module prevents the human-robot interaction from suffering undue time delays. The research also aims at maximising the repeatability of the detector, so that the likelihood of detecting a once-detected interest point again is high. It also discusses the AGEX interest point method and a sliding window approach for identification of parts.

Luciano Spinello, Kai O. Arras, Rudolph Triebel, Roland Siegwart. “A layered

Approach to people detection in 3D Range Data”. 2010 www.aaai.org

Taking into account the drastic changes in body pose, distance from the sensor, self-occlusion and occlusion by other objects, this work subdivides a person into parts defined by different height levels and uses highly specialised classifiers for each part. It does not necessarily choose the ground as the reference frame, but rather tracks the person while excluding the background altogether. Segmentation is done in a point cloud segmentation manner and the segments are then rearranged to create the image. The AdaBoost machine learning approach is used in the process.


CHAPTER 3

DESIGN OF HUMANOID

Being inspired by the biology around us is perhaps the best way to develop a mechanical machine that caters to the needs of its human masters. The greatest requirement in the development of present-day robots is perhaps effective manoeuvring skill. However attractive it may seem to copy them, even the simplest near-perfect biological processes are too complex to emulate. Considering various options (discussed later), the subject of the project was chosen to be a HUMANOID.


3.1 Why Humanoid?

Wheeled robots might be the easiest to build considering numerous factors such as low cost, easier electronic circuits, simple geared DC motors, rugged structure and unmatchable stability. However, certain kinds of manoeuvring demand finely engineered legged robots, as shown in Figure 3.1:

Figure 3.1


3.2 Calculation of Dimensions:

The average height of a human being is about 5 feet 3 inches. When conceiving a humanoid platform, countless decisions have to be made. Specifications must be defined that place limits both on skills and on overall objectives. The physical and functional requirements, related to the robot's dimensions, determine the mobility skills and the performance of the selected tasks such as walking, turning and kicking a ball. Many factors, such as dimensional constraints, economic considerations, physical feasibility, the final objective or goal, and limited availability of resources, have a huge impact in determining the robot's structure. For the dimensioning of the humanoid the following steps were followed:

1. First the height of the humanoid was decided based on the objective of the robot.

2. A scaling ratio was then worked out for the humanoid.

3. The human body part lengths were scaled down using this ratio to obtain the link lengths of the humanoid.

The height was decided to be about 80 cm, and using the scaling ratio the individual link lengths were obtained as follows:

LINK NAME LENGTH (in mm)

Upper arm 140

Lower arm 125

Upper leg 252

Lower leg 185

Body 250

Feet 85

Table 3.1
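As a rough check (assuming an average human height of about 160 cm, i.e. 5 feet 3 inches), the scaling ratio works out to roughly 800 mm / 1600 mm = 0.5, so each human link length is approximately halved; for example, a human upper arm of about 280 mm scales to the 140 mm listed in Table 3.1.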

After fixing the structure height (800 mm) and the remaining body proportions, the next issue is the determination of the degrees of freedom (DOF). Based on the end objective of the project and the complexity of the structure, it was decided to give the humanoid 18 degrees of freedom (DOF).


3.3 Software used for Designing –SolidWorks:

SolidWorks is a 3D mechanical CAD (computer-aided design) program that runs on

Microsoft Windows and is being developed by Dassault Systèmes SolidWorks Corp., a

subsidiary of Dassault Systèmes. SolidWorks is a Parasolid-based solid modeler, and utilizes

a parametric feature-based approach to create models and assemblies.

Figure 3.2

3.4 Virtual Design in SolidWorks:

The model was first divided into different links. Every link was modeled individually and

then finally all the links were mated so as to form a complete model.

The links include:

Body.

Shoulder Joint.

Upper Arm.

Lower Arm.

Pelvis Joint.

Upper Leg.

Lower Leg.

Feet.


Figure 3.3

BODY

UPPER ARM

SHOULDER

JOINT

LOWER ARM

LOWER LEG

UPPER LEG

FEET


COMPLETE BODY

Figure 3.4


3.5 MSC – Adams:

Adams multi-body dynamics software enabled us to easily create and test virtual prototypes of mechanical systems in a fraction of the usual time. Unlike most CAD-embedded tools, Adams incorporates real physics by simultaneously solving equations for kinematics, statics, quasi-statics and dynamics. We also learned how to import MATLAB code into the Adams model. Using this software, all the DOFs were enabled in the virtual model. The model was configured to take input from MATLAB and the plant was exported, which resulted in a '.m' file for use in MATLAB.

Figure 3.5

Figure 3.6


3.6 Hardware Model:

After completion of the SolidWorks 3D models of the individual parts, the next step was to construct the real hardware model. Various decisions had to be made regarding the choice of construction material, and aluminium was chosen as it is light as well as strong. The whole structure of the humanoid was built keeping in mind that motion between the joints should be frictionless and that the net weight of the upper body should be less than that of the lower body. L-shaped aluminium channels were used, which gave immense strength to the inner layers, and lock nuts were used to secure the joints.

Finally, servo motors rated at 17 kg-cm were fitted in the humanoid and metallic arms were used for coupling.

Figure 3.7

Figure 3.8


Figure 3.9

Figure 3.10


3.7: Actual (Hardware) Model vs. Virtual (Software) Model

Figure 3.11

This shows a comparison between the actual and the virtual model. In this image, the motors other than the gripper's have not yet been fixed. Due to the lack of resources and time, the working is demonstrated using just the upper-body parts, with locomotion via a wheeled bot.


CHAPTER 4

CONTROL ARCHITECTURE

The control architecture is the suite of controls required for the behavioral synchronization between the human and the humanoid robot. For convenience of study, this part has been divided into three sections: the first is about the Kinect sensor, the second covers the AVR controller and the third contains information about ZigBee communication.


4.1 KINECT Sensor:

Kinect, originally known by the code name Project Natal, is a motion sensing input device by Microsoft for the Xbox 360 video game console. Based around a webcam-style add-on peripheral for the Xbox 360 console, it enables users to control and interact with the Xbox 360 without the need to touch a game controller, through a natural user interface using gestures and spoken commands. Kinect was launched in North America on November 4, 2010, in Europe on November 10, 2010, in Australia, New Zealand and Singapore on November 18, 2010, and in Japan on November 20, 2010.

Kinect builds on software technology developed internally by Rare, a subsidiary of Microsoft Game Studios, and on range camera technology by Israeli developer PrimeSense, which developed a system that can interpret specific gestures, making completely hands-free control of electronic devices possible by using an infrared projector, a camera and a special microchip to track the movement of objects and individuals in three dimensions. This 3D scanner system, called Light Coding, employs a variant of image-based 3D reconstruction. The Kinect sensor is a horizontal bar connected to a small base with a motorized pivot and is designed to be positioned lengthwise above or below the video display. The device features an "RGB camera, depth sensor and multi-array microphone running proprietary software", which provide full-body 3D motion capture, facial recognition and voice recognition capabilities. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions. The sensing range of the depth sensor is adjustable, and the Kinect software is capable of automatically calibrating the sensor based on the physical environment, accommodating the presence of the user.

Figure 4.1


Figure 4.2

Described by Microsoft personnel as the primary innovation of Kinect, the software

technology enables advanced gesture recognition, facial recognition and voice recognition.

Kinect sensor outputs video at a frame rate of 30 Hz. The RGB video stream uses 8-bit VGA

resolution (640 × 480 pixels) with a Bayer color filter, while the monochrome depth sensing

video stream is in VGA resolution (640 × 480 pixels) with 11-bit depth, which provides

2,048 levels of sensitivity. The Kinect sensor has a practical ranging limit of 1.2–3.5 m (3.9–

11 ft) distance when used with the Xbox software. The area required is roughly 6m², although

the sensor can maintain tracking through an extended range of approximately 0.7–6 m (2.3–

20 ft). The sensor has an angular field of view of 57° horizontally and 43° vertically, while

the motorized pivot is capable of tilting the sensor up to 27° either up or down. The

horizontal field of the Kinect sensor at the minimum viewing distance of ~0.8 m (2.6 ft) is

therefore ~87 cm (34 in), and the vertical field is ~63 cm (25 in), resulting in a resolution of

just over 1.3 mm (0.051 in) per pixel. Because the Kinect sensor's motorized tilt mechanism

requires more power than the Xbox 360's USB ports can supply, the device makes use of a

proprietary connector combining USB communication with additional power.
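As a sanity check on those figures (assuming the quoted 57° horizontal field of view and the ~0.8 m minimum viewing distance): horizontal field ≈ 2 × 0.8 m × tan(57°/2) ≈ 0.87 m, and 870 mm / 640 pixels ≈ 1.36 mm per pixel, which agrees with the roughly 1.3 mm per pixel stated above.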

The Kinect sensor has been designed to work with the XBOX 360 gaming station, so we take the help of third-party developers and open source drivers to make it compatible for use with a computer.


Microsoft released the Kinect software development kit for Windows 7 in June 2011, which allows developers to write Kinect applications in C++/CLI, C#, or Visual Basic .NET. Being a fairly new leap in the field of depth image processing, it offers immense scope for research and development. It includes the following features:

1. Raw sensor streams: access to low-level streams from the depth sensor, colour camera sensor, and four-element microphone array.

2. Skeletal tracking: the capability to track the skeleton image of one or two people moving within the Kinect field of view, for gesture-driven applications.

3. Advanced audio capabilities: audio processing capabilities include sophisticated acoustic noise suppression and echo cancellation, beam forming to identify the current sound source, and integration with the Windows speech recognition API.

4. Sample code and documentation.

4.1.1 Using Kinect

Hardware connections:

The Kinect is provided with a normal USB connector, split out to a separate power adapter. Using it over a prolonged period heats the sensor up, so regular breaks of about an hour should be taken to avoid any damage.

4.1.2 Kinect Drivers

There are three options to get the driver working on Windows platform:

CLNUI (untested).

Open-Kinect.

OpenNI/NITE.


Here we are using OpenNI/NITE drivers and this includes the following software:

OpenNI 1.0.0.25 Unstable.

Primesense NITE 1.3.0.18 Unstable.

SensorKinect 0.4.

Microsoft Visual C++ Redistributable.

Visual Studio 2010.

Although there are a variety of versions of the above software, the versions mentioned are used here because they are compatible with MATLAB. To use them, the function files have to be linked through MEX files, which are built with Visual Studio. From the MATLAB Central file exchange, some C++ wrapper functions for the Kinect and the OpenNI library were downloaded and linked into MATLAB by compiling the cpp files.

The main aim of using Kinect is to track a user and detect the movements so that the gesture can be impersonated via a humanoid. For this purpose, skeletal tracking of the human body is done. At the heart of Kinect lies a depth camera that measures the distance of any given point from the sensor using near-IR light reflected from the scene. In addition, an IR grid is projected across the scene to obtain deformation information of the grid and model surface curvature. Cues from the RGB and depth streams of the sensor are used to fit a stick skeleton model to the human body, as described later.

4.2 ATMEL AVR:

The AVR is a modified Harvard architecture, 8-bit RISC, single-chip microcontroller family developed by Atmel in 1996. The AVR was one of the first microcontroller families to use on-chip flash memory for program storage, as opposed to the one-time programmable ROM, EPROM, or EEPROM used by other microcontrollers at the time. Flash, EEPROM, and SRAM are all integrated onto a single chip, removing the need for external memory in most applications. Some devices have a parallel external bus option to allow adding additional data memory or memory-mapped devices. Almost all devices (except the smallest TinyAVR chips) have serial interfaces, which can be used to connect larger serial EEPROMs or flash chips.

Current AVRs offer a wide range of features:

Multifunction, bi-directional general-purpose I/O ports with configurable, built-in

pull-up resistors

Internal, self-programmable instruction flash memory up to 256 kB (384 kB on

XMega)

In-system programmable using serial/parallel low-voltage interfaces

debugWIRE uses the /RESET pin as a bi-directional communication channel

to access on-chip debug circuitry. It is present on devices with lower pin

counts, as it only requires one pin.

Internal data EEPROM up to 4 kB

Internal SRAM up to 16 kB

Analog comparator

10 or 12-bit A/D converters, with multiplexing of up to 16 channels

12-bit D/A converters

A variety of serial interfaces, including

Synchronous/asynchronous serial peripherals (UART/USART) (used with RS-

232, RS-485, and more)

Serial Peripheral Interface Bus (SPI)

Universal Serial Interface (USI) for two or three-wire synchronous data

transfer

Multiple power-saving sleep modes

Lighting and motor control (PWM-specific) controller models

USB controller support

Ethernet controller support

LCD controller support


4.2.1 Atmega 16

The high-performance, low-power Atmel 8-bit AVR RISC-based microcontroller

combines 16KB of programmable flash memory, 1KB SRAM, 512B EEPROM, an 8-channel

10-bit A/D converter, and a JTAG interface for on-chip debugging. The device supports

throughput of 16 MIPS at 16 MHz and operates between 4.5-5.5 volts.

By executing instructions in a single clock cycle, the device achieves throughputs

approaching 1 MIPS per MHz, balancing power consumption and processing speed.

PIN DIAGRAM

Figure 4.3


4.2.2 Pin Descriptions

VCC - Digital supply voltage.

GND - Ground.

Port A (PA7..PA0) - Port A serves as the analog inputs to the A/D Converter. Port A also

serves as an 8-bit bi-directional I/O port, if the A/D Converter is not used.

Port B (PB7..PB0) - Port B is an 8-bit bi-directional I/O port with internal pull-up resistors

(selected for each bit). The Port B output buffers have symmetrical drive characteristics with

both high sink and source capability.

Port C (PC7..PC0) - Port C is an 8-bit bi-directional I/O port with internal pull-up resistors

(selected for each bit). The Port C output buffers have symmetrical drive characteristics with

both high sink and source capability.

Port D (PD7..PD0) - Port D is an 8-bit bi-directional I/O port with internal pull-up resistors

(selected for each bit). The Port D output buffers have symmetrical drive characteristics with

both high sink and source capability.

RESET - Reset Input. A low level on this pin for longer than the minimum pulse length will

generate a reset, even if the clock is not running.

XTAL1 - Input to the inverting Oscillator amplifier and input to the internal clock operating

circuit.

XTAL2 - Output from the inverting Oscillator amplifier.

AVCC - AVCC is the supply voltage pin for Port A and the A/D Converter. It should be

externally connected to VCC, even if the ADC is not used. If the ADC is used, it should be

connected to VCC through a low-pass filter.

AREF - AREF is the analog reference pin for the A/D Converter.

4.2.3 Using the USART of ATMEGA16

A Universal Asynchronous Receiver/Transmitter, abbreviated UART, is a piece of computer hardware that translates data between parallel and serial forms. UARTs are commonly used in conjunction with

communication standards such as EIA, RS-232, RS-422 or RS-485. The universal

designation indicates that the data format and transmission speeds are configurable and that


the actual electric signaling levels and methods (such as differential signaling etc.) typically

are handled by a special driver circuit external to the UART.

A UART is usually an individual (or part of an) integrated circuit used for serial

communications over a computer or peripheral device serial port. UARTs are now commonly

included in microcontrollers. A dual UART, or DUART, combines two UARTs into a single

chip. Many modern ICs now come with a UART that can also communicate synchronously;

these devices are called USARTs (universal synchronous/asynchronous receiver/transmitter).

Like many microcontrollers, the AVR has dedicated hardware for serial communication; this part is called the USART (Universal Synchronous Asynchronous Receiver Transmitter). This special hardware makes the programmer's life easier: you just supply the data to be transmitted and it does the rest. Serial communication occurs at standard speeds of 9600, 19200 bps etc., which are slow compared to the AVR CPU's speed. The advantage of the hardware USART is that you only need to write the data to one of the USART registers and you are done; the CPU is free to do other things while the USART transmits the byte.

The USART also automatically senses the start of a transmission on the RX line, receives the whole byte and then informs the CPU so that it can read the data from one of the USART registers.

The USART of the AVR is very versatile and can be set up in various modes as required by the application. Here it is used in the most common configuration, simply sending and receiving data; a small USART library with an interrupt-driven FIFO (first in, first out) receive buffer can further ease this work, so that the main() code reads data only when it needs to.

The USART of the AVR is connected to the CPU by the following six registers:

UDR - USART Data Register: this is actually two registers; reading it returns the data stored in the receive buffer, while writing to it places data in the transmitter's buffer.

UCSRA - USART Control and Status Register A: as the name suggests, it is used to configure the USART and it also stores status flags. There are two more of this kind, UCSRB and UCSRC.

UBRRH and UBRRL: the USART Baud Rate Register. It is 16 bits wide, so UBRRH is the high byte and UBRRL the low byte; in C it is directly available as UBRR and the compiler manages the 16-bit access.

So the connection of the AVR and its internal USART can be visualized as follows.

Figure 4.4


UDR: USART Data Register
UCSRA: USART Control and Status Register A
UCSRB: USART Control and Status Register B
UCSRC: USART Control and Status Register C
UBRR: USART Baud Rate Register; UBRRH is the high byte and UBRRL is the low byte.
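To illustrate how these registers fit together, the following is a minimal sketch of USART initialisation and polled transmit/receive for an ATmega16. The clock (16 MHz) and baud rate (9600, 8N1) are assumptions for illustration; the project's actual firmware may differ:

/* Minimal polled USART sketch for an ATmega16 (assumed 16 MHz clock,
   9600 baud, 8 data bits, no parity, 1 stop bit). */
#include <avr/io.h>

#define F_CPU      16000000UL
#define BAUD       9600UL
#define UBRR_VALUE ((F_CPU / (16UL * BAUD)) - 1)

void usart_init(void)
{
    UBRRH = (uint8_t)(UBRR_VALUE >> 8);                 /* baud rate high byte   */
    UBRRL = (uint8_t)(UBRR_VALUE & 0xFF);               /* baud rate low byte    */
    UCSRB = (1 << RXEN) | (1 << TXEN);                  /* enable RX and TX      */
    UCSRC = (1 << URSEL) | (1 << UCSZ1) | (1 << UCSZ0); /* 8N1 frame format      */
}

void usart_send(uint8_t data)
{
    while (!(UCSRA & (1 << UDRE)));                     /* wait for empty buffer */
    UDR = data;                                         /* writing starts TX     */
}

uint8_t usart_receive(void)
{
    while (!(UCSRA & (1 << RXC)));                      /* wait for a full byte  */
    return UDR;                                         /* reading clears RXC    */
}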

4.3 ZIGBEE Communication:

ZigBee is a standard that defines a set of communication protocols for low-data-rate, very low power, short-range wireless networking. It is a specification for a suite of high-level communication protocols using small, low-power digital radios based on the IEEE 802.15.4 standard for personal area networks. ZigBee-compliant radios may operate in one of three different radio bands: around 868 MHz, 915 MHz, or 2.4 GHz. The technology defined by the ZigBee specification is intended to be simpler and less expensive than other WPANs, such as Bluetooth. ZigBee is targeted at radio-frequency (RF) applications that require a low data rate, long battery life, and secure networking.

4.3.1 XBEE

XBee is the hardware used in this project to support ZigBee communication.

The XBee and XBee-PRO OEM RF Modules were engineered to meet IEEE 802.15.4

standards and support the unique needs of low-cost, low-power wireless sensor

networks.

Figure 4.5


The modules require minimal power and provide reliable delivery of data between

devices.

The modules operate within the ISM 2.4 GHz frequency band and are pin-for-pin

compatible with each other.

4.3.2 XBEE Working Principle

The XBEE RF modules interface to a host device through a logic-level asynchronous

serial port. Through its serial port, the module can communicate with any logic and

voltage compatible UART.

Data enters the module UART through pin 3 as an asynchronous serial signal. The

signal should be idle high when no data is being transmitted.

Each data byte has a start bit (low), 8 data bits (least significant bit first) and a stop bit

(high).

Figure 4.6


CHAPTER 5

PROCESS INVOLVED

The dynamic process of behavioural synchronization through a humanoid begins with the derivation of real-time information about the joints of the user, be they prismatic (linear) or revolute (rotational). With a degree of freedom count as high as 18, the task becomes extremely tedious and requires tremendous processing power. One of the major problems in monitoring multiple joints is the definition of the reference frame with respect to which information about the other frames can be derived.


Even though the reference for individual joints may vary, it is convenient to process all the joints with a single reference point. The Kinect camera provides this information to the central control block using a combination of infrared imaging and an RGB camera: the infrared laser creates a depth profile of the area in view and the RGB camera returns the RGB matrix. Since the Kinect sensor is specifically designed for gaming, it intrinsically calculates the distance between the user and itself with respect to a defined reference frame, a predefined location approximately at the centre of the view. The horizontal direction to the right of the view denotes the x axis and the transverse axis denotes the y axis. The z axis is the linear distance between the Kinect and the user; the farther the user, the higher the z value.

5.1 Data Points from KINECT:

Using the Kinect we monitor the real-time positions and orientations of 15 different data points on the user's body, namely:

Head

Neck

Left Shoulder

Left elbow

Left hand

Right Shoulder

Right elbow

Right hand

Torso

Left Hip

Left Knee

Left Foot

Right Hip

Right knee

Right foot


The sensor generates a 2D matrix of dimension 225 x 7 containing the above-mentioned data points (15 points for each of up to 15 trackable people). The seven columns are User ID, Tracking Confidence, x, y, z (coordinates in mm) and X, Y (the pixel coordinates of the corresponding data point). Using this information we calculate:

The joint angles in different planes between different links.

The link lengths between data points.

5.1.1 Calculation of Joint Angles

Having the raw joint positions is not enough for further operations; as a next step, vectors between joints must be constructed. As an example, the construction of the vectors for the right arm is described below:

Connecting the right elbow to the right hand, and the right elbow to the right shoulder, determines the angle at the right elbow.

Connecting the right shoulder to the right elbow, and the right shoulder to the neck, determines the angle at the right shoulder.

The angle between two such vectors is then calculated using the standard relation between two vectors, i.e.

u · v = |u| |v| cos(θ)

5.1.2 Calculation of Link Lengths

Using the x, y and z coordinates derived from the Kinect sensor and the distance formula, the link lengths are found:

length = √((x2 − x1)² + (y2 − y1)² + (z2 − z1)²)
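The following sketch puts the two calculations above together for one arm. It is only an illustration (the project's own implementation is in MATLAB): the Point3 type and the sample coordinates are hypothetical, with joint positions assumed to be in millimetres as returned by the Kinect skeletal tracker.

/* Link length between two joints and the joint angle at the elbow, from the
   elbow->hand and elbow->shoulder vectors (u.v = |u||v|cos(theta)). */
#include <math.h>
#include <stdio.h>

typedef struct { double x, y, z; } Point3;

static double link_length(Point3 a, Point3 b)
{
    double dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* Angle (degrees) at joint 'mid' formed by the segments mid->a and mid->b. */
static double joint_angle(Point3 mid, Point3 a, Point3 b)
{
    Point3 u = { a.x - mid.x, a.y - mid.y, a.z - mid.z };
    Point3 v = { b.x - mid.x, b.y - mid.y, b.z - mid.z };
    double dot  = u.x * v.x + u.y * v.y + u.z * v.z;
    double norm = sqrt(u.x*u.x + u.y*u.y + u.z*u.z) *
                  sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return acos(dot / norm) * (180.0 / 3.14159265358979323846);
}

int main(void)
{
    Point3 shoulder = { 200, 500, 2000 };               /* hypothetical readings */
    Point3 elbow    = { 340, 480, 2010 };
    Point3 hand     = { 470, 470, 2015 };

    printf("upper arm length: %.1f mm\n", link_length(shoulder, elbow));
    printf("elbow angle     : %.1f deg\n", joint_angle(elbow, hand, shoulder));
    return 0;
}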


After calculating the joint angles and link lengths, the next step in the behavioral synchronization is to realize these real-time parameters in the desired form. This step converts the parameters to actual servo angles and is useful for tuning between the real joint angle values and the robot's servo values. Another reason this step is necessary is that real human joint angles range over 0~360°, whereas the robot's servo motors accept values from 0~255.
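One possible form of this conversion is a simple linear scaling with clamping, sketched below. This is an assumption for illustration, not necessarily the exact mapping used in the project:

#include <stdint.h>

/* Map a joint angle in degrees (0-360) to the 0-255 range accepted by the
   servo controller; out-of-range inputs are clamped. */
static uint8_t angle_to_servo(double degrees)
{
    if (degrees < 0.0)   degrees = 0.0;
    if (degrees > 360.0) degrees = 360.0;
    return (uint8_t)(degrees * 255.0 / 360.0 + 0.5);   /* scale and round */
}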

Now that the destination coordinates of the individual tool frames for the humanoid

robot have been derived, two simultaneous steps follow, embedding the gesture movements

into the virtual (software) and real (hardware) model.

5.2 Denavit-Hartenberg Methodology:

The virtual model is designed based on principles of kinematic modeling by Denavit-

Hartenberg methodology for robotic manipulators. There are total 18 DOF (Degrees of

Freedom). Each arm has 5 DOF and each leg has 4 DOF.The destination coordinates arefed

into the ADAMS plant which has been configured to take input in the form of angular

velocities and output the current joint angles. The ADAMS plant is executed in MATLAB

(Simulink) environment with a function file to control it.There are two separate methods to

control the motion of a humanoid i.e. DC (Direct Control) and CC (Command

Control).Direct Control performs the conversion of 3D joint coordinates of the user to servo

angles whereas the Command Control converts information to a single command for the

robot‟s understanding. The information that is sent is the destination positions for the tool

frame and the calculation of joint angles is left for the model.

Since the position of each arm's tool frame is governed by three joint angles, the tool-frame coordinates x, y and z depend on the angles θ₁, θ₂ and θ₃, which can be expressed as:

[x  y  z]ᵀ = T [θ₁  θ₂  θ₃]ᵀ

Here T is the transformation matrix that relates the coordinates to the joint angles.


[ẋ  ẏ  ż]ᵀ = J [θ̇₁  θ̇₂  θ̇₃]ᵀ

Here J denotes the Jacobian matrix; ẋ, ẏ and ż are the products of the error signal along the corresponding axis and the proportional constant kp, and θ̇₁, θ̇₂ and θ̇₃ are the joint angular velocities:

ẋ = (x_destination − x) × kp

ẏ = (y_destination − y) × kp

ż = (z_destination − z) × kp

To evaluate the angular velocities, we use the expression

[θ̇₁  θ̇₂  θ̇₃]ᵀ = J⁻¹ [ẋ  ẏ  ż]ᵀ
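A minimal MATLAB sketch of this resolved-rate loop is given below. It uses an illustrative 3-DOF arm model (shoulder yaw, shoulder pitch, elbow) in place of the ADAMS plant; the forward-kinematics function fk, the link lengths L1 and L2, the destination point, the gain kp and the time step dt are all assumptions made for the example, not parameters of the actual humanoid.

L1 = 0.15; L2 = 0.12;                          % assumed link lengths (m)
fk = @(q) [ cos(q(1))*(L1*cos(q(2)) + L2*cos(q(2)+q(3)));  ...
            sin(q(1))*(L1*cos(q(2)) + L2*cos(q(2)+q(3)));  ...
            L1*sin(q(2)) + L2*sin(q(2)+q(3)) ];             % tool-frame position
q      = [0.1; 0.2; 0.3];                      % current joint angles (rad)
p_dest = [0.10; 0.05; 0.15];                   % destination tool-frame coordinates (m)
kp     = 2;                                    % proportional gain
dt     = 0.02;                                 % integration step (s)
for k = 1:200
    p    = fk(q);
    pdot = kp*(p_dest - p);                    % x_dot = (x_dest - x)*kp, etc.
    J = zeros(3,3); h = 1e-6;                  % numerical Jacobian dp/dq
    for j = 1:3
        dq = zeros(3,1); dq(j) = h;
        J(:,j) = (fk(q + dq) - fk(q))/h;
    end
    qdot = J\pdot;                             % theta_dot = inv(J)*[x_dot; y_dot; z_dot]
    q    = q + qdot*dt;                        % integrate the joint velocities
end
disp(fk(q))                                    % final tool-frame position, close to p_dest

In the actual system the ADAMS plant, which takes angular velocities as input and returns the current joint angles, plays the role of fk and the integration step in this sketch.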


5.2.1 Simulink Model

Figure 5.1


Moving on to the hardware model, analysis of the simulation output showed that the command-control strategy was not ideal for the hardware platform, since the solution space of the simulation model may not match that of the physical robot. So, in order to avoid singularity points in the movement, the direct-control strategy is implemented. For this, the Kinect pipeline is configured to compute not the destination of the tool frame but the individual joint angles required for behavioral synchronization. After the joint angles have been derived, they must be communicated in a format that the microcontrollers can recognize and implement. A wireless communication technique was employed according to the ZigBee standard protocol via XBee transceiver modules.

5.3 Modes of Operation of ZIGBEE:

As mentioned earlier, the technology defined by the ZigBee specification is intended

to be simpler and less expensive than other WPANs, such as Bluetooth. ZigBee is targeted at

radio-frequency (RF) applications that require a low data rate, long battery life, and secure

networking.

One of the XBee pair is connected to the PC using an XBee USB adapter board, and the other is interfaced to the development board through a microcontroller. The module can communicate with any logic- and voltage-compatible UART through a logic-level asynchronous serial port. Data enters the module's UART from the serial port of the PC through the DI pin as an asynchronous serial signal. The signal should idle high when no data is being transmitted. Each data byte consists of a start bit (low), 8 data bits (least significant first) and a stop bit.

There are two modes of operation:

1. Transparent mode


In this mode, all UART data received through the DI pin is queued up for RF transmission. It is buffered in the DI buffer until it is packetized and transmitted. When RF data is received, it is sent out through the DO pin.

2. API (Application Programming Interface)

In API mode, all the data entering and leaving the module is contained in frames that define the operations or events within the module.

The default mode used by the XBee is the transparent mode. The combination of the X-CTU software and the XBee adapter is used to configure the baud rate, channel number and network ID of the XBee module. The advantage of using ZigBee is that it consumes less power than Wi-Fi and has a longer range than Bluetooth. It operates in the Industrial, Scientific and Medical (ISM) band, making it well suited for robot communication.
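As a small sketch of how data is pushed into the transparent-mode link from the PC side, the MATLAB fragment below writes a single byte to the XBee through the USB adapter; the COM-port name and baud rate are assumptions, and the older serial interface of that MATLAB generation is used.

xbee = serial('COM3','BaudRate',9600,'DataBits',8,'StopBits',1,'Parity','none');
fopen(xbee);                      % open the port of the XBee USB adapter
fwrite(xbee, uint8(150));         % any byte written here is queued in the DI buffer and sent over RF
fclose(xbee); delete(xbee);       % release the port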

5.4 Memory Addressing Technique:

Now that the information to be sent and the method of communication are ready, the next step is to work out the memory-addressing technique so that the transmitted data stays in synchronization with the received data. To address the 18 actuated joints over a single duplex channel with an 8-bit word length, an addressing strategy had to be devised. Addressing 18 individual motors ideally requires 5 bits, and transmitting 8 bits of data to each motor would make the word length 13 bits, which is not available. So instead of delivering the word in a single transmission, the information is broken into two nibbles of 4 bits each, and two consecutive transmissions are needed to transmit the full 8 bits of data. With this split, one more bit would be needed to distinguish between the higher and the lower nibble, which again created a problem because the word length of a transmission is limited to 8 bits.

A new way of addressing therefore had to be devised. To understand it, imagine a room with two doors, each with a different key: only when both locks have been opened with the two separate keys can the room be accessed. Metaphorically, the motors are the room and the keys are the addresses that unlock it. Two separate 4-bit addresses are used for each motor, one to send the higher and one to send the lower nibble of the data from the source. This scheme allows a maximum of 256 different addresses, with 8 bits of information delivered to each one in two consecutive transmissions.
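A sketch of this two-byte scheme is shown below for one motor. The pairing used here, address nibble m for the lower data nibble and address nibble m+8 for the upper one, is an assumption chosen to be consistent with the microcontroller code of Section 6.3 (cases 0x00/0x80 and 0x10/0x90); it is not a complete map of all 18 motors.

angle = 135;                                 % 8-bit servo value to transmit (0-255)
m     = 0;                                   % motor index for this example
lowNibble  = bitand(angle,15);               % lower 4 bits of the data
highNibble = bitshift(angle,-4);             % upper 4 bits of the data
byte1 = bitor(bitshift(m,4),   lowNibble);   % address nibble m   | lower data nibble
byte2 = bitor(bitshift(m+8,4), highNibble);  % address nibble m+8 | upper data nibble
% byte1 and byte2 are then sent in two consecutive transmissions over the XBee link,
% e.g. fwrite(xbee, uint8([byte1 byte2])).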

5.5 Controlling the Servo Motor:

After the control signal is received at the remotely located manipulators, the microcontrollers controlling the servos decode the servo angles from the control signal and implement them. Controlling a servo with a microcontroller requires no external driver such as an H-bridge; only a control signal needs to be generated to position the servo at any fixed angle. The standard frequency of the control signal is 50 Hz, and the duty cycle (pulse width) controls the angle.

Figure 5.2

The timing for standard servos, for control of the servo angle, is as follows:

0.388 ms = 0 degrees

1.264 ms = 90 degrees (neutral position)

2.14 ms = 180 degrees

The control signal is 50 Hz (i.e. the period is 20 ms), and the width of the positive pulse controls the angle. This implementation of the servo angle completes one cycle of the process, which ends where it began: with the Kinect sensor feeding the joint information into the control system.
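A small sketch of this timing relation is given below; it assumes the pulse width varies linearly with the angle between the calibration points listed above (an assumption that also reproduces the 1.264 ms neutral position at 90 degrees).

angle    = 90;                                     % desired servo angle (degrees)
pulse_ms = 0.388 + (2.14 - 0.388)*angle/180;       % high-time of the pulse (ms)
duty     = pulse_ms/20;                            % duty cycle of the 50 Hz (20 ms) signal
fprintf('angle %3d deg -> pulse %.3f ms (duty %.2f%%)\n', angle, pulse_ms, duty*100);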


CHAPTER 6

CODE IMPLEMENTED

Coding plays a major role in determining the operation of the overall process. Here, the coding takes place in several stages, namely:

For the KINECT (in MATLAB).

For ADAMS.

For the microcontrollers embedded in the humanoid.

All the code, along with the outputs, is illustrated in the following pages.


6.1 Coding in KINECT:

From the functional MATLAB code, the following outputs were extracted:

RGB Image.

IR Image.

Skeleton Image of the User.

6.1.1 RGB Image.

Code –

addpath('Mex')
SAMPLE_XML_PATH='Config/SamplesConfig.xml';
% Start the Kinect Process
KinectHandles=mxNiCreateContext(SAMPLE_XML_PATH);
figure;
I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);
D=mxNiDepth(KinectHandles); D=permute(D,[2 1]);
subplot(1,2,1),h1=imshow(I);
subplot(1,2,2),h2=imshow(D,[0 9000]);
colormap('jet');
for i=1:90
I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);
D=mxNiDepth(KinectHandles); D=permute(D,[2 1]);
set(h1,'CDATA',I);
set(h2,'CDATA',D);
drawnow;
end
% Stop the Kinect Process
mxNiDeleteContext(KinectHandles);

Figure 6.1


6.1.2 IR Image.

Code – addpath('Mex')

SAMPLE_XML_PATH='Config/SampleIRConfig.xml';

% Start the Kinect Process

KinectHandles=mxNiCreateContext(SAMPLE_XML_PATH);

figure;

J=mxNiInfrared(KinectHandles); J=permute(J,[2 1]);

h=imshow(J,[0 1024]);

for i=1:9000
J=mxNiInfrared(KinectHandles); J=permute(J,[2 1]);
set(h,'Cdata',J);
drawnow;
end

% Stop the Kinect Process

mxNiDeleteContext(KinectHandles);

Figure 6.2


6.1.3 Skeleton Image.

Code –

addpath('Mex')

SAMPLE_XML_PATH='Config/SamplesConfig.xml';

KinectHandles=mxNiCreateContext(SAMPLE_XML_PATH);
figure,
Pos= mxNiSkeleton(KinectHandles);
I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);
h=imshow(I);
while(Pos(1)==0);

mxNiUpdateContext(KinectHandles);

I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

Pos= mxNiSkeleton(KinectHandles);

set(h,'Cdata',I); drawnow;

end

hh=zeros(1,9);

while(Pos(1)>0)

mxNiUpdateContext(KinectHandles);
Pos= mxNiSkeleton(KinectHandles,1);

I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

set(h,'Cdata',I); drawnow;

if(hh(1)>0);

for i=1:9, delete(hh(i)); end

end

hold on

y=Pos(1:15,7);

x=Pos(1:15,6);

hh(1)=plot(x,y,'r.');

hh(2)=plot(x([13 14 15]),y([13 14 15]),'g');

hh(3)=plot(x([10 11 12]),y([10 11 12]),'g');

hh(4)=plot(x([9 10]),y([9 10]),'m');

hh(5)=plot(x([9 13]),y([9 13]),'m');

hh(6)=plot(x([2 3 4 5]),y([2 3 4 5]),'b');

hh(7)=plot(x([2 6 7 8]),y([2 6 7 8]),'b');

hh(8)=plot(x([1 2]),y([1 2]),'c');

hh(9)=plot(x([2 9]),y([2 9]),'c');

drawnow

end

mxNiDeleteContext(KinectHandles);


Figure 6.3

Using the skeletal image shown above, two sets of parameters were found:

Link lengths.

Angles between the required joints.

6.1.4 Finding out the link lengths.

Code –

addpath('Mex')

SAMPLE_XML_PATH='Config/SamplesConfig.xml';

KinectHandles=mxNiCreateContext(SAMPLE_XML_PATH);

figure,

Pos= mxNiSkeleton(KinectHandles);


I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

h=imshow(I);

while(Pos(1)==0);

mxNiUpdateContext(KinectHandles);

I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

Pos= mxNiSkeleton(KinectHandles);

set(h,'Cdata',I); drawnow;

end

hh=zeros(1,9);

while(Pos(1)>0)

mxNiUpdateContext(KinectHandles);
Pos= mxNiSkeleton(KinectHandles,1);

I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

set(h,'Cdata',I); drawnow;

hold on

y=Pos(1:15,7);

x=Pos(1:15,6);

sx= Pos(6,3);

sy= Pos(6,4);

sz= Pos(6,5);

ex= Pos(7,3);

ey= Pos(7,4);

ez= Pos(7,5);

hx = Pos(8,3);

hy = Pos(8,4);

hz = Pos(8,5);

l1 = (((sz-ez)^2)+((sy-ey)^2)+((sx-ex)^2))^0.5   % shoulder-to-elbow link length (mm)
l2 = (((ez-hz)^2)+((ey-hy)^2)+((ex-hx)^2))^0.5   % elbow-to-hand link length (mm)

hh(1)= plot(x,y,'r.');

hh(2)=plot(x([13 14 15]),y([13 14 15]),'g');

hh(3)=plot(x([10 11 12]),y([10 11 12]),'g');

hh(4)=plot(x([9 10]),y([9 10]),'m');

hh(5)=plot(x([9 13]),y([9 13]),'m');

hh(6)=plot(x([2 3 4 5]),y([2 3 4 5]),'b');

hh(7)=plot(x([2 6 7 8]),y([2 6 7 8]),'b');

hh(8)=plot(x([1 2]),y([1 2]),'c');

hh(9)=plot(x([2 9]),y([2 9]),'c');

drawnow

end

mxNiDeleteContext(KinectHandles);


Figure 6.4

6.1.5 Finding out the angles.

Code –

addpath('Mex')

SAMPLE_XML_PATH='Config/SamplesConfig.xml';

KinectHandles=mxNiCreateContext(SAMPLE_XML_PATH);

figure,

Pos= mxNiSkeleton(KinectHandles);

I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

h=imshow(I);

while(Pos(1)==0);

mxNiUpdateContext(KinectHandles);


I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

Pos= mxNiSkeleton(KinectHandles);

set(h,'Cdata',I); drawnow;

end

hh=zeros(1,9);

while(Pos(1)>0)

mxNiUpdateContext(KinectHandles);

I=mxNiPhoto(KinectHandles); I=permute(I,[3 2 1]);

set(h,'Cdata',I); drawnow;

Pos= mxNiSkeleton(KinectHandles,1);

if(hh(1)>0);

for i=1:9, delete(hh(i)); end

end

hold on

y=Pos(1:15,7);

x=Pos(1:15,6);

rshoulder = [ Pos(6,3) Pos(6,4) Pos(6,5) ];

lshoulder = [ Pos(3,3) Pos(3,4) Pos(3,5) ];

relbow = [ Pos(7,3) Pos(7,4) Pos(7,5) ];

rhand = [ Pos(8,3) Pos(8,4) Pos(8,5) ];

rhip = [ Pos(13,3) Pos(13,4) Pos(13,5) ];

torso = [ Pos(9,3) Pos(9,4) Pos(9,5) ];

head = [ Pos(1,3) Pos(1,4) Pos(1,5) ];

rshouldern = [ Pos(6,3) Pos(6,4) ];

lshouldern = [ Pos(3,3) Pos(3,4) ];

relbown = [ Pos(7,3) Pos(7,4) ];

% findslope and slopeangle are user-defined helper functions (defined separately, not shown here)
x2 = findslope(rshouldern,lshouldern);
y2 = findslope(rshouldern,relbown);
angle1 = slopeangle(x2,y2);
angle2 = hypot(angle1,0)-76;             % absolute value of the angle, with a fixed offset subtracted

torsom = [ Pos(9,4) Pos(9,5) ];

headm = [ Pos(1,4) Pos(1,5) ];

x1 = findslope(torsom,headm);

rshoulderm = [ Pos(6,4) Pos(6,5) ];

relbowm = [ Pos(7,4) Pos(7,5) ];

y1 = findslope(rshoulderm,relbowm);


angle = slopeangle(x1,y1);

angle3 = hypot(angle,0)-2;

anc = [ angle2 angle3 ]

hh(1)=plot(x,y,'r.');

hh(2)=plot(x([13 14 15]),y([13 14 15]),'g');

hh(3)=plot(x([10 11 12]),y([10 11 12]),'g');

hh(4)=plot(x([9 10]),y([9 10]),'m');

hh(5)=plot(x([9 13]),y([9 13]),'m');

hh(6)=plot(x([2 3 4 5]),y([2 3 4 5]),'b');

hh(7)=plot(x([2 6 7 8]),y([2 6 7 8]),'b');

hh(8)=plot(x([1 2]),y([1 2]),'c');

hh(9)=plot(x([2 9]),y([2 9]),'c');

drawnow

end

mxNiDeleteContext(KinectHandles);

Figure 6.5


6.2 Coding in ADAMS - MATLAB:

The complete virtual model was embedded with all the multidisciplinary values and the plant was exported to MATLAB. This links the ADAMS model with MATLAB: executing the exported plant code generates the Simulink model, and running that model drives the virtual humanoid so that the desired results can be obtained.

After simulating the model, the torque profiles for the joints of the right arm were extracted:

Figure 6.6

Figure 6.7 Figure 6.8


It can be observed from the figures above that the required torque is at its maximum at the crest of the graph, and that the torque profile changes according to the position of the hand. The figure for the elbow joint shows that, because it was programmed to perform a continuous to-and-fro motion, the graph obtained is a sine wave with increasing amplitude.

After that, the input taken from the KINECT camera was fed into the model so that the user and the humanoid remain synchronized.

Figure 6.9


6.3 Coding in the Microcontrollers:

For driving two servo motors, one ATmega microcontroller and one BEC (Battery Eliminator Circuit) are used. Each microcontroller has to be assigned a different address, as explained in the Process Involved section. The basic difference between the code for the different microcontrollers is just the addressing value. Shown below is the code which is implemented:

#include<avr/io.h>

#include<util/delay.h>

#include<xslcd.h>

#include <avr/interrupt.h>

void USARTInit(uint16_t);    // Function declarations.
char USARTReadChar(void);
void USARTWriteChar(char);

void USARTInit(uint16_t ubrr_value) // Function for initializing USART.

{

UBRRL = ubrr_value;

UBRRH = (ubrr_value>>8);

UCSRC=(1<<URSEL)|(3<<UCSZ0);

UCSRB=(1<<RXEN)|(1<<TXEN);

}

char USARTReadChar(void) // Function for reading a value from USART.

{

while(!(UCSRA & (1<<RXC)));

return UDR;

}

int main(void) // Main function starts.

{

unsigned char data;

unsigned char add1,add2;

unsigned char vl,vh,value;

TCCR1A|=(1<<COM1A1)|(1<<COM1B1)|(1<<WGM11);

TCCR1B|=(1<<WGM13)|(1<<WGM12)|(1<<CS11)|(1<<CS10);

ICR1=2499; // Registers are set with values for initializing timers.

value = 0x00;

InitLCD(); // LCD and USART are initialized.

USARTInit(51);

LCDClear();


LCDWriteStringXY(0,0,"Initialising...");

_delay_ms(160);

LCDClear();

LCDWriteStringXY(0,0,"M1=");

LCDWriteStringXY(0,1,"M2=");

DDRD|=(1<<PD4)|(1<<PD5);

while(1)

{

data = USARTReadChar(); // Value is read from USART.

_delay_ms(100);

add1 = data & 0xf0;

switch (add1) // The higher nibble is compared to first 4 bits.

{

case 0x00:

vl = data & 0x0f;

data = USARTReadChar(); // If it is satisfied, next value is read from USART.

add2 = data & 0xf0;

if (add2 == 0x80) // After comparing it to the next value, if the

{ // conditions are satisfied, the information about angles

vh = data<<4; // is extracted and fed to the motor.

value = vl | vh;

if (value < 180)

{

OCR1B = 58+(1.55*value);

LCDWriteIntXY(4,0,value,3);

}

else

LCDWriteStringXY(4,0,"Invalid Angle");

break;

}

else

break;

break;

case 0x10:

vl = data & 0x0f;

data = USARTReadChar();

add2 = data & 0xf0;

if (add2 == 0x90)

{

vh = data<<4;

value = vl | vh;

if (value < 180)

{

OCR1A = 56+(1.55*value);

LCDWriteIntXY(4,1,value,3);


}

else

LCDWriteStringXY(4,1,"Invalid Angle");

break;

}

else

break;

break;

default:

add1 = 0xff;

}

}

return 1;

}

This is the code for programming one of the MCU boards. Based upon the address assigned to the MCU, the code changes.


CHAPTER 7

CONCLUSION

The implemented architecture offers a simple, low-cost and robust solution for controlling a humanoid with human gestures. The complete system setup requires the synchronization of software and hardware. Some problems were faced in setting up the hardware, i.e. the motors and the physical structure. Therefore, for the demonstration of control via human gestures, the upper body was mounted on a mobile platform with four wheels and a differential drive system for maneuvering. As future work, the implementation of the humanoid will be expanded to a bipedal system, with control based on concepts such as the Zero Moment Point (ZMP), the Foot Rotation Indicator (FRI) and the support polygon.
