
Multiple-Hand-Gesture Tracking Using Multiple Cameras

Akira Utsumi and Jun Ohya

ATR Media Integration & Communication Research Laboratories
2-2 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan

Abstract

We propose a method of tracking the 3D position, posture, and shape of human hands from multiple-viewpoint images. Self-occlusion and hand-hand occlusion are serious problems in vision-based hand tracking. Our system employs multiple viewpoints and a viewpoint selection mechanism to reduce these problems. Each hand position is tracked with a Kalman filter, and the motion vectors are updated with image features in selected images that do not include hand-hand occlusion. 3D hand postures are estimated with a small number of reliable image features. These features are extracted based on distance transformation, and they are robust against changes in hand shape and self-occlusion. Finally, a best-view image is selected for each hand for shape recognition. The shape recognition process is based on a Fourier descriptor.

Our system can be used as a user interface device in a virtual environment, replacing glove-type devices and overcoming most of the disadvantages of contact-type devices.

1 Introduction

Hand gestures provide a useful interface for humans to interact with humans as well as machines. In high-DOF manipulation tasks, such as the manipulation of 3-D objects in virtual scenes, a traditional keyboard-and-mouse interface is neither intuitive nor easy to operate. For such tasks, we consider direct manipulation with hand gestures to be an ideal alternative. It allows a user to directly indicate 3-D points and issue manipulation commands with his/her own hand.

This idea led to many gesture-based systems using glove-type sensory devices in the early days of virtual reality research. Such contact-type devices, however, are troublesome to put on and take off, and continuously wearing such devices for a long time fatigues users.

To overcome these disadvantages, vision researchers have tried to develop non-contact-type systems to detect human hand motion [1, 2, 3, 4, 5]. However, these systems had an instability peculiar to vision-based systems. The most significant problem is occlusion. Vision systems conventionally require the matching of detected feature points between images to reconstruct and track 3-D information. However, for moving non-rigid objects like a human hand, the detection and matching of feature points are difficult to accomplish correctly.

To avoid such instability, we developed a multiple-viewpoint system based on DT (distance transformation) features [6]. In this paper, we describe significant extensions of the system to deal with more flexible information, including the motion of both hands. Our system utilizes DT-based features from each of multiple cameras to reconstruct 3-D position and posture information.

The tracking process is based on Kalman filtering. Some researchers have also proposed two-hand tracking systems based on Kalman filtering [7], but most such systems can only track positions. Our system can continuously recognize the position, orientation, and shape of the left and right human hands. The hand shape can be detected on a particular 2-D view that is selected based on the 3-D posture information. This simplifies the description and recognition process of the hand shapes.

We have developed a command interface system using hand trajectory and the transition of hand shape. The system allows a user to freely create a new object, move it, resize it, change its color, and connect it with other objects in a virtual scene. We will demonstrate the impact of our system using this sample application in a later section.

2 System Overview

Figure 1 shows a diagram of our system. The system tracks two-hand motion by using sequential images from multiple viewpoints. Economically, it makes sense to reduce the number of processors, so we designed the system to use fewer processors than the number of cameras. The system first selects the viewpoints to be processed based on the estimated hand positions (derived from the Kalman filters) so that the occlusion probability is low.

Figure 1: System Diagram
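The selection criterion is only summarized in this paper; as a rough, hypothetical sketch (the projection-matrix input and the separation score are our assumptions, not the paper's formula), occlusion-prone views can be skipped by ranking cameras by the predicted image-plane separation of the two hands:

```python
import numpy as np

def select_viewpoints(projections, hand_positions, n_select=3):
    """Rank cameras by the predicted image-plane separation of the two hands.

    projections    : list of 3x4 camera projection matrices (assumed calibrated)
    hand_positions : (2, 3) array, predicted 3-D positions of the two hands
                     (e.g. the Kalman predictions)
    Returns the indices of the n_select cameras where the projected hands are
    farthest apart, i.e. where hand-hand occlusion is least likely.
    """
    def project(P, X):
        u = P @ np.append(X, 1.0)      # homogeneous projection to the image plane
        return u[:2] / u[2]

    scores = [np.linalg.norm(project(P, hand_positions[0]) -
                             project(P, hand_positions[1]))
              for P in projections]
    return list(np.argsort(scores)[::-1][:n_select])
```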

Then, simple 2-D image features (image skeletons, i.e. maximum points in the DT image, and primary orientations) are extracted from each selected image. As some selected images may still include hand-hand occlusion, the system tries to detect such occlusion so that the affected image can be ignored in further processing. The system integrates the information from all non-occluded images to update the motion vectors of the Kalman filters that track the 3D hand motions and to estimate the hand rotation angles.

In the hand shape recognition process, the best viewpoint, with the most frontal view, is selected for each hand. Shape recognition is based on silhouette contours characterized with a Fourier descriptor.

The results of 3-D position and posture recognition are sent to the gesture recognition process, and the commands corresponding to the recognition results are issued to the application program. The current experimental system has five cameras and three image processing resources (Fig. 2). The processing speed is about 10 Hz.

3 2-D Image Processing (Feature Extraction Process)

As previously mentioned, the images to be processed are selected from all input images based on the estimated 3-D hand positions. Though we omit the details of this process, both the occlusion probability and the observation (appearance) probability are taken into account.

For each selected image, as shown in Fig. 1, distance transformation is applied to the hand silhouette to extract the image features.
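As a minimal sketch of this step (using SciPy's Euclidean distance transform; the moment-based orientation estimate is our simplification of the paper's "primary orientation" feature):

```python
import numpy as np
from scipy import ndimage

def dt_features(silhouette):
    """Extract skeleton-like features from a binary hand silhouette.

    silhouette : 2-D bool array (True = hand pixels)
    Returns the position of the DT maximum (approximate palm centre),
    its DT value (a width measure), and the region's primary orientation.
    """
    dt = ndimage.distance_transform_edt(silhouette)
    peak = np.unravel_index(np.argmax(dt), dt.shape)   # maximum point of DT image

    # Primary orientation from the second moments of the silhouette region.
    ys, xs = np.nonzero(silhouette)
    cov = np.cov(np.vstack([xs - xs.mean(), ys - ys.mean()]))
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]             # major-axis direction
    angle = np.arctan2(major[1], major[0])

    return peak, dt[peak], angle
```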


4 Two-Hand Tracking Using Viewpoint Selection

4.1 3-D Position Tracking Using Kalman Filtering

Figure 5: Observation Model

Fig. 5 shows our assumed observation model. Here, the world coordinates are three-dimensional and the image plane is two-dimensional. One hand (left or right) located at $(X_{h_j}, Y_{h_j}, Z_{h_j})$ is observed with a camera $c_i$ at $(X_{c_i}, Y_{c_i}, Z_{c_i})$; $j$ denotes either left or right. We consider this observation to contain a Gaussian error (its covariance matrix is $\Sigma$ on the image plane).

Here, $L_{h_j,c_i}$ is the distance between hand $h_j$ and camera $c_i$; $l_i$ is the focal length of camera $c_i$. $\phi_{h_j,c_i}$ is the angle between the epipolar line and the $Y$-$Z$ plane, and $\theta_{h_j,c_i}$ is the angle between the $Z$ axis and the projection of the epipolar line onto the $Y$-$Z$ plane. $R_{\phi_{h_j,c_i}} R_{\theta_{h_j,c_i}}$ denotes the rotation matrix that rotates the direction of the epipolar line to be parallel to the $Z$ axis.

$X_{h_j,t}$, the state of hand $h_j$, can be described as

$$X_{h_j,t} = [\,X_{h_j}\; Y_{h_j}\; Z_{h_j}\; \dot{X}_{h_j}\; \dot{Y}_{h_j}\; \dot{Z}_{h_j}\,]^T. \quad (1)$$

Here, $\dot{X}_{h_j}$, $\dot{Y}_{h_j}$, and $\dot{Z}_{h_j}$ are the velocities of hand $h_j$ along the $X$, $Y$, and $Z$ axes, respectively.

Let us consider the situation where hand $h_j$ is observed with $N$ cameras. Let $\hat{X}_{h_j,t-1}$ be the estimate of $X_{h_j}$ at time $t-1$ (with covariance matrix $S_{h_j,t-1}$). Then, the state at $t$ can be predicted as

$$\bar{X}_{h_j,t} = F \hat{X}_{h_j,t-1}, \quad (2)$$

$$\bar{S}_{h_j,t} = F S_{h_j,t-1} F^T + Q. \quad (3)$$

Here, $F$ is the transition matrix described below (assuming the hand motion has a constant velocity), and $Q$ is the covariance matrix of the error occurring in the transition:

$$F = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \quad (4)$$

With the camera state $C_i = [\,X_{c_i}\; Y_{c_i}\; Z_{c_i}\; 0\; 0\; 0\,]^T$ and the epipolar angles $\theta_{h_j,i}$, $\phi_{h_j,i}$, the observation is modelled as

$$z_{h_j,c_i,t} = H\, R_{\phi_{h_j,i}} R_{\theta_{h_j,i}} (X_{h_j,t} - C_i) + e. \quad (5)$$

Here, $H$ is the observation matrix

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad (6)$$

and $e$ is the observation error (mean $[\,0\; 0\,]^T$, covariance matrix $\Sigma_{h_j,c_i}$).

The size of the observation error becomes larger with increased distance from the observing camera. We express the error size as

$$\Sigma_{h_j,c_i,t} = \frac{\hat{L}_{h_j,c_i,t}^2}{l_i^2}\, \Sigma. \quad (7)$$

Here, we approximate the distance with $\hat{L}_{h_j,c_i,t}$ (the distance between $C_i$ and $\bar{X}_{h_j,t}$) instead of $L_{h_j,c_i,t}$, because the actual distance $L_{h_j,c_i,t}$ is not available.

Using the observation results and the predictions ((2), (3)), the state of hand $h_j$ can be estimated as

$$\hat{X}_{h_j,t} = S_{h_j,t} \Big( \bar{S}_{h_j,t}^{-1} \bar{X}_{h_j,t} + \sum_i (R_{\phi_{h_j,i}} R_{\theta_{h_j,i}})^T H^T \Sigma_{h_j,c_i,t}^{-1}\, z_{h_j,c_i,t} \Big), \quad (8)$$

$$S_{h_j,t}^{-1} = \bar{S}_{h_j,t}^{-1} + \sum_i (R_{\phi_{h_j,i}} R_{\theta_{h_j,i}})^T H^T \Sigma_{h_j,c_i,t}^{-1} H\, R_{\phi_{h_j,i}} R_{\theta_{h_j,i}}. \quad (9)$$

In equations (8) and (9), $\sum_i$ expresses the summation over all available (non-occluded) image features for one hand.

We correspond the image features to the (left and right) tracking models on each 2-D image plane. For the comparison, the 3D Gaussian distribution of the tracking model $N(\bar{X}_{h_j,t}, \bar{S}_{h_j,t})$ is projected onto each image plane with weak perspective projection. The comparison is performed on a Mahalanobis distance basis. This feature correspondence issue is extensively discussed in [8].
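The update in equations (8) and (9) is an information-form fusion over all non-occluded views. A compact sketch, with variable names of our choosing and the camera-centre offset of eq. (5) assumed already subtracted from each measurement:

```python
import numpy as np

def predict(x, S, Q, dt=1.0):
    """Constant-velocity prediction, eqs. (2)-(3). State = [X Y Z Vx Vy Vz]."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)
    return F @ x, F @ S @ F.T + Q

def update(x_pred, S_pred, observations):
    """Information-form fusion over all non-occluded views, eqs. (8)-(9).

    observations: list of (z, R, Sigma) with
      z     : 2-vector feature position in the epipolar-aligned frame,
              with the camera-position term of eq. (5) pre-subtracted
      R     : 3x3 rotation aligning the epipolar line with the Z axis
      Sigma : 2x2 observation covariance, scaled as in eq. (7)
    """
    H = np.hstack([np.eye(2), np.zeros((2, 4))])       # eq. (6): observe (x, y)
    info = np.linalg.inv(S_pred)                       # prior information matrix
    vec = info @ x_pred
    for z, R, Sigma in observations:
        Rb = np.zeros((6, 6))                          # block rotation acting on
        Rb[:3, :3] = R                                 # position and velocity
        Rb[3:, 3:] = R
        A = H @ Rb                                     # maps state -> image plane
        Si = np.linalg.inv(Sigma)
        info += A.T @ Si @ A                           # eq. (9)
        vec += A.T @ Si @ z                            # eq. (8), summed term
    S = np.linalg.inv(info)
    return S @ vec, S

def mahalanobis2(z, mean, cov):
    """Squared Mahalanobis distance, as used for feature-to-hand correspondence."""
    d = z - mean
    return float(d @ np.linalg.solve(cov, d))
```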


Fig. 6 shows the tracking results for an example image sequence. Here, the two hands cross each other in the $X$ direction at a small distance. Fig. 7 shows the three viewpoints selected for each frame. The viewpoints marked with × and/or ○ are also used for the shape recognition process (Section 4.3).

As can be seen, the two hands are tracked properly while the processing viewpoints are switched according to the estimated hand positions.

Figure 6: Tracking Results

Figure 7: Camera Selection Results

4.2 Posture (Rotation Angle) Estimation

As mentioned above, the 3-D pose of a human hand is stably reconstructed from multiple-camera information.

Figure 8: Ellipsoidal Palm Model

The rotation angles for the hand orientations are estimated based on the distance transformation values at the hand position (COG) [6]. These values are not strongly affected by hand deformation. To simplify the estimation of the rotation angle, we employ an ellipsoidal model (Fig. 8) in which a human palm is represented by an ellipsoid. Assuming a weak perspective projection, the width $s$ observed by a camera located at $\theta = 0$ can be described by the simple equation

$$s = \frac{1}{L}\sqrt{a \sin^2\theta + b \cos^2\theta}, \quad (10)$$

where $L$ is the hand-camera distance, $a$ and $b$ are constants, and $\theta$ is the rotation angle of the palm. We estimated $a$ and $b$ using a sample data set.

Allowing for Gaussian error in the observation, we can estimate the rotation angle as the $\theta$ value that maximizes the probability $P(s_1, \ldots, s_m \mid \theta)$ of the hand widths (COG distance-transform values) $s_1, \ldots, s_m$ being observed by the $m$ cameras:

$$P(s_1, \ldots, s_m \mid \theta) = \prod_{i=1}^{m} p(s_i \mid \theta - \theta_{c_i}), \quad (11)$$

$$\hat{\theta} = \arg\max_{\theta} P(s_1, \ldots, s_m \mid \theta). \quad (12)$$

This approach eliminates the need for stereo matching in pose estimation. It also increases the stability of the estimation.

Fig. 9 shows an example of rotation angle estimation for five different hand shapes of one hand. Each dot denotes the estimated angle, and the broken line shows the magnetic sensor output as a reference.

Figure 9: Rotation Angle Detection
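A sketch of the estimate in eqs. (10)-(12) as a one-dimensional grid search, assuming independent Gaussian noise on the observed widths (the grid resolution and the noise level sigma are our choices):

```python
import numpy as np

def predicted_width(theta, L, a, b):
    """Ellipsoid width model, eq. (10)."""
    return np.sqrt(a * np.sin(theta) ** 2 + b * np.cos(theta) ** 2) / L

def estimate_rotation(widths, cam_angles, L, a, b, sigma=1.0):
    """Pick the palm angle maximising P(s_1..s_m | theta), eqs. (11)-(12).

    widths     : observed COG distance-transform widths, one per camera
    cam_angles : angle of each camera relative to theta = 0 (radians)
    """
    widths = np.asarray(widths)
    cam_angles = np.asarray(cam_angles)
    best, best_ll = 0.0, -np.inf
    for th in np.deg2rad(np.arange(0.0, 180.0, 1.0)):  # 1-degree grid search
        # Gaussian log-likelihood of each observed width given the model.
        pred = predicted_width(th - cam_angles, L, a, b)
        ll = -np.sum((widths - pred) ** 2) / (2 * sigma ** 2)
        if ll > best_ll:
            best, best_ll = th, ll
    return best
```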

4.3 Shape Recognition by Best-View Selection

After the hand posture has been determined, the image whose viewing axis is most nearly perpendicular to the hand plane is selected for each hand from the available (non-occluded) images. The shape recognition process is then performed (Fig. 11).

Fig. 10 shows the viewpoint selection. Here, we deal with one hand. A subject rotated his hand without moving its position. The same three cameras were used in the experiment. The best camera is selected based on the hand rotation angle, and the selected viewpoint varies according to the hand rotation.

In the previous example (Fig. 7), × denotes the camera selected for the right hand and ○ denotes the camera selected for the left hand.
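A sketch of this selection under our own formulation: given the estimated palm normal from Section 4.2, pick the non-occluded camera whose viewing direction is most anti-parallel to it (the 'dir' field is an assumed pre-computed unit vector, not a structure from the paper):

```python
import numpy as np

def best_view(palm_normal, cameras, occluded):
    """Select the camera giving the most frontal view of the palm.

    palm_normal : unit 3-vector from the posture estimate
    cameras     : list of dicts with 'dir' (unit viewing direction)
    occluded    : set of camera indices to skip (hand-hand occlusion)
    """
    best_i, best_cos = None, -np.inf
    for i, cam in enumerate(cameras):
        if i in occluded:
            continue
        c = -np.dot(cam['dir'], palm_normal)   # 1.0 when looking straight at the palm
        if c > best_cos:
            best_i, best_cos = i, c
    return best_i
```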


Table 2: Command List

Command                  | Hand Position        | Shape Transition
Create                   | Outside              | 7 → 1
Grab & Move              | Inside               | 7 → 1
Resize                   | Inside               | 7 → 2
Change Color & Texture   | Inside               | 7 → 4
Delete                   | Inside               | 7 → 5
Connect                  | Inside               | 7 → 6
Separate                 | Inside               | 7 → 3
Stretch                  | Inside (Both Hands)  | 7 → 2

(Inside: inside of the object; Outside: outside of the object)

Figure 10: Selected Viewpoints (Camera No.)

We extract the contour of the hand silhouette from the selected image and express it with a Fourier descriptor [9]. Each gesture is distinguished from the others using the lower five Fourier parameters.
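A minimal sketch of a complex-coordinate Fourier descriptor (a generic variant; reference [9] describes Uesaka's specific formulation, which we do not reproduce here). The contour is resampled, transformed, and the lowest harmonics kept after normalising away translation, scale, and rotation:

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=5, n_samples=128):
    """Low-order Fourier descriptor of a closed silhouette contour.

    contour : (N, 2) array of boundary points, ordered along the contour
    Returns n_coeffs magnitudes, invariant to translation, scale and rotation.
    """
    contour = np.asarray(contour, dtype=float)
    # Resample to a fixed number of points (uniform in index, for simplicity).
    idx = np.linspace(0, len(contour) - 1, n_samples).astype(int)
    z = contour[idx, 0] + 1j * contour[idx, 1]         # complex coordinates

    Z = np.fft.fft(z)
    Z[0] = 0.0                                         # drop DC -> translation invariance
    mags = np.abs(Z)                                   # magnitudes -> rotation invariance
    mags /= mags[1] if mags[1] > 0 else 1.0            # scale by the first harmonic

    # Keep the lowest harmonics as the shape signature.
    return mags[1:1 + n_coeffs]

def classify(desc, prototypes):
    """Nearest-prototype matching over stored shape descriptors (our addition)."""
    dists = [np.linalg.norm(desc - p) for p in prototypes.values()]
    return list(prototypes.keys())[int(np.argmin(dists))]
```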

Fig. 12 shows the seven hand shapes that can be recognized in the current implementation and the extracted contours. The number of recognizable shapes can be increased.

Table 1 shows the recognition results for sample images (about 300 frames per shape).

Figure 11: Hand shape recognition

Figure 12: Seven hand shapes and extracted contours

5 Direct Manipulation for Virtual Scene Creation

We developed a system in which a user can interactively create virtual graphical scenes based on our hand gesture recognition system. The user can create virtual 3-D objects with primitive shapes and change the objects' positions, sizes, colors, etc. with hand gestures. Table 2 lists these commands. In the table, Hand Position denotes whether the user's hand is inside or outside of the target object, and Shape Transition denotes the change of hand shape that issues or starts the command (the numbers denote the hand shapes in Fig. 12); a sketch of this dispatch logic follows below.
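Reading Table 2 as a dispatch on (hand position, shape transition) gives a small lookup table. A hypothetical sketch; the transition codes follow our reading of the table and the shape numbering of Fig. 12:

```python
# Dispatch table mirroring Table 2: (inside_target, old_shape, new_shape) -> command.
# The transition codes are illustrative, taken from our reading of Table 2.
COMMANDS = {
    (False, 7, 1): "create",
    (True,  7, 1): "grab_move",
    (True,  7, 2): "resize",
    (True,  7, 4): "change_color_texture",
    (True,  7, 5): "delete",
    (True,  7, 6): "connect",
    (True,  7, 3): "separate",
}

def issue_command(inside_target, old_shape, new_shape, both_hands=False):
    """Map a hand-shape transition to a scene command (Stretch needs both hands)."""
    if both_hands and inside_target and old_shape == 7 and new_shape == 2:
        return "stretch"
    return COMMANDS.get((inside_target, old_shape, new_shape))
```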

Drawing

In the Create command, a user can specify the shape of an object using a hand trajectory. First, the user starts the command by closing his/her hand (7 → 1) and then draws a trajectory by moving his/her hand (Fig. 13). When the user finishes the command (by opening his/her hand again), a primitive object is created (Fig. 13, right).

Figure 13: Create Command

Pointing

In all commands except the Create command, a user specifies a target object (or a group of objects) by moving his/her hand into the target in 3-D virtual space (Fig. 15). After pointing, the user can issue a command by changing his/her hand shape (Fig. 14).

Figure 14: Delete Command

Slider

For the commands Grab & Move, Resize, and Change Color & Texture, the user controls attributes of the target (position, size, color, etc.) by moving his/her hand after issuing the command. This is similar to a slider control in a window-based GUI. The user can quit the command by changing his/her hand shape back to the neutral shape (7) (Fig. 15, right).

All commands except Stretch can be independently issued with either the left or the right hand (Fig. 16). In the Stretch command, a user can control the shape of an object by grabbing it with both hands and moving the hands relative to each other (Fig. 17).

More than one hundred subjects tried our system, and most of them could use it well. Some subjects could not use the system well because they had difficulties in making some of the pre-defined hand shapes.

We believe that the advantages of a direct manipulation system using a multi-camera-based non-contact-type device have been confirmed through our experiments.


Table 1: Recognition results

                      shape1  shape2  shape3  shape4  shape5  shape6  shape7
Input Frames             299     298     300     297     299     297     298
Correct Answer           299     295     271     275     261     297     295
Recognition Rate (%)     100    99.0    90.3    92.6    87.3     100    99.0

Figure 16: Independent manipulation with both hands

Figure 17: Stretch Command

6 Conclusion

We have proposed a two-hand tracking system based on multiple cameras. The position, posture, and shape of the hands are reconstructed with reliable image features in selected images. The viewpoint selection mechanism can effectively reduce both the self-occlusion and hand-hand occlusion problems without requiring detailed 3-D modeling. It also reduces the computational costs for both the model construction and its reconstruction. We confirmed the stability of the results through experiments using real images.

This system can be used for enhancing human-computer interactions in a wide variety of application domains. The direct manipulation system shows one example of how the recognition results can be used. Future research will include an integration of this system with our human tracking project to extend the system's operating environment.

References

[1] Baback Moghaddam and Alex Pentland. Maximum likelihood detection of faces and hands. In Proc. of International Workshop on Automatic Face- and Gesture-Recognition, pages 122-128, 1995.

[2] Roberto Cipolla, Paul A. Hadfield, and Nicholas J. Hollinghurst. Uncalibrated stereo vision with pointing for a man-machine interface. In Proc. of IAPR Workshop on Machine Vision Applications, pages 163-166, 1994.

[3] James M. Rehg and Takeo Kanade. Visual tracking of high DOF articulated structures: an application to human hand tracking. In Computer Vision - ECCV '94, LNCS vol. 801, pages 35-46, 1994.

[4] Yoshio Iwai, Yasushi Yagi, and Masahiko Yachida. Estimation of hand motion and position from monocular image sequence. In Proc. of ACCV '95, volume II, pages 230-234, 1995.

[5] James Davis and Mubarak Shah. Determining 3-D hand motion. In Asilomar Conference on Signals, Systems and Computers, pages 1262-1266, 1994.

[6] Akira Utsumi, Tsutomu Miyasato, Fumio Kishino, and Ryohei Nakatsu. Hand gesture recognition system using multiple cameras. In 13th International Conference on Pattern Recognition, pages 219-224, 1996.

[7] Ali Azarbayejani and Alex Pentland. Real-time self-calibrating stereo person tracking using 3-D shape estimation from blob features. In 13th International Conference on Pattern Recognition, pages 627-632, 1996.

[8] Ingemar J. Cox. A review of statistical data association techniques for motion correspondence. International Journal of Computer Vision, 10(1):53-66, 1993.

[9] Yoshinori Uesaka. A new Fourier descriptor applicable to open curves. Trans. Inst. Elec. Inf. Com. Eng. Japan, J67-A(3):166-173, 1984.
