[IEEE 2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN) - Xi'an,...

Interaction System of Treadmill Games based on Depth Maps and CAM-Shift

Hu Hai

Department of Electronics and Information Engineering, Huazhong University of Science and Technology

Wuhan, China e-mail: [email protected]

Li Bin, Huang BenXiong, Cui Yi

Department of Electronics and Information Engineering, Huazhong University of Science and Technology

Wuhan, China e-mail: [email protected]

Abstract-Because it is non-contact and intuitive, recognition from images has become a hot topic in somatosensory technology. Markerless recognition in particular turns the human body into a remote controller without any accessories and has attracted widespread attention. This paper chooses the CAM-Shift algorithm to track bare hands because of its computation speed and stability. It introduces depth information into traditional CAM-Shift and extends the algorithm with foreground segmentation and automatic initialization: the depth map is segmented to obtain a mask of the foreground, and the 3D information in the depth map is used to initialize the search window. On this basis, the paper builds an Interaction System of Treadmill Games based on depth maps and CAM-Shift. The system achieves non-contact, markerless man-machine interaction by acquiring the player's depth maps and color images with Kinect, an Xbox accessory from Microsoft, tracking the player's bare hands in 3D, and analyzing and recognizing the movements of the hands. Experiments show that the system meets the real-time requirement and that its response is quick and correct.

Keywords-CAM-Shift; object tracking; depth map; Kinect

978-1-61284-486-2/11/$26.00 ©2011 IEEE

I. INTRODUCTION

The treadmill is one of the most effective ways of doing aerobic exercise and keeping fit for people of all kinds. Treadmills are either mechanical or electric. However, the electric treadmills commonly used in homes and fitness centers offer only a passive way of running, which forces the runner's speed to follow preset values. This kind of training is boring and tiring. With the Wii, and especially Wii Fit, sweeping across the world, we notice that fun games can greatly enhance the effect of exercise training. On one hand, games make sport more entertaining and more acceptable. On the other hand, the energy consumed by exercise shortens the game time, which helps prevent game addiction.

Somatosensory games, which combine fitness and entertainment, are a new kind of electronic game operated by physical movement. This technology provides users with an accessible, easy and free human-computer interaction experience. Players can react naturally to what they see and get started faster, since it is no longer necessary to learn game-specific controls. Although the Wii can make players sweat more easily than the Xbox [1], medical studies have shown that the energy it consumes is far less than that of an ordinary treadmill. That is to say, the treadmill, as professional sports equipment, is still needed for fitness or weight loss. There are now many game-type treadmills, such as the TD-2000E treadmill from Jinxing Technology. Most of these products are connected to a computer through a USB port and are still operated through the buttons on the treadmill's front control panel. This button-based input greatly restricts the movement of the hands, and misoperation may occur while running, which degrades the game experience. This may be one of the reasons that game-type treadmills have not become popular.

Video-based tracking [2] can solve the problem above. It makes use of advanced computer vision technology, removes mechanical restrictions and improves the entertainment experience of camera games. The system proposed in this paper captures depth maps and color images with the Kinect device from Microsoft, and tracks hands without markers through preprocessing of depth maps and the CAM-Shift algorithm.

II. CAM-SHIFT TRACKING ALGORITHM

The color-based CAM-Shift tracking algorithm performs well on low-level computer vision problems. Because it is robust and runs in real time, CAM-Shift has become a basic tracking method: it adapts to continuous variation in the shape and size of the target, computes quickly and has strong resistance to interference, which guarantees the stability and real-time behavior of the system. The algorithm is shown in Fig. 1.

Figure 1. CAM-Shift algorithm flowchart

The basic idea of CAM-Shift [3] is to run a Mean-Shift search [4,5,6] on every video frame, and to take the result, i.e. the position and scale of the search window in the previous frame, as the initial window for the current frame. Iterating in this way achieves continuous tracking.
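The frame-to-frame loop described above can be illustrated with a minimal pure-Python sketch. It assumes each frame has already been converted into a 2D color probability map with values in [0, 1]. The function names, the square window and the resize rule (side = 2 * sqrt(M00)) are illustrative assumptions, not details given in this paper.

```python
import math

def window_moments(prob, x, y, w, h):
    """Zeroth and first moments of prob inside the window, clipped to the image."""
    rows, cols = len(prob), len(prob[0])
    m00 = m10 = m01 = 0.0
    for r in range(max(0, y), min(rows, y + h)):
        for c in range(max(0, x), min(cols, x + w)):
            p = prob[r][c]
            m00 += p
            m10 += c * p
            m01 += r * p
    return m00, m10, m01

def mean_shift(prob, window, max_iter=10):
    """Shift the window until it is centred on the local centroid of prob."""
    x, y, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(prob, x, y, w, h)
        if m00 == 0:
            break  # no probability mass inside the window
        cx, cy = m10 / m00, m01 / m00
        nx = math.floor(cx - (w - 1) / 2 + 0.5)
        ny = math.floor(cy - (h - 1) / 2 + 0.5)
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return x, y, w, h

def camshift_track(frames, window):
    """Use the converged window of each frame as the start window of the next."""
    results = []
    for prob in frames:
        x, y, w, h = mean_shift(prob, window)
        m00, m10, m01 = window_moments(prob, x, y, w, h)
        if m00 > 0:
            # Assumed resize rule: window side grows with the tracked mass.
            side = max(1, int(round(2 * math.sqrt(m00))))
            cx, cy = m10 / m00, m01 / m00
            x = math.floor(cx - (side - 1) / 2 + 0.5)
            y = math.floor(cy - (side - 1) / 2 + 0.5)
            w = h = side
        window = (x, y, w, h)
        results.append(window)
    return results
```

Windows are (x, y, width, height) tuples in pixel coordinates. Because each frame starts from the previous result, the search stays local, which is what makes the method fast and also what makes it sensitive to a wrong result in the previous frame.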

III. IMPROVED CAM-SHIFT ALGORITHM BASED ON DEPTH MAP

Although the CAM-Shift algorithm is simple and efficient, it still has a number of shortcomings, e.g. semi-automatic initialization, low tracking accuracy and reliance on conspicuous color markers, so further improvement is needed. The existing CAM-Shift algorithm uses only 2D information and cannot accurately segment the moving human from the background, which reduces the accuracy and stability of the tracking result. If depth information can be used, many of these problems are solved, because in a real scene most moving objects are separated from the background in depth.

This paper uses Kinect, an Xbox accessory that interprets 3D scene information from a continuously projected infrared structured light pattern, to obtain depth maps and color images. The device features an RGB camera, a depth sensor and a multi-array microphone. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions [7,8].

A. Foreground Segmentation

Because the CAM-Shift algorithm is based on color images, tracking errors occur easily when the background contains similar colors. Since the object is usually separated from the surrounding environment in depth and has a fixed range of movement (the treadmill deck), threshold segmentation of the depth map can accurately distinguish the player from the background.

Depth segmentation is shown in Fig. 2. Fig. 2(a) and Fig. 2(b) are the input depth map and the corresponding color image. Fig. 2(c) shows the gray-level histogram of the depth map, in which the depth values concentrate in three ranges: [0, 50] for the front control panel of the treadmill, which is nearest to the Kinect; [100, 175] for the human body and the armrests of the treadmill, which are a little further away; and [180, 255], where most gray levels fall, for the background, which is furthest from the Kinect and contains a great deal of irrelevant information.

Fig. 2(d) is the human image after depth threshold segmentation; the control panel and background have clearly been removed. Fig. 2(e) is the binary mask of Fig. 2(d). As shown in Fig. 2(f), the mask is then dilated to ensure that the whole human body can be segmented from the color image. Finally, the color image in Fig. 2(b) is segmented with the mask in Fig. 2(f), giving the result shown in Fig. 2(g).

Using the depth map thus removes background interference quickly and easily, and prepares the image for the subsequent color tracking.
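The segmentation steps of Fig. 2 (threshold, binary mask, dilation, masking of the color image) can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the depth map is a list of rows of 8-bit gray levels, the default band [100, 175] is the body range read from the histogram above, and the function names and the square dilation neighborhood are hypothetical choices rather than the authors' implementation.

```python
def depth_mask(depth, lo=100, hi=175):
    """Binary mask: 1 where the depth gray level lies in [lo, hi], else 0."""
    return [[1 if lo <= d <= hi else 0 for d in row] for row in depth]

def dilate(mask, radius=1):
    """Grow every foreground pixel into a (2*radius+1)-wide square."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out

def apply_mask(color, mask, fill=(0, 0, 0)):
    """Keep color pixels where the mask is 1; replace the rest with fill."""
    return [[px if m else fill for px, m in zip(crow, mrow)]
            for crow, mrow in zip(color, mask)]

# Toy 3x4 frame: column 0 is the near control panel, one body pixel at (1, 2).
depth = [[30, 200, 200, 200],
         [30, 200, 150, 200],
         [30, 200, 200, 200]]
color = [[(10, 10, 10)] * 4 for _ in range(3)]
foreground = apply_mask(color, dilate(depth_mask(depth)))
```

Dilating the mask before applying it trades a thin rim of background for the guarantee that body pixels with noisy depth at the silhouette edge are not cut away.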


Figure 2. Segmentation based on depth map

B. Automatic Initialization

Existing CAM-Shift methods are generally initialized either by manual calibration or by searching the full image with a color model stored in the system in advance. The drawback is that, when there are multiple similar targets, the system cannot control which one the search window converges to.

Fig. 3 shows another group of pictures. Comparing Fig. 3(a) with Fig. 2(a), we can easily see that when the hands are raised forward their depth is significantly smaller, because the hands are nearer to the Kinect. This paper therefore takes this as the initialization signal.

The XOY position of the hands can also be obtained from the depth map, as in Fig. 3(c), and is taken as the initial search window in the color image, as in Fig. 3(d).
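A minimal sketch of this initialization rule follows. It assumes that raised hands fall in the gray-level band between the control panel range [0, 50] and the body range [100, 175] of Fig. 2(c). The band limits, the minimum pixel count and the function name are illustrative assumptions, since the paper does not state its thresholds. For brevity the sketch returns one bounding box around all near pixels and does not separate the two hands.

```python
def init_window_from_depth(depth, near_lo=51, near_hi=99, min_pixels=4):
    """Return (x, y, w, h) around pixels nearer than the body, or None.

    None means that no raised hand was detected, so tracking should not start.
    """
    xs, ys = [], []
    for r, row in enumerate(depth):
        for c, d in enumerate(row):
            if near_lo <= d <= near_hi:
                xs.append(c)
                ys.append(r)
    if len(xs) < min_pixels:
        return None  # too few near pixels: treat as noise
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1

# Toy 5x6 depth map: body at gray level 150, a raised hand at gray level 70.
depth = [[150] * 6 for _ in range(5)]
for r in (1, 2):
    for c in (3, 4):
        depth[r][c] = 70
window = init_window_from_depth(depth)
```

The returned box is in depth-map pixel coordinates and can be used directly as the initial search window only if the depth map and color image are registered to each other.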

Figure 3. Initialization based on depth map

IV. SYSTEM STATEMENT

The system in this paper requires only an ordinary treadmill, a Kinect and a computer monitor. Since these devices are simple and easy to obtain, the system is well suited to home or gym users.

Fig. 4(a) shows a rendering of the system. The system uses the Kinect, placed in front of the treadmill, as the input device to capture motion images of the player. Data frames are transmitted to the Motion Capture Module, which tracks the moving hands with the improved CAM-Shift tracker. The Motion Recognition Module then analyzes the position data and transmits it to the Game Engine to control the game character.

As shown in Fig. 4(b), the system consists of the Human Motion Capture Subsystem, the Motion Recognition Module and the Communication Platform. The Human Motion Capture Subsystem includes three modules: the Image Acquisition Module, the Image Processing Module and the CAM-Shift Tracking Module. The function of the subsystem is to capture depth maps and the corresponding color images, and then analyze them to obtain a series of position parameters for both hands. Image segmentation pre-processing is carried out in the Image Processing Module. System initialization and movement tracking are the jobs of the CAM-Shift Tracking Module. The Motion Recognition Module analyzes these motion parameters, determines whether the user has made a defined action or extracts motion information, and then passes the results to the Game Engine, which uses them to control the character's movement or change the state of the game.

The Game Engine defines the specific meaning of each action, such as turning left, turning right and punching. This paper does not go into the specific implementation of the game engine.


Figure 4. System overview

This paper built the interaction system of the treadmill game in the laboratory. A tester exercised on a real treadmill while making various movements with both hands. The Kinect captured motion images and the host analyzed the data in real time. The system working environment is: CPU Pentium RD 1.6 GHz x 2, RAM 1.0 GB, Windows XP, Visual C++. The resolution of the Kinect is 640 x 480, and the image update frequency is 30 FPS.

The system adopts a fast low-level method to identify actions from the tracking results. For example, in the system of this paper there is a small circle on each side of the screen that stands for a turning button. When the distance between the center of the hand and the center of the circle is smaller than a threshold, i.e. the hand appears to touch the button on the screen, the button is pressed. The system automatically accumulates time and outputs a response every K frames, so that the longer the hand touches the button, the larger the steering angle becomes. The action ends when the hand leaves the button. In addition, the system can be set not to accumulate time, so that the button behaves like a switch.
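The button rule above can be sketched as a small state machine that is fed one tracked hand position per frame. The class name, the parameter names and the choice to report the response as an integer level (1 after K frames, 2 after 2K frames, and so on) are assumptions made for illustration; the paper does not specify how the response is encoded.

```python
import math

class TurnButton:
    """On-screen circular button pressed by bringing the hand centre close to it."""

    def __init__(self, cx, cy, threshold, k=5, accumulate=True):
        self.cx, self.cy = cx, cy
        self.threshold = threshold    # maximum centre distance that counts as a touch
        self.k = k                    # emit one response every k touching frames
        self.accumulate = accumulate  # False: behave like a simple switch
        self.frames_held = 0

    def update(self, hand):
        """Feed one hand position (x, y) or None; return a response level or None."""
        touching = (hand is not None and
                    math.hypot(hand[0] - self.cx, hand[1] - self.cy) < self.threshold)
        if not touching:
            self.frames_held = 0      # the action ends when the hand leaves
            return None
        self.frames_held += 1
        if not self.accumulate:
            return 1 if self.frames_held == 1 else None
        if self.frames_held % self.k == 0:
            return self.frames_held // self.k  # grows the longer the hand stays
        return None

button = TurnButton(cx=50, cy=50, threshold=20, k=3)
track = [(55, 50)] * 7 + [(200, 200)] + [(55, 50)] * 3
responses = [button.update(p) for p in track]
```

A game engine could map the returned level to a steering angle, for example by multiplying it by a fixed step, while the switch mode suits one-shot actions.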

V. RESULT AND CONCLUSION

This paper tested the proposed system and the original CAM-Shift tracking method in the laboratory. Fig. 5 shows results sampled every 10 frames.

The pictures in the left column are the tracking results of the system proposed in this paper, those in the middle are the input depth maps, and those in the right column are the results of the original CAM-Shift algorithm. This test shows that the proposed system can accurately track the movement of the player's hands in real time and respond accordingly (the small circle changes color). Moreover, compared with the original CAM-Shift method, the system has higher stability and accuracy.

To analyze the tracking performance quantitatively, this paper also tested two scenes of 200 frames each, as shown in TABLE I. The statistics show that the system can stably track the hands whether the player swings the arms normally or dramatically. However, because the CAM-Shift algorithm only searches an area slightly larger than the result window of the previous frame, it is hard to locate the right target when the target moves too fast or the result of the previous frame is incorrect.

TABLE I. SYSTEM PERFORMANCE TABLE

Scene    Error Rate    Miss Rate
1        2.5%          4%
2        10%           5.5%

The system's processing time is 37.7 ms/frame, and the CPU occupancy is approximately 40%.
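To put these figures in more concrete terms, the short sketch below converts the reported rates into frame counts and the per-frame time into a throughput. It assumes that each percentage in TABLE I is a fraction of the 200 frames of its scene; the helper names are illustrative.

```python
FRAMES_PER_SCENE = 200  # each test scene contains 200 frames

def frames_from_percent(percent, total=FRAMES_PER_SCENE):
    """Number of frames that a percentage of the scene corresponds to."""
    return total * percent / 100

def frames_per_second(ms_per_frame):
    """Throughput implied by a per-frame processing time in milliseconds."""
    return 1000 / ms_per_frame

# (error rate %, miss rate %) per scene, as reported in TABLE I.
reported = {1: (2.5, 4.0), 2: (10.0, 5.5)}
counts = {scene: tuple(frames_from_percent(p) for p in rates)
          for scene, rates in reported.items()}
throughput = round(frames_per_second(37.7), 1)
```

Under that assumption, scene 1 corresponds to 5 erroneous and 8 missed frames and scene 2 to 20 and 11, while 37.7 ms/frame works out to about 26.5 frames per second, slightly below the 30 FPS update rate of the Kinect.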

The comparison and analysis above show that the treadmill game interaction system can accurately track the player's hands and respond in real time in a relatively complicated environment. In addition, the action definition interface is open, which suits the development of further video games. By improving the CAM-Shift tracking method with depth maps, the system achieves evidently higher tracking quality.

REFERENCES

[1] Schlömer Thomas, Poppinga Benjamin, Henze Niels, "Gesture Recognition with a Wii Controller," Proceedings of the 2nd International Conference on Tangible and Embedded Interaction, New York, USA, 2008, pp. 11-14

[2] Falstein N, "A Grand Unified Game Theory," Miller Freeman, 1999 Game Developers Conference Proceedings, San Francisco, USA, 1999, pp. 229-239

[3] Bradski G R, "Computer Vision Face Tracking for Use in a Perceptual User Interface," Intel Technology Journal, 1998, 2nd Quarter

[4] Isard M and Blake A, "Condensation-conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, 1998, 29(1): pp. 5-28

[5] Fukunaga K and Hostetler L D, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition," IEEE Transactions on Information Theory, 1975, 21: pp. 32-40

[6] Cheng Yizong, "Mean Shift, Mode Seeking, and Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(8): pp. 790-799

[7] Prime Sense LTD and Tel Aviv, "Integrated Processor for 3D Mapping," United States, US 2010/0007717 A1, 2010

[8] Prime Sense LTD, "Method and System for Object Reconstruction," PCT, WO 2007/043036 A1, 2007

Figure 5. Tracking results
