[IEEE 2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN) - Xi'an,...
Interaction System of Treadmill Games based on Depth Maps and CAM-Shift
HuHai
Department of Electronics and Information Engineering Huazhong University of Science and Technology
Wuhan, China e-mail: [email protected]
Abstract-Because it is non-contact and direct, image-based recognition has become a hot topic in somatosensory technology. Markerless recognition in particular, which turns the human body into a remote controller without any accessories, has attracted widespread attention. This paper chooses the CAM-Shift algorithm to track bare hands because of its computation speed and stability. It introduces depth information into traditional CAM-Shift and extends the algorithm with foreground segmentation and automatic initialization: the depth map is segmented to obtain a mask of the foreground, and the 3D information in the depth map is used to initialize the search window. On this basis, the paper builds an interaction system for treadmill games based on depth maps and CAM-Shift. The system achieves contact-free, markerless human-machine interaction by acquiring the player's depth maps and color images with the Microsoft Kinect accessory for the Xbox, tracking the player's bare hands in 3D, and analyzing and recognizing the movements of the hands. Experiments show that the system meets the real-time requirement and that its response is quick and correct.
Keywords-CAM-Shift; object tracking; depth map; Kinect
I. INTRODUCTION
The treadmill is among the most effective ways of doing aerobic exercise and keeping fit for people of all kinds. Treadmills are divided into mechanical and electric types. However, the electric treadmills commonly used in homes and fitness centers provide only a passive way of running, in which the runner's speed is forced to follow preset settings. This way of training is boring and tiring. With the Wii, and especially Wii Fit, sweeping across the world, we notice that fun games can greatly enhance the effect of exercise training. On the one hand, games make sport more entertaining and more acceptable. On the other hand, the energy consumed by exercise shortens the playing time, which can help prevent game addiction.
Somatosensory games, which combine fitness and entertainment, are a new kind of electronic game operated by physical movement. This technology gives users an accessible, easy and free human-computer interaction experience. Players can react naturally to what they see and get started faster, since they no longer need to learn special game controls. Although the Wii makes players sweat more easily than the Xbox [1], medical studies have shown that the energy it consumes is far less than that consumed on an ordinary treadmill. That is to say, the treadmill, as professional sports equipment, is still needed for fitness or weight loss. There are now many game-type treadmills, such as the TD-2000E treadmill from Jinxing Technology. Most of these products are connected to a computer through a USB port and are still operated through the buttons on the treadmill's front control panel. This primitive key input greatly restricts the movement of the hands, and misoperation may occur while running, so the game experience suffers. This may be one of the reasons why game-type treadmills have not become popular. Video-based tracking [2] can solve this problem. It makes use of advanced computer vision technology, removes mechanical restrictions and improves the entertainment experience of camera games. The system proposed in this paper captures depth maps and color images with the Microsoft Kinect device and tracks hands without markers through preprocessing of the depth maps and the CAM-Shift algorithm.
Li Bin, Huang BenXiong, Cui Yi
Department of Electronics and Information Engineering Huazhong University of Science and Technology
Wuhan, China e-mail: [email protected]
978-1-61284-486-2/11/$26.00 ©2011 IEEE
II. CAM-SHIFT TRACKING ALGORITHM
The color-based CAM-Shift tracking algorithm performs well on low-level computer vision problems. Owing to its robustness and real-time performance, CAM-Shift has become a basic tracking method: it adapts to continuous variation in the shape and size of the target, computes quickly and resists interference well, which guarantees the stability and real-time behavior of the system. The algorithm is shown in Fig. 1.
Figure 1. CAM-Shift algorithm flowchart
The basic idea of CAM-Shift [3] is to run a Mean-Shift search [4,5,6] on every video frame and to take the result for the previous frame, i.e. the position and scale of its search window, as the initial window for the current frame. Iterating in this way achieves continuous tracking.
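The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this paper: it assumes that each frame has already been converted into a color probability (back-projection) map `prob` with values in [0, 1], and the function names and the window resizing rule (side length proportional to the square root of the zeroth moment) are assumptions made for the example.

```python
import math

def moments(prob, window):
    """Zeroth and first moments of `prob` inside window (x, y, w, h)."""
    x, y, w, h = window
    rows, cols = len(prob), len(prob[0])
    m00 = m10 = m01 = 0.0
    for r in range(max(y, 0), min(y + h, rows)):
        for c in range(max(x, 0), min(x + w, cols)):
            p = prob[r][c]
            m00 += p
            m10 += p * c
            m01 += p * r
    return m00, m10, m01

def mean_shift(prob, window, max_iter=10):
    """Shift the window until it is centered on the local centroid."""
    x, y, w, h = window
    for _ in range(max_iter):
        m00, m10, m01 = moments(prob, (x, y, w, h))
        if m00 == 0:
            break  # no target probability inside the window
        nx = int(round(m10 / m00 - (w - 1) / 2))
        ny = int(round(m01 / m00 - (h - 1) / 2))
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return (x, y, w, h)

def camshift_step(prob, window):
    """One CAM-Shift step: Mean-Shift search, then adapt the window size."""
    x, y, w, h = mean_shift(prob, window)
    m00, m10, m01 = moments(prob, (x, y, w, h))
    if m00 == 0:
        return (x, y, w, h)
    # Assumed resizing rule: square window whose side grows with sqrt(m00).
    side = max(2, int(round(2 * math.sqrt(m00))))
    cx, cy = m10 / m00, m01 / m00
    return (int(round(cx - (side - 1) / 2)),
            int(round(cy - (side - 1) / 2)), side, side)

def track(prob_frames, init_window):
    """The result window of each frame seeds the search in the next frame."""
    window = init_window
    results = []
    for prob in prob_frames:
        window = camshift_step(prob, window)
        results.append(window)
    return results
```

Calling `track(frames, init_window)` returns one window per frame. Because each search starts from the previous result, a target that moves out of the previous window between two frames can no longer be found.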
III. IMPROVED CAM-SHIFT ALGORITHM BASED ON DEPTH MAP
Although the CAM-Shift algorithm is simple and efficient, it still has a number of shortcomings, e.g. semi-automatic initialization, low tracking accuracy and the need for conspicuous color markers. Further improvement is therefore needed. The existing CAM-Shift algorithm uses only 2D information and cannot accurately segment human movement from the background, which reduces the accuracy and stability of the tracking result. If depth information can be used, many of these problems can be solved, because in real scenes most moving objects are separated from the background in depth.
This paper uses the Kinect, an Xbox accessory that interprets 3D scene information from continuously projected infrared structured light, to obtain depth maps and color images. The device features an RGB camera, a depth sensor and a multi-array microphone. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions [7,8].
A. Foreground Segmentation
Because the CAM-Shift algorithm is based on color images, tracking errors easily occur when the background contains similar colors. Since the object is usually separated in depth from the surrounding environment and has a fixed range of movement (the treadmill pedal), threshold segmentation of the depth map can accurately distinguish the player from the background.
Depth segmentation is shown in Fig. 2. Fig. 2(a) and Fig. 2(b) are the input depth map and the corresponding color image. Fig. 2(c) shows the gray-level histogram of the depth map, in which the depth values concentrate in three ranges: [0, 50] for the front control panel of the treadmill, which is nearest to the Kinect; [100, 175] for the human body and the armrests of the treadmill, which are a little farther away; and [180, 255], which holds most of the gray levels, for the background, which is farthest from the Kinect and contains a great deal of irrelevant information.
Fig. 2(d) is the human image after depth threshold segmentation; the control panel and the background have clearly been removed. Fig. 2(e) is the binary mask of Fig. 2(d). As shown in Fig. 2(f), the mask is then expanded (dilated) to ensure that the whole human body can be segmented from the color image. Finally, the color image in Fig. 2(b) is segmented with the mask in Fig. 2(f), and the result is shown in Fig. 2(g).
Using the depth map thus removes background interference quickly and easily and simplifies the subsequent color tracking.
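The segmentation steps can be illustrated with a minimal Python sketch. It is not the code used in this paper: images are represented as plain nested lists, the function names are hypothetical, and the default range [100, 175] is simply the body range read from the example histogram in Fig. 2(c), so it would have to be re-estimated for another setup.

```python
def threshold_mask(depth, lo=100, hi=175):
    """Binary mask: 1 where the depth gray level lies in [lo, hi]."""
    return [[1 if lo <= d <= hi else 0 for d in row] for row in depth]

def dilate(mask, radius=1):
    """Expand the mask with a (2 * radius + 1) square structuring element."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(r - radius, 0), min(r + radius + 1, rows)):
                    for cc in range(max(c - radius, 0), min(c + radius + 1, cols)):
                        out[rr][cc] = 1
    return out

def apply_mask(color, mask, fill=(0, 0, 0)):
    """Keep color pixels under the mask; replace the rest with `fill`."""
    return [[px if m else fill for px, m in zip(color_row, mask_row)]
            for color_row, mask_row in zip(color, mask)]

def segment_foreground(depth, color, lo=100, hi=175, radius=1):
    """Depth threshold, mask expansion, then masking of the color image."""
    return apply_mask(color, dilate(threshold_mask(depth, lo, hi), radius))
```

The expansion radius trades completeness against leakage: a larger radius keeps more of the body outline but also lets more nearby background pixels through to the color tracker.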
Figure 2. Segmentation based on depth map
B. Automatic Initialization
Existing CAM-Shift methods are generally initialized either by manual calibration or by searching the full image with a color model stored in the system in advance. However, when there are multiple similar targets, the system cannot control which of them the search window converges to.
Fig. 3 shows another group of pictures. Comparing Fig. 3(a) with Fig. 2(a), we can easily see that when the hands are raised forward, their depth is significantly smaller, because the hands are nearer to the Kinect. This paper therefore takes this as the initialization signal.
The XOY position of the hands can also be obtained from the depth map, as in Fig. 3(c), and is taken as the initial search window in the color image, as in Fig. 3(d).
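One possible way to turn this idea into code is sketched below. It is an assumption-laden illustration rather than the method of this paper: the near-depth range [60, 95] (between the control panel and the body ranges of Fig. 2(c)), the minimum pixel count and the split of the image at its vertical midline to separate the two hands are all hypothetical choices.

```python
def bounding_box(points):
    """Axis-aligned box (x, y, w, h) around a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

def init_windows(depth, near_lo=60, near_hi=95, min_pixels=4):
    """Return an initial search window per image half, or None if no hand.

    A hand counts as raised when enough pixels fall into the assumed
    near-depth range, i.e. clearly in front of the body.
    "left" and "right" refer to image halves, not to the player's hands.
    """
    cols = len(depth[0])
    halves = {"left": [], "right": []}
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            if near_lo <= d <= near_hi:
                halves["left" if x < cols // 2 else "right"].append((x, y))
    return {side: bounding_box(pts) if len(pts) >= min_pixels else None
            for side, pts in halves.items()}
```

A window returned here would then serve as the starting window of the color tracker, while `None` means that tracking for that hand has not been triggered yet.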
Figure 3. Initialization based on depth map
IV. SYSTEM STATEMENT
The system in this paper is built only on an ordinary treadmill, a Kinect and a computer monitor. Since these devices are simple and easy to obtain, the system is well suited to home or gym users.
Fig. 4(a) shows a rendering of the system. The system uses the Kinect, placed in front of the treadmill, as the input device to capture motion images of the player. Data frames are transmitted to the Motion Capture Module, which tracks the moving hands with the improved CAM-Shift tracker. The Motion Recognition Module then analyzes the position data and transmits it to the Game Engine to control the game character. As shown in Fig. 4(b), the system consists of the Human Motion Capture Subsystem, the Motion Recognition Module and the Communication Platform. The Human Motion Capture Subsystem includes three modules: the Image Acquisition Module, the Image Processing Module and the CAM-Shift Tracking Module. The function of the subsystem is to capture depth maps and the corresponding color images and then analyze them to obtain a series of position parameters for both hands. Image segmentation pre-processing is carried out in the Image Processing Module. System initialization and movement tracking are the jobs of the CAM-Shift Tracking Module. The Motion Recognition Module analyzes these motion parameters, determines whether the user has made a defined action or extracts motion information, and then passes the results to the Game Engine, which can then control the character's movement or change the state of the game.
The Game Engine defines the specific meaning of each action, such as turning left, turning right and punching. This paper does not go into the specific implementation of the game engine.
Figure 4. System overview
This paper built the interaction system for the treadmill game in the laboratory. A tester exercised on a real treadmill while making various movements with both hands. The Kinect captured motion images and the host analyzed the data in real time. The system working environment is: CPU Pentium RD 1.6 GHz x 2, RAM 1.0 GB, Windows XP, Visual C++. The resolution of the Kinect is 640 x 480, and the image update frequency is 30 FPS.
The system adopts a fast, low-level method to identify actions from the tracking results. For example, in this system there is a small circle on each side of the screen that stands for a turning button. When the distance between the center of the hand and the center of the circle is smaller than a threshold, i.e. the hand appears to touch the button on the screen, the button is pressed. The system automatically accumulates time and outputs a response every K frames, so that the longer the hand touches the button, the larger the steering angle. The action ends when the hand leaves the button. In addition, the system can be set not to accumulate time, so that the button behaves like a switch.
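The button logic above can be sketched as a small state machine. The class below is a hypothetical illustration, not the code of this system: the names `TurnButton`, `threshold`, `k` and `cumulative` are invented for the example, and reporting the response as an increasing step count that a game engine could map to a steering angle is an assumption.

```python
import math

class TurnButton:
    """On-screen button pressed when the tracked hand center is close enough."""

    def __init__(self, center, threshold, k=5, cumulative=True):
        self.center = center          # (x, y) of the circle on the screen
        self.threshold = threshold    # maximum center distance for a press
        self.k = k                    # emit one response every k frames
        self.cumulative = cumulative  # False: behave like a simple switch
        self.frames_held = 0

    def update(self, hand_center):
        """Feed one frame; return a response value or None."""
        touching = (
            hand_center is not None
            and math.hypot(hand_center[0] - self.center[0],
                           hand_center[1] - self.center[1]) < self.threshold
        )
        if not touching:
            self.frames_held = 0      # the action ends when the hand leaves
            return None
        self.frames_held += 1
        if not self.cumulative:
            # Switch mode: respond once, on the frame the press starts.
            return 1 if self.frames_held == 1 else None
        if self.frames_held % self.k == 0:
            # Cumulative mode: a longer press yields a larger step count.
            return self.frames_held // self.k
        return None
```

With `k=3`, holding the hand on the button for six frames yields the responses 1 and 2 on the third and sixth frames, which corresponds to a steering angle that grows with the duration of the press.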
V. RESULT AND CONCLUSION
This paper tested the proposed system and the original CAM-Shift tracking method in the laboratory. Fig. 5 shows the results sampled every 10 frames.
The pictures in the left column are the tracking results of the proposed system, those in the middle are the input depth maps, and those in the right column are the results of the original CAM-Shift algorithm. This test shows that the proposed system can accurately track the movement of the player's hands in real time and respond accordingly (the small circle changes color). Moreover, compared with the original CAM-Shift method, the system has higher stability and accuracy.
To analyze the tracking performance quantitatively, this paper also tested two scenes of 200 frames each, as shown in TABLE I. The statistics show that the system can track the hands stably whether the player swings the arms normally or vigorously. However, because the CAM-Shift algorithm searches only an area slightly larger than the result window of the previous frame, it is hard to locate the right target when the target moves too fast or the result for the previous frame is incorrect.
TABLE I. SYSTEM PERFORMANCE
Scene    Error Rate    Miss Rate
1        2.5%          4%
2        10%           5.5%
The processing speed of the system is 37.7 ms/frame, and the CPU occupancy rate is approximately 40%.
The comparison and analysis above show that the treadmill game interaction system can accurately track the player's hands and respond in real time in a relatively complicated environment. In addition, the action definition interface is open, which suits the development of more video games. The system improves the CAM-Shift tracking method with depth maps and achieves clearly better tracking quality.
REFERENCES
[1] Schlömer Thomas, Poppinga Benjamin, Henze Niels, "Gesture Recognition with a Wii Controller," Proceedings of the 2nd International Conference on Tangible and Embedded Interaction, New York, USA, 2008, pp. 11-14
[2] Falstein N, "A Grand Unified Game Theory," Miller Freeman, 1999 Game Developers Conference Proceedings, San Francisco, USA, 1999, pp. 229-239
[3] Bradski G R, "Computer Vision Face Tracking for Use in a Perceptual User Interface," Intel Technology Journal, 1998, 2nd Quarter
[4] Isard M and Blake A, "Condensation - Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, 1998, 29(1), pp. 5-28
[5] Fukunaga K and Hostetler L D, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition," IEEE Transactions on Information Theory, 1975, 21, pp. 32-40
[6] Cheng Yizong, "Mean Shift, Mode Seeking, and Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995, 17(8), pp. 790-799
[7] Prime Sense LTD and Tel Aviv, "Integrated Processor for 3D Mapping," United States, US 2010/0007717 A1, 2010.
[8] Prime Sense LTD, "Method and System for Object Reconstruction," PCT, WO 2007/043036 A1, 2007.
Figure 5. Tracking results