
Markerless Hand Gesture Interface Based on LEAP Motion Controller

Danilo Avola, Andrea Petracca, Giuseppe Placidi, Matteo Spezialetti

Dep. of Life, Health and Environmental Sciences, University of L’Aquila

Via Vetoio Coppito, 67100, L’Aquila, Italy
Email: (danilo.avola,andrea.petracca)@univaq.it

Email: (giuseppe.placidi,matteo.spezialetti)@univaq.it

Luigi Cinque, Stefano Levialdi
Dep. of Computer Science

Sapienza University, Via Salaria 113, 00198, Rome, Italy

Email: (cinque,levialdi)@di.uniroma1.it

Abstract

Hand gesture interfaces provide an intuitive and natural way of interacting with a wide range of applications. Nowadays, the development of these interfaces is supported by an increasing number of sensing devices able to track hand and finger movements. Despite this, the physical and technical features of many of these devices make them unsuitable for implementing interfaces oriented to everyday desktop applications. Conversely, the LEAP motion controller has been specifically designed to interact with these applications. Moreover, this device has been equipped with a hand skeletal model that provides tracking data with a high level of accuracy.

This paper describes a novel approach to define and recognize hand gestures. The proposed method adopts freehand drawing recognition algorithms to interpret the tracking data of hand and finger movements. Although our approach is applicable to any hand skeletal model, the overall features of the one provided by the LEAP motion controller have driven us to use it as a reference model. Extensive preliminary tests have demonstrated the usefulness and accuracy of the proposed method.

Keywords: hand gesture definition, hand gesture recognition, feature extraction, LEAP motion controller.

1. Introduction

The diffusion of consumer sensing devices has promoted the development of novel Human-Computer Interfaces (HCIs) able to track body and/or hand movements. These interfaces process the captured sensing information to provide body and/or hand models through which poses, movements, and gestures can be recognized. The accuracy of the model (and of the recognition process) depends on the specific application: interfaces designed for applications in the field of rehabilitation, for instance, require greater accuracy than those designed for entertainment. These interfaces can be classified as Natural User Interfaces (NUIs) or Haptic Interfaces (HIs). The term NUI refers to interfaces in which the interaction between human and machine is controlled by poses and movements of the body (and its parts) without using any tool or wearing any device, whereas the term HI refers to interfaces in which the interaction is controlled by signals emitted from sensors mounted on some kind of body suit and/or glove. Although HIs can be extremely accurate, their use is restricted to some specialized fields (e.g., robotics, manufacturing) due to high costs and difficult customization. Moreover, different contexts (e.g., serious games) require that users can use these interfaces without physical constraints, cumbersome devices, or uncomfortable tools. In our context, we were interested in managing a hand skeletal model to interact (by one or two hands) with everyday desktop applications (e.g., data management, icon browsing); for this reason, we focused on investigating the most suitable current NUI.

A first example of NUI is represented by the early Computer Vision (CV) based Motion Capture (MoCap) systems, which were equipped with one or more RGB cameras [22, 15]. Although these systems still prove their effectiveness and efficiency, most of them cannot be considered genuine NUIs since their tracking algorithms are based on visual expedients (e.g., markers, coloured suits and gloves). The recent introduction of high-resolution and high-speed RGB cameras has supported the implementation of newer markerless CV based MoCap systems able to track even subtle movements of the hand articulation [17, 20]. Despite this, the use of these systems (early and newer) is not particularly suitable for interacting with everyday desktop applications. In fact, as is well known, these systems require more than one camera (i.e., view) to recognize and track a body and its hands. This last aspect introduces some hard technical issues, including calibration, synchronization, and data processing.


The recent proliferation of consumer Time of Flight (ToF) and Structured Light (SL) range imaging cameras [7] has allowed us to solve some of the above issues. In particular, these devices have allowed developers to implement NUIs having a 3D perception of the observed scene by using a single device. In fact, these cameras provide both a set of RGB maps and a set of depth maps for each second captured within their Field of View (FoV). Although this new generation of NUIs is profitably used in different application fields (such as entertainment, rehabilitation, and movement analysis), some practical (e.g., size, shape) and technical (e.g., resolvable depth, depth stream) aspects do not promote their use in interacting with the mentioned class of applications. The LEAP motion controller [13] is the latest NUI that allows the implementation of advanced hand gesture recognition systems; its physical and technical features make it an ideal tool to interact with any kind of desktop application. In fact, the LEAP motion controller is a light and tiny device that has been designed to be placed on a desk. Its working area is defined by the 8 cubic feet of space above it, and the device has been cleverly conceived to track palm and finger movements, since this is the common hand pose of a user while interacting with a desktop application. The new version of the LEAP motion API (version 2.0, [14]) introduces a new hand skeletal model that provides additional information about hands and fingers and also improves the overall tracking data. This model allows the device to predict the positions of fingers and hands that are not clearly in view; the hands can often cross over each other and still be tracked. Despite this, the LEAP motion controller remains a device with a single viewpoint, which implies that occlusions (or inaccurate evaluations) can occur when users perform complex hand poses or subtle motions, especially those involving non-extended fingers.

This paper describes a novel approach to define and recognize hand gestures. The skeletal model of the LEAP motion controller provides tracking data in which palm and finger movements are expressed by a set of 3D spatial information (i.e., space (x,y,z)) over time (i.e., time t). Our main idea has been to project this 3D spatial and temporal information onto its 2D reference planes (i.e., planes (x,y), (x,z), (y,z), over time t). In this way, any 3D hand gesture can be interpreted through the analysis of its projections on the related 2D reference planes. Each projection can be seen as a freehand drawing whose features can be extracted through algorithms belonging to the sketch recognition field [11, 4]. This paper focuses on showing the approach independently of the specific device; for this reason, we are not interested here in occlusion resolution.

The rest of the paper is structured as follows. Section 2 discusses basic background about freehand drawing recognition. Section 3 presents an overview of the framework, and introduces the preliminary tests. Finally, Section 4 concludes the paper and shows future directions.

2. Background

In this paper we present an ongoing research work; this implies that in this section we are not interested in comparing the proposed approach with others, but rather in providing a survey of the works that, more than others, have contributed to defining our method. Freehand drawing processing faces different issues, including shape recognition and style identification. The first regards the ability of a system to distinguish a set of hand-drawn symbols constituting a graphical library. The second concerns the ability of such a system to recognize each hand-drawn symbol independently of the style (e.g., bold, solid) used by users in tracing it. Both issues can be addressed by studying a set of mathematical features (i.e., a feature vector) able to characterize every hand-drawn symbol and its style, including a certain degree of perturbation.

2.1. Freehand Drawing Processing

A first remarkable work that has supported the implementation of our feature vector is presented in [9]. In this paper the authors propose a robust and extensible approach to recognize a wide range of hand-drawn 2D graphical symbols. Their method recognizes symbols independently of size, rotation, and style. Their feature vector is based on the computation of the convex hull and of three special polygons derived from it: the largest triangle, the largest quadrilateral, and the enclosing rectangle. Different works have inherited the above feature vector to customize and extend the library of symbols. Among others, the works presented in [3] and [2] provide an interesting viewpoint to generalize the definition and recognition of any set of hand-drawn 2D graphical symbols. The first work describes the Feature calculation Bid Decision (FcBD) system, which adopts an agent-based architecture to introduce new symbols inside a defined library. Their system implements some new measures and adds them to the feature vector presented in [9]. Moreover, it provides a practical strategy to manage conflicts that occur when two similar symbols are introduced. The second work can be considered an application of the one just described. In this case, the authors adopted the FcBD system to define a fixed library of symbols, and then implemented a simple CV based MoCap system equipped with a single RGB camera to recognize them. The users could perform hand gestures by using a suitable tool or a coloured glove. This approach can be considered an early version of the one we propose in the present paper, where the use of the RGB camera reduces the effort to a single reference plane (i.e., (x,y)).


Fig. 1. The framework design is composed of three layers: Data Pre-Processing Layer (DPP-L), Feature Extraction and Recognition Layer (FER-L), and Definition and Storage Layer (DS-L).

An agent-based architecture has also been designed in [8], where the authors define a framework for interpreting hand-drawn symbols in a context-driven fashion, exploiting heterogeneous techniques for the recognition of each symbol. Their framework has been adopted to derive AgentSketch, a multi-domain sketch recognition system able to work in on-line and off-line mode. As shown by the works just introduced, many authors have focused on conceiving freehand drawing recognition systems that identify more than a single set of graphical symbols. In fact, the implementation of a recognizer to identify a specific set of symbols is a time-consuming operation. Often, the introduction of a single new symbol may require the whole re-implementation of the system. Another work that we have analysed is shown in [1], where the authors present SketchREAD, a multi-domain sketch recognition engine capable of recognizing hand-drawn diagrammatic sketches. Their system is based on a suitable description of the shapes according to the related domain; moreover, SketchREAD does not require training data or re-implementation processes. Two last meaningful works are described in [10] and [5]. The first introduces LADDER, which has been the first language to describe how sketched diagrams in a domain are drawn, displayed, and edited. The last details a framework through which users can define every set of hand-drawn 2D symbols. The symbols are defined and recognized by using a novel Sketch Modeling Language: SketchML. The proposed framework adopts SketchML to formalize and manage the gestures.

3. The Framework Architecture

As shown in Figure 1, this section describes the framework architecture that implements the proposed approach to define and recognize hand gestures. Although the method is applicable to any hand skeletal model, the one provided by the LEAP motion controller has addressed our requirements, since its real-time tracking data has a high level of accuracy. The rest of the section is structured as follows. A first sub-section presents the main details about the device and the hand skeletal model. The second, third, and fourth sub-sections explain each of the three pipeline layers that compose the architecture, respectively. Finally, a last sub-section discusses the preliminary tests. Overall, the framework processes the hand gestures to extract information about the trajectories of palm and fingertip movements. This information is treated as graphical objects and subsequently interpreted by means of freehand drawing algorithms. This interpretation provides a univocal classification of each gesture. Note that, in this phase, we are interested in recognizing a set of separated gestures, so we have not introduced mechanisms to distinguish the beginning and end of gestures. Moreover, the issue of disambiguating similar gestures is also left to future extensions of the present work. Despite this, it should be observed that a first level of disambiguation is guaranteed by the same algorithms that perform the freehand drawing recognition, since in that case too the ability of a system to distinguish between two similar shapes (e.g., ellipse and circle) is important.


The first layer, the Data Pre-Processing Layer (DPP-L), takes as input each frame generated by the LEAP motion controller, and extracts the 3D spatial (i.e., (x,y,z)) and temporal (i.e., t) information of the hand model. Subsequently, this 3D spatial information is projected onto its 2D reference planes ((x,y), (x,z), and (y,z)) and the same time (t) is associated to each plane. The second layer, the Feature Extraction and Recognition Layer (FER-L), takes as input the 2D spatial and temporal information of each plane, and applies a freehand drawing recognition algorithm to each of them. The purpose of this algorithm is to provide as a result three shapes (one for each plane) according to a default set of shapes stored within a repository. The combined interpretation of these shapes provides a classification of the hand gesture. The last layer, the Definition and Storage Layer (DS-L), is responsible for the definition and storage of the shapes contained within the repository. Actually, the working of the whole framework starts from this layer, since in a first phase one or more libraries of symbols have to be created. When a user builds a library, each introduced symbol is processed by an approach similar to that used to recognize it, where the three computed shapes are related to each other and stored within the repository as identifying features of the performed gesture.
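Purely as an illustration of this data flow, the following Python sketch mirrors the three layers under the assumption that a stroke has already been reduced to a list of (x, y, z, t) samples for one tracked point; all names (project_stroke, recognize_shape, the dictionary used as repository) are hypothetical and do not belong to the LEAP API, and the shape classifier is left as a stub.

def project_stroke(stroke):
    # DPP-L step: split a 3D stroke [(x, y, z, t), ...] into its three 2D projected
    # strokes on the (x,y), (x,z), and (y,z) planes, keeping the common time t.
    return ([(x, y, t) for (x, y, z, t) in stroke],
            [(x, z, t) for (x, y, z, t) in stroke],
            [(y, z, t) for (x, y, z, t) in stroke])

def recognize_shape(stroke_2d):
    # FER-L step: stub for the freehand drawing recognizer; it should return a shape
    # label (e.g., "circle", "line", "arc") chosen from the default shape repository.
    return "unknown"

def define_gesture(name, stroke, repository):
    # DS-L (editing mode): store the triple of per-plane shapes as the gesture signature.
    repository[name] = tuple(recognize_shape(s) for s in project_stroke(stroke))

def classify_gesture(stroke, repository):
    # FER-L (recognition mode): match the triple of per-plane shapes against the library.
    shapes = tuple(recognize_shape(s) for s in project_stroke(stroke))
    return next((name for name, signature in repository.items() if signature == shapes), None)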

3.1. LEAP Motion Controller

The LEAP motion controller is a device equipped with three IR emitters and two CCD cameras that has been designed to support hand gesture based interfaces with a high accuracy in detecting hand and fingertip positions. Although the raw data is currently not conventionally accessible, the provided hand model incorporates a rich set of information [21]. In particular, each frame is connected to a complex set of data structures and methods, of which we report a main subset:

- Frame Data: FrameID, Timestamp, Hands, Fingers, Tools, and Gestures;

- Hand Data: HandID, Direction, Palm normal, Palm position, Palm velocity, Sphere center, Sphere radius, Translation, Rotation axis, Rotation angle, and Finger IDs;

- Finger and Tool Data: PointableID, Belongs to (hand or tool), Classified as (finger or tool), Length, Width, Direction, Tip position, and Tip velocity;

- Gesture Data: containing spatial and temporal information about a fixed set of hand gestures: circles, swipes, key taps, and screen taps.

The LEAP motion controller employs a right-handed Cartesian coordinate system with origin centered at the top of the device. All distances are computed in millimetres, time in seconds (or microseconds), and angles in radians. The frame data contains quantitative information about the current frame, including how many hands and fingers have been detected. The device also recognizes tools such as pencils or pens, which can be used to interact with applications. The frame data also reports whether a hand movement belongs to a default gesture (e.g., swipes); in fact, the current version of the hand model provides a minimal set of gestures which are not editable or expandable. We have adopted all this information to set up the starting state of our data structure, which is initialized when at least a hand or a tool is detected. The hand data contains the physical characterization of a detected hand. It reports the direction from the palm position toward the fingers (direction), the normal vector to the palm (palm normal), the center position of the palm from the device (palm position), and the rate of change of the palm position (palm velocity). In addition, it provides the center and the radius of a virtual sphere fit to the curvature of the hand (sphere center and sphere radius, respectively). Finally, the hand data also reports the change of position of a hand between the current frame and a specified frame (translation), the axis of rotation derived from the change in orientation of the hand between the current frame and a specified frame (rotation axis), and the angle of rotation around that axis derived from the change in orientation of the hand between the current frame and a specified frame (rotation angle). The finger and tool data reports the length and width of each finger or tool detected (i.e., the object projection on the (x,z) plane). In addition, it contains the direction in which each finger or tool is pointing (direction), the tip position from the device origin (tip position), and the rate of change of the tip position (tip velocity). Hands and related fingers are connected to each other by a simple identification (ID) mechanism. The palm and fingertip tracking data represent the core information of the proposed definition and recognition algorithm. Finally, we did not enable the gesture recognition engine of the device (disabled by default) since we did not need that information.
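As a concrete illustration of how this tracking data can be accessed, the sketch below reads one sample of palm and fingertip positions, assuming the Leap Motion SDK v2 Python bindings (module Leap); the field names follow the API described above, while the function name and the returned tuple format are our own.

import Leap

def read_tracking_sample(controller):
    # Return (timestamp, palm_position, fingertip_positions) for the first detected hand,
    # or None when no hand is in view; tools and error handling are omitted.
    frame = controller.frame()                       # most recent frame from the device
    if frame.hands.is_empty:
        return None
    hand = frame.hands[0]
    palm = (hand.palm_position.x, hand.palm_position.y, hand.palm_position.z)
    fingertips = {finger.id: (finger.tip_position.x,
                              finger.tip_position.y,
                              finger.tip_position.z)
                  for finger in hand.fingers}
    return frame.timestamp, palm, fingertips         # timestamp is expressed in microseconds

controller = Leap.Controller()                       # the controller takes a moment to connect
sample = read_tracking_sample(controller)
if sample is not None:
    print(sample)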

3.2. Data Pre-Processing Layer

The Data Pre-Processing Layer (DPP-L) is composed of two sub-modules: the 4D Data Structure Manager (4D DSM), and the 4D Data Structure Processing (4D DSP). The first sub-module takes as input the stream of frames coming from the device, and seeks information about hands (i.e., palm and fingertips) or tools. When at least a hand or a tool is found, the sub-module creates and initializes our data structure. Hereinafter we will not treat the interaction with a tool, since this case is similar to that of a hand with a single extended finger. The second sub-module takes as input the 3D spatial and temporal information of the palm and fingertips, and projects each of them onto the three related 2D reference planes.


Fig. 2. An example of gesture performed by using a hand and a single finger: (a) its representation within the 3D space (x,y,z) over time (t); (b) its projections on the 2D reference planes (x,y), (x,z), and (y,z) over time (t).

In order to explain the working of the DPP-L, in this section we describe the processing of a gesture performed by a hand with a single extended finger. The generalization of the method is due to the fact that palm and fingers are treated separately during the recognition process, and only in the final step is all the recognition information correlated to interpret the gesture. As shown in Figure 2a, when a user performs the above gesture, the 4D DSM computes the information contained within the set of frames, and supports the definition of the stroke σ:

σ = β[(x, y, z), t] (1)

where β represents a function of the 3D spatial coordinates ((x,y,z), derived from the fingertip position data) over time (t, derived from the fingertip velocity data). In particular, ts and te represent the start and end times of the gesture, respectively. Subsequently, the stroke is supplied to the 4D DSP which, as shown in Figure 2b, derives the three projected strokes according to their reference planes:

σ(x,y) = β1[(x, y), t] (2)

σ(x,z) = β2[(x, z), t] (3)

σ(y,z) = β3[(y, z), t] (4)

where βi (i = 1,...,3) represents a function of the 2D spatial coordinates (planes (x,y), (x,z), and (y,z), respectively) over time (t, the same for each plane). The example just introduced describes the simplest case, in which only the index finger is extended in a stationary pose. Note that, in the case of the whole hand, the framework is initialized when an open hand is centered over the device with all five fingers extended. Moreover, occlusions are not treated here.

3.3. Feature Extraction and Recognition Layer

The Feature Extraction and Recognition Layer (FER-L) is composed of two sub-modules: the Feature Extraction Module (FE-M), and the Hand Gesture Recognition Module (HGR-M). The first sub-module takes as input the three projected strokes associated to the center position of the palm (i.e., a group of three correlated strokes), and to all five fingertips (i.e., five groups, each having three correlated strokes). Note that our notion of hand gesture is based on the tracking data of palm and fingers. Although gesture ambiguity resolution is not a focus of the present paper, preliminary empirical observations have highlighted that the tracking of the palm (in addition to that of the fingers) represents a supplementary feature to univocally define and recognise a gesture. The above sub-module analyses each projected stroke, of each group, to identify the type of shape that has been indirectly “drawn”. In this way, the movement of each fingertip, as well as that of the center of the palm, can be identified by three correlated shapes. The final step is to consider each group of three shapes together with the others to provide a univocal interpretation of the hand gesture. Our freehand drawing recognition algorithm is based on the same feature vector shown in [9] and extended in [3]; below we report the main geometrical measures through which the feature vector has been implemented (a partial code sketch follows the list):

- original measures: convex hull, largest triangle, largest quadrilateral, and enclosing rectangle;

- extended measures: angle ratio, and sketch perimeter.
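To make the nature of these measures concrete, the partial sketch below computes a few related quantities (convex hull area and perimeter, an axis-aligned enclosing rectangle, and the sketch perimeter) for one projected stroke using NumPy and SciPy; the largest inscribed triangle and quadrilateral, and the exact feature ratios of [9] and [3], are omitted, so this is only an approximation of the actual feature vector.

import numpy as np
from scipy.spatial import ConvexHull

def stroke_features(points_2d):
    # points_2d: (N, 2) array with the coordinates of one projected stroke.
    pts = np.asarray(points_2d, dtype=float)
    hull = ConvexHull(pts)
    hull_area = hull.volume            # for a 2D hull, .volume is the enclosed area
    hull_perimeter = hull.area         # ...and .area is the hull perimeter
    # Axis-aligned enclosing rectangle (a simplification of the enclosing rectangle of [9]).
    width, height = pts.max(axis=0) - pts.min(axis=0)
    rect_area = width * height
    # Sketch perimeter: total length of the drawn path.
    sketch_perimeter = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    return np.array([
        hull_area / rect_area if rect_area > 0 else 0.0,                    # area ratio
        hull_perimeter / sketch_perimeter if sketch_perimeter > 0 else 0.0,  # perimeter ratio
        min(width, height) / max(width, height) if max(width, height) > 0 else 0.0,  # aspect ratio
    ])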


Fig. 3. Shape library supporting the freehand drawing definition and recognition algorithm: (a) closed and open shapes, (b) recognition of a stroke, on the plane (x,y), composed of three shapes: a closed shape (1), and two open shapes (2 and 3).

The above feature vector is able to recognize a fixed set of closed and open shapes. Although the format of the current paper does not allow a complete dissertation on the freehand drawing recognition algorithm used, in Figure 3a we report the vectorial representation of the main shapes that it recognizes. Note that the imported feature vector is able to recognize shapes independently of their size, rotation, and style. This means that the proposed framework is robust enough to interpret univocally the same hand gesture performed by users having different “tracing” styles. As shown in Figure 3b, each projected stroke can be composed of more than one shape. This last aspect has been addressed by adopting an algorithm able to detect how many closed and open “regions” compose a single stroke [5]. Subsequently, the same algorithm compares each “region” with the shape library to identify the set of shapes that compose the stroke. From the work presented in [5] we have also inherited SketchML, a structured language to describe, by constructs, the shapes, their features, and their constraints. In this way, we have introduced within the framework a suitable tool to manage each open and closed shape. As a result, SketchML can be seen as the language used to represent each treated hand gesture. When a user performs a hand gesture above the LEAP motion controller, each fingertip and the center of the palm are represented by a set of shapes. These shapes are described by the SketchML language. The second sub-module takes as input these descriptions, and matches them against those previously acquired during the hand gesture editing phase.
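To give an idea of the final matching step performed by the HGR-M, the sketch below compares the shapes recognized for the palm and the five fingertips (six groups, one shape per reference plane) against the signatures stored during the editing phase; the plain tuples used here are only a stand-in for the actual SketchML descriptions, and the point and plane labels are our own.

PLANES = ("xy", "xz", "yz")
POINTS = ("palm", "thumb", "index", "middle", "ring", "pinky")

def gesture_signature(shapes_per_point):
    # shapes_per_point: e.g. {"palm": {"xy": "circle", "xz": "line", "yz": "line"}, ...}
    # Build one nested tuple collecting, for each tracked point, its shape on every plane.
    return tuple(tuple(shapes_per_point[point][plane] for plane in PLANES) for point in POINTS)

def match_gesture(shapes_per_point, library):
    # library: {gesture_name: signature} filled by the Definition and Storage Layer.
    signature = gesture_signature(shapes_per_point)
    return next((name for name, stored in library.items() if stored == signature), None)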

3.4. Definition and Storage Layer

The Definition and Storage Layer (DS-L) includes the Hand Gesture Definition Editor (HGD-E), which allows users to define any hand gesture library. When the framework is in editing mode, a user can perform a hand gesture and a related feature vector is created. The framework allows users to repeat the gesture several times to obtain a more reliable vector. In addition, we have implemented a parameter (i.e., a mathematical norm) to set the accuracy of the framework. This means that when the difference between two consecutive feature vectors is lower than the norm, the hand gesture is accepted and stored within the gesture library.
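A minimal sketch of this acceptance test is shown below; the use of the Euclidean norm (via NumPy) and the threshold value are assumptions, since the text only specifies that a mathematical norm is used.

import numpy as np

def accept_gesture(previous_vector, current_vector, threshold=0.1):
    # The gesture is accepted (and can be stored in the library) when two consecutive
    # repetitions produce feature vectors whose difference, measured here with the
    # Euclidean norm, is below the chosen threshold.
    difference = np.asarray(current_vector) - np.asarray(previous_vector)
    return np.linalg.norm(difference) < threshold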

3.5. Preliminary Tests

In this paper we present an ongoing research work; for this reason, the preliminary tests have been summarized as a set of technical qualitative observations aimed at confirming the usefulness of the proposed approach. First of all, the idea of transforming the 3D spatial and temporal information into a set of 2D projected strokes is simple and effective. The use of a freehand drawing recognition algorithm to classify the projected strokes works properly. We initially tested the framework by performing (on different reference planes) the set of 2D graphical symbols inherited from the works that have provided the feature vector (see [9] and [3]). Subsequently, we performed a set of combined hand gestures (such as that shown in Figure 2a). In both cases the framework has stored in the database a suitable set of SketchML descriptions able to identify the hand gestures univocally. Also in this case the paper format does not allow an exhaustive explanation of the experiments; besides, in this phase we were not interested in designing a practical hand gesture library for a specific application, our intent being to check the working of the method and of the framework. However, the limit of the approach is the set of open and closed shapes, which could be insufficient to recognize each projected stroke.


To overcome this aspect, we are working on a novel freehand gesture recognition algorithm that will continue the work begun in [5].

4. Conclusions and Future Work

The LEAP motion controller is a powerful device for developing hand gesture interfaces. Its physical and technical features make it an ideal tool to interact with any kind of desktop application. The new version of the LEAP motion API equips the device with a hand skeletal model that provides tracking data with a high level of accuracy. Finally, this model allows the device to predict the position of fingers and hands even if they are partially occluded.

This paper describes an ongoing research work to define and recognize hand gestures. The proposed method adopts freehand drawing recognition algorithms to interpret the movements of the center of the palm and of the fingertips. Although the approach is applicable to any hand skeletal model, the one provided by the LEAP motion controller satisfies each technical requirement. Preliminary tests have highlighted that the method identifies hand gestures univocally.

Currently, we are working on different application fields. In the first one, we are adopting the proposed method to interpret the body and arm models provided by other devices (Microsoft Kinect [12] and MYO [16], respectively). In the second one, we are defining novel body and hand models to support Self-Avatars (SAs) [6] and Virtual Gloves (VGs) based applications [18, 19]. Finally, we are exploring the possibility of using multiple devices to obtain a reliable skeletal hand model without prediction mechanisms.

References

[1] C. Alvarado and R. Davis. Sketchread: A multi-domain sketch recognition engine. In Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, UIST ’04, pages 23–32, New York, NY, USA, 2004. ACM.

[2] D. Avola, P. Bottoni, A. Dafinei, and A. Labella. Color-based recognition of gesture-traced 2d symbols. In DMS, pages 5–6, 2011.

[3] D. Avola, P. Bottoni, A. Dafinei, and A. Labella. Fcbd: An agent-based architecture to support sketch recognition interfaces. In DMS, pages 295–300, 2011.

[4] D. Avola, L. Cinque, and G. Placidi. Sketchspore: A sketch based domain separation and recognition system for interactive interfaces. In A. Petrosino, editor, Image Analysis and Processing ICIAP 2013, volume 8157 of Lecture Notes in Computer Science, pages 181–190. Springer Berlin Heidelberg, 2013.

[5] D. Avola, A. Del Buono, G. Gianforme, S. Paolozzi, and R. Wang. Sketchml: a representation language for novel sketch recognition approach. In Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments, PETRA ’09, pages 31:1–31:8, NY, USA, 2009. ACM.

[6] D. Avola, M. Spezialetti, and G. Placidi. Design of an efficient framework for fast prototyping of customized human-computer interfaces and virtual environments for rehabilitation. Comput. Methods Prog. Biomed., 110(3):490–502, June 2013.

[7] S. Berman and H. Stern. Sensors for gesture recognition systems. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 42(3):277–290, May 2012.

[8] G. Casella, V. Deufemia, V. Mascardi, G. Costagliola, and M. Martelli. An agent-based framework for sketched symbol interpretation. J. Vis. Lang. Comput., 19(2):225–257, Apr. 2008.

[9] M. J. Fonseca and J. A. Jorge. Experimental evaluation of an on-line scribble recognizer. Pattern Recognition Letters, 22(12):1311–1319, 2001. Selected Papers from the 11th Portuguese Conference on Pattern Recognition - RECPAD 2000.

[10] T. Hammond and R. Davis. Ladder: A language to describe drawing, display, and editing in sketch recognition. In ACM SIGGRAPH 2006 Courses, SIGGRAPH ’06, New York, NY, USA, 2006. ACM.

[11] L. B. Kara and T. F. Stahovich. An image-based, trainable symbol recognizer for hand-drawn sketches. Computers & Graphics, 29(4):501–517, Aug. 2005.

[12] Kinect. http://www.xbox.com/it-it/kinect, 2014.

[13] LEAP Motion. https://www.leapmotion.com/, 2014.

[14] LEAP Motion API. https://developer.leapmotion.com/, 2014.

[15] T. B. Moeslund, A. Hilton, and V. Kruger. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2):90–126, 2006.

[16] MYO. https://www.thalmic.com/en/myo/, 2014.

[17] I. Oikonomidis, N. Kyriazis, and A. Argyros. Markerless and efficient 26-dof hand pose recovery. In R. Kimmel, R. Klette, and A. Sugimoto, editors, Computer Vision ACCV 2010, volume 6494 of Lecture Notes in Computer Science, pages 744–757. Springer Berlin Heidelberg, 2011.

[18] G. Placidi. A smart virtual glove for the hand telerehabilitation. Comput. Biol. Med., 37(8):1100–1107, Aug. 2007.

[19] G. Placidi, D. Avola, D. Iacoviello, and L. Cinque. Overall design and implementation of the virtual glove. Comput. Biol. Med., 43(11):1927–1940, Nov. 2013.

[20] D. Tang, T.-H. Yu, and T.-K. Kim. Real-time articulated hand pose estimation using semi-supervised transductive regression forests. Comp. Vis., IEEE International Conference on, 0:3224–3231, 2013.

[21] F. Weichert, D. Bachmann, B. Rudak, and D. Fisseler. Analysis of the accuracy and robustness of the leap motion controller. Sensors, 13(5):6380–6393, 2013.

[22] Y. Wu and T. S. Huang. Vision-based gesture recognition: A review. In Proceedings of the International Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction, GW ’99, pages 103–115, London, UK, 1999. Springer-Verlag.
