
Midas: A Declarative Multi-Touch Interaction Framework

Christophe Scholliers1, Lode Hoste2, Beat Signer2 and Wolfgang De Meuter1

1 Software Languages Lab, 2 Web & Information Systems Engineering Lab

Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium

{cfscholl,lhoste,bsigner,wdmeuter}@vub.ac.be

ABSTRACT
Over the past few years, multi-touch user interfaces emerged from research prototypes into mass market products. This evolution has been mainly driven by innovative devices such as Apple’s iPhone or Microsoft’s Surface tabletop computer. Unfortunately, there seems to be a lack of software engineering abstractions in existing multi-touch development frameworks. Many multi-touch applications are based on hard-coded procedural low-level event processing. This leads to proprietary solutions with a lack of gesture extensibility and cross-application reusability. We present Midas, a declarative model for the definition and detection of multi-touch gestures where gestures are expressed via logical rules over a set of input facts. We highlight how our rule-based language approach leads to improvements in gesture extensibility and reusability. Last but not least, we introduce JMidas, an instantiation of Midas for the Java programming language, and describe how JMidas has been applied to implement a number of innovative multi-touch gestures.

Author Keywords
multi-touch interaction, gesture framework, rule language, declarative programming

ACM Classification Keywords
D.2.11 Software Engineering: Software Architectures; H.5.2 Information Interfaces and Presentation: User Interfaces

General Terms
Algorithms, Languages

INTRODUCTION
More than 20 years after the original discussion of touchscreen-based interfaces for human-computer interaction [9] and the realisation of the first multi-touch screen at Bell Labs in 1984, multi-touch interfaces have emerged from research prototypes into mass market products.


Commercial solutions, including Apple’s iPhone or Microsoft’s Surface tabletop computer, introduced multi-touch user interfaces to a broader audience. Various manufacturers currently follow these early adopters by offering multi-touch screen-based user interfaces for their latest mobile devices. Multi-touch gestures are not only increasingly used on touch-sensitive screens but also on other input devices such as laptop touchpads. Some multi-touch input solutions are even offered as separate products, as in the case of Apple’s Magic Trackpad1. Furthermore, large multi-touch surfaces, as seen in Single Display Groupware (SDG) solutions [14], provide new forms of copresent interactions.

While multi-touch interfaces offer significant potential for an enhanced user experience, the application developer has to deal with an increased complexity in realising these new types of user interfaces. A major challenge is the recognition of different multi-touch gestures based on continuous input data streams. The intrinsic concurrent behaviour of multi-touch gestures and the scattered information from multiple gestures within a single input stream result in a complex detection process. Even the recognition of simple multi-touch gestures demands a significant amount of work when using traditional programming languages. Furthermore, reasoning over gestures from multiple users significantly increases the complexity. Therefore, we need a clear separation of concerns between the multi-touch application developer and the designer of new multi-touch gestures to be used within these applications. The gesture designer must be supported by a set of software engineering abstractions that go beyond simple low-level input device event handling.

In software engineering, a problem can be divided into its accidental and essential complexity [1]. Accidental complexity relates to the difficulties a programmer faces due to the choice of software engineering tools. It can be reduced by selecting or developing better tools. On the other hand, essential complexity is caused by the characteristics of the problem to be solved and cannot be reduced. While the accidental complexity of today’s mainstream applications is addressed by the use of high-level programming languages such as Java or C#, we have not witnessed the same software engineering support for the development of multi-touch applications. In this paper, we present novel declarative programming language constructs in order to tackle the accidental complexity of developing multi-touch gestures and to enable a developer to focus on the essential complexity.

1 http://www.apple.com/magictrackpad/


We start with a discussion of related work and present the required software engineering abstractions for multi-touch frameworks. We then introduce Midas, our three-layered multi-touch architecture. After describing the implementation of JMidas, a Midas instantiation for the Java programming language, we outline a set of multi-touch application prototypes that have been realised based on JMidas. A critical discussion of the presented approach and future work is followed by some general conclusions.

RELATED WORK
Recently, different multi-touch toolkits and frameworks have been developed in order to help programmers with the detection of gestures from a continuous stream of events produced by multi-touch hardware [7]. The frameworks that we are going to discuss in this section provide some basic software abstractions for recognising a fixed set of traditional multi-touch gestures, but most of them do not support the definition of new application-specific multi-touch gestures.

The Sparsh UI framework [11] is an open source multi-touch library that supports a set of built-in gestures including hold, drag, multi-point drag, zoom, rotate, spin (two fingers hold and one drags) as well as double tap. The implementation of gestures based on hard-coded mathematical expressions limits the reusability of existing gestures. The framework only provides limited support to reason about the history of events. Sparsh UI provides historical data for each finger on the touch-sensitive surface by keeping track of events. As soon as a finger is lifted from the surface, the history related to the specific finger is deleted. This makes it difficult to implement multi-stroke gestures but on the other hand avoids any garbage collection issues. Furthermore, Sparsh UI does not deal with the resolution of conflicting gestures. Overall, Sparsh UI is one of the more complete multi-touch frameworks providing some basic software abstractions but offers limited support for multi-stroke and multi-user gestures.

Multi-touch for Java (MT4j)2 is an open source Java framework for the rapid development of visually rich applications, currently supporting tap, drag, rotate as well as zoom gestures. The architecture and implementation of MT4j are similar to Sparsh UI, with two major differences: MT4j offers the functionality to define priorities among gestures but on the other hand it does not provide historical data. Whenever an event is retrieved via the TUIO protocol [6], multiple subscribed gesture implementations, called processors, try to lock the event for further processing. The idea of the priority mechanism is to assign a numeric value to each gesture. Gesture processors with a lower priority are blocked until processors with a higher priority have tried (and failed) to consume the event. With such an instantaneous priority mechanism, a processor has to decide immediately whether an individual event should be consumed or released. However, many gestures can only be detected after keeping track of multiple events, which limits the usability of the priority mechanism. Finally, the architectural design does not support the reuse of gesture detection functionality.

2 http://mt4j.org

Grafiti [10] is a gesture recognition management framework for interactive tabletop interfaces providing abstractions similar to Sparsh UI. It is written in C# and subscribes to a TUIO input stream for any communication with different hardware devices. An automated mapping of multiple lists to multiple fingers is also not available and there are no constructs to deal with multiple users. Therefore, permutations of multi-touch input have to be performed manually, which is computationally intensive, and any reusability for composite gestures is lacking. Furthermore, static time and space values limit the adaptability to the dynamic environment of multi-touch devices. The framework allows gestures to be registered and unregistered at runtime. In Grafiti, conflict resolution is based on instantaneous reasoning and there is no notion of uncertainty. The offered priority mechanism is similar to the one in MT4j where events are consumed by gestures and are no longer available for gestures with a lower priority.

The libTISCH [2] multi-touch library currently offers a set of fixed gestures including drag, tap, zoom and rotate. The framework maintains the state and history of events that are performed within a widget. This allows the developer to reason about event lists instead of individual events. However, multi-stroke gestures are not supported and an automatic mapping of lists to fingers is not available. Incoming events are linked to the topmost component based on their x and y coordinates. Local gesture detection is associated with a single widget element for the total duration of a gesture. A system-wide gesture detection is further supported via global gestures. Global gestures are acceptable when working on small screens like a phone screen, but these approaches cease to work when multiple users are performing collaborative gestures. The combination of local and global gestures is not supported and gestures outside the boundaries of a widget require complex ad-hoc program code.

Commercial user interface frameworks are also introducing multi-touch software abstractions. The Qt3 cross-platform library provides gestures such as drag, zoom, swipe (in four directions), tap as well as tap-and-hold. To add new gestures, one has to create a new class and inherit from the QGestureRecognizer class. Incoming events are then fed to that class one by one as if it were directly attached to the hardware API. The Microsoft .NET Framework4 also offers multi-touch support since version 4.0. A simple event handler is provided together with traditional gestures such as tap, drag, zoom and rotate. In addition, Microsoft implemented a two-finger tap, a press-and-hold and a two-finger scroll. However, there is no support for implementing more complex customised gestures.

Gestures can also be recognised by comparing specific features of a given input to the features of previously recorded gesture samples. These so-called template-based matching solutions make use of different matching algorithms including Rubine [12], Dynamic Time Warping (DTW), neural networks or hidden Markov models.

3 http://qt.nokia.com
4 http://msdn.microsoft.com/en-us/library/dd940543(VS.85).aspx


Most template-based gesture recognition solutions perform an offline gesture detection, which means that the effect of the user input will only be visible after the complete gesture has been performed. Therefore, template-based approaches are not suitable for a number of multi-touch gestures (e.g. pinching). Some gesture recognition frameworks, such as iGesture [13], support template-based algorithms as well as algorithms relying on a declarative gesture description. However, these solutions currently offer no or only limited support for the continuous online processing of multi-touch gestures.

While the presented frameworks and toolkits provide specific multi-touch gesture recognition functionality that can be used by an application developer, most of them show a lack of flexibility from a software engineering point of view. In the following, we introduce the necessary software engineering abstractions that are going to be addressed by our Midas multi-touch interaction framework.

Modularisation Many existing multi-touch approaches do not modularise the implementation of gestures. Therefore, the implementation of an additional gesture requires a deep knowledge about already implemented gestures. This is a clear violation of the separation of concerns principle, one of the main principles in software engineering, which dictates that different modules of code should have as little overlapping functionality as possible.

Composition It should be possible to easily compose gestures in order to define more complex gestures. For example, a scroll gesture could be implemented by composing two move up gestures.

Event Categorisation When detecting gestures, one of the problems is to categorise the events (e.g. events from a specific finger within the last 500 milliseconds). This event categorisation is usually a cumbersome and error-prone task, especially when timing is involved. Therefore, event categorisation should be offered to the programmer as a service by the underlying system.

GUI-Event Correlation While the previous requirement advocates the preprocessing of events, this requirement ensures the correlation between events and GUI elements. In most of today’s multi-touch frameworks, all events are transferred to the application from a single entry point. The decision about which events correlate to which GUI elements is left to the application developer or enforced by the framework. However, the reasoning about events correlating to specific graphical components should be straightforward.

Temporal and Spatial Operators Extracting meaningful information from a stream of events produced by the multi-touch hardware often involves the use of temporal and spatial operators. Therefore, the underlying framework should offer a set of temporal and spatial operators in order to keep programs concise and understandable. In current multi-touch frameworks, there is no or limited support for such operators, which often leads to complex program code.

MIDAS ARCHITECTURE
The processing of input event streams in human-computer interaction is a complex task that many frameworks address by using event handlers. However, the use of event handlers has proven to violate a range of software engineering principles including composability, scalability and separation of concerns [8]. We propose a rule-based approach with spatio-temporal operators in order to minimise the accidental complexity in dealing with multi-touch interactions.

The Midas architecture consists of the three layers shown in Figure 1. The infrastructure layer contains the hardware bridge and translator components. Information from an input device is extracted by the hardware bridge and transferred to the translator. In order to support different devices, concrete Midas instances can have multiple hardware bridges. The translator component processes the raw input data and produces logical facts which are propagated to the fact base in the Midas core layer. The inference engine evaluates these facts against a set of rules.

[Figure 1 shows the three-layer Midas architecture: the application layer with the GUI, the core layer with the fact base, inference engine, rule base and shadows model, and the infrastructure layer with one or more hardware bridges and a translator.]

Figure 1. Midas architecture

The rules are defined in the Midas application layer but stored in the rule base. When a rule is triggered, it can invoke some application logic and/or generate new facts. Furthermore, GUI elements are accessible from within the reasoning engine via a special shadowing construct.

Infrastructure Layer
The implementation of the infrastructure layer takes care of all the details of addressing the hardware and transforming the low-level input data into logical facts. A fact has a type and a number of attributes. The core fact that every Midas implementation has to support is shown in Listing 1.

Listing 1. Core fact
(Cursor (id ?id) (x ?x) (y ?y) (x-speed ?xs)
        (y-speed ?ys) (time ?t) (state ?s))

This core fact has the type Cursor and represents a single cursor (e.g. a moving finger) from the input device. The attributes id, x, y, x-speed, y-speed and time represent the cursor’s identifier, position, speed and the time at which the cursor moved. The attribute state indicates how the cursor has changed and can be assigned the values APPEAR, MOVE or DISAPPEAR.
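For illustration, the facts asserted by the translator for a single finger that touches the surface and then starts moving might look as follows (the concrete attribute values are hypothetical and only serve as an example):

(Cursor (id 1) (x 120) (y 340) (x-speed 0.0)
        (y-speed 0.0) (time 1000) (state APPEAR))
(Cursor (id 1) (x 112) (y 338) (x-speed -1.6)
        (y-speed -0.4) (time 1040) (state MOVE))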


Core Layer
The Midas core layer consists of an inference engine in combination with a fact base and a rule base, which are described in the following.

Rules
We use rules as an expressive and powerful mechanism to implement gesture recognition. Listing 2 outlines the implementation of a simple rule that prints the location of all cursors. The part that a rule should match in order to be triggered is called its prerequisites (before the ⇒), while the actions to be performed if a rule is triggered are called its consequences.

Listing 2. Example rule
1 (defrule PrintCursor
2   (Cursor (x ?x) (y ?y))
3   =>
4   (printout t "A cursor is moving at: " ?x "," ?y))

The first line shows the definition of a rule with the name PrintCursor. This rule specifies the matching of all facts of type Cursor as indicated on the second line. Upon a match of a concrete fact, the values of the x and y attributes will be bound to the variables ?x and ?y. Subsequently, the rule will trigger its actions (after the ⇒) and print the text “A cursor is moving at:” followed by the x and y coordinates.

Temporal Operators
As timing is very important in the context of gesture recognition, all facts are automatically annotated with timing information. This information can be easily accessed by using the dot operator and selecting the time attribute. Midas defines a set of temporal operators to check the relationship between the timing attribute of different facts. The temporal operators and their definitions are shown in Table 1.

Operator    Args        Definition
tEqual      f1, f2      |f1.time - f2.time| < εt
tMeets      f1, f2      f1.time - f2.time = εtmin
tBefore     f1, f2      f1.time < f2.time
tAfter      f1, f2      f1.time > f2.time
tContains   f1, f2, f3  f2.time < f1.time < f3.time

Table 1. Temporal operators

Note that the tEqual operator is not defined as absolute equality but rather as being within a very small time interval εt. This fuzziness has been introduced since input device events seldom occur at exactly the same time. Similarly, εtmin, the smallest possible time interval, is used to express that f1 happened instantaneously after f2.
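As an illustrative sketch that is not part of the original rule set, the tEqual operator could, for instance, be used to detect two fingers that touch the surface at approximately the same time (the TwoFingerTouch fact and the rule itself are hypothetical):

(defrule TwoFingerTouch
  ; two cursors from different fingers appearing at (almost) the same time
  ?c1 <- (Cursor (id ?id1) (state APPEAR))
  ?c2 <- (Cursor (id ?id2&~?id1) (state APPEAR))
  (tEqual ?c1 ?c2)
  =>
  (assert (TwoFingerTouch)))

The temporal operator simply appears as an additional prerequisite next to the fact patterns, as in the listings later in this section.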

Spatial Operators
In addition to temporal operators, Midas offers the definition of spatial constraints over matching facts as shown in Table 2. These spatial operators expect that the facts they receive have an x and y attribute.

Operator      Args    Definition
sDistance     f1, f2  euclideanDistance(f1, f2)
sNear         f1, f2  sDistance(f1, f2) < εs
sNearLeftOf   f1, f2  εs > (f2.x - f1.x) > 0
sNearRightOf  f1, f2  εs > (f1.x - f2.x) > 0
sInside       f1, f2  φ(f1, f2)

Table 2. Spatial operators

Again, we have a small distance value εs to specify that two facts are very close to each other. This distance is set as a global variable and is adjustable by the developer. Since the input device coordinates are already transformed by the infrastructure layer, the value of εs is independent of a specific input device. For sInside, the fact f2 is expected to have a width and height attribute. The φ function checks whether the x and y coordinates of f1 are within the bounding box from (f2.x, f2.y) to (f2.x + f2.width, f2.y + f2.height). Note that we also support user-defined operators.
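As a hypothetical sketch (not taken from the paper), sNearLeftOf could be used to require that one cursor lies slightly to the left of another cursor:

(defrule CursorJustLeftOf
  ; a cursor that is close to and left of another cursor
  ?c1 <- (Cursor (id ?id1))
  ?c2 <- (Cursor (id ?id2&~?id1))
  (sNearLeftOf ?c1 ?c2)
  =>
  (printout t "cursor " ?id1 " is just left of cursor " ?id2))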

List Operators
The implementation of gestures often requires reasoning over a set of events in combination with temporal and spatial constraints. Therefore, we have introduced the ListOf operator in Midas which enables reasoning over a set of events within a specific time frame. An example of this construct is shown in Listing 3. The prerequisite will match a set of Cursor events that have the same finger id and occurred within 500 milliseconds. Finally, the matching sets are limited to those sets that contain at least 5 Cursors. Note that due to the declarative approach, the developer no longer has to keep track of the state of the cursors in the system and manually group them according to their id.

Listing 3. ListOf construct
?myList <- (ListOf (Cursor (same: id)
                           (within: 500)
                           (min: 5)))

A list matched by the ListOf operator is guaranteed to be time ordered. This is required for the multi-touch interaction domain as one wants to reason about a specific motion along the time axis. By default, rule languages do not imply a deterministic ordering but allow arbitrary pattern matching to cover all possible combinations. The spatial operators from the previous section are also defined over lists. For example, sAllNearBottomOf expects two lists l1 and l2 and the operator will return true only if all the elements of the first list are below all elements of the second list.
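Under the common screen coordinate convention in which the y value grows towards the bottom of the screen, this behaviour can be sketched as

sAllNearBottomOf(l1, l2) ⟺ ∀e1 ∈ l1, ∀e2 ∈ l2 : e1.y > e2.y

where the additional proximity bound in terms of εs that the operator name suggests is omitted from this sketch.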

Movement Operators
The result of the ListOf construct is a list which can be used in combination with movement operators. A movement operator verifies that a certain property holds for an entire list. For example, the movingUp operator is defined as follows:

movingUp(list) ⇐⇒ ∀i, j : i < j ⇒ list[i].y > list[j].y

In a similar way, we have defined the movingDown, movingLeft and movingRight operators.
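By analogy, and assuming that the x coordinate decreases for a leftward motion, movingLeft can be sketched as

movingLeft(list) ⇐⇒ ∀i, j : i < j ⇒ list[i].x > list[j].x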


Application Layer
The application layer consists of a regular program which is augmented with a set of rules in order to describe the gestures. A large part of this program, however, will deal with the GUI and some gestures will only make sense if they are performed on specific GUI objects. As argued in the introduction, the reasoning over GUI objects in combination with gestures should be straightforward. Therefore, in a Midas system the GUI objects are reified as so-called shadow facts in the reasoning engine’s working memory. This implies that the mere existence of a GUI object automatically gives rise to the associated fact.

The fields of a GUI object are automatically transformed into attributes of the shadow fact and can be accessed like any other fact fields. However, a shadow fact differs from regular facts in the sense that the values of its attributes are transparently kept synchronised with the values of the object it represents. This allows us to reason about application-level entities (i.e. graphical objects) inside the rule language. Moreover, from within the reasoning engine the methods of the object can be called in the consequence block of a rule. This is done by accessing the predefined Instance field of a shadow fact followed by the name and arguments of the method to be invoked. Listing 4 shows an example of calling the setColor method with the argument "BLUE" on a circle GUI element.

Listing 4. Method call on a shadow fact instance
(?circle.Instance setColor "BLUE")

Finally, the attributes of a shadow fact can be changed by using the modify construct. When the modify construct is applied to a shadow fact, the changes are automatically reflected in the shadowed object.
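As a small sketch that is not part of the paper’s listings, a rule could drag a shadowed Circle object (cf. Listing 9) along with a moving cursor by modifying its shadow fact, with the changes being propagated to the underlying Java object:

(defrule DragCircle
  ; move the circle to the current cursor position
  (Cursor (x ?x) (y ?y) (state MOVE))
  ?circle <- (Circle)
  =>
  (modify ?circle (x ?x) (y ?y)))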

Priorities
When designing gestures, parts of certain gestures might overlap. For example, the gesture for a single click overlaps with the gesture for a double click. If the priority of the single click gesture were higher than that of the double click gesture, a user would never be able to perform a double click since the double click would always be recognised as two single click gestures. Therefore, it is important to ensure that the developer has means to define priorities between different gestures. In Midas, gestures with a higher priority will always be matched before gestures with a lower priority. An example of how to use priorities in rules is given in Listing 5. The use of priorities allows the programmer to tackle problems with overlapping gestures. The priority concept further increases modularisation since normally no intrusive code is needed in order to separate overlapping gestures.

Listing 5. Priorities
(defrule PrioritisedRule
  (declare (salience 100))
  <prerequisites>
  =>
  <consequence>)

‘Flick Left’ Gesture Example
After introducing the core concepts of the Midas model, we can now explain how these concepts can be combined in order to specify simple gestures. The Flick Left gesture example that we are going to use has been implemented in numerous frameworks and interfaces for photo viewing applications. In those applications, users can move from one photo to the next one by flicking their finger over the photo in a horizontal motion to the left. In the following, we show how the Flick Left gesture can be implemented in a compact way based on Midas.

One possible description of the facts representing a Flick Left gesture is as follows: “an ordered list of cursor events from the same finger within a small time interval where all the events are accelerated to the left”. The implementation of new gestures in Midas mainly consists of translating such descriptions into rules as shown in Listing 6.

Listing 6. Single finger ‘Flick Left’ gesture
(defrule FlickLeft
  ?eventList[] <-
    (ListOf (Cursor (same: id) (within: 500) (min: 5)))
  (movingLeft ?eventList)
  =>
  (assert (FlickLeft (events ?eventList))))

The prerequisites of the rule specify that there should be a list with events generated by the same finger by making use of the ListOf construct. It further defines that all these events should be generated within a time frame of 500 milliseconds and that the list must contain at least 5 elements. The movingLeft prerequisite additionally requires that the matched events describe a leftward motion, and the rule’s consequence asserts a new FlickLeft fact carrying the matched events.
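To illustrate how an application could react to the derived fact, the following hypothetical rule (assuming a shadowed PhotoViewer class with a nextPhoto method, neither of which is part of the paper) advances to the next photo whenever a FlickLeft fact is asserted:

(defrule ShowNextPhoto
  ; invoke application logic on the shadowed GUI object
  (FlickLeft (events ?eventList))
  ?viewer <- (PhotoViewer)
  =>
  (?viewer.Instance nextPhoto))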

Gesture Composition
We have shown that set operators enable the declarative categorisation of events. The Midas framework supports temporal, spatial and motion operators to declaratively specify different gestures. Furthermore, priorities increase the modularisation and shadow facts enable the programmer to reason about their graphical objects in a declarative way.

In the Midas framework, it is common to develop complex gestures by combining multiple basic gestures as there is no difference in reasoning over simple or derived facts. This reusability and composition of gestures is achieved by asserting gesture-specific facts on gesture detection. The reuse and composition of gestures is illustrated in Listing 7, where a Double Flick Left composite gesture is implemented by specifying that there should be two Flick Left gestures at approximately the same time.

Listing 7. ‘Double Flick Left’ gesture
(defrule DoubleFlickLeft
  ?upLeftFlick <- (FlickLeft)
  ?downLeftFlick <- (FlickLeft)
  (sAllNearBottomOf ?downLeftFlick.events
                    ?upLeftFlick.events)
  (tAllEqual ?upLeftFlick ?downLeftFlick)
  =>
  (assert (DoubleFlickLeft)))


The implementation of this new composite gesture in traditional programming languages would require the use of additional timers and threads. By adopting a rule-based approach, one clearly benefits from the fact that we only have to provide a description of the gestures but do not have to be concerned with how to derive the gesture.

JMIDAS PROTOTYPE IMPLEMENTATION
We have implemented a concrete JMidas prototype of the Midas architecture and embedded it in the Java programming language. This enables programmers to define their rules in separate files and to load them from within Java. It also implies that developers can make use of Java objects from within their rules in order to adapt the GUI. We first describe which devices are currently supported by JMidas and then outline how to program these devices based on JMidas.

Input Devices
In order to support a large number of different multi-touch devices, we have decided to implement an abstraction layer currently supported by many multi-touch devices. Two important existing abstraction layers providing a common and portable API for input device handling are the TUIO [6] and TISCH [2] protocols. We chose TUIO as one of the input layers in the JMidas prototype implementation because of its features in terms of portability and performance. This implies that any input device supporting the TUIO protocol can automatically be used in combination with JMidas.

Figure 2. JMidas application on a Stantum SMK 15.4

A first multi-touch device that has been used in combination with the JMidas framework is the Stantum SMK 15.4 shown in Figure 2, a 15.4 inch multi-touch display that can deal with more than ten simultaneous touch points. Unfortunately, the Stantum SMK 15.4 does not offer any TUIO support and we had to implement our own cross-platform TUIO bridge that runs on Linux, Windows and Mac OS X.

Note that the Midas framework is not limited to multi-touch devices. A radically different type of input device supported by the JMidas framework are the Sun SPOTs6 shown in Figure 3. A Sun SPOT is a small wireless sensor device that embeds an accelerometer to measure its orientation or motion.

5 http://www.stantum.com
6 http://sunspotworld.com

Figure 3. JMidas Sun SPOT integration

Previous experiments with the Sun SPOT device have shown that gestural movements can be successfully recognised by using a Dynamic Time Warping (DTW) template algorithm [5]. However, it was rather difficult to express concurrent gestural movements of multiple Sun SPOTs based on traditional programming languages, for reasons similar to those described for the multi-touch domain. These difficulties can be overcome by using the temporal operators offered by JMidas, allowing a developer to reason about 3D hand movements tracked by Sun SPOTs. The JMidas Sun SPOT support is still experimental but we managed to feed the reasoning engine with simple facts for individual Sun SPOTs and the recognition results for composite gestures look promising.
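As a purely hypothetical sketch (the structure of the Sun SPOT facts is not described in the paper), a composite two-handed gesture could be expressed by combining facts from two different Sun SPOTs with a temporal operator, in the same way as the multi-touch gestures above:

(defrule TwoHandedShake
  ; two different Sun SPOTs reporting a shake at (almost) the same time
  ?s1 <- (SunSpotShake (id ?id1))
  ?s2 <- (SunSpotShake (id ?id2&~?id1))
  (tEqual ?s1 ?s2)
  =>
  (assert (TwoHandedShake)))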

Initialisation
The setup of the JMidas engine is a two-step process to be executed in the application layer as shown in Listing 8. First, a new JMidas engine is initialised (line 1), which corresponds to setting up the Midas core layer introduced earlier in Figure 1. The second step consists of connecting one or multiple input sources from the infrastructure layer (lines 2–8). This step also specifies a resolution for the display via the RES_X and RES_Y arguments. After this initialisation phase, the engine can be started and fed with rules. In Listing 8 this is highlighted by calling the start method and loading the rules from a file called rules.md (lines 9–10).

Listing 8. JMidas engine initialisation
 1 JMidasEngine engine = new JMidasEngine();
 2 TuioListener listener =
 3     new TuioToFactListener(engine, RES_X, RES_Y);
 4 TuioClient client = new TuioClient();
 5 client.addTuioListener(listener);
 6 SunSpotListener spListener = new SunSpotListener(engine);
 7 SunSpotClient spClient = new SunSpotClient();
 8 spClient.addSunSpotListener(spListener);
 9 engine.start();
10 engine.loadFile("rules.md");

In JMidas, we use Java annotations to reify objects as logical facts. When a class is annotated as Shadowed, the JMidas framework automatically reifies its instances as logical facts with the class name as type and attributes corresponding to the fields of the object. The field annotation Ignore is used to exclude specific fields from being automatically reified as attributes of the logical fact.


An example of how to use the Java annotation mechanism is given in Listing 9. Since the Circle class is annotated as Shadowed, its instances will be reified by JMidas as logical facts. The Circle class has the three fields x, y and z, whereas the last field has been annotated as Ignore. This implies that the z field will not be reified as an attribute of the logical fact. From within the logical rules, the programmer can match objects of type Circle by using the following expression: (Circle (x ?x) (y ?y)).

Listing 9. Shadowing GUI elements
public @Shadowed class Circle {
  int x;
  int y;
  @Ignore
  int z;
  Circle(int id, int x, int y) { ... }
}

Reasoning Engine
JMidas incorporates an event-driven reasoning engine based on the Rete [3] algorithm. The core reasoning engine used in JMidas is the Jess [4] rule engine for Java, which had to be slightly adapted for our purpose. The Jess reasoning engine has no built-in support for temporal, spatial or motion operations. Moreover, we have extended the engine to deal with ordered list operations. Other reasoning engines could have led to the same result, but we opted for the Jess reasoning engine due to its performance and the fact that it already offers basic support for shadow facts. However, we felt that in the context of our work this integration was not convenient enough and added several annotations to ensure that the synchronisation between facts and classes is handled by the framework.

Implemented Gestures
JMidas implements all of the traditional gestures such as drag, tap, double tap, stretch and rotate. In addition to these standard gestures, we have validated our approach by implementing a set of complex gestures shown in Figure 4. According to the Natural User Interface Group (NUI Group)7, these complex gestures possibly offer a more ergonomic and natural way to interact with multi-touch input devices.

Figure 4. Complex gestures implemented in JMidas

Note that we have not empirically evaluated the claims of the NUI Group since the focus of our work is to prevent the programmer from dealing with the accidental complexity while implementing multi-touch applications. To the best of our knowledge, our complex gestures shown in Figure 4 are currently not implemented by any other existing multi-touch framework.

7 http://nuigroup.com

DISCUSSION AND FUTURE WORK
The implementation of gesture-based interaction based on low-level programming constructs complicates the introduction of new gestures and limits the reuse of common interaction patterns across different applications. Current multi-touch interfaces adopt an event-driven programming model resulting in complex detection code that is hard to maintain and extend. In an event-driven application, the control flow is driven by events and not by the lexical scope of the program code, making it difficult to understand these programs. The developer has to manually store and combine events generated by several event handlers. In Midas, a programmer only has to provide a declarative description of a gesture and the underlying framework takes care of how to match gestures against events.

Current state-of-the-art multi-touch toolkits provide limited software abstractions to categorise events according to specific criteria. Often they only offer information that there is an event with an identifier and a position, or provide a list of events forming part of a single stroke. However, for the implementation of many multi-stroke gestures, it is crucial to categorise these events. The manual event classification complicates the event handling code significantly. With the presented Midas ListOf operator, we can easily extract lists of events with certain properties (e.g. events with the same id or events within a given time frame).

With existing approaches, it is difficult to maintain temporal invariants. For example, when implementing the click and double click gestures, the gesture developer has to use a timer which takes care of triggering the click gesture after a period of time if no successive events triggered the double click gesture. This results in code that is distributed over multiple event handlers and, again, significantly complicates any event handling. We have introduced declarative temporal and spatial operators to deal with this complexity. In contrast to most existing solutions, Midas further supports the flexible composition of new complex gestures by simply using other gestures as rule prerequisites.

Our Midas framework addresses the issues of modularisation (M), composition (C), event categorisation (EC), GUI-event correlation (G) as well as the temporal and spatial operators (TS) mentioned in the related work section. Table 3 summarises these software engineering abstractions and compares Midas with the support of these requirements in current state-of-the-art multi-touch toolkits. This comparison with related work shows that, in contrast to Midas, none of the existing multi-touch solutions addresses all the required software engineering abstractions.



            M    C    EC   G    TS
Sparsh-UI   +/-  +/-  +/-  +/-  -
MT4j        +/-  -    -    +/-  -
Grafiti     +/-  +/-  +/-  +/-  -
libTISCH    +/-  -    +/-  +    -
Qt4         +/-  +/-  -    +/-  -
.NET        -    -    -    -    -
Templates   +    -    -    -    -
Midas       +    +    +    +    +

Table 3. Comparison with existing multi-touch toolkits

As we have explained earlier, there might be conflicts between different gestures, in which case priorities have to be defined to resolve these conflicts. In current approaches, this conflict resolution code is scattered over the code of multiple gestures. This implies that the addition of a new gesture requires a deep knowledge of existing gestures and results in a limited extensibility of existing multi-touch frameworks [7]. In contrast, our Midas framework offers a built-in priority mechanism and frees the developer from having to manually implement this functionality.

In the future, we plan to develop an advanced priority mechanism that can be adapted dynamically. We are currently also investigating the JMidas integration of several new input devices, including Siftables8 and a voice recognition system. Even though these devices significantly differ from multi-touch surfaces, we believe that the core idea of applying a reasoning engine and tightly embedding it in a host language can simplify any gesture recognition development.

8 http://sifteo.com

CONCLUSION
We have presented Midas, a multi-touch gesture interaction framework that introduces a software engineering methodology for the implementation of gestures. While existing multi-touch frameworks support application developers by providing a set of predefined gestures (e.g. pinch or rotate), our Midas framework also enables the implementation of new and composed multi-touch gestures. The JMidas prototype highlights how a declarative rule-based language description of gestures in combination with a host language increases the extensibility and reusability of multi-touch gestures. The presented approach has been validated by implementing a number of standard multi-touch gestures as well as a set of more complex gestures that are currently not supported by any other toolkit. While the essential complexity in dealing with advanced multi-touch interfaces cannot be reduced, we offer the appropriate concepts to limit the accidental complexity. We feel confident that our approach will support the HCI community in implementing and investigating innovative multi-touch gestures that go beyond the current state-of-the-art.

REFERENCES
1. F. P. Brooks, Jr. No Silver Bullet: Essence and Accidents of Software Engineering. IEEE Computer, 20(4):10–19, April 1987.

2. F. Echtler and G. Klinker. A Multitouch Software Architecture. In Proc. of NordiCHI 2008, 5th Nordic Conference on Human-Computer Interaction, pages 463–466, Lund, Sweden, 2008.

3. C. L. Forgy. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence, 19(1):17–37, 1982.

4. E. Friedman-Hill. Jess in Action: Java Rule-Based Systems. Manning Publications, July 2003.

5. L. Hoste. Experiments with the SunSPOT Accelerometer. Project report, Vrije Universiteit Brussel, 2009.

6. M. Kaltenbrunner, T. Bovermann, R. Bencina, and E. Costanza. TUIO: A Protocol for Table-Top Tangible User Interfaces. In Proc. of GW 2005, 6th Intl. Workshop on Gesture in Human-Computer Interaction and Simulation, Ile de Berder, France, May 2005.

7. D. Kammer, M. Keck, G. Freitag, and M. Wacker. Taxonomy and Overview of Multi-touch Frameworks: Architecture, Scope and Features. In Proc. of Workshop on Engineering Patterns for Multi-Touch Interfaces, Berlin, Germany, June 2010.

8. I. Maier, T. Rompf, and M. Odersky. Deprecating the Observer Pattern. Technical Report EPFL-REPORT-148043, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland, 2010.

9. L. H. Nakatani and J. A. Rohrlich. Soft Machines: A Philosophy of User-Computer Interface Design. In Proc. of CHI ’83, ACM Conference on Human Factors in Computing Systems, pages 19–23, Boston, USA, December 1983.

10. A. D. Nardi. Grafiti: Gesture Recognition mAnagement Framework for Interactive Tabletop Interfaces. Master’s thesis, University of Pisa, 2008.

11. P. Ramanahally, S. Gilbert, T. Niedzielski, D. Velazquez, and C. Anagnost. Sparsh UI: A Multi-Touch Framework for Collaboration and Modular Gesture Recognition. In Proc. of WINVR 2009, Conference on Innovative Virtual Reality, pages 1–6, Chalon-sur-Saone, France, February 2009.

12. D. Rubine. Specifying Gestures by Example. In Proc. of ACM SIGGRAPH ’91, 18th Intl. Conference on Computer Graphics and Interactive Techniques, pages 329–337, Las Vegas, USA, August 1991.

13. B. Signer, U. Kurmann, and M. C. Norrie. iGesture: A General Gesture Recognition Framework. In Proc. of ICDAR 2007, 9th Intl. Conference on Document Analysis and Recognition, pages 954–958, Curitiba, Brazil, September 2007.

14. J. Stewart, B. B. Bederson, and A. Druin. Single Display Groupware: A Model for Co-present Collaboration. In Proc. of CHI ’99, ACM Conference on Human Factors in Computing Systems, pages 286–293, Pittsburgh, USA, May 1999.