
Non-Rigid Surface Detection for Gestural Interaction with Applicable Surfaces

Andrew Ziegler and Serge Belongie
Department of Computer Science and Engineering

University of California, San Diego
{aziegler,sjb}@cs.ucsd.edu

Abstract

In this work we present a novel application of non-rigid surface detection to enable gestural interaction with applicable surfaces. This method can add interactivity to traditionally passive media such as books, newspapers, restaurant menus, or anything else printed on paper. We allow a user to interact with these surfaces in a natural manner and present basic gestures based on pointing and touching.

This technique was developed as part of an ongoing effort to create an assisted reading device for the visually impaired. However, it is suited to general applications and can be used as a practical mechanism for interaction with screen-less wearable devices. Our key contributions are a unique application of non-rigid surface detection, a basic gesturing paradigm, and a proof of concept system.

1. Introduction

In recent years the size of portable devices such as smart phones and personal digital assistants has remained essentially unchanged, since these devices rely on screens to provide visual feedback to the user. In addition, there must be some mechanism for the user to provide input. Recently there has been a trend towards using touch screens in these devices, making the screen the sole limiting factor on size.

Screen-less devices provide the ultimate in portability since there is no arbitrary limit on how small they may become. However, there is no obvious method to interact with such devices, nor a way to provide visual feedback to the user. One approach is to replace the screen with a projected user interface and use hand gestures for interaction [14]. This approach requires a surface on which the display may be projected and will not work in brightly lit environments. Another strategy is to abandon displays altogether and rely on the user’s imagination to provide “visual feedback” [7]. This is an interesting concept, but it puts the burden of imagining the display on the user. Neither of these approaches is accessible to the visually impaired, and in this work we develop a novel approach that is useful to the sighted and visually impaired alike.

We suggest using whatever surface the user has available to them as the source of visual feedback as well as the input medium. This could be a newspaper, restaurant menu, book, magazine, or even the nutrition label on a can of soup. We make only the assumption that the surface is isometric with the plane, which is equivalent to the Gaussian curvature being zero everywhere on the surface. Our method allows a user to interact with these surfaces in a natural manner through pointing gestures. In this way the presence of a screen-less device becomes less pronounced to the user, and from his perspective the previously passive surface becomes interactive.

Imagine a visually impaired user sitting in a restaurant touching a menu to have entrée items read aloud, a sighted user noticing an intriguing article in a magazine and using a hand gesture to virtually clip it, or a tourist in a foreign country touching a newspaper headline to have it translated audibly into her own language. Our method provides a mechanism for this type of interaction with these surfaces, which are already ubiquitous in daily life.

In the next section we review related work in more detail. In section 3 we describe our method and the assumptions we make. In section 4 we present a basic gesturing paradigm and describe how metadata can be used to create this interactive experience. Finally, in section 5 we discuss our proof of concept implementation.

2. Related Work

There are many examples of systems that allow the user’s hands or fingertips to act as an input device [1, 7, 11, 12, 14, 15, 21, 25].

In [14] Mistry et al. present a wearable projector-camera system that attempts to make information that is typically confined to paper and screens tangible through hand gestures. In the same vein, we describe a method that allows users to interact with information printed on paper. Unlike their method, ours is designed to work with actively deforming surfaces, and the information is made seamlessly tangible without visual augmentation.


Figure 1. Illustration of a non-rigid reference frame.

In [7] Gustafson et al. introduce the notion of an imaginary surface. They challenge the need for any display at all and propose that a user’s imagination can provide the necessary visual feedback. In their system a user extends his thumb and forefinger to create an “L” shape; the thumb and forefinger represent basis vectors of an imaginary planar surface. With his remaining hand the user can gesture to draw on the surface. User studies showed that this activity becomes less precise as the user’s drawing hand moves farther from the origin of the surface. In our system, users have the benefit of the familiar appearance of printed media to provide visual feedback, and gestures are kept simple with limited reliance on the user’s imagination.

In [25] Zhang presents a system that uses an ordinary piece of paper to allow users to control a computer by pointing. This work is similar to ours, but with the computer monitor as the source of visual feedback. The method relies on quadrangle detection and uses a homography to map the user’s detected fingertip into a rectified fronto-parallel planar reference frame. This technique requires that the paper remain rigid, planar, and rectangular at all times, and that no edge be fully occluded or out of frame. Our method works with arbitrarily shaped surfaces that may be actively deforming and does not require a complete view of the surface for interaction. This is essential when developing a device for the visually impaired, since it is difficult for these users to frame the surface completely within the camera’s view.

3. Non-Rigid Reference Frames

Here we describe the notion of a non-rigid reference frame. When viewing the perspective projection of an actively deforming non-rigid surface, the locations of features can change dramatically. Examples of this include newspapers, books and magazines warping simply from the pressure of a person’s hand holding them comfortably. These surfaces may warp further if a person adjusts his grasp or touches a point on the surface; this is demonstrated in Figure 3. In our system we want to give users a method of gestural interaction with such actively deforming surfaces, and thus we must find a stable reference frame in which to detect gestures. To cope with this we seek a point to point mapping from the perspective projection of a warped surface to a rigid planar reference frame. In effect this mapping creates a stable reference frame in the presence of active deformation. We refer to this construct as a non-rigid reference frame, illustrated in Figure 1.

3.1. Applicable Surfaces

Applicable surfaces are those that are isometric with the plane. These surfaces are commonplace in man-made environments, since any inextensible surface cut from a plane is by definition isometric with the plane. This means anything made from a sheet of paper is naturally modeled as an applicable surface. Thus for our system it is a reasonable assumption that the surfaces with which users will interact are applicable surfaces. We use the isometry to find a point to point mapping from the perspective projection of a warped applicable surface.
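To make the curvature connection concrete (a standard fact from differential geometry, stated here for context): by Gauss’s Theorema Egregium, Gaussian curvature is invariant under isometry, and the plane has zero Gaussian curvature everywhere. A surface isometric with the plane must therefore satisfy, at every point p,

K(p) = κ_1(p) κ_2(p) = 0

where κ_1 and κ_2 are the principal curvatures. At least one principal curvature vanishes at each point, which is why a sheet of paper can bend into developable shapes such as cylinders and cones but cannot be stretched onto, say, a sphere.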

3.2. Non-Rigid Surface Detection

We use non-rigid surface detection to approximate the point to point mapping. Pilet et al. [17] describe a finite element method that gives a good approximation to this mapping, originally intended for graphical augmentation [18].

The method presented in [17] requires a planar reference view of the surface and feature point correspondences with a potentially deformed view of the surface. First, a triangulated 2D mesh M of hexagonally connected vertices is generated to cover the planar reference frame. Using the set C of feature correspondences between the two images, Pilet et al. sought a transformation T_S to warp the undeformed mesh M onto the target image such that the sum of squared distances over the subset G ⊂ C of inlier correspondences is minimized while the deformation is kept as smooth as possible. To accomplish this, the locations of detected corresponding features in the reference view are represented by the barycentric coordinates of the mesh triangles containing them. They proceed to define T_S as a point to point mapping parameterized by the state matrix S = (X, Y), where X and Y are column vectors containing the pixel coordinates of the warped mesh vertices. The mapping is defined to be

T_S(p) = Σ_{i=1}^{3} B_i(p) [x_i, y_i]^T    (1)

where p is a point on the original surface, the B_i(p) are the barycentric coordinates from the reference mesh, and the [x_i, y_i]^T are the pixel coordinates of the containing triangle’s vertices with respect to S.

Figure 2. A user touches the nutrition label on a can of soup. (a) The planar reference mesh. (b) The resulting warped mesh from non-rigid surface detection. (c) The mapped location of the colored marker’s centroid in the planar reference frame.

In our system we seek a point to point mapping from the warped view back to the planar reference view. Given a point contained within the warped mesh, we find the barycentric coordinates of the point with respect to the vertices in the warped mesh and use the known locations of the corresponding vertices in the reference mesh to find the inverse mapping. More precisely, we define an inverse transformation

T_S^{-1}(p) = Σ_{i=1}^{3} B_i(p) [x'_i, y'_i]^T    (2)

where p is a point contained within the warped mesh, the B_i are the barycentric coordinates of the point computed with respect to the warped mesh vertices, and the [x'_i, y'_i]^T are the locations of the corresponding vertices in the reference mesh. Essentially, the warped surface is broken up into small locally rigid reference frames, and T_S^{-1} is a piecewise function mapping any point in the warped view onto its corresponding position in the reference view. Since a mesh is used, as the number of facets increases, T_S^{-1} approaches the precise point to point mapping that we seek.
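As an illustration, the following is a minimal NumPy sketch of the inverse mapping in equation (2): a linear scan over corresponding triangle pairs, with hypothetical array layouts. It is a sketch of the idea, not our Matlab implementation.

import numpy as np

def barycentric(p, tri):
    # Barycentric coordinates of 2D point p with respect to triangle tri (3x2 array).
    a, b, c = tri
    m = np.column_stack((b - a, c - a))   # 2x2 basis spanned by two triangle edges
    l1, l2 = np.linalg.solve(m, p - a)    # solve p = a + l1*(b - a) + l2*(c - a)
    return np.array([1.0 - l1 - l2, l1, l2])

def inverse_map(p, warped_tris, ref_tris):
    # T_S^{-1}: map a point p in the warped view into the planar reference frame.
    # warped_tris, ref_tris: corresponding (n, 3, 2) arrays of triangle vertices.
    for warped, ref in zip(warped_tris, ref_tris):
        bary = barycentric(p, warped)
        if np.all(bary >= -1e-9):         # p lies inside (or on) this warped triangle
            return bary @ ref             # equation (2): sum_i B_i(p) [x'_i, y'_i]^T
    return None                           # p falls outside the warped mesh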

To find T_S^{-1} we must first warp the reference mesh onto the target image; to do this we use the energy minimization technique developed in [17] and [18]. In [17] an energy function ε(S) is defined such that, when it is minimized, the squared distance between corresponding points is minimized and the mesh is warped as smoothly as possible. The function is defined as

ε(S) = λ_D ε_D(S) + ε_C(S)    (3)

where ε_C is the correspondence energy, ε_D is the deformation energy, and λ_D is a constant.

The deformation energy ensures that the mesh is deformed smoothly; more precisely, it should penalize mesh configurations that are not the result of perspective transformations. Thus ε_D is defined to be the sum of the squared second derivatives at each node in the mesh, since this approximates curvature. As shown in [18], the deformation energy can be written using finite differences as

ε_D(S) = (1/2) Σ_{(i,j,k)∈E} [(-x_i + 2x_j - x_k)^2 + (-y_i + 2y_j - y_k)^2]    (4)

where E is the set of all triplets (i, j, k) such that the vertices v_i, v_j, v_k are collinear. As shown in [18], equation 4 can be written in matrix notation, allowing the use of the fast semi-implicit minimization scheme described in [9].
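For concreteness, equation (4) transcribes directly into code. The sketch below assumes the triplet set E has been enumerated once from the hexagonally connected mesh; in practice [18] rewrites the sum in matrix form for the semi-implicit solver rather than looping.

def deformation_energy(x, y, triplets):
    # Equation (4): half the sum of squared finite-difference second derivatives.
    # x, y: vertex coordinate vectors of the warped mesh (the columns of S).
    # triplets: iterable of index triples (i, j, k) of collinear vertices (the set E).
    e = 0.0
    for i, j, k in triplets:
        e += (-x[i] + 2.0 * x[j] - x[k]) ** 2 + (-y[i] + 2.0 * y[j] - y[k]) ** 2
    return 0.5 * e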

The correspondence energy takes into consideration the precise registration of the mesh in the warped view. In [17] it is defined as

ε_C(S) = -Σ_{c∈C} w_c ρ(‖c_1 - T_S(c_0)‖, r)    (5)

where the w_c are weights in the interval [0, 1] and ρ is a robust estimator defined as

ρ(δ, r) = 3(r^2 - δ^2) / (4r^3) if δ < r, and 0 otherwise.    (6)

In each iteration of the minimization the radius of confidence r is decreased, and ρ classifies correspondences outside the radius as outliers. This ensures deterministic termination, which is essential for an interactive system, as opposed to a method like RANSAC [5].
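The estimator and its annealing schedule are easy to state in code. Below is a hedged Python sketch: the inner minimization step is elided, and the halving schedule is illustrative rather than the exact schedule used in [17].

import numpy as np

def rho(delta, r):
    # Equation (6): robust estimator with radius of confidence r.
    return 3.0 * (r * r - delta * delta) / (4.0 * r ** 3) if delta < r else 0.0

def correspondence_energy(correspondences, weights, T_S, r):
    # Equation (5): negative weighted sum of rho over correspondences c = (c0, c1).
    return -sum(w * rho(np.linalg.norm(c1 - T_S(c0)), r)
                for (c0, c1), w in zip(correspondences, weights))

# Deterministic annealing of the confidence radius:
# r = r_initial
# for each outer iteration:
#     take a semi-implicit minimization step on epsilon(S) with the current r
#     r *= 0.5   # correspondences with residual >= r now count as outliers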

3.3. Obtaining the Planar Reference View

To perform non-rigid surface detection as described in [17] and [18], knowledge of the planar reference view is necessary. In this work we assume this planar view is available to us. This is reasonable since in our primary use case a visually impaired user will be interacting with printed media; printed media is typically prepared digitally and thus there is a priori knowledge of the planar view.

There are a number of ways to obtain this planar view even when the planar state is not known a priori; we briefly describe three of them here. When there is no previous knowledge of the planar reference view it is still possible to synthesize one. Using a portable structured light system or structure from motion it is possible to obtain a point cloud representation of the deformed surface, which can then be used to synthesize a planar reference view with methods such as those described in [4, 8, 19]. In the absence of range data, methods such as the ones described in [6, 20, 24] could be used to obtain a planar reference view from monocular images. Finally, in the special case of documents with dense text it is possible to use the texture flow to synthesize the planar view as described in [13].

Figure 3. A tourist points to a location on a city map. (a) The map distorted by the pressure from his finger and the appropriately warped mesh. (b) Despite the distortion, the location of the colored marker’s centroid is correctly located in the planar reference frame.

Figure 4. The high-level flow of our method.

4. Gestures

Our system offers a new method of interaction with commonplace objects such as newspapers, restaurant menus, magazines, and books. For visually impaired users this system can provide access to information that would otherwise be unavailable to them without assistance from other people.

4.1. Fingertip Detection and Tracking

Our method relies on being able to accurately detect fingertips. Many systems have been developed that reliably detect hands and fingertips [1, 3, 10, 11, 12, 15, 22, 25]. In our implementation we take the simple approach of colored markers adhered to the user’s fingernails. We find the centroids of the detected markers in the video frames and use these as the estimated fingertip locations.
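A minimal sketch of such a marker-based detector, assuming OpenCV and an illustrative HSV range for a green marker (the actual thresholds depend on the marker used):

import cv2
import numpy as np

def marker_centroid(frame_bgr, lo=(40, 80, 80), hi=(80, 255, 255)):
    # Estimate the fingertip location as the centroid of a colored marker.
    # lo/hi are illustrative HSV bounds for a green marker.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:                      # marker not visible in this frame
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])   # centroid (x, y)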

4.2. Metadata Registration

After obtaining the planar reference view there is a chance to process the planar view for a particular application. For instance, in a common use case users will be interacting with text, and at this step the planar view would be run through an OCR engine. The results of OCR are typically hierarchical bounding boxes of characters, words, sentences and paragraphs with the detected text attached as metadata. Since this processing is done on the planar reference view, the locations of these bounding boxes can easily be represented in that planar reference frame. As illustrated in Figure 5, the mapped location of the user’s fingertip in the planar reference frame can be used to detect which of these bounding boxes the user is touching. In the case of a visually impaired user or a tourist at a restaurant, the bounding boxes would contain entrée items and the metadata would be the description or the description’s translation. The menu item pointed to by the user can be detected and its associated metadata read aloud to the user.

Figure 5. The pointing gesture: The user holds his finger still for a short time and the fingertip location is mapped into the planar reference frame. The registered bounding shapes are tested to see if any contain the fingertip location. A predefined action is performed for any bounding shapes the user’s fingertip is touching.

There is no restriction on the bounding shapes that can be registered in the planar reference frame, nor on the action to be performed when one is activated. This is analogous to the traditional GUI paradigm of buttons and events.
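In code, this registration step amounts to storing bounding shapes with callbacks in the planar reference frame and hit-testing the mapped fingertip against them. A minimal sketch, with hypothetical names:

from dataclasses import dataclass
from typing import Callable

@dataclass
class Region:
    # An axis-aligned bounding box registered in the planar reference frame.
    x0: float
    y0: float
    x1: float
    y1: float
    action: Callable[[], None]   # e.g. lambda: speak(entree_description)

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

def on_point(regions, fingertip):
    # Fire the action of every registered region containing the fingertip,
    # which is assumed already mapped by T_S^{-1} into the reference frame.
    x, y = fingertip
    for region in regions:
        if region.contains(x, y):
            region.action()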

4.3. Basic Gestures

In our system we have several pieces of information available for developing gestures. At any given time we know how many fingertips are present and their locations in the planar reference frame. In addition, since we capture a video stream, we can store data from previous frames to develop more complex gestures.

In [1] Argyros and Lourakis present sensible design criteria for gestures, which we take into account in our own system. Specifically, they suggest that gestures should be intuitive and unambiguous. This benefits both the user and the system designer, since gestures are kept simple and are easy to detect. In our typical use case the user is expected to be holding the surface with which they are interacting. Thus we designed our basic gestures to require the use of only one hand, though the user is free to use either hand for gesturing at any time.

To avoid ambiguity, our gestures are activated by holding fingers still for a certain amount of time. This is simple to detect, since we merely need to check that the locations of the fingertips stay within a small radius of their positions in prior frames. If their locations do not drift out of this radius for a specific number of consecutive video frames, we detect an action.
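Such a dwell detector takes only a few lines. The sketch below checks drift against the centroid of a sliding window of recent positions; the radius and frame count are illustrative, not the values used in our implementation.

from collections import deque
import math

class DwellDetector:
    # Detect a "hold still" activation: the fingertip stays within a small
    # radius of its recent positions for n_frames consecutive frames.
    def __init__(self, radius=5.0, n_frames=15):
        self.radius = radius
        self.history = deque(maxlen=n_frames)

    def update(self, x, y):
        self.history.append((x, y))
        if len(self.history) < self.history.maxlen:
            return False                 # not enough frames observed yet
        cx = sum(px for px, _ in self.history) / len(self.history)
        cy = sum(py for _, py in self.history) / len(self.history)
        return all(math.hypot(px - cx, py - cy) <= self.radius
                   for px, py in self.history)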

Our most basic and primary gesture is pointing. To perform this gesture the user simply points to a location on the surface; Figures 2 and 3 demonstrate this gesture. When a pointing action is performed, bounding shapes registered in the planar reference frame are tested to see if they contain the location of the fingertip. If a bounding shape does contain the fingertip location, its predefined action is performed; this is analogous to a graphical button and a mouse click in a traditional GUI. If there are many registered bounding shapes, which could be the case with dense text, it would be inefficient to test each one. Since we detect gestures in a planar reference frame, it is possible to use a binary space partition to achieve sub-linear time for this operation.
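A binary space partition is one option; a simpler structure with the same sub-linear flavor is a uniform grid over the reference frame, sketched below. The cell size is arbitrary, and the Region class is reused from the previous sketch.

from collections import defaultdict

class GridIndex:
    # Uniform-grid spatial index over the planar reference frame: a simpler
    # alternative to a BSP that still avoids testing every bounding shape.
    def __init__(self, cell=32):
        self.cell = cell
        self.cells = defaultdict(list)

    def insert(self, region):
        c = self.cell
        for gx in range(int(region.x0 // c), int(region.x1 // c) + 1):
            for gy in range(int(region.y0 // c), int(region.y1 // c) + 1):
                self.cells[(gx, gy)].append(region)  # a region may span several cells

    def query(self, x, y):
        # Test only the regions registered in the cell covering (x, y).
        key = (int(x // self.cell), int(y // self.cell))
        return [r for r in self.cells[key] if r.contains(x, y)]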

Selection is our other basic gesture, and it demonstrates the concept of gesture modes. Since pointing is an extremely natural gesture, we want all our gestures to be derived from it. However, this introduces ambiguity to our system, and to overcome it we introduce gesture modes. To enter another gesture mode the user presents a certain number of fingers and holds them still for a short amount of time. As depicted in Figure 6, a user extends two fingers to enter selection mode and then gestures a bounding box to select a region of the surface. To gesture the bounding box the user indicates diagonal corners of the box with pointing gestures. Once the region is selected it is registered in the planar reference frame. The user can then point to his selection to execute a predefined action, such as saving the selection as an image or activating each registered bounding shape in the selected region. In the case of text this could mean reading aloud all text selected in the region or copying the text to the system clipboard of a portable device. If the user wishes to clear his selection without executing any action on it, he can present two fingers again before another pointing gesture.
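The mode logic reduces to a small state machine. The following simplified sketch assumes dwell detection and fingertip counting happen upstream, and it omits the registration and clearing details described above:

class GestureModes:
    # Simplified two-mode state machine for the gestures described above.
    def __init__(self):
        self.mode = "point"
        self.corners = []

    def on_dwell(self, n_fingers, location):
        # Called whenever a dwell activation fires; returns a completed
        # selection box (x0, y0, x1, y1) or None.
        if n_fingers == 2:               # two held fingers toggle selection mode
            self.mode = "select" if self.mode == "point" else "point"
            self.corners = []
            return None
        if self.mode == "select":
            self.corners.append(location)        # collect diagonal corners
            if len(self.corners) == 2:
                (x0, y0), (x1, y1) = self.corners
                self.mode = "point"
                return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
        return None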

5. Current Implementation

Our current system is implemented in Matlab as a proof of concept, and processing is done off-line on pre-captured videos. The videos are captured using a web camera mounted on goggles worn on the user’s head, so that the camera sees whatever the user’s head is pointing towards. We chose this arrangement with the visually impaired in mind, since pointing the camera at the surface then relies solely on proprioception: the user simply points his head towards the finger being used for gesturing. This is also a natural mounting position for sighted users.

Figure 6. The selection gesture: The user shows two fingers to enter selection mode. Using two pointing gestures the user draws a bounding box to select a region of the surface.

5.1. Paths to Real-time

Our system is intended to be interactive, and we plan to implement a real-time version. We expect the system to run in real time once optimized and compiled to machine code, since we rely solely on the method described in [17], which was shown to run at 10 frames per second on a 2.8 GHz Intel Pentium 4 machine. Currently we use SURF features implemented in Matlab, and feature extraction and descriptor computation take the bulk of our processing time [2]. Ferns features were used in [17] since they are fast to detect, but this speed comes at the cost of off-line processing [16]. We prefer SURF features because there are GPU implementations that run at 30 frames per second and require no off-line processing [23].

5.2. Experiments

Our method is intended to let users interact with actively deforming surfaces, and thus it should remain accurate despite occlusions and deformations of the surface. To evaluate the accuracy of our method, we first obtain the planar reference view of a surface. We then affix a small colored marker to the surface while it is in a planar state and map the marker’s detected centroid into the planar reference frame. Next we warp the surface and occlude large portions of it with the user’s hands. The marker’s centroid is detected in each frame of the captured video and mapped into the planar reference frame, and we record the difference between the mapped marker centroid and the initial mapped location, which would ideally be zero. In practice we have found that, even in the presence of drastic deformation, this difference remains within a few pixels for a 640×480 image. With large occlusions we found similar results as long as the area directly around the marker was not completely occluded.

Our proposed gestures are based on pointing; thus this is precisely the situation that would arise in practice. Since the terminal end of a user’s extended finger would be the location of the marker, there would not be much occlusion in that area. Thus, even if the mesh is registered poorly in the area occluded by the user’s hand, the portion of the mesh at the user’s fingertip will be precisely registered. We designed our gestures with this in mind and are able to cope with imprecise registration, unlike the system in [18], which needs global precision to perform graphical augmentation; this is demonstrated in Figure 7.

Figure 7. Images from the quantitative experiment. (a) The planar reference mesh. (b) The mapped centroid of the marker in the planar reference frame falls in the same grid cell for (c-j). (c, e, g, i) Meshes warped using all the hand-clicked feature points. (d, f, h, j) Meshes warped using the five hand-clicked feature points closest to the marker’s centroid in the warped view.

To quantitatively evaluate the accuracy of fingertip mapping we hand-selected feature points and then sorted them by their distances from the marker’s detected centroid. We ran non-rigid surface detection repeatedly, removing the feature farthest from the centroid before each iteration, and recorded the difference between the current mapping of the centroid and the reference mapping, shown in Figure 8. These results suggest that it is only important to have a few good feature correspondences near the marker. In fact, the accuracy of the mapping can improve when the majority of features are located near the marker; this is evident from the dips in the plots of Figures 8(a) and 8(c). This experiment was also run for varying numbers of triangular mesh elements. When there are many feature points, having more mesh elements improves the accuracy of the mapping, as expected. However, as the number of feature points becomes small, the accuracy of the mapping degrades more quickly when using more mesh elements.

Figure 8. The hand-clicked points depicted in Figure 7(c-j) were sorted by increasing distance from the marker centroid in their respective warped views. Non-rigid surface detection was run repeatedly, removing the farthest feature at the end of each iteration. The detected marker centroid at each iteration was mapped into the planar reference frame and the difference was recorded. The experiment was then repeated for different numbers of mesh elements. (a) The plot corresponding to Figure 7(c-d). (b) The plot corresponding to Figure 7(e-f). (c) The plot corresponding to Figure 7(g-h). (d) The plot corresponding to Figure 7(i-j).

6. Conclusions and Future Work

In the future we plan to implement a real-time system with automatic fingertip detection. Rather than using colored markers on the user’s fingertips, we plan to use a technique such as the one described in [11]. In a companion project we are investigating methods to acquire the planar reference view automatically when a priori knowledge of the surface is not available. This would cover the case of materials with no available digital copy, such as legacy books or restaurant menus.

The method we have presented can enable a new way of interacting with information printed on paper. In addition, it has the potential to give the visually impaired independent access to this information.


7. Acknowledgements

We would like to thank Spencer Johnson for graciously providing us with cameras in the early stages of prototyping. This work was supported by ONR MURI Grant #N00014-08-1-0638 as well as the Amazon AWS in Education program.

References

[1] A. A. Argyros and M. I. A. Lourakis. Vision-based interpretation of hand gestures for remote control of a computer mouse. In Computer Vision in Human-Computer Interaction, pages 40–51. Springer-Verlag, 2006.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110:346–359, June 2008.
[3] N. D. Binh, E. Shuichi, and T. Ejima. Real-time hand tracking and gesture recognition system. In Proceedings of the International Conference on Graphics, Vision and Image Processing (GVIP-05), pages 362–368, 2005.
[4] M. Brown and W. Seales. Document restoration using 3D shape: a general deskewing algorithm for arbitrarily warped documents. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), volume 2, pages 367–374, 2001.
[5] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24:381–395, June 1981.
[6] N. Gumerov, A. Zandifar, R. Duraiswami, and L. S. Davis. Structure of applicable surfaces from single views. In ECCV 2004, pages 482–496, 2004.
[7] S. Gustafson, D. Bierwirth, and P. Baudisch. Imaginary interfaces: spatial interaction with empty hands and without visual feedback. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology (UIST '10), pages 3–12, New York, NY, USA, 2010. ACM.
[8] A. Iketani, T. Sato, S. Ikeda, M. Kanbara, N. Nakajima, and N. Yokoya. Video mosaicing based on structure from motion for distortion-free document digitization. In Proceedings of the 8th Asian Conference on Computer Vision (ACCV '07), Part II, pages 73–84, Berlin, Heidelberg, 2007. Springer-Verlag.
[9] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331, 1988.
[10] T. Kurata, T. Okuma, M. Kourogi, and K. Sakaue. The Hand Mouse: GMM hand-color classification and mean shift tracking. In Proceedings of the IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pages 119–124, 2001.
[11] T. Lee and T. Hollerer. Handy AR: Markerless inspection of augmented reality objects using fingertip tracking. In Proceedings of the 2007 11th IEEE International Symposium on Wearable Computers, pages 1–8, Washington, DC, USA, 2007. IEEE Computer Society.
[12] J. Letessier and F. Berard. Visual tracking of bare fingers for interactive surfaces. In Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology (UIST '04), pages 119–122, New York, NY, USA, 2004. ACM.
[13] J. Liang, D. DeMenthon, and D. Doermann. Flattening curved documents in images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 2, pages 338–345, June 2005.
[14] P. Mistry, P. Maes, and L. Chang. WUW - Wear Ur World: a wearable gestural interface. In CHI EA '09: Extended Abstracts on Human Factors in Computing Systems, pages 4111–4116, New York, NY, USA, 2009. ACM.
[15] K. Oka, Y. Sato, and H. Koike. Real-time fingertip tracking and gesture recognition. IEEE Computer Graphics and Applications, 22(6):64–71, November/December 2002.
[16] M. Ozuysal, M. Calonder, V. Lepetit, and P. Fua. Fast keypoint recognition using random ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
[17] J. Pilet, V. Lepetit, and P. Fua. Real-time nonrigid surface detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 822–828, June 2005.
[18] J. Pilet, V. Lepetit, and P. Fua. Fast non-rigid surface detection, registration and realistic augmentation. International Journal of Computer Vision, 76:109–122, February 2008.
[19] M. Pilu. Undoing page curl distortion using applicable surfaces. In Proceedings of the IEEE Computer Vision and Pattern Recognition Conference, pages 67–72, 2001.
[20] M. Salzmann, F. Moreno-Noguer, V. Lepetit, and P. Fua. Closed-form solution to non-rigid 3D surface registration. In ECCV 2008, pages 581–594, 2008.
[21] D. Schlegel, A. Chen, C. Xiong, J. Delmerico, and J. Corso. AirTouch: Interacting with computer systems at a distance. In Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV 2011), pages 1–8, January 2011.
[22] N. Takao, J. Shi, and S. Baker. Tele-Graffiti: A camera-projector based remote sketching system with hand-based user interface and automatic session summarization. International Journal of Computer Vision, 53:115–133, 2003.
[23] T. B. Terriberry, L. M. French, and J. Helmsen. GPU accelerating speeded-up robust features. In Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '08), pages 355–362, Atlanta, GA, USA, 2008.
[24] A. Varol, M. Salzmann, E. Tola, and P. Fua. Template-free monocular reconstruction of deformable surfaces. In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV 2009), pages 1811–1818, 2009.
[25] Z. Zhang. Visual Panel: Virtual mouse, keyboard and 3D controller with an ordinary piece of paper. In Workshop on Perceptive User Interfaces, pages 1–8. ACM Press, 2001.
