8/18/2019 Thesis Annexing Reality
1/37
Abstract
Annexing Reality: Enabling Opportunistic Use of Everyday Objects as Tangible
Proxies in Augmented Reality
Anuruddha Lakmal Hettiarachchi
Master of Science
Graduate Department of Computer Science
University of Toronto
2016
Advances in display and tracking technologies hold the promise of increasingly immersive augmented-reality experiences. Unfortunately, the on-demand generation of haptic experiences has lagged behind these advances in other feedback channels. This thesis presents Annexing Reality: a system that opportunistically acquires physical objects from a user's current setting to provide the best-available haptic sensation for virtual content. It allows content creators to specify haptic experiences that adapt to the user's current physical environment. The system continuously scans for and annexes physical objects that are physically similar to each virtual one; it then aligns the virtual model to that object and adjusts the model to reduce visual-haptic mismatch. The developer's experience with the Annexing Reality system and the techniques used in realizing it are described. The results of a developer study validating the usability and utility of this method of defining haptic experiences are also presented.
Acknowledgements
Even though only my name appears on the cover of this document, there are dozens of invisible names behind it; this work is the result of their collective efforts.
First and foremost, I would like to express my sincerest gratitude to my supervisor, Prof. Daniel Wigdor, for guiding me throughout my master's. I am grateful for his enthusiasm, motivation, patience, and support.
I would like to extend my gratitude to Prof. Karan Singh for reviewing this thesis and providing valuable feedback. I am thankful to Prof. Khai Truong for being so kind as to voice over my video submission to CHI 2016. Thank you to Katie Barker for critically reviewing my paper submission to CHI 2016 and keeping me on track during that period. I am indebted to Dr. Bruno De Araujo for his advice and constant feedback on my research. I thank Dr. Ricardo Jota and Dr. Michelle Annette for their advice and support throughout my master's program.
I was fortunate to have such a wonderful group of people in the lab; thank you to my fellow students and all members of the DGP lab. Special thanks to John Hancock; if not for his magical power of fixing things, I would still be stuck with a Vicon system somewhere in the lab.
Last but not least, I thank my parents Kusum and Shantha, brother Pradeep, and my
girlfriend Akila for their love and support, and for always being there for me, in good
and bad times.
Contents
1 Introduction 1
2 Related Work 4
2.1 Visuo-Haptic Mixed Reality . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Opportunistic Tangible Proxies . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Virtual-Real Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Augmented Reality Content Authoring . . . . . . . . . . . . . . . . . . . 6
3 Developer Experience 7
3.1 Annexing Reality Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Opportunistic Annexing of Physical Objects 10
4.1 Object Shape Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.1 Object Cluster Extraction . . . . . . . . . . . . . . . . . . . . . . 11
4.1.2 Collision Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.3 Shape Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.4 Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.5 Virtual-Real Object Similarity . . . . . . . . . . . . . . . . . . . . 12
4.1.6 Voting Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 Object Tracking and Rendering 15
5.1 Initial Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.2 ICP Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.3 Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6 Developer Study 19
6.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.2 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.3 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.5.1 Understanding of Haptic Models . . . . . . . . . . . . . . . . . . . 21
6.5.2 Tool Usability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
7 Limitations 23
8 Conclusion and Future Directions 24
Bibliography 25
Chapter 1. Introduction 2
Figure 1.1: Left: a user's table as seen unaided by augmented reality. Right: the same user's table, as seen through an AR headset, augmented with the Annexing Reality tool. Note that virtual content is placed and scaled to opportunistically take advantage of physical objects in order to provide haptic experiences for virtual ones.
affects the user's feeling of immersion and engagement, and have found that the greater the degree of mismatch, the lower the believability of the experience. This confirms the findings of Kwon et al. [34] on the effect of the shape and size of physical props: object manipulation is more efficient when physical and virtual objects are alike in shape and size. Hence, selecting a physical prop that is as similar as possible to a virtual object is crucial for the user's suspension of disbelief.
This thesis presents Annexing Reality: a system that opportunistically finds physical objects in the user's surroundings that best represent a given set of virtual objects, and then tailors the virtual objects to match the shape and size of the annexed physical ones. In so doing, we are able to minimize the mismatch between the virtual object and the physical prop to which it is mapped. A developer experience is described in which the creator of a virtual object specifies its size and matches its shape to basic primitives, similar to a collision model. Additionally, tolerances for deformation of the object are specified, as are preferences for matching (e.g., it is more important to match a cylinder than to get the size right). The system then uses this information to optimally match virtual and physical objects in the user's environment and performs 3-DOF scaling on the model to improve physical correspondence.
The task of matching virtual and physical objects is posed as a combinatorial optimization in which the set of available physical objects and the given set of virtual objects form the two partite sets of a complete bipartite graph. The Annexing Reality system scans the user's physical environment with a Kinect depth sensor [4], discovers available physical objects, and derives their critical primitive shape and size information. The system casts a vote for each virtual-physical pair based on the shape and size similarity
between the two, creating a complete weighted bipartite graph. The voting algorithm allows for a three-level priority scheme: level 1 defines matching priorities for individual virtual objects, level 2 defines matching priorities for the different primitive parts of each virtual object, and level 3 defines matching priorities for the physical dimensions of each primitive part. Once the graph is generated, the system finds the optimal assignment that maximizes the total vote using the Hungarian algorithm [33]. Finally, virtual objects are overlaid atop matched physical props in real time, allowing the user to physically interact with virtual objects (figure 1.1). The system rescales the virtual objects to fit the matched physical props, further reducing the mismatch.
This thesis presents the following contributions:
1. The Annexing Reality method for parametrically matching and adjusting virtual
to physical objects
2. The developer UI and method for specifying preferences for attributes in matching
and scaling
3. The results of a developer study which demonstrate the effectiveness of the methods
and UI
Chapter 2
Related Work
The Annexing Reality system builds upon four main areas of related research: visuo-haptic mixed reality, opportunistic tangible proxies, the issue of mismatch between the virtual and the real in augmented reality, and AR content authoring.
2.1 Visuo-Haptic Mixed Reality
Milgram et al. [37] defined Mixed Reality systems in which completely real environments are connected to completely virtual environments. Visuo-haptic mixed reality (VHMR) allows the user to both see and touch virtual objects in a co-located manner such that the interaction becomes closer to reality [50]. Several systems have explored this idea of merging visual and kinesthetic
perceptions in the virtual space, especially in medicine and surgery. In their early work, Carlin et al. [17] used VHMR for the treatment of spider phobia, where a subject physically interacted with a virtual spider. Kotranza et al. [32] studied interpersonal touch and social engagement while interacting with a Mixed Reality Human: a mannequin with a virtual human overlay. Volumetric object models, derived from a CT scan, were used in conjunction with a PHANTOM 1.0 haptic feedback device [36] to simulate temporal bone surgery [10, 40]. Among a variety of other areas, VHMR has also been explored
in gaming and product design. Oda et al. [39] have developed a racing game in which
the virtual car is controlled with a passive tangible prop. In the two-player ping-pong
game developed by Knoerlein et al. [31], players can feel the impact of the virtual ball on
the virtual bat with a co-located PHANTOM haptic device. ClonAR [20], developed by Csongei et al., allows product designers to physically edit 3D scans of real-world objects.
2.2 Opportunistic Tangible Proxies
Tangible proxies take advantage of humans' ability to grasp and manipulate physical objects as a means of interacting with digital information [29]. Instead of tightly coupling digital functionality to physical objects, opportunistic tangible proxies adapt to the user's environment and leverage otherwise unused objects as tangible props. In Opportunistic Controls [24], Henderson et al. suggest using physical affordances already present in the domain environment to provide haptic feedback on user inputs in augmented reality. Smarter Objects [25] associates a graphical interface with a physical object in a co-located manner, allowing a user to provide tangible inputs while visualizing their effects. Cheng et al. [18], Corsten et al. [19], and Funk et al. [22] propose repurposing everyday objects as tangible input devices. Cheng et al. place fiducial markers on everyday objects to convert them to input controls, while Corsten et al. and Funk et al. use a Kinect depth sensor for object recognition and tracking. All of these require the user to create physical-digital pairs by programming physical objects. In contrast, this work focuses on letting the system create physical-virtual pairs based on a developer's specifications.
2.3 Virtual-Real Mismatch
Since it is nearly impossible to find an exact physical replica of each virtual object within one's surroundings, there exists an inherent mismatch between the characteristics of virtual objects and their physical props. In their recent work, Simeone et al. [47] found that some amount of mismatch is acceptable, especially where interaction between the user and the object is less likely to happen; however, increased mismatch makes the virtual experience less believable to the user. From a similar study, Kwon et al. [34] concluded that lower disparity results in greater realism and easier manipulation of virtual objects. The Annexing Reality system aims to minimize the mismatch by dynamically finding a physical object in the user's surroundings that is as similar as possible to a given virtual object, and by scaling virtual objects to match the physical ones.
An interesting observation from the pseudo-haptics literature is that, when visual and kinesthetic stimuli are fused, visual stimuli dominate kinesthetic stimuli, resulting in illusory haptic sensations. In BurnAR [51], subjects reported an involuntary heat sensation in their hands when virtual flames and smoke were overlaid onto their hands. In their series of visuo-haptic experiments, Ban et al. modified the curvature
[13], angle of edges [12], and position of edges [14] of virtual shapes, without changing the physical prop onto which those virtual shapes were overlaid. For each virtual shape, the participants reported that they felt they were touching a physical shape similar to the virtual one being displayed. These observations indicate that it is the critical shape that matters, making it plausible to use geometric primitives to represent more complex shapes.
2.4 Augmented Reality Content Authoring
AR content authoring tools provide software frameworks with which developers can quickly create augmented reality applications without worrying about the functionality of underlying components. Early systems [1, 2] supported marker-based virtual-real pairing, where virtual objects were tied to unique image patterns known as fiducial markers. More recent commercial toolkits such as Vuforia [8] and metaio [6] enable the use of physical objects as fiducials. In these systems, however, the developer hard-pairs virtual objects to physical ones at development time, requiring the user to have the exact fiducials specified by the developer in order to use the application. Taking a different approach, the Annexing Reality system scans and dynamically matches available physical objects in the user's physical environment to virtual ones.
Chapter 3
Developer Experience
The process begins with a developer creating a virtual scene, such as in an augmented
reality game. An early goal of this project was developer transparency; that is, enabling
matching and adjusting of models to physical objects without any input from developers.
It soon became clear, however, that while this would lower the workload for the target users, it would also serve to disempower them in defining their intended experiences, as the Annexing Reality tool would, in effect, be making design decisions on the fly. Thus, it was essential to define a developer experience that is simultaneously powerful, easily grasped, and low-effort. To this end, three principles were defined to guide the design of a tool providing that experience:
1. Do not impose any limitations on the properties of the models of the virtual objects
that the user will see.
2. Provide a separation between the visual and physical properties to allow developers
to focus on each individually.
3. Provide flexibility to enable the developer to prioritize any aspect of the physical
properties of the objects to be matched.
3.1 Annexing Reality Tool
The developer creates their virtual content using their preferred tool. Once it is complete, it is imported into the Annexing Reality software, where they specify which virtual objects should be matched to physical ones in the user's setting and which should not. In this
Figure 3.1: The developer UI for Annexing Reality. (a) The primitives of the haptic model, shown in wireframe, for a Champagne bottle object. (b) The developer sets min and max sizes for scaling each primitive. (c) Ranking UI to prioritize matching.
manner, the developer controls which virtual objects are physically manipulated by the
player. Other virtual objects may react to the behavior of player-manipulated objects (e.g., the ball in a ping-pong game where the paddle is physically controlled by the player) or be solely controlled by the developer at development time (e.g., imagery in the background of the world).
For each user-manipulable virtual object, the developer generates a haptic model (e.g., figure 3.1a). This begins with selecting basic geometric primitives, which will be matched to the physical object(s) in the user's surroundings. The shape, size, and position of the model with respect to the virtual object reflect the haptic experience intended by the developer. As an example, consider the virtual model of a Champagne bottle, shown in figure 3.1a. If the developer intends for the player to grab the bottle by the body, she would remove the conic part of the haptic model, and the bottle would be matched to cylindrical physical objects at its body. If her experience would be successful no matter which part of the bottle can be grabbed, she can leave both the cylinder and the cone in the model, and one or the other will be matched, with priority based on the weighting she will later define.
The reduction of the virtual model to a set of primitive shapes is what gives Annexing
Reality its power: the developer is provided with the opportunity to specify the haptic experience they wish to provide, at a level of abstraction that allows, at development time, the tweaking of how perfect a match must be in order to perform the annexation, or to leave the object virtual-only.
The final step is setting the parameters for the matching algorithm (figure 3.1c). This requires the developer to order their virtual objects by priority (for situations where there is an insufficient number of physical objects for matching, or where one physical object equally matches more than one virtual object). Next, the developer prioritizes the primitives within their haptic model (for situations where more than one of those primitives might be matched; for example, with the Champagne bottle, the developer might prefer to match the body to a cylinder, and as a second choice to match the neck to a conic physical object). Finally, the physical dimensions of those primitives are given preferences. Continuing with the Champagne bottle example, the developer might find it more important that the matched cylindrical object have a small enough radius to enable grasping, and less important that its height precisely match that of a bottle; or they might prefer that a match only be made if both radius and height are within a tight tolerance. Figure 3.1b shows how to set maximum and minimum values for each dimension, and figure 3.1c shows how to set priorities within the allowable range.
Once the developer is finished creating haptic models, the system converts and saves this representation in a textual form understandable to the Object Finder module, described in detail below, which is responsible for the runtime matching of virtual and physical objects, parameterized by the properties set by the developer.
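The textual representation itself is not specified in this chapter; the sketch below is a hypothetical illustration (in Python, with JSON chosen arbitrarily as the serialization) of the information such a file would need to carry: the three-level priority scheme plus the per-dimension size limits set in figure 3.1b. All field names and numbers are invented for this example.

```python
# Hypothetical sketch of a serialized haptic model; the actual textual format
# used by Annexing Reality is not described in the text, so every field name
# and value here is illustrative only.
import json

haptic_model = {
    "object": "champagne_bottle",
    "object_priority": 1.0,            # level 1: priority of this virtual object
    "primitives": [
        {   # level 2: the body, the preferred primitive to match
            "shape": "cylinder",
            "shape_priority": 1.0,
            "dimensions": {            # level 3: per-dimension limits and priorities
                "radius": {"min": 0.03, "max": 0.05, "priority": 1.0},
                "height": {"min": 0.15, "max": 0.30, "priority": 0.5},
            },
        },
        {   # the neck, a second-choice match
            "shape": "cone",
            "shape_priority": 0.6,
            "dimensions": {
                "radius": {"min": 0.01, "max": 0.03, "priority": 0.8},
                "height": {"min": 0.05, "max": 0.12, "priority": 0.4},
            },
        },
    ],
}

serialized = json.dumps(haptic_model, indent=2)
```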
Chapter 4
Opportunistic Annexing of Physical
Objects
The haptic model forms the basis for the matching done at runtime. When a virtual object is added to a scene, the Annexing Reality system finds suitable physical objects in the surroundings to serve as props, and overlays the virtual objects onto them. The complete pipeline is illustrated in figure 4.1. The Object Finder module is responsible for finding physical objects that are as similar as possible to the virtual objects specified by the developer. It scans the environment with the Kinect depth sensor, detects available physical objects, derives their salient primitive shape and size, and matches virtual objects to them based on the input from the Haptic Model Generator.
Figure 4.1: A dataflow diagram for the Annexing Reality system. Input to the system comes from the Haptic Model Generator tool (generated by the developer), and from the Frame Grabber module, which reads from the Kinect sensor.
4.1 Object Shape Recognition
The main function of the Object Shape Recognizer is to identify physical objects present in the Kinect's field of view. It continuously scans the player's physical environment, detects horizontal planar surfaces (e.g., tables, floors), extracts objects lying on them, and then estimates each object's primitive shape, size, and relative orientation.
4.1.1 Object Cluster Extraction
The Annexing Reality system only matches physical objects that are supported by horizontal planar surfaces (see Limitations). As the first step, it detects and segments the dominant plane in the point cloud generated from the Kinect's depth map. It then computes the 2D convex hull of the segmented plane. Next, a polygonal prism is created with the 2D convex hull as its base, which encapsulates the physical objects lying on the plane. Finally, it separates point clusters belonging to different objects using the Euclidean Cluster Extraction method [42]. To improve the accuracy of the subsequent shape-detection phase, the Annexing Reality system reconstructs object models by smoothing and resampling the point clouds using a Moving Least Squares method [11] (figure 5.1a).
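The clustering step above can be illustrated with a minimal, pure-Python sketch of the idea behind Euclidean Cluster Extraction: points chained together by neighbors closer than a tolerance form one cluster. The actual system uses the PCL implementation [42], which relies on a k-d tree rather than this brute-force neighbor search.

```python
import math
from collections import deque

def euclidean_clusters(points, tolerance):
    """Group points into clusters: two points belong to the same cluster if
    they are connected by a chain of neighbors closer than `tolerance`.
    This is only the core idea; a real implementation would use a k-d tree
    instead of this O(n^2) neighbor search."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            for j in list(unvisited):
                if math.dist(points[i], points[j]) <= tolerance:
                    unvisited.remove(j)
                    queue.append(j)
                    cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters

# Two well-separated objects on a table (coordinates in meters, invented):
pts = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.0, 0.0), (0.51, 0.01, 0.0)]
clusters = euclidean_clusters(pts, tolerance=0.05)
```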
4.1.2 Collision Detection
The Object Finder may not be able to assign a physical prop to each virtual object, for two reasons: 1) there may be fewer physical objects available in the player's setting than virtual objects, in which case virtual objects with lower matching priority may not get physical props; 2) physical objects of certain shapes may not be found in the environment. Therefore, the Object Finder continuously matches physical objects to virtual ones. As a result, when a new physical object appears in the Kinect's field of view (for instance, because the player brought a new object into the scene, or moved to look at a part of the scene that was previously outside the Kinect's field of view), the entire object assignment could change, causing some virtual objects to switch between physical objects. This is avoided by detecting collisions and excluding physical objects that are already assigned to a virtual object from the matching phase.
For each extracted object point cluster, the axis-aligned bounding box is computed. The Object Finder then iterates through the Pose Trackers assigned to each virtual object (see
Pose Tracking and Rendering) to check whether any of them is currently tracking a physical object whose centroid is inside the bounding box. If so, the point cluster belongs to a physical object that is already assigned to a virtual object, and that cluster is excluded from the matching pipeline.
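A minimal sketch of this exclusion test, assuming simple point tuples rather than the system's actual point-cloud types:

```python
def aabb(points):
    """Axis-aligned bounding box of a point cluster: (mins, maxs)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def contains(box, point):
    """True when the point lies inside the box on every axis."""
    lo, hi = box
    return all(l <= c <= h for l, c, h in zip(lo, point, hi))

def unassigned_clusters(clusters, tracked_centroids):
    """Drop any cluster whose bounding box contains the centroid of a
    physical object already being tracked (i.e., already annexed)."""
    keep = []
    for cluster in clusters:
        box = aabb(cluster)
        if not any(contains(box, c) for c in tracked_centroids):
            keep.append(cluster)
    return keep

# One tracked object near the origin: its cluster is excluded, the other kept.
a = [(-0.1, -0.1, -0.1), (0.1, 0.1, 0.1)]
b = [(0.9, 0.9, 0.9), (1.1, 1.1, 1.1)]
remaining = unassigned_clusters([a, b], [(0.0, 0.0, 0.0)])
```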
4.1.3 Shape Recognition
A RANSAC-based shape detection method developed by Schnabel et al. [46] is used to recognize the dominant primitive shape (sphere, cylinder, cone, or torus) of each point cluster. Due to the noisy, partial point clouds received from the depth sensor, erroneous shape detections frequently occur. The results are verified by comparing the physical dimensions of the shapes to preset size limits; these limits were empirically tuned to generate the most accurate results.
4.1.4 Voting
Once the salient primitive shape and size of all point clusters are determined, each virtual object with no previously allotted physical prop votes for point clusters based on their shape and size similarity. The purpose of voting is to determine the optimal assignment of physical objects, in which all virtual objects get reasonably similar physical props. The system applies the priority scheme set by the developer to weight the votes.
4.1.5 Virtual-Real Object Similarity
Each virtual object is compared with each detected physical object. If the physical object does not contain the primitive(s) of the virtual object's haptic model, the two objects are considered dissimilar and no further processing is done on the pair. If one or more primitives are found within the physical object, the scale of each primitive is compared to its size in the haptic model. A vote-reduction method is used to determine the vote: instead of calculating the size similarity, the Voter module calculates the opposite, the size mismatch. For a given dimension h (e.g., height, radius, or angle), the size-mismatch term ∆H between a virtual object v and a physical object p is calculated as

∆H = min(h_v, h_p) / max(h_v, h_p)    (4.1)
This ratio is taken instead of the difference in order to eliminate the influence of the unit of measurement on the size-mismatch value.
4.1.6 Voting Equation
For a given virtual-physical object pair, the Voter iterates through all the primitive shapes of the virtual object defined by the developer, in order of priority, to determine the shared primitive shape with the highest priority value. If the search fails (there are no shared primitives), it assigns 0 as the vote given by the virtual object to the physical object. If a shared primitive is found, it starts the vote-reduction process by first assigning the highest possible vote (a constant value C) and then multiplying that value by the virtual object priority w_o and the primitive shape priority w_s, each being a value between 0 and 1. To obtain the final vote x_vp, the Voter multiplies the already reduced vote by the sum of the weighted dimensional mismatch values (D denotes the set of all dimensions for a given shape, while w_h denotes the priority given to dimension h). The complete voting equation (the vote cast by virtual object v on physical object p) can be expressed as

x_vp = C × w_o × w_s × Σ_{h∈D} [ (min(h_v, h_p) / max(h_v, h_p)) × w_h ]    (4.2)
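A small sketch of equation 4.2 in Python. The data layout and the example numbers (a Champagne-bottle model voting on a detected can-like cylinder) are invented for illustration; note that, as the equation is written, the vote can exceed C when a shape has several dimensions, since the weighted ratios are summed.

```python
def vote(virtual, physical, C=100.0):
    """Vote cast by a virtual object on a physical one, per equation 4.2:
    find the shared primitive with the highest priority, then reduce the
    maximum vote C by the object priority w_o, the shape priority w_s, and
    the priority-weighted size ratios min(h_v, h_p)/max(h_v, h_p) summed
    over the shape's dimensions. The detected physical primitive is assumed
    to report every dimension the haptic model asks about."""
    for shape, w_s, dims in virtual["primitives"]:   # ordered by priority
        if shape in physical["shapes"]:
            detected = physical["shapes"][shape]
            ratio_sum = sum(
                w_h * min(h_v, detected[name]) / max(h_v, detected[name])
                for name, (h_v, w_h) in dims.items()
            )
            return C * virtual["w_o"] * w_s * ratio_sum
    return 0.0  # no shared primitive: the pair is dissimilar

# Illustrative haptic model: primitives as (shape, w_s, {dim: (h_v, w_h)}).
bottle = {
    "w_o": 1.0,
    "primitives": [
        ("cylinder", 1.0, {"radius": (0.04, 1.0), "height": (0.25, 0.5)}),
        ("cone",     0.6, {"radius": (0.02, 0.8), "height": (0.10, 0.4)}),
    ],
}
can = {"shapes": {"cylinder": {"radius": 0.033, "height": 0.115}}}
x = vote(bottle, can)
```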
4.2 Matching
Once the voting is complete for each virtual-physical pair, the final step is to match physical to virtual objects. The goal is to maximize satisfaction with the match, based on the score provided by the voting. As can be seen in figure 4.2, some virtual objects, such as v1, derive some degree of satisfaction from more than one match (55% satisfaction with physical object p1, and 34% satisfaction with p2). Other objects, such as v2, require a particular physical object in order to be satisfied; in this case, 65% satisfaction with p3. In many instances, virtual objects may not get their first choice; in the scene described in figure 4.2, v1 will be assigned to p2, so that v3 can be assigned to p1, thus maximizing the overall match score.
This matching task is formulated as a combinatorial optimization problem. The voting results in a weighted bipartite graph G = (V, P; X), with the set of given virtual objects V and the set of candidate physical objects P as the two partitions of G, and each edge
Figure 4.2: A weighted bipartite graph representing votes cast by virtual objects (V) on physical objects (P). A dummy node (d1) is added to make the graph symmetric.
in X having a nonnegative weight (the vote given by the virtual object on the physical object connected by the edge). Dummy vertices (dummy virtual objects if n(V) < n(P), or dummy physical objects otherwise) and dummy edges with weight 0 are added to convert G into a symmetric weighted bipartite graph (figure 4.2).
A matching M is a subset of the edge set X such that each vertex is incident upon at most one edge in M. The task here is to find a perfect matching M* (a matching in which every vertex in G is incident upon one and only one edge) such that the total weight of M* is maximized. An open-source C++ implementation [3] of the Hungarian algorithm (Kuhn-Munkres method) [33] is used to solve this assignment problem. Once matching is complete, all point clouds are saved to be used as source models for object tracking.
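The assignment step can be illustrated as follows. For readability, this sketch maximizes the total vote by exhaustive search over permutations rather than the O(n^3) Hungarian algorithm the system actually uses, and the vote cast by v3 on p1 (not stated in the text) is invented so that the figure 4.2 scenario plays out.

```python
from itertools import permutations

def best_assignment(votes):
    """Maximum-total-weight perfect matching on an already-padded, square
    vote matrix votes[v][p]. Exhaustive search is a readable stand-in for
    the Hungarian (Kuhn-Munkres) algorithm, usable only for small n."""
    n = len(votes)
    best, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(votes[v][perm[v]] for v in range(n))
        if total > best_total:
            best, best_total = perm, total
    return list(best), best_total

# Votes echoing figure 4.2: v1 mildly prefers p1, but giving p1 to v3 and
# p2 to v1 maximizes the overall score. v3's vote of 52 is illustrative.
votes = [
    [55, 34, 0],   # v1
    [0, 0, 65],    # v2
    [52, 0, 0],    # v3
]
assignment, total = best_assignment(votes)
```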
Chapter 5
Object Tracking and Rendering
Once matched, the Annexing Reality system overlays virtual objects onto physical props in real time, allowing the user to interact with them. The Pose Tracker is responsible for tracking matched physical objects in the Kinect's field of view, while the Renderer renders virtual objects in place so that the user sees them atop physical props through the head-mounted display (HMD). Each virtual object has a concurrent object-tracking pipeline with the following three threads: InitialAligner, TargetVolumeSegmenter, and IcpTracker. InitialAligner estimates the poses of physical objects in the Kinect's field of view (see figure 5.1b). TargetVolumeSegmenter segments the cubic volume of the point cloud that contains the physical object as estimated by the InitialAligner (see figure 5.1c). IcpTracker takes a rough pose estimate as input and fine-tunes it by incrementally aligning the source model (the saved point cloud of the matched object being tracked) to the object in the cubic volume (see figure 5.1d).
5.1 Initial Alignment
Given a source model, the InitialAligner searches for the corresponding physical object in the Kinect's field of view and roughly estimates its pose. This rough pose serves as the initial guess for the IcpTracker. As the first step of initial alignment, key points of both the source model and the scene point cloud are obtained by simply down-sampling each uniformly. Fast Point Feature Histogram (FPFH) [43] descriptors are then calculated at the key points of both point clouds. FPFH captures the local geometric features of a query point by computing the relationship between the points themselves and the normal vectors of the
Figure 5.1: Object pose tracking pipeline, after the Object Shape Recognizer has identified an object for tracking: (a) reconstructed point cloud model of an object to be tracked; (b) InitialAligner performs a rough pose estimation to locate the object in the field of view (red); (c) a cubic volume of the scene's point cloud (yellow) is segmented such that it encapsulates the object; (d) IcpTracker computes the accurate pose (green) by incrementally aligning the reconstructed model to the point cloud inside the cubic volume.
query point and its neighbors. All points within a radius of 2.5 cm of a key point are used as its neighbors. This radius value was empirically determined to capture a sufficient neighborhood while remaining local.
A RANSAC-based pose estimation algorithm [16] is used to estimate a rough transformation between the source model and the model to be aligned, using the previously computed FPFH descriptors. This method accelerates the computation by eliminating poses that are likely to be wrong. These bad poses are detected by comparing the ratios between edge lengths of virtual polygons formed on the source and the target (to-be-aligned) point cloud models. A similarity threshold of 90% is used for eliminating bad poses.
For each candidate pose, the algorithm tries to maximize the number of inliers. If the inlier fraction (the ratio of inliers detected to the total number of points in the source point cloud) is greater than a preset threshold, that pose is marked as good. An empirically determined inlier threshold of 70% is used, found to be the optimal percentage for including a sufficient number of inliers while accounting for noise, clutter, and occlusions. The Euclidean error score (sum of squared distance errors) between the source and the aligned point clouds is then calculated; if the error is less than a preset threshold (0.0003), that pose is considered a good initial guess.
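The two acceptance criteria above can be combined into a single check (a minimal sketch; the function name and signature are invented, with the thresholds taken from the text):

```python
def accept_pose(num_inliers, source_size, sq_error_sum,
                inlier_threshold=0.7, error_threshold=0.0003):
    """Accept a candidate pose only if the inlier fraction (inliers over
    total source points) exceeds 70% and the Euclidean error score (sum of
    squared distances between source and aligned clouds) is below 0.0003."""
    inlier_fraction = num_inliers / source_size
    return inlier_fraction > inlier_threshold and sq_error_sum < error_threshold

accept_pose(80, 100, 0.0001)   # 80% inliers, low error: accepted
accept_pose(60, 100, 0.0001)   # inlier fraction too low: rejected
```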
5.2 ICP Tracking
IcpTracker refines the initial registration by incrementally aligning the source point cloud to the target in order to obtain an accurate pose estimate. The Iterative Closest Point (ICP) algorithm [15] is used for this incremental alignment. For a given frame, the source point cloud fed to the IcpTracker is its own output from the previous frame, except in two cases: 1) at the beginning of pose tracking, when IcpTracker has no previous output, and 2) when the tracker has lost tracking. In these two cases, the IcpTracker fetches a source point cloud from the InitialAligner.
As a preliminary step, a cubic volume of the scene point cloud is segmented such that it encapsulates the point cloud of the target object (see figure 5.1c). The TargetVolumeSegmenter takes as input the source model dimensions from the Object Finder and the position of the object being tracked from IcpTracker (or from InitialAligner in the two cases described above). It then extracts a cubic volume 120% of the size of the target object.
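The segmentation step amounts to an axis-aligned crop of the scene cloud; a minimal sketch (the function name and per-axis box shape are illustrative assumptions):

```python
def segment_target_volume(scene_cloud, center, model_dims, scale=1.2):
    """Crop the scene point cloud to an axis-aligned box centred on the
    tracked object's last known position, with each side at 120% of the
    corresponding source-model dimension."""
    half = [d * scale / 2.0 for d in model_dims]
    return [p for p in scene_cloud
            if all(abs(p[i] - center[i]) <= half[i] for i in range(3))]

# keep only points near a 7cm x 5cm x 5cm object at the origin
scene = [(0.0, 0.0, 0.0), (0.04, 0.0, 0.0), (1.0, 1.0, 1.0)]
cropped = segment_target_volume(scene, (0.0, 0.0, 0.0), (0.07, 0.05, 0.05))
```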
The IcpTracker then incrementally aligns the source point cloud to the target inside the cubic volume, in up to a maximum of 20 iterations. The alignment finishes when either of the following two conditions is met: 1) the maximum number of iterations is reached, or 2) the expected alignment fitness is achieved (a Euclidean error score less than 0.0001). If it fails to find a satisfactory alignment for the current frame, it rolls back to its previous result and reattempts on the next frame. If the IcpTracker fails for 5 consecutive frames, it restarts tracking with a new initial pose received from the InitialAligner.
5.3 Rendering
The Renderer retrieves the poses of physical objects from the IcpTracker, derives the poses of their primitives, removes jitter, scales and positions virtual objects in the scene, and renders a stereogram of the virtual scene on the HMD.
The poses of physical props received from the Pose Tracker are relative to the Kinect. However, the Renderer renders the corresponding virtual objects from the HMD's viewpoint. Thus, all poses are transformed into the HMD's coordinate frame using a precisely constructed model of the Kinect+HMD setup, including the accurate field of view and aspect ratio of the display.
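This frame change amounts to applying a fixed homogeneous transform to every pose; a minimal sketch (the extrinsic matrix below is a made-up example, not the calibrated Kinect+HMD values):

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][c] * v[c] for c in range(4)) for r in range(3))

# Hypothetical extrinsic from the Kinect frame to the HMD frame:
# identity rotation plus a 5 cm offset along the z axis.
T_HMD_FROM_KINECT = [
    [1.0, 0.0, 0.0, 0.00],
    [0.0, 1.0, 0.0, 0.00],
    [0.0, 0.0, 1.0, 0.05],
    [0.0, 0.0, 0.0, 1.00],
]

p_kinect = (0.1, 0.2, 1.0)   # prop position as measured by the Kinect
p_hmd = transform_point(T_HMD_FROM_KINECT, p_kinect)
```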
The virtual objects are then scaled and positioned such that the parts corresponding to matched primitive shapes of the virtual and physical objects are correctly aligned (figure 1.1, figure 6.1). This is accomplished for each virtual object by first creating a dummy object with the same pose as the virtual primitive shape (the collision model), then making that object the parent of the virtual object, and finally transforming that parent object such that the primitive-shaped parts of the virtual and physical objects overlap.
Chapter 6
Developer Study
A study was conducted to evaluate the usability and utility of the Annexing Reality system for developers. The goals were twofold. The first was to understand the utility of the Annexing Reality technique: is the set of information required of the developer to produce each haptic model—primitive shape matching, scaling, and preference information—useful in producing matches for the objects which developers expect? The second goal was to study the usability of the tool for building those models: would developers be able to successfully understand and use the UI, a set of sliders and text boxes, along with a 3D modeling interface (see figure 3.1), to define their desired haptic model?
Specifically, the following research questions were to be answered:
1. Are developers able to express their design intent as a Haptic Model: a set of primitive shapes, priorities, and size parameters?
2. How usable is the Annexing Reality software tool for building that haptic model?
6.1 Participants
Eight professional game developers and 3D modelers were recruited, all male, aged 27-48 years, to act as expert assessors of our tool. All of them had both game development and 3D modeling experience. All had experience with at least one augmented reality application, and three of them had developed augmented reality applications. Each participant received $50 per one-hour session.
6.2 Apparatus
Seven royalty-free 3D models found online were used as virtual objects, which were to
be matched to one or more of eight physical props (see figure 6.1). The participants
used a Windows PC with a 24 inch monitor. At several points during the study, they
wore a helmet-mounted Kinect with a Meta 1 head-mounted display [5] while testing and
interacting with physical props (see figure 1.1).
6.3 Task
The task was performed 4 times per participant: twice with simple virtual models which matched a single primitive (e.g. a cylinder for a lightsaber's handle), and then twice with more complex models which contained multiple primitive shapes (e.g. a lamp with a spherical body and a cylindrical shade; see figure 6.1). Each task was divided into
two parts. First, the participant was shown one of the virtual models, and asked to
imagine the sorts of physical objects which they might want to match the object to.
They were then asked to use the Annexing Reality tool to construct the haptic model,
including primitive shape, prioritizations, and size characteristics to enable matching.
Once this was complete, participants were given the set of physical props and asked to sort them into two groups: those which they expected to match their representation, and those they did not. They were then asked to sort the props in the "will match" group into their expected prioritization. Once each task was complete, participants were given the
opportunity to don the HMD to observe the effects of their matches.
6.4 Procedure
Assessors completed a demographic questionnaire prior to the study. They were given
the background to the project and a detailed explanation of the task, followed by a
demonstration. Then, the participants completed the task four times: twice for basic,
single-primitive physical objects, and then twice for more complex physical objects.
During each trial, the time required to complete the haptic model was measured. The
frequency and type of errors participants made were also recorded. After finishing all
trials, the participants completed a post-study questionnaire. They rated their impression of the tool's learnability, efficiency, and satisfaction using a 5-point Likert scale.
In addition, participants gave their expert opinion on utilizing geometric primitives to
represent complex shapes and using a priority scheme to control matching.
6.5 Results
Data for 32 trials was collected. After the short demo, participants were quickly able
to learn the tool and generate haptic models for the virtual objects given. On average,
they took 2 minutes and 5 seconds to complete a single-primitive haptic model and 3 minutes and 18 seconds to complete a two-primitive model. Participants made some errors, on average 0.53 per trial. Common errors included accidentally modifying other haptic models instead of the intended one, forgetting to set certain priority values and minimum/maximum scaling limits, and placing primitives in incorrect orientations.
Figure 6.1: Top: the visual models used in the developer studies, overlaid with primitives from possible haptic models. Bottom: various visual models matched to physical objects based on the haptic models' match for physical characteristics.
6.5.1 Understanding of Haptic Models
Participants were successful in using haptic models to define which physical objects should
be matched to each virtual model. In all 32 trials, they successfully matched their group
of hand-picked physical objects to those selected by the system. Within this hand-picked
group, participants were asked to rank which object would be selected first, second, third,
etc. Interestingly, in only 19 of the 32 trials did participants successfully identify which object would be chosen first. Of the remaining trials, their second-ranked object was chosen 10 times, and their third-ranked object 3 times. In all of these 13 trials, a multi-primitive haptic model included nearly equal matching priorities for both primitive shapes. This suggests that a forced-rank system might provide a better UI to aid developers in understanding the matches which will be made with their haptic models.
Participants accepted the use of simple geometric primitives to represent more complex visual models (median score of 4.5). They also affirmed the necessity of using a priority scheme to manage matching (median score of 4). However, participants mentioned that it was not very intuitive to precisely predict the outcome of weighting without looking at the internal vote calculation steps (median score of 3), although they were able to successfully guess which physical objects the system matched. At the same time, participants were confident that their ability to predict would improve over time (median score of 4).
6.5.2 Tool Usability
Participants assessed the Annexing Reality graphical tool on the following: the effort needed to learn how to create haptic models using the different functionalities of the tool; the ability to perform a task using the tool without leading to erroneous behaviours; and their satisfaction with the overall performance of the tool. Participants strongly agreed that the software was easy to learn (median score of 5 on a 5-point Likert scale). Many believed that the tool worked as intended without leading to errors (median score of 4), although those developers who had no previous experience with 3D modeling found that aspect of the system challenging. In general, participants were pleased with the performance (median score of 4).
To summarize the key findings: the Annexing Reality graphical tool was rated by the expert assessors as easy to learn and use. Assessors could generate haptic models quickly and made very few errors. All assessors were satisfied with the overall performance of the tool. Assessors were able to generate haptic models that matched a given virtual object to a given set of physical objects. However, further research is required to improve the predictability of the weighting scheme.
Chapter 7
Limitations
A primary limitation of the Annexing Reality graphical tool, made clear by the developer study, is the difficulty of precisely predicting which physical prop will be matched based on the priority scheme in the haptic model. Although the goal of the system is to define a class of physical objects for annexation, which developers were able to do without error, their high rate of error in ordering the objects to be matched within that class suggests an incomplete understanding of either the haptic models or how they are used to perform matching. In particular, the priority scheme involves multilevel weighting on a continuous scale (0-1), and this prioritization appears not to have been well understood.
In terms of system performance, object detection, shape recognition, and motion tracking limit the current implementation of the Annexing Reality system. The system can only detect physical objects supported by horizontal planar surfaces, as it uses a plane-segmentation-based method for object cluster extraction. At the same time, due to erroneous depth information generated by the Kinect depth sensor, the system cannot detect transparent or glossy objects. The ObjectShapeRecognizer module limits itself to recognizing only the most salient primitive shape of an object, because inferring less salient shape information from imperfect, partial point clouds is error prone. Also, objects smaller than 7cm × 5cm × 5cm do not generate a sufficient number of 3D points in the Kinect data to accurately determine their shapes, and hence are not detectable with the current implementation. As the user wears the Kinect depth sensor on a helmet, quick head movements result in erroneous tracking because of the high relative motion of objects with respect to the Kinect. Though this work makes no contribution to computer vision, it is expected that it will continue to motivate improvements in that field.
Chapter 8
Conclusion and Future Directions
This thesis has presented Annexing Reality: the opportunistic annexation of physical objects in the user's surroundings to temporarily provide the best-available haptic sensation for virtual objects in augmented reality. It allows users to freely move between environments while experiencing tangible augmented reality, without carrying special equipment or objects with them. With this system, developers can define, a priori, adaptive haptic experiences for virtual scenes. Based on this specification, the system dynamically finds physical objects in the user's surroundings that are physically similar to the virtual ones, and uses them as physical proxies. A developer study revealed that content creators accepted the Annexing Reality system as a useful tool for designing augmented reality applications in which virtual objects are opportunistically paired to and controlled by everyday physical objects. Numerous ways the system can be extended to further improve its usability and utility have been identified.
Developers' understanding of the priority scheme can be improved by providing tools
for simulating voting and matching with different combinations of priority values. In
particular, providing a database of physical objects commonly found by sensors using this
system would allow for useful simulation during development. In addition, simplifying
the prioritization mechanism by replacing a continuous scale with a forced-choice ranking
would increase the predictability of matches.
Although the set of currently implemented primitives allowed thorough exploration of the
concept of Annexing Reality, adding more geometric primitives such as planes, cubes, and
pyramids will be essential in providing more power to the developers for defining haptic
experiences. Additional computer vision techniques would allow the detection of features
such as texture, and applying object recognition technologies would help infer the weight of, and additional metadata about, objects seen by the system. The application of
improved sensing and computer vision will also allow detection of more than one primitive
shape in a given physical object, allowing the system to match more complex physical
objects.
For an enhanced user experience, accurate, low-latency tracking is crucial. Utilizing RGB texture information may help in tracking rotations of geometrically symmetric
objects. Moreover, being able to detect and track transparent, glossy, and deformable
objects will expand the variety of objects that can be used with the system.
While haptic feedback technologies continue to improve, there is a clear opportunity for
an immediate leap forward through annexation of physical experiences. This thesis may
serve as a guidepost to begin to overcome significant challenges of authoring experiences
given a lack of information about the physical objects users will actually be interacting
with.
Bibliography
[1] ATOMIC Authoring Tool.
[2] buildAR.
[3] dlib C++ Library - Optimization.
[4] Kinect for Xbox One.
[5] Meta Augmented Reality.
[6] metaio.
[7] Microsoft HoloLens.
[8] Vuforia Augmented Reality for 3D Mobile Content.
[9] Merwan Achibet, Maud Marchal, Ferran Argelaguet, and Anatole Lecuyer. The Vir-
tual Mitten: A novel interaction paradigm for visuo-haptic manipulation of objects
using grip force. In 2014 IEEE Symposium on 3D User Interfaces (3DUI), pages
59–66. IEEE, March 2014.
[10] Marco Agus, Andrea Giachetti, Enrico Gobbetti, Gianluigi Zanetti, and Antonio
Zorcolo. A multiprocessor decoupled system for the simulation of temporal bone
surgery. Computing and Visualization in Science , 5(1):35–43, January 2014.
[11] Marc Alexa, Johannes Behr, Daniel Cohen-Or, Shachar Fleishman, David Levin, and
Claudio T. Silva. Computing and rendering point set surfaces. IEEE Transactions
on Visualization and Computer Graphics , 9(1):3–15, January 2003.
[12] Yuki Ban, Takashi Kajinami, Takuji Narumi, Tomohiro Tanikawa, and Michitaka
Hirose. Modifying an identified angle of edged shapes using pseudo-haptic effects.
Haptics: Perception, Devices, Mobility, and Communication, pages 25–36, 2012.
[22] Markus Funk, Oliver Korn, and Albrecht Schmidt. An augmented workplace for
enabling user-defined tangibles. In Proceedings of the extended abstracts of the 32nd
annual ACM conference on Human factors in computing systems - CHI EA ’14 ,
pages 1285–1290, New York, New York, USA, April 2014. ACM Press.
[23] Sidhant Gupta, Dan Morris, Shwetak N. Patel, and Desney Tan. AirWave: non-
contact haptic feedback using air vortex rings. In Proceedings of the 2013 ACM
international joint conference on Pervasive and ubiquitous computing - UbiComp
’13 , pages 419–428, New York, New York, USA, September 2013. ACM Press.
[24] Steven J. Henderson and Steven Feiner. Opportunistic controls: leveraging natural
affordances as tangible user interfaces for augmented reality. In Proceedings of the
2008 ACM symposium on Virtual reality software and technology - VRST ’08 , pages
211–218, New York, New York, USA, October 2008. ACM Press.
[25] Valentin Heun, Shunichi Kasahara, and Pattie Maes. Smarter objects: using AR
technology to program physical objects and their interactions. In CHI ’13 Extended
Abstracts on Human Factors in Computing Systems on - CHI EA ’13 , pages 961–
966, New York, New York, USA, April 2013. ACM Press.
[26] Ken Hinckley, Randy Pausch, John C. Goble, and Neal F. Kassell. Passive real-
world interface props for neurosurgical visualization. In Proceedings of the SIGCHI
conference on Human factors in computing systems celebrating interdependence - CHI ’94, pages 452–458, New York, New York, USA, April 1994. ACM Press.
[27] Koichi Hirota and Michitaka Hirose. Providing force feedback in virtual environ-
ments. IEEE Computer Graphics and Applications , 15(5):22–30, 1995.
[28] Hunter G. Hoffman. Physically touching virtual objects using tactile augmentation
enhances the realism of virtual environments. In Proceedings. IEEE 1998 Virtual
Reality Annual International Symposium (Cat. No.98CB36180), pages 59–63. IEEE
Comput. Soc, 1998.
[29] Hiroshi Ishii and Brygg Ullmer. Tangible bits. In Proceedings of the SIGCHI confer-
ence on Human factors in computing systems - CHI ’97 , pages 234–241, New York,
New York, USA, March 1997. ACM Press.
[30] Hiroo Iwata, Hiroaki Yano, Fumitaka Nakaizumi, and Ryo Kawamura. Project
FEELEX: adding haptic surface to graphics. In Proceedings of the 28th annual
conference on Computer graphics and interactive techniques - SIGGRAPH ’01, pages
469–476, New York, New York, USA, August 2001. ACM Press.
[31] Benjamin Knoerlein, Gábor Székely, and Matthias Harders. Visuo-haptic collabora-
tive augmented reality ping-pong. In Proceedings of the international conference on
Advances in computer entertainment technology - ACE ’07 , pages 91–94, New York,
New York, USA, June 2007. ACM Press.
[32] Aaron Kotranza and Benjamin Lok. Virtual Human + Tangible Interface = Mixed
Reality Human: An Initial Exploration with a Virtual Breast Exam Patient. In 2008
IEEE Virtual Reality Conference , pages 99–106. IEEE, 2008.
[33] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Re-
search Logistics Quarterly , 2(1-2):83–97, March 1955.
[34] Eun Kwon, Gerard J. Kim, and Sangyoon Lee. Effects of sizes and shapes of props
in tangible augmented reality. In 2009 8th IEEE International Symposium on Mixed
and Augmented Reality , pages 201–202. IEEE, October 2009.
[35] Daniel Leithinger, Sean Follmer, Alex Olwal, Samuel Luescher, Akimitsu Hogge,
Jinha Lee, and Hiroshi Ishii. Sublimate: state-changing virtual and physical ren-
dering to augment interaction with shape displays. In Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems - CHI ’13 , pages 1441–1450,
New York, New York, USA, April 2013. ACM Press.
[36] Thomas H. Massie and Kenneth J. Salisbury. The phantom haptic interface: A de-
vice for probing virtual objects. In Proceedings of the ASME winter annual meeting,
symposium on haptic interfaces for virtual environment and teleoperator systems ,
volume 55, pages 295–300. Chicago, IL, 1994.
[37] Paul Milgram and Fumio Kishino. A Taxonomy of Mixed Reality Visual Displays,
December 1994.
[38] Yoichi Ochiai, Kota Kumagai, Takayuki Hoshi, Jun Rekimoto, Satoshi Hasegawa,
and Yoshio Hayasaki. Fairy lights in femtoseconds. In ACM SIGGRAPH 2015
Posters on - SIGGRAPH ’15 , pages 1–1, New York, New York, USA, July 2015.
ACM Press.
[39] Ohan Oda, Levi J. Lister, Sean White, and Steven Feiner. Developing an augmented
reality racing game. page 2, January 2008.
[40] Bernhard Pflesser, Andreas Petersik, Ulf Tiede, Karl Heinz Höhne, and Rudolf
Leuwer. Volume cutting for virtual petrous bone surgery. Computer aided surgery :
official journal of the International Society for Computer Aided Surgery , 7(2):74–83,
January 2002.
[41] Jun Rekimoto. Traxion: a tactile interaction device with virtual force sensation.
In ACM SIGGRAPH 2014 Emerging Technologies on - SIGGRAPH ’14, pages 1–1,
New York, New York, USA, July 2014. ACM Press.
[42] Radu B. Rusu. Semantic 3D Object Maps for Everyday Manipulation in Human
Living Environments. KI - Künstliche Intelligenz, 24(4):345–348, August 2010.
[43] Radu B. Rusu, Nico Blodow, and Michael Beetz. Fast Point Feature Histograms
(FPFH) for 3D registration. In 2009 IEEE International Conference on Robotics
and Automation , pages 3212–3217. IEEE, May 2009.
[44] Satoshi Saga and Ramesh Raskar. Feel through window. In SIGGRAPH Asia
2012 Emerging Technologies on - SA ’12 , pages 1–3, New York, New York, USA,
November 2012. ACM Press.
[45] Makoto Sato. SPIDAR and virtual reality. In Proceedings of the 5th Biannual World
Automation Congress , volume 13, pages 17–23. TSI Press, 2002.
[46] Ruwen Schnabel, Roland Wahl, and Reinhard Klein. Efficient RANSAC for Point-
Cloud Shape Detection. Computer Graphics Forum , 26(2):214–226, June 2007.
[47] Adalberto L. Simeone, Eduardo Velloso, and Hans Gellersen. Substitutional Reality.
In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing
Systems - CHI ’15 , pages 3307–3316, New York, New York, USA, April 2015. ACM
Press.
[48] Rajinder Sodhi, Ivan Poupyrev, Matthew Glisson, and Ali Israr. AIREAL: interac-
tive tactile experiences in free air. ACM Transactions on Graphics , 32(4):134, July
2013.
[49] Yuriko Suzuki and Minoru Kobayashi. Air jet driven force feedback in virtual reality.
IEEE Computer Graphics and Applications , 25(1):44–47, January 2005.
[50] James Vallino and Christopher Brown. Haptics in augmented reality. In Proceedings
IEEE International Conference on Multimedia Computing and Systems , volume 1,
pages 195–200. IEEE Comput. Soc, 1999.
[51] Peter Weir, Christian Sandor, Matt Swoboda, Ulrich Eck, Gerhard Reitmayr, and
Arindam Dey. BurnAR: Feel the heat. In 2012 IEEE International Symposium on
Mixed and Augmented Reality (ISMAR), pages 331–332. IEEE, November 2012.