8/18/2019 Thesis Annexing Reality
1/37
Abstract
Annexing Reality: Enabling Opportunistic Use of Everyday Objects as Tangible
Proxies in Augmented Reality
Anuruddha Lakmal Hettiarachchi
Master of Science
Graduate Department of Computer Science
University of Toronto
2016
Advances in display and tracking technologies hold the promise of increasingly immersive augmented-reality experiences. Unfortunately, the on-demand generation of haptic experiences has lagged behind these advances in other feedback channels. This thesis presents Annexing Reality: a system that opportunistically acquires physical objects from a user's current setting to provide the best-available haptic sensation for virtual content. It allows content creators to specify haptic experiences that adapt to the user's current physical environment. The system continuously scans for and annexes physical objects that are physically similar to each virtual one; it then aligns the virtual model to that object and adjusts the model to reduce visual-haptic mismatch. The developer's experience with the Annexing Reality system and the techniques used in realizing it are described. The results of a developer study validating the usability and utility of this method of defining haptic experiences are also presented.
Acknowledgements
Even though only my name appears on the cover of this document, there are dozens of invisible names behind it; this work is the result of their collective efforts.
First and foremost, I would like to express my sincerest gratitude to my supervisor, Prof. Daniel Wigdor, for guiding me throughout my master's. I am grateful for his enthusiasm, motivation, patience, and support.
I would like to extend my gratitude to Prof. Karan Singh for reviewing this thesis and providing valuable feedback. I am thankful to Prof. Khai Truong for being so kind as to voice over my video submission to CHI 2016. Thank you to Katie Barker for critically reviewing my paper submission to CHI 2016 and keeping me on track during that period. I am indebted to Dr. Bruno De Araujo for his advice and constant feedback on my research. I thank Dr. Ricardo Jota and Dr. Michelle Annette for their advice and support throughout my master's program.
I was fortunate to have such a wonderful group of people in the lab; thank you to my fellow students and all members of the DGP lab. Special thanks to John Hancock; if not for his magical power of fixing things, I would still be stuck with a Vicon system somewhere in the lab.
Last but not least, I thank my parents Kusum and Shantha, brother Pradeep, and my
girlfriend Akila for their love and support, and for always being there for me, in good
and bad times.
Contents
1 Introduction 1
2 Related Work 4
2.1 Visuo-Haptic Mixed Reality . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Opportunistic Tangible Proxies . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Virtual-Real Mismatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4 Augmented Reality Content Authoring . . . . . . . . . . . . . . . . . . . 6
3 Developer Experience 7
3.1 Annexing Reality Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Opportunistic Annexing of Physical Objects 10
4.1 Object Shape Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.1 Object Cluster Extraction . . . . . . . . . . . . . . . . . . . . . . 11
4.1.2 Collision Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1.3 Shape Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.4 Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.5 Virtual-Real Object Similarity . . . . . . . . . . . . . . . . . . . . 12
4.1.6 Voting Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 Object Tracking and Rendering 15
5.1 Initial Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.2 ICP Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.3 Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6 Developer Study 19
6.1 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.2 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.3 Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.5.1 Understanding of Haptic Models . . . . . . . . . . . . . . . . . . . 21
6.5.2 Tool Usability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
7 Limitations 23
8 Conclusion and Future Directions 24
Bibliography 25
Chapter 1. Introduction 2
Figure 1.1: Left: a user's table as seen unaided by augmented reality. Right: the same user's table, as seen through an AR headset, augmented with the Annexing Reality tool. Note that virtual content is placed and scaled to opportunistically take advantage of physical objects in order to provide haptic experiences for virtual ones.
affects the user's feeling of immersion and engagement, and have found that the greater the degree of mismatch, the lower the believability of the experience. This confirms the findings of Kwon et al. [34] on the effect of the shape and size of physical props: object manipulation is more efficient when physical and virtual objects are alike in shape and size. Hence, selecting a physical prop that is as similar as possible to a virtual object is crucial for the user's suspension of disbelief.
This thesis presents Annexing Reality: a system that opportunistically finds physical objects in the user's surroundings that best represent a given set of virtual objects, and then tailors the virtual objects to match the shape and size of the annexed physical ones. In so doing, we are able to minimize the mismatch between the virtual object and the physical prop to which it is mapped. A developer experience is described in which the creator of a virtual object specifies its size and matches its shape to basic primitives, similar to a collision model. Additionally, tolerances for deformation of the object are specified, as are preferences for matching (e.g., it is more important to match a cylinder than to get the size right). The system then uses this information to optimally match virtual and physical objects in the user's environment and performs 3-DOF scaling on the model to improve physical correspondence.
The task of matching virtual and physical objects is posed as a combinatorial optimization in which the set of available physical objects and the given set of virtual objects form the two partite sets of a complete bipartite graph. The Annexing Reality system scans the user's physical environment with a Kinect depth sensor [4], discovers available physical objects, and derives their critical primitive shape and size information. The system casts a vote for each virtual-physical pair based on the shape and size similarity
between the two, creating a complete weighted bipartite graph. The voting algorithm allows for a three-level priority scheme: level 1 defines matching priorities for individual virtual objects, level 2 defines matching priorities for the different primitive parts of each virtual object, and level 3 defines matching priorities for the physical dimensions of each primitive part. Once the graph is generated, the system finds the optimal assignment that maximizes the total vote using the Hungarian algorithm [33]. Finally, virtual objects are overlaid atop matched physical props in real time, allowing the user to physically interact with virtual objects (figure 1.1). The system rescales the virtual objects to fit the matched physical props, further reducing the mismatch.
This thesis presents the following contributions:
1. The Annexing Reality method for parametrically matching and adjusting virtual
to physical objects
2. The developer UI and method for specifying preferences for attributes in matching
and scaling
3. The results of a developer study which demonstrate the effectiveness of the methods
and UI
Chapter 2
Related Work
The Annexing Reality system builds upon four main areas of related research: visuo-haptic mixed reality, opportunistic tangible proxies, the issue of mismatch between the virtual and the real in augmented reality, and AR content authoring.
2.1 Visuo-Haptic Mixed Reality
Milgram et al. [37] defined Mixed Reality systems in which completely real environments are connected to completely virtual environments. Visuo-haptic mixed reality (VHMR) allows the user to both see and touch virtual objects in a co-located manner such that the interaction becomes closer to reality [50]. Several systems have explored this idea of merging visual and kinesthetic
perceptions in the virtual space, especially in medicine and surgery. In their early work, Carlin et al. [17] used VHMR for the treatment of spider phobia, where a subject physically interacted with a virtual spider. Kotranza et al. [32] studied interpersonal touch and social engagement while interacting with a Mixed Reality Human: a mannequin with a virtual human overlay. Volumetric object models, derived from a CT scan, were used in conjunction with a PHANTOM 1.0 haptic feedback device [36] to simulate temporal bone surgery [10, 40]. Among a variety of other areas, VHMR has also been explored
in gaming and product design. Oda et al. [39] have developed a racing game in which
the virtual car is controlled with a passive tangible prop. In the two-player ping-pong
game developed by Knoerlein et al. [31], players can feel the impact of the virtual ball on
the virtual bat with a co-located PHANTOM haptic device. ClonAR [20], developed by Csongei et al., allows product designers to physically edit 3D scans of real-world objects.
2.2 Opportunistic Tangible Proxies
Tangible proxies take advantage of humans' ability to grasp and manipulate physical objects as a means of interacting with digital information [29]. Instead of tightly coupling digital functionality to physical objects, opportunistic tangible proxies adapt to the user's environment and leverage otherwise unused objects as tangible props. In Opportunistic Controls [24], Henderson et al. suggest using physical affordances already present in the domain environment to provide haptic feedback on user inputs in augmented reality. Smarter Objects [25] associates a graphical interface with a physical object in a co-located manner, allowing a user to provide tangible inputs while visualizing their effects. Cheng et al. [18], Corsten et al. [19], and Funk et al. [22] propose repurposing everyday objects as tangible input devices. Cheng et al. place fiducial markers on everyday objects to convert them to input controls, while Corsten et al. and Funk et al. use a Kinect depth sensor for object recognition and tracking. All of these require the user to create physical-digital pairs by programming physical objects. In contrast, this work focuses on letting the system create physical-virtual pairs based on a developer's specifications.
2.3 Virtual-Real Mismatch
Since it is nearly impossible to find an exact physical replica of each virtual object within one's surroundings, there exists an inherent mismatch between the characteristics of virtual objects and their physical props. In their recent work, Simeone et al. [47] found that some amount of mismatch is acceptable, especially where interaction between the user and the object is less likely to happen; however, increased mismatch makes the virtual experience less believable to the user. From a similar study, Kwon et al. [34] concluded that lower disparity results in greater realism and easier manipulation of virtual objects. The Annexing Reality system aims to minimize the mismatch by dynamically finding a physical object in the user's surroundings that is as similar as possible to a given virtual object, and by scaling virtual objects to match the physical ones.
An interesting observation from the pseudo-haptics literature is that, when visual and kinesthetic stimuli are fused, visual stimuli dominate kinesthetic stimuli, resulting in illusory haptic sensations. In BurnAR [51], subjects reported an involuntary heat sensation in their hands when virtual flames and smoke were overlaid onto their hands. In their series of visuo-haptic experiments, Ban et al. modified the curvature
[13], angle of edges [12], and position of edges [14] of virtual shapes, without changing the physical prop onto which those virtual shapes were overlaid. For each virtual shape, the participants reported that they felt they were touching a physical shape similar to the virtual one being displayed. These observations indicate that it is the critical shape that matters, making it plausible to use geometric primitives to represent more complex shapes.
2.4 Augmented Reality Content Authoring
AR content authoring tools provide software frameworks with which developers can quickly create augmented reality applications without worrying about the functionality of underlying components. Early systems [1, 2] supported marker-based virtual-real pairing, where virtual objects were tied to unique image patterns known as fiducial markers. More recent commercial toolkits such as Vuforia [8] and metaio [6] enable the use of physical objects as fiducials. In these systems, however, the developer hard-pairs virtual objects to physical ones at development time, requiring the user to have the exact fiducials specified by the developer in order to use the application. Taking a different approach, the Annexing Reality system scans and dynamically matches available physical objects in the user's physical environment to virtual ones.
Chapter 3
Developer Experience
The process begins with a developer creating a virtual scene, such as in an augmented
reality game. An early goal of this project was developer transparency; that is, enabling
matching and adjusting of models to physical objects without any input from developers.
It soon became clear, however, that while this would lower the workload for the target users, it would also serve to disempower them in defining their intended experiences, as the Annexing Reality tool would, in effect, be making design decisions on the fly. Thus, it was essential to define a developer experience that is simultaneously powerful, easily grasped, and low-effort. To this end, three principles were defined to guide the design of a tool providing that experience:
1. Do not impose any limitations on the properties of the models of the virtual objects
that the user will see.
2. Provide a separation between the visual and physical properties to allow developers
to focus on each individually.
3. Provide flexibility to enable the developer to prioritize any aspect of the physical
properties of the objects to be matched.
3.1 Annexing Reality Tool
The developer creates their virtual content using their preferred tool. Once it is complete, it is imported into the Annexing Reality software, where they specify which virtual objects should be matched to physical ones in the user's setting and which should not. In this
Figure 3.1: The developer UI for Annexing Reality. (a) The primitives of the haptic model, shown in wireframe, for a Champagne bottle object. (b) The developer sets min and max sizes for scaling each primitive. (c) Ranking UI to prioritize matching.
manner, the developer controls which virtual objects are physically manipulated by the
player. Other virtual objects may react to the behavior of player-manipulated objects (e.g., the ball in a ping-pong game where the paddle is physically controlled by the player) or be solely controlled by the developer at development time (e.g., imagery in the background of the world).
For each user-manipulable virtual object, the developer generates a haptic model (e.g., figure 3.1a). This begins with selecting basic geometric primitives, which will be matched to the physical object(s) in the user's surroundings. The shape, size, and position of the model with respect to the virtual object reflect the haptic experience intended by the developer. As an example, consider the virtual model of a Champagne bottle, shown in figure 3.1a. If the developer intends for the player to grab the bottle by the body, she would remove the conic part of the haptic model, and the bottle would be matched to cylindrical physical objects at its body. If her experience would be successful no matter which part of the bottle can be grabbed, she can leave both the cylinder and the cone in the model, and one or the other will be matched, with priority based on the weighting she will later define.
The reduction of the virtual model to a set of primitive shapes is what gives Annexing
Reality its power: the developer is provided with the opportunity to specify the haptic experience they wish to provide, at a level of abstraction that allows, at development time, the tweaking of how perfect a match must be in order to perform the annexation, or to leave the object virtual-only.
The final step is setting the parameters for the matching algorithm (figure 3.1c). This requires the developer to order their virtual objects by priority (for situations where there is an insufficient number of physical objects for matching, or where one physical object equally matches more than one virtual object). Next, the developer prioritizes the primitives within their haptic model (for situations where more than one of those primitives might be matched; for example, with the Champagne bottle, the developer might prefer to match the body to a cylinder, and as a second choice to match the neck to a conic physical object). Finally, the physical dimensions of those primitives are given preferences. Continuing with the Champagne bottle example, the developer might find it more important that the matched cylindrical object have a small enough radius to enable grasping, and less important that its height precisely match that of a bottle; or they might prefer that a match only be made if both radius and height are within a tight tolerance. Figure 3.1b shows how to set maximum and minimum values for each dimension, and figure 3.1c shows how to set priorities within the allowable range.
Once the developer is finished creating haptic models, the system converts and saves this representation in a textual form understandable to the Object Finder module, described in detail below, which is responsible for the runtime matching of virtual and physical objects, parameterized by the properties set by the developer.
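The textual representation itself is not specified in this chapter; the sketch below is a hypothetical illustration (in Python, with JSON chosen arbitrarily as the serialization) of the information such a file would need to carry: the three-level priority scheme plus the per-dimension size limits set in figure 3.1b. All field names and numbers are invented for this example.

```python
# Hypothetical sketch of a serialized haptic model; the actual textual format
# used by Annexing Reality is not described in the text, so every field name
# and value here is illustrative only.
import json

haptic_model = {
    "object": "champagne_bottle",
    "object_priority": 1.0,            # level 1: priority of this virtual object
    "primitives": [
        {   # level 2: the body, the preferred primitive to match
            "shape": "cylinder",
            "shape_priority": 1.0,
            "dimensions": {            # level 3: per-dimension limits and priorities
                "radius": {"min": 0.03, "max": 0.05, "priority": 1.0},
                "height": {"min": 0.15, "max": 0.30, "priority": 0.5},
            },
        },
        {   # the neck, a second-choice match
            "shape": "cone",
            "shape_priority": 0.6,
            "dimensions": {
                "radius": {"min": 0.01, "max": 0.03, "priority": 0.8},
                "height": {"min": 0.05, "max": 0.12, "priority": 0.4},
            },
        },
    ],
}

serialized = json.dumps(haptic_model, indent=2)
```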
Chapter 4
Opportunistic Annexing of Physical
Objects
The haptic model forms the basis for the matching done at runtime. When a virtual object is added to a scene, the Annexing Reality system finds suitable physical objects in the surroundings to serve as props, and overlays the virtual objects onto them. The complete pipeline is illustrated in figure 4.1. The Object Finder module is responsible for finding physical objects that are as similar as possible to the virtual objects specified by the developer. It scans the environment with the Kinect depth sensor, detects available physical objects, derives their salient primitive shape and size, and matches virtual objects to them based on the input from the Haptic Model Generator.
Figure 4.1: A dataflow diagram for the Annexing Reality system. Input to the system comes from the Haptic Model Generator tool (generated by the developer), and from the Frame Grabber module, which reads from the Kinect sensor.
4.1 Object Shape Recognition
The main function of the Object Shape Recognizer is to identify physical objects present in the Kinect's field of view. It continuously scans the player's physical environment, detects horizontal planar surfaces (e.g., tables, floors), extracts objects lying on them, and then estimates each object's primitive shape, size, and relative orientation.
4.1.1 Object Cluster Extraction
The Annexing Reality system only matches physical objects that are supported by horizontal planar surfaces (see Limitations). As the first step, it detects and segments the dominant plane in the point cloud generated from the Kinect's depth map. It then computes the 2D convex hull of the segmented plane. Next, a polygonal prism is created with the 2D convex hull as its base, which encapsulates the physical objects lying on the plane. Finally, it separates point clusters belonging to different objects using the Euclidean Cluster Extraction method [42]. To improve the accuracy of the subsequent shape-detection phase, the Annexing Reality system reconstructs object models by smoothing and resampling the point clouds using a Moving Least Squares method [11] (figure 5.1a).
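The clustering step above can be illustrated with a minimal, pure-Python sketch of the idea behind Euclidean Cluster Extraction: points chained together by neighbors closer than a tolerance form one cluster. The actual system uses the PCL implementation [42], which relies on a k-d tree rather than this brute-force neighbor search.

```python
import math
from collections import deque

def euclidean_clusters(points, tolerance):
    """Group points into clusters: two points belong to the same cluster if
    they are connected by a chain of neighbors closer than `tolerance`.
    This is only the core idea; a real implementation would use a k-d tree
    instead of this O(n^2) neighbor search."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            for j in list(unvisited):
                if math.dist(points[i], points[j]) <= tolerance:
                    unvisited.remove(j)
                    queue.append(j)
                    cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters

# Two well-separated objects on a table (coordinates in meters, invented):
pts = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.0, 0.0), (0.51, 0.01, 0.0)]
clusters = euclidean_clusters(pts, tolerance=0.05)
```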
4.1.2 Collision Detection
The Object Finder may not be able to assign a physical prop to each virtual object, for two reasons: 1) there may be fewer physical objects available in the player's setting than virtual objects, in which case virtual objects with lower matching priority may not get physical props; 2) physical objects of certain shapes may not be found in the environment. Therefore, the Object Finder continuously matches physical objects to virtual ones. As a result, when a new physical object appears in the Kinect's field of view (for instance, because the player brought a new object into the scene, or moved to look at a part of the scene that was previously outside the Kinect's field of view), the entire object assignment could change, causing some virtual objects to switch between physical objects. This is avoided by detecting collisions and excluding physical objects that are already assigned to a virtual object from the matching phase.
For each extracted object point cluster, the axis-aligned bounding box is computed. The Object Finder then iterates through the Pose Trackers assigned to each virtual object (see
Pose Tracking and Rendering) to check whether any of them is currently tracking a physical object whose centroid is inside the bounding box. If so, the point cluster belongs to a physical object that is already assigned to a virtual object, and that cluster is excluded from the matching pipeline.
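A minimal sketch of this exclusion test, assuming simple point tuples rather than the system's actual point-cloud types:

```python
def aabb(points):
    """Axis-aligned bounding box of a point cluster: (mins, maxs)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def contains(box, point):
    """True when the point lies inside the box on every axis."""
    lo, hi = box
    return all(l <= c <= h for l, c, h in zip(lo, point, hi))

def unassigned_clusters(clusters, tracked_centroids):
    """Drop any cluster whose bounding box contains the centroid of a
    physical object already being tracked (i.e., already annexed)."""
    keep = []
    for cluster in clusters:
        box = aabb(cluster)
        if not any(contains(box, c) for c in tracked_centroids):
            keep.append(cluster)
    return keep

# One tracked object near the origin: its cluster is excluded, the other kept.
a = [(-0.1, -0.1, -0.1), (0.1, 0.1, 0.1)]
b = [(0.9, 0.9, 0.9), (1.1, 1.1, 1.1)]
remaining = unassigned_clusters([a, b], [(0.0, 0.0, 0.0)])
```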
4.1.3 Shape Recognition
A RANSAC-based shape detection method developed by Schnabel et al. [46] is used to recognize the dominant primitive shape (sphere, cylinder, cone, or torus) of each point cluster. Due to the noisy, partial point clouds received from the depth sensor, erroneous shape detections frequently occur. The results are verified by comparing the physical dimensions of the shapes to preset size limits; these limits were empirically tuned to generate the most accurate results.
4.1.4 Voting
Once the salient primitive shape and size of all point clusters are determined, each virtual object with no previously allotted physical prop votes for point clusters based on their shape and size similarity. The purpose of voting is to determine the optimal assignment of physical objects, in which all virtual objects get reasonably similar physical props. The system applies the priority scheme set by the developer to weight the votes.
4.1.5 Virtual-Real Object Similarity
Each virtual object is compared with each detected physical object. If the physical object does not contain the primitive(s) of the virtual object's haptic model, the two objects are considered dissimilar and no further processing is done on the pair. If one or more primitives are found within the physical object, the scale of each primitive is compared to its size in the haptic model. A vote-reduction method is used to determine the vote: instead of calculating the size similarity, the Voter module calculates the opposite, the size mismatch. For a given dimension h (e.g., height, radius, or angle), the size-mismatch term ∆H between a virtual object v and a physical object p is calculated as

∆H = min(h_v, h_p) / max(h_v, h_p)    (4.1)
This ratio is taken instead of the difference in order to eliminate the influence of the unit of measurement on the size-mismatch value.
4.1.6 Voting Equation
For a given virtual-physical object pair, the Voter iterates through all the primitive shapes of the virtual object defined by the developer, in order of priority, to determine the shared primitive shape with the highest priority value. If the search fails (there are no shared primitives), it assigns 0 as the vote given by the virtual object to the physical object. If a shared primitive is found, it starts the vote-reduction process by first assigning the highest possible vote (a constant value C) and then multiplying that value by the virtual object priority w_o and the primitive shape priority w_s, each being a value between 0 and 1. To obtain the final vote x_vp, the Voter multiplies the already reduced vote by the sum of the weighted dimensional mismatch values (D denotes the set of all dimensions for a given shape, while w_h denotes the priority given to dimension h). The complete voting equation (the vote cast by virtual object v on physical object p) can be expressed as

x_vp = C × w_o × w_s × Σ_{h∈D} [ (min(h_v, h_p) / max(h_v, h_p)) × w_h ]    (4.2)
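A small sketch of equation 4.2 in Python. The data layout and the example numbers (a Champagne-bottle model voting on a detected can-like cylinder) are invented for illustration; note that, as the equation is written, the vote can exceed C when a shape has several dimensions, since the weighted ratios are summed.

```python
def vote(virtual, physical, C=100.0):
    """Vote cast by a virtual object on a physical one, per equation 4.2:
    find the shared primitive with the highest priority, then reduce the
    maximum vote C by the object priority w_o, the shape priority w_s, and
    the priority-weighted size ratios min(h_v, h_p)/max(h_v, h_p) summed
    over the shape's dimensions. The detected physical primitive is assumed
    to report every dimension the haptic model asks about."""
    for shape, w_s, dims in virtual["primitives"]:   # ordered by priority
        if shape in physical["shapes"]:
            detected = physical["shapes"][shape]
            ratio_sum = sum(
                w_h * min(h_v, detected[name]) / max(h_v, detected[name])
                for name, (h_v, w_h) in dims.items()
            )
            return C * virtual["w_o"] * w_s * ratio_sum
    return 0.0  # no shared primitive: the pair is dissimilar

# Illustrative haptic model: primitives as (shape, w_s, {dim: (h_v, w_h)}).
bottle = {
    "w_o": 1.0,
    "primitives": [
        ("cylinder", 1.0, {"radius": (0.04, 1.0), "height": (0.25, 0.5)}),
        ("cone",     0.6, {"radius": (0.02, 0.8), "height": (0.10, 0.4)}),
    ],
}
can = {"shapes": {"cylinder": {"radius": 0.033, "height": 0.115}}}
x = vote(bottle, can)
```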
4.2 Matching
Once the voting is complete for each virtual-physical pair, the final step is to match physical to virtual objects. The goal is to maximize satisfaction with the match, based on the score provided by the voting. As can be seen in figure 4.2, some virtual objects, such as v1, derive some degree of satisfaction from more than one match (55% satisfaction with physical object p1, and 34% satisfaction with p2). Other objects, such as v2, require a particular physical object in order to be satisfied; in this case, 65% satisfaction with p3. In many instances, virtual objects may not get their first choice; in the scene described in figure 4.2, v1 will be assigned to p2, so that v3 can be assigned to p1, thus maximizing the overall match score.
This matching task is formulated as a combinatorial optimization problem. The voting results in a weighted bipartite graph G = (V, P; X), with the set of given virtual objects V and the set of candidate physical objects P as the two partitions of G, and each edge
Figure 4.2: A weighted bipartite graph representing votes cast by virtual objects (V) on physical objects (P). A dummy node (d1) is added to make the graph symmetric.
in X having a nonnegative weight (the vote given by the virtual object on the physical object connected by the edge). Dummy vertices (dummy virtual objects if n(V) < n(P), or dummy physical objects otherwise) and dummy edges with weight 0 are added to convert G into a symmetric weighted bipartite graph (figure 4.2).
A matching M is a subset of the edge set X such that each vertex is incident upon at most one edge in M. The task here is to find a perfect matching M* (a matching in which every vertex in G is incident upon one and only one edge) such that the total weight of M* is maximized. An open-source C++ implementation [3] of the Hungarian algorithm (Kuhn-Munkres method) [33] is used to solve this assignment problem. Once matching is complete, all point clouds are saved to be used as source models for object tracking.
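The assignment step can be illustrated as follows. For readability, this sketch maximizes the total vote by exhaustive search over permutations rather than the O(n^3) Hungarian algorithm the system actually uses, and the vote cast by v3 on p1 (not stated in the text) is invented so that the figure 4.2 scenario plays out.

```python
from itertools import permutations

def best_assignment(votes):
    """Maximum-total-weight perfect matching on an already-padded, square
    vote matrix votes[v][p]. Exhaustive search is a readable stand-in for
    the Hungarian (Kuhn-Munkres) algorithm, usable only for small n."""
    n = len(votes)
    best, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(votes[v][perm[v]] for v in range(n))
        if total > best_total:
            best, best_total = perm, total
    return list(best), best_total

# Votes echoing figure 4.2: v1 mildly prefers p1, but giving p1 to v3 and
# p2 to v1 maximizes the overall score. v3's vote of 52 is illustrative.
votes = [
    [55, 34, 0],   # v1
    [0, 0, 65],    # v2
    [52, 0, 0],    # v3
]
assignment, total = best_assignment(votes)
```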
Chapter 5
Object Tracking and Rendering
Once matched, the Annexing Reality system overlays virtual objects onto physical props in real time, allowing the user to interact with them. The Pose Tracker is responsible for tracking matched physical objects in the Kinect's field of view, while the Renderer renders virtual objects in place so that the user sees them atop physical props through the head-mounted display (HMD). Each virtual object has a concurrent object-tracking pipeline with the following three threads: InitialAligner, TargetVolumeSegmenter, and IcpTracker. InitialAligner estimates the poses of physical objects in the Kinect's field of view (see figure 5.1b). TargetVolumeSegmenter segments the cubic volume of the point cloud that contains the physical object as estimated by the InitialAligner (see figure 5.1c). IcpTracker takes a rough pose estimate as input and fine-tunes it by incrementally aligning the source model (the saved point cloud of the matched object being tracked) to the object in the cubic volume (see figure 5.1d).
5.1 Initial Alignment
Given a source model, the InitialAligner searches for the corresponding physical object in the Kinect's field of view and roughly estimates its pose. This rough pose serves as the initial guess for the IcpTracker. As the first step of initial alignment, key points of both the source model and the scene point cloud are obtained by simply down-sampling each uniformly. Fast Point Feature Histogram (FPFH) [43] descriptors are then calculated at the key points of both point clouds. FPFH captures the local geometric features of a query point by computing the relationship between the points themselves and the normal vectors of the
Figure 5.1: Object pose tracking pipeline, after the Object Shape Recognizer has identified an object for tracking: (a) reconstructed point cloud model of an object to be tracked; (b) InitialAligner performs a rough pose estimation to locate the object in the field of view (red); (c) a cubic volume of the scene's point cloud (yellow) is segmented such that it encapsulates the object; (d) IcpTracker computes the accurate pose (green) by incrementally aligning the reconstructed model to the point cloud inside the cubic volume.
query point and its neighbors. All points within a radius of 2.5 cm of a key point are used as its neighbors. This radius value was empirically determined to capture a sufficient neighborhood while remaining local.
A RANSAC-based pose estimation algorithm [16] is used to estimate a rough transformation between the source model and the model to be aligned, using the previously computed FPFH descriptors. This method accelerates the computation by eliminating poses that are likely to be wrong. These bad poses are detected by comparing the ratios between edge lengths of virtual polygons formed on the source and the target (to-be-aligned) point cloud models. A similarity threshold of 90% is used for eliminating bad poses.
For each candidate pose, the algorithm tries to maximize the number of inliers. If the inlier fraction (the ratio of inliers detected to the total number of points in the source point cloud) is greater than a preset threshold, that pose is marked as good. An empirically determined inlier threshold of 70% is used, found to be the optimal percentage for including a sufficient number of inliers while accounting for noise, clutter, and occlusions. The Euclidean error score (sum of squared distance errors) between the source and the aligned point clouds is then calculated; if the error is less than a preset threshold (0.0003), that pose is considered a good initial guess.
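The two acceptance criteria above can be combined into a single check (a minimal sketch; the function name and signature are invented, with the thresholds taken from the text):

```python
def accept_pose(num_inliers, source_size, sq_error_sum,
                inlier_threshold=0.7, error_threshold=0.0003):
    """Accept a candidate pose only if the inlier fraction (inliers over
    total source points) exceeds 70% and the Euclidean error score (sum of
    squared distances between source and aligned clouds) is below 0.0003."""
    inlier_fraction = num_inliers / source_size
    return inlier_fraction > inlier_threshold and sq_error_sum < error_threshold

accept_pose(80, 100, 0.0001)   # 80% inliers, low error: accepted
accept_pose(60, 100, 0.0001)   # inlier fraction too low: rejected
```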
5.2 ICP Tracking
IcpTracker refines the initial registration by incrementally aligning the source point cloud to the target in order to obtain an accurate pose estimate. The Iterative Closest Point (ICP) algorithm [15] is used for this incremental alignment. For a given frame, the source point cloud fed to the IcpTracker is its own output from the previous frame, except in two cases: 1) at the beginning of pose tracking, when IcpTracker has no previous output, and 2) when the tracker has lost tracking. In these two cases, the IcpTracker fetches a source point cloud from the InitialAligner.
As a preliminary step, a cubic volume of the scene point cloud is segmented such that it encapsulates the point cloud of the target object (see figure 5.1c). The TargetVolumeSegmenter takes as input the source model dimensions from the Object Finder and the position of the object being tracked from IcpTracker (or from InitialAligner in the two cases described above). It then extracts a cubic volume 120% of the size of the target object.
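The segmentation step amounts to an axis-aligned crop of the scene cloud; a minimal sketch (the function name and per-axis box shape are illustrative assumptions):

```python
def segment_target_volume(scene_cloud, center, model_dims, scale=1.2):
    """Crop the scene point cloud to an axis-aligned box centred on the
    tracked object's last known position, with each side at 120% of the
    corresponding source-model dimension."""
    half = [d * scale / 2.0 for d in model_dims]
    return [p for p in scene_cloud
            if all(abs(p[i] - center[i]) <= half[i] for i in range(3))]

# keep only points near a 7cm x 5cm x 5cm object at the origin
scene = [(0.0, 0.0, 0.0), (0.04, 0.0, 0.0), (1.0, 1.0, 1.0)]
cropped = segment_target_volume(scene, (0.0, 0.0, 0.0), (0.07, 0.05, 0.05))
```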
The IcpTracker then incrementally aligns the source point cloud to the target inside the cubic volume, in up to a maximum of 20 iterations. The alignment finishes when either of the following two conditions is met: 1) the maximum number of iterations is reached, or 2) the expected alignment fitness is achieved (a Euclidean error score less than 0.0001). If it fails to find a satisfactory alignment for the current frame, it rolls back to its previous result and reattempts on the next frame. If the IcpTracker fails for 5 consecutive frames, it restarts tracking with a new initial pose received from the InitialAligner.
5.3 Rendering
The Renderer retrieves the poses of physical objects from the IcpTracker, derives the poses of their primitives, removes jitter, scales and positions virtual objects in the scene, and renders a stereogram of the virtual scene on the HMD.
The poses of physical props received from the Pose Tracker are relative to the Kinect. However, the Renderer renders the corresponding virtual objects from the HMD's viewpoint. Thus, all poses are transformed into the HMD's coordinate frame using a precisely constructed model of the Kinect+HMD setup, including the accurate field of view and aspect ratio of the display.
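This frame change amounts to applying a fixed homogeneous transform to every pose; a minimal sketch (the extrinsic matrix below is a made-up example, not the calibrated Kinect+HMD values):

```python
def transform_point(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][c] * v[c] for c in range(4)) for r in range(3))

# Hypothetical extrinsic from the Kinect frame to the HMD frame:
# identity rotation plus a 5 cm offset along the z axis.
T_HMD_FROM_KINECT = [
    [1.0, 0.0, 0.0, 0.00],
    [0.0, 1.0, 0.0, 0.00],
    [0.0, 0.0, 1.0, 0.05],
    [0.0, 0.0, 0.0, 1.00],
]

p_kinect = (0.1, 0.2, 1.0)   # prop position as measured by the Kinect
p_hmd = transform_point(T_HMD_FROM_KINECT, p_kinect)
```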
The virtual objects are then scaled and positioned such that the parts corresponding to matched primitive shapes of the virtual and physical objects are correctly aligned (figure 1.1, figure 6.1). This is accomplished for each virtual object by first creating a dummy object with the same pose as the virtual primitive shape (the collision model), then making that object the parent of the virtual object, and finally transforming that parent object such that the primitive-shaped parts of the virtual and physical objects overlap.
Chapter 6
Developer Study
A study was conducted to evaluate the usability and utility of the Annexing Reality system for developers. The goals were twofold. The first was to understand the utility of the Annexing Reality technique: is the set of information required of the developer to produce each haptic model—primitive shape matching, scaling, and preference information—useful in producing matches for the objects which developers expect? The second goal was to study the usability of the tool for building those models: would developers be able to successfully understand and use the UI, a set of sliders and text boxes, along with a 3D modeling interface (see figure 3.1), to define their desired haptic model?
Specifically, the following research questions were to be answered:
1. Are developers able to express their design intent as a Haptic Model: a set of primitive shapes, priorities, and size parameters?
2. How usable is the Annexing Reality software tool for building that haptic model?
6.1 Participants
Eight professional game developers and 3D modelers were recruited, all male, aged 27-48 years, to act as expert assessors of our tool. All of them had both game development and 3D modeling experience. All had experience with at least one augmented reality application, and three of them had developed augmented reality applications. Each participant received $50 per one-hour session.
6.2 Apparatus
Seven royalty-free 3D models found online were used as virtual objects, which were to
be matched to one or more of eight physical props (see figure 6.1). The participants
used a Windows PC with a 24 inch monitor. At several points during the study, they
wore a helmet-mounted Kinect with a Meta 1 head-mounted display [5] while testing and
interacting with physical props (see figure 1.1).
6.3 Task
The task was performed 4 times per participant: twice with simple virtual models which matched a single primitive (e.g. a cylinder for a lightsaber's handle), and then twice with more complex models which contained multiple primitive shapes (e.g. a lamp with a spherical body and a cylindrical shade; see figure 6.1). Each task was divided into
two parts. First, the participant was shown one of the virtual models, and asked to
imagine the sorts of physical objects which they might want to match the object to.
They were then asked to use the Annexing Reality tool to construct the haptic model,
including primitive shape, prioritizations, and size characteristics to enable matching.
Once this was complete, participants were given the set of physical props and asked to sort them into two groups: those which they expected to match their representation, and those they did not. They were then asked to sort the props in the "will match" group into their expected prioritization. Once each task was complete, participants were given the
opportunity to don the HMD to observe the effects of their matches.
6.4 Procedure
Assessors completed a demographic questionnaire prior to the study. They were given
the background to the project and a detailed explanation of the task, followed by a
demonstration. Then, the participants completed the task four times: twice for basic,
single-primitive physical objects, and then twice for more complex physical objects.
During each trial, the time required to complete the haptic model was measured. The
frequency and type of errors participants made were also recorded. After finishing all
trials, the participants completed a post-study questionnaire. They rated their impression of the tool's learnability, efficiency, and satisfaction using a 5-point Likert scale.
In addition, participants gave their expert opinion on utilizing geometric primitives to
represent complex shapes and using a priority scheme to control matching.
6.5 Results
Data for 32 trials was collected. After the short demo, participants were quickly able
to learn the tool and generate haptic models for the virtual objects given. On average,
they took 2 minutes and 5 seconds to complete a single-primitive haptic model and 3 minutes and 18 seconds to complete a two-primitive model. Participants made some errors, on average 0.53 per trial. Common errors included accidentally modifying other haptic models instead of the intended one, forgetting to set certain priority values and minimum/maximum scaling limits, and placing primitives in incorrect orientations.
Figure 6.1: Top: the visual models used in the developer studies, overlaid with primitives from possible haptic models. Bottom: various visual models matched to physical objects based on the haptic models' match for physical characteristics.
6.5.1 Understanding of Haptic Models
Participants were successful in using haptic models to define which physical objects should
be matched to each virtual model. In all 32 trials, they successfully matched their group
of hand-picked physical objects to those selected by the system. Within this hand-picked
group, participants were asked to rank which object would be selected first, second, third,
etc. Interestingly, in only 19 of the 32 trials did participants successfully identify which object would be chosen first. Of the remaining trials, their second-ranked object was chosen 10 times, and their third-ranked object 3 times. In all of these 13 trials, a multi-primitive haptic model included nearly equal matching priorities for both primitive shapes. This suggests that a forced-rank system might provide a better UI to aid developers in understanding the matches which will be made with their haptic models.
Participants accepted the use of simple geometric primitives to represent more complex visual models (median score of 4.5). They also affirmed the necessity of using a priority scheme to manage matching (median score of 4). However, participants mentioned that it was not very intuitive to precisely predict the outcome of weighting without looking at the internal vote calculation steps (median score of 3), although they were able to successfully guess which physical objects the system matched. At the same time, participants were confident that their ability to predict would improve over time (median score of 4).
6.5.2 Tool Usability
Participants assessed the Annexing Reality graphical tool on the following: the effort needed to learn how to create haptic models using the different functionalities of the tool; the ability to perform a task using the tool without leading to erroneous behaviours; and their satisfaction with the overall performance of the tool. Participants strongly agreed that the software was easy to learn (median score of 5 on a 5-point Likert scale). Many believed that the tool worked as intended without leading to errors (median score of 4), although those developers who had no previous experience with 3D modeling found that aspect of the system challenging. In general, participants were pleased with the performance (median score of 4).
To summarize the key findings: the Annexing Reality graphical tool was rated by the expert assessors as easy to learn and use. Assessors could generate haptic models quickly and made very few errors. All assessors were satisfied with the overall performance of the tool. Assessors were able to generate haptic models that matched a given virtual object to a given set of physical objects. However, further research is required to improve the predictability of the weighting scheme.
Chapter 7
Limitations
A primary limitation of the Annexing Reality graphical tool, made clear by the developer study, is the difficulty of precisely predicting which physical prop will be matched based on the priority scheme in the haptic model. Although the goal of the system is to define a class of physical objects for annexation, which developers were able to do without error, their high rate of error in ordering the objects to be matched within that class suggests an incomplete understanding of either the haptic models or how they are used to perform matching. In particular, the priority scheme involves multilevel weighting on a continuous scale (0-1), and this prioritization appears not to have been well understood.
In terms of system performance, object detection, shape recognition, and motion tracking limit the current implementation of the Annexing Reality system. The system can only detect physical objects supported by horizontal planar surfaces, as it uses a plane-segmentation-based method for object cluster extraction. At the same time, due to erroneous depth information generated by the Kinect depth sensor, the system cannot detect transparent or glossy objects. The ObjectShapeRecognizer module limits itself to recognizing only the most salient primitive shape of an object, because inferring less salient shape information from imperfect, partial point clouds is error prone. Also, objects smaller than 7cm × 5cm × 5cm do not generate a sufficient number of 3D points in the Kinect data to accurately determine their shapes, and hence are not detectable with the current implementation. As the user wears the Kinect depth sensor on a helmet, quick head movements result in erroneous tracking because of the high relative motion of objects with respect to the Kinect. Though this work makes no contribution to computer vision, it is expected that it will continue to motivate improvements in that field.
Chapter 8
Conclusion and Future Directions
This thesis has presented Annexing Reality: the opportunistic annexation of physical objects in the user's surroundings to temporarily provide the best-available haptic sensation for virtual objects in augmented reality. It allows users to freely move between environments while experiencing tangible augmented reality, without carrying special equipment or objects with them. With this system, developers can define, a priori, adaptive haptic experiences for virtual scenes. Based on this specification, the system dynamically finds physical objects in the user's surroundings that are physically similar to the virtual ones, and uses them as physical proxies. A developer study revealed that content creators accepted the Annexing Reality system as a useful tool for designing augmented reality applications in which virtual objects are opportunistically paired to and controlled by everyday physical objects. Numerous ways the system can be extended to further improve its usability and utility have been identified.
Developers' understanding of the priority scheme can be improved by providing tools
for simulating voting and matching with different combinations of priority values. In
particular, providing a database of physical objects commonly found by sensors using this
system would allow for useful simulation during development. In addition, simplifying
the prioritization mechanism by replacing a continuous scale with a forced-choice ranking
would increase the predictability of matches.
Although the set of currently implemented primitives allowed thorough exploration of the
concept of Annexing Reality, adding more geometric primitives such as planes, cubes, and
pyramids will be essential in providing more power to the developers for defining haptic
experiences. Additional computer vision techniques would allow the detection of features
such as texture, and applying object recognition technologies would help infer the weight of, and additional metadata about, objects seen by the system. The application of
improved sensing and computer vision will also allow detection of more than one primitive
shape in a given physical object, allowing the system to match more complex physical
objects.
For an enhanced user experience, accurate, low-latency tracking is crucial. Utilizing RGB texture information may help in tracking rotations of geometrically symmetric
objects. Moreover, being able to detect and track transparent, glossy, and deformable
objects will expand the variety of objects that can be used with the system.
While haptic feedback technologies continue to improve, there is a clear opportunity for
an immediate leap forward through annexation of physical experiences. This thesis may
serve as a guidepost to begin to overcome significant challenges of authoring experiences
given a lack of information about the physical objects users will actually be interacting
with.
Bibliography
[1] ATOMIC Authoring Tool.
[2] buildAR.
[3] dlib C++ Library - Optimization.
[4] Kinect for Xbox One.
[5] Meta Augmented Reality.
[6] metaio.
[7] Microsoft HoloLens.
[8] Vuforia Augmented Reality for 3D Mobile Content.
[9] Merwan Achibet, Maud Marchal, Ferran Argelaguet, and Anatole Lecuyer. The Vir-
tual Mitten: A novel interaction paradigm for visuo-haptic manipulation of objects
using grip force. In 2014 IEEE Symposium on 3D User Interfaces (3DUI), pages
59–66. IEEE, March 2014.
[10] Marco Agus, Andrea Giachetti, Enrico Gobbetti, Gianluigi Zanetti, and Antonio
Zorcolo. A multiprocessor decoupled system for the simulation of temporal bone
surgery. Computing and Visualization in Science , 5(1):35–43, January 2014.
[11] Marc Alexa, Johannes Behr, Daniel Cohen-Or, Shachar Fleishman, David Levin, and
Claudio T. Silva. Computing and rendering point set surfaces. IEEE Transactions
on Visualization and Computer Graphics , 9(1):3–15, January 2003.
[12] Yuki Ban, Takashi Kajinami, Takuji Narumi, Tomohiro Tanikawa, and Michitaka
Hirose. Modifying an identified angle of edged shapes using pseudo-haptic effects.
Haptics: Perception, Devices, Mobility, and Communication, pages 25–36, 2012.
[22] Markus Funk, Oliver Korn, and Albrecht Schmidt. An augmented workplace for
enabling user-defined tangibles. In Proceedings of the extended abstracts of the 32nd
annual ACM conference on Human factors in computing systems - CHI EA ’14 ,
pages 1285–1290, New York, New York, USA, April 2014. ACM Press.
[23] Sidhant Gupta, Dan Morris, Shwetak N. Patel, and Desney Tan. AirWave: non-
contact haptic feedback using air vortex rings. In Proceedings of the 2013 ACM
international joint conference on Pervasive and ubiquitous computing - UbiComp
’13 , pages 419–428, New York, New York, USA, September 2013. ACM Press.
[24] Steven J. Henderson and Steven Feiner. Opportunistic controls: leveraging natural
affordances as tangible user interfaces for augmented reality. In Proceedings of the
2008 ACM symposium on Virtual reality software and technology - VRST ’08 , pages
211–218, New York, New York, USA, October 2008. ACM Press.
[25] Valentin Heun, Shunichi Kasahara, and Pattie Maes. Smarter objects: using AR
technology to program physical objects and their interactions. In CHI ’13 Extended
Abstracts on Human Factors in Computing Systems on - CHI EA ’13 , pages 961–
966, New York, New York, USA, April 2013. ACM Press.
[26] Ken Hinckley, Randy Pausch, John C. Goble, and Neal F. Kassell. Passive real-
world interface props for neurosurgical visualization. In Proceedings of the SIGCHI
conference on Human factors in computing systems celebrating interdependence - CHI ’94, pages 452–458, New York, New York, USA, April 1994. ACM Press.
[27] Koichi Hirota and Michitaka Hirose. Providing force feedback in virtual environ-
ments. IEEE Computer Graphics and Applications , 15(5):22–30, 1995.
[28] Hunter G. Hoffman. Physically touching virtual objects using tactile augmentation
enhances the realism of virtual environments. In Proceedings. IEEE 1998 Virtual
Reality Annual International Symposium (Cat. No.98CB36180), pages 59–63. IEEE
Comput. Soc, 1998.
[29] Hiroshi Ishii and Brygg Ullmer. Tangible bits. In Proceedings of the SIGCHI confer-
ence on Human factors in computing systems - CHI ’97 , pages 234–241, New York,
New York, USA, March 1997. ACM Press.
[30] Hiroo Iwata, Hiroaki Yano, Fumitaka Nakaizumi, and Ryo Kawamura. Project
FEELEX: adding haptic surface to graphics. In Proceedings of the 28th annual
conference on Computer graphics and interactive techniques - SIGGRAPH ’01, pages
469–476, New York, New York, USA, August 2001. ACM Press.
[31] Benjamin Knoerlein, Gábor Székely, and Matthias Harders. Visuo-haptic collabora-
tive augmented reality ping-pong. In Proceedings of the international conference on
Advances in computer entertainment technology - ACE ’07 , pages 91–94, New York,
New York, USA, June 2007. ACM Press.
[32] Aaron Kotranza and Benjamin Lok. Virtual Human + Tangible Interface = Mixed
Reality Human: An Initial Exploration with a Virtual Breast Exam Patient. In 2008
IEEE Virtual Reality Conference , pages 99–106. IEEE, 2008.
[33] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Re-
search Logistics Quarterly , 2(1-2):83–97, March 1955.
[34] Eun Kwon, Gerard J. Kim, and Sangyoon Lee. Effects of sizes and shapes of props
in tangible augmented reality. In 2009 8th IEEE International Symposium on Mixed
and Augmented Reality , pages 201–202. IEEE, October 2009.
[35] Daniel Leithinger, Sean Follmer, Alex Olwal, Samuel Luescher, Akimitsu Hogge,
Jinha Lee, and Hiroshi Ishii. Sublimate: state-changing virtual and physical ren-
dering to augment interaction with shape displays. In Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems - CHI ’13 , pages 1441–1450,
New York, New York, USA, April 2013. ACM Press.
[36] Thomas H. Massie and Kenneth J. Salisbury. The phantom haptic interface: A de-
vice for probing virtual objects. In Proceedings of the ASME winter annual meeting,
symposium on haptic interfaces for virtual environment and teleoperator systems ,
volume 55, pages 295–300. Chicago, IL, 1994.
[37] Paul Milgram and Fumio Kishino. A Taxonomy of Mixed Reality Visual Displays,
December 1994.
[38] Yoichi Ochiai, Kota Kumagai, Takayuki Hoshi, Jun Rekimoto, Satoshi Hasegawa,
and Yoshio Hayasaki. Fairy lights in femtoseconds. In ACM SIGGRAPH 2015
Posters on - SIGGRAPH ’15 , pages 1–1, New York, New York, USA, July 2015.
ACM Press.
[39] Ohan Oda, Levi J. Lister, Sean White, and Steven Feiner. Developing an augmented
reality racing game. page 2, January 2008.
[40] Bernhard Pflesser, Andreas Petersik, Ulf Tiede, Karl Heinz Höhne, and Rudolf
Leuwer. Volume cutting for virtual petrous bone surgery. Computer aided surgery :
official journal of the International Society for Computer Aided Surgery , 7(2):74–83,
January 2002.
[41] Jun Rekimoto. Traxion: a tactile interaction device with virtual force sensation.
In ACM SIGGRAPH 2014 Emerging Technologies on - SIGGRAPH ’14, pages 1–1,
New York, New York, USA, July 2014. ACM Press.
[42] Radu B. Rusu. Semantic 3D Object Maps for Everyday Manipulation in Human
Living Environments. KI - Künstliche Intelligenz, 24(4):345–348, August 2010.
[43] Radu B. Rusu, Nico Blodow, and Michael Beetz. Fast Point Feature Histograms
(FPFH) for 3D registration. In 2009 IEEE International Conference on Robotics
and Automation , pages 3212–3217. IEEE, May 2009.
[44] Satoshi Saga and Ramesh Raskar. Feel through window. In SIGGRAPH Asia
2012 Emerging Technologies on - SA ’12 , pages 1–3, New York, New York, USA,
November 2012. ACM Press.
[45] Makoto Sato. SPIDAR and virtual reality. In Proceedings of the 5th Biannual World
Automation Congress , volume 13, pages 17–23. TSI Press, 2002.
[46] Ruwen Schnabel, Roland Wahl, and Reinhard Klein. Efficient RANSAC for Point-
Cloud Shape Detection. Computer Graphics Forum , 26(2):214–226, June 2007.
[47] Adalberto L. Simeone, Eduardo Velloso, and Hans Gellersen. Substitutional Reality.
In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing
Systems - CHI ’15 , pages 3307–3316, New York, New York, USA, April 2015. ACM
Press.
[48] Rajinder Sodhi, Ivan Poupyrev, Matthew Glisson, and Ali Israr. AIREAL: interac-
tive tactile experiences in free air. ACM Transactions on Graphics , 32(4):134, July
2013.
[49] Yuriko Suzuki and Minoru Kobayashi. Air jet driven force feedback in virtual reality.
IEEE Computer Graphics and Applications , 25(1):44–47, January 2005.
[50] James Vallino and Christopher Brown. Haptics in augmented reality. In Proceedings
IEEE International Conference on Multimedia Computing and Systems , volume 1,
pages 195–200. IEEE Comput. Soc, 1999.
[51] Peter Weir, Christian Sandor, Matt Swoboda, Ulrich Eck, Gerhard Reitmayr, and
Arindam Dey. BurnAR: Feel the heat. In 2012 IEEE International Symposium on
Mixed and Augmented Reality (ISMAR), pages 331–332. IEEE, November 2012.