Perception of 3-D location based on vision, touch, and extended touch
Nicholas A. Giudice,1 Roberta L. Klatzky,2 Christopher R. Bennett,1 and Jack M. Loomis3

1 Spatial Informatics Program, School of Computing and Information Science, University of Maine, Orono, Maine, USA
2 Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
3 Department of Psychological and Brain Sciences, University of California, Santa Barbara, California, USA
Abstract. Perception of the near environment gives rise to spatial images in working memory that continue to
represent the spatial layout even after cessation of sensory input. As the observer moves, these spatial images are
continuously updated. This research is concerned with (1) whether spatial images of targets are formed when they
are sensed using extended touch (i.e., using a probe to extend the reach of the arm) and (2) the accuracy with which
such targets are perceived. In Experiment 1, participants perceived the 3-D locations of individual targets from a
fixed origin and were then tested with an updating task involving blindfolded walking followed by placement of the
hand at the remembered target location. Twenty-four target locations, representing all combinations of two
distances, two heights, and six azimuths, were perceived by vision or by blindfolded exploration with the bare hand,
a 1-m probe, or a 2-m probe. Systematic errors in azimuth were observed for all targets, reflecting errors in
representing the target locations and updating. Overall, updating after visual perception was best, but the
quantitative differences between conditions were small. Experiment 2 demonstrated that auditory information
signifying contact with the target was not a factor. Overall, the results indicate that 3-D spatial images can be formed
of targets sensed by extended touch and that perception by extended touch, even out to 1.75 m, is surprisingly
accurate.
Keywords: Spatial image, extended touch, haptic perception, spatial cognition
Correspondence should be sent to Nicholas Giudice, Spatial Informatics Program, School of Computing and
Information Science, University of Maine, Orono, ME 04469. Phone 207-581-2187, Fax 207-581-2206
E-mail: [email protected]
Introduction
Vision, hearing, touch, and language provide information for the creation of perceptual representations (percepts) of
our surroundings. Such percepts give rise to transient spatial representations in working memory that persist well after the input has ceased, and they also serve as the source of enduring representations in long-term memory.
Although most research in spatial cognition has focused on vision as the input modality for these processes, there is
growing interest in multimodal spatial cognition—the development of representations based on different sensory and
linguistic inputs and their subsequent use in spatial judgments and action.
We are interested in a form of spatial representation maintained in working memory that plays a role in the control
of action when the source stimulus is temporarily interrupted or removed. We refer to this representation as a
"spatial image" (Loomis, Klatzky, & Giudice, in press-a). For an isolated visual, auditory, or haptic target, the
spatial image of a target is spatially coincident with the percept. As such, perceptual errors in distance or direction
are inherited by the spatial image and remain once the stimulus and resulting percept are no longer present. There is
growing evidence that the spatial image is amodal in nature: once formed, it no longer retains modality-specific information with respect to subsequent tasks that rely on it (Giudice, Betty, & Loomis, 2011; Giudice, Klatzky, & Loomis, 2009; Loomis et al., 2012; Loomis et al., in press). In this article, our interest is whether there are
spatial images of the perceived locations of targets sensed by extended touch (i.e., using a probe to extend the reach
of the arm) as there are with normal touch (e.g., Giudice et al., 2011) and, if so, to assess the accuracy of extended
touch relative to normal touch and vision.
Our interest in extended touch was prompted by the well-known phenomenological result that feeling with a probe
causes the observer to perceive the location where the probe contacts the surface rather than just perceiving the
vibrations in the handle (Gibson, 1966; Katz, 1925/1989; Lotze, 1894; Polanyi, 1966; Weber, 1846/1978). This
perceptual externalization of sensory information is often referred to as distal attribution (Epstein et al., 1986;
Loomis, 1992; Siegle & Warren, 2010). Going beyond phenomenology, many elegant empirical studies have been
conducted on "dynamic touch" (Gibson, 1966; Turvey, 1996), of which extended touch (involving contact of a probe
with a surface) is a special case. Turvey and associates have shown that people wielding a probe can judge the
length of the probe as well as the distance to the surface being contacted (Carello, Fitzpatrick, & Turvey, 1992;
Chan & Turvey, 1991) and the size of an aperture defined by two edges (Barac-Cikoja & Turvey, 1991). In making
these judgments, people are able to sense intrinsic properties of the wielded probe (e.g., its moments of inertia)
despite considerable variations in how the probe is held and the joints about which limb rotation occurs (e.g., elbow
and wrist). Related work has shown that bi-manual wielding of crossed rods enables observers to accurately judge
the intersection distance (Cabe, Wright & Wright, 2003).
While we appreciate that people are able to sense the intrinsic properties of the wielded probe as the basis for
judging where the contact point is in space, we are still fascinated by the phenomenological claim that people
actually experience the point of contact at the end of the probe. For a point of contact to be perceived without
ambiguity, the observer must effectively model the linkage from the body to the externalized world (Loomis, 1992).
The work cited above indicates that such modeling is possible when the extension is not geometrically complex,
such as a hand-held rod. This elegant achievement of our evolutionary history is evidenced not only in humans, but also in other animals that exhibit tool use (Maravita & Iriki, 2004; Povinelli, Reaux, & Frey, 2011).
Based on other research we have done on development of the spatial image as representing space around the person
(for review, see Loomis et al., in press), we assert that once striking with the probe has ceased, people continue to
represent the perceived contact point within working memory. The resulting spatial image can then serve to
support spatial behaviors, such as spatial updating, whereby the person can move flexibly through space while
updating the spatial locations of the initially perceived contact points. In addition to investigating whether such
spatial images can be formed with extended touch, we were also interested in assessing the accuracy and precision
of locations perceived using extended touch relative to normal touch and full-cue visual viewing.
To address these issues, we used a spatial updating task, where participants perceived a single target from a specific
observation point (the origin), then sidestepped 1 m left or right, and walked without vision toward the remembered
target location. Upon reaching the target, participants indicated its 3-D location by gesturing with the hand, a
response method developed by Ooi, Wu, & He (2004) and subsequently used in other research (Ooi & He, 2006;
Ooi, Wu, & He, 2006; Wu, Ooi, & He, 2004). Under the assumption that spatial updating is performed without
systematic error (see Loomis & Philbeck, 2008), the location indicated by the gesturing hand is a best estimate of the
initially perceived 3-D location. With good distance information available, the average gestured location to targets
above the ground and up to 7 m away is within 15 cm of the target (Ooi & He, 2006). In reduced-cue environments,
systematic biases in visual perception can be demonstrated by this blind walking/gesturing technique. For example,
when it is used to measure the perceived 3-D locations of nearby dim points of light viewed in an otherwise dark
room, the indicated location typically is farther from the viewing point than the target, indicating over-perception of
target distance, but essentially in line with the target, indicating correct perception of target direction (Ooi et al., 2001, 2006; Wu et al., 2004). We note, however, that the claim of correct perception of direction has been
challenged recently. Using a different method for measuring perceived direction, Li and Durgin (2012) concluded
that the visual direction of a target is systematically misperceived, with a consequent misperception of its distance.
Our primary concern in this experiment is to use the walking/gesturing technique to assess the accuracy with which
people can perceive the distance and height of a target using extended touch. However, we also varied the azimuth
of the target relative to the starting orientation of the observer as a manipulation of secondary interest. In the
experiment, targets were presented at two heights, two distances from the origin, and six azimuths relative to the
participants' starting orientation. Perception of all 24 target combinations was assessed for four experimental
conditions (the first three done under blindfold): (1) extended touch with a 2-m aluminum pole, (2) extended touch
with a 1-m aluminum pole following stepping forward 0.75 m, (3) physical touch after walking to the vicinity of the
target, and (4) normal visual viewing of the targets. The 1-m pole condition was included as a hybrid of the 2-m pole
and hand touch conditions inasmuch as it involved both extended touch and the proprioception of walking. In the
two conditions where the participants walked forward (2 and 3), they stepped backward to the origin after
responding, so that in all conditions, responses were initiated after side-stepping from the origin.
Having participants sidestep prior to walking and gesturing was intended to eliminate two alternative strategies that
would not rely on a spatial image. First, it precludes the possibility that the walking and gesturing responses might
be computed from the observation point and performed ballistically. Second, it prevents participants in the normal
touch condition from simply repeating the motor movements during the response phase that they made to reach the
target in the perception phase. In order for participants to respond to the initially perceived target following
sidestepping, they must mentally represent the target in working memory (i.e., form a spatial image) and then update
this representation during both sidestepping and the subsequent blind walking/gesturing. Good performance in the extended touch conditions of this study is possible only if accurate spatial images can be formed of the locations contacted by the end of the pole.
Method
Participants. Fourteen participants (7 female), ages 20 to 30 (M = 24.6, SD = 3.5), took part in the study. The
research was approved by the University of Maine’s local ethics committee and written informed consent was
obtained from all participants, who received monetary compensation for their time.
Apparatus. Targets consisted of two 6.6 cm diameter field hockey balls, which were affixed to the tops of two
microphone stands (0.5 and 1.5 m height). PVC tubing (4.5 cm diameter) was put over the stalk of the stand to
ensure a uniform surface along its extent. Two aluminum poles (1 m and 2 m, 1.5 cm diameter) served as the extended probes used to apprehend the targets (see Figure 1). A fingerless cycling glove with an infrared LED
affixed to the dorsal surface was worn in each condition. The LED was used to track user movement during the
blind walking tests by means of a four camera optical tracking system (PPT, Worldviz Inc., Santa Barbara, CA).
Recording of tracking data and sequencing of experimental trials was done using the Vizard 3D rendering suite
(version 3.17, Worldviz). A blindfold (Mindfold Inc., Tucson, AZ) was worn during all experimental trials, except
for those requiring visual inspection. A Nintendo Wiimote was used for making test responses in the blind walking
trials.
Figure 1. Three of the panels illustrate a participant feeling a target in the long pole, short pole, and hand touch
conditions. The fourth panel shows the participant making a gesturing response to the target location following blind
walking to its vicinity.
Procedure. The experiment had four separate conditions, each with 24 trials, and adopted a within-participants design, with each participant completing every condition. The 24 trials represented the different combinations of height, distance, and azimuth for target placement. The two heights used were 0.5 and 1.5 meters; the two distances were 1.75 and 1.25 meters from the origin; and the six azimuths used were -135º, -45º, 0º, 45º, 135º, and 180º
relative to the starting orientation (0º) at the origin (see Figure 2), with negative values indicating azimuths to the
left. Within our coordinate frame, x corresponds to left/right variation in the figure, y corresponds to height, and z is
parallel to the starting orientation. The 24 target location presentations were randomized over the 14 participants,
and condition order was counterbalanced as fully as possible given the number of participants using a Latin Square
design. The exception was the vision condition, which was always performed last. This was done as viewing was
performed with full cues and thus provided information about the room, which could have biased the non-visual
conditions if run earlier.
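To make the geometry of the design concrete, the following sketch (ours, not part of the original study) enumerates the 24 nominal target locations and converts each (distance, azimuth, height) triple into the x/y/z frame just described; the sign convention for x (negative to the left) is an assumption made only for illustration.

import itertools
import math

# Nominal design values from the Procedure (meters and degrees).
HEIGHTS = [0.5, 1.5]                      # y: target height above the floor
DISTANCES = [1.25, 1.75]                  # horizontal distance from the origin
AZIMUTHS = [-135, -45, 0, 45, 135, 180]   # relative to the starting orientation (0 deg)

def target_xyz(distance, azimuth_deg, height):
    """Map (distance, azimuth, height) to the frame used in the text:
    x = left/right, y = height, z = along the starting orientation."""
    a = math.radians(azimuth_deg)
    return (distance * math.sin(a), height, distance * math.cos(a))

# All 24 height x distance x azimuth combinations.
targets = [target_xyz(d, az, h)
           for h, d, az in itertools.product(HEIGHTS, DISTANCES, AZIMUTHS)]
assert len(targets) == 24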
Figure 2. Top down view of the space with a participant at the origin in the initial orientation (0°). Situated around the
person are all of the possible physical locations of the targets (height collapsed) as well as the two drop off points (1m to
the left and 1m to the right).
Prior to the start of the experiment proper, the blind walking task with target distances of 3, 5, and 10 ft (0.30, 1.50,
and 3.05 m) was demonstrated in a hallway outside of the lab room. Participants looked at a taped marker on the
floor, walked to the point with eyes closed, and then opened their eyes to get corrective feedback about differences
between the target and the stopping point. This was done for three replications at each distance.1 The participant
then entered the lab room. All experimental trials comprised a perception phase followed by a response phase. For
the three conditions involving touch, one practice trial preceded the experimental trials to acquaint the participant
with the procedure for exploring the target and then responding by blind walking and gesturing (with the target
removed). Participants were not allowed to view or haptically explore the rod and no feedback about the accuracy of
the response was provided.
The procedures for perceiving the target's location in the four different perception conditions are given below (see
Figure 1).
1) Long pole: In this condition, the blindfolded participant used a 2-m pole to explore each target and thus to
perceive its location. To begin, the participant stood at an origin within the lab facing the starting orientation (0º
azimuth), as indicated by a toe rest used to facilitate non-visual alignment. The participant held onto the top (handle
end) of the pole which was vertically oriented directly in front of the origin. Once the target was silently positioned,
the experimenter lifted the tip of the pole, which had been resting on the floor, until it was parallel to the floor,
rotated the pole until aligned with the target azimuth, and rested the tip on the target’s dorsal surface. The participant
was instructed to remain at the origin and rotate in place to follow the movement of the pole, such that it always
remained in front. Once the end of the pole was placed on the target, the participant had 20 sec to explore and thus
perceive the target’s location with respect to his or her own location in the room. This 20 sec period was found to be
more than sufficient from pilot studies in the lab. To explore the target, the participant was allowed unrestricted
movement of the pole up and down along the stand upon which the target was mounted. The participant could also
move the pole left and right to allow for triangulation or forward and back to calibrate the distance against the length
of the pole. To minimize the use of auditory and body-based distance cues, the participant was not allowed to rotate
or translate the head or body during exploration, was discouraged from tapping the target stand with the pole, and
could not sweep the pole back and forth between the target’s location and the origin (i.e. providing a distance metric
beyond the internalized length of the pole). After this perception phase, the experimenter grasped the tip of the pole
and rotated it (and the participant) back to the starting orientation.
2) Short pole: A 1-m pole was used in this condition. The initial procedure was identical to the long pole condition
except that after the participant rotated through the angle from the original orientation to the target location, he/she
stepped 0.75 m forward before the tip of the pole was placed on the top of the target by the experimenter. (This
distance was sufficient to reach both targets.) He/she then had 20 sec to explore and perceive the target as described
earlier. When finished, the participant translated 0.75 m backward to the origin and was rotated back to the starting
orientation with a short guidance rod.
3) Hand touch: In this condition, the blindfolded participant began at the origin. After target placement, the
experimenter handed him/her a short 10 cm rod which was used to guide him/her while rotating into alignment with
the target azimuth. Once the participant was aligned, he/she stepped forward until standing 0.25 meters in front of
the target (placing it within comfortable arm’s reach). The experimenter then guided the participant's hand to rest on
the top of the target, and the participant was allowed the 20 sec exploration period. He/she then translated backward
to the origin and was then rotated back to the starting azimuth with the guidance rod.
4) Vision: The initially blindfolded participant stood at the origin. Once the target was in place, he/she was
instructed to lift the blindfold while remaining at the origin, view the target, and then to rotate so that the target was
directly ahead. Once aligned, he/she had 20 seconds to view the target and then was asked to rotate back to the
starting azimuth under visual control and lower the blindfold.
For all conditions, the response phase began after the participant had returned to the starting azimuth following the
20 sec exploration/viewing period for each trial. It is important to note that the participant was informed that he or
she was being returned to the same origin, with the experimenter-controlled (or, with vision, self-controlled) rotation
and toe rest ensuring the same orientation for every trial. Prior to the actual response, the participant was instructed
to sidestep 1 m either left or right (stepping direction was counterbalanced over all trials, conditions, and
participants). Once positioned at the drop-off point, as indicated by toe rests for all conditions to facilitate alignment
with the starting orientation, he/she pressed a button on the Wiimote controller held in the left hand to indicate the
start of the response itself. Because the drop-off points differed from the origin, task performance required spatial
updating of the target during sidestepping in order to know the correct direction for walking toward the target. To
perform the blind walking/gesturing response, participants attempted to turn and then walk directly from the drop-
off location to the initially perceived and subsequently updated target location (which could be in front or behind
them) and to place the palm of their right hand at the height where they believed the object, now removed, had been
located (see lower right panel of Figure 1). Immediately upon reaching the target location and indicating its height,
the participant pressed a second button on the Wiimote controller. The button press logged the precise x, y, z
position calculated from the LED affixed to the glove on the right hand. This location constituted the response in the
subsequent analysis. After making the response, the participant was led back to the starting location and once again
was aligned with the starting azimuth. This perception/test procedure was repeated for each of the 24 trials in every
condition, for a total of 96 trials.
Results and Discussion
Prior to analysis, the nominal target locations were converted to measured locations with the LED affixed to the
glove of an experimenter who positioned his hand on top of each of the four targets at the 0° azimuth; this was
necessary to compensate for the fact that the LED affixed to the glove was several centimeters above the top of the
target when the hand was in the correct position. The initial analysis examined the target-to-response distance error
for effects of the order in which participants completed the haptic exploratory modes (recall that vision was
always administered last). The effect of order was small and inconsistent across exploratory modes (mean error was
61, 60, and 57 cm for orders 1, 2, and 3), and a set of t-tests comparing all possible orders within each exploratory
mode (with N for combinations of order and mode ranging from 3 to 6) produced no significant results.
Accordingly, data were pooled across order for subsequent analyses.
Biases in the Azimuths of the Response Centroids
Figure 3 gives a top down view of the response centroids (averaged over the two heights) as a function of perception
condition, distance of the target, azimuth of the target, and drop off point. The azimuths of the response centroids
exhibit large biases relative to those of the respective targets, depending on response location. The response
centroids from the location to the right tend to be leftward of targets, whereas responses from the location to the left
tend to be rightward of targets. The biases may reflect two different influences: an error in the representation of
target azimuth and an influence of sidestepping on the subsequent response. A factor contributing to representational
error may have been the relatively transparent geometry of the target layout, given the symmetry and small number
of locations; geometric regularities are known to induce response biases in spatial tasks (Huttenlocher, Newcombe,
& Sandberg, 1994). To the extent that sidestepping itself induces error, this tendency is inconsistent with previous
work showing that spatial updating is performed without systematic error (e.g., Loomis et al., 2002; Philbeck,
Loomis, & Beall, 1997). In particular, the influence of sidestepping was not observed in a previous study by the
authors (Klatzky et al., 2003).
The biases that depend on the response location are all approximately symmetric about the axis of the starting
orientation. This observation was confirmed by summing the signed errors perpendicular to this axis for each pair of
symmetric targets and finding that the 95% confidence interval (-3.8 ± 9.6 cm, using the 16 symmetric pairs as units
of observation) included zero. Accordingly, Figure 4 presents the same data after averaging over the two directions
of sidestepping and averaging by reflection over the two sides of this axis. In this figure, the response azimuth tends
to be pulled toward the axis of the starting orientation for targets off to one side, whether in front or in back.
These biases, after averaging out the systematic error due to response location, are evidence of residual influences in
how the target azimuth is represented. First, the degree of bias is much greater for targets off to one side than for the
targets directly ahead and behind. Average azimuthal errors for the folded data are 2.0º, -2.9º, -9.2º, and 8.0º for
targets with azimuths of zero, 180º, 45º, and 135º, constituting an approximately 4:1 ratio of error for oblique
relative to sagittal targets. Second, the centroids for vision are considerably closer to the targets than are the
centroids for the other three conditions. The average centroid-to-target distances are 12.1 cm for vision, versus 24.7
cm for hand touch, 21.3 cm for long pole, and 18.4 cm for short pole. There is good reason to expect that perception
of azimuth is more accurate with vision, for participants were able to see the separation between the starting
orientation and the target azimuth both before and during the rotation to face the target as well as to sense the body
rotation using proprioception. In the other three conditions, the only basis for perceiving target azimuth was to use
proprioception to sense the passive rotation of the body, as guided by the experimenter.
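As an illustration of the folding operation just described, the sketch below (ours; the function names and array layout are assumptions, not the authors' analysis code) reflects left-side responses about the starting-orientation axis before averaging them with their right-side counterparts, and sums the signed lateral errors of each symmetric pair as the symmetry check.

import numpy as np

def fold_pair(responses_left, responses_right):
    """Average responses to a symmetric target pair (azimuths -theta and +theta)
    after reflecting the left-side responses about the starting-orientation (z) axis.
    Each argument is an (N, 2) array of (x, z) coordinates, one row per response."""
    reflected = responses_left * np.array([-1.0, 1.0])   # flip x, keep z
    return (reflected + responses_right) / 2.0

def lateral_symmetry_sum(x_error_left, x_error_right):
    """Sum of signed lateral (x) errors for a symmetric pair; values near zero
    across pairs indicate that left and right biases mirror one another."""
    return x_error_left + x_error_right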
Figure 3. Top down view of the centroids of the response locations (averaged over the two heights), as a function of
perception condition, target location, and drop off points (to the left and right). The same symbol is used for the drop-off
centroid and drop-off point (DOC and DOP, respectively), as the locations of the latter immediately left and right of
center are clear.
Figure 4. Top down view of the response centroids, averaged over the two heights, over the left and right drop off points,
and averaged over left and right targets, after folding over the axis of the initial orientation (0°).
Accuracy and precision of perceiving and updating targets varying in height and distance
As mentioned in the introduction, our primary concern is with the accuracy with which distance and height are
perceived using extended touch. Given the clear azimuthal biases just discussed, our subsequent analysis
concentrates on the perception of distance and height. For this subset, analyses of exploratory-condition order again
showed no consistent effect. As is apparent in Figures 3 and 4, errors in distance (i.e., difference between walked
distance and origin-to-target distance) tended to vary little across values of azimuth. ANOVAs on signed error
conducted within each perception condition, with factors of target azimuth and target distance, confirmed the
absence of effects involving azimuth for all perception conditions except vision, where the main effect of azimuth was significant, F(5,13) = 2.83, p = .019, η2p = 0.10. The tendency in the vision condition was for responses along the 0° azimuth to be more accurate than along the obliques (average signed error = 1.7 cm vs. 9.6 cm, respectively).
We therefore chose to analyze the accuracy of distance and height responses for only this azimuth. If we assume that
participants erred in the direction of walking as a result of sidestepping, the distances of the responses would be
slightly underestimated if we simply took the coordinates of the responses in the z direction. Instead, we computed
the Euclidean distance of each response from the origin, taking into account both the x and z coordinates. By
partialing out azimuthal error in this way, we effectively defined a modified response point that lies within the yz
plane. Subsequent calculations of the centroids and response errors are based on these slightly modified response
points. Figure 5 shows, for each perception condition, a side view of the targets, the centroids of the responses for 0°
azimuth, and the respective standard errors of the mean for both height and distance.
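A minimal sketch of the correction just described (our illustration; the function name is hypothetical): each response is replaced by a point in the yz plane whose distance from the origin equals the response's horizontal distance, so that azimuthal error is partialed out while distance and height are preserved.

import math

def collapse_to_yz(x, y, z):
    """Project a response onto the yz plane, keeping its height (y) and its
    horizontal (x-z) Euclidean distance from the origin but discarding azimuth."""
    return (0.0, y, math.hypot(x, z))

# Example: a response 15 cm off-axis still counts its full horizontal distance.
print(collapse_to_yz(0.15, 0.52, 1.70))   # -> (0.0, 0.52, ~1.707)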
Figure 5. Side view of the targets and responses in each of four perception conditions. These data are based on only the
responses to the targets at 0° azimuth. Here a side profile of a “participant” is shown and in front are the four possible
physical targets (1.25m and 1.75m distances in combination with 0.5m and 1.5m heights). Because of slight azimuthal
errors, the centroids depicted here are based on the heights of the responses and their distances measured from the origin.
Associated with each centroid are the standard errors of the means in the height and distance directions. There were 14
participants in this experiment. Included in the panel for Vision is the mean gestured location ("centroid") to a target
viewed under reduced-cue viewing, showing the substantial distance overestimation typical of near targets (data from
Figure 5c of Ooi et al., 2006).
We would like to draw conclusions about the accuracy and precision with which targets are perceived from the
accuracy and precision of the responses. Clearly, the responses reflect more than perception, for the participant must remember the target's location, sense his/her motion through space, update the remembered location, and respond by hand
gesturing. The accuracy and precision of the responses reflect all of these sources of variation (for discussion, see
Loomis & Philbeck, 2008). However, to the extent that blind walking and gesturing is calibrated for the near environment under full-cue vision, as appears to be the case (Ooi & He, 2006; Wu et al., 2004), the
precision and accuracy of the responses place lower limits on the precision and accuracy of perception and allow
conclusions about the relative accuracy and precision of the four different modes of perception.
The responses to visual targets in Figure 5 constitute a gold standard for performance in our experiment. As in the
full-cue condition in the study by Ooi and He (2006), performance here is very good. More important is the
generally high accuracy of the responses for all four perception conditions, as indicated by the proximity of the
centroids to the targets. Especially of interest are the results for the long pole condition, which involves pure haptic
information without translations of the body. For all conditions, updating accuracy was assessed for distance and
height by calculating for each participant the signed error, i.e., the signed distance between the target and the
response either along the distance or height axis. Small values indicate high accuracy. Separate repeated-measures
ANOVAs were conducted on the height and distance errors, with factors of target height, target distance, and
perception condition. The analysis on distance error showed no significant effects. The analysis on height error
showed effects of perception condition, F(3,13) = 3.93, p = 0.015, η2p = 0.23, with means of 1, 9, 7, and 3 cm for
hand, long pole, short pole and vision, respectively. Paired t-tests were used to compare vision to each other
condition. There were significant effects (2-tailed) for long pole (p < 0.05) and short pole (p < 0.01), but not hand (p
> 0.50). Thus, as expected, vision tended to produce more accurate responses. There were also effects of distance,
F(1,13) = 5.79, p = .032, η2p = 0.31, and a height x distance interaction, F(1,13) = 9.34, p = 0.009, η2p = 0.42,
reflecting some over-estimation at the longer distance and greater height.
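The accuracy measure defined above reduces to a simple signed difference; the sketch below (ours, with hypothetical numbers) computes it along one axis for a set of per-participant responses.

import numpy as np

def mean_signed_error(responses, target):
    """Accuracy: mean of (response - target) along one axis (distance or height);
    values near zero indicate high accuracy, the sign indicates over/undershoot."""
    return float(np.mean(np.asarray(responses, dtype=float) - target))

# Hypothetical height responses (m) for a 1.5-m target.
print(mean_signed_error([1.53, 1.48, 1.61, 1.55], target=1.5))   # ~0.043 m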
We interpret the proximity of all of the centroids to the respective targets in Figure 5 as evidence of accurate
perception in all four conditions. However, because "accurate" is a relative term, it is useful to compare perception
in our experiment with perception in other studies. The panel in Figure 5 for vision includes the results for one
target viewed under conditions of reduced distance cues (from Figure 5c of Ooi et al., 2006). As in the current
study, participants indicated the perceived location of the target using the blind walking and gesturing procedure.
The centroid of the gestured locations, for 8 participants, can be seen to be more or less in line with the target from
the viewing location, indicating fairly accurate perception of direction, but is much farther away, indicating
significant over-perception of distance when distance information is impoverished.
Figure 6. Accuracy of distance perception in the current study and three other studies. The dashed line represents
accurate distance perception. The four lines in the center of the figure give the mean distances of the indicated locations
in the four conditions of the current study after averaging over the two heights used. The results of another extended
touch study are those from Experiment 1 of Chan and Turvey (1991). Large distance overestimation errors occur with
audition in a field outdoors (data from the Dynamic-4 m condition of Figure 3 of Speigle & Loomis, 1993) and with
reduced-cue vision in a darkened room (data taken from the Motion, Binocular condition of Figure 2 of Philbeck &
Loomis, 1997). Both of these latter studies used the blind walking method to measure perceived distance.
Other studies using similar methods of response also show evidence of large perceptual errors. Figure 6 gives the
results of two such studies. Speigle and Loomis (1993) had participants perform blind walking to sounds delivered
by loudspeakers in an outdoor field. Even with dynamic distance cues available during a short period of walking
prior to the blind walking response, auditory targets were perceived to be more than a meter farther away than they
actually were. Similarly, when participants viewed glowing targets at eye level in an otherwise dark room, very near
targets were perceived as more distant than they actually were (Philbeck & Loomis, 1997). Figure 6 also gives the
results of another extended touch study (Chan & Turvey, 1991, Experiment 1), obtained using a different response
method. Accuracy was on the same order as that in the current study.
Perceptual precision was assessed for both height and distance by computing the absolute deviation between the
participant’s response and the mean response, averaged over participants. These mean absolute deviations are
closely related to the standard errors for height and distance in Figure 5. Small values indicate high precision. The
ANOVA on height deviations produced effects of height, F(1,13) = 47.39, p < 0.001, η2p = 0.78, and height x
perception condition, F(3,39) = 3.09, p = .038, η2p = 0.19. The ANOVA on the distance deviations showed effects of perception condition, F(1,13) = 10.96, p < 0.001, η2p = 0.46, with means of 21, 30, 17, and 13 cm for hand, long pole, short pole, and vision, respectively. Paired t-tests were used to compare vision to each other mode. These were significant (2-tailed) for hand (p < .05) and long pole (p < 0.001), but not short pole (p > .20). Thus, not
surprisingly, vision is slightly more precise.
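The precision measure is likewise a mean absolute deviation from the across-participant mean response; a brief sketch (ours, with hypothetical data) follows.

import numpy as np

def mean_absolute_deviation(responses):
    """Precision: mean absolute deviation of each response from the mean
    response across participants; smaller values indicate higher precision."""
    r = np.asarray(responses, dtype=float)
    return float(np.mean(np.abs(r - r.mean())))

# Hypothetical walked distances (m) to a 1.75-m target.
print(mean_absolute_deviation([1.62, 1.80, 1.71, 1.91]))   # ~0.095 m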
A secondary purpose of the experiment was to provide evidence that a spatial image is formed of the point of
contact between the probe and the target being explored. While we believe that the usual blind walking/ gesturing
involves updating of a spatial image, requiring the participants to sidestep precluded them from simply making an
estimate of the contact point and performing the blind walking/gesturing response in ballistic fashion. Instead they
were forced to form a representation of the contact point (a spatial image) and update this location in order to
perform the subsequent response. Although sidestepping introduced some systematic bias, the fact that the response
centroids are close to the targets in height and distance is evidence of the ability of participants to form a mental
representation of the contact point.
Experiment 2
Although the instructions to the participant in Experiment 1 were for the purpose of minimizing any auditory cues
produced when the pole struck the target or its supporting stand, weak auditory cues might still have been available
in the two extended touch conditions. If they were, such cues would vitiate any claim we could make about extended
touch conditions based on haptic cues alone. Experiment 2 is a control experiment to confirm that the fairly accurate and precise performance in the two pole conditions of Experiment 1 was not the result of unintended auditory cues to the target locations. Here we compared performance with the long and short probe under the same
exploratory conditions as Experiment 1 and under exploratory conditions that eliminated auditory cues. Only direct
walking responses were required, since the issue is whether auditory cues facilitate the perception of target location,
a process that is independent of whether subsequent responses are direct or indirect. If no reliable differences are
observed between conditions, as is expected, then we have good evidence that the results of Experiment 1 were
driven solely by extended haptic exploration with the probes.
Method
Participants. Twelve new participants (6 female), ages 18 to 29 (M = 20.7, SD = 3.1), took part in the study. The
research was approved by the University of Maine’s local ethics committee and written informed consent was
obtained from all participants, who received monetary compensation for their time.
Apparatus and procedure. The apparatus used here was the same as Experiment 1, except that noise cancelling
headphones (Ryobi Tek, model RP4530) fed with white noise were worn during the perception phase on half of the
trials; in pilot testing, the noise level was chosen so as to block out all sounds in the lab, including those made when
the pole contacted the target or stand. The practice, exploration/perception, and testing procedures were the same as
in Experiment 1 except for a few modifications. Because our primary interest here was whether auditory cues might
have influenced extended touch in Experiment 1, we only used the 1-m and 2-m pole conditions. Half of the trials
with each pole were done without auditory masking, as in Experiment 1, and the other half included auditory
masking, per above. Practice trials employed masking and did not include experimental target positions. Masking
and no-masking trials were blocked, with long and short probe trials alternating. To keep the number of trials to a
minimum, only targets for 0° azimuth were used. Also, rather than requiring indirect walking from a left or right
drop-off point, as in Experiment 1, testing here was done using direct blind walking from the origin followed by
gesturing of the target position, similar to the method used in the previously mentioned studies of visual distance
perception (Ooi & He, 2006; Ooi et al., 2001, 2006; Wu, Ooi, & He, 2004). As in Experiment 1, the target heights
were 0.5 and 1.5 meters and the target distances were 1.25 and 1.75 meters. In Experiment 2, we added four
distracter trials in each condition, consisting of a 1-m-high target placed at distances of 1, 1.25, 1.5, and 1.75 meters.
The data from the distracter trials were not analyzed.
Results and Discussion
As in Experiment 1, the results are based on the locations of the 3-D gesturing responses. Here the four conditions
were long pole without noise mask, long pole with noise mask, short pole without noise mask, and short pole with
noise mask. As in Experiment 1, we assessed accuracy by calculating the signed errors between response locations
and the targets for both the distance and height axes, and we assessed precision by calculating the absolute
deviations of the response values from the corresponding means for the two axes. Accuracy is reflected in Figure 7
by the proximity of the response centroids to the targets, and precision is reflected in the size of the error bars
(standard errors of the mean). Separate ANOVAs were conducted to assess whether accuracy and precision were
affected by length of pole (1 m vs. 2 m), presence of a noise mask, which eliminated potential auditory localization
cues, and height and distance of the target. Here we concentrate on effects involving the noise mask factor,
potentially modulated by the pole and target locations.
Figure 7. Side view of the targets and responses in two exploration conditions (long pole, short pole), with auditory cues
present (as in Experiment 1, compare to Figure 5) versus masked by noise. Here a side profile of a “participant” is shown
and in front are the four possible physical targets (1.25m and 1.75m distances in combination with 0.5m and 1.5m
heights). Associated with each centroid are the standard errors of the means in the height and distance directions. There
were 12 participants in this experiment.
The ANOVA on signed error, the inverse of accuracy, showed no significant effects involving noise mask for either
height or distance. The ANOVA on absolute deviations, corresponding to the inverse of precision, indicated that
noise masking, which eliminated auditory localization cues, in some cases slightly improved the precision of the
response height for near targets, as indicated by a significant noise mask x distance interaction, F(1,11) = 5.46, p =
0.039, η2p = 0.33. In the analysis of absolute deviations, a slight increase in precision caused by the noise mask was
evident only with the short pole, leading to a significant interaction of noise x distance x pole length, F(1,11) = 7.14,
p = 0.022, η2p = 0.33.
The results for accuracy and precision reveal no facilitation due to auditory localization cues in this experiment; if anything, precision increased slightly in selected noise-masking conditions. We conclude that haptic
information alone is responsible for the generally high accuracy and precision observed with the short and long pole
conditions in both Experiments 1 and 2.
General Discussion
Previous research has demonstrated the ability of people to perceive and update haptic stimuli within arm’s reach.
This is true for both single targets (Barber & Lederman, 1988; Hollins & Kelley, 1988) and configurations of a small
number of targets (Giudice et al., 2009; Giudice et al., 2011; Pasqualotto, Finucane, & Newell, 2005). Rotational
updating of locations perceived beyond arm’s reach, i.e. by touching them with a cane, has also been demonstrated
(May & Vogeley, 2006). The present experiments used the blind walking/gesturing method of Ooi and her
colleagues (Ooi et al., 2001, 2006; Wu et al., 2004) to investigate the ability of people to perceive targets sensed
with a long probe, which extended the reach of the arm by 1 or 2 m. The 2-m pole condition was of greatest interest,
for it involved purely haptic exploration with the hand and arm.
Because we were primarily interested in the perception of targets varying in height and distance using extended
touch, we focused our analysis more on the targets that were initially straight ahead. In Experiment 1 both accuracy
and precision were best for those trials (Figures 3 and 4); in Experiment 2, which showed that auditory localization
cues were not a factor in extended touch, only straight-ahead targets were used. Figures 5 and 7 give these results for Experiments 1 and 2, respectively. As expected from prior work (Ooi & He, 2006; Ooi et al., 2001, 2006; Wu et al.,
2004), vision led to responses with the greatest accuracy (Figure 5). Precision was also slightly better. The most
notable result is the remarkably good accuracy and precision with which people perceive targets using a 2 m pole
(Figures 5 and 7). This result extends the research of Chan and Turvey (1991), which showed that people wielding a
probe were able to perceive the vertical distance to a horizontal surface up to 80 cm away. It is also noteworthy that
the short pole and hand touch conditions resulted in comparably good performance. Both of these conditions involve
haptic input from hand and arm and proprioceptive input associated with walking forward to reach the target and
then walking backward to the origin. Previous work investigating haptic learning during ambulation and hand
exploration, where blindfolded participants were guided through a room-sized layout of six sequentially exposed
objects, has shown accurate learning (Yamamoto & Shelton, 2005, 2007).
Although the manipulation of target azimuth in Experiment 1 resulted in significant azimuthal errors (Figures 3 and 4), participants nevertheless showed the ability to update the locations of targets initially lying in all directions about
the starting orientation. Although we did not report the results for the other azimuths, the accuracy and precision of the responses to height and distance for targets straight ahead were only slightly better than those for the other directions.
noteworthy that in all four of the perception conditions, participants were able to update the locations of targets
directly behind them while sidestepping. Indeed, in a study that controlled for differences in body turning during the
perception and response phases, Horn and Loomis (2004) showed that updating performance is about as good in
back as in front.
As discussed in the introduction, phenomenological reports indicate that the most salient aspect of the experience of
contacting a surface or point target with a wielded probe is not the vibrations and forces in the probe but the
perception of the point of contact. We maintain that the participant does indeed perceive the point of contact in
external space, not unlike perceiving the location of a target with the bare hand. Moreover, as other research has
shown with visual and auditory targets (for a summary, see Loomis et al., in press), accompanying the perceived
location is a more abstract spatial representation of the same location (spatial image) that continues even after the
stimulus and its corresponding percept have ceased. The experiments here provide evidence of a 3-D spatial image
corresponding to the perceived contact point between target and probe. In particular, requiring the participants to
sidestep precluded them from simply making an estimate of the contact point and performing the blind
walking/gesturing response in a ballistic fashion. Instead they needed to form a representation of the contact point (a
spatial image) and update this location in order to perform the subsequent response.
Our notion of the spatial image, as built up from extended touch, is that of a representation of the object at the end of the probe, established through a linkage between what is perceived from the external world and an internal model of the extension (Loomis, 1992). This idea is somewhat different from other views positing that use of a tool actually leads to a
change in the body schema, such that the representation of the limb expands to encompass the tool (Maravita &
Iriki, 2004). In other words, the perceptual-motor expansion of peripersonal space afforded by tool use leads to
expansion of the neural representation of the arm in our body schema, as evidenced by monkeys (Iriki, Tanaka, &
Iwamura, 1996) and humans (Cardinali et al., 2009). While these alterations of body schema are known to occur
after extended training, the current results were demonstrated almost immediately after initial perception with the
probe. We interpret these findings as supporting the development of a spatial image of surrounding space based on
accurate perception of the tool and an internal model of its extent, rather than inducing a more enduring modification
of the body schema. As the spatial image is postulated as representing the perceived contact point of the probe, it
can readily support spatial behaviors after the percept has ceased, such as the spatial updating performance shown in
this paper. This notion is congruent with the phenomenological claim that people experience the point of contact at
the end of the probe, rather than the probe itself (Gibson, 1966; Katz, 1925/1989) and is in agreement with the
suggestion that the tip of the tool is what is being represented, rather than considering it as an extension of the arm’s
representation in the body schema (Holmes & Spence, 2004). This interpretation is also consistent with the view that
we may have separate representations of the hand and tool which are co-registered during context-appropriate actions
(Povinelli et al., 2011).
Acknowledgements
This research was funded by NIH grant 1R01EY016817. The authors thank Mick Ramos for assistance in recruiting
and running participants and Tim McGrath for assistance with coding the experimental scripts.
Footnote
1. Practice with blind walking is not usually provided in studies using blind walking or blind walking/gesturing
techniques to assess distance perception. Even without it, research has shown that the mean indicated distances
using blind walking to visual targets viewed with full distance cues are very accurate (for a summary, see Loomis &
Philbeck, 2008). The ability to perform blind walking and blind walking/gesturing must require some calibration of
the overall scaling factor for walked distance. Normally, this calibration surely is provided by observing the
correspondence between walking speed and observed changes in the visual scene during normal ambulation
(Loomis & Philbeck, 2008, p. 20). Even with such calibration, large systematic performance errors are readily
apparent when distance cues are impoverished, errors that are traceable to errors in perceiving distance (for
summary, see Loomis & Philbeck, 2008). However, because we wished to minimize errors in perceived
displacements associated with blind walking, we chose to provide practice and feedback with only this component
of the response. Thus, the practice served only to precisely calibrate perceived displacements associated with
walking.
References
Barac-Cikoja D, Turvey MT (1991) Perceiving aperture size by striking. Journal of Experimental Psychology:
Human Perception and Performance 17: 330-346
Barber PO, Lederman SJ (1988) Encoding direction in manipulatory space and the role of visual experience. Journal
of Visual Impairment & Blindness 82: 99-106
Cabe PA, Wright CD, Wright MA (2003) Descartes’s blind man revisited: Bimanual triangulation of distance using
static hand-held rods. The American Journal of Psychology 116: 71-98
Cardinali L, Frassinetti F, Brozzoli C, Urquizar C, Roy AC, Farne A (2009) Tool-use induces morphological
updating of the body schema. Current Biology 19: R478-479
Carello C, Fitzpatrick P, Turvey MT (1992) Haptic probing: Perceiving the length of a probe and the distance of a
surface probed. Perception & Psychophysics 51: 580-598
Chan TC, Turvey MT (1991) Perceiving the vertical distances of surfaces by means of a hand-held probe. Journal of Experimental Psychology: Human Perception and Performance 17: 347-358
Epstein W, Hughes B, Schneider SL, Bach-y-Rita P (1986) Is there anything out there? A study of distal attribution
in response to vibrotactile stimulation. Perception 15: 275-284
Gibson JJ (1966) The senses considered as perceptual systems. Houghton-Mifflin, Boston
Giudice NA, Betty MR, Loomis JM (2011) Functional equivalence of spatial images from touch and vision:
Evidence from spatial updating in blind and sighted individuals. Journal of Experimental Psychology: Learning,
Memory and Cognition 37: 621-634
Giudice NA, Klatzky RL, Loomis JM (2009) Evidence for amodal representations after bimodal learning:
Integration of haptic-visual layouts into a common spatial image. Spatial Cognition & Computation 9: 287-304
Hollins M, Kelley EK (1988) Spatial updating in blind and sighted people. Perception & Psychophysics 43: 380-388
Holmes NP, Spence C (2004) The body schema and the multisensory representation(s) of peripersonal space.
Cognitive Processing 5: 94-105
Horn DL, Loomis JM (2004) Spatial updating of targets in front and behind. Paidéia 14: 75-81
Huttenlocher J, Newcombe N, Sandberg E (1994) The coding of spatial location in young children. Cognitive
Psychology 27: 115-147
Iriki A, Tanaka M, Iwamura Y (1996) Coding of modified body schema during tool use by macaque postcentral
neurones. Neuroreport 7: 2325-2330
Katz D (1989) The world of touch (L. E. Krueger, trans.). Lawrence Erlbaum Associates, Hillsdale, NJ (Originally published, 1925)
Klatzky RL, Lippa Y, Loomis JM, Golledge RG (2003) Encoding, learning, and spatial updating of multiple object
locations specified by 3-D sound, spatial language, and vision. Experimental Brain Research 149: 48-61
Li Z, Durgin FH (2012) A comparison of two theories of perceived distance on the ground plane: The angular
expansion hypothesis and the intrinsic bias hypothesis. iPerception 3: 368-383
Loomis JM (1992) Distal attribution and presence. Presence: Teleoperators and Virtual Environments 1: 113-119
Loomis JM, Klatzky RL, Giudice NA (in press) Representing 3D space in working memory: Spatial images from
vision, touch, hearing, and language. In: Lacey S, Lawson R (eds) Multisensory Imagery: Theory & Applications.
Springer, New York
Loomis JM, Klatzky RL, McHugh B, Giudice NA (2012) Spatial working memory for 3-D locations specified by
vision and audition. Attention, Perception, & Psychophysics 74:1260–1267
Loomis JM, Lippa Y, Golledge RG, Klatzky RL (2002) Spatial updating of locations specified by 3-d sound and
spatial language. Journal of Experimental Psychology: Learning, Memory, and Cognition 28: 335-345
Loomis JM, Philbeck JW (2008) Spatial perception with spatial updating and action. In: Klatzky RL, Behrmann M, MacWhinney B (eds) Embodiment, ego-space, and action. Taylor & Francis, New York, pp 1-43
Lotze H (1894) Microcosmus: An essay concerning man and his relation to the world (E. Hamilton and E. E. C. Jones, trans.), Vol. I, Book V. T & T Clark, Edinburgh, p 588
Maravita A, Iriki A (2004) Tools for the body (schema). Trends in Cognitive Sciences 8: 79-86
May M, Vogeley K (2006) Remembering where on the basis of spatial action and spatial language. Paper presented at the International Workshop on Knowing that and Knowing how in Space, Bonn, Germany
Ooi TL, He ZJ (2006) Localizing suspended objects in the intermediate distance range (>2 meters) by observers with normal and abnormal binocular vision. Journal of Vision 6: 422a
Ooi TL, Wu B, He ZJ (2001) Distance determined by the angular declination below the horizon. Nature 414: 197-
200
Ooi TL, Wu B, He ZJ (2006) Perceptual space in the dark affected by the intrinsic bias of the visual system.
Perception 35: 605-624
Pasqualotto A, Finucane CM, Newell FN (2005) Visual and haptic representations of scenes are updated with
observer movement. Experimental Brain Research 166: 481-488
Philbeck JW, Loomis JM (1997) Comparison of two indicators of visually perceived egocentric distance under full-
cue and reduced-cue conditions. Journal of Experimental Psychology: Human Perception and Performance 23: 72-
85
Philbeck JW, Loomis JM, Beall AC (1997) Visually perceived location is an invariant in the control of action.
Perception & Psychophysics 59: 601-612
Polanyi M (1966) The tacit dimension. Doubleday, Garden City, NY
Povinelli DJ, Reaux JE, Frey SH (2011) Chimpanzees' context-dependent tool use provides evidence for separable
representations of hand and tool even during active use within peripersonal space. Neuropsychologia 48: 243-247
Siegle JH, Warren WH (2010) Distal attribution and distance perception in sensory substitution. Perception 39: 208-223
Speigle JM, Loomis JM (1993) Auditory distance perception by translating observers. Proceedings of IEEE
Symposium on Research Frontiers in Virtual Reality, San Jose, CA, October 25-26, 1993
Turvey MT (1996) Dynamic touch. American Psychologist 51: 1134-1152
Weber EH (1978) Der Tastsinn (D. J. Murray, trans.). Academic Press, New York (Originally published, 1846)
Wu B, Ooi TL, He ZJ (2004) Perceiving distances accurately by a directional process of integrating ground
information. Nature 428: 73-77
Yamamoto N, Shelton AL (2005) Visual and proprioceptive representations in spatial memory. Memory &
Cognition 33: 140-150
Yamamoto N, Shelton AL (2007) Path information effects in visual and proprioceptive spatial learning. Acta
Psychologica 125: 346-360