This item was submitted to Loughborough's Research Repository by the author. Items in Figshare are protected by copyright, with all rights reserved, unless otherwise indicated.
Absolute auditory object localization
PLEASE CITE THE PUBLISHED VERSION
PUBLISHER
© Emily Shotter
LICENCE
CC BY-NC-ND 4.0
REPOSITORY RECORD
Shotter, Emily. 2019. “Absolute Auditory Object Localization”. figshare. https://hdl.handle.net/2134/10833.
This item was submitted to Loughborough University as a PhD thesis by the author and is made available in the Institutional Repository
(https://dspace.lboro.ac.uk/) under the following Creative Commons Licence conditions.
For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/
Absolute Auditory Object Localization
by
Emily Shotter
A Doctoral Thesis
Submitted in partial fulfilment of the requirements for the award of
Doctor of Philosophy
of Loughborough University
1997
© by Emily Shotter
Abstract
This thesis concerns the potential use of auditory virtual reality (AVR) in safety-critical
situations. Localization accuracy is essential in many VR situations, such as simulated
cockpits, where vision is fully occupied and targets must be signified acoustically.
However, the errors reported for localizing 3D sounds vary considerably in the
literature and some (e.g. Wightman & Kistler, 1989; Wenzel et al., 1993) report fairly
large errors. This thesis consists of an evaluation of the use of acoustic cues to indicate
the location of certain targets.
A Knowles Electronic Manikin for Acoustic Research (KEMAR) was used to examine
the effects of individualized pinnae on localization accuracy. The results showed that
using our own pinnae over foreign pinnae provides little or no benefit. More
surprisingly, substantial errors were observed in this study. This initial result drove the
fundamental investigation into the large angle errors.
The method of eliciting subject responses was investigated. The findings established
response method as an important methodological feature in localization experiments
from the significant effect it has on the results. Error values can be halved when using
a categorical method, compared to an unguided (non-categorical) method, possibly
because it constrains the subjects' response options. A further possible constraint on
subject responses is the effect of memory in absolute judgement tasks. If the memory
of one sound impinges on subsequent sounds then the subject's judgement is
constrained and the measurement of error may be contaminated. This effect was
studied by introducing variable delays that should affect memory to a different extent.
No obvious differences in accuracy were noted. This rules out 'interstimulus interval'
as a cause for the variability of reported angle errors.
Stimulus types were varied in an effort to maximise acuity. Although broadband
sounds are purported to give the smallest errors (e.g. Stevens & Newman, 1936;
Sandel et al., 1955), this investigation offered a unique comparison of long and short
duration broadband and complex sounds. But consistently high angle errors forced the
inclusion of non-acoustic cues such as vision and head movements, which decreased
the error to between 0° and 7°.
The implications for VR in light of the importance of vision (demonstrated by this
work) are that it is not advisable to implement an auditory cueing system that may
conflict or fail to be guided by vision. Where high levels of accuracy are required, as is
paramount in safety-critical situations, auditory localization is not sufficient as a sole
cue to target location.
Scientific conclusion: The acoustic cues alone (independent of context) cannot support
accurate auditory localization.
Applications conclusion: It is not advisable to implement an auditory cueing system that
is not guided by vision.
Acknowledgements
My sincere thanks go to Prof. Ray Meddis for all of his advice and support and for
being an excellent supervisor.
I wish to express my gratitude to everybody in the Speech & Hearing Laboratory in
Loughborough and the Hearing Research Laboratory in Essex; to Stuart Hunter for his
technical support, to Enrique Lopez-Poveda for being a true companion, both
academically and socially, to Lowel O'Mard for his expertise and help with all
computer-based problems and to Roel, for his company.
I would like to thank all in the Department of Human Sciences at Loughborough for
all of the academic and administrative support I have received, especially during my
final year away. Particular thanks go to the research students, not only for their high
expectations of me, which provided motivation and encouragement, but also for their
social succour.
My gratitude goes to all in the Department of Psychology at Essex University for
making me feel so welcome and for making available all facilities and technical support
I needed.
I would like to thank my family for always believing in me and for giving me the
freedom and opportunity to find my own way in my own time. Thank you for your
enthusiasm and optimism. And finally, to Toby, for his unswerving belief, support,
love and companionship.
Table of Contents
Abstract ................................................................. iii
Acknowledgements ......................................................... v
Table of Contents ........................................................ vi
List of Abbreviations and Acronyms ......................................... xi
List of Figures .......................................................... xii
List of Tables .................................................................. xviii
CHAPTER 1
General Introduction ..................................................... 1
1.1 Motivation ......................................................... 1
1.2 Objectives .......................................................... 2
1.3 Original contributions ............................................ 5
1.4 Overview of the thesis ............................................ 7
CHAPTER 2
Background and Literature Review ......................................... 12
2.1 Introduction to localization ....................................... 12
2.2 Pinna effects ....................................................... 15
2.3 Head movements ................................................. 18
2.4 Vision .............................................................. 19
CHAPTER 3
Methodologies .......................................................... 22
3.1 Introduction ........................................................ 22
3.2 Headphones and tubephones .................................... 24
3.3 Pinna moulding ................................................... 24
3.4 KEMAR recording procedures .................................. 27
3.5 Front-back correction ............................................. 29
CHAPTER 4 The Role of the Pinna in Sound Localization ........................ 30
Abstract ................................................................. 30
Introduction ............................................................. 32
Method ................................................................... 34
Results ................................................................... 37
Discussion ............................................................... 41
CHAPTER 5 Localization Judgements in the Azimuthal Plane .................... 43
Abstract ................................................................. 43
Introduction ............................................................. 44
Method ................................................................... 47
Results .................................................................. 50
Discussion ............................................................... 53
CHAPTER 6
Methodologies: Site of Recording, Playback Method and
Pinna Effects ........................................................... 55
Abstract ................................................................. 55
Introduction ............................................................. 57
Method ................................................................... 59
Results .................................................................. 62
Discussion ............................................................... 70
CHAPTER 7 The Effect of Interstimulus Delay and Response Method on
Localization Accuracy .................................................. 73
Abstract. ................................................................. 73
Introduction ............................................................. 74
Method ................................................................... 77
Results ................................................................... 82
Discussion ............................................................... 91
CHAPTER 8
The Effect of Stimulus Type and Response Method on
Judgement Accuracy ....................................................... 93
Abstract ................................................................. 93
Introduction ............................................................. 95
Method ................................................................... 98
Results .................................................................. 106
Discussion ............................................................... 108
CHAPTER 9
'Live' Relay using the KEMAR Manikin ............................. 112
Abstract .................................................................. 112
Introduction ............................................................. 114
Method ................................................................... 116
Results ................................................................... 119
Discussion ............................................................... 123
CHAPTER 10
Vision and Head Movements in Localization ........................ 125
Abstract .................................................................. 125
Introduction ............................................................. 127
EXPERIMENT 1 ................................................................ 130
Method ................................................................... 130
Results ................................................................... 132
Discussion ............................................................... 134
EXPERIMENT 2 ................................................................ 135
Method ................................................................... 135
Results ................................................................... 141
Discussion ............................................................... 154
CHAPTER 11
Head Movements using the Head Tracker ............................ 156
Abstract. ................................................................. 156
Introduction ............................................................. 158
Method ................................................................... 160
Results ................................................................... 164
Discussion ............................................................... 167
CHAPTER 12
General Discussion and Conclusions ................................. 169
12.1 Summary ......................................................... 169
12.2 Discussion and conclusions .................................... 170
12.2.1 Individualized pinnae ................................ 170
12.2.2 Pinnae/no pinnae ..................................... 170
12.2.3 Stimulus type ......................................... 171
12.2.4 Response method .................................... 172
12.2.5 Visual stimuli ......................................... 173
12.2.6 Live/recorded stimuli ................................ 173
12.2.7 Head movements ..................................... 174
12.3 Implications for VR ............................................. 175
12.4 Proposals for future work ...................................... 176
References ..................................................................... 179
APPENDIX 1
Pinnae Photographs ........................................................ 185
APPENDIX 2
Calculation of transmitted information .................................... 188
APPENDIX 3
Responses of the headphones and tubephones to a click ..................... 190
APPENDIX 4
Trial ordering - Chapter 8 ................................................ 191
APPENDIX 5
Subject Instructions ........................................................... 192
APPENDIX 5A (Chapter 4) ............................................................ 192
APPENDIX 5B (Chapter 5) ............................................................ 193
APPENDIX 5C (Chapter 6) ............................................................ 194
APPENDIX 5D.1 (Chapter 7) .......................................................... 195
APPENDIX 5D.2 (Chapter 7) .......................................................... 196
APPENDIX 5E.1 (Chapter 8) .......................................................... 197
APPENDIX 5E.2 (Chapter 8) .......................................................... 199
APPENDIX 5F (Chapter 9) ............................................................ 201
APPENDIX 5G (Chapter 11) ........................................................... 203
List of Abbreviations and Acronyms
ANOVA:    Analysis of Variance
AVR:      Auditory Virtual Reality
DAT:      Digital Audio Tape
dB:       decibel
dB SPL:   dB (re 20 × 10⁻⁶ Pa)
df:       degrees of freedom (from analysis of variance)
ft:       feet
g:        grams
HRTF:     Head-Related Transfer Function
Hz:       Hertz (cycles per second)
ILD:      Interaural Level Difference
ISI:      Interstimulus Interval
ITD:      Interaural Timing Difference
KEMAR:    Knowles Electronic Manikin for Acoustic Research
KHz:      kilohertz
LVP:      Lateral Vertical Plane
m:        metres
MVP:      Median Vertical Plane
Pa:       Pascals
secs:     seconds
SPL:      Sound Pressure Level
VR:       Virtual Reality
List of Figures
CHAPTER 3
Figure 3.1: Diagram (not to scale) of the manikin in the centre of a wooden hoop (3
m in diameter), used to support the speakers in the horizontal plane. All speaker
positions were fully adjustable. The hoop was supported on wooden struts, which
slotted into heavy metal base units to stabilise and secure construction.
Figure 3.2: Diagram (not to scale) of the speaker set-up for median plane (elevation)
source locations. The arrow shows the direction of movement around the manikin.
The range of possible speaker positions was -50° to +320° elevation (where 0° is
straight ahead at ear-level and 180° is directly behind).
Figure 3.3: Front-back correction of sound source ("A") judged to be at position "B".
The judgement is first shifted to the opposite hemisphere ("C") then the angle error
from this new shifted position is calculated ("D").
CHAPTER 4
Figure 4.1a: Response diagram given to subjects for azimuth judgements (actual
size). The head and horizontal plane are viewed from above. A separate diagram was
used for each response and subjects were free to put the cross anywhere on or within
the circle. Distance was not a variable and was ignored in the results.
Figure 4.1b: Response diagram given to subjects for elevation judgements. The head
is shown in profile and facing the median vertical plane. One diagram was used for
each judgement.
Figure 4.2: Mean angle error values for azimuth judgements for all subjects
combined. Data is both uncorrected and corrected for front-back azimuth errors.
Statistically significant differences were found between the uncorrected and front-back
corrected data, although no differences were present for the different pinna conditions
(ANOVA).
Figure 4.3: Mean angle error values for elevation judgements. The results are both
uncorrected and corrected for front-back azimuth errors. No statistically significant
differences were found either between uncorrected and front-back corrected data, or
between the different pinna conditions.
CHAPTER 5
Figure 5.1: Response diagram (actual size) given to subjects. For each stimulus
sound heard, subjects were forced to place their judgement at one of the speaker
locations (1 - 9). Subjects recorded their actual responses on a separate sheet.
Figure 5.2: Matrix showing the total transmission scores for all 16 subjects in the
binaural headphone condition.
Figure 5.3: Information matrix showing the mean angle error values for individual
source positions. The frequency of response and sum error in degrees is given for each
stimulus. (Note that the position numbers (1 - 9) listed along the top and down the side
of the matrix are rotated through +90° to obtain only positive angle error values). The
mean error for each source position is given in the extreme right hand column, with the
total mean angle error shown below.
CHAPTER 6
Figure 6.1a: Response diagram used for azimuth trials. Subjects were instructed to
mark a cross at the point of perceived sound origin.
Figure 6.1b: Response diagram given to subjects for elevation trials. Sound source
location was indicated by placement of a cross, anywhere on the perimeter of the circle
(distance was not a factor in this experiment).
Figure 6.2: The effect of the internal and external recording positions with headphone
and tubephone playback for front-back corrected angle errors. ±2 standard errors are
shown in each case.
Figure 6.3: Spectra of original (internal and external) stimuli with comparisons of
playback through headphones and tubephones.
Figure 6.4: Diagrams of original recording and playback positions. (Internal)
Original shows the microphone at the eardrum location of the manikin. External
original is the signal received by the manikin with the microphone at the meatus
entrance. The playback positions, to human subjects, show the tubephones - at the
eardrum, and headphones - close to the meatus entrance.
CHAPTER 7
Figure 7.1a: Blank response diagram given to subjects for the non-categorical
response condition. Subjects marked a cross on a separate diagram for each sound
heard. The diagram is actual size.
Figure 7.1b: Guidance diagram (half actual size) provided to subjects in the
categorical response condition. Subjects used the diagram in a forced-choice paradigm.
Figure 7.1c: Response sheet used in conjunction with the guidance diagram (Figure
7.1b). Subjects indicated a response letter from the guidance diagram next to each
stimulus number. A new response sheet was provided for each interstimulus delay
sequence.
Figure 7.2: Chart showing the mean angle errors of the categorical and
non-categorical response methods, broken down into interstimulus delay time. Statistically
significant differences exist (p ≤ 0.01, ANOVA) for all interstimulus delay times
between the two response methods, as shown by the ±2 standard error bars. There are
no significant differences between the different interstimulus intervals within each
response condition.
Figure 7.3: Errors by target angle for each of the interstimulus delay times for the
categorical response condition. No statistically significant differences were found for
judgement accuracy of each target angle within each interstimulus interval condition
(ANOVA). Random response values (0° to 90° range) are given for the categorical and
non-categorical response conditions to illustrate chance levels.
Figure 7.4: Errors by target angle for all interstimulus delay times for the
non-categorical response condition. There were no statistically significant differences for
response accuracy of the target angles within each interstimulus interval condition
(ANOVA). Random response values for a full 360° range of possible responses are
included to show chance levels.
CHAPTER 8
Figure 8.1a: Blank response diagram for azimuth with 10° markings around the
circumference. For subjects in the non-categorical condition, either with or without the
judgement aid (marker strip).
Figure 8.1b: Blank response diagram for elevation, with 10° markings. This was the
response sheet provided for the non-categorical response condition.
Figure 8.2a: Diagram of stimulus locations for azimuth. Each subject was provided
with this guidance diagram for reference throughout the categorical condition of the
experiment.
Figure 8.2b: Diagram showing the elevation stimulus locations. This diagram was
provided throughout the categorical response condition for reference.
Figure 8.2c: Response table used in conjunction with the categorical response
condition. The same sheet was provided for all azimuth and elevation trials (6 in total
per subject).
CHAPTER 9
Figure 9.1: Response diagram provided to subjects (1 for each pinna condition). The
square represents the environment/room in which the stimuli are played, viewed from
above. The head shows the manikin's position at the centre of the room. The
dimensions are not to scale and no furniture or fittings are shown.
Figure 9.2: Mean angle errors (front-back corrected) for the two presentation
conditions; live and recorded, for pinna and no pinna. ±2 standard error bars are
shown.
Figure 9.3: Percentage of front-back errors for pinna and no pinna for the live and
recorded presentations. ±2 standard error bars are shown. Although the standard
errors are large, a statistically significant difference (p ≤ 0.05, related t-test) was found
between pinna and no pinna for the live condition. There is also a small, statistically
insignificant (ANOVA, f = 3.32, df = 34) interaction between presentation method and
pinna condition.
CHAPTER 10
Figure 10.1.1: Movement of the head when either restrained by a head clamp or held
still but unrestrained. (NB: The y-axis does not represent absolute angles). The
measurements for azimuth, elevation and roll dimensions were taken simultaneously by
the Head Tracker. ±2 standard error bars are included. No statistically significant
differences (p~.05, related t-test) between the different head restraint conditions were
found.
Figure 10.2.1: Response diagram given to subjects in all conditions. Beside each
stimulus number a response letter had to be recorded (A - G), according to the
perceived location of the sound source.
Figure 10.2.2a: Diagram given to subjects showing the speaker locations in the
horizontal plane. Speakers were spaced at 30° intervals. Subjects heard each sound
source and were required to choose one of the letters, which represented actual target
locations.
Figure 10.2.2b: Diagram given to subjects representing 20° speaker spacing in the
horizontal plane.
Figures 10.2.2a-d: Confusion matrices showing the pattern of responses for the
free-field condition for (a) 30° speaker spacing with head fixed, (b) 30° speaker
spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20° speaker
spacing with head free.
Figures 10.2.2e-h: Confusion matrices for the ring playback condition for (a) 30°
speaker spacing with head fixed, (b) 30° speaker spacing with head free, (c) 20°
speaker spacing with head fixed and (d) 20° speaker spacing with head free.
Figures 10.2.2i-l: Confusion matrices showing the response pattern for the booth
playback condition for (a) 30° speaker spacing with head fixed, (b) 30° speaker
spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20° speaker
spacing with head free.
CHAPTER 11
Figure 11.1: Response diagram (actual size) given to subjects. Subjects marked the
numbers 1 to 14 on the diagram - the total number of stimuli per trial. A new
response sheet was provided for each of the 6 trials.
Figure 11.2: Mean angle errors with the Head Tracker switched on and off for the
three different head motion conditions ('still' but without restraint, a controlled
'movement to the right' of 45° and 'free' movement). A statistically significant
improvement was found for the 'move freely' condition with the Head Tracker On over all
conditions with the Head Tracker Off. There was also a statistically significant
improvement over the 'head move right' condition with the Head Tracker On (ANOVA,
f = 6.99, df = 2).
List of Tables
CHAPTER 4
Table 4.1a: Analysis of variance for azimuth. There are no statistically significant
differences between the three different pinna conditions; own, nonindividualized and
individualized. Data is corrected for front-back errors.
Table 4.1b: Analysis of variance for elevation (front-back corrected data). No
statistically significant differences are found between the different pinna conditions;
own, nonindividualized and individualized.
CHAPTER 5
Table 5.1: Individual information transmission scores (in 'bits') for all 16 subjects.
CHAPTER 6
Table 6.1a: Mean angle error values for headphone presentation with pinnae/no
pinnae and internal/external microphone placement for both azimuth and elevation
judgements. Statistically significant results (ANOVA, f = 3.96, df = 27) are in red.
Table 6.1b: Mean error values for tubephone presentation of stimuli. Results are
shown for pinnae/no pinnae and internal/external microphone positions for azimuth and
elevation judgements. Statistically significant differences (ANOVA, f = 3.96, df = 27)
are given in red.
Table 6.2a: Analysis of variance table for azimuth with front-back confusions
corrected. "p/np" represents 'pinna' or 'no pinna' conditions (with or without), "hp/tp"
refers to headphone or tubephone playback method and "i/e" represents internal or
external microphone placement. Statistically significant effects are shown in bold type.
Table 6.2b: Analysis of variance table for elevation, front-back corrected data.
"p/np" represents pinna condition (with or without), "hp/tp" refers to headphone or
tubephone playback method and "i/e" represents internal or external microphone
position. No statistically significant effects were found.
CHAPTER 7
Table 7.1: Starting points for all ten subjects for both sequence (different
interstimulus delays) and stimulus position within the sequence. Note that the ordering
for 'sequence' is a quasi-random selection taken from the full range of 4-factorial
permutations. Thus, a different interstimulus delay is used at least once as a starting
sequence, then two randomly picked sequence orders were added to make 6 in total -
the number required to give the full range of stimulus start points.
Table 7.1a: Analysis of variance table for all interstimulus intervals combined
showing the differences between different response methods. A statistically significant
improvement is found for the different (category) response methods.
Table 7.1b: Analysis of variance table for the 2-second interstimulus interval
condition. A statistically significant difference is found for response method (category)
but not between the different target locations.
Table 7.1c: Analysis of variance table for the 5-second interstimulus interval
condition. A statistically significant improvement for the categorical response method
was noted.
Table 7.1d: Analysis of variance table for the 8-second interstimulus interval
condition. A statistically significant difference was found for response method only.
Table 7.1e: Analysis of variance table for the 12-second interstimulus interval
condition. Statistically significant improvements were found for the categorical
response method.
CHAPTER 8
Table 8.1a: Summary of overall mean angle errors for front-back corrected azimuth
results. There are 9 different subjects in each condition: non-categorical (with reference
marker strip in the sound booth, but for azimuth only), non-categorical (with no
reference marker strip) and categorical.
Table 8.1b: Overall mean angle errors for elevation trials. The subjects are the same
in each condition as those in the azimuth trials. Responses are corrected for front-back
errors. The unusually large angle errors are little better than chance.
CHAPTER 9
Table 9.1: Analysis of variance for live and recorded presentations for pinna and no
pinna. No statistically significant effects were found, although the interaction between
pinna effects and presentation method was only marginally insignificant.
CHAPTER 10
Table 10.2.1a: Analysis of variance for all stimulus types combined for different
speaker spacings ("spacing") - 20° and 30°, playback locations ("place") - free-field,
ring and booth, and head restraint conditions ("movement") - fixed (clamped) or free.
Statistically significant results are shown in bold type.
Table 10.2.1b: Analysis of variance for the speech stimulus "Chips" for the two
different speaker spacings, three playback locations and two head restraint conditions.
Statistically significant results are shown in bold type.
Table 10.2.1c: Analysis of variance for clicks, for the two speaker spacing conditions,
the three playback locations and two head restraint conditions. Statistically significant
results are shown in bold type.
Table 10.2.1d: Analysis of variance for the noise stimulus for the different speaker
spacings, playback locations and head restraint conditions. Statistically significant
results are shown in bold.
Tables 10.2.2a-c: Mean angle error values for the free-field condition are shown in
10.2.2a. Averages are broken down into speaker spacing (30° and 20°), head restraint
(either fixed in a clamp or free to move), and stimulus type ("Chips", Clicks and
Noise). Identical breakdowns are given for subjects listening to pre-recorded stimuli in
the original recording set-up (i.e. with a visual correlate) in (10.2.2b) and for
pre-recorded stimuli played back in the booth (10.2.2c).
Tables 10.2.3a-c: Information analysis for the free-field (a), ring playback (b) and
booth playback (c) conditions. Breakdowns are given in terms of speaker spacing and
head motion conditions. Transmitted information is in bits and the corresponding
number of reliably identified positions, from the total of 7, is also shown.
CHAPTER 11
Table 11.1: Analysis of variance table showing mean angle error values. Head
Tracker status represents the Head Tracker On or Off. Motion refers to the three head
movement conditions; head still, controlled head movement to the right and head
allowed to move freely.
Chapter 1: General Introduction
CHAPTER 1
General Introduction.
1.1 MOTIVATION
Auditory virtual reality (AVR) is an integral part of the new simulated cockpit
environments. However, within these virtual reality (VR) set-ups, the localization of
auditory stimuli has been poor. This is a problem that cannot be afforded in such
safety-critical situations where auditory cues convey critical information, such as
enemy target location or instrument warnings. Auditory information is particularly
important during high-speed flight since vision may be almost fully occupied, making
sound a critical cue for directing attention. However, there is a real danger of the
signal being misjudged and confusing the user (rather than aiding them), given the
problems users have had in localizing the signals. There is clearly the need for a
fundamental evaluation of the use of acoustic cues to indicate the location of certain
targets.
The basic aim is to establish the accuracy that can be expected in VR systems,
through localization studies. Auditory localization is the process of determining the
position of sound sources in our external environment using auditory cues. Such a
phenomenon occurs daily as a matter of course, but is usually linked to visual
processes. Thus when we hear a bird singing in a tree, we will often direct our visual
gaze in the rough direction of the song (which is based very much on expectations and
experience as well as our ability to localize the sound) and use our eyes, not our ears,
to pinpoint the source precisely.
Although a significant amount of localization research exists, the errors reported vary
considerably between studies and there is little consistency in the literature. Initially
there is the need to try and resolve some basic issues, like the large mean angle errors
repeatedly obtained by some researchers and the substantial rate of front-back
confusions. These fundamental problems must be clarified before addressing the
complex applications of 3-dimensional sound simulation.
To study auditory localization as a separate phenomenon, hearing must be separated
from vision in an experimental situation. This is typically done either by use of a
blindfold, by presenting sounds that are simulated directly through headphones, or by
using sounds that have been pre-recorded and which are played back in an
environment that does not replicate the experimental set-up.
To overcome the problem of misjudgements and front-back confusions, a recently
developed piece of equipment called a 'head tracking device' can be incorporated into
virtual displays of this kind. A Head Tracker is able to monitor movements of the
head and adjust the auditory cue that is sent to the listener. The listener then
perceives the location of a signal as altered in accordance with their head movement,
producing a realistic response. So far no experiments have investigated the
effectiveness of head tracking devices - a step that will provide valuable comparisons
of 3D audio and head tracked 3D audio 'out of the head' sensations - highlighting
which problems (if any) are solved by the introduction of head movements. These
will be of significance not only to VR set-ups, but to the auditory field as a whole.
1.2 OBJECTIVES
The primary objective is to evaluate the role of spectral cues for localizing sounds.
Attempts will also be made to identify and investigate the fundamental variables
underlying the localization process. The literature does contain a range of localization
research, although within it there are many contradictions, omissions and questionable
methodology in terms of its application to everyday auditory events. This research
will attempt to rectify and resolve many of these problems, some of which are
outlined below, in an attempt to demonstrate what is realistic and achievable in
'virtual' auditory simulations.
One method of generating VR sounds, to produce a 3D localized image, is with a
KEMAR (Knowles Electronics Manikin for Acoustic Research). Another method
uses so-called 'head related transfer functions' (HRTF's) which are discussed later.
However, manikin recordings are a more direct method of reproducing the sound and
are therefore more likely to retain what may be vital signal content. They also
provide an accurate means of studying the usefulness of the spectral content of signals
to localization by retaining realistic spectral profiles in both ears (Durlach et al, 1992;
Hartmann & Wittenberg, 1996). The KEMAR is an artificial figure of a head and
torso, with removable pinnae and artificial ear canals, which are simulated using
Zwislocki Couplers. These are metal ear canal extensions that fit inside the ear
recesses of the manikin and have fittings for microphones to be attached at the
eardrum position. These microphones record stimuli played from speakers placed at
various locations around a room.
Using this technique, a number of methodological issues will be investigated that may
have a substantial effect on localization accuracy.
• A fundamental aspect of this research is to reproduce everyday listening
conditions where possible. Thus, all experiments will be conducted in a normally
reverberant large room. A highly reverberant environment would make
localization very difficult due to confounding incident and reflected waves, and an
anechoic environment would cut out valuable (and typical) reflection and
reverberation cues.
• For each experiment a fairly large sample of untrained subjects will be used in an
attempt to produce results representative of the normal population. This contrasts
with other auditory experiments, which use a small number of subjects.
• Pinna-based spectral cues have been demonstrated to play an important role in
elevation discrimination (e.g. Gardner & Gardner, 1973; Lopez-Poveda, 1996)
and by some researchers they are also deemed important for azimuth judgements
(Freedman & Fisher, 1968). However, it is still unclear whether the pinnae are
important for azimuth discrimination as well as elevation. Also, the fundamental
question of whether we need our own pinnae to accurately localize sound remains
unaddressed. The primary aim is to assess a listener's ability to localize with their
own pinnae, another person's pinnae or with no pinnae at all.
• Another major objective is to resolve the problems surrounding response method.
The means of eliciting subject responses varies a great deal in localization studies,
as does the methodology in general. Siegel & Siegel (1972) highlight two basic
ways of measuring accuracy: allowing subjects to either respond with their own
'free' judgement, or to choose from a number of given categories. But within
these types, methodologies vary considerably and do not allow for direct
comparisons and thus an understanding of the effect of response method. Because
using categories involves a great deal of constraint and guidance, the method of
response has a potentially large influence on judgement accuracy and requires a
controlled comparison. Tied in with response method are the effects of different
speaker spacings and total angular arc containing the sound sources. Shelton &
Searle's (1978) results demonstrated these factors to be important, but are unique
in doing so. Therefore the effect of different speaker spacings will also be
examined.
• The different stimulus types used in localization studies may have a large effect.
But from much of the literature, comparing different stimulus types directly is
difficult since it is impossible to disentangle their effect from other
methodological features. A controlled comparison of stimulus types is essential,
therefore, in attempting to provide the most easily identified signal in terms of its
position.
• The effect of head movements will be investigated using another method of sound
simulation. The new method uses HRTF's, which convolve audio stimuli with
pairs of filters that include ITD's, ILD's, pinna and ear canal effects. These effects
are measured as the sound enters the ear from a variety of locations. Responses at
resolutions finer than those actually measured are estimated using a linear
interpolation technique. Thus the convolvotron produces the same 'surround
sound' effect, without need for making recordings, and variables such as stimulus
duration and target locations can be more readily manipulated. However, this
method also adds two computational steps that are unnecessary for manikin
recordings - deriving the HRTF and generating new stimuli using these
functions. Nevertheless, using stimuli generated in this way allows a Head
Tracker to be incorporated into the convolving equipment to take account of head
movements. This constitutes the final stage of the research, and it will be of
extreme value to compare the HRTF data with the data from manikin recordings.
• Finally, a comparison of visually aided and visually unaided localization will be
carried out. Throughout, the project aims to assess localization acuity without the
aid of vision - whether an acoustic stimulus provides powerful enough cues to
location to be used on its own. Yet if the apparent accuracy of everyday abilities
cannot be matched, a visual element may need to be introduced. It may well be
that attempting to study auditory localization alone is akin to separating taste and
smell, in which case its function may be fundamentally reduced if used as a sole
cue. Attempts to measure the effect of vision will be conducted in a free-field
setting, with and without a visual link to the sources of sound.
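The HRTF-based synthesis described in the objectives above (convolving a stimulus with a left/right pair of filters, and estimating filters for unmeasured directions by linear interpolation) can be illustrated with a minimal Python sketch. The impulse responses, the 30 degree measurement spacing and all function names here are hypothetical placeholders; they are not measured HRTF data and do not represent the convolvotron's actual interface. Interpolating impulse responses tap by tap is only one simple reading of 'linear interpolation'.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def interpolate_filter(az, az_lo, h_lo, az_hi, h_hi):
    """Linearly interpolate, tap by tap, between two measured filters."""
    w = (az - az_lo) / (az_hi - az_lo)
    return [(1.0 - w) * a + w * b for a, b in zip(h_lo, h_hi)]


def spatialize(signal, az, measured):
    """Return (left, right) signals for a source at azimuth `az` (degrees).

    `measured` maps azimuth -> (left_ir, right_ir). Azimuths outside the
    measured range are not handled in this sketch (max/min would fail).
    """
    if az in measured:
        left_ir, right_ir = measured[az]
    else:
        lo = max(a for a in measured if a < az)
        hi = min(a for a in measured if a > az)
        left_ir = interpolate_filter(az, lo, measured[lo][0], hi, measured[hi][0])
        right_ir = interpolate_filter(az, lo, measured[lo][1], hi, measured[hi][1])
    return convolve(signal, left_ir), convolve(signal, right_ir)


# Toy impulse-response pairs (assumed values, NOT real HRTF measurements).
MEASURED = {
    0: ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    30: ([0.5, 0.25, 0.0], [1.0, 0.5, 0.25]),
}

click = [1.0, 0.0]
left, right = spatialize(click, 15, MEASURED)
```

For the unmeasured 15 degree direction the sketch blends the 0 and 30 degree filters equally before convolving, which mirrors the two extra computational steps noted above: obtaining the filter, then generating the stimulus from it.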
1.3 ORIGINAL CONTRIBUTIONS
This thesis offers a number of contributions to the field of auditory localization. The
main areas are outlined below.
• Localization in a normally reverberant environment is studied. Most studies using
pre-recorded stimuli are either conducted in an anechoic chamber or a sound
deadening room. Others have been conducted in the open air with few reflecting
surfaces nearby, to deliberately reduce echoes. This is because researchers are
generally interested in providing an uncluttered signal for localization. But this
does not represent what normally occurs. Whilst the incident wavefront is
paramount for accurate localization, the reflected waves play an important part in
determining distance and intensity (for example) and form part of our expected
and experienced auditory environment.
• To take account of the complex filtering characteristics, pinnae modelled on a
human subject are used in place of those supplied with the manikin. The
manikin's pinnae are standardised - a characteristic rarely true of an individual's
pinnae and it is of more value to use a genuine set of ears, that can be modelled
using a rubber-based substance called Otoform¹.
• A thorough breakdown and exploration of the recording-playback relationship and
recording techniques is given in the thesis. This partly involves examining
different stages of the recording process for possible loss of important
information. It also covers the effects of procedural and presentation differences.
¹ See Chapter 3, Pinna Moulding section.
• An investigation into interstimulus interval provided important information about
the constraints of memory in absolute judgement tasks. A study by Siegel &
Siegel (1972) argued that any memory of a previous sound would contaminate
subsequent judgements and the task would no longer be absolute judgement. In
this thesis, by manipulating the delay between sounds and noting the effect on
accuracy, it was possible to establish whether some studies might show increased
accuracy as a result of constraints imposed by memory of previous stimuli.
• The concept of reporting localization accuracy was examined by comparing the
commonly-used 'angle error' measurements with an information theory approach.
Angle error gives an average error value, whereas the information transmission
rate is used as an estimate of the maximum number of locations which could be
used without confusion. Information analysis may also be important if the
significance of the information coming from a sound source was partly defined by
its location.
• Controlled comparisons of response method and stimulus type and use of realistic,
familiar stimuli and fairly long-duration stimuli are much needed areas of
investigation. Although it is widely accepted that signals with a broad range of
frequencies provide increased localization cues (e.g. Stevens & Newman, 1936;
Sandel et al, 1955; Wightman & Kistler, 1993), there are a number of broadband
and complex stimuli available. The auditory stimulus itself forms a fundamental
part of every localization experiment and yet there are no genuine indications of
the effects these may have.
• A unique evaluation of the equipment utilised in VR auditory simulations was
conducted. Whilst many studies report the effect of head movements or lack of
head movements, these are usually conducted in free-field situations, which do not
accurately represent VR environments (e.g. Stevens & Newman, 1936; Makous &
Middlebrooks, 1990). Here, 'head-tracked' 3D audio sound was directly compared
to non head-tracked 3D audio, for sounds generated using HRTF's. This
equipment very much represents the technology used in VR systems and although
this is a fast-moving environment, necessary refinements and implementations are
suggested.
• Visual cues are incorporated towards the end of the thesis. Although visual cues
had deliberately been omitted throughout, it became clear that acoustic cues were
insufficient to support accurate localization except in a context. Such a context is
typically established by vision and so the effects of a visual correlate were
investigated.
1.4 OVERVIEW OF THE THESIS
A general background and review of the literature covering the area of localization is
in Chapter 2. This provides a framework for the thesis by highlighting the current
status of the literature in several areas of localization research. However,
identification and investigation of the unresolved elements or unidentified variables is
reported in the individual experiment chapters. Chapter 2 not only covers recent
work, but introduces the concept of localization and the development of many current
theoretical issues.
A detailed explanation of several methodological procedures, common to the majority
(if not all) of the experimental chapters, is given in Chapter 3. Firstly, the process of
moulding individual pinnae is described. So-called 'individualized' pinnae can be
used to replace the standard pinnae provided with a KEMAR manikin. Modelling the
pinna involves not only manufacturing a perfect replica of the outer ear, but fitting the
model to the manikin and maintaining the correct dimensions where necessary.
Secondly, KEMAR recording procedures are described. For all but one experiment,
recordings are made using the manikin. Recordings involve placing the manikin in a
normally reverberant large room amongst a number of speakers. These sound sources
are arranged in the horizontal and sometimes median vertical plane. Typically
between five and nine sound sources are used, although the stimuli and other
variables may change. Since the construction of the recording set-ups is generally
consistent, this chapter gives the reader descriptions, and in some cases diagrams, of
the apparatus and equipment used. Finally, the issue of front-back correction is
covered. This occurs during the analysis phase of an experiment and can have a large
impact on the results. A full description of the concept of 'angle error' and front-back
correction for azimuth and elevation is given.
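As a rough illustration of what 'angle error' and front-back correction involve for azimuth, the following Python sketch may help. It assumes a convention that is not taken from the thesis: azimuth in degrees, 0 directly ahead, 90 to the right, 180 directly behind. It also assumes the simplest correction rule, in which a response is mirrored about the interaural (left-right) axis whenever the mirrored response lies closer to the target. The procedure actually used is the one described in Chapter 3, and the function names below are hypothetical.

```python
def angle_error(target, response):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(target - response) % 360.0
    return min(diff, 360.0 - diff)


def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural (left-right) axis."""
    return (180.0 - azimuth) % 360.0


def corrected_error(target, response):
    """Return (error, was_confusion) after front-back correction.

    A trial is counted as a front-back confusion when the mirrored
    response is closer to the target than the raw response.
    """
    raw = angle_error(target, response)
    mirrored = angle_error(target, mirror_front_back(response))
    if mirrored < raw:
        return mirrored, True
    return raw, False


# A target at 30 degrees judged at 140 degrees (behind and to the right).
error, confused = corrected_error(30.0, 140.0)
```

In this example the raw error of 110 degrees falls to 10 degrees once the response is mirrored to 40 degrees, which shows how strongly the choice of correction rule can affect reported mean angle errors.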
The initial investigation is concerned with the role of the pinna (or outer ear) in our
ability to judge the locus of a sound source (Chapter 4). This fundamental issue is
critical to AVR which typically uses nonindividualized HRTF's. Thus, measurements
are taken based on another person's pinnae, which is the most practical method since
individual measurements would be grossly inefficient. However, if using
nonindividualized pinnae greatly reduces the potential accuracy of judgements, then
this could have serious implications for systems implementing this method of
generating sounds. An experiment comparing individualized with nonindividualized
and even no pinnae is critical in order to establish the necessity for our own ears.
Manufacturing individualized pinnae is done by taking moulds of the pinna of five
subjects and making identical sets of manikin recordings using the different pinnae.
Each listener is asked to judge the apparent location of the recorded sounds using
either their own pinna or the pinna of the four other listeners. In addition, judgements
are made from recordings made with no pinnae - flat surfaces fitted into the manikin
ear recesses, with a hole at the meatus entrance position. If subjects are significantly
more accurate with their own pinnae then this could have costly implications for the
future of Virtual Reality simulations that rely heavily on auditory cueing.
The errors obtained are surprisingly large in this study and even the use of
individualized pinnae does little to improve matters. A more extensive examination
of the fundamental elements underlying localization is required. But to begin with, it
is necessary to establish exactly how many positions in the horizontal plane could be
identified without confusion, using acoustic cues alone. This would be particularly
important if the significance of the information coming from a sound source was
partly defined by its location. The problem is approached in Chapter 5 using
information theory (e.g. Attneave, 1959; Edwards, 1969) where listeners are asked to
identify the location of a pre-recorded broadband click which is presented over
headphones or 'in-ear' tubephones (which deliver sound to the tympanum). The
information transmission rate was obtained by asking listeners to judge the location of
the sound from a fixed number of available response choices. The value in bits can be
converted into an estimate of the maximum number of locations which could be used
without error. Information analysis further identifies the response accuracy at each
source location, thus highlighting location-dependent response patterns.
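The conversion from a table of stimulus-response counts to transmitted information in bits, and from bits to an equivalent number of locations, can be sketched in Python as follows. This uses the standard information-theory identity T = H(S) + H(R) - H(S, R) and the relation N = 2^T; the function names and the example confusion matrices are illustrative assumptions, not data from the thesis, and no small-sample bias correction is applied.

```python
import math


def entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def transmitted_information(confusion):
    """Transmitted information T = H(S) + H(R) - H(S, R), in bits.

    `confusion[i][j]` is the number of trials on which source location i
    drew response category j.
    """
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    joint = [c for row in confusion for c in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(joint)


def equivalent_locations(bits):
    """Number of locations that could be identified without error."""
    return 2.0 ** bits


# Hypothetical data: four locations, ten trials each, all judged correctly.
perfect = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]
bits = transmitted_information(perfect)
locations = equivalent_locations(bits)
```

With perfect identification of four equally likely locations the sketch gives 2 bits, that is, four locations usable without error; responses spread evenly over all categories give 0 bits.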
Some of the issues raised in Chapters 4 and 5, in addition to a number of new
methodological questions, form the basis of Chapters 6 and 7. In an attempt to reduce
the large number of errors, Chapter 6 explores the techniques used for recording and
playback. The most common and simplest way to make a recording using a KEMAR
manikin is to use microphones attached to a Zwislocki coupler at a location
corresponding to the eardrum. The Zwislocki coupler simulates the ear canal.
However, if playback is through headphones, this creates a mismatch between the site
of recording and the site of playback. In effect, this method causes the sound to pass
through the concha and meatus twice. This problem can be solved in one of two
ways. The playback can be through tubephones, in which case the sound is delivered
close to the tympanum and matches the recording site. Alternatively, the recording
can be made using small microphones placed at the external entrance to the meatus
and close to the headphone playback site. Both approaches are explored. Results
from an earlier chapter also drive further investigation into the role of the pinna.
Interstimulus interval and response method are two factors that may have an
important effect on absolute judgement accuracy. Absolute judgement measures the
ability of listeners to judge the position of discrete, isolated sounds. Yet reported
studies rarely examine the extent to which the memory of a previous stimulus can
affect subsequent judgements. This may occur where a response to one stimulus
constrains the response to a subsequent stimulus, thus artificially reducing error
values. This concept was first introduced by Siegel & Siegel (1972). The point at
which the memory of one stimulus may interfere with another involves the
so-called 'interstimulus interval'. This is the delay between the individual sounds in a
sequence. In Chapter 7 the interstimulus interval is varied from one sequence to the
next and judgement accuracy is examined. If shorter interstimulus delays show
marked increases in judgement accuracy then this would indicate a threshold below
which memory has a strong influence. This would impose constraints on studies of
absolute judgement, by setting a minimum interstimulus interval.
A comparison of the method of eliciting subject responses in a number of reported
studies revealed that for studies using a forced-choice or categorical method, the
apparent accuracy was generally lower than for studies using no guidance or
categories to choose from. Nevertheless, these studies are not directly comparable
since the methodology varies considerably. This experiment offers a controlled
comparison of different response methods.
Chapter 8 further investigates the powerful effect that response method was found to
produce in the previous chapter. This is combined with a unique comparison of the
judgement accuracy of different stimulus types. Broadband or complex sounds were
chosen in an attempt to reduce the high localization error and ascertain the optimal
signal type for use in Virtual Reality displays. Whilst two of the sounds (clicks and
white noise) are used widely in localization studies, a complex and relatively
long-duration speech sound (the word "chips") is rare - vowels and vowel complexes are
more common. However, the familiarity and experience of complex speech sounds
should promote maximum acuity, helping to reduce the consistently high errors
obtained, particularly when judging elevation.
The results of Chapter 8 lead to a more thorough investigation of the sound
reproduction process, in an attempt to pinpoint a possible source of high angle errors.
Although a number of variables have been examined in an attempt to reduce error
values, few of these refinements have had any effect. Here, the recording process is
eliminated as a source of error by conducting a 'live' relay of the sound through the
manikin, to a subject seated in a remote location and listening through hi-fidelity
tubephones. As a control, these 'live' trials are also recorded and played back to
subjects from a tape, but in identical conditions.
The pinna, whose role for judgements in the horizontal plane remains unresolved, is
also investigated in this chapter. The manikin is therefore fitted with either no pinnae
or nonindividualized pinnae - a pair previously modelled on a human volunteer and
used throughout the thesis for nonindividualized pinna conditions.
The results show that the recording process does not lose any information, since the
accuracy of judgements remains consistent. It therefore appears that the physical
characteristics of the signal are not utilised by the listener sufficiently to obtain
adequate localization cues. Therefore, Chapter 10 incorporates two major factors that
have previously been omitted from all experiments - vision and head movements.
However, it has been necessary to exclude these factors so far in order to ascertain the
importance solely of spectral information.
Chapter 10 outlines two experiments. The first is concerned with the possibility that
for sounds recorded on a manikin and played back over headphones in a booth, any
head movements made by the subject will confound the signal. This may be a cause
of inflated angle errors and so must be investigated prior to fully incorporating head
movements. A Head Tracker is used to monitor the movement for a restrained
(clamped) and unrestrained still head. The results showed that the range of movement
for a clamped and non-clamped still head were almost identical, implying that small
head movements made by subjects in the booth would not affect judgement accuracy.
Experiment 2 evaluates the effect of either having a restrained, clamped head or being
able to move the head freely whilst listening. A comparison is also made between
providing a visual link to the sound sources and listening to sounds with no visual
correlate. Stimuli are either played in the free-field, to assess the role of head
movements and to include vision, or they are presented in a booth to eliminate the
visual element.
Findings from the free-field investigation provided the motivation for Chapter 11.
The role of head movements, which was not adequately resolved in the free-field
study, is subjected in this chapter to a more valid and rigorous investigation.
Localized sounds are generated in this experiment using HRTF's, not manikin
recordings. Head movements are incorporated by using a magnetic head tracking
device. This is a more faithful representation of the technique used to generate and
present sound in Virtual Reality simulations. Three conditions are used to evaluate
the effectiveness of head movements using such equipment. Subjects are able to move
their head freely as desired in the first condition. The second requires subjects to
make a controlled, specified movement, and in the third the head is kept still
(although it is not physically restrained). For all of these conditions the Head Tracker
is either switched on or off to compare the effect of accounting for, or failing to
account for, different movement patterns on localization acuity.
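The difference between the Head Tracker On and Off conditions can be pictured with a small Python sketch of yaw compensation. It rests on assumptions that are not stated in the thesis: angles are in degrees, positive to the listener's right, only rotation about the vertical axis is considered, and the function names are hypothetical rather than part of any real head-tracking interface.

```python
def wrap_degrees(angle):
    """Wrap an angle to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def rendered_azimuth(source_azimuth, head_yaw, tracker_on):
    """Azimuth, relative to the head, at which the sound is rendered.

    With the tracker on, the head's yaw is subtracted so that the source
    stays fixed in the room. With it off, the source is rendered at a
    fixed angle relative to the head and so turns with the listener.
    """
    if tracker_on:
        return wrap_degrees(source_azimuth - head_yaw)
    return wrap_degrees(source_azimuth)


# The listener turns 30 degrees to the right, towards a source at 30 degrees.
tracked = rendered_azimuth(30.0, 30.0, tracker_on=True)
untracked = rendered_azimuth(30.0, 30.0, tracker_on=False)
```

In the tracked case the source ends up directly ahead after the head turn, as it would for a real source; in the untracked case it remains 30 degrees to the right however the head moves, so head movement provides no additional cue.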
The final chapter (Chapter 12) summarises and concludes the work covered in the
thesis. A number of issues raised by the thesis are discussed and some strengths and
weaknesses are identified. Some outstanding areas of investigation are identified and
suggestions for approaching these problems are given. The issues raised must be
resolved in order to gain a complete understanding of the psychological and
physiological factors involved in sound localization.
Chapter 2: Background and Literature Review
CHAPTER 2
Background and Literature Review
2.1 INTRODUCTION TO LOCALIZATION
With the advent of 'virtual reality' information systems such as simulated cockpit
displays, comes the need for accurate simulation of auditory information as well as
the more obvious visual elements. Perhaps the most fundamental auditory process in
such systems is localization of sounds, an everyday process of locating sounds within
our external environment. Localization may be used either to direct visual gaze and
complement fully the visual experience, or as a sole cue for warning or information.
Within current VR systems, localization has presented some problems with accuracy
falling short of apparent 'real life' capabilities. This is perhaps because the subtleties
of localization are either taken for granted or overlooked. We still do not have a
complete knowledge of all of the variables involved in localization.
One of the earliest theories of sound localization was first introduced by Lord
Rayleigh (1907). He recognised that if the wavelength of a sound was short relative
to a listener's head, then there would be a 'head-shadow' effect. This 'shadow' would
be cast causing a difference in level between the ear closest to the sound and the ear
opposite the sound - an 'interaural level difference' (ILD). He also noted that the
distance from the source to each of the two ears would differ, causing 'interaural timing differences'
(ITD's). Rayleigh conducted an experiment with tuning forks and discovered that at
low frequencies a listener was more sensitive to ITD's. This is because the
wavelength is long enough to refract around the head, leaving minimal ILD's. He
thus hypothesized that our ability to localize is governed by ITD's at low frequencies
and ILD's at high frequencies - an idea known as the "Duplex Theory".
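The two quantities behind the Duplex Theory, the wavelength of a tone relative to the head and the size of the interaural timing difference, can be given rough numbers with a short Python sketch. The ITD formula used is Woodworth's spherical-head approximation, which is a later model and is not taken from this thesis; the head radius of 8.75 cm and speed of sound of 343 m/s are assumed values, and the formula is only intended for azimuths between 0 and 90 degrees.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second (assumed)
HEAD_RADIUS = 0.0875    # metres (assumed typical adult head)


def wavelength(frequency_hz, speed=SPEED_OF_SOUND):
    """Wavelength in metres of a tone at the given frequency."""
    return speed / frequency_hz


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, speed=SPEED_OF_SOUND):
    """Approximate ITD in seconds for a distant source.

    Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)),
    with theta the azimuth in radians, valid for 0 to 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (radius / speed) * (theta + math.sin(theta))


itd_side = woodworth_itd(90.0)   # source directly to one side
low_tone = wavelength(200.0)     # long relative to the head
high_tone = wavelength(5000.0)   # short relative to the head
```

Under these assumptions the ITD for a source directly to one side is roughly two thirds of a millisecond. A 200 Hz tone has a wavelength of well over a metre and so bends around the head, whereas a 5000 Hz tone has a wavelength of a few centimetres and is shadowed by it.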
A study by Stevens & Newman (1936) confirms Rayleigh's findings. They
investigated localization of pure tone bursts on the roof of a building - the location
being chosen to minimise reverberation, giving a more anechoic-type environment but
in free-field conditions. Subjects were required to estimate the location of the sound
source in the horizontal plane for a variety of frequencies, and results showed that
whilst sounds in the same location in front and behind were often indistinguishable,
left-right judgements were usually reliably accurate to an average error of ±14°.
Stevens & Newman noted that larger errors were rare at very low or high frequencies,
but in the midrange (around 3000 Hz), the error rate rose, indicating two different
mechanisms for sound localization; one functioning at very low range frequencies
and the other at high range frequencies, but with neither effectively operative in the
midrange.
Sandel et al (1955) have confirmed such findings regarding midrange inaccuracies,
when using an anechoic chamber. They expanded on the median range issue by
arguing that errors tended to occur between 1500 and 5000 Hz and that the greatest
errors occur at 1500 Hz, not 3000 Hz.
These timing and level differences between the two ears of the incident wavefront
form part of the fundamental framework of directional hearing cues. However, they
are by no means the sole constituents of our ability to localize. Timing and level
differences alone only allow a left-right location to be perceived. However, these
left-right locations will be heard intracranially (or 'inside the head'), because timing
and level differences alone do not produce an externalised image. The factors
involved in externalisation are discussed below. This intracranial perception is known
as lateralization, and differs from localization where the sounds are heard
extracranially, or 'outside the head'.
Most early studies of localization attempted to simulate directional sounds over
headphones by implementing ITD's and ILD's. However, they only achieved
lateralization. In 'real' listening conditions, the sounds are filtered by the head, torso
and outer-ears, or 'pinnae' which causes subtle changes to the stimulus that must be
accounted for in order to produce simulated 3-dimensional sounds.
Batteau (1967) recorded sounds using a metal tube (the width of a head) with
microphones at either end, representing the two ears. The bar was either fitted with
moulded pinnae or with nothing at all. When the recorded signals were played to
subjects over headphones, the pinna condition produced localized sounds, but where
nothing was used, the sounds were typically lateralized and very poorly judged in
absolute terms. Thus the pinna appears to contribute not only to externalisation, but
also to the general localization of a sound source.
Durlach et al (1992) outline other factors that contribute to externalisation, apart from
the more obvious pinna (and more minimal head and torso) cues. They argue that
head movements can help to identify targets and prevent them from being located
inside or very close to the head. This is done primarily by producing a binaural
change in the stimulus that is 'natural' and corresponds closely to a listener's everyday
experience. By holding the head still, unnatural conditions are created, involving the
listener's head position and expectations about a binaural signal, thus weakening
externalisation.
Durlach et al also consider reverberation as a factor in externalisation. They propose
that reverberations somewhat reduce the resolution of direction. However, this
reduction is limited by the so-called 'precedence effect', where the auditory system
enhances perception of the incident wavefront and suppresses subsequent echoes.
Reverberation also aids judgement of distance (which aids externalisation), but only
in reverberant environments. In 'anechoic' (non-reverberant) settings, loudness is the
only available cue to distance, although it is unreliable because the listener must have
an awareness of the original intensity of the sound source. So although reflections are
considered to confuse the listener, they can enhance distance information. It should
be noted, however, that reverberation as a cue to distance can be difficult to resolve.
Woods & Kulkarni (1992) compared perceived externalisation of manikin recordings
made in either an anechoic or reverberant setting. They found that the sounds
recorded in a reverberant room produced a far greater perception of externalisation
than those recorded in an anechoic room. Even with the KEMAR's pinnae removed,
the sounds were well externalised for reverberant conditions.
Hartmann & Wittenberg (1996) measured externalization using discrimination tasks
for simulated sounds over headphones. They argued that localizing (as opposed to
lateralizing) depends on the ITD's of low-frequency components (but not
high-frequency). But ILD's in all frequency ranges were equally important. They also
demonstrated that it is necessary to deliver a realistic spectrum to each ear and that
simply maintaining the interaural spectral level difference is inadequate. A simple
interaural spectral level difference did not produce a well externalised sound.
Thus externalisation appears to depend on a number of features, most, but not all, of
which are essential to 3-dimensional simulated (or pre-recorded) sounds if they are to
be localized by listeners. However, Durlach et al recognise that externalisation is not
wholly physical and that experimental methodology can have a small influence.
Internalised perceptions can be limited or ruled out by constraining the
available response choices.
2.2 PINNA EFFECTS
Batteau (1967) showed that the pinna plays some role in localization. He
hypothesized that this was due to refraction of the sound by the pinna causing a
transformation that would be unique according to the original source location.
Blauert (1969) offered support for this view by suggesting that the pinna acts as a
filter that attenuates or passes frequencies depending on their direction. Blauert
(1983), Oldfield & Parker (1984a, b) and Wright et al (1974) all went on to establish
the pinna as a direction-dependent filter that did indeed cause spectral changes to an
incoming signal. Hebrank & Wright's (1974b) study also revealed that the
cancellation of reflected sound at certain frequencies by the part of the pinna known as
the concha causes spectral notches that alone may provide azimuth and elevation
cues.
Elevation discrimination has been hypothesized (e.g. Butler, 1969; Gardner &
Gardner, 1973) to be the primary function of the pinna. Azimuth judgements have
been established by many (e.g. Rayleigh, 1907; Stevens & Newman, 1936; Sandel et
al, 1955) to be made more on the basis of the dominant interaural difference cues.
However, for elevation, particularly sounds that lie on the median plane, the ITD's
and ILD's are identical since the source is equidistant from both ears. The pinna may
therefore be the principal component of location identification.
Searle et al (1975) examined the role of the pinna by making physical measurements
of the transfer function from a (vertical plane) free-field source to a microphone in a
listener's ear canal. They indicated that there are two independent localization cues
generated by the pinna. The first is a change in frequency response as a function of
elevation in the median vertical plane (MVP), and the second is a disparity between
left and right ear responses, which also changes with elevation angle. Independent
psychophysical measurements indicate that these pinna cues are detectable by
subjects and that both cues are used in vertical localization tasks.
Investigations into the specific effects of the pinna folds and dimensions have also
been conducted by Gardner & Gardner (1973), who progressively occluded pinna
cavities to see the effect on localization in the median plane. They demonstrated that
localization ability does decrease with increasing occlusion, but found that
localization ability was not uniform. Localization was improved for signals in the
anterior sector of the median plane (as compared to the rear sector), and high
frequency signal content was discovered to be more important for accurate
localization than low frequency content. An experiment by Musicant & Butler (1984)
has shown that the importance of high frequency content is because the pinna
attenuates the high frequency components of a sound, above approximately 9 KHz,
when the stimulus is played behind a subject. This allows listeners to make front-back
distinctions, so providing valuable cues to azimuth as well as elevation and enhancing
localization accuracy.
Butler & Humanski (1992) studied binaural localization of lowpass & highpass noise
in the MVP. Seven speakers were located in a sound-treated room positioned
vertically at 15° intervals between 0 and 90°, 1.2 m from the head. They predicted
that localization performance on lowpass signals would not differ from chance values,
but that for highpass signals, performance would be significantly more accurate than
chance. They argued that this increased accuracy for highpass noise would result
from the availability of pinna cues for higher frequency sound. Their results showed
a mean error of 27° for the 3 KHz lowpass noise. For the 3 KHz highpass noise they
obtained a mean error of just 8°. Since the chance figure was 35°, their hypothesis
was confirmed, reinforcing the view that the pinna plays a dominant role in MVP
localization.
Butler & Humanski also compared binaural and monaural localization of low and highpass noise in
the lateral vertical plane (LVP). For monaural localization the judgement accuracy
for the highpass noise was significantly greater than for the lowpass signals (23°
compared to 33°). For binaural localization, the same trend was observed with errors
of 6° for highpass and 9° for lowpass signals - both of which were significantly
smaller than the monaural condition overall. They conclude that monaural spectral
cues do contribute toward localization accuracy in the LVP up to around 45°
elevation. To localize throughout the LVP (beyond 45° elevation), however, requires
interaural timing and level differences in addition to pinna cues.
For the majority of experiments investigating pinna cues, standardised pinnae are
used, although in reality pinna shapes are unique. Freedman & Fisher (1968)
investigated localization with individualized pinnae as part of their study. They
proposed that a listener's perception may be hindered by using standard pinnae
because the considerable experience and practice we have with our own ears may be
critical.
Their experiment compared using one's own pinnae with using nonindividualized
(standard) pinnae and no pinnae. The individualized pinna condition involved subjects
listening normally. The nonindividualized condition used 10 cm metal tubes to
conduct the sound to the ears, with casts of pinnae at the ends of these tubes. The no
pinna condition used sound conducted through the metal tubes only.
The first part of their study ruled out head movements in order to conduct a pure
evaluation of the role of pinna cues. They found individualized pinna and
nonindividualized pinna to give significantly greater accuracy than no pinna
(nonindividualized pinna = ±31.6°, no pinna = ±36.5° - surprisingly large results
which are not matched in the literature). However, no differences were found
between using one's own pinnae and standardised pinnae, implying that we do not
seem to require our own ears. A second experiment used the same conditions but this
time head movements were incorporated. Accuracy was similar to the condition with
restricted head movements, but no differences were found between the different pinna
conditions. Therefore, with head movement restricted the pinna appears to provide
important localization cues. But when head movements are incorporated the accuracy
is the same both with and without pinnae. However, the accuracy noted with head
movements was 22.5° overall which is still surprisingly high.
2.3 HEAD MOVEMENTS
Head movements are a factor in localization whose importance was first proposed by
van Soest (1929). He argued that if one were to perceive a sound from straight ahead,
then the absence of ITD's and ILD's means that a differentiation of front from back is
very difficult (i.e. 0° sounds can often be indistinguishable from 180° sounds, and
similarly with, say, 30° and 150° sounds). However, if the head is moved, say, to the
right, then sound would reach the left ear first, enabling the listener to disambiguate
its direction. As the majority of studies do not incorporate head movements (unless
conducted 'live' in the free-field as opposed to being pre-recorded and presented over
headphones) front-back azimuth confusions are commonplace (e.g. 12% reported by
Wightman & Kistler, 1989; 26% by Wenzel et al, 1993).
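Van Soest's argument can be illustrated with a minimal sketch. It assumes a simple sine-law approximation of the ITD, a hypothetical head diameter of 0.18 m and a speed of sound of 343 m/s; none of these values or names come from the studies cited, and a positive result is taken to mean that the right ear leads. Azimuth is measured clockwise from straight ahead (90° to the right), and a positive head yaw is a turn to the right.

```python
import math

HEAD_DIAMETER_M = 0.18      # assumed, illustrative value
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at about 20 degrees C

def itd_seconds(source_az_deg, head_yaw_deg=0.0):
    """Approximate ITD using a sine law; positive means the right ear leads."""
    relative_az = math.radians(source_az_deg - head_yaw_deg)
    return (HEAD_DIAMETER_M / SPEED_OF_SOUND_M_S) * math.sin(relative_az)

# With the head still, 30 deg (front) and 150 deg (rear) give the same ITD.
still_front = itd_seconds(30.0)
still_rear = itd_seconds(150.0)

# Turning the head 10 deg to the right separates a 0 deg source from a 180 deg one:
# the frontal source now reaches the left ear first, the rear source the right ear.
turned_front = itd_seconds(0.0, head_yaw_deg=10.0)
turned_rear = itd_seconds(180.0, head_yaw_deg=10.0)

print(still_front, still_rear)
print(turned_front, turned_rear)
```

In this simplified model the sign of the ITD change after a head turn is what distinguishes front from back, which is the cue that is lost when recordings are presented over headphones with the head position fixed.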
Van Soest's findings were supported by Wallach (1939 & 1940), who demonstrated
that moving the head during a sound provides cues for several lateral angles for the
same sound source direction. He argued that this sequence of lateral angles will
accurately determine a particular location. An important part of this motion is also a
disambiguation of so-called 'front-back confusions'. Wallach differs from many
researchers (e.g. Gardner & Gardner, 1973; Musicant & Butler, 1984) in his belief
that the pinnae are only important in reducing front-back errors in the absence of head
movements, which is not a common occurrence in everyday listening.
The work of Young (1931) describes the importance of head movements in terms of
externalisation. He studied the effect of either a still head or a moving head on the
binaural stimulus pattern and found that where head movements are available,
reliable, accurate, 3-dimensional localizations are possible. But when head
movements are ruled out, only restricted 2-dimensional judgements can be made.
Experiments by Pollack & Rose (1967) investigated the role of head movements in
localization. In one condition they systematically varied the duration of the sound
source and compared situations where head motion was either allowed (generally or
turning to face the sound source) or restricted. Their results revealed that head
movements do assist in localizing a sound source, but only one condition in their
series of studies yielded a significant improvement - when the subjects turned to
face the sound source. Turning to face the sound may aid localization because sounds
are more accurately located in the midline (Mills, 1958; Perrott, 1984). Thus, if
subjects did not position the sound at the centre of their heads then stimuli would be
judged 'off-centre' and may therefore be less accurate. Yet other studies, such as
that of Thurlow & Runge (1967), found a clear improvement in localization for all
positions when head movements were available and did not report any specific
movement or location conditions. However, Thurlow & Runge did find that the
improvement, although statistically significant overall, was typically less than 30%.
Yet head movement is still commonly restricted in studies to maintain consistent
input, which may not be entirely representative of a "normal" hearing experience, but
provides a "pure" measure of human localization accuracy using only the spectral
content of a signal and transformations by the pinnae, head and torso.
2.4 VISION
Visual stimulation may also affect the apparent location of a sound. Along with head
movements, it is little researched and often omitted from studies to examine more
subtle physiological effects such as location-dependent spectral changes.
One of the earliest absolute localization studies incorporating vision was conducted
by Jackson (1953). He reports two experiments that compare the judgement accuracy
of an auditory stimulus alone and an auditory stimulus accompanied by a visual
stimulus. His first experiment uses 5 bells placed along an arc in the frontal azimuth
plane, spaced at 22.5° intervals. In the first condition, a bell was rung on its own and
subjects had to report the location of the source from a number of options. In the
second condition, the bell sound was accompanied by a light, independent of the
sound source, shone either at the same or a different location. Subjects had to report
the apparent position of both the bell and the light and it was expected that the
presence of the light would alter the perceived location of the bell. Indeed, the
addition of the light increased accuracy from 46% to 60% if the bell and the light
were in identical locations. However, this difference was not statistically significant.
In the second experiment, 7 whistles were placed 30° apart along the same azimuthal
arc. These whistles were either played alone or were accompanied by an unrelated
puff of steam, presented at either the same or a different location. As in the first
experiment, the addition of vision increased accuracy from 62% to 99% if the whistle
and steam were at the same position (a statistically significant improvement).
Where the auditory and visual stimuli deviated by 20 - 30° the proportion of correct
responses to the auditory stimulus fell to 38% in the first experiment and just 3% in
the second experiment. The percentage of misguided responses to the visual
cues, by contrast, was 43% for the first experiment and 97% in the second - vision clearly
overriding the auditory cues. At deviations of 45° or more the number of correct
responses to the auditory signal remained similar for experiment 1 but was higher for
experiment 2 and the responses to the visual stimuli decreased in both the bell and
whistle experiments. Thus as they reach sufficient distance from each other, the cues
are identified correctly by subjects as being separate.
Jackson's study clearly shows that the effect of vision can be strong and even
misleading when localizing a sound source. This is a good illustration of the so-called
"ventriloquism effect" - where the presence of a corresponding visual object can
bias judgements of the perceived location of auditory objects (Pick et al, 1969).
Lovelace & Anderson's (1993) study also looked at the effect of vision on auditory
location identification, but where no visual information was associated with the
sounds. The apparent location of a speech stimulus was judged by pointing to a
concealed target sound with subjects either being sighted (able to see an arc marked
out with measurements in degrees, but not the sound sources themselves) or
blindfolded. Unsighted subjects made errors of ±6.8°, compared to the sighted
subjects whose average error was ±3.79° - a statistically significant difference.
These findings illustrate that the general presence of vision appears to increase our
ability to localize. Since no visual link is provided with the sound source, general
vision perhaps informs us of more subtle cues about our acoustic environment that
aid localization. Indeed, these results offer support to Shelton & Searle (1980),
whose research compared the mere presence of a visual environment with localization
in darkness. Their findings revealed that localization accuracy is marginally greater
in the light than in the dark. However, Lovelace & Anderson went on to ascertain
that their noted improvement may simply reflect using vision to calibrate hand
movement. Thus, the true role of vision in localization remains rather ambiguous and
unresolved.
It is evident that a number of influential variables are encompassed within the
localization process, although much ambiguity surrounds the precise role and
contribution of any of these factors, as is clearly demonstrated by the diversity of
results yielded by studies in this area. Without more reliable information about such
processes the application of audition to virtual reality systems will fall a long way
short of producing the 'realistic' sound that is required. Hence these experiments set
out to solidify and expand upon existing knowledge with the aim of further improving
and refining the auditory element of VR displays.
CHAPTER 3
Methodologies
3.1 INTRODUCTION.
This chapter is intended to elaborate on some of the basic methodological processes
that are involved in constructing a sound localization experiment. Whilst each
chapter describes individual relevant procedures and techniques, there are methods
which apply to all experiments and would benefit from a more in-depth explanation.
The first section describes the characteristics of headphone and tubephone listening.
In all experiments, pre-recorded or generated sounds are listened to through
headphones and/or tubephones.
The pinna moulding process is given in some detail, since throughout the thesis,
non-standard pinnae were used on the manikin, to simulate a normal hearing experience
more accurately.
The section on KEMAR recording procedures describes the experimental conditions and set-up and
gives a detailed description of the equipment that is used in almost all experiments.
Another technique referred to throughout the thesis is 'front-back correction'. The
reasoning behind these corrections and an example of the technique for calculating
front-back errors is given.
3.2 HEADPHONES AND TUBEPHONES
The headphones used are Beyer Dynamic DT 48 'closed', which cut out most
background sounds. These sit over the entire pinna and deliver sound within the
concha, opposite the meatus entrance. Headphones are typically used to play back
to subjects sounds that have been recorded using a manikin (see section 3.4 below).
However, the simplest and most common method of making manikin recordings is to
use fittings known as Zwislocki Couplers. These hold the microphones in place at the
eardrum position and create an artificial ear canal. Thus, headphones do not deliver
sound to the point of recording and produce a 'double travel' down the ear canal and
additional concha resonance (discussed more fully in Chapter 6). In an attempt to
deliver sound to the exact recording location, so-called 'in-ear' tubephones (Etymotic
ER-2) were used in place of or in addition to headphones in several experiments.
These are narrow tubes which are inserted into the ear to within 0.5 cm of the
eardrum. They are held in place by a small foam earplug, which sits just inside the
meatus. The tubephones were expected to increase judgement accuracy by retaining a
more faithful reproduction of the original signal.
3.3 PINNA MOULDING
A silicone-based rubber called Otoform¹ was used for making individual pinna moulds
that could be used to replace the standard KEMAR pinnae².
Ethical clearance must be obtained before the process can begin. The subject is first
required to undergo an examination of the outer ear, meatus and eardrum by a trained
technician. If there are any reports of infection or discomfort by the subject or if the
inspection reveals any sign of infection, the process is terminated for that subject. If
no problems arise up to this point, the procedure is fully explained to the subject. If
they are comfortable with the procedure a consent form is signed. The process of
pinna moulding comprises the following stages:
1. Producing a 'negative' mould of the entire pinna area and meatus entrance.
2. Making a 'positive' impression of the pinna using the negative mould as a
cast.
3. Making a mould of the left and right KEMAR pinna-fittings.
4. Fitting the moulded pinna to the KEMAR mould.
¹ Otoform-K2. Condensation-Vulcanising Silicone Impression Material with Hardener, Cat. No.
071K2: By P. C. Werth Ltd., 45 Nightingale Lane, London, SW12 8SP, UK. Fax: 01816757577.
² See Appendix I.
3.3.1 Producing a 'negative' mould of the pinna area
The moulding composite for a single pinna mould is prepared by mixing 80 g of
Otoform with 0.5 ml of hardener. The compound is then transferred into a 100 ml
syringe. Each subject is prepared by placing a small foam 'otostop' into the ear canal,
as far down as is comfortable for the subject but several millimetres from the
tympanum. The otostop prevents any Otoform from coming into contact with the
eardrum and causing damage. It is easily removed by pulling on the 2 cotton strands
that are sewn into the foam and which hang roughly 3 cm outside the meatus
entrance. These strands are adequately strong without being thick enough to interfere
with the modelling process.
The subject's entire pinna is cleaned with cotton wool soaked with alcohol. With the
head on one side, the Otoform compound is squeezed into the ear, starting at the
meatus entrance and working outwards to fill the concha and pinna-flange. The
Otoform is then left for approximately 20 minutes, after which time the cast can be
removed fairly easily, although considerable care must be taken. The process is
repeated for the second pinna and the casts are left to harden fully for a further 48
hours.
3.3.2 Making a 'positive' impression of the pinna
Using the fully hardened, but still flexible negative pinna impression, the positive
mould is produced. The cast is lined with a thin film of petroleum jelly to prevent
sticking, since the negative and positive moulds are made of the same substance. This
time the composite comprises 30 g of Otoform and 0.15 ml of hardener. The Otoform
mixture is again transferred into a 100 ml syringe and squeezed carefully into the
entire cast (including the ear canal section), ensuring that no air-bubbles form. The
compound is left for 18 hours and is then removed slowly, taking care not to tear or
distort the pinna mould. The positive impression is cleaned and any minor
imperfections or small tears can be repaired by smoothing in more Otoform mixture.
3.3.3 Making a mould of the KEMAR fittings
The KEMAR has two square recesses on either side of the head, into which each
pinna fits. The new moulds of a listener's ears must therefore be fused to a mould of
this recess (a 'KEMAR-fit') before the whole pinna can be fitted to the manikin.
First, a cast is made of the square recesses using the same Otoform compound that
was used to manufacture the pinna cast (section 3.3.1). This cast includes the ear
canal (simulated using a Knowles occluded-ear simulator, model DB-100). A
KEMAR mould was then produced from this cast using a hard substance (Isopon Car
Body Filler). This substance sets completely hard to produce an accurate
representation of the manikin recesses that can be used for all subsequent pinna
moulds. A KEMAR-fit is then manufactured by filling the hard cast with the Otoform
substance.
3.3.4 Fitting the moulded pinna to the KEMAR mould
The pinna mould must now be fixed to the KEMAR-fit. Having two separate casts is
advantageous because the two can be fused together maintaining the correct angle of
the subject's pinna.
Excess Otoform is removed by making a cut around the pinna impression. The hard
KEMAR cast is then filled with more Otoform mixture (30 g Otoform, 0.15 ml of
hardener) and the pinna impression is placed on top at the correct angle. The pinna is
then pushed down into the Otoform substance, causing the mixture to overflow the
KEMAR cast. A spatula is used to remove this overflow and smooth down the
Otoform to fuse perfectly with the pinna mould. This is left to set for approximately
48 hours. Once removed, the ear canal entrance must be perforated (to remove a thin
layer of Otoform) using a circular chisel with a diameter of 7.5 mm to match the
diameter of the ear canal simulator. The completed pinna moulds can then be
fitted into either the left or right recess on the manikin head. It should be noted that the
two manikin recesses are not identical and it was therefore critical to obtain two
separate KEMAR casts.
3.3.5 Flat Pinna Replacements ('Infills')
Flat surfaces were frequently used to represent listening with no pinnae. These were
manufactured by squeezing Otoform into the KEMAR moulds but without attaching a
pinna mould. Thus a flat square, flush with the manikin's head, was
produced, with a hole at the meatus entrance. This hole is slightly funnel-like to
smooth the sound pathway from the head surface into the ear canal.
3.4 KEMAR RECORDING PROCEDURES
A KEMAR (Knowles Electronic Manikin for Acoustic Research) was used to make
the stimulus recordings in all but one experiment (this single case used computer
generated HRTF's³). The fibreglass manikin consists of a head and torso positioned
on a rotating base, manufactured by a technician at Loughborough University. The
manikin stands 5 ft 10 in tall and has the approximate size, build and head dimensions
of an average male. The manikin is also supplied with standardised left and right
pinnae, made of a vulcanised rubber. These slot into recesses at either side of the
manikin's head. Each recess has a hole at the centre, representing the meatus
entrance. Inside the head, microphones can either be connected directly to the inside
of the hole (meatus entrance position) or they can be attached to the end of simulated
ear canals (Zwislocki couplers) at the tympanum position.
For all experiments the manikin was placed at the centre of a large, normally
reverberant room. Speakers were typically located around the manikin in the
horizontal plane at various locations (all at ear-height) or at various locations on the
median vertical plane. For azimuth sound sources, separate (matched) speakers were
used (see Figure 3.1). For elevation locations, a single speaker was used for the
majority of experiments. This could be rotated to different positions on the median
plane at a constant distance from the manikin (see Figure 3.2). Sounds were played
from these speakers and received by the microphones in the manikin's head.
³ See Chapter 1 for a brief explanation of HRTFs and Chapter 11 for a full experimental demonstration
and methodology.
The left and right microphones were fed into a pre-amplifier. From the pre-amplifier,
one of two recording methods was conducted. The first used an amplifier to feed the
sound through a pulse code modulator to digitise the recording onto a Betamax video
cassette. The second method used digital audio tape (DAT) fed directly from the
pre-amp. From either the Betamax or DAT, sounds were transferred onto a computer for
editing, using the Audiomedia software package. Editing involved isolating each
stimulus and deleting any mistakes, talking, interruptions and miscellaneous noises
that had occurred during the recording process. From here, sounds were ordered as
required and an interstimulus interval (ISI) of the relevant duration was inserted. The
ISI consisted of a section of 'room silence', recorded during the stimulus recordings,
to produce a continuous and consistent background noise.
Where stimuli were presented live and not in a pre-recorded form, the recording stage
was simply omitted⁴.
⁴ See Chapter 9 'Method' section.
[Diagram; dimension labels: 1.2 m, 1.2 m]
Figure 3.1: Diagram (not to scale) of the manikin in the centre of a wooden hoop (3 m in
diameter), used to support the speakers in the horizontal plane. All speaker positions were
fully adjustable. The hoop was supported on wooden struts, which slotted into heavy
metal base units to stabilise and secure construction.
[Diagram; dimension labels: 1.5 m, 1.57 m, 1.5 m]
Figure 3.2: Diagram (not to scale) of the speaker set-up for median plane (elevation) source
locations. The arrow shows the direction of movement around the manikin. The range of
possible speaker positions was -50° to +320° elevation (where 0° is straight ahead at ear
level and 180° is directly behind).
3.5 FRONT-BACK CORRECTION
The uncorrected angle error is calculated by measuring the absolute distance (in
degrees) between a subject's judgement and the true location of the sound source.
The 'front-back corrected' angle error is calculated by shifting all judgements that are
incorrectly placed in the front or rear hemifield to the opposite hemifield. Although
front-back correction refers to shifting front-to-back as well as back-to-front, the latter
is more common.
Typically, 0° is taken to mean directly in front of the subject, 90° to the right of the
subject and 180° directly behind. Thus a target of 45° (marked "A" in Figure 3.3) that
is judged by a subject to be at 110° ("B"), would first be shifted to the front quadrant
(in which the target lies), making it 70° (position "C"). This is done by flipping it
about the axis of symmetry defined by the line between 90° and 270° (marked
"AXIS"). Then the distance of this shifted judgement from the target would be
calculated to give the front-back corrected error of 25° ("D").
[Diagram; labels: TARGET (A), CORRECTED RESPONSE (C), D, AXIS (270° to 90°)]
Figure 3.3: Front-back correction of sound source ("A") judged to be at position "B". The
judgement is first shifted to the opposite hemisphere ("C") then the angle error from this
new shifted position is calculated ("D").
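The front-back correction described in this section can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used to score the experiments: the function names are hypothetical, angles are assumed to run clockwise from 0° (front) through 90° (right) and 180° (behind), the uncorrected error is assumed to wrap around at 360°, and judgements or targets lying exactly on the 90°-270° axis are assumed to be left unshifted.

```python
def angle_error(target_deg, judged_deg):
    """Uncorrected error: smallest absolute angle between target and judgement."""
    diff = abs(target_deg - judged_deg) % 360.0
    return min(diff, 360.0 - diff)

def hemifield(az_deg):
    """Classify an azimuth as 'front', 'rear', or 'axis' (exactly 90 or 270)."""
    az = az_deg % 360.0
    if az < 90.0 or az > 270.0:
        return "front"
    if 90.0 < az < 270.0:
        return "rear"
    return "axis"

def front_back_corrected_error(target_deg, judged_deg):
    """Mirror the judgement about the 90-270 axis if it lies in the wrong
    hemifield, then measure the remaining angle error."""
    target_side = hemifield(target_deg)
    judged_side = hemifield(judged_deg)
    if "axis" not in (target_side, judged_side) and target_side != judged_side:
        judged_deg = (180.0 - judged_deg) % 360.0  # e.g. 110 -> 70
    return angle_error(target_deg, judged_deg)

# Worked example from the text: target 45 deg judged at 110 deg.
print(angle_error(45.0, 110.0))                 # uncorrected error
print(front_back_corrected_error(45.0, 110.0))  # corrected error
```

For the worked example, the judgement at 110° is mirrored to 70°, so the corrected error is the 25° given in the text, compared with an uncorrected error of 65°.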
CHAPTER 4
The Role of the Pinna in Sound Localization.
ABSTRACT
Several studies have shown the pinna to assist in the localization of a sound source
(e.g. Batteau, 1967; Freedman & Fisher, 1968). The pinna is primarily considered to
facilitate elevation discrimination (e.g. Butler, 1969; Gardner & Gardner, 1973).
However, subtle pinna cues may be lost or hindered if using unfamiliar pinnae.
This study investigates the benefit of using individualized pinnae compared with
nonindividualized or no pinnae at all. A KEMAR manikin was fitted out with moulds
of each subject's pinnae or no pinnae - flat infills with a hole representing the meatus
entrance. Clicks were digitally recorded using the manikin with microphones placed
at the internal meatus entrance. When played back over headphones in a sound
attenuating booth the recordings gave a realistic 3D sensation. Subjects were then
asked to identify the location of the clicks in the horizontal and vertical planes.
The angle errors for both azimuth and elevation judgements were unexpectedly high.
Simply producing 3D recordings and realistic pinnae is clearly not sufficient to
maximise localization accuracy.
A small but statistically insignificant benefit was found for individualized pinnae over
nonindividualized and no pinnae for azimuth. This was as expected since interaural
timing and level differences are the dominant cues for azimuth discrimination. No
effect was found for elevation either. This is surprising in view of the purported role of
the pinna - elevation determination. However, the failure to obtain a significant
result may relate to the fact that the sample size was small or perhaps that overall task
difficulty (reported by subjects) masked any subtle pinna effects.
INTRODUCTION
The role of the pinna in localization is considered to be particularly important when
judging the elevation of a sound source in the median plane (e.g. Butler, 1969;
Gardner & Gardner, 1973). This proposal does seem reasonable since there is a
paucity of interaural time and level differences (ITD's and ILD's) for elevation in the
median plane. However, the pinna cannot be ruled out as a factor in judging sound
sources played in the horizontal plane. A study by Batteau (1967) illustrates this. He
examined the effects of the pinnae on localization accuracy by recording sounds using
microphones which were inserted into moulds of pinnae held onto a bar (representing
the diameter of the head). The recorded sounds were played to the subjects via high
fidelity headphones. This resulted in the impression of the sounds being "out in
space" and not lateralized within the head. Subjects were able to make reasonably
accurate judgements of both left-right dimensions and elevation. However, when the
pinnae were removed, judgement accuracy was significantly reduced in both planes.
Batteau reasoned that the role of the pinnae in localization was to facilitate the
production of numerous micro-second delay paths caused by the different pinna folds
and cavities. The incoming signal is thus transformed by the pinna and interpreted by
the listener to have originated at a particular point in space, depending upon this
transformation.
Since then, studies have focused more upon the spectral transformations of the sound
that are caused by the pinnae and less upon time delays. Wright et al (1974), for
example, looked at the effect the pinna has on incoming sound, and found that pinna
reflections cause spectral changes which may provide (at least partially) the cues
necessary for localization. A number of similar studies (e.g. Blauert, 1983; Oldfield
& Parker, 1984a) have further established the pinna as a direction dependent filter that
causes spectral changes that can be used as a cue to the location of a sound source. In
support of the notion that pinnae are useful for localization in every direction, and not
just in the median plane, is the finding that the pinna attenuates the higher frequency
components of a sound, above approximately 9 KHz, when the stimulus is played
behind a subject. This might enable listeners to make front-back distinctions, thus
providing valuable cues to azimuth as well as elevation (e.g. Freedman & Fisher,
1968; Musicant & Butler, 1984). Shaw & Teranishi (1968) additionally showed that
blocking the ear canal had little effect on the sensitivity to sound source azimuth
measured in the ear canal at up to 12 KHz, indicating that the longitudinal resonance
of the ear canal contributes little to direction dependence. This further indicates the
need for the pinna in sound source localization.
Investigations into the more specific effects of the pinna cavities and dimensions have
been conducted by Gardner & Gardner (1973), who progressively occluded pinna
cavities to see the effect on localization in the median plane. They demonstrated that
localization ability does decrease with increasing occlusion, but found that
localization ability was not uniform. Indeed, it was better for signals in the anterior
sector of the median plane (as compared to the rear sector), and high frequency signal
content was discovered to be more important for accurate localization than low
frequency content.
The cues provided by the pinna appear to be subtle and complex and may therefore be
altered by using foreign pinnae. Each person's pinna is unique in shape, and many
studies ignore this by using nonindividualized, or standardised, pinnae. Freedman &
Fisher (1968) overcame this problem by attempting to measure localization accuracy
using individualized, nonindividualized or no pinnae. Sounds were either channelled
through metal tubes (with and without pinna casts attached) or subjects listened
normally. However, to test accurately individualized against nonindividualized
pinnae, casts of the listener's own pinnae should also have been attached to the metal
tubes. Nevertheless, they found no difference between the individualized and
nonindividualized pinnae, but both gave an improvement over no pinnae at all.
This study addressed the issue of whether individualized pinnae give greater
localization accuracy than nonindividualized pinnae and no pinnae. However, the
methodology of Freedman & Fisher was improved by comparing identical listening
conditions using different pinnae. Recordings were made using a KEMAR manikin
fitted either with casts of the individual subject's pinnae, or with a standard set of
moulds taken from a non-participant listener. Finally, a set of infills was used, which
represented a no pinna condition.
Subjects were required to identify the locus of clicks that had been digitally recorded
using the manikin and were played back over headphones. Judgement accuracy in
both the horizontal and median vertical planes was investigated.
METHOD
Subjects
Five male subjects were recruited by opportunity sampling. All were undergraduate
students with no prior experience of auditory localization tasks. Subjects were
examined for infection using an otoscope.
Design
A 6*5*5 repeated measures design was used. There were six listening conditions, which
were identical in presentation but differed in the recording conditions: individualized
(own) pinnae, no pinnae and nonindividualized pinnae. Each condition
consisted of recordings made at 5 azimuths (0°, 40°, 80°, 140°, 180°) and 5 elevations
(-50°, -25°, 0°, 25°, 50°). Thus 150 clicks were presented in total.
The stimuli for each of the 25 target locations (5 azimuths x 5 elevations) were
recorded through the manikin using 5 different sets of pinnae¹ (1 for each of the 5
subjects) and using infills² (no pinnae). Each set of 25 sounds was randomised and
the 6 listening conditions were presented to subjects in a random order.
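The trial structure described above can be sketched as follows. This is a minimal illustration only: the generic condition labels, the function name `build_trials` and the seeded shuffling are assumptions made for the example, not the procedure actually used to prepare the recordings.

```python
import random

AZIMUTHS = [0, 40, 80, 140, 180]     # degrees
ELEVATIONS = [-50, -25, 0, 25, 50]   # degrees
# Hypothetical labels for the six listening conditions.
CONDITIONS = [f"condition_{i}" for i in range(1, 7)]

def build_trials(seed):
    """Return a randomised list of (condition, azimuth, elevation) trials."""
    rng = random.Random(seed)
    condition_order = CONDITIONS[:]
    rng.shuffle(condition_order)          # random order of the 6 conditions
    trials = []
    for condition in condition_order:
        block = [(condition, az, el) for az in AZIMUTHS for el in ELEVATIONS]
        rng.shuffle(block)                # random order of the 25 locations
        trials.extend(block)
    return trials

trials = build_trials(seed=1)
print(len(trials))  # 150 = 6 conditions x 5 azimuths x 5 elevations
```

Shuffling within each block keeps the 25 locations of a condition together, as in the design, while varying their order.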
Stimuli
Broadband clicks (with cut-off frequencies of 1 KHz and 17 KHz) were generated
using a Masscomp computer³. The clicks were played through a Radio Spares Wide
Range 6" speaker placed on a 1.5 m wooden pole that was pivoted at the manikin and
could be adjusted to any elevation between -90° and +90° in front and behind.
1 See Chapter 3 for methods of pinna moulding
2 See Appendix 1
3 Although the Masscomp generates a flat spectrum stimulus, when played through the speaker it
becomes distorted and a non-flat spectrum is produced. The signal (played through the speaker) is
therefore channelled back into the Masscomp which then generates an inverse spectrum of the
waveform such that when played through the speaker again, a flat spectrum click is obtained.
Azimuth positions were obtained by the use of a rotational device (with 1° azimuth
markings) built into the bottom of the manikin torso.
Brüel and Kjær 4134, 0.5" microphones were placed into the eardrum position of the
KEMAR manikin. The ear canals were replicated using Zwislocki Couplers, each
with a length of 2.3 cm. The stimuli were recorded at each of the 25 target locations
with the 5 different sets of pinnae and for the infills (no pinna condition), in a
normally reverberant large room.
Stimuli were recorded onto a Betamax video cassette using a pulse code modulator to
digitise the recording. Recorded clicks were sampled by an "Audiomedia" sound
editing package run on a Macintosh Computer for randomisation, which was different
for each subject. A 5-second interstimulus interval of 'room silence' was inserted.
Procedure
Stimuli were played through tubephones (Etymotic ER-2) in a sound-attenuating
booth. The 6 conditions were each presented to subjects twice, once for the subject to
make azimuth judgements and once to make elevation judgements. The order of
vertical and horizontal localization tasks was counterbalanced between conditions.
Thus, for half of the conditions subjects made azimuth judgements first, and for the
remaining half, subjects made elevation judgements first.
Subjects were first provided with instructions⁴. They were then given response sheets
(see Figures 4.1a & b) at the beginning of the first condition for azimuth and elevation
and instructed in which order to make judgements. After making a set of azimuth and
elevation judgements for one condition, there was a 10 minute break to counteract any
practice and boredom effects. Response sheets for subsequent conditions were
provided at the end of the 10 minute break.
For the onset of each stimulus, subjects were instructed to re-locate their heads to a
forward-facing position by focusing on a cardboard spot straight ahead of them. They
were told to keep their head still during the stimulus but were permitted to move their
head after the stimulus to make their response.
4 See Appendix 5A.
34
Chapter 4: The Role of the Pinna in Sound Localization
[Response diagram: circle viewed from above, marked "Front" and "Back".]
Figure 4.1a: Response diagram given to subjects for azimuth judgements (actual size). The
head and horizontal plane are viewed from above. A separate diagram was used for each
response and subjects were free to put the cross anywhere on or within the circle. Distance
was not a variable and was ignored in the results.
[Response diagram: head in profile, marked "Up" and "Down".]
Figure 4.1b: Response diagram given to subjects for elevation judgements. The head is
shown in profile and facing the median vertical plane. One diagram was used for each
judgement.
RESULTS
Mean angle errors were calculated for all subjects for azimuth and elevation. The
data is represented below in Figures 4.2 and 4.3, where the mean values for both
uncorrected and front-back corrected judgements are given.
The azimuth judgements produced error values of 16.4° for no pinnae, 16.7° for
nonindividualized pinnae and 13.7° for own pinnae. However, analysis of variance
(see Table 4.1a) revealed that these differences were not statistically significant. For
elevation, the errors were much larger, even when front-back corrected, giving values
of 42.9° for no pinnae, 44.4° for nonindividualized pinnae and 41.6° for own pinnae.
Again, these differences were not statistically significant (see Table 4.1b).
The number of front-back errors was similar for all three pinna conditions; 8% for no
pinnae, 9.2% for nonindividualized pinnae and 7.8% for own pinnae. These small
differences were not statistically significant.
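The text does not spell out how the front-back correction was applied. A common convention, assumed here purely for illustration, is to mirror a response about the interaural axis (azimuth θ becomes 180° - θ) whenever the mirrored response lies closer to the target than the original one. The function names below are hypothetical.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural (left-right) axis."""
    return (180 - azimuth) % 360

def front_back_correct(target, response):
    """Return (corrected_response, was_reversal) under the assumed rule."""
    mirrored = mirror_front_back(response)
    if angular_difference(target, mirrored) < angular_difference(target, response):
        return mirrored, True    # counted as a front-back confusion
    return response, False

print(front_back_correct(40, 150))  # (30, True)
print(front_back_correct(40, 60))   # (60, False)
```

Under this rule a response at 150° to a target at 40° is scored as a 10° error plus one front-back confusion, rather than as a 110° error.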
Anova: Single Factor
SUMMARY
Groups                    Count   Sum      Average   Variance
own pinna                 5       67.64    13.53     14.21
nonindividualized pinna   5       80.32    16.06     12.11
no pinna                  5       81.8     16.36     0.62
ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        24.23    2    12.12   1.35   0.30      3.89
Within Groups         107.77   12   8.98
Total                 132.00   14
Table 4.1a: Analysis of variance for azimuth judgements. There are no statistically
significant differences between the three different pinna conditions; own,
nonindividualized and no pinna. Data is corrected for front-back errors.
Anova: Single Factor
SUMMARY
Groups                    Count   Sum      Average   Variance
own pinna                 5       208.04   41.61     98.68
nonindividualized pinna   5       221.86   44.37     95.13
no pinna                  5       217.28   43.46     81.93
ANOVA
Source of Variation   SS        df   MS      F      P-value   F crit
Between Groups        19.81     2    9.91    0.11   0.90      3.89
Within Groups         1102.99   12   91.92
Total                 1122.80   14
Table 4.1b: Analysis of variance for elevation judgements. No statistically significant
differences are found between the different pinna conditions; own, nonindividualized and
no pinna.
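As a check on the arithmetic, the F ratios reported in Tables 4.1a and 4.1b can be recomputed from the summary rows (count, sum and variance of each group) alone. The sketch below uses hypothetical function and variable names and does not compute the P-value.

```python
def one_way_f(groups):
    """F ratio from (count, sum, variance) rows of an ANOVA summary table."""
    n_total = sum(count for count, _, _ in groups)
    grand_mean = sum(total for _, total, _ in groups) / n_total
    # Between-groups sum of squares from the group means.
    ss_between = sum(count * (total / count - grand_mean) ** 2
                     for count, total, _ in groups)
    # Within-groups sum of squares from the group variances.
    ss_within = sum((count - 1) * variance for count, _, variance in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

azimuth = [(5, 67.64, 14.21), (5, 80.32, 12.11), (5, 81.8, 0.62)]
elevation = [(5, 208.04, 98.68), (5, 221.86, 95.13), (5, 217.28, 81.93)]

print(f"{one_way_f(azimuth):.2f}")    # 1.35
print(f"{one_way_f(elevation):.2f}")  # 0.11
```

Both values fall well below the tabulated critical value of 3.89, consistent with the non-significant results reported.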
[Figure: Error Values for different Pinna Conditions for Azimuth. Vertical axis: angle error (degrees); horizontal axis: Target Position (No Pinnae, Nonindividualized Pinnae, Individualized Pinnae); series: Uncorrected and F/B Corrected.]
Figure 4.2: Mean angle error values for azimuth judgements for all subjects combined. Data is both
uncorrected and corrected for front-back azimuth errors. Statistically significant differences were found
between the uncorrected and front-back corrected data, although no differences were present for the
different pinna conditions (ANOVA).
[Figure: Error Values for different Pinna Conditions for Elevation. Vertical axis: angle error (degrees); horizontal axis: Target Position (No Pinnae, Nonindividualized Pinnae, Individualized Pinnae); series: Uncorrected and F/B Corrected.]
Figure 4.3: Mean angle error values for elevation judgements. The results are both uncorrected and
corrected for front-back azimuth errors. No statistically significant differences were found either between
uncorrected and front-back corrected data, or between the different pinna conditions.
DISCUSSION
This study set out to examine the effect of using individualized, nonindividualized or
no pinnae on localization judgements. The most immediately surprising outcome
was the large angle errors. Several published studies have produced error values
well below those obtained here (e.g. Makous & Middlebrooks, 1990; Stevens &
Newman, 1936). However, fundamental differences exist between this and such
published studies, which may have an important bearing on the angle error.
Many studies that have reported particularly low error values have been free-field
experiments. Makous & Middlebrooks (1990) achieved very low angle errors (±9°),
and although Stevens & Newman's (1936) average was similar to that of this study (±14°),
errors near the midline were around 5°. These studies were both conducted in the
free-field with head movements (but no vision) allowed. Although the contribution of
head movements to localization is unresolved, studies have generally found them to
increase acuity quite markedly (e.g. Pollack & Rose, 1967; Schlegel, 1994). For
manikin recordings, no head movements can be accounted for and the error is
expected to be significantly higher. However, the aim was to examine whether the
spectral transformations caused by the pinna are sufficient to aid localization and thus
head movements would have been a confounding variable. It should also be noted that
for published studies where head movements were not incorporated (e.g. Wenzel et
al, 1993; Wightman & Kistler, 1989), errors even larger than those reported here were
obtained (±26° and ±21° respectively).
The recordings in this study were made in a normally reverberant room. This ensured
greater ecological validity than an anechoic setting and represented the task of the
pinna in everyday hearing conditions. However, Giguere & Abel (1993)
demonstrated that reverberation could reduce accuracy, even for sounds with a brief
onset (such as clicks). Bekesy (1960) also showed that in a non-anechoic
environment, the spatial image of a sound became more diffuse depending on the
distance the sound source was away from the head. Since in this study the
loudspeaker was over a metre away and the recordings incorporated reverberation,
these may well have been factors contributing to errors of judgement.
For azimuth, the angle errors show a statistically significant improvement for front-back
corrected data compared to the uncorrected data. But for elevation there is no
real improvement when the correction is made. Vertical localization provided a much
harder task for subjects than horizontal localization, such that front-back correction
made little difference. Indeed, task difficulty was reinforced both by subjects'
comments subsequent to the experiment and by the results obtained for all three
conditions.
Regardless of actual error values, it was expected that using personalised pinnae
would produce the greatest acuity. Since pinna shapes are unique, using unfamiliar
pinnae may produce subtle differences in sound transformation and reduce our
localization accuracy. However, for both azimuth and elevation the variation in angle
error between conditions was very small, although the general trend shows that using
one's own pinnae produces a small improvement over using no pinnae at all.
Nevertheless, ANOVA revealed all differences between conditions to be statistically
non-significant, showing no effect for no pinnae, nonindividualized pinnae or
individualized pinnae. This result contradicts some published findings that argue for the
importance of some pinna (over no pinna) in localization (e.g. Batteau, 1967;
Freedman & Fisher, 1968; Musicant & Butler, 1984). Although this study does
contradict these findings, there is support from Freedman & Fisher (1968). Whilst
they found a difference between using pinnae and no pinnae, they found no
improvement with one's own pinna over another person's pinna. Nevertheless, the
small sample size in this experiment may be the reason for obtaining very little
difference between pinna conditions.
Finding no differences between the three conditions for azimuth was as expected,
since for azimuth the main cues to location are obtained from interaural timing and
level differences between the two ears. For elevation, the findings contradict those
proponents of spectral theory (e.g. Blauert, 1969; Gardner & Gardner, 1973) who
showed that pinnae are important for elevation discrimination. However, such studies argue
that the pinna is useful for elevation in the median plane, whereas this study used elevation
stimuli that varied in azimuth. Therefore the effects of the pinna are combined with
interaural differences and so their subtle influence may be masked. Yet however
subtle the pinna effects are, the results show that there is only a small variation (if
any) between using individualized or nonindividualized pinnae. It therefore appears
that we don't require personalised pinnae to maintain accurate judgement.
The pinnae also aid front-back azimuth distinction by attenuating high frequency
sound when behind the listener. The number of front-back confusions should
therefore be smaller for the conditions where pinnae are used, compared to using no
pinnae. However, all three conditions show similar numbers of front-back errors.
Furthermore, the range of 7.8% - 9.2% is small compared to some absolute
localization studies (e.g. Good & Gilkey, 1996, 30% overall mean; Wenzel et al.,
1993, 26% overall mean), although Wightman & Kistler (1989) obtained similar
values (11% when averaged) to those in this study. The small percentage of azimuth
reversals offers support to the notion that these are high-fidelity and realistic sounds.
Also, whilst subjects find it difficult to pinpoint the locus of the stimulus, they can
identify the quadrant in which it lies.
This study has demonstrated that despite a small improvement for individualized
pinnae, one's own pinnae do not produce any real benefit, and so there is no justification
for the time consuming and costly procedure of constructing individual moulds. It
was also revealed that it is imperative to report certain methodological characteristics
when reporting angle errors of absolute judgement tasks, since results are strongly
affected by a number of variables (for example, whether the task is free-field or
recorded and whether the environment is anechoic or reverberant). It may be that in
this particular task, pinna cues were not being fully utilised by the listener. However,
in other tasks, such as interpreting the effects of head movement, they may be critical.
Chapter 5: Localization Judgements in the Azimuthal Plane
CHAPTER 5
Localization Judgements in the Azimuthal Plane.
ABSTRACT
Absolute auditory localization in the horizontal plane was investigated for sound
sources played in a normally reverberant environment, using recordings made with a
KEMAR manikin. A 65dB sharp onset, flat spectrum click, with cut-off frequencies
at 1 KHz and 17 KHz was recorded at 9 locations in the frontal horizontal plane.
When the recordings were played back through headphones at 4 second intervals, a
mean value of 1.74 bits of information was transmitted, corresponding to an average
of 3.34 source locations that can be reliably judged without error in a 180° arc. These
results indicate that even rich localization cues are not enough to generate auditory
images which are consistently associated with the objective locations of the stimuli
within an azimuthal quadrant.
INTRODUCTION
Developments in our ability to engineer 'virtual' sounds have been accompanied by
the construction of new working environments where the operator's auditory world
can be completely manufactured and controlled by computer. This raises a number of
practical and theoretical issues concerning the human listener's ability to process
information delivered through this new medium.
The basis of the technique is the application of spectral transforms to a sound to
generate new auditory inputs for the left and right ear. By using appropriate
transforms the sound can be localized 'outside of the head', which contrasts with
normal stereophonic images which are always perceived, unrealistically, to be
lateralized inside the skull (Gelfand, 1990). Unique transforms are applied for each
different location of the sound source; a technique that achieves a considerable sense
of realism. There are many potential applications of this method, some of which
require the listener to locate a sound source accurately in space. Below we shall be
considering how effectively a listener can utilise location information when the
stimuli are delivered using this approach.
Wightman & Kistler (1989) evaluated simulated sound source judgements by asking
subjects to identify the apparent location of clicks generated artificially using subjects'
individual 'head-related transfer functions' (HRTF's). These HRTF's had been created
earlier by measuring the spectral characteristics of sounds arriving at the subjects' left
and right ears from a range of locations. Subjects' mean judgements showed a very
high correlation (0.982) with the intended location of the sound, indicating that
HRTF's are a high fidelity means of simulating real sounds. Indeed, similar
correlations were obtained (0.95 being the lowest) by Wenzel et al (1993) using
nonindividualized HRTF's of wideband noise bursts.
These studies, however, showed that single judgements were often well off-target
even though in the long run the averaged judgements were accurate. In both studies
the judgements yielded large mean absolute angle errors: ±21.1° for Wightman &
Kistler and approximately ±26° for Wenzel et al. These do not represent a limitation
of the simulation technique because Wenzel et al were able to show that subjects were
equally poor in free-field situations with actual sound sources. Wenzel et al also
report high rates of front-back confusions for both virtual sound presentations and
free-field stimuli which, if left uncorrected, would further increase the reported error
rate. Such azimuth confusions are a common facet of localization studies, particularly
where stimuli are located on or near the median plane (Blauert, 1983). Listeners'
surprisingly poor performance is an important characteristic of human
sensory judgement and needs to be taken fully into account when designing 'virtual
reality' devices for use with human operators.
The present study was driven by the simple question of how many different auditory
locations could be identified without confusion by human operators in the horizontal
plane. This would be important if the significance of the information coming from a
sound source was partly defined by its location. The problem was approached using
information theory (see Attneave, 1959; Edwards, 1969) where subjects were asked to
identify the location of a click presented over headphones or tubephones. Nine
different locations were used and the number and type of confusions noted. The
information transmission rate was used as an estimate of the maximum number of
locations which could be used without confusion (Attneave, 1959, p. 68).
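The information-transmission measure can be sketched as follows. This is a minimal illustration assuming the standard formulation T = H(stimulus) + H(response) - H(stimulus, response), in bits, computed from a stimulus-by-response confusion matrix of counts. The function name and the toy matrices are hypothetical and are not the thesis data.

```python
import math

def transmitted_information(matrix):
    """Transmitted information (bits) from a confusion matrix of counts.

    Rows are stimuli (actual positions); columns are responses.
    """
    total = sum(sum(row) for row in matrix)

    def entropy(counts):
        # Shannon entropy in bits; empty cells contribute nothing.
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    h_stimulus = entropy([sum(row) for row in matrix])
    h_response = entropy([sum(col) for col in zip(*matrix)])
    h_joint = entropy([c for row in matrix for c in row])
    return h_stimulus + h_response - h_joint

# Toy examples: perfect identification of 4 locations, and pure guessing.
perfect = [[6, 0, 0, 0], [0, 6, 0, 0], [0, 0, 6, 0], [0, 0, 0, 6]]
guessing = [[3, 3], [3, 3]]
print(transmitted_information(perfect))   # 2.0 bits
print(transmitted_information(guessing))  # 0.0 bits
```

Perfect identification of four equally likely locations transmits log2(4) = 2 bits, while responses unrelated to the stimulus transmit none.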
Stimuli were digitally recorded using microphones placed at the entrance of the
auditory canal of a KEMAR manikin. By playing these back directly to subjects it
was possible to avoid the two computational steps of deriving HRTF's and generating
new stimuli using these functions. In this way the fidelity of the click presentations
should be increased. The manikin was fitted out with artificial pinnae which had been
moulded in the laboratory using the ears of a volunteer.
The stimuli were presented over headphones for the majority of subject trials.
However, we took the opportunity to study the effect of using recently developed high
fidelity tubephones which introduce sound directly into the ear canal. Thus they
create a sound which should not be subject to unwanted possible resonance effects
generated in the region of the concha when using headphones.
As a further control, the listener's ability to localise these sounds using only monaural
versions of the clicks (i.e. one ear only) was explored. The monaural spectra do vary
considerably as a function of location and it was thought possible that locations might
be distinguishable purely on the basis of monaural stimulation. The stimuli used were
flat-spectrum clicks recorded in a normally reverberant room. This was intended to
provide a rich range of cues, including interaural level and phase differences in
spectral profiles and temporal cues originating from modest reverberation effects.
METHOD
Subjects
An opportunity sample of 14 untrained undergraduate students and 2 academic staff
of varying ages was used. For the tubephone condition, 2 subjects used in the
headphone trials were used along with 6 members of academic staff, all of whom
were inexperienced listeners. Four of this group were again used for the monaural
presentations.
Design
Subjects heard 54 recordings of flat spectrum clicks made with a KEMAR manikin.
These randomly presented stimuli were located in one of nine different source
positions varying in azimuth between ±90° in the frontal plane. Subjects were asked
to identify at which of the nine locations they perceived the sound source to be
located.
There were three different conditions: the main trial using binaural headphone
presentation (n=16), binaural ear canal tubephone presentation to check for
headphone reproduction fidelity (n=8) and trials presented monaurally, as a control to
examine spectral content as a localization cue (n=4).
Stimuli
The basic stimulus was a 65dB flat spectrum click, with cut-off frequencies at 1 KHz
and 17 KHz. This sharp onset, broad-band signal gives an optimum indication of
source location through inclusion of ITD and IID cues (see Moore, 1989). The click
was generated using a Masscomp computer¹ and played through a Kef C35 8"
speaker, placed on a speaker stand. Brüel and Kjær 4134, 0.5" microphones were
placed into the removable pinnae of a KEMAR manikin at the external entrance of
the ear canal. Although the microphones were placed inside the head, the Zwislocki
1 See Chapter 3 Method Section
coupler was not used. A sound level meter (Brüel and Kjær 2203) with a 0.5"
microphone, was used to measure the received stimulus intensity at the manikin's ear.
Stimuli were recorded in a normally reverberant room with recordings made at 9
angles (0°, 23°, 45°, 68°, 90°, 270°, 293°, 315° and 338°), 1 elevation (1.6 m, the ear height of
the manikin) and 1 distance of 1.83 m (6 ft). All recordings were made in the front
hemisphere to eliminate the problem of front-back confusions.
Stimuli were recorded onto a Betamax video cassette using a pulse code modulator to
digitise the recording. The recorded clicks were transferred to the "Audiomedia"
(sound editing) package on an Apple Macintosh computer to be isolated and
re-recorded onto the Betamax cassette. Each of 6 repetitions of the set of 9 clicks was
re-recorded in a fixed randomised sequence, identical for each of the 16 subjects. A 4
second silence was inserted between each stimulus to provide adequate response time.
Procedure
The sound was channelled into a sound-attenuating booth through headphones (Beyer
Dynamic DT 48) or tubephones (Etymotic ER-2) and the volume adjusted such that
the stimulus level, as received by the manikin during recording, was identical to that
received by subjects through the headphones during playback. Each subject was then
seated in the booth and given a set of instructions² and a response sheet which
included a diagrammatic representation of the stimulus positions, numbered 1 to 9 (see
Figure 5.1). Subjects were told to map any stimuli they might hear behind them to the
corresponding position in the frontal plane.
2 See Appendix 5B.
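[Response diagram: nine numbered positions, from 1 (far left) through 5 (straight ahead) to 9 (far right), arranged around the front of the listener's head.]
Figure 5.1: Response diagram (actual size) given to subjects. For each stimulus sound
heard, subjects were forced to place their judgement at one of the speaker locations (1 -
9). Subjects recorded their actual responses on a separate sheet.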
RESULTS
Headphones
Individual subject responses were transcribed into confusion matrices and the
transmitted information was calculated³ for each subject (Table 5.1 below).
The mean was found to be 1.74 bits, with a standard deviation of 0.17 and a standard error
of 0.042. The range of 1.55 to 2.20 bits corresponds to a range of 2.93 to 4.60 (mean
3.34) source positions that subjects could reliably locate without error in a 180° arc.
Subject   Information transmitted (bits)
1-4       [illegible]
5         1.58
6         1.74
7         1.83
8         1.66
9-10      [illegible]
11        1.92
12        1.56
13        1.55
14        1.65
15        1.70
16        2.20
Table 5.1: Individual information transmission scores (in 'bits') for all 16 subjects.
3 See Appendix 2
Tubephones
Binaural tubephone trials yielded a mean transmission value of 1.72 bits, with a
standard deviation of 0.33 bits and standard error of 0.12 bits, for the 8 subjects tested.
This corresponds to reliable judgement of a mean of 3.29 source positions, similar to
the headphone condition above, from which it differs by only 0.02 bits.
Monaural Control
The control condition gave an average of 0.49 bits which represents a value little
better than chance judgements; accurate placement of 1.4 sound sources.
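The conversion from transmitted information to a number of reliably identifiable source positions follows from treating T bits as equivalent to 2^T equally likely, perfectly discriminated categories. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def equivalent_locations(bits):
    """Number of perfectly discriminable categories equivalent to `bits`."""
    return 2 ** bits

for label, bits in [("headphones", 1.74), ("tubephones", 1.72), ("monaural", 0.49)]:
    print(f"{label}: {equivalent_locations(bits):.2f} locations")
# headphones: 3.34 locations
# tubephones: 3.29 locations
# monaural: 1.40 locations
```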
Pattern of Confusions
Figure 5.2 (overleaf) shows the total combined confusion matrix for all 16 subjects. It
can be seen that right or left general target areas are distinguishable, as is the target in
the midline, yet there is very little accuracy for individual positions beyond this.
Figure 5.3 shows a breakdown of angle error at each of the target locations. The
lowest error (8.44°) is indeed straight ahead at 0° and the highest error (31.4°) was to
the right, at 90°.
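RESPONSES (positions 1 to 9)
[Column alignment of the matrix was lost; the non-empty cells of each row are listed in left-to-right order.]
ACTUAL POSITION
1: 34 35 15 10 1 1
2: 33 41 14 8
3: 9 31 35 21
4: 4 12 23 48 9
5: 2 3 67 21 1 2
6: 27 32 29 8
7: 2 2 11 30 34 17
8: 1 9 24 45 17
9: 2 12 23 38 20
1.74 bits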
Figure 5.2: Matrix showing the total transmission scores for all 16 subjects in the binaural
headphone condition.
Position   Angle (°)   Total responses   Mean error (°)
1          -90         96                24.84
2          -68         96                14.77
3          -45         96                16.41
4          -23         96                15.94
5          0           96                8.44
6          23          96                26.72
7          45          96                20.86
8          68          96                14.53
9          90          96                31.4
[Response frequency cells not recoverable.]
Total Mean Error = ±19.36°
Figure 5.3: Information matrix showing the mean angle error values for individual source
positions. The frequency of response is given for each stimulus. The mean error for each
source position is given in the extreme right hand column, with the total mean angle error
shown below.
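The mean angle error for a single source position can be sketched as the response-weighted mean of absolute angular deviations. The example below makes two assumptions that are not stated in the text: that the nine positions are spaced at exactly 22.5° (quoted as rounded values such as 23° and 68°), and that the six non-empty cells in the row for the straight-ahead target (2, 3, 67, 21, 1, 2) fall in response columns 3 to 8. Under those assumptions the calculation gives approximately 8.44°, the value quoted for 0°. The function and variable names are hypothetical.

```python
# Assumed exact azimuths of response positions 1 to 9, in degrees.
POSITION_ANGLES = [-90.0, -67.5, -45.0, -22.5, 0.0, 22.5, 45.0, 67.5, 90.0]

def mean_angle_error(target_index, response_counts):
    """Mean absolute angular error for one target position.

    target_index: 0-based index of the actual source position.
    response_counts: number of responses given at each of the 9 positions.
    """
    target = POSITION_ANGLES[target_index]
    total = sum(response_counts)
    weighted = sum(n * abs(angle - target)
                   for n, angle in zip(response_counts, POSITION_ANGLES))
    return weighted / total

# Assumed column placement of the row for position 5 (0 degrees).
straight_ahead = [0, 0, 2, 3, 67, 21, 1, 2, 0]
print(mean_angle_error(4, straight_ahead))  # 8.4375, i.e. about 8.44 degrees
```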
DISCUSSION
The results clearly show that absolute judgement for the locations of click sound
sources under these conditions is very poor. The average transmission rate of 1.74
bits can be construed as a maximum of 3.34 locations which can be identified without
confusion. If the 180° frontal azimuthal plane is divided into 3.34 sections (each 54°
wide), we can roughly estimate the absolute angular error of judgement to be ±27°.
Thus the target stimuli must be a minimum of 54° apart to ensure accurate location
judgement.
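The arithmetic behind the 54° and ±27° estimates can be written out as a short sketch; the function name is hypothetical, and the calculation simply divides the arc evenly between the identifiable locations.

```python
def sector_width(arc_degrees, n_locations):
    """Width of each sector when an arc is shared equally among locations."""
    return arc_degrees / n_locations

width = sector_width(180.0, 3.34)
print(f"sector width: {width:.1f} degrees, error: +/-{width / 2:.1f} degrees")
# sector width: 53.9 degrees, error: +/-26.9 degrees
```

Rounded to whole degrees, these are the 54° separation and ±27° error quoted above.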
Inspection of Figure 5.2 clarifies this point. The centrally located click is identified
with little confusion with its neighbours, but clicks to the left or the right side are
heavily confused with nearby locations. It is as if the system can distinguish, in
absolute terms, left from centre from right and little more. The very small individual
differences across listeners suggest that this may be a reliable aspect of absolute
judgement of sound source location using this technique.
These findings deviate markedly from those of Wightman & Kistler (1989) who
found a high correlation between the intended and objective location of simulated
sound sources using HRTFs, and not just a distinction of left-centre-right dimensions.
However, Wightman's data is derived from points that each represent the centroid of
at least 6 judgements. These individual judgements may be considerably off-target,
but about a central point, such that when averaged they produce a mean judgement
value close to that of the actual target position.
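A small numerical illustration of this point follows. The target angle and the six judgements are hypothetical values chosen for the example, not data from Wightman & Kistler or from this thesis; they simply show how a centroid can sit on the target while every individual judgement is well off it.

```python
from statistics import mean

# Hypothetical values for illustration only.
target = 45.0                                      # target azimuth in degrees
judgements = [15.0, 25.0, 35.0, 55.0, 65.0, 75.0]  # six individual responses

# Error of the averaged (centroid) judgement.
centroid = mean(judgements)
centroid_error = abs(centroid - target)

# Average error of the individual judgements.
mean_absolute_error = mean(abs(j - target) for j in judgements)

print(f"centroid error: {centroid_error:.1f} deg")
print(f"mean absolute error: {mean_absolute_error:.1f} deg")
```

In this constructed case the centroid error is zero although the individual judgements miss the target by 20° on average.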
Indeed, Wightman & Kistler obtained a large mean angle error of ±21.1°,
corresponding with the mean angle error of ±19.36° shown in Figure 5.3. This figure
also agrees fairly well with Wenzel et al's (1993) figure of ±26° obtained using both
free-field and headphone delivered stimuli, but is somewhat higher than Stevens and
Newman's (1936) value of ±14°, although their data are for free-field. The results
are also higher than in the previous chapter, the reason for which is unclear. Absolute
judgement of pre-recorded sounds (involving labelling the location of a sound source)
is very much poorer than our ability to say whether two successive sounds originate
from the same or separate locations - the 'minimum audible angle' (e.g. Mills, 1958;
Perrott, 1984).
The comparison of headphone and tubephones showed no significant difference for
this particular experimental arrangement. Indeed they gave almost exactly the same
result and served as a useful replication of the main findings, although subjects were
much less happy with the use of tubephones because of the discomfort of wearing
them. This similarity of results is somewhat surprising. Appendix 3 shows the
response of the tubephones, particularly up to 10 KHz, to be more faithful to the
original stimulus than the headphones. Between 10 and 13 KHz the headphone
response lies closer to the original stimulus, and above this frequency, the tubephones
again correspond more closely. One would expect the tubephones to give improved
accuracy of judgement based on this finding, but as recordings were made at the
external entrance of the meatus, then perhaps of the two, the headphones present the
stimulus at the point nearest to that of the original recording.
The control condition using only monaural stimuli had to be abandoned after four
subjects because listeners were unable to make any useful distinctions among the
stimuli. Our initial concern that the head-related spectral differences between the
stimuli might act as a cue was not justified by the results.
The binaural stimuli used were rich in localization cues by virtue of being recorded
using a manikin with exact copies of human pinnae in a normally reverberant room.
It is clear, however, from the results that these are not enough to generate auditory
images which are consistently associated with the objective locations of the stimuli
within an azimuthal quadrant. Under less controlled conditions, the listener would
normally be exposed to the sound for longer than the duration of a click and would be
able to take advantage of head movements to facilitate 'triangulation' of the source. In
everyday life, such judgements are normally made in collaboration with visual
stimulation with a resultant impression that auditory absolute judgement for sound
source location is better than these results indicate.
Chapter 6: Methodologies: Site of Recording, Playback Method, and Pinna Effects
CHAPTER 6
Methodologies: Site of Recording, Playback Method and Pinna Effects.
ABSTRACT
Recordings made using a KEMAR manikin can be used to assess absolute
localization accuracy in the absence of head movements. Judgements using this
technique have been shown to be consistently poor (on average ±25°). The following
study systematically explores various aspects of the recording and playback
techniques used. These include the site of recording (eardrum versus meatus
entrance), playback position (headphones for meatus entrance and tubephones for
eardrum) and whether or not pinnae were used.
Digital recordings were made of white noise bursts of a 1 second duration at 7
azimuths (0° to 180°) all at 0° elevation, and 7 equally spaced elevations (-45° to
+90°) in the median plane. These were made under two conditions: either the manikin
was fitted with pinnae or with 'infills' - a flat surface with a hole representing the
meatus entrance.
Subjects were presented with these pre-recorded noise bursts over tubephones or
headphones and were required to judge the sound source location. It was expected
that the tubephones would produce the most accurate localization judgements for
sources recorded at the eardrum. This is because the recording and playback position
would correspond. Similarly, for headphones, the most accurate judgements were
expected with stimuli recorded at the meatus entrance. However, the expected
relationship between recording and playback position was not upheld and no
statistically significant effects were found.
It was further hypothesized that pinnae would produce a significant reduction in
judgement errors for elevation trials, compared with infills, but that little effect would
be noted for azimuth. This is because other factors, such as interaural time and
intensity differences, play a principal role here. In the absence of pinnae, statistically
significant reductions in accuracy were indeed observed for elevation judgements but
not azimuth judgements.
The fact that results were not affected by recording or playback location demonstrates
the robustness of this large mean angle error. These results therefore have
implications for human interface systems using virtual auditory sounds, since absolute
localization of a sound source seems to be greatly reduced in the absence of certain
cues such as head movements or a visual correlate.
INTRODUCTION
Increasingly, published studies of absolute localization, in addition to those reported
in previous chapters, are showing large overall mean angle errors. This seems to
conflict not only with other published localization studies, but with our apparent
abilities in the free-field. Stevens & Newman (1936), for example, investigated
localization of pure tones in the free-field, and found mean angle errors of ±14°.
Similarly, Shelton & Searle (1978) have obtained mean errors of ±3° using white
noise bursts, Makous & Middlebrooks' (1990) average errors were between ±2° and
±20° for broadband sounds, and Schlegel's (1994) averages ranged from ±4° to ±10°
for pure tones, white noise and clicks.
By contrast, some studies that have compared headphone presented and free-field
stimuli have shown that error values can be high for both. Wenzel et al (1993)
compared headphone presented broadband stimuli with free-field presentations of the
same stimuli. They found an overall mean error of approximately ±26° for headphones
and ±24° for free-field - a surprising result in the light of other free-field experiments.
Yet previously, Wightman & Kistler (1989) had obtained very much the same results
in a very similar study.
This inconsistency in mean angle error values may be explained by differences in
design and procedure as well, perhaps, as the inclusion of variables that have hindered
the localization process. Therefore, this study systematically investigated various
aspects of the recording and playback techniques used.
For this absolute localization task (in the azimuth and elevation planes), a 1 second
white noise stimulus was used. This broad frequency sound provides a varying signal
to aid perceptual judgement. The accuracy with which these judgements can be made
may depend on a number of factors. One fundamental question is whether presenting
playback sounds at the same location as recording improves accuracy. This will be
explored by making recordings using microphones placed both at the meatus entrance
and at the eardrum. Playing back the pre-recorded stimuli through either headphones
(which deliver sound to the outer ear) or tubephones (delivering sound to the
eardrum) should then provide a correspondence to the original recording position -
thus replicating more accurately free-field listening.
The effects of using pinnae or no pinnae were examined with the specific aim of
illustrating their proposed function for elevation discrimination through improved
accuracy with their use. For this purpose either nonindividualized pinnae, modelled on
a human subject, or 'infills' (a flat surface with a hole representing the meatus
entrance) were used.
METHOD
Subjects
An opportunity sample of 9 untrained undergraduate students and academic staff, 6
male and 3 female.
Design
Subjects made 168 judgements based on recordings of white noise bursts made using
a KEMAR manikin. These stimuli were presented as two identical blocks of trials.
Each block consisted of 84 stimuli, made up of 28 sounds varying in azimuth plus 56
sounds varying in elevation. Subjects listened to one block through headphones and
the other block through 'in-ear' tubephones - the order of which was counterbalanced.
The seven azimuth recordings were all made at 0° elevation and were spaced 30°
apart between 0° and 180° (the right hemisphere). At each of these locations
recordings were made with and without pinnae and with microphones placed at the
internal and external meatus entrances.
Elevation recordings were made at 0° and 45° azimuth at seven locations lying
between -45° and +90° at approximately 22.5° intervals. Again, for each location,
recordings were made with and without pinnae and with microphones at the internal
and external meatus entrance.
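The stimulus counts quoted above can be reproduced by enumerating the recording conditions. The sketch below assumes that each combination of location, pinna condition and microphone position occurred exactly once per block, and that the elevation steps were exactly 22.5°; both are inferences from the counts in the text, and all variable names are illustrative.

```python
from itertools import product

pinna_conditions = ("pinnae", "infills")
mic_positions = ("internal", "external")

azimuths = [0, 30, 60, 90, 120, 150, 180]        # degrees, all at 0 degrees elevation
elevations = [-45 + 22.5 * i for i in range(7)]  # -45 to +90 degrees (assumed exact steps)
elevation_azimuths = (0, 45)                     # azimuths at which elevation was varied

# Every combination of location and recording condition (assumed once per block).
azimuth_trials = list(product(azimuths, pinna_conditions, mic_positions))
elevation_trials = list(
    product(elevation_azimuths, elevations, pinna_conditions, mic_positions)
)

block = azimuth_trials + elevation_trials
total_judgements = 2 * len(block)                # one headphone and one tubephone block

print(len(azimuth_trials), len(elevation_trials), len(block), total_judgements)
```

Under these assumptions the enumeration gives 28 azimuth and 56 elevation stimuli per block, matching the 84 stimuli per block and 168 judgements reported.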
Stimuli
The stimulus was a 25 msec ramped white noise burst of 1 second duration, with
cut-off frequencies of 20 Hz and 20,000 Hz. The noise burst was generated using the
Sussex Synthesizer package on an Apple Macintosh and played through a Radio
Spares Wide Range 6" speaker placed on a 1.5 m pole pivoted at the manikin. This
could be rotated 360° through either azimuth or elevation dimensions. Brüel and
Kjær 4134, 0.5" microphones were placed either at the internal (Zwislocki coupler) or
external entrance of the ear canal of a KEMAR manikin, which was either fitted with
removable pinnae or infills (no pinnae). For recordings at the ear canal entrance,
microphones were held in place by the external meatus hole in the pinna or infills,
without distortion to the rubber mould.
Recordings were made in a normally reverberant room with the manikin and speaker
placed about the centre. Stimuli were pulse code modulated using a Betamax video
recorder and then transferred to the "Audiomedia" sound editing package. The noise
bursts were isolated and re-recorded onto the Betamax cassette in a random order, but
sub-divided into azimuth and elevation sequences, with a 4 second interstimulus
interval.
Procedure
The sound was channelled into a sound-attenuating booth through headphones (Beyer
Dynamic DT 48) and tubephones (Etymotic ER-2). Each subject was then seated in
the booth and provided with instructions to read through¹. When ready to commence,
subjects were given blank diagrams, different for azimuth and elevation (see
Figures 6.1a & b), on which to mark the perceived location of each stimulus.
Headphone and tubephone presentations were separated by a one week period, after
which debriefing took place.
¹ See Appendix 5e.
[Response diagram for azimuth trials, labelled Front and Back]
Figure 6.1a: Response diagram used for azimuth trials. Subjects were instructed to mark a
cross at the point of perceived sound origin.
[Response diagram for elevation trials, labelled Up and Down]
Figure 6.1b: Response diagram given to subjects for elevation trials. Sound source location
was indicated by placement of a cross, anywhere on the perimeter of the circle (distance
was not a factor in this experiment).
RESULTS
The absolute angular error across all variables for azimuth was 66.7° for uncorrected
data and 24.6° with front-back errors corrected². For elevation the mean errors were
55.3° for uncorrected results and 44.6° when front-back corrected. As for azimuth,
stimuli played in the median plane in front and behind at the same position will
produce the same interaural time and level differences. Confusions occur as a
result, and front-back correction removes these errors.
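The correction procedure itself is described in Chapter 3 and is not reproduced here. As a minimal sketch of one common way of scoring front-back reversals, the code below reflects a response about the interaural axis and keeps whichever version lies closer to the target. The azimuth convention (0° front, 90° right, 180° back), the function names and the rule itself are assumptions made for illustration, not necessarily the method used in this thesis.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def front_back_mirror(azimuth: float) -> float:
    """Reflect an azimuth about the interaural (left-right) axis.
    Assumed convention: 0 = front, 90 = right, 180 = back."""
    return (180.0 - azimuth) % 360.0


def corrected_error(target: float, response: float) -> float:
    """Error after front-back correction: the mirrored response is used
    whenever it lies closer to the target than the raw response."""
    raw = angular_difference(target, response)
    mirrored = angular_difference(target, front_back_mirror(response))
    return min(raw, mirrored)


# A source at 30 degrees (front right) judged at 160 degrees (rear right).
print(angular_difference(30.0, 160.0))  # uncorrected error
print(corrected_error(30.0, 160.0))     # error after mirroring the response
```

In this example the uncorrected error of 130° falls to 10° once the response is mirrored, which shows how a handful of reversals can dominate an uncorrected mean.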
Tables 6.1 a & b overleaf show that more detailed differences exist when the overall
means are broken down into presentation method (headphones or tubephones), pinna
or no pinna and internal or external microphone position.
For both headphones and tubephones a statistically significant difference (ANOVA,
see Table 6.2b) was found between elevation judgements with and without pinna after
correction for front-back errors. However, the expected interaction between
microphone location and playback method was not found (see Figure 6.2). For
azimuth, no statistically significant differences were found between pinna and no
pinna, nor was there an interaction between microphone location and playback
method (see Table 6.2a).
Figure 6.3 shows the spectra of the original and playback signals (at 0° azimuth, 0°
elevation, with pinnae). 'Left External Original' refers to the signal played through
the loudspeaker and heard (in the left channel) by the manikin with the microphones
at the external meatus entrance. 'Left Internal Original' is the same signal recorded
by the manikin but with the microphones placed at the eardrum. 'Left Headphone' is
the stimulus received by the manikin when the pre-recorded signal is played back
through headphones. Similarly, 'Left Tubephone' refers to the signal heard when
played back through tubephones (see Figure 6.4 for headphone/tubephone and
internal/external microphone placements).
The tubephones give a high-fidelity reproduction of the original signal. The
headphones, on the other hand, show poor reproduction of both the externally and
internally recorded signals.
2 See Chapter 3, section on front-back correction.
Headphones

AZIMUTH                       mean    se                  mean    se
(uncorrected)    PINNA        64.6    16.3    INTERNAL    65.39   16.5
                 NO PINNA     66.1    18.4    EXTERNAL    65.29   18.2

AZIMUTH                       mean    se                  mean    se
(f/b corrected)  PINNA        23.71   2.2     INTERNAL    24.52   2.4
                 NO PINNA     23.19   2.3     EXTERNAL    22.37   2.1

ELEVATION                     mean    se                  mean    se
(uncorrected)    PINNA        48.06   3.7     INTERNAL    52.30   4.5
                 NO PINNA     59.06   5.5     EXTERNAL    54.82   5.1

ELEVATION                     mean    se                  mean    se
(f/b corrected)  PINNA        40.19   3.0     INTERNAL    45.71   4.1
                 NO PINNA     48.89   3.0     EXTERNAL    43.36   4.4

Table 6.1a: Mean angle error values for headphone presentation with pinnae/no pinnae and
internal/external microphone placement for both azimuth and elevation judgements.
Statistically significant results (ANOVA, f = 3.96, df = 27) are in red.
Tubephones

AZIMUTH                       mean    se                  mean    se
(uncorrected)    PINNA        70.79   17.9    INTERNAL    68.16   17.5
                 NO PINNA     65.47   17.4    EXTERNAL    68.10   17.4

AZIMUTH                       mean    se                  mean    se
(f/b corrected)  PINNA        25.58   2.6     INTERNAL    26.51   2.6
                 NO PINNA     24.53   2.9     EXTERNAL    23.04   2.9

ELEVATION                     mean    se                  mean    se
(uncorrected)    PINNA        54.83   3.8     INTERNAL    59.14   3.7
                 NO PINNA     59.51   3.8     EXTERNAL    54.58   4.0

ELEVATION                     mean    se                  mean    se
(f/b corrected)  PINNA        42.45   3.1     INTERNAL    45.80   3.3
                 NO PINNA     48.21   3.7     EXTERNAL    41.88   3.6

Table 6.1b: Mean error values for tubephone presentation of stimuli. Results are shown for
pinnae/no pinnae and internal/external microphone positions for azimuth and elevation
judgements. Statistically significant differences (ANOVA, f = 3.96, df = 27) are given in
red.
Factor   Type    Levels   Values
p/np     fixed   2        1 2
hp/tp    fixed   2        1 2
i/e      fixed   2        1 2

Analysis of Variance for Azim

Source      DF    SS        MS      F      P
p/np        1     8.80      8.80    0.10   0.753
hp/tp       1     36.16     36.16   0.41   0.525
i/e         1     97.26     97.26   1.10   0.298
hp/tp*i/e   1     3.30      3.30    0.04   0.847
Error       51    4495.62   88.15
Total       55    4641.14

MEANS

p/np    N    Azim
1       28   24.646
2       28   23.854

hp/tp   N    Azim
1       28   23.446
2       28   25.054

i/e     N    Azim
1       28   25.568
2       28   22.932

Table 6.2a: Analysis of variance table for azimuth with front-back confusions corrected.
"p/np" represents 'pinna' or 'no pinna' conditions (with or without), "hp/tp" refers to
headphone or tubephone playback method and "i/e" represents internal or external
microphone placement. No statistically significant effects were found.
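As a reading aid, the F ratios in Table 6.2a can be recomputed from the printed sums of squares and degrees of freedom, since each F is the effect mean square divided by the error mean square. The sketch below uses only the values printed in the table; the variable names are illustrative, and the P values are not recomputed because that would need an F distribution routine outside the standard library.

```python
# Sums of squares and degrees of freedom as printed in Table 6.2a.
effects = {
    "p/np": (8.80, 1),
    "hp/tp": (36.16, 1),
    "i/e": (97.26, 1),
    "hp/tp*i/e": (3.30, 1),
}
error_ss, error_df = 4495.62, 51
total_ss = 4641.14

# Mean square = sum of squares / degrees of freedom; F = effect MS / error MS.
error_ms = error_ss / error_df
f_ratios = {name: (ss / df) / error_ms for name, (ss, df) in effects.items()}

for name, f in f_ratios.items():
    print(f"{name:10s} F = {f:.2f}")

# The effect and error sums of squares should add up to the printed total.
ss_sum = sum(ss for ss, _ in effects.values()) + error_ss
print(f"sum of SS = {ss_sum:.2f} (printed total {total_ss:.2f})")
```

The recomputed ratios round to the printed F values, and the sums of squares add up to the printed total.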
Factor   Type    Levels   Values
p/np     fixed   2        1 2
hp/tp    fixed   2        1 2
i/e      fixed   2        1 2

Analysis of Variance for Elev

Source      DF    SS        MS       F      P
p/np        1     1462.3    1462.3   3.96   0.050
hp/tp       1     18.0      18.0     0.04   0.834
i/e         1     156.3     156.3    0.38   0.537
hp/tp*i/e   1     0.0       0.0      0.00   0.997
Error       107   43537.4   406.9
Total       111   45174.0

MEANS

p/np    N    Elev
1       56   41.323
2       56   48.550

hp/tp   N    Elev
1       56   44.536
2       56   45.338

i/e     N    Elev
1       56   46.118
2       56   43.755

Table 6.2b: Analysis of variance table for elevation, front-back corrected data. "p/np"
represents pinna condition (with or without), "hp/tp" refers to headphone or tubephone
playback method and "i/e" represents internal or external microphone position.
Statistically significant effects are shown in bold type.
[Figure: Relationship between Recording and Playback Location for Azimuth and Elevation. Angle error (axis marked 0 to 40) plotted against Recording Location (Internal, External) for Headphones (f/b corrected) and Tubephones (f/b corrected).]
Figure 6.2: The effect of the internal and external recording positions with headphone and
tubephone playback for front-back corrected angle errors. ±2 standard errors are shown in
each case.
[Figure: Spectra of Original and Playback Signals. Level in dB SPL (axis marked -60 to 60) plotted against Frequency for Left External Original, Left Headphone, Left Tubephone and Left Internal Original.]
Figure 6.3: Spectra of original (internal and external) stimuli with comparisons of playback
through headphones and tubephones (for sound recorded at the internal position).
[Figure: diagrams of recording and playback positions, labelled Left External Original (signal received by microphone), Left Tubephone (signal playback), Left (internal) Original (signal received by microphone), Left Headphone (signal playback) and Recorded Sound.]
Figure 6.4: Diagrams of original recording and playback positions. '(Internal) Original'
shows the microphone at the eardrum location of the manikin. 'External Original' is the
signal received by the manikin with the microphone at the meatus entrance. The playback
positions, to human subjects, show the tubephones at the eardrum, and headphones
close to the meatus entrance.
DISCUSSION
The overall mean angle errors are, once again, surprisingly large. Although they
agree with studies of a similar nature (e.g. Wightman & Kistler, 1989; Wenzel et al,
1993), they do not correspond to our abilities in the free-field (Stevens & Newman,
1936; Shelton & Searle, 1978; Makous & Middlebrooks, 1990; Schlegel, 1994).
Furthermore, this finding is not unique to one condition, but is large both with and
without pinnae and regardless of recording and playback position.
The physical characteristics of the stimuli at different recording and playback
positions offer some support for this finding. The original 'internal' signal should
produce a close match to the tubephone-presented stimulus, since the recording and
playback position are similar. Indeed, this was the case up to 7.5 KHz, but between
7.5 and 15 KHz, there is some separation of the two signals, in the order of 4 - 8 dB
(see Figure 6.3). The fidelity of tubephone reproduction is, however, far greater than
that of headphones. It is expected that headphones should not match the original
(internally recorded) signal, since there is a difference between recording and
playback position. But they also fail to produce a high fidelity reproduction of the
externally recorded original stimulus, where there is a recording-playback
relationship.
Yet the hypothesis that headphones would match the externally recorded signal was
based upon the assumption that headphones would deliver the sound to the point of
recording. In reality, they sit a small distance away from it and therefore produce
additional resonance around the concha. Since the concha causes important
reflections and cancellations before the sound enters the ear canal, this 'double travel'
through the pinna can have a substantial effect. Indeed, Figure 6.3 shows that
headphone playback considerably boosts lower frequencies (between 2 and 4 KHz)
and attenuates higher frequencies (between 4 and 10 KHz), perhaps indicating the
effect of additional concha resonance. Playing the externally recorded sounds
through tubephones should actually give greater accuracy than headphones (although
no difference is apparent) because even though some travel through the ear canal is
lost in this process, the effects of this are minimal in terms of altering the sound.
So the listener should be more accurate with the tubephones, regardless of recording
position. But as the experimental data show, there are no real differences between
headphones and tubephones. Even where slight differences do exist, the trend is
opposite to that expected, with headphones having the edge on accuracy. Perhaps the
subtle spectral improvements shown for tubephone delivery are being nullified by
overall task difficulty.
Pinnae, as expected, reduced the number of errors for elevation judgements. The
result was replicated in both headphone and tubephone conditions, demonstrating its
robustness. This statistically significant finding supports the hypothesis that the pinnae
are necessary for discriminating locations in this plane. Another important function is
distinguishing front from back in the horizontal plane.
The number of front-back azimuth reversals was large - approximately 43% with
pinnae and 33% without pinnae. This difference is not statistically significant, but the
results do not reflect the expected pattern. One would expect the pinna to reduce the
number of front-back azimuth confusions in accordance with a number of studies that
have demonstrated the pinnae to be functional in distinguishing front from back
(e.g. Musicant & Butler, 1984).
There was also no difference found between headphones and tubephones, with both
giving front-back errors of around 40%. This finding is surprising, especially in view
of the apparent 'double travel' of sound through the pinna when using the
headphones. Despite this, no difference was observed and it may be that the
frequencies altered by this 'double travel' are either too high or too low to be of any
significance to the listener.
This study has produced a surprising set of results in view of the large angle error
values obtained. However, the fact that the results were not affected by recording or
playback location demonstrates the robustness of this finding. There may, however,
have been some factors whose overriding influence inflated the overall error values.
The lack of a visual reference and even possible discrepancy between the recording
and playback environment could have substantially increased task difficulty. Playing
back the signals in conditions identical to those in which they were recorded could
prove to be an essential element in our ability to localize accurately. In addition,
confusion may have resulted from false head movements. Although subjects were
instructed to keep their heads as still as possible, even slight head movements are
likely. This may conflict with the signal, which would not make the appropriate
transformations relative to the subject's head movement, since sounds were recorded
on a still manikin.
Nevertheless, the results obtained, although large, remain similar to the findings of
other studies investigating absolute localization (e.g. Wenzel et al, 1993, errors of
±26°; Wightman & Kistler, 1989, ±21°). Clearly, problems with accuracy would
arise in man-machine interfaces incorporating simulated 3-dimensional sounds where
head movements, and perhaps visual references, are not incorporated.
Chapter 7: Effect of Interstimulus Delay and Response Method on Localization Accuracy
CHAPTER 7
The Effect of Interstimulus Delay and Response Method on Localization Accuracy.
ABSTRACT
Studies of absolute localization test the accuracy of listeners' judgements of the
position of discrete, isolated sounds. Yet most experiments present more than one
stimulus, which may conflict with the concept of absolute judgement. Reported studies,
however, rarely examine the point at which the memory of a stimulus affects
subsequent judgements by introducing a constraint and therefore artificially reducing
error values - a concept first introduced by Siegel & Siegel (1972).
This study looks at the effect of different interstimulus durations (2, 5, 8 & 12 seconds)
on localization acuity. Subjects heard white noise bursts recorded using a KEMAR
manikin with microphones located at the eardrum position. Sounds were recorded in
the frontal horizontal plane at 5 locations between 0° and 90° and played back over
headphones in a sound-attenuating booth.
No consistent effect (ANOVA) was observed across interstimulus interval conditions.
Therefore, memory did not appear to be involved in the response to current stimuli.
The method of eliciting subject responses was also investigated. Half of the subjects
used a forced-choice (categorical) method and half were given a non-categorical
method - allowing subjects to respond anywhere in a 360° horizontal plane. The
categorical response method produced a statistically significant (p≤0.01, ANOVA)
improvement in judgement accuracy over the non-categorical method. Therefore, part
of the explanation for the high angle errors clearly lies with the method of collecting
responses.
INTRODUCTION
Considerable variability in absolute angular error has been found for studies of
localization reported in previous chapters and in the literature (e.g. Stevens &
Newman, 1936, ±14°; Wightman & Kistler, 1989, ±21°; Makous & Middlebrooks,
1990, ±1.5°-16°). A number of influential factors have been identified in the process
of ascertaining angle error, such as the use of pinnae, the stimulus type and whether
sounds are simulated or free-field. However, one important and unspecified variable
involved in absolute localization is the point at which the task becomes absolute, as
opposed to relative. Since relative judgements are more accurate (e.g. Mills, 1958;
Perrott, 1984) this may be a confounding factor.
Absolute localization is judging the position of a single sound source, without using
any cues from a previous sound. But the critical point at which sounds cease to be
relative is not established, since published studies either use very different
interstimulus times or fail to report it at all. Studies discussed in earlier chapters have
used either a 4 or 5 second interstimulus delay - adequate time for the subject to
respond, yet brief enough to hold their attention. However, it may be that even a 4
second interval is too short for one sound not to affect the response to subsequent
sounds.
The present study was motivated by the original attempts to establish localization
accuracy in terms of information theory (see Chapter 5). Miller (1956) asserted the
accuracy of information theory for establishing absolute judgement channel capacity.
Information analysis, according to Miller, is unaffected by memory span or retention
interval - factors which may make absolute judgement indistinguishable from
relative or 'paired associate' tasks. Absolute judgement, he argued, is limited by the
amount of information in a stimulus, whereas memory span is affected by the number
of items. Thus, information theory is unaffected by practice effects.
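For readers unfamiliar with the information analysis referred to here, the sketch below computes transmitted information from a stimulus-response frequency matrix using the standard relation T = H(stimulus) + H(response) - H(stimulus, response). The function name and the two example matrices are illustrative assumptions; the matrices are constructed cases and not data from this thesis.

```python
from math import log2


def transmitted_information(matrix):
    """Transmitted information in bits for a frequency matrix whose rows are
    stimuli and whose columns are responses:
    T = H(stimulus) + H(response) - H(stimulus, response)."""
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]

    def entropy(counts):
        # Shannon entropy in bits; empty cells contribute nothing.
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    joint_cells = [cell for row in matrix for cell in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(joint_cells)


# Constructed examples: perfect identification of four locations, and
# responses that carry no information about which of two stimuli occurred.
perfect = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
chance = [[5, 5], [5, 5]]

print(transmitted_information(perfect))
print(transmitted_information(chance))
```

Perfect identification of four equally likely locations transmits 2 bits, while responses unrelated to the stimulus transmit none, which gives a scale against which a figure such as 1.74 bits can be read.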
However, Siegel and Siegel (1972) criticise Miller's view. They demonstrate that
judgement accuracy is not limited by the amount of information in a signal, but by
failure of subjects to hold several successive stimuli in memory. Thus if signals are
closely spaced, the information transmission will increase. This shows a failing of
information theory studies in terms of the contamination of the measurements by the
memories of recent judgements. The aim of this experiment was therefore to study
this effect by introducing variable delays that should affect memory to a different
extent.
Another influence whose effect has not been measured is the method of eliciting
subject responses. Published studies have used a number of methods such as head
pointing (Makous & Middlebrooks, 1990), reporting target co-ordinates (Wenzel et
al, 1993) and naming speaker positions (Stevens & Newman, 1936). All of these
studies have reported very different angle error values. Experiments discussed in
chapters 3 to 5 have used a mixture of categorical (forced choice) or non-categorical
(a blank diagram for subjects to mark with no guidance as to target locations)
methods. A comparison of these studies alone shows a tendency for categorical
response methods to produce lower angle errors (approximately ±10-16°, Chapter 3)
than studies using non-categorical techniques (±26° in chapter 5), although Chapter
4 used a categorical method and produced a mean error of ±19°. Evidently, these
studies are not directly comparable and a controlled comparison is required.
Hake & Garner (1951) used an information analysis approach to study the effects of
discrete and continuous scales on error rates. They found that subjects who
responded by identifying discrete steps made fewer errors than those given a free
choice, particularly when the number of possible choices was small (in the order of
five).
This experiment will use five sound sources, located in the right frontal azimuth
plane. Subjects will either be provided with a forced-choice (categorical) response
method or a non-categorical method - where no guidance as to target angle will be
given. It is expected that categorisation will yield significantly lower angle errors
than non-categorical judgements.
The interstimulus delay times used will reflect a range that is just adequate to allow
subjects to respond and as far apart as maintained subject attention allows. The
briefer the interstimulus duration, the more likely it is that the memory of the previous
sound will remain and constrain subsequent judgements. The stimulus heard by
subjects will be a 1-second white noise burst, since pure tones, as used by Perrott
(1984) and Mills (1958), are more difficult to judge by virtue of only having a single
frequency component, rather than a broad range.
METHOD
Subjects
Ten subjects were taken from an opportunity sample. All were undergraduate
students, 6 male and 4 female.
Design
The stimulus recordings were made at 5 azimuths in the right hemisphere (0°, 23°,
45°, 68°, 90°) all at 0° elevation. Subjects heard the recordings in sequences of 25
stimuli (5 repetitions of the sound at each location). The interstimulus delay times for
these sequences were 2, 5, 8 and 12 seconds. Four sequences in total were presented
in a fixed quasi-random order and this enabled each listener to begin at a different
point in the sequence. In addition, the order in which sequences were presented was
varied. The ordering for sequences and stimulus starting points is shown below in
Table 7.1.

Subject Number   Sequence Order     Stimulus Start Point
1, 7             2s, 5s, 8s, 12s    1, 7, 13, 19
2, 8             5s, 2s, 12s, 8s    2, 8, 14, 20
3, 9             8s, 12s, 5s, 2s    3, 9, 15, 21
4, 10            12s, 8s, 2s, 5s    4, 10, 16, 22
5                2s, 8s, 12s, 5s    5, 11, 17, 23
6                5s, 8s, 2s, 12s    6, 12, 18, 24

Table 7.1: Starting points for all ten subjects for both sequence (different interstimulus
delays) and stimulus position within the sequence. Note that the ordering for 'sequence' is
a quasi-random selection taken from the full range of 4-factorial permutations. Thus, a
different interstimulus delay is used at least once as a starting sequence, then two randomly
picked sequence orders were added to make 6 in total - the number required to give the
full range of stimulus start points.
Procedure
Stimuli were played over headphones (Beyer Dynamic DT 48) in a sound-attenuating
booth. Following instructions¹, subjects were either provided with a response sheet
and asked to mark the perceived location with a cross (the non-categorical method) or
asked to choose a location (the categorical method) (see Figures 7.1a, b & c).
Between each sequence subjects had a brief break, during which time new response
sheets were provided.
1 See Appendix 5D
[Figure 7.4: line chart, "Errors by Target Angle for the Non-Categorical Response Method". Horizontal axis: Target Angle (°) at 0, 23, 45, 68 and 90; vertical axis: 0 to 80. Series: 2 secs, 5 secs, 8 secs, 12 secs and random response.]
Figure 7.4: Errors by target angle for all interstimulus delay times for the non-categorical
response condition. There were no statistically significant differences for response
accuracy of the target angles within each interstimulus interval condition (ANOVA).
Random response values for a full 360° range of possible responses are included to show
chance levels.
DISCUSSION
This study set out to investigate whether interstimulus duration had an effect on
localization accuracy. It was hypothesized that the shorter the interstimulus delay, the
more likely it was that judgements would be constrained (and thus made more
accurate) by the memory of previous stimuli. The results were divided into
categorical and non-categorical to correspond with subject response conditions.
Both conditions show an improvement for the 5-second interstimulus interval.
However, if this effect were the result of improved retention of previous stimuli, then
one would expect any interstimulus interval below 5 seconds to show the same
pattern, since memory would only be enhanced for more closely spaced sounds.
However, this pattern is not evident and the 2-second condition shows a drop in
judgement accuracy compared to the 5-second condition. It may be that whilst the 2-
second response condition does result in strong retention which would constrain
subsequent judgements, the response time is too short for subjects to be accordingly
accurate. Many subjects reported that there was insufficient time to record the
response and prepare for the next stimulus. Although all subjects responded to every
stimulus, it may be that subjects were forced to guess or make random judgements
fairly frequently, in order to keep up with the sequence.
What does show a clear effect, however, is response method. As Figure 7.2 shows,
there is a large improvement in the categorical (forced choice) method over the
non-categorical method, which is maintained for each interstimulus delay time. This
supports Hake & Garner's (1951) findings that there are fewer errors with scales
using discrete steps. This is because the categorical method constrains judgements in
two ways: firstly, by giving the subject an awareness of the range of source
locations; secondly, because errors that occur in a non-categorical method typically become
0° in a categorical method, since the subject is forced to choose the nearest (often
correct) category. By using this means of eliciting responses, potentially huge errors
are ruled out by automatically eliminating judgements that might otherwise be placed
well outside the range. Hence its validity as a method of assessing true localization
ability is questionable.
The breakdown of judgements into errors by target angle (Figures 7.3 & 7.4) shows
surprisingly different trends for the different response methods. The categorical
method shows increased accuracy at 0°. Sounds emanating from around the midline
typically show the greatest accuracy (Stevens & Newman, 1936; Mills, 1958; Shelton
& Searle, 1978; Middlebrooks et al, 1989; Makous & Middlebrooks, 1990). This is
because a fixed angular difference produces a larger interaural timing difference near
the midline (Gelfand, 1990). In addition, stimuli presented at 45° and 68°, which
show the largest errors in this study, lie within the cone of confusion where judgement
accuracy decreases.
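The midline advantage described above can be illustrated numerically. The following is only a sketch: it assumes Woodworth's spherical-head approximation, ITD = (a/c)(θ + sin θ), with an assumed head radius of 8.75 cm and a speed of sound of 343 m/s. Neither the formula nor these constants are taken from this thesis, and the function names are purely illustrative.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (not from the thesis)
SPEED_OF_SOUND = 343.0   # assumed speed of sound in air, m/s


def woodworth_itd(azimuth_deg):
    """Interaural time difference (seconds) for a distant source at 0-90 degrees azimuth."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


def itd_change(azimuth_deg, step_deg=5.0):
    """Change in ITD produced by moving the source through a fixed angular step."""
    return woodworth_itd(azimuth_deg + step_deg) - woodworth_itd(azimuth_deg)


if __name__ == "__main__":
    # The same 5 degree step yields a progressively smaller timing change
    # as the starting azimuth moves away from the midline.
    for start in (0, 23, 45, 68, 85):
        change_us = itd_change(start) * 1e6
        print(f"{start:>2} deg: a 5 deg step changes the ITD by {change_us:.1f} microseconds")
```

Under these assumptions the slope of the ITD curve is proportional to (1 + cos θ), which is greatest at 0° and falls steadily towards 90°, consistent with the greater acuity reported near the midline.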
However, Siegel & Siegel (1972) argue that stimuli at the ends of a scale are judged
more accurately than those in the middle. This phenomenon they term 'end point
effects' and they occur because subjects who perceive sounds off the extremes of the
scale will then attribute their judgements to the end points. Thus the end points have
a higher chance of being judged correctly and this may be the reason behind the
pattern shown by the categorical data in this study. However, the categorical method
used in Chapter 4 also showed the greatest accuracy to be at 0°, where 0° was not an
end point, but in the middle of the scale. It should also be noted that in this
experiment, only one end point shows a dramatic increase in accuracy. The 90° end
point shows values similar to those at 23° and so it may well be that the cone of
confusion is the major reason behind increased errors in the middle of the range.
What is clear from the inclusion of the random set of responses is that the categorical
data are not artifactual. A range of possible responses were randomly inserted that
either corresponded to the available category choices (categorised random responses)
or to any value within the target range of 0° - 90° (non-categorical random
responses). The similarity of these two curves shows that the values obtained are not
a facet of categorisation.
The non-categorical response method shows much higher angle errors than the
categorical response method. But whilst the values seem high, the graph shows that
the results are well below chance. In terms of where the greatest accuracy occurs,
almost the opposite findings to the categorised condition are displayed. These results
are puzzling, but one explanation for the increased error at 0° may be that several 0°
targets were judged to be to the left of the subject. Several subjects made judgements
between 270° and 359°, despite the fact that no sources were located in this region. A
270° judgement constitutes an error of 90°, which when made more than once or
twice per subject, produces a large increase in the overall error for the 0° target
position.
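The arithmetic behind this point can be sketched as follows. The error measure used here (the smallest angular separation between target and response on a 360° circle) is an assumption made for illustration, and the response values are hypothetical rather than data from this experiment.

```python
def angle_error(target_deg, response_deg):
    """Smallest absolute angular separation between target and response (0-180 degrees)."""
    diff = abs(response_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_error(target_deg, responses_deg):
    """Mean angle error over a list of responses to a single target."""
    errors = [angle_error(target_deg, r) for r in responses_deg]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical responses to the 0 degree target: three near the midline,
    # two placed directly to the listener's left (270 degrees).
    near_midline = [5, 350, 10]
    with_left_judgements = near_midline + [270, 270]
    print(mean_error(0, near_midline))
    print(mean_error(0, with_left_judgements))
```

With these illustrative values, two leftward judgements out of five are enough to raise the mean error for the 0° target several-fold, which is the pattern described above.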
These misplaced judgements are perhaps due to presenting the sounds in one
hemisphere only. The right ear, receiving all the direct sound, becomes somewhat
saturated, whilst the left ear, receiving no direct sound, becomes over-sensitive. Thus
when a sound is presented in the midline, the sound is artificially 'shifted' over to the
left because of an imbalance between the two ears. A second possible reason may be
response bias. Subjects may feel uncomfortable placing all judgements to one side,
and so expect some sounds over to the left. Sounds located at 0° are the closest to
these expectations and thus are placed incorrectly by the listener (NB: no stimuli other
than 0° sources were placed in the left quadrant).
The manipulation of interstimulus interval in this study has failed to reveal a clear
point below which judgement accuracy increases substantially. Therefore, memory
did not appear to be involved in the response to current stimuli, but part of the
explanation for the high angle errors clearly lies with the method of collecting
responses. The introduction of discrete categories constrains error both by limiting
the subject's response range and by informing subjects of the actual target locations.
The ambiguity in judging 'virtual reality' sounds is clearly reduced by categorisation
and so it is a factor that should be considered when attempting to establish 'true'
localization ability.
Chapter 8: Effect of Stimulus Type and Response Method on Judgement Accuracy
CHAPTER 8
The Effect of Stimulus Type and Response Method on Judgement Accuracy.
ABSTRACT
This experiment firstly attempts to evaluate the most effective stimulus for
localization tasks by comparing three different stimulus types, and secondly, two
different methods of eliciting subject responses are examined.
Subjects were required to localize either clicks, white noise or speech in one of three
response conditions. The first was an unguided method where subjects were given
a blank diagram with 10° markings around the outside. The second used the same
diagram but subjects were seated in front of a marker strip, with 10° azimuth
markings that matched 10° markings on the response sheet. This was to establish
whether there was difficulty in relating 3-dimensional perception to a 2-dimensional
response sheet. The third was a categorical method, where subjects were forced to
choose from a number of specified locations.
The speech stimulus and clicks produced the greatest accuracy for azimuth, showing a
statistically significant (p≤0.05, related t-test) improvement over white noise.
However, no significant differences were found for elevation, where subjects found
the task in general considerably more difficult.
The categorical response method gave large improvements in judgement accuracy for
all three stimulus types in the horizontal plane. There was little effect found for
elevation. The lack of any effects for elevation judgements, for either stimulus type or
response method, seems to indicate that subtle elevation cues are being lost. The
overall task difficulty may be overriding any other effects, such as response method
or stimulus type. Nevertheless, for azimuth judgements, the means of eliciting
response and stimulus type are clearly important in determining localization accuracy.
INTRODUCTION
Experiments have shown that a listener's ability to localize a sound source depends on
a number of factors. In previous (unpublished) work, we found that the overall mean
angle error can range from ±8° to ±27° in different experiments. Response method¹
seemed to play an important part. However, since previous studies confounded
different response methods with different stimulus types, error values are difficult to
evaluate.
Localization studies using pure tones are therefore thought to produce the greatest
error, since there is only one frequency present. Depending on that frequency, either
interaural timing differences (ITD's) or interaural level differences (ILD's) are
effectively utilised, but not both. Noise or speech, on the other hand, comprise
several frequency components (or all), allowing both ITD's and ILD's to play a part in
the localization process and so increase judgement acuity.
Makous & Middlebrooks (1990) looked at free-field localization using broadband
sounds which varied in both the vertical and horizontal planes. Subjects reported the
location of the sound source by orienting their head towards it. The angle errors
obtained were between 1.5° and 16.3° when averaged across all six subjects.
Begault & Wenzel (1991) used a speech stimulus that was filtered using
nonindividualized Head Related Transfer Functions (HRTF's). They asked subjects
to identify azimuth and elevation location by reporting spatial co-ordinates (e.g. "up
30, right 10"). They found the mean angle error to be ±27° for all subjects. They
compare this value to their earlier study (Wenzel et al, 1991) where broadband noise
produced approximately the same mean angle errors. Whilst these error values are
much larger than those of Makous & Middlebrooks, there is some evidence
(Freedman & Fisher, 1968) that head movements increase accuracy, a variable
eliminated using Wenzel's technique.
Wenzel et al (1993) demonstrated that free-field and headphone listening were
comparable, and that both sets of data produced mean angle errors that were much greater
than those obtained by Makous & Middlebrooks.
1 See chapter 7.
HRTF's, therefore, have not proved to be an effective means of simulating free-field
listening. The use of a KEMAR manikin may replicate the free-field situation more
accurately by removing the extra step of deriving the stimulus - a complex variable
that could be contributing to the large errors when using HRTF's. Furthermore,
dummy head recordings eliminate head movements, making it possible to investigate
the location information derived from spectral information only.
Stimulus type also appears to influence localization accuracy. In earlier chapters a
number of different types of stimuli have been used in different experiments. Chapter
3 used clicks and a mean error of ±19° was obtained using a category system for
subject responses. In chapter 4, white noise bursts were used in the azimuth and
elevation planes, with mean angle errors of ±24-27° for azimuth and ±26-30° for
elevation with an unguided response method.
It seems unclear which of the broadband stimuli (clicks, speech or white noise) might
truly provide the most effective cues for localization, in either plane. Although, if
familiarity plays a part (as demonstrated by Coleman, 1962, in localization distance
judgement tasks) then one might expect that speech would give the greatest accuracy.
However, white noise should logically give the greatest accuracy, since its full
frequency spectrum offers a greater opportunity for the auditory system to analyse the
reflections and refractions caused by the pinna and the environment, of a flat
spectrum sound.
To address this problem, the following experiment uses a controlled comparison of
stimulus types. A 1-second white noise burst, a click and a speech stimulus - 'chips'
(chosen for its broad-spectrum components) will be compared in azimuth and
elevation dimensions. The experiment also used three different response methods;
(1) categorical, (2) unguided (no indication of stimulus locations) and (3) unguided
with an azimuth judgement aid.
The judgement aid used in the third method was a marker strip put in the booth
around the subject at eye level. The strip had equally spaced measurements on it
allowing subjects to refer to marked points on their response diagram. Using the
marker strip would ascertain whether or not subjects have difficulty mapping 3-
dimensional perception of the sound to a 2-dimensional response sheet. If use of the
marked judgement aid produced significantly fewer errors then it would suggest that
listeners need some point of reference between the booth surroundings and their
response diagram. An equivalent marker strip for elevation was not incorporated at
this stage, since the necessary dimensions would have been more awkward for the
subject to view and so be less helpful. If the marker strip proved to be effective, a
second phase would be run with an elevation judgement aid included.
METHOD
Subjects
An opportunity sample of 27 undergraduate and postgraduate students were used, 20
males and 7 females. Ages ranged from 18 to 46.
Design
2 experiments were presented:
1. Azimuth
2. Elevation
Within each experiment were the following conditions:
a) 3 response methods
b) 3 stimulus types
c) 7 locations (2 repetitions of each).
Each experiment consisted of three trials which were all recorded using a different
sound stimulus. The stimulus was either white noise, a click, or speech - the word
"chips". The stimulus recordings were made using a KEMAR manikin with
microphones placed at the internal meatus entrance. These were made at 7 azimuths,
all at 0° elevation, and were spaced 30° apart between 0° and 180°. Elevation
recordings were made at 0° and 45° azimuth at seven locations lying between -45°
and +90° at 22.5° intervals.
Stimuli were randomised within each trial and presented to subjects as two separate
experiments in quasi-random order.
Response Methods
All subjects listened to the stimuli through tubephones, but were divided into three
response groups.
1. Non-categorical (marker).
For the first condition, subjects were given a blank response diagram with 10°
markings around the circumference (see Figures 8.1a & b), which
corresponded to 10° markings on the judgement aid erected in the booth. This
allowed subjects to associate what they might perceive 'in space' to the
diagram in front of them. No information about stimulus target location was
given in this condition.
2. Non-categorical (no marker).
The second group was given the same diagram, but no judgement aid was
used. Again, no stimulus target locations were indicated.
3. Categorical.
The final group used a categorical response method, which involved choosing
a response position from a number of specified (actual) target locations
provided diagrammatically (see Figures 8.2a, b & c).
Stimuli
The stimuli were a 1-second white noise burst, a click and a speech stimulus. The
click was generated using a Masscomp Computer² and played through a speaker
(Radio Spares Wide Range 6"). The white noise stimulus was produced using the
Sussex Synthesizer software on an Apple Macintosh computer. The speech stimulus
was a recording of a male voice reciting the word "chips". Microphones (Brüel and
Kjær 4134, 0.5") were placed at the inside end of a Zwislocki coupler, at the eardrum
position of a KEMAR manikin, which was fitted with (nonindividualized) pinnae.
For azimuth, all stimulus sounds were recorded using a ring, 3m in diameter, with the
7 speakers (Radio Spares, SQ, 3") placed around it in the specified locations. The
manikin was placed in the centre of this ring, with the speakers at its ear height.
Elevation recordings were made using a single speaker (Radio Spares Wide Range 6")
attached to a wooden pole at a distance of 1.5m. This was pivoted at the manikin and
rotated to the relevant position.
2 The stimulus produced by the Masscomp had a non-flat spectrum. When played through a
speaker, the stimulus produced was flat-spectrum.
Each set of stimuli (white noise, clicks and speech, for azimuth and elevation) were
digitally recorded onto digital audio tape and then edited using "Audiomedia". The
stimuli were divided into type, and sub-divided into azimuth and elevation trials, and
a 6-second interstimulus delay, using 'room silence', was added. The trials were then
randomised and re-recorded on the Betamax cassette. Stimuli were then played back
to subjects over tubephones (Etymotic ER-2).
Procedure
Subjects listened to the stimuli whilst seated in a sound attenuating booth. Each
subject was provided with response sheets corresponding to their response group (see
Figures 8.1a & b and 8.2a, b & c) and given instructions (see Appendices 5E.1 &
5E.2). Each subject began at a different point in the experimental trial sequence.
Thus subject 1 would start with azimuth and one of the three stimulus types. Subject
two would begin with elevation and a different stimulus type³. Between each of the
three azimuth trials (one for each stimulus type) and the three trials for elevation, the
experimenter entered the booth and gave the subject a new response sheet (6 in total).
This was intended to reduce practice and boredom effects.
3 See Appendix 4 for trial sequences.
[Figure 8.1a: response sheet with fields for Subject, Sequence and Trial, above a circular azimuth diagram labelled in 10° steps from 0 to 350.]
Figure 8.1a: Blank response diagram for azimuth with 10° markings around the
circumference, for subjects in the non-categorical condition, either with or without the
judgement aid (marker strip).
[Figure 8.1b: response sheet with fields for Subject, Sequence and Trial, above an elevation diagram labelled in 10° steps.]
Figure 8.1b: Blank response diagram for elevation, with 10° markings. This was the
response sheet provided for the non-categorical response condition.
[Figure 8.2a: diagram titled "Azimuth Positions", showing the target locations between 0 and 180.]
Figure 8.2a: Diagram of stimulus locations for azimuth. Each subject was provided with
this guidance diagram for reference throughout the categorical condition of the experiment.
[Figure 8.2b: diagram titled "Elevation Positions", showing the target locations between -45 and 90.]
Figure 8.2b: Diagram showing the elevation stimulus locations. This diagram was provided
throughout the categorical response condition for reference.
[Figure 8.2c: response sheet with fields for Subject, Sequence and Trial, and a two-column table (Stimulus, Response) with rows numbered 1 to 14.]
Figure 8.2c: Response table used in conjunction with the categorical response condition.
The same sheet was provided for all azimuth and elevation trials (6 in total per subject).
RESULTS
Overall mean angle errors were calculated and corrected for front-back errors, for the
three stimulus types and for each different response method (see Tables 8.1a & b
below). The categorical method of response produced a considerably lower error rate
than either of the non-categorical methods (significant at p≤0.05 for all three stimulus
types, using an unrelated t-test) for azimuth judgements, but not for elevation.
To establish the effect of having knowledge of the speaker locations, responses in the
non-categorical condition were grouped into categories identical to those used in the
categorical method (see Tables 8.2a & b).
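One way of carrying out such a regrouping can be sketched as follows. This is a minimal illustration only: it assumes that each free response is assigned to the nearest of the seven azimuth targets (0° to 180° in 30° steps) and that error is the smallest angular separation on a 360° circle. The exact grouping rule used for Tables 8.2a & b is not stated here, so the rule, the tie-breaking and the function names are all assumptions.

```python
AZIMUTH_CATEGORIES = [0, 30, 60, 90, 120, 150, 180]  # azimuth targets used in this chapter


def angle_error(target_deg, response_deg):
    """Smallest absolute angular separation between two directions (0-180 degrees)."""
    diff = abs(response_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def nearest_category(response_deg, categories=AZIMUTH_CATEGORIES):
    """Assign a free response to the closest category.

    Assumption: ties (e.g. a response of 15 degrees) go to the lower category,
    because min() returns the first of several equally close candidates.
    """
    return min(categories, key=lambda c: angle_error(c, response_deg))


def grouped_error(target_deg, response_deg):
    """Error after the response has been regrouped into the category scheme."""
    return angle_error(target_deg, nearest_category(response_deg))


if __name__ == "__main__":
    # Hypothetical responses to a 60 degree target.
    for response in (70, 100, 350):
        print(response, angle_error(60, response), grouped_error(60, response))
```

A response of 70° to a 60° target, for instance, carries a raw error of 10° but an error of 0° once grouped, which shows how categorisation alone can reduce the measured error without any change in the listener's percept.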
However, there is very little difference between the stimulus types. This is true for all
but one condition. Azimuth judgements using the non-categorical response method
(with no judgement aid) show a statistically significant increase in error when judging
noise, compared to clicks or 'chips'. All other differences between stimulus types are
minimal and not statistically significant using related t-tests.
The subjects were divided into two groups, each of which heard identical sets of
stimuli, but who were given different methods of response. The first group used a
categorical system, which comprised a forced choice from a diagrammatic
representation of the actual target positions (see Figure 7.1a). Letters were attributed
to each of the 5 target positions and these were judged and noted down on a grid by
the subject (see Figure 7.1b). The second group were given a blank diagram
representing a 360° horizontal plane surrounding the subject (see Figure 7.1c) and
listeners were asked to mark a cross at the perceived location of the sound source.
Hence in this non-categorical response method, no guidance was given as to the range
or exact positions of the speakers.
Stimuli
A 1-second white noise burst, with cut-off frequencies of 0 and 19KHz, was
generated using the Sussex Synthesizer package on an Apple Macintosh II.
Sounds were recorded in a normally reverberant room using microphones (Brüel and
Kjær 4134, 0.5"), which were placed at the internal meatus entrance of a KEMAR
manikin, using a Zwislocki coupler. Stimuli were played in the frontal azimuth plane
through a speaker (Kef 8") placed on a speaker stand at ear height of the manikin.
This was moved around the manikin at a fixed distance of 1.5m, to the specified
locations.
The interstimulus delay comprised 'room silence' - identical to that in which the
stimuli were recorded, such that the ambient sound on the tape remained the same for
the duration of the trial. This was done by simply recording the quiet background
noise in the recording room for 15 seconds, using the same manikin set-up as the
stimulus recordings.
The 25 stimulus sounds were digitally recorded onto a Betamax cassette and then
transferred to the Audiomedia sound editing software on an
Apple Macintosh II computer. The sounds were edited to produce four playlists, each
with different interstimulus intervals.
[Figure 7.1a: circular response diagram labelled "Front" and "Back".]
Figure 7.1a: Blank response diagram given to subjects for the non-categorical response
condition. Subjects marked a cross on a separate diagram for each sound heard. The
diagram is actual size.
[Figure 7.1b: guidance diagram with the five target positions lettered A to E.]
Figure 7.1b: Guidance diagram (half actual size) provided to subjects in the categorical
response condition. Subjects used the diagram in a forced-choice paradigm.
[Figure 7.1c: response sheet with fields for Subject, Sequence and Start No, and a two-column table (Stimulus, Response) with rows numbered 1 to 25.]
Figure 7.1c: Response sheet used in conjunction with the guidance diagram (Figure 7.1b).
Subjects indicated a response letter from the guidance diagram next to each stimulus
number. A new response sheet was provided for each interstimulus delay sequence.
RESULTS
Angle errors were calculated for all subjects and, for the non-categorical response
condition, these values were front-back corrected². The mean overall errors for the
non-categorical method were ±20° (front-back corrected) and for the categorical
method were ±8.2°.
When response method is compared, there is a large difference, as shown in Figure
7.2. The values for the non-categorical method were typically more than double those
of the categorical method - a statistically significant result (p≤0.01, ANOVA) for all
interstimulus delay times (see Tables 7.1a-e for ANOVA tables showing all
interstimulus intervals combined and each interstimulus interval separately).
Figures 7.3 & 7.4 show mean angle errors averaged to give an overview of values for
each sequence (interstimulus delay time) at each target angle, for both the categorical
and free response methods. The two response methods show very different patterns
of judgement error, with the categorical condition showing increased accuracy at 0°
and 90° and the free condition giving the greatest accuracy between 45° and 68°.
Chance errors were added by generating a random range of possible response values
within the Excel analysis spreadsheet. Ten sets of random values were calculated and
the average taken. The results are superimposed onto each graph of judgement error
by target angle for the different response methods (see Figures 7.3 & 7.4).
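The chance-level procedure can be sketched as follows. The original calculation was carried out in an Excel spreadsheet whose details are not given here, so this is an illustrative reconstruction under stated assumptions: 25 random responses per set, ten sets averaged, uniform sampling, and error taken as the smallest angular separation on a 360° circle. All function names are hypothetical.

```python
import random

TARGETS = [0, 23, 45, 68, 90]  # target azimuths used in this experiment


def angle_error(target_deg, response_deg):
    """Smallest absolute angular separation between target and response (0-180 degrees)."""
    diff = abs(response_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def categorical_draw(rng):
    """Random forced choice among the five category positions."""
    return rng.choice(TARGETS)


def free_draw_quadrant(rng):
    """Random free response anywhere within the 0-90 degree target range."""
    return rng.uniform(0.0, 90.0)


def free_draw_full_circle(rng):
    """Random free response anywhere on the full 360 degree diagram."""
    return rng.uniform(0.0, 360.0)


def chance_error(target_deg, draw_response, n_responses=25, n_sets=10, rng=None):
    """Mean error over n_sets sets of purely random responses to one target."""
    if rng is None:
        rng = random.Random()
    set_means = []
    for _ in range(n_sets):
        errors = [angle_error(target_deg, draw_response(rng)) for _ in range(n_responses)]
        set_means.append(sum(errors) / n_responses)
    return sum(set_means) / n_sets


if __name__ == "__main__":
    rng = random.Random(1997)  # fixed seed so the sketch is repeatable
    for target in TARGETS:
        print(target,
              round(chance_error(target, categorical_draw, rng=rng), 1),
              round(chance_error(target, free_draw_quadrant, rng=rng), 1),
              round(chance_error(target, free_draw_full_circle, rng=rng), 1))
```

With only ten sets the estimates vary noticeably from run to run; increasing `n_sets` gives a smoother chance curve at the cost of departing from the ten-set average described above.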
2 See Chapter 3 "Methodologies"
Factor     Levels  Values
category        2  1 2
angle           5  1 2 3 4 5
interval        4  1 2 3 4

Analysis of Variance for ISI

Source      DF   Seq SS   Adj SS   Adj MS      F      P
category     1   8602.8   8602.8   8602.8  88.58  0.000
angle        4    637.9    637.9    159.5   1.64  0.165
interval     3    417.0    417.0    139.0   1.43  0.235
Error      191  18549.4  18549.4     97.1
Total      199  28207.1

Means for ISI

category     Mean   Stdev
1           8.224  0.9855
2          21.341  0.9855

angle        Mean   Stdev
1          18.130  1.5582
2          14.908  1.5582
3          13.496  1.5582
4          14.274  1.5582
5          13.105  1.5582

interval     Mean   Stdev
2-sec      16.332  1.3937
5-sec      16.090  1.3937
8-sec      13.082  1.3937
12-sec     13.626  1.3937

Table 7.1a: Analysis of variance table for all interstimulus intervals combined showing the
differences between different response methods. A statistically significant improvement is
found for the different (category) response methods (shown in bold type).
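The F ratios reported in Table 7.1a can be checked directly from the sums of squares and degrees of freedom, since each F is the mean square of the effect divided by the mean square of the error term. The short sketch below uses only the figures printed in the table; the function names are illustrative.

```python
def mean_square(sum_of_squares, df):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F statistic for one effect against the common error term."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_error, df_error)


# Values copied from Table 7.1a (all interstimulus intervals combined).
SS_ERROR, DF_ERROR = 18549.4, 191
EFFECTS = {
    "category": (8602.8, 1),
    "angle": (637.9, 4),
    "interval": (417.0, 3),
}

if __name__ == "__main__":
    for name, (ss, df) in EFFECTS.items():
        print(name, round(f_ratio(ss, df, SS_ERROR, DF_ERROR), 2))
    # The degrees of freedom should sum to the total (199).
    print(sum(df for _, df in EFFECTS.values()) + DF_ERROR)
```

Only the response method (category) term yields a large F ratio; the angle and interval terms are close to 1, in line with the non-significant P values shown in the table.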
Factor     Levels  Values
category        2  1 2
angle           5  1 2 3 4 5

Analysis of Variance for 2-sec

Source      DF   Seq SS   Adj SS   Adj MS      F      P
category     1  3424.61  3424.61  3424.61  43.78  0.000
angle        4   151.11   151.11    37.78   0.48  0.748
Error       44  3442.17  3442.17    78.23
Total       49  7017.89

Means for 2-sec

category     Mean  Stdev
1           8.056  1.769
2          24.608  1.769

angle        Mean  Stdev
1          16.500  2.797
2          18.580  2.797
3          13.540  2.797
4          15.480  2.797
5          17.560  2.797

Table 7.1b: Analysis of variance table for the 2-second interstimulus interval condition. A
statistically significant difference (bold type) is found for response method (category) but
not between the different target locations.
Factor     Levels  Values
category        2  1 2
angle           5  1 2 3 4 5

Analysis of Variance for 5-sec

Source      DF   Seq SS   Adj SS   Adj MS      F      P
category     1   1958.1   1958.1   1958.1  11.93  0.001
angle        4    263.0    263.0     65.7   0.40  0.807
Error       44   7221.1   7221.1    164.1
Total       49   9442.2

Means for 5-sec

category     Mean  Stdev
1           9.832  2.562
2          22.348  2.562

angle        Mean  Stdev
1          20.060  4.051
2          15.980  4.051
3          16.420  4.051
4          13.090  4.051
5          14.900  4.051

Table 7.1c: Analysis of variance table for the 5-second interstimulus interval condition. A
statistically significant improvement (bold type) for the categorical response method was
noted.
Factor     Levels  Values
category        2  1 2
angle           5  1 2 3 4 5

Analysis of Variance for 8-sec

Source      DF   Seq SS   Adj SS   Adj MS      F      P
category     1  1791.61  1791.61  1791.61  26.79  0.000
angle        4   538.42   538.42   134.61   2.01  0.109
Error       44  2942.80  2942.80    66.88
Total       49  5272.83

Means for 8-sec

category     Mean  Stdev
1           7.096  1.636
2          19.068  1.636

angle        Mean  Stdev
1          17.460  2.586
2          12.690  2.586
3          13.580  2.586
4          14.320  2.586
5           7.360  2.586

Table 7.1d: Analysis of variance table for the 8-second interstimulus interval condition. A
statistically significant difference (bold type) was found for response method only.
Factor    Levels  Values
category  2       1 2
angle     5       1 2 3 4 5

Analysis of Variance for 12-sec

Source    DF  Seq SS   Adj SS   Adj MS   F      P
category   1  1632.49  1632.49  1632.49  17.71  0.000
angle      4   368.15   368.15    92.04   1.00  0.419
Error     44  4056.48  4056.48    92.19
Total     49  6057.12

Means for 12-sec

category  Mean    Stdev
1          7.912  1.920
2         19.340  1.920

angle  Mean    Stdev
1      18.500  3.036
2      12.380  3.036
3      10.445  3.036
4      14.205  3.036
5      12.600  3.036
Table 7.1e: Analysis of variance table for the 12-second interstimulus interval condition.
Statistically significant improvements (bold type) were found for the categorical response
method.
[Chart: "Angle Errors of Categorical and Non-Categorical Response Methods". Series: Categorical
mean error, Non-Categorical mean error. Horizontal axis: Interstimulus Delay (2 secs, 5 secs,
8 secs, 12 secs). Vertical axis scale: 0 to 35.]
Figure 7.2: Chart showing the mean angle errors of the categorical and non-categorical
response methods, broken down into interstimulus delay time. Statistically significant
differences exist (p ≤ 0.01, ANOVA) for all interstimulus delay times between the two
response methods, as shown by the ±2 standard error bars. There are no significant
differences between the different interstimulus intervals within each response condition.
[Chart: "Errors by Target Angle for the Categorical Response Method." Series: 2 Secs, 5 Secs,
8 Secs, 12 Secs, categorical random responses, non-categorical random responses. Horizontal
axis: Target Angle (°) at 0, 23, 45, 68 and 90. Vertical axis: mean error, scale 0 to 80.]
Figure 7.3: Errors by target angle for each of the interstimulus delay times for the
categorical response condition. No statistically significant differences were found for
judgement accuracy of each target angle within each interstimulus interval condition
(ANOVA). Random response values (0° to 90° range) are given for the categorical and
non-categorical response conditions to illustrate chance levels.
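One way to picture the chance levels mentioned in the caption is to compute the expected absolute error of a purely random responder. The sketch below assumes that random categorical responses are drawn uniformly from the five target positions and that random non-categorical responses are uniform over the 0° to 90° range. The thesis does not state how its random response values were generated, so both assumptions and all function names are illustrative only. The target angles are the axis values as printed in Figure 7.3.

```python
def chance_error_categorical(targets):
    """Expected absolute error per target when one of the target
    positions is chosen uniformly at random as the response."""
    return {
        t: sum(abs(t - r) for r in targets) / len(targets)
        for t in targets
    }


def chance_error_continuous(target, low=0.0, high=90.0):
    """Expected absolute error when the response is uniform on [low, high].

    E|t - U| = ((t - low)^2 + (high - t)^2) / (2 * (high - low)).
    """
    span = high - low
    return ((target - low) ** 2 + (high - target) ** 2) / (2 * span)


targets = [0, 23, 45, 68, 90]  # axis values as printed in Figure 7.3
categorical = chance_error_categorical(targets)
continuous = {t: chance_error_continuous(t) for t in targets}

for t in targets:
    print(f"target {t:>2}°: categorical {categorical[t]:.1f}°, "
          f"non-categorical {continuous[t]:.1f}°")
```

Under these assumptions the chance error is largest at the ends of the range and smallest at the centre, which is why chance level has to be judged separately for each target angle.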
Chapter 8 Effect of Stimulus Type and Response Method on Judgement Accuracy
Non-Categorical (marker) 24.1 Non-Categorical (no marker) 22.1
24.8 20.2
25.1
27.4 24.7 23.2
15.0
Table 8.1a: Summary of overall mean angle errors for front-back corrected azimuth results.
There are 9 different subjects in each condition: non-categorical (with reference marker
strip in the sound booth, but for azimuth only), non-categorical (with no reference marker
strip) and categorical.
Non-Categorical (azim. marker only) 45.9 Non-Categorical (no marker) 46.2
B·3 43.5
42.1
43.2
46.4 44.3 43.3
Table 8.1b: Overall mean angle errors for elevation trials. The subjects are the same in each
condition as those in the azimuth trials. Responses are corrected for front-back errors. The
unusually large angle errors are little better than chance.
Tables 8.2 a & b: Mean angle errors for the non-categorical condition showing 'actual' and
'grouped' responses. 'Grouped responses' are mean errors of the actual non-categorical data
when banded into categories identical to those used in the categorical method.
DISCUSSION
This experiment looked at the difference in localization accuracy of three stimulus
types: white noise, clicks and speech. This was done using 3 different response
methods: non-categorical, non-categorical with a response-mapping guide in the
horizontal plane, and categorical. Tables 8.1 a & b show the mean angle errors for the 3
stimulus types in each different response condition.
For azimuth judgements, clicks and speech produced significantly fewer errors than
white noise. For elevation, however, white noise produced the greatest accuracy,
although the difference was very small. This was surprising, since subjects reported
clicks to be substantially more difficult to localize than either white noise or
speech. In fact, speech was deemed to be the stimulus most easily localized for both
the vertical and horizontal planes, although these reports were not upheld by the
numerical results.
For azimuth, the categorical response method produced statistically significant
improvements for all three stimulus types (clicks, noise and speech) when compared
to the non-categorical response method (for front-back corrected data). This result
was as expected. For the non-categorical method subjects had no indication of the
target locations and were given a free response range of 360°. Indeed, in this
condition subjects placed some stimuli well outside the true range, increasing the mean
angle error considerably. With the categorical method, subjects were aware of the
target locations and their response choices were confined to those locations.
Therefore large errors from judging outside the actual range of speakers do not
occur.
The effect of having knowledge of speaker positions can be tested by grouping the
non-categorical data into categories identical to those used in the categorical method.
Thus, non-categorical responses are grouped into 30° bands that fall symmetrically
about the target location. This should reduce the mean error values since categories
'pull in' outliers to a fixed, correct response. However, in this case, there was only a
1° improvement (see Tables 8.2a & b) to the overall mean angle errors for all three
stimulus types. This clearly demonstrates that responses in the non-categorical
condition were off-target by more than half a category width, and therefore fell
outside the 'correct' category grouping. Thus, prior knowledge of the target positions
can almost halve the error rate.
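The grouping procedure described above can be sketched as follows. This is a minimal illustration, assuming that the category centres lie at multiples of 30° and that each free response is snapped to the centre of the band containing it. The function names and the (target, response) pairs are invented for the example and are not data from the experiment.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def snap_to_band(response, band_width=30.0):
    """Snap a free response to the centre of the band that contains it.

    Centres are assumed to sit at multiples of band_width. A response lying
    exactly on a band boundary is resolved by Python's round-half-to-even rule.
    """
    return (round(response / band_width) * band_width) % 360.0


def mean_errors(trials, band_width=30.0):
    """Return (mean actual error, mean grouped error) for (target, response) pairs."""
    actual = [angular_difference(t, r) for t, r in trials]
    grouped = [angular_difference(t, snap_to_band(r, band_width)) for t, r in trials]
    return sum(actual) / len(actual), sum(grouped) / len(grouped)


# Invented (target, response) pairs in degrees, for illustration only.
trials = [(30.0, 38.0), (30.0, 52.0), (90.0, 61.0), (0.0, 350.0)]
actual_mean, grouped_mean = mean_errors(trials)
print(f"actual: {actual_mean:.2f}°, grouped: {grouped_mean:.2f}°")
```

Responses within half a band of the target are pulled in to zero error, whereas responses further away snap to a neighbouring band and keep an error of at least one band width. Grouping therefore helps little when most responses miss by more than 15°, which is the pattern reported in the text.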
Elevation judgements were not improved using the categorical method. Thus, the
lack of improvement when the non-categorical responses were grouped into
categories was not surprising, as it had been for azimuth. The angle errors for
elevation were large and subjects clearly have difficulty in distinguishing the
individual positions - a fact that seems to far outweigh the method of response. For
elevation judgements, there are no interaural time and intensity cues that provide
strong location information. We rely more heavily on pinna cues for judging
elevation - a finer and less robust cue - which may be more easily affected by
other variables. A visual correlate or loss of information in the recording process may
have weakened the available elevation cues.
The marker strip was expected to enhance the accuracy of judgements (in the
horizontal plane), since it should enhance the relationship of 3-dimensional perception
to a 2-dimensional response sheet. It seems reasonable to assume that perceiving a 3-
dimensional sound and attempting to accurately pinpoint its location is not well
served by providing subjects with a 2-dimensional response diagram. Problems may
arise when trying to relate an 'immersive' perception to a 'god's-eye' judgement.
Hence the judgement aid was erected at eye level such that subjects could locate the
sound source, attribute a location to it (using the markings on the strip) and then
correlate that with the appropriate point on their response diagram. However, there
was no difference in error between using the marker strip or not, which argues against
the hypothesis that listeners have problems matching perception with response.
This study has shown that for azimuth judgements the shorter duration stimuli
produced the greatest accuracy. Subject head movements may be confounding longer
signals (noise), thus confusing the subject and causing an increase in error. For
shorter stimuli, such as clicks, the stimulus is too brief for head movements to have
any noticeable effect. For elevation, however, no such effect was noted and here it is
likely that the overall task difficulty has overridden more subtle effects such as
interstimulus differences. However, these effects may also be combined with the fact
that noise provides only spectral information - useful only for elevation. This also
explains the increase in error for azimuth judgements but not for elevation
judgements.
In the horizontal plane, the substantial improvement in accuracy when using a
categorical response method clearly reduces much of the ambiguity inherent in
localization tasks. Indeed, mean angle errors are almost halved using this technique
- a finding which could extend to the large error values reported in similar published
studies (e.g. Begault & Wenzel, 1991; Wenzel et al, 1991) and which could explain
the small errors reported by other studies (e.g. Stevens & Newman, 1936; Shelton &
Searle, 1978). Thus it is questionable whether using a categorical system really tests
the ability of the auditory system to resolve spatial ambiguity. This makes response
method an important factor when determining localization accuracy for use in an
'open-field' virtual interface.
Chapter 9: Live Relay using the KEMAR Manikin
CHAPTER 9
'Live' Relay using the KEMAR Manikin.
ABSTRACT
Presenting pre-recorded sounds made using a KEMAR manikin has not produced the
localization accuracy of some reported free-field studies (e.g. Makous &
Middlebrooks, 1990; Stevens & Newman, 1936). A number of variables have been
investigated in an attempt to reduce the consistently large angle errors obtained,
although these refinements have had little effect.
One major factor not yet examined is the recording process. The steps involved in
creating a pre-recorded play list may cause some information to be lost, despite the
fact that high-fidelity digital recordings are made.
The following experiment compares a 'live' relay of stimuli presented in the horizontal
plane, with recorded presentations of these live trials. Stimuli were played to the
manikin and subjects heard these sounds as they were presented, but through
tubephones channelled into a sound-attenuating booth. It was expected that a direct
relay of the sounds, cutting out the recording stage, would increase the judgement
accuracy. The pinna, whose role for judgements in the horizontal plane is unresolved,
was also investigated. This was done by presenting the trials either through
nonindividualized pinnae, modelled on a listener, or no pinnae (infills).
The mean angle errors were similar for both the live and recorded conditions: ±24.7°
and ±19.9° respectively, when corrected for front-back errors. The difference was not
statistically significant (p ≤ 0.05, related t-test). These values are high, but consistent
with previously reported experiments. It is clear from these results that the recording
INTRODUCTION
Studies reported in earlier chapters have identified a number of variables that may be
influential (to a varying extent) in the error values obtained in localization
experiments. These have included response method, stimulus type, relationship
between recording and playback location, and whether or not pinnae are used.
Although the experiments have been successful in identifying such factors as
influential in the localization process, there are still variables that remain unidentified.
It is the lower angle errors reported by some studies that have brought about an
awareness of other variables which are likely to be playing an important role.
However, those studies presenting lower error values use different methodologies,
which in themselves have helped to highlight where these other contributors may lie.
For example, the studies by Makous & Middlebrooks (1990) and Stevens & Newman
(1936) that report overall mean errors of ±9° and ±14° respectively, are both free-field
studies, where head movements are allowed, thus implicating them as contributory
factors.
However, it is first necessary to examine all other variables that may be playing a part
in judgement accuracy, before moving on to free-field studies to incorporate head
movements or indeed, vision. One other possible contributor within a pre-recorded
method is the recording process. Many free-field studies give markedly smaller errors
than the results obtained in previous chapters. Perhaps important information is lost
by recording the sounds and playing them back from a tape, despite the fact that they
are hi-fidelity digital recordings.
The aim of this experiment was therefore to conduct a live relay through the manikin
such that subjects heard stimuli as they happened, and not from a pre-recorded tape.
The sounds were also presented in a richer auditory environment by including some
low level background sounds in an attempt to establish whether this aided
localization. However, to ensure that any observed improvement was a direct result
of cutting out the recording stage, and not the addition of background noise, the live
trials were also digitally recorded and presented from a tape as a comparison - both
with the live trials and with previous studies.
The majority of sounds chosen will vary from those used in previous experiments to
investigate a wider variety of complex sounds. But the sounds will also differ from
each other, such that the subject can distinguish each sound. Since there will be
background activity the stimuli must be readily identifiable and of sufficient duration
to attract attention and allow time for judgements to be made. This would give a
more realistic portrayal of everyday listening, since sounds vary in type as well as
location amidst other sounds.
In Chapters 4 and 6 the role of the pinna was investigated and found to have little
influence in the discrimination of sounds varying in azimuth. However, some
reported studies (e.g. Batteau, 1967; Freedman & Fisher, 1968) assert that the pinna is
functional in all dimensions. A pinna versus no pinna comparison is included in this
study to serve as further clarification of its function in the horizontal plane.
METHOD
Subjects
Nine undergraduate students (age range 18 - 32) were used, 4 males and 5 females.
All were inexperienced listeners and reported normal hearing.
Design
Subjects heard stimuli relayed through a KEMAR manikin in two conditions: recorded
versus live. These conditions comprised 2 identical sets of stimuli which
were relayed with and without pinnae. The pinna and no-pinna trials were given as
separate blocks and the blocks were presented in a counterbalanced order across
subjects. The stimuli were played in the following fixed randomised locations in the
horizontal plane:
1. 180° - Metronome (4 clicks, total duration approximately 1.5 seconds)
2. 81° - Hand Claps (2 claps, duration approximately 1 second)
3. 341° - Xylophone (2 strikes, duration approximately 1 second)
4. 258° - Paper tearing (duration approximately 2 seconds)
5. 0° - Male speech - the word "Chips" (duration 1.4 seconds)
6. 128° - Bunch of keys rattling (duration 1 second).
Stimuli
Six different stimuli were used; metronome clicks, hand claps, xylophone strikes,
paper tearing, speech (the word "chips") and rattling keys. These were presented at
the locations specified in the Design, at a fixed distance of 1.5m from the manikin.
The stimuli were presented in one of two conditions:
1. The main condition was a live performance of each of the stimuli in turn at
various positions around a KEMAR manikin. The stimuli were played in a
normally reverberant large room which was a working laboratory. This involved
people typing, doors opening occasionally, quiet talking or whispering, printer and
computer noise and general movement of the four occupants around the room.
The manikin was placed at the centre of the room and the six stimuli were played
around the manikin at equal distances from it. The manikin had microphones
(Brüel & Kjær 4134, 0.5") placed at the eardrum location which were held in
place using Zwislocki Couplers. The microphones were fed through a Brüel &
Kjær power supply and pre-amp (Rotel RC-850) to an amplifier (Marantz PM-45).
From the amplifier, tubephones (Etymotic ER-2) were used to feed the sound
into the sound-attenuating booth to the subject, who was listening to the stimuli
live as they were played to the manikin.
2. For the pre-recorded condition a selection of these live trials were recorded using a
Digital Audio Tape (DAT) player (Sony TCD-D7) which fed off the pre-amp.
Thus a selection of trials that might have subtly different background or
presentation characteristics were available for pre-recorded presentation.
Procedure
Subjects were seated in the booth and following the instructions¹, they were played
examples of all six stimuli that had been pre-recorded onto digital audio tape in a
normally reverberant quiet room. This recording was to familiarise subjects with the
stimulus sounds so that there would be no confusion about the descriptions provided
about each sound in the instructions. Subjects were then provided with two response
sheets (see Figure 9.1) - one for each pinna condition.
For those in the live condition the six sounds were played by the experimenter around
the manikin in the specified order with the aid of the male laboratory technician to
speak one of the stimuli (the word "chips"). This procedure was for the first pinna
condition and then after 30 seconds the experimenter said: "the second phase will
begin in 10 seconds". This was the listener's cue to prepare for the second set of
stimuli (different pinna condition) and mark their responses on the second response
sheet.
For trials that were being recorded, the DAT player was started after closing the booth
door and was left running until the subject had left the booth. This ensured that
subjects in the pre-recorded condition experienced an identical situation to those in
the live condition. Thus subjects in the recorded condition were unaware at the outset
that they were listening to a recording of a live situation. All subjects were fully
debriefed after the experiment.

¹ See Appendix SF
Figure 9.1: Response diagram provided to subjects (1 for each pinna condition). The square
represents the environment/room in which the stimuli are played, viewed from above. The
head shows the manikin's position at the centre of the room. The dimensions are not to
scale and no furniture or fittings are shown.
RESULTS
A two-way within subjects analysis of variance was used to analyse the results (see
Table 9.1). Absolute angular error for uncorrected judgements was ±63.1° for the live
condition and ±59.4° for the recorded condition, with pinna and no pinna combined.
When corrected for front-back errors the results were ±24.7° for the live presentation
and ±19.9° for recorded presentation, for pinna and no pinna combined. Figure 9.2
shows the mean error values (front-back corrected) for each condition for pinna and
no pinna.
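The gap between uncorrected and front-back corrected errors can be illustrated with a short sketch. It assumes azimuth is measured in degrees with 0° straight ahead and 180° directly behind, and it treats a response as a front-back error whenever mirroring it about the interaural (left-right) axis brings it closer to the target. This section does not spell out the exact correction rule used in the thesis, so the rule, the function names and the example responses below are assumptions for illustration.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural axis (0 <-> 180, 30 <-> 150)."""
    return (180.0 - azimuth) % 360.0


def score(target, response):
    """Return (uncorrected error, corrected error, is_front_back_error)."""
    raw = angular_difference(target, response)
    mirrored = angular_difference(target, mirror_front_back(response))
    is_front_back = mirrored < raw  # assumed criterion for a front-back error
    return raw, min(raw, mirrored), is_front_back


# Targets taken from the Design section; the responses are invented.
print(score(0.0, 170.0))    # target ahead, response placed almost directly behind
print(score(258.0, 250.0))  # response on the correct side of the interaural axis
```

With this criterion, responses close to the interaural axis are nearly unchanged by mirroring, so classifying them as front-back errors is ambiguous. Any real scoring scheme needs an explicit rule for targets such as the one at 81°.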
For the live condition the front-back errors made were 30% for pinnae and 48% for no
pinnae, a statistically significant difference (p ≤ 0.05, related t-test). For the recorded
conditions, pinnae and no pinnae produced similar values of 46% and 42%
respectively, which was not statistically significant (see Figure 9.3).
Anova: Two-Factor With Replication

SUMMARY     Live    Recorded  Total
Pinna
  Count          9         9      18
  Average    29.63     45.83   37.73
  Variance  331.75     95.52  270.56
No Pinna
  Count          9         9      18
  Average    48.15     41.67   44.91
  Variance  586.45    381.87  466.79
Total
  Count         18        18
  Average    38.89     43.75
  Variance  522.86    229.25

ANOVA
Source of Variation  SS        df  MS       F     P-value  F crit
Pinna/no Pinna         463.51   1   463.51  1.33  0.26     4.15
Live/Recorded          212.67   1   212.67  0.61  0.44     4.15
Interaction           1157.64   1  1157.64  3.32  0.08     4.15
Within               11164.72  32   348.90
Total                12998.54  35
Table 9.1: Analysis of variance for live and recorded presentations for pinna and no pinna.
No statistically significant effects were found, although the interaction between pinna effects
and presentation method was only marginally insignificant.
[Chart: "Mean Error for Live and Recorded Presentations." Series: Live, Recorded. Horizontal
axis: Pinna Condition (Pinna, No Pinna). Vertical axis scale: 0 to 35.]
Figure 9.2: Mean angle errors (front-back corrected) for the two presentation conditions;
live and recorded, for pinna and no pinna. ±2 standard error bars are shown.
[Chart: "Front/Back Errors". Series: Live, Recorded. Horizontal axis: Pinna Condition (Pinna,
No Pinna). Vertical axis scale: 0 to 80.]
Figure 9.3: Percentage of front-back errors for pinna and no pinna for the live and recorded
presentations. ±2 standard error bars are shown. Although the standard errors are large, a
statistically significant difference (p ≤ 0.05, related t-test) was found between pinna and no
pinna for the live condition. There is also a small, statistically insignificant (ANOVA,
F = 3.32, df = 34) interaction between presentation method and pinna condition.
DISCUSSION
The mean front-back corrected angle errors of ±24.7° for the live presentation and
±19.9° for the recorded presentation are consistent with previous measurements in
this thesis, but high in comparison to some reported localization studies (e.g. Makous
& Middlebrooks, 1990; Stevens & Newman, 1936), although they do reproduce the
findings of other studies with a more similar methodology (e.g. Freedman & Fisher,
1968; Schlegel, 1994; Wenzel et al, 1993; Wightman & Kistler, 1989). Cutting out
the step of recording to and playing back from digital audio tape did not increase
judgement accuracy as had been expected. In fact, it was the recorded condition that gave
a smaller error, but more front-back errors, although this was not statistically
significant.
The high errors, therefore, do not seem to be a result of the recording process, since
the live condition gave similar results. This leaves 3 alternatives; that there may be
problems with the manikin's relay of the sound or that the absence of visual cues, or
head movements, are causing inflated angle errors. Visual cues might include
knowledge of room size and surfaces as well as the more obvious links to the
perceived sound sources. Indeed, whilst for the live and recorded sounds subjects
reported a strong sense of presence, the number of judgement errors clearly indicates
that there are elements missing that may have increased the accuracy considerably.
For the pinna/no pinna comparison, no significant differences were revealed.
However, for the live presentations, the use of pinnae did produce a marked drop in
error (statistically significant, p ≤ 0.05, related t-test) compared to no pinnae. The
results in earlier chapters have indicated that no difference could be expected and
furthermore, the pinna is only a secondary cue for azimuth discrimination. However,
the literature is conflicting regarding the role of the pinna for localization in the
horizontal plane. But these results clearly show that the pinna cannot be ruled out as
an influential cue for azimuth localization. Indeed, the finding for the live condition
that using pinnae resulted in fewer front-back errors, lends strong support to the
findings of Batteau (1967) and Freedman & Fisher (1968) that the pinna is useful for
localization in all dimensions. However, only the live presentation produced this
finding. For the recorded condition the values were reversed, with pinnae giving
higher front-back errors, although the difference was not significant.
The role of the pinna for determining horizontal locations clearly remains unresolved.
Whilst it does not appear useful in aiding precise determination of locus, it has
resolved front-back errors, which plays some part in the process of making azimuth
judgements. The remainder, and majority, is done through interaural differences.
The sense of presence, mentioned earlier, was widely reported by the subjects for both
the live and recorded conditions. Listeners reported feeling very much as if they were
sitting in a 'busy' or 'active' room. Such reports are likely to be a result of including
background sounds, since no such sensations had been reported for previous
experiments. Yet these background sounds clearly did not aid the localization
process.
This study set out to investigate the effect of recording on localization judgements and
has revealed that the recording process does not confound judgement accuracy.
Indeed, the angle errors remain high despite eliminating the recording stage, although
these errors are consistent with earlier reported experiments. However, the manikin
recorded and live sounds do give a strong sense of presence and thus it is clear that
other important elements are missing in these presentations. A number of factors
have been investigated and perhaps the most obvious of those elements that remain
are vision and head movements, which now require investigation.
Chapter 10: Vision and Head Movements in Localization
CHAPTER 10
Vision and Head Movements in Localization.
ABSTRACT
Findings from previous chapters have shown that the spectral content of an auditory
signal is not used accurately by listeners to obtain information about a sound source
location. Optimising the stimulus type and using individualized pinnae has had little
effect on the results. While response method has been shown to halve the error
values, the results have failed to match those reported by some free-field experiments
(e.g. Makous & Middlebrooks, 1990).
One possibility is that any head movements made by the subject will confound the
signal when sounds are recorded on a manikin and played back over headphones in a
booth. Experiment 1 uses a head tracker to monitor the movement for a restrained
(clamped) and unrestrained still head. The results showed no statistically significant
differences (ANOVA) between the two conditions, indicating that the small head
movements made by subjects in the booth would be unlikely to affect judgement
accuracy.
In Experiment 2, vision and head movements are investigated by using sounds
presented in three listening conditions. Subjects listened to 3 stimulus types; speech,
clicks and noise presented in the horizontal plane in (a) a free-field condition or (b) a
pre-recorded condition with a visual correlate or (c) a pre-recorded condition with no
visual correlate. Head movements were allowed for half of the subjects, the other half
were kept stationary by use of a head clamp.
The pre-recorded stimuli played back with no visual correlate produced mean angle
errors similar (±10.8° when corrected for front-back errors) to those obtained in
previous chapters employing similar test arrangements. The pre-recorded condition
with a visual correlate gave errors smaller than had previously been obtained (±3.7°
front-back corrected). The free-field results were surprisingly accurate with errors of
±0.3°. No improvement was noted for subjects in the 'head motion' condition.
These findings do not rule out the importance of head movements in localization, but
the accuracy of subject judgements with the addition of vision was so high, that head
movement cues were clearly too subtle to be detected.
INTRODUCTION
Vision and head movements have purposely been omitted from earlier studies to
reveal whether the information provided by spectral cues alone is sufficient for
localization. However, results have been consistently poor despite changes in
variables such as stimulus type, number of sources, speaker span and using
personalised pinnae (see Chapters 4 to 8). Only a change in response method caused
a significant decrease in angle errors, but still these values are high compared to some
free-field studies.
One explanation for these large errors is that spontaneous head movements during
playback of recordings made using a still manikin may confound the signal. If a
subject moves during a signal, the percept does not remain in the same fixed position
in space but moves with the subject. Even though a subject is instructed to remain
still during experimental trials, small movements may result in confusing information
and poor judgements.
Experiment 1 examines whether head movements confound the signals in pre-recorded
trials. A head-tracking device was used to gauge how much a subject moves
his/her head either when restrained using a head clamp, or unrestrained but
intentionally holding their head still. The Head Tracker can be used to monitor
azimuth, elevation and roll relative to a base unit.
Pollack & Rose (1967) carried out a series of studies to establish the role of head
movements in localization. The only condition that showed a clear and significant
improvement in acuity was with head motion - where subjects were presented with
a signal that remained until the head was aligned with the locus of the source.
Schlegel (1994) looked at free-field and headphone azimuth estimates of white noise
and clicks. Subjects, either blindfolded or wearing goggles, were required to turn and
face the sound and their position was recorded. Free-field azimuth errors were around
±3° when averaged across all angles. For locations at the sides, judgements were off
target by up to 10°, but in the midline errors were around 0°. Schlegel was also
interested in response method and as a comparison of motor and cognitive tasks, he
asked subjects in a separate series of trials, to report the location verbally in 5° classes
instead of turning to face the sound (although the head was not fixed). The responses
were equally accurate for both the motor and verbal tasks, with the standard
deviations being generally higher for the verbal task.
Headphone estimates of stimuli generated using HRTFs were considerably poorer
than free-field judgements, with large numbers of overestimates. Sixty percent of
subjects were off-target by 20° and 32% were off-target by 40°. In fact, some
subjects overestimated angles by an astonishing 50°. Schlegel gives positive bias as
the reason for slight overestimation, since 5-10° systematic overestimation has been
reported by Oldfield & Parker (1984a, b) and is evident in Schlegel's own free-field
condition. However, he argues that errors in the order of 40-50° cannot be explained
by positive bias alone, indicating failure to take account of head movements as a
likely cause.
Since head movements have been shown to be useful in the free-field, Experiment 2
incorporates two 'head motion' conditions. Subjects will either have their heads
restrained by the use of a head clamp, or they will be allowed to move their heads
freely. For the free-field condition, this should help establish whether or not head
movements contribute to localization. For subjects listening in the pre-recorded
stimulus conditions (made with the manikin), a comparison of a restrained or
unrestrained head will determine whether head movements confuse subjects by
confounding the signal.
Shelton et al (1982) looked at the role of vision in sound localization but where the
sound sources themselves could not be seen. Their free-field study, which allowed
head movements, involved subjects reporting the location of narrow band sounds by
pressing a button on a control box that was held out of sight in the lap. They found
that under normal seeing conditions localization was significantly more accurate than
when vision was obscured by goggles.
Lovelace & Anderson (1993) based their study of vision in sound localization on the
findings of Shelton et al. They investigated the possible benefit of general vision
during the presentation of sounds, but where the targets themselves could not be seen
(similar to Shelton et al). Subjects were required to localize a 2-second speech noise
by pointing to the perceived origin of the unseen sound source. Speakers were spaced
at 10° intervals in the front left quadrant with a cloth separating the speakers from the
subject. All subjects took part in the two conditions: eyes closed (with a blindfold)
and eyes open. A statistically significant increase in error was found for the no vision
condition compared to the vision condition - from ±3.79° to ±6.18°. However, a
second experiment revealed that it is not the presence of vision per se that improves
accuracy. When subjects were asked to point their closed eyes towards the sound,
then open their eyes and verbally report the position, the accuracy was higher than
when a finger was pointed with eyes permanently open. So, it is likely that vision in
this case was simply used to calibrate hand movement. Support for Shelton et al's
findings is therefore not offered by Lovelace & Anderson, which leaves the role of
vision in localization unresolved.
Visual information was added in Experiment 2 by sitting the subjects in front of the
speakers in the free-field listening condition. Visual cues were also added to a
pre-recorded listening condition which was created by placing the manikin amongst the
speakers (in place of the subject for the free-field condition) and recording the
experimental trial. The subjects were then seated where the manikin had been and the
pre-recorded sounds were delivered through headphones, rather than the speakers
themselves. This was compared to a third listening condition in which the same
pre-recorded sounds were heard, but with no visual link to the sound sources.
Apart from vision and head motion, a third factor was varied. Since error may be a
facet of the distance between individual speakers, two speaker spacing intervals were
chosen. The greater interval (30°) lay outside the mean error of 25° established from all
previous studies. If subjects are an average of 25° off-target, then 30° speaker
spacing should produce highly accurate judgements. The smaller interval (20°) was
inside the mean error of 25° and thus should produce a very low target-response
mapping.
EXPERIMENT 1
METHOD
Subjects
Five male postgraduate students and academic staff served as volunteers.
Design
A repeated measures design was used to compare the degree of movement with:
1. a restrained head using a head clamp¹
2. an unrestrained, still head.
The order of the head restraint measuring conditions was counterbalanced.
Procedure
The subject was seated in a chair at the centre of a wooden hoop, with 7 speakers
placed on the hoop in front of the subject². The subjects were instructed to keep their
heads as still as possible and to keep their eyes fixed on the speaker directly in front
of them, until the experiment was finished. Subjects were not informed of the exact
length of the experiment to prevent them from anticipating the time and moving their
head before the experiment had finished. They were simply told to remain still for a
couple of minutes.
The Head Tracker³ took measurements of the three dimensions of head movement:
azimuth (side-to-side), elevation (up and down tipping) and roll (pivoting). These
measurements were taken simultaneously at 1-second intervals for a period of 1
minute. During this time subjects were seated in a normally reverberant room in front
of the apparatus used in Experiment 2 (see Experiment 2 Method). No sounds were
played, although there was a background noise level of 50 dB SPL, measured using a
sound level meter (Brüel & Kjær 2203).
¹ See Experiment 2 Method section for a description of the head clamp.
² A full description of the hoop and speaker set-up is given in the Procedure section of Experiment 2.
³ See Chapter 11 for a full description of the Head Tracker.
RESULTS
For azimuth, elevation and roll, the self-restraint condition (no clamp) produced a
slightly greater range of movement than the forced-restraint (clamp) condition.
However, these differences were not statistically significant (p≤0.05, related t-test)
(see Figure 10.1.1).
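The related t-test compares each subject's range of movement under the two restraint conditions. A minimal Python sketch of that comparison is given here; the per-subject ranges are hypothetical (individual measurements are not listed in this chapter), the function name is illustrative, and 2.776 is the standard two-tailed 5% critical value of t for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def related_t(x, y):
    """Return (t, df) for a related-samples (paired) t-test."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical ranges of azimuth movement (degrees) for five subjects.
free_head = [1.2, 0.8, 1.5, 0.9, 1.1]
fixed_head = [1.0, 0.9, 1.1, 0.7, 1.2]

t, df = related_t(free_head, fixed_head)
T_CRITICAL_DF4 = 2.776  # two-tailed, p = 0.05, 4 degrees of freedom
significant = abs(t) > T_CRITICAL_DF4
print(f"t({df}) = {t:.2f}, significant at 0.05: {significant}")
```

With only five subjects the test has 4 degrees of freedom, so a small mean difference relative to its variability, as in these made-up numbers, does not reach significance.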
[Figure: Differences in Motion between a Restrained and Unrestrained Head. Bar chart of movement by Plane of Movement (azimuth, elevation, roll) for the Free Head and Fixed Head conditions.]
Figure 10.1.1: Movement of the head when either restrained by a head clamp or held still
but unrestrained. (NB: The y-axis does not represent absolute angles). The measurements
for azimuth, elevation and roll dimensions were taken simultaneously by the Head Tracker.
±2 standard error bars are included. No statistically significant differences (p≤0.05, related
t-test) between the different head restraint conditions were found.
DISCUSSION
The results show that there is no statistically significant difference in head motion
between simply instructing subjects to keep their head still and fixing their head in a
clamp. This is despite a slightly greater range of movement for the unrestrained head
in all three movement dimensions. It should be noted, however, that these results
demonstrate how small spontaneous movements are when a subject is deliberately
keeping their head still, and not that head movements per se are not important.
Indeed, Thurlow & Runge (1967) profess the significance of head movements by
asserting that head turning is a spontaneous action performed by most listeners in
attempting to determine the location of a sound source.
In a free-field situation, where head movements are freely made, a large improvement
in localization acuity may well be demonstrated and this will be investigated in
Experiment 2. However, where only small changes in head position occur, no effect
would be expected. This supports the findings of Makous & Middlebrooks (1990),
who showed that movements in the order of 1° were effectively the same as a
stationary head.
Thus the movements that subjects make whilst listening to stimuli recorded using a
manikin are not large enough to have any effect on the perception of those stimuli.
This has positive implications for the large angle errors reported in previous chapters,
in that they are unlikely to be the result of having an unrestrained head, such that head
movement confounds the signal.
EXPERIMENT 2
METHOD
Subjects
Participants were 48 undergraduate and postgraduate student volunteers (18 males
and 30 females). All reported having normal hearing and had no previous experience
of hearing experiments.
Design
A 3*2*2*3 design is used:
3 listening conditions -
   a) free-field (see Apparatus section),
   b) pre-recorded stimuli in the free-field set-up (with a visual correlate) and
   c) pre-recorded stimuli played back in the booth (no visual correlate).
2 head motion conditions - head either restrained by a clamp or able to move freely.
2 speaker spacings - 30° and 20°.
3 stimulus types -
   a) speech ("chips"),
   b) clicks and
   c) white noise.
Four different subjects were used for each combination of listening condition, head
motion condition and speaker spacing condition (4 x 3 x 2 x 2) giving a total of 48
subjects. All subjects heard all stimulus types.
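The allocation of subjects to the between-subjects cells can be sketched as follows. This is only an illustration of the 4 x 3 x 2 x 2 arithmetic: the condition labels are shortened and the subject numbering is hypothetical.

```python
from itertools import product

LISTENING = ["free-field", "ring (pre-recorded)", "booth (pre-recorded)"]
HEAD_MOTION = ["restrained", "free"]
SPACING_DEG = [30, 20]
STIMULI = ["chips", "clicks", "white noise"]  # within-subjects factor
SUBJECTS_PER_CELL = 4

# Between-subjects cells: every combination of the three grouping factors.
cells = list(product(LISTENING, HEAD_MOTION, SPACING_DEG))

# Hypothetical subject numbering: four consecutive IDs per cell.
allocation = {
    cell: list(range(i * SUBJECTS_PER_CELL + 1, (i + 1) * SUBJECTS_PER_CELL + 1))
    for i, cell in enumerate(cells)
}
total_subjects = sum(len(ids) for ids in allocation.values())
print(len(cells), "cells,", total_subjects, "subjects; each hears",
      len(STIMULI), "stimulus types")
```

Stimulus type is left out of the product because every subject heard all three stimulus types.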
For the free-field condition, sounds were presented in the horizontal plane at 7
locations (-90°, -60°, -30°, 0°, 30°, 60°, 90° in the 30° speaker spacing condition and
-60°, -40°, -20°, 0°, 20°, 40°, 60° in the 20° speaker spacing condition).
For the pre-recorded conditions, the manikin was placed where the subject had been
and the experiment was recorded. These digitally recorded sounds were played back
to subjects whilst sitting amongst the speakers or in the booth.
Stimuli
Stimuli were either a 1-second white noise burst, a broad-spectrum click, or speech -
the word "chips" (all identical to those used in Chapter 8). The sounds were played
through 7 speakers (Radio Spares 8Ω, 3"), placed on a wooden hoop (see Procedure).
The 3 stimulus types constituted separate trials, each of which comprised 14 stimuli
(2 repetitions of the signal at 7 speaker locations).
For the second two conditions, the sequence of trials was recorded onto Digital Audio
Tape (DAT) using microphones (Brüel and Kjær 4134, 0.5") placed at the eardrum
position (using a Zwislocki coupler) of a KEMAR manikin.
Apparatus
Condition 1 - Free-Field
A wooden hoop, 3m in diameter and 3" in depth, was located 1.1m off the ground,
supported by wooden struts. The subject was seated at the centre of the hoop and
speakers were attached to the inside of the hoop, facing the subject and in front of
them (referred to as the 'ring' set-up). The subjects' chair was adjusted such that the
speakers were at ear-height. For subjects in the 'restrained head' condition, a head
clamp was placed behind the subjects' chair. This consisted of an upright wooden
pole with a semi-circular head support around the top into which the subject's head
was placed. The head was held firmly into the support by means of a thick adjustable
canvas strap that was secured around the forehead. Subjects in the 'free head'
condition were encouraged to move their heads freely once the signal had begun, but
were to relocate their heads to a central position after each response and during the
stimulus onset. The central position was the 0° azimuth speaker which subjects were
instructed to align their nose with.
Condition 2 - Pre-recorded with a visual correlate (in the ring)
Subjects were seated in the ring set-up, identically to those in Condition 1. The only
difference was that subjects were listening to recorded, not live, stimuli. These
recordings were made by placing the manikin at the centre of the wooden hoop, where
the subject had been placed in the free-field condition. The experimental trial was
then played from the speakers, as for Condition 1, and recorded by the manikin.
Subjects were then placed back where the manikin had been and listened to these
pre-recorded stimuli over headphones, but were visually within the free-field set-up.
Condition 3 - Pre-recorded in the booth (no visual correlate)
Subjects were seated in a soundproof booth and heard the same recordings as in
Condition 2. A diagrammatic representation of the speaker locations was provided -
either 30° or 20° spacing (see Figures 10.2.2a & b).
Procedure
Each subject was provided with three response sheets - one for each stimulus type
(see Figure 10.2.1). The stimuli were played in a fixed randomised sequence, with a
5-second interstimulus delay. These were controlled through a switch-box, operated by
the experimenter in another room. Each subject began the sequence at a different
point⁴.
⁴ Each subject heard a different permutation of the three stimulus types. However, since 8 subjects
were used in each condition and only 6 permutations result from 3 different variables, two of the
sequences were heard twice. Ordering was similar to that shown in Appendix G - for the azimuth
trials only.
Subject ........................ .
Sequence .................... .
Trial ............................ .
Stimulus | Response
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Figure 10.2.1: Response diagram given to subjects in all conditions. Beside each stimulus
number a response letter had to be recorded (A - G), according to the perceived location of
the sound source.
[Diagram: Azimuth Positions. Lettered speaker positions (A to G) arranged in front of the subject.]
Figure 10.2.2a: Diagram given to subjects showing the speaker locations in the horizontal
plane. Speakers were spaced at 30° intervals. Subjects heard each sound source and were
required to choose one of the letters, which represented actual target locations.
[Diagram: Azimuth Positions. Lettered speaker positions (A to G) arranged in front of the subject.]
Figure 10.2.2b: Diagram given to subjects representing 20° speaker spacing in the
horizontal plane.
RESULTS
Error values for all subjects were averaged to give overall means for the three
presentation conditions. In the free-field condition the mean angle error was ±0.3°.
For pre-recorded stimuli in the ring, the mean error was ±3.7° and for the
pre-recorded stimuli in the booth, the error was ±10.8°. A statistically significant
difference was found between the three listening conditions (p≤0.01, ANOVA).
For each of the three listening categories, values were broken down into speaker
spacing, head restraint and stimulus type, as shown in Tables 10.2.2a, b & c. A
significant effect was also found between speaker spacings for the clicks stimulus,
where the mean for the 20° spacing was 5.4° and for the 30° spacing was 7.3°
(p≤0.05, ANOVA). However, no other significant effects were observed (see Tables
10.2.1a-d).
The angle errors shown in Tables 10.2.2a, b & c typically show the values for the 20°
speaker spacing condition to be smaller than or roughly equal to those for the 30° spacing condition.
This contradicts the hypothesis that the 20° spacing would increase judgement errors.
However, angle error may not be useful in determining whether there was greater
confusion in one condition. If a subject is one speaker off-target, with 20° spacing a
20° error is obtained, whereas a 30° error is obtained for the 30° spacing. Thus,
whilst there may be more confusion with the 20° spacing condition, the 30° condition
may give larger errors and falsely indicate that the 30° condition was more poorly
judged.
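The point can be illustrated with a short worked example in Python. The response pattern below is hypothetical: the same number of one-speaker confusions is scored under each spacing, and the function name is illustrative only.

```python
def mean_angle_error(step_errors, spacing_deg):
    """Mean absolute angle error when each response is off-target
    by a whole number of speaker positions."""
    return sum(abs(s) * spacing_deg for s in step_errors) / len(step_errors)


# Hypothetical trial of 14 stimuli: 11 correct, 3 judged one speaker off-target.
steps = [0] * 11 + [1, -1, 1]

err_20 = mean_angle_error(steps, 20)  # 60 / 14 degrees
err_30 = mean_angle_error(steps, 30)  # 90 / 14 degrees
print(f"20 deg spacing: {err_20:.1f} deg, 30 deg spacing: {err_30:.1f} deg")
```

The pattern of confusions is identical in both cases, yet the 30° spacing yields an angle error one and a half times as large, which is why angle error alone cannot separate confusion from spacing.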
Means for the different stimulus types were taken across all speaker spacing and head
restraint conditions. For Clicks the overall mean was higher (±6.38°) than "Chips"
(±4.12°) and noise (±4.33°). This shows the opposite trend to the comparison of the
same three stimulus types in Chapter 8 (±19.6° for clicks, which is lower than the
±20.8° and ±22.5° for "Chips" and noise). (NB: The values in the earlier experiment
are higher because a different response method was used. Also, the errors for this
study include low free-field error values). The differences between the stimulus types
in this experiment and in Chapter 8, however, are not statistically significant.
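These overall means can be reproduced from the MEANS rows of Tables 10.2.2a, b & c. The Python sketch below simply averages those six cell means per stimulus type; the dictionary layout and names are illustrative, and averaging the rounded cell means is an assumption about how the overall figures were obtained.

```python
from statistics import mean

# MEANS rows (degrees) from Tables 10.2.2a-c, in the order:
# free-field 30°, free-field 20°, ring 30°, ring 20°, booth 30°, booth 20°.
cell_means = {
    "chips": [0.0, 0.4, 3.5, 2.5, 9.6, 8.7],
    "clicks": [1.4, 0.2, 6.7, 5.2, 13.9, 10.9],
    "noise": [0.0, 0.0, 2.2, 2.3, 10.4, 11.1],
}

overall = {name: round(mean(values), 2) for name, values in cell_means.items()}

# Clicks by speaker spacing (even indices are 30°, odd indices are 20°).
clicks_30 = round(mean(cell_means["clicks"][0::2]), 1)
clicks_20 = round(mean(cell_means["clicks"][1::2]), 1)
print(overall, clicks_30, clicks_20)
```

The results agree with the values quoted in the text for the three stimulus types and for the clicks stimulus at each speaker spacing.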
Information analysis is a method that examines the genuine degree of confusion in
such cases. The results are given in Tables 10.2.3a, b & c. Confusion matrices for
each group show the pattern of responses (Figures 10.2.2a-l). A near perfect
transmission value of 2.76 bits was obtained overall for the free-field condition - a
perfect set of responses would give a value of 2.81 bits. For the ring playback
condition there was a mean value of 2.23 bits and for the booth condition an overall
mean of 1.51 bits.
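Transmitted information can be estimated from a confusion matrix of target-by-response counts. The Python sketch below assumes the standard information-theoretic estimate, T = H(target) + H(response) - H(target, response); the exact formula used for the analysis is not given in this section, so this is an illustration under that assumption, and the function names are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def transmitted_information(matrix):
    """Transmitted information (bits) for a target-by-response count matrix."""
    row_totals = [sum(row) for row in matrix]        # targets
    col_totals = [sum(col) for col in zip(*matrix)]  # responses
    cells = [c for row in matrix for c in row]       # joint counts
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)


# Perfect performance: 24 responses per target, all on the diagonal.
perfect = [[24 if i == j else 0 for j in range(7)] for i in range(7)]
# Pure guessing: responses spread evenly over all 7 positions.
chance = [[1] * 7 for _ in range(7)]

t_perfect = transmitted_information(perfect)
t_chance = transmitted_information(chance)
print(f"perfect: {t_perfect:.2f} bits, chance: {t_chance:.2f} bits")
```

With 7 equally likely targets the ceiling is log2(7), about 2.81 bits, which matches the value given above for a perfect set of responses.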
Factor      Levels   Values
spacing     2        1 2
place       3        1 2 3
movement    2        1 2

Analysis of Variance for allsound

Source      DF    Seq SS     Adj SS     Adj MS    F       P
spacing      1     582.8      582.8      582.8     1.65   0.205
place        2   11977.8    11977.8     5988.9    17.00   0.000
movement     1     665.3      665.3      665.3     1.89   0.176
Error       43   15144.4    15144.4      352.2
Total       47   28370.3

Unusual Observations for allsound

Obs.   allsound   Fit      Stdev.Fit   Residual   St.Resid
23     151.430    45.824   6.057       105.606    5.95R

R denotes an obs. with a large st. resid.

Means for allsound

place        Mean   Stdev
free-field    0.3   3.485
ring          3.7   3.485
booth        10.8   3.485

Table 10.2.1a: Analysis of variance for all stimulus types combined for different speaker
spacings ("spacing") - 20° and 30°, playback locations ("place") - free-field, ring and
booth, and head restraint conditions ("movement") - fixed (clamped) or free. Statistically
significant results are shown in bold type.
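The F ratios in Table 10.2.1a can be checked from the sums of squares and degrees of freedom, since each F is the effect mean square divided by the error mean square. A short Python sketch using the Adj SS and DF values from the table (the variable names are illustrative only):

```python
# (DF, Adj SS) for each source in Table 10.2.1a.
sources = {
    "spacing": (1, 582.8),
    "place": (2, 11977.8),
    "movement": (1, 665.3),
}
error_df, error_ss = 43, 15144.4

ms_error = error_ss / error_df  # error mean square

f_ratios = {}
for name, (df, ss) in sources.items():
    ms_effect = ss / df  # effect mean square
    f_ratios[name] = round(ms_effect / ms_error, 2)

print(round(ms_error, 1), f_ratios)
```

The same check applies to Tables 10.2.1b-d by substituting their sums of squares.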
Factor      Levels   Values
spacing     2        1 2
place       3        1 2 3
movement    2        1 2

Analysis of Variance for "chips"

Source      DF    Seq SS     Adj SS     Adj MS    F       P
spacing      1     10.89      10.89      10.89     0.83   0.366
place        2    661.67     661.67     330.84    25.33   0.000
movement     1      7.19       7.19       7.19     0.55   0.462
Error       43    561.60     561.60      13.06
Total       47   1241.35

Unusual Observations for "chips"

Obs.   "chips"    Fit       Stdev.Fit   Residual   St.Resid
11     17.1400     3.7498   1.1664      13.3902    3.91R
14     15.0000     4.5240   1.1664      10.4760    3.06R
23     17.1400    10.0590   1.1664       7.0810    2.07R

R denotes an obs. with a large st. resid.

Means for "chips"

place   Mean     Stdev
1       0.1787   0.9035
2       3.6606   0.9035
3       9.1956   0.9035

Table 10.2.1b: Analysis of variance for the speech stimulus "Chips" for the two different
speaker spacings, three playback locations and two head restraint conditions. Statistically
significant results are shown in bold type.
Factor      Levels   Values
spacing     2        1 2
place       3        1 2 3
movement    2        1 2

Analysis of Variance for clicks

Source      DF    Seq SS     Adj SS     Adj MS    F       P
spacing      1     47.68      47.68      47.68     5.70   0.021
place        2   1089.05    1089.05     544.52    65.06   0.000
movement     1     23.49      23.49      23.49     2.81   0.101
Error       43    359.91     359.91       8.37
Total       47   1520.13

Unusual Observations for clicks

Obs.   clicks     Fit      Stdev.Fit   Residual   St.Resid
14     15.0000    7.7687   0.9337       7.2313     2.64R
36     10.0000    4.3762   0.9337       5.6238     2.05R
38      0.0000    5.7754   0.9337      -5.7754    -2.11R
39     12.8600    5.7754   0.9337       7.0846     2.59R

R denotes an obs. with a large st. resid.

Means for clicks

place   Mean      Stdev
1        0.7588   0.7233
2        6.0725   0.7233
3       12.4113   0.7233

Table 10.2.1c: Analysis of variance for clicks, for the two speaker spacing conditions, the three
playback locations and two head restraint conditions. Statistically significant results are
shown in bold type.
Factor      Levels   Values
spacing     2        1 2
place       3        1 2 3
movement    2        1 2

Analysis of Variance for noise

Source      DF    Seq SS     Adj SS     Adj MS    F      P
spacing      1     194.2      194.2      194.2    0.76   0.388
place        2    2709.7     2709.7     1354.9    5.32   0.009
movement     1     333.6      333.6      333.6    1.31   0.259
Error       43   10960.8    10960.8      254.9
Total       47   14198.3

Unusual Observations for noise

Obs.   noise     Fit      Stdev.Fit   Residual   St.Resid
23     119.290   21.657   5.153       97.633     6.46R

R denotes an obs. with a large st. resid.

Means for noise

place   Mean      Stdev
1        0.0000   3.991
2        2.4181   3.991
3       17.0094   3.991

Table 10.2.1d: Analysis of variance for the noise stimulus for the different speaker spacings,
playback locations and head restraint conditions. Statistically significant results are shown
in bold.
Tables 10.2.2a-c: (shown on the next page) Mean angle error values for the free-field
condition are shown in 10.2.2a. Averages are broken down into speaker spacing (30° and
20°), head restraint (either fixed in a clamp or free to move), and stimulus type ("Chips",
Clicks and Noise). Identical breakdowns are given for subjects listening to pre-recorded
stimuli in the original recording set-up (i.e. with a visual correlate) in (10.2.2b) and for
pre-recorded stimuli played back in the booth (10.2.2c).
10.2.2a Free-Field
                 "Chips"   Click   Noise
30° Spacing
  Head Fixed       0.0      0.7     0.0
  Head Free        0.0      2.1     0.0
  MEANS            0.0      1.4     0.0
20° Spacing
  Head Fixed       0.7      0.4     0.0
  Head Free        0.0      0.0     0.0
  MEANS            0.4      0.2     0.0

10.2.2b Ring Playback
                 "Chips"   Click   Noise
30° Spacing
  Head Fixed       3.2      7.5     2.3
  Head Free        3.8      5.9     2.1
  MEANS            3.5      6.7     2.2
20° Spacing
  Head Fixed       1.8      5.7     2.9
  Head Free        3.2      4.6     1.8
  MEANS            2.5      5.2     2.3

10.2.2c Booth Playback
                 "Chips"   Click   Noise
30° Spacing
  Head Fixed      12.3     16.1    13.9
  Head Free        7.0     11.8     7.0
  MEANS            9.6     13.9    10.4
20° Spacing
  Head Fixed       9.3     11.4    10.0
  Head Free        8.2     10.4    12.1
  MEANS            8.7     10.9    11.1
(a) Free-Field
Speaker    Head     Transmitted          Number of   Overall
Spacing    Motion   Information (bits)   Positions   Mean
30°        fixed    2.75                 6.7
30°        free     2.71                 6.6
20°        fixed    2.75                 6.7
20°        free     2.81                 7           6.75

(b) Ring
Speaker    Head     Transmitted          Number of   Overall
Spacing    Motion   Information (bits)   Positions   Mean
30°        fixed    2.22                 4.7
30°        free     2.27                 4.8
20°        fixed    2.45                 5.5
20°        free     1.96                 3.9         4.73

(c) Booth
Speaker    Head     Transmitted          Number of   Overall
Spacing    Motion   Information (bits)   Positions   Mean
30°        fixed    1.45                 2.7
30°        free     1.75                 3.4
20°        fixed    1.37                 2.6
20°        free     1.47                 2.8         2.88

Tables 10.2.3a-c: Information analysis for the free-field (a), ring playback (b) and booth
playback (c) conditions. Breakdowns are given in terms of speaker spacing and head
motion conditions. Transmitted information is in bits and the corresponding number of
reliably identified positions, from the total of 7, is also shown.
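The number of reliably identified positions appears to follow from the transmitted information T as 2 raised to the power T. That relationship is an inference from the tabulated values rather than something stated in this section, so the Python sketch below is a consistency check under that assumption; it reproduces every tabulated entry to within rounding.

```python
# (transmitted information in bits, tabulated number of positions),
# taken from Tables 10.2.3a-c in row order.
tabulated = {
    "free-field": [(2.75, 6.7), (2.71, 6.6), (2.75, 6.7), (2.81, 7.0)],
    "ring": [(2.22, 4.7), (2.27, 4.8), (2.45, 5.5), (1.96, 3.9)],
    "booth": [(1.45, 2.7), (1.75, 3.4), (1.37, 2.6), (1.47, 2.8)],
}


def positions_from_bits(bits):
    """Equivalent number of perfectly identified positions for T bits."""
    return 2 ** bits


largest_gap = max(
    abs(positions_from_bits(bits) - positions)
    for rows in tabulated.values()
    for bits, positions in rows
)

# Overall mean number of positions per listening condition.
overall_mean = {
    name: sum(positions for _, positions in rows) / len(rows)
    for name, rows in tabulated.items()
}
print(f"largest gap: {largest_gap:.3f}", overall_mean)
```

Seven positions correspond to log2(7), about 2.81 bits, so the free-field condition is close to the ceiling while the booth condition corresponds to fewer than three reliably identified positions.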
Figures 10.2.2a-d: (shown on page 152) Confusion matrices showing the pattern of
responses for the free-field condition for (a) 30° speaker spacing with head fixed, (b) 30°
speaker spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20°
speaker spacing with head free.
Figures 10.2.2e-h: (shown on page 153) Confusion matrices for the ring playback condition
for (e) 30° speaker spacing with head fixed, (f) 30° speaker spacing with head free, (g) 20°
speaker spacing with head fixed and (h) 20° speaker spacing with head free.
Figures 10.2.2i-l: (shown on page 154) Confusion matrices showing the response pattern
for the booth playback condition for (i) 30° speaker spacing with head fixed, (j) 30°
speaker spacing with head free, (k) 20° speaker spacing with head fixed and (l) 20° speaker
spacing with head free.
[Figures 10.2.2a-d: confusion matrices, free-field condition. Rows: target position. Columns: Response Position (-90, -60, -30, 0, 30, 60, 90). For each target, the non-zero response counts are listed from left to right.]

Target | (a)   | (b)   | (c)   | (d)
-90    | 24    | 24    | 24    | 24
-60    | 24    | 24    | 24    | 24
-30    | 24    | 24    | 24    | 24
0      | 24    | 24    | 24    | 24
30     | 24    | 24    | 24    | 24
60     | 22 2  | 23 1  | 24    | 24
90     | 24    | 2 22  | 2 22  | 24
[Figures 10.2.2e-h: confusion matrices, ring playback condition. Rows: target position. Columns: Response Position (-90, -60, -30, 0, 30, 60, 90). For each target, the non-zero response counts are listed from left to right.]

Target | (e)     | (f)     | (g)     | (h)
-90    | 20 4    | 18 6    | 21 3    | 18 5 1
-60    | 7 17    | 3 20 1  | 24      | 3 18 3
-30    | 5 19    | 1 3 20  | 2 21 1  | 1 3 20
0      | 1 1 22  | 3 21    | 24      | 1 23
30     | 24      | 23 1    | 21 3    | 1 18 5
60     | 2 21 1  | 24      | 22 2    | 13 11
90     | 4 20    | 3 21    | 1 23    | 6 18
[Figures 10.2.2i-l: confusion matrices, booth playback condition. Rows: target position. Columns: Response Position (-90, -60, -30, 0, 30, 60, 90). For each target, the non-zero response counts are listed from left to right.]

Target | (i)       | (j)       | (k)       | (l)
-90    | 18 6      | 19 4 1    | 17 6 1    | 15 8
-60    | 6 10 8    | 3 17 4    | 6 12 5 1  | 4 15 5
-30    | 2 6 11 5  | 9 13 2    | 1 6 13 4  | 3 11 9
0      | 8 14 1 1  | 1 5 17 1  | 1 6 16 1  | 1 1 3 18 1
30     | 3 13 7 1  | 1 18 5    | 2 12 7 3  | 12 9 2
60     | 3 9 12    | 3 16 5    | 7 8 9     | 4 11 8
90     | 1 4 19    | 3 21      | 1 11 12   | 1 11 12
DISCUSSION
The free-field localization task gave very accurate results, with angle error values no
larger than ±2°. This result seemed to be unaffected by changes in speaker spacing or
subject head motion conditions. However, there is a strong floor effect for the
free-field results where several subjects obtained 0° error and so no effect of the different
conditions can be seen. Localization of pre-recorded signals in the ring set-up
produced higher errors than the free-field presentation (overall mean of ±3.7° for
pre-recorded and ±0.5° for free-field - a statistically significant difference, ANOVA).
A significant difference (p≤0.01, ANOVA) was again found between the ring
presentation and the booth presentation of pre-recorded sounds (overall mean of
±10.8° for the booth compared to the ring value of ±3.7°). Indeed, the error values
obtained for the booth condition in this study are comparable to those of previous
studies that have used the same (categorical) method of playing back in the booth
(where mean errors have ranged from ±8° to ±19°).
The three different stimulus types showed no statistically significant differences.
However, for all three listening conditions clicks produced the greatest error. Whilst
all three are broadband sounds, clicks are by far the shortest duration and subjects
reported the greatest difficulty in localizing this stimulus. Such brief signals may not
give subjects adequate time to glean useful information and therefore longer duration
sounds should be used to optimise performance.
Speaker spacing showed only a small effect, with the angle error for the 30° spacing
being higher than for the 20° spacing. This was contrary to the hypothesis that the
errors for the 20° spacing would be higher, since subjects are more likely to be
confused if the sound sources are closer together. As previous chapters have shown,
25° spacing is necessary, on average, to give reliable localization accuracy. Thus if
the spacing is less than 25°, the error is likely to increase.
However, when attempting to ascertain confusion within a task, angle error provides
only limited information. Angle errors obtained when the speakers were spaced at
20° intervals can only be smaller than error values from the task where the speakers
were spaced 30° apart (because the possible error is made smaller by decreasing the
speaker spacing). Information analysis is therefore more appropriate, since it gives a
measure of confusion within a task, rather than the overall error. The degree of
confusion for all presentation conditions is similar for both speaker spacings. This
shows that a similar type and number of errors were made for the 20° (mean 2.14 bits)
and 30° (mean 2.19 bits) spacings, even though the angle errors for the 30° spacing
are higher. For the free-field condition, however, it is unclear whether any
differences in confusion exist, since there is an apparent floor effect.
Thus the only large effect was between the different presentation conditions. The
most obvious cause for the discrepancy in angle error values between the free-field
and the ring presentations is that the ring uses recorded sounds delivered over
headphones. This implies that some information is lost in the recording
process. This may have been a result of using non-individualized pinnae, although
previously this has been shown to have little effect (see Chapter 4). It may also be
due to more subtle effects produced by an artificial ear canal, or a hollow torso cavity.
Head motion is the other main difference between the ring and the free-field.
However, the results show that better head restriction has no effect in the free-field
and this is surprising given the number of studies that have professed the importance
of head movements.
Pollack & Rose (1967) have shown that moving the head improves localization in a
number of ways. Firstly, movement allows the listener to orient their head towards a
sound and as studies have shown (e.g. Stevens & Newman, 1936; Makous &
Middlebrooks, 1990; Schlegel, 1994), people can determine location and distinguish
sounds more accurately in the midline. Thus by centring the head on the sound
source, judgements are likely to be more accurate. Secondly, movement increases
acuity because we can judge relative sounds more accurately than absolute sounds, as
is highlighted by the findings of Minimum Audible Angle studies (e.g. Mills, 1958;
Perrott, 1984; Perrott et al, 1993). By moving the head the sound becomes a
continually changing stimulus which can be determined more exactly by the listener.
Finally, moving the head eliminates front-rear azimuth confusions by disambiguating
identical interaural timing and level differences that occur when the sound is in the
same location in front or behind the listener.
But whilst Pollack & Rose found that turning to face a sound source significantly
reduced the error rate, this only occurred with a sustained signal that allowed time for
subjects to orient their head towards it. For short-duration sounds, where the listener
does not have time to face the sound source, there is no evidence that the information
gained from movement is any more beneficial than when the head is stationary.
This might explain the finding that head movements had no effect, since none of the
signals had more than a 1-second duration. But the presence of vision may also explain
this result. In cases where moving the head has clearly been an aid to localization (e.g.
Pollack & Rose, 1967; Makous & Middlebrooks, 1990; Stevens & Newman, 1936),
subjects have been unsighted (either with closed eyes, blindfolded or hidden from the
speakers by acoustically transparent cloth). In this study, where vision is incorporated,
subjects may be using visual information as the dominant cue, overriding the cues
provided by head movement. This is certainly supported by the finding that the visual
(ring) presentation produced more accurate performances than in the booth, although
the differences between the two playback settings may also have had an effect.
For example, playing sounds back in a room congruent with the recording environment
may have increased accuracy. Subjects are denied knowledge of the original recording
room acoustics by playing sounds back in a sound-attenuating booth, which may
confuse the listener. Therefore, the increase in acuity in this study for pre-recorded
stimuli may not be totally due to providing a visual element (the presence of speakers).
It is more likely to be a combination of playing back with visual aids in an acoustically
similar environment. Determining the effect of having identical room acoustics for
recording and playback could be done by providing subjects with a diagrammatic
representation of the speakers in the recording room, rather than the booth.
This study has found vision to play a major role in enhancing auditory localization.
However, head movements have not been ruled out as a useful cue. Indeed, they may
play an important part in localization, but subjects were so accurate with the addition of
vision, that head movement cues were clearly too subtle to be detected. Perhaps in the
absence of vision, head movements would have the same highly beneficial effect on
accuracy. But for this study, a visual correlate appears to be the major factor in
increasing judgement accuracy.
These findings have important implications for VR implementations. If a high
judgement accuracy relies on vision, then it is unlikely that auditory cues alone can be
used to identify targets. Auditory information can only be used to supplement or guide
vision.
CHAPTER 11
Head Movements using the Head Tracker.
ABSTRACT
The importance of head movements in localization tasks has been emphasised by
many researchers (e.g. Van Soest, 1929; Young, 1931; Wallach, 1939, 1940; Thurlow
& Runge, 1967; Wightman et al, 1987). Previous attempts in this thesis to assess
head movements (Chapter 10) were not effective in isolating head motion and no
effect was found. Furthermore, this attempt did not incorporate the technology used
in auditory virtual reality.
This study attempts to establish the contribution of head movements for head-tracked
and non head-tracked HRTF-generated stimuli - the method typically used in VR.
Listeners were asked to judge the location of white noise bursts presented in the
horizontal plane. For each of three head motion conditions (still, controlled
movement or free movement) the Head Tracker was either switched on or off, thus
either accounting or not accounting for head movements.
The overall angle errors were high: ±19.5° with the Head Tracker on and ±21° with
the Head Tracker off, for all head movement conditions combined. Only one
head-tracked motion condition showed an improvement over any of the non head-tracked
conditions - when subjects were able to move their heads freely (a 6° improvement
which was statistically significant, ANOVA, f = 6.99, df = 2).
Support for the literature is offered by these findings. While head movements can
improve accuracy substantially (when subjects can move their heads as desired), head
motion does not produce near-perfect accuracy. However, equipment constraints
were identified which may have increased the error values. Once the spatial
resolution in current technology has been refined, the potential for reducing error is
likely to be much higher.
INTRODUCTION
Head movement is a fundamental method of reducing the ambiguities relating to a
sound source location (e.g. Van Soest, 1929; Young, 1931; Hirsch, 1971). Primarily,
head motion is argued to increase accuracy by disambiguating front-rear azimuth
confusions (e.g. Wightman et al, 1987). However, the work of Young (1931) showed
that moving the head resulted in a changing binaural stimulus pattern, which in itself
aided localization accuracy.
Wallach (1939, 1940) argues a similar case to Young. He demonstrates that if one
turns one's head whilst a sound is being delivered, then cues are obtained for several
lateral angles for the same sound source direction. He shows that "Geometrically, a
sequence of lateral angles obtained in this manner completely determines a given
direction". Wallach also argues that the head movements are the primary means of
disambiguating front-back confusion and that the pinna only plays a part in the
absence of head movements (not very frequent in ordinary listening conditions) and
that front-back discrimination is the pinna's minor role.
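Wallach's geometric argument can be illustrated with a minimal sketch. It assumes a simplified horizontal-plane model in which the lateral angle depends only on the sine of the source azimuth relative to the head; the function name and the example positions (30° and 150°, with a 45° head turn) are illustrative choices, not values taken from Wallach.

```python
import math

def lateral_angle(source_az_deg, head_az_deg=0.0):
    """Lateral angle (degrees) of a horizontal-plane source relative to the head.

    Azimuths are measured clockwise from straight ahead (0 = front, 90 = right,
    180 = rear). Mirror-image front and rear positions share the same value.
    """
    relative = math.radians(source_az_deg - head_az_deg)
    return math.degrees(math.asin(math.sin(relative)))

front, rear = 30.0, 150.0  # mirror-image positions about the interaural axis

# Head still: both sources give the same lateral angle, so they are confusable.
still = (lateral_angle(front), lateral_angle(rear))

# Head turned 45 degrees to the right: the two lateral angles now differ.
turned = (lateral_angle(front, 45.0), lateral_angle(rear, 45.0))

print(still, turned)
```

In this model a single head turn is enough to separate the front and rear candidates, which is the sense in which a sequence of lateral angles determines the direction.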
The significance of head movements in localization was also emphasised by Blauert
(1983). He argued that if head movements are available, then all cues acquired from
monaural signal characteristics are overridden and motion becomes the dominant cue.
Support for this proposal was offered by Makous & Middlebrooks (1990), who
obtained very low mean angle errors (between 1.5° & 16.3°) for their free-field study
in which head movements were used.
This study attempts to isolate and investigate the effect of head movements on
localization accuracy. In Chapter 10, where head movements and vision were
investigated together, the effects of head motion were overshadowed by visual cues.
Furthermore, the study was conducted in the free-field, which fundamentally differs
from simulated 3-dimensional sound.
Sounds were generated using 'head related transfer functions' (HRTF's). The HRTF
determines the set of filter coefficients that models the 3D sound. For any given
source position, the HRTF will produce a specific set of filter coefficients based on
the azimuth and/or elevation location. Head movements can then be incorporated by
adding a head-tracking device. This device monitors the head position, which is read
by a computer and fed back to the sound-generating equipment. The new position is
then accounted for by producing a new set of HRTF's and the process is then repeated
for each new head position. This method of generating sound and monitoring head
movement is akin to that used in current virtual reality set-ups. The equipment should
therefore give a true representation of what can be expected from the motion cues
provided by this technology but without the visual element.
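The update cycle described above might be sketched as follows. This is an assumed simplification for azimuth only: the function names, the clockwise-positive angle convention and the tracker readings are hypothetical, and the actual filtering step is omitted.

```python
def relative_azimuth(target_az_deg, head_az_deg):
    """Azimuth of the target relative to the current head orientation, in [0, 360)."""
    return (target_az_deg - head_az_deg) % 360.0

def render_positions(target_az_deg, head_track_deg, tracker_on):
    """Return the azimuth whose HRTF would be selected for each tracker reading.

    With the tracker off, head readings are ignored and the HRTF for the
    nominal target azimuth is used throughout.
    """
    if not tracker_on:
        return [target_az_deg % 360.0 for _ in head_track_deg]
    return [relative_azimuth(target_az_deg, h) for h in head_track_deg]

# Hypothetical tracker readings while the head turns 45 degrees to the right.
head_track = [0.0, 15.0, 30.0, 45.0]
on = render_positions(60.0, head_track, tracker_on=True)
off = render_positions(60.0, head_track, tracker_on=False)
print(on)
print(off)
```

With the tracker on, the selected HRTF position moves against the head turn so that the virtual source stays fixed in the room; with it off, the source moves with the head.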
Thurlow et al (1967) looked at the types of head movements listeners made when
asked to localize a sound source. Subjects' movement was filmed and later examined
in terms of rotation (side-to-side), tip (up and down) and pivoting (roll - causing an
increase in height of one ear and a decrease in the other). A rotate-tip movement was
the most frequent with a mean of 66% of subjects performing this motion pattern.
The most common single action was rotation (with a mean of 36° total movement)
and the least common was pivoting (only 4% of subjects showing this pattern with an
average movement range of 10° - although there is less scope for movement in this
plane). The HTI has restricted capabilities and can only account for movement in 2
dimensions simultaneously. Therefore, for the head motion conditions, azimuth
(rotation) and elevation (tipping) were tracked and roll (pivot) was ignored. Thurlow
et al's findings suggest that this should have very little effect on the results.
Indeed, as Wallach (1939, 1940) argues, any movement taken account of should give
some benefit, since it will disambiguate front from rear as well as providing a
changing stimulus which will increase localization cues.
This experiment will not just assess accuracy with and without head movements.
Subjects will be allowed to move their heads 'freely' whilst listening, since this will
allow them to exercise their 'natural' and experienced method of locating sounds.
However, if subjects are told to move their heads generally in a head movement
condition, each subject will move their head differently. Thus, a controlled
movement condition will be used in addition to a 'free movement' condition, to
maintain a consistent, and comparable, set of head movements. The subject will be
instructed to move their head 45° to the right at the onset of each stimulus. Since all
sounds are between 0° and 180°, this might be the closest movement to mimicking the
natural action in such a situation, since we naturally turn towards or in the direction of
a sound source (Thurlow & Runge, 1967; Pollack & Rose, 1967).
METHOD
Subjects
Ten subjects were recruited by opportunity sampling. All were postgraduate students
(6 male and 4 female) between 22 and 40 years of age. All had reported normal
hearing and 3 had taken part in one hearing experiment previously.
Design
A 2*3 repeated measures design was used. Seven sound sources were played at 7
locations (0°, 30°, 60°, 90°, 120°, 150°, 180°) in the horizontal plane, all at head
height. The sound sources were repeated twice to form a trial comprising 14 sounds.
Each trial was played in a fixed randomised sequence and all subjects listened to these
trials under 6 conditions:
Head Tracker On - subject's head remains still
- subject's head moves to the right
- subject's head moves freely
Head Tracker Off - subject's head remains still
- subject's head moves to the right
- subject's head moves freely
Trials were counterbalanced to reduce practice effects.
Stimuli
The stimulus was a 1-second Gaussian noise burst with 25 ms onset and offset ramps.
The noise was generated using a C function, from a set of stimulus generation
routines on an AP2 computer card. The noise sequence was then transferred to a DSP
module (located in a Power SDAC unit, Tucker-Davis Technologies). Here, it was
convolved with the HRTF corresponding to the target location and also (where
relevant) in relation to the head position, as read from a Head Tracker (Polhemus
3Space, Isotrak II). The SDAC unit then transformed the signal from digital into
analogue form, to be played through headphones.
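A stimulus of this kind could be approximated as in the sketch below. It is not the C routine used on the AP2 card: the sample rate (44.1 kHz), the linear ramp shape and the function name are assumptions made for illustration, and the HRTF convolution stage is left out.

```python
import random

def noise_burst(duration_s=1.0, ramp_s=0.025, sample_rate=44100, seed=0):
    """Gaussian noise burst with linear onset and offset ramps (assumed shape)."""
    rng = random.Random(seed)
    n = int(round(duration_s * sample_rate))
    n_ramp = int(round(ramp_s * sample_rate))
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in range(n_ramp):
        gain = i / n_ramp             # rises linearly from 0 towards 1
        samples[i] *= gain            # onset ramp
        samples[n - 1 - i] *= gain    # offset ramp (mirror image of the onset)
    return samples

burst = noise_burst()
print(len(burst), burst[0], burst[-1])
```

The ramps bring the first and last samples to zero, which avoids audible clicks at the onset and offset of the burst.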
The sound sources were played randomly from a play list, based on the 'stimulus
parameters' and 'experiment parameters' entered into the computer which controlled
the Head Tracker and Convolving apparatus. The stimulus parameters set four
variables. Firstly, the stimulus duration was set to 1 second to provide a
comparison with previous experiments that had used a noise stimulus. Secondly, the
interstimulus interval was set to 6 seconds, adequate time for the subject to perform
any head movement necessary and record their judgement. Thirdly, attenuation (in
dB) was set to 65 dB SPL and lastly, the Head Tracker was switched either on or
off.
The experiment parameters were the stimulus locations themselves (as specified in the
Design section).
Procedure
Subjects listened to the sounds over headphones (Sennheiser HD-414) whilst seated in
a normally reverberant quiet room. Each subject was provided with instructions¹ and
6 response sheets (see Figure 11.1), one for each trial. Listeners were asked to locate
their head to a forward-facing position by aligning their head with a yellow cardboard
spot (1" in diameter) affixed to a vertical surface in front of the subject, at a distance
of 1.5 ft. A fully adjustable chair was provided to ensure the spot was at eye-level
and that the correct distance was maintained.
Each trial was initialised from a computer keyboard. It was imperative that as this
key was pressed the subject's head was facing the forward position. The Head
Tracker would set a relative 0° azimuth position from the subject's head position at
that point in time. Subjects were verbally reminded of this at the beginning of every
trial.
¹ See Appendix 50
For the 'head still' condition, the subject could move after the stimulus offset to record
their responses and then the head was returned to the central position. For the 'head
moves right' condition, the subject moved their head as soon as the sound began to a
45° location (the centre of a specially positioned computer screen). After recording
their response, they were again required to fixate on the yellow spot. For the 'head
moves freely' condition, the subject was free to move their head as desired once the
stimulus had begun, but returned to the central position for the onset of the next stimulus.
Each trial was presented at approximately 1 minute intervals, which was the time
taken for the computer to be re-set in order to run a new sequence with different
parameters. Subjects were unaware of the Head Tracker function and all other
stimulus and experiment parameters during the experiment, but they were fully
debriefed after completing the 6 trials.
[Response diagram with the labels Front, Back, Left and Right]
Figure 11.1: Response diagram (actual size) given to subjects. Subjects marked the
numbers 1 to 14 on the diagram - the total number of stimuli per trial. A new response
sheet was provided for each of the 6 trials.
RESULTS
Mean angle errors were calculated by averaging error values across subjects. For the
Head Tracker On condition, the overall mean error was ±19.5° and for the Head
Tracker Off the overall error was ±21°. A more detailed breakdown of the findings is
shown in Figure 11.2. A within-subjects analysis of variance (see Table 11.1) was
used to analyse the results.
With free head movements allowed and the Head Tracker On, the accuracy was
significantly higher than for all other conditions (ANOVA, f = 8.39, df = 2). For the
Head Tracker On condition, there were large differences between the different head
motion conditions: The angle error for the 'head move right' condition (±27°) was
significantly higher than the errors for the 'still' and 'move freely' conditions (±16.7°
and ±14.9° respectively). This pattern of results was not replicated with the Head
Tracker Off, where all head movement conditions showed similar errors.
A statistically significant (ANOVA, f = 6.6, df = 2) interaction was found between the
Head Tracker status (On/Off) and head motion conditions (see Figure 11.2).
The number of front-back errors was calculated to establish whether there was a
difference between the two Head Tracker conditions (On and Off). With the Head
Tracker switched on, the number of front-back confusions was 11%. This value rose
to 15% with the Head Tracker switched off - a statistically significant increase
(p ≤ 0.05, unrelated t-test).
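The scoring of angle errors and front-back confusions might be sketched as below. The thesis does not spell out its scoring rule at this point, so the rule used here is an assumption: a response is treated as a confusion when its mirror image about the interaural axis lies closer to the target, and the corrected error is then taken from the mirrored response. The function names and the four example judgements are hypothetical.

```python
def angle_error(target_deg, response_deg):
    """Smallest absolute angular difference between two azimuths, in degrees."""
    diff = abs(target_deg - response_deg) % 360.0
    return min(diff, 360.0 - diff)

def mirror_front_back(az_deg):
    """Reflect an azimuth about the interaural (left-right) axis."""
    return (180.0 - az_deg) % 360.0

def score(target_deg, response_deg):
    """Return (front-back corrected error, was_confusion) for one judgement."""
    raw = angle_error(target_deg, response_deg)
    mirrored = angle_error(target_deg, mirror_front_back(response_deg))
    if mirrored < raw:  # assumed rule: the mirrored response fits better
        return mirrored, True
    return raw, False

# Hypothetical (target, response) pairs in degrees, not data from the study.
judgements = [(30, 40), (30, 150), (120, 70), (90, 100)]
scored = [score(t, r) for t, r in judgements]
mean_error = sum(err for err, _ in scored) / len(scored)
confusion_rate = sum(1 for _, flag in scored if flag) / len(scored)
print(mean_error, confusion_rate)
```

This simple rule over-counts confusions for targets close to 90°, where a small error can cross the interaural axis, so a stricter criterion may be preferable in practice.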
ANALYSIS OF VARIANCE
Analysis of variance for errors

SOURCE         DF        SS        MS       F       P
HT status       1      2.77      2.77    0.08   0.785
motion          2    616.52    308.26    8.39   0.001
Interaction     2    485.38    242.69    6.60   0.003
Error          54   1985.05     36.76
Total          59   3089.72

HT status      Mean
On             20.6
Off            21.0
[Individual 95% CI plot for HT status; axis marked 19.2, 20.4, 21.6, 22.8]

motion         Mean
Head Still     19.2
Head Move R    25.3
Head Free      17.9
[Individual 95% CI plot for motion; axis marked 17.5, 21.0, 24.5, 28.0]
Table 11.1: Analysis of variance table showing mean angle error values. Head Tracker
status represents the Head Tracker On or Off. Motion refers to the three head movement
conditions; head still, controlled head movement to the right and head allowed to move
freely. Statistically significant effects are shown in bold type.
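The F ratios in Table 11.1 can be cross-checked from the sums of squares and degrees of freedom it reports, since each F is the effect mean square divided by the error mean square. The sketch below is only an arithmetic check on the tabulated values; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom as reported in Table 11.1.
effects = {
    "HT status":   {"df": 1, "ss": 2.77},
    "motion":      {"df": 2, "ss": 616.52},
    "interaction": {"df": 2, "ss": 485.38},
}
error_df, error_ss = 54, 1985.05

ms_error = error_ss / error_df  # error mean square

# F = (SS_effect / df_effect) / MS_error, rounded as in the table.
f_ratios = {
    name: round((row["ss"] / row["df"]) / ms_error, 2)
    for name, row in effects.items()
}
total_ss = round(sum(row["ss"] for row in effects.values()) + error_ss, 2)

print(round(ms_error, 2), f_ratios, total_ss)
```

The recomputed values agree with the tabulated mean square, F ratios and total sum of squares.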
[Graph: Mean (Front/Back Corrected) Errors. Vertical axis scaled 0 to 35; horizontal axis: Head Motion Condition (Still, Move Right, Move Freely); one line each for Head Tracker ON and Head Tracker OFF.]
Figure 11.2: Mean angle errors with the Head Tracker switched on and off for the three
different head motion conditions ('still' but without restraint, a controlled 'movement to the
right' of 45° and 'free' movement). A statistically significant improvement for the 'move
freely' condition with the Head Tracker On was found, over all conditions with the Head
Tracker Off. There was also a statistically significant improvement over the 'head move
right' condition with the Head Tracker On (ANOVA, f = 8.39, df = 2) and a statistically
significant interaction between HT status and head motion (ANOVA, f = 6.6, df = 2).
DISCUSSION
The incorporation of head movements through the use of a Head Tracker has
produced an improvement in judgement accuracy, but not to the degree that was
anticipated. With the Head Tracker On, only one head movement condition showed
an improvement over Head Tracker Off conditions - where the head could be moved
'freely'. Furthermore, the angle error for this (statistically significant) improvement
was still surprisingly high. The benefits provided by visual cues, reported in Chapter
10, are clearly not matched by head movement cues alone. But although the angle
errors in general have remained fairly high, a clear benefit has been demonstrated for
cases where the listener is able to move their head as desired. 'Unnatural' or
controlled motion does not aid localization any more than keeping the head still. In
fact in many cases, unnatural movements appear to be confusing and make
judgements more difficult than when head movements are not incorporated. This is
highlighted by the statistically significant interaction between head motion condition
and the Head Tracker status (On or Off).
Despite the high errors, the pattern of results offers support to some published studies
which report an improvement with head movements, but not to the degree of
producing almost perfect accuracy (e.g. Hirsch, 1971; Blauert, 1983; Wightman et al,
1987). In particular, it closely matches Thurlow & Runge's (1967) study where they
report a reduction in error with head movements, but only of 30%. The reduction in
this study is also approximately 30%, from 21° to 15°.
One puzzling result was the difference between the 'still' head conditions with the
Head Tracker On and Off. It was assumed that these two conditions would produce
the same, or very similar results. If the head is still, then no head movements are
accounted for, making the two conditions apparently identical. However, there is a
rise in accuracy for the Head Tracker On condition (although marginally statistically
insignificant) that implies that additional cues may have been obtained from the very
small head movements that some subjects made. However, this was thought unlikely,
since such small head movements (in the order of 1 - 2° for the head still condition)
have been shown to have no effect (e.g. Makous & Middlebrooks, 1990. See also
Chapter 10).
One other explanation for the difference between the two 'head still' conditions may
be the limited resolution of the Head Tracker. Although HTI has a resolution of 1°,
the HRTF bank has only a 5° resolution, which causes perceptible jumps or slight
irregularities in movement. The small movements of less than 5° (actually in the
order of 1 - 2°) made by subjects should therefore have no effect since the HRTF
resolutions require 5° movement in either direction to cause a perceptible change.
But if those movements are about the 0° point (which they often were), then a
noticeable change would occur, since from 359° to 0° is where a 5° 'jump' occurs.
Thus, even though these small movements shouldn't have had an effect, the very fact
that the 1 or 2° spanned the 0° azimuth position meant that their movements were
exaggerated and affected accuracy. Thus one might assume that with correction of
the resolution, no 'jumps' would be perceived and the 'head still' conditions for the
Head Tracker on and off would be very similar.
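The resolution argument can be illustrated with a small sketch. It assumes that each stored HRTF covers a 5° bin starting on a multiple of 5°, so that 359° and 0° fall into different bins; this binning rule, the function names and the head-sway values are all hypothetical, since the text does not describe how the equipment maps tracker readings onto the HRTF bank.

```python
import math

def hrtf_bin(azimuth_deg, step_deg=5.0):
    """Index of the stored HRTF for an azimuth, assuming bins that start on
    multiples of `step_deg` (an assumed quantization rule)."""
    return int(math.floor((azimuth_deg % 360.0) / step_deg))

def bin_changes(track_deg, step_deg=5.0):
    """Count how often the selected HRTF changes along a sequence of readings."""
    bins = [hrtf_bin(a, step_deg) for a in track_deg]
    return sum(1 for a, b in zip(bins, bins[1:]) if a != b)

# Hypothetical 1-2 degree sway of a nominally still head.
sway = [1.0, -1.0, 2.0, -2.0, 1.0]
sway_about_front = sway                       # centred on 0 degrees
sway_about_side = [32.5 + s for s in sway]    # same sway, centred inside a bin

print(bin_changes(sway_about_front), bin_changes(sway_about_side))
```

Under this assumption the same small sway switches filters on every crossing of 0° but never when it stays inside one bin, which matches the exaggerated effect described above.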
The number of front-back errors was calculated with and without head movements. A
statistically significant reduction in the number of front-back confusions was found
with the Head Tracker switched on. Combined with the overall improvement in angle
error, this front-back error reduction supports Wightman et al's (1987) theory. They
claim that head movements reduce errors by preventing azimuth confusions. But if
head movements only aid localization through reduction of front-back errors, then
when the data is corrected for these errors, the effect of the head movements should
be nullified. Thus the 'head movement' and 'no head movement' data should be
similar when front-back corrected. Yet this is not the case, demonstrating that head
movements are performing a more complex function than simply correcting for
confusions. Such findings may lend support to Wallach (1939, 1940) and Young
(1931) who assert that head movements play a more extensive role than simply
resolving azimuth confusions.
The resolution of the equipment clearly presents problems for auditory virtual reality.
In order to fully establish the contribution of head movements as a cue, the (5°)
resolution must be refined to produce perceptibly smooth and small changes.
Certainly, to provide HRTF interpolations that match the capabilities of the
equipment (usually 1° resolution) is essential. But this study goes some way towards
highlighting the potential value of head movements in localization.
Chapter 12: General Discussion and Conclusions
CHAPTER 12
General Discussion and Conclusions
12.1 SUMMARY
The localization experiments reported in this thesis have provided an insight into
AVR. The major problem surrounding these 'virtual' sounds was the (apparently
unrealistically) large angle errors, which would result in real problems in
safety-critical situations. The thesis provides a fundamental evaluation of the importance of
acoustic cues in locating targets.
Manikin recordings have been used extensively to assess localization acuity. A
manikin provides a direct and accurate means of achieving 3-dimensional sound that
isolates auditory cues (head movements, vision and pinna cues can be eliminated as
variables). A number of basic factors were then investigated in terms of their
contribution to localization.
The significance of pinna-based spectral cues was assessed in terms of making
azimuth and elevation judgements. The accuracy of locating sounds with one's own
pinnae was measured against using another person's pinnae or no pinnae at all. The
conclusions had a strong bearing on the future cost of producing AVR sounds and on
subsequent experiments.
Different response methods and stimulus types were used in several experiments
throughout the thesis. However, disparities in the results led to a controlled
comparison of these variables, to clear up ambiguities that surrounded these issues.
A systematic investigation of recording and playback techniques further contributed
to the extensive examination of factors that might be involved in localization. Yet
with little success in reducing the angle error, these variables were clearly not the
major cues being utilised by the listener. The non-acoustic cues of head movement
and vision were incorporated and examined. A discussion of the major findings and
conclusions of all of these investigations is outlined in the sections below (major
conclusions are underlined for ease of reference). Some of the limitations of the work
are also presented with reference to possible improvements and suggestions for future
research.
12.2 DISCUSSION AND CONCLUSIONS
12.2.1 Individualized pinnae
A comparison of individualized, nonindividualized and no pinnae for azimuth and
elevation judgements is reported in Chapter 4. The results showed that there was a
small but statistically insignificant increase in accuracy when using one's own pinnae.
For azimuth judgements, this increase was 3°, which with a larger sample size may
have been significant. The effect for elevation was much smaller, only 1.3°, which is
less likely to give a significant result with a larger sample size. These results offer
support to Freedman & Fisher (1968) who similarly found no difference between
individualized and nonindividualized pinnae for azimuth and elevation judgements.
Therefore, in terms of maximising cues in AVR, using pinnae will undoubtedly lend
support. But the time consuming and costly procedure of manufacturing
individualized pinnae was not found to be worthwhile.
12.2.2 Pinnae/no pinnae
Chapter 4 provides the first comparison in this thesis of pinnae and no pinnae for
azimuth and elevation judgements. For the azimuth judgements, using pinnae (either
individualized or nonindividualized) gave similar results to no pinnae (0.3°
difference). This highlights the dominance of interaural timing and level differences
for azimuth discrimination. These results were replicated in Chapter 6, where the
pinna also showed no effect for azimuth. In Chapter 9, which looks only at azimuth
judgements, a larger improvement was found when using pinnae, but the 3° difference
overall was insignificant. For elevation, however, a strong effect was found in
Chapter 6. The pinna increased judgement accuracy in the vertical plane
significantly (by 7°), perhaps demonstrating the primary function of the pinna. In
Chapter 4 the difference between pinna and no pinna for elevation was also larger
than for azimuth, by 5°, although this result was marginally insignificant. In this
particular case, where the subjects reported problems with localizing the stimuli,
overall task difficulty may have masked the subtle pinna effects. Generally, however,
these findings lend support to Freedman & Fisher (1968). They showed that for
elevation judgements there was an improvement for pinnae over no pinnae (as in
Chapters 6 and 9). However, as mentioned above, using individualized pinnae over
nonindividualized pinnae did not improve accuracy further.
Further support for Freedman & Fisher's findings comes from the large angle errors in Chapter
4. Initially, Freedman & Fisher's results seemed questionable because of their
untypically high errors (overall average of around ±34° without head movements).
However, the even higher mean angle error (±43°) in Chapter 4 reinforces their result.
These sets of results could be an indicator of judgement accuracy using pinna cues
alone in the vertical plane. Although it may also be that pinna cues are not being fully
utilised in this particular task. When Freedman & Fisher incorporated head
movements, their accuracy with pinna in the vertical plane rose to 22.5°. Thus with
the addition of cues such as head movement, the role of the pinna could be critical.
12.2.3 Stimulus type
Controlled comparisons of different stimulus types (speech, clicks and white noise)
were reported in Chapters 8 and 10. Chapter 8 found that for azimuth, speech
produced the highest accuracy and noise the lowest, although this difference was not
statistically significant. Elevation showed the opposite effect, with noise producing
higher accuracy than either clicks or speech, but this result was also statistically
insignificant. The results are surprising given that the majority of subjects found the
click stimulus significantly more difficult to localize than either the noise or speech
sounds. Part of this result might be explained by expectation. A speech sound is a
known stimulus and we are familiar with its movement around us in the horizontal
plane. But for elevation, the untypical behaviour of a person's voice varying with
height only, and being presented from higher elevations than usual, may lead to
greater difficulty in placing the sounds. However, it should be noted that the
differences between stimulus types are subtle and not significant and may therefore be
due to individual variation.
Chapter 10, which compares the same 3 stimuli as Chapter 8, but for azimuth only
and in the free-field, also showed that clicks gave the greatest accuracy. When
comparing other experiments in the thesis, those using noise (e.g. Chapters 6 and 7,
roughly 24° and 20° on average) have given higher errors than those using clicks (e.g.
Chapters 4 and 5 - overall means of 15° and 19°). However, it is only possible to
make these comparisons between experiments with certain similar methodological
features. Variables such as response method (discussed below in section 12.2.4) have
a strong effect that overrides stimulus type. Thus, for experiments using a similar
response method, this thesis has generally found clicks to give the lowest errors for
azimuth judgements. For elevation, noise gives marginally greater accuracy than
clicks or speech.
12.2.4 Response method
Investigations into the effects of response method have highlighted an important
variable in localization studies. Whilst published literature has used a wide variety of
response techniques (e.g. Stevens & Newman, 1936, categorical; Pollack & Rose,
1967, head alignment; Wenzel et al, 1993, verbal co-ordinate reporting; Lovelace &
Anderson, 1993, hand pointing), the effects of different response methods were
undetermined.
Experiments reported in Chapters 4 to 11 have either adopted a categorical method or
have allowed subjects to make a 'free', non-categorical judgement. Categorical
judgements, by their very nature, give cues to the locations of the targets and hence
guide the subjects' judgements. A non-categorical method of eliciting responses
allows for a completely unknown number and placement of sound sources, resulting
in 'true' placement of the perceived sound locations. Knowledge of the speaker
positions may have been the reason for obtaining such a huge improvement in
accuracy for the categorical method (from ±20° to ±8°), in a controlled comparison in
Chapter 7. Chapter 8 also showed a reduction in error from 24° to 15° when using a
categorical method for making azimuth judgements. For elevation judgements,
however, the effect was minimal, with a difference of just 2°. Subjects reported
considerable task difficulty when judging elevation which is likely to have
outweighed the method of response. Experiments in this thesis have shown response
method to be highly influential for localization in the horizontal plane. These
findings should make response technique a fundamental element in future localization
research.
12.2.5 Visual stimuli
A number of acoustic and non-acoustic cues have been isolated, manipulated and
investigated in an attempt to increase accuracy. Yet none of these either alone, or in
combination, appear to offer sufficient localization cues to obtain free-field accuracy
(e.g. Makous & Middlebrooks, 1990, between ±1.6° and ±16° on average). It was not
until vision was included (Chapter 10) that there was a significant drop in error. For
pre-recorded sounds the addition of a visual context increased accuracy from ±11° to
±4°. Free-field presentation of the sounds reduced the error further to ±0.3°. The
discrepancy between pre-recorded sounds with a visual link and free-field
presentation may be a result of using nonindividualized pinnae, although previously
this has been shown to have only a small effect (Chapter 4). It may also be due to
more subtle effects produced by the KEMAR's artificial ear canal, or hollow torso
cavity. Nevertheless, providing a visual context or link to the sound sources
drastically reduces judgement error, even for sounds recorded using a manikin and
presented over headphones.
The work of researchers like Jackson (1953) and my own free-field study, where
vision was dominant, shows that the visual and auditory system must complement
each other and work together to avoid confusion. At close visual and auditory
stimulus deviations (up to at least 30°) vision will be dominant, and even when the
deviation between an auditory and visual stimulus is 90°, vision can have some effect.
Thus it is not possible to implicate only an auditory cueing system, that may conflict
or fail to be guided by vision, where 100% accuracy is required.
12.2.6 Live/recorded stimuli
A KEMAR has been used to assess localization accuracy in the majority of
experiments reported. An important contribution of this thesis has been the
examination of the process of recording stimuli using this technique. There is a clear
discrepancy in the literature between studies of localization that have used simulated
3-dimensional sounds and those that have been conducted in the free-field. The latter
has produced results far more accurate (e.g. Stevens & Newman, 1936; Makous &
Middlebrooks, 1990) than those using pre-recorded or generated stimuli (e.g.
Wightman & Kistler, 1989; Wenzel et al, 1993; Chapters 4 - 9 & 11). Although the
free-field might incorporate a very different set of acoustic cues to a 'virtual' auditory
environment, these could never be identified absolutely because the free-field studies
differed a great deal in methodology.
So what is it about the simulation process that inevitably results in higher errors? By
presenting sounds 'live' through the manikin in Chapter 9, instead of making digital
recordings and playing back from a tape, it was possible to show that little
information (if any) is lost in the recording process. Although live presentation
improved accuracy by 5° overall, the effect was not statistically significant. This
demonstrates that the recordings are a hi-fidelity reproduction of the original signal.
Pinna cues were also investigated in this chapter (see section 12.2.2) and were found
to have little effect on the overall accuracy in the absence of other factors.
12.2.7 Head movements
Head movements were initially examined in a free-field set-up in Chapter 10. It was
hoped that the cues provided by head motion would produce the same high level of
accuracy as visual context cues. However, there was no significant difference in
accuracy between restraining the head in a clamp (±0.30°) or allowing it to move freely
(±0.35°). However, there was a strong floor effect and so the lack of statistical
significance is likely to be a methodological artefact.
In Chapter 11, head-tracked HRTF's were used to account for head movements. This
is an accurate representation of the technique used in VR systems and was expected to
give a good indication of potential accuracy. For each of 3 different head motion
conditions the Head Tracker was either switched on or off. It was therefore possible
to assess not only whether head movements aid localization but if certain types of
movement are preferred. Overall accuracy was low (±20°) and there was only a 2°
improvement for head-tracked over non head-tracked conditions. But where subjects
moved their head 'naturally' (as opposed to a specified controlled movement or no
movement at all) there was a large (statistically significant) improvement in accuracy,
for head-tracked over non head-tracked stimuli - from ±21° to ±14.9°.
Clearly, head movements will only reduce error significantly where subjects are
allowed to move their heads as is natural and typical for them. Nevertheless,
incorporating head motion using HRTFs, without a visual context, does not produce
the near-perfect accuracy of free-field listening with a visual context (Chapter 10).
The ability to monitor all 3 dimensions of head movement may slightly improve
judgement accuracy. For head-tracked 3D audio sounds only azimuth (side-to-side
rotation) and elevation (up-down tipping) movements were accounted for; roll
(pivoting) was ignored due to software limitations. The decision to omit roll, rather
than either azimuth or elevation, was based on findings by Thurlow et al (1967), who
found roll to be the least performed or necessary movement in making localization
judgements.
12.3 IMPLICATIONS FOR VR
In a system where auditory warning cues work alone, this thesis has shown that
potentially large misjudgements will undoubtedly occur. What is apparent from the
findings is the significance of including a visual context at the very least. Certainly in
terms of producing almost perfect accuracy, which is paramount in safety-critical
situations, auditory localization is not sufficient as a sole cue to location.
The problems encountered in VR are not helped by the limitations of the equipment.
The resolution of head-tracked HRTF generated stimuli will almost certainly cause
problems for AVR unless it can be refined. No doubt these refinements will be
achieved in the relatively near future, since VR is a fast-moving field. What is
required of the technology is at least a 1° resolution to give a more realistic
representation of the way a sound behaves when we move our head. This will also
give a more accurate assessment of the effects of head movements on accuracy.
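
The effect of angular resolution can be illustrated with a minimal sketch. It assumes a hypothetical HRTF set stored on a regular azimuth grid, with the head-relative azimuth snapped to the nearest stored direction. The function names and the step sizes of 10°, 5° and 1° are illustrative assumptions and are not taken from the software used in this thesis. Elevation would be handled in the same way, and roll is ignored, as in Chapter 11.

```python
def head_relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of the source relative to the nose, wrapped to [0, 360)."""
    return (source_az_deg - head_yaw_deg) % 360.0


def nearest_stored_azimuth(azimuth_deg, step_deg):
    """Snap an azimuth to the nearest direction on a grid spaced step_deg apart.

    The regular grid is an assumption made for illustration only.
    """
    return (round(azimuth_deg / step_deg) * step_deg) % 360.0


def snapping_error(azimuth_deg, step_deg):
    """Absolute angular difference between the true and the rendered azimuth."""
    diff = abs(azimuth_deg - nearest_stored_azimuth(azimuth_deg, step_deg)) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    # A source at 30 degrees with the head turned 26 degrees towards it.
    relative = head_relative_azimuth(30.0, 26.0)
    for step in (10.0, 5.0, 1.0):  # hypothetical grid spacings
        print(step, nearest_stored_azimuth(relative, step), snapping_error(relative, step))
```

On such a grid the snapping error can be as large as half the grid spacing, so a small head turn may produce no change at all in the rendered sound when the grid is coarse.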
Training of pilots might improve matters, although the effects of training may be
minimal. This is because many other factors in cockpits will hinder the available
localization cues. Cockpit noise, excess auditory information and headphone quality
may limit localization performance to a degree that perhaps even extensive training
cannot compensate. Certainly, for situations where the user has little or no time to
become accustomed to the equipment and sound sources, then the localization errors
reported in Chapter 11 should be taken as true indicators of the accuracy that can be
expected for auditory virtual reality sounds.
AVR systems might be effective for drawing attention to instruments in the cockpit
(where vision is available) but not remote unseen objects or targets, especially where
front-back errors may intrude.
12.4 PROPOSALS FOR FUTURE WORK
Attempts have been made to study the fundamental processes that contribute to
auditory localization. Rigorous investigations of acoustic and non-acoustic cues have
been successful in increasing our understanding of how we localize sounds. But
despite the progress made, the work is by no means complete. As ever there remain
some puzzling aspects of localization which require further investigation if they are to
be resolved. It is hoped that the work set out in this thesis has laid the groundwork for
future research within this field.
Throughout the experiments some suggestions for future directions and also some
specific areas of necessary research have been made. Outlined below is a summary of
these ideas and some new proposals based on the conclusions drawn in the sections
above (12.2.1-5).
1. Experiments in a cockpit set-up with the relevant visual and tactile cues.
Applying the major findings of this thesis to a simulated cockpit environment
will provide conclusive information about the validity of the results. It will
allow important research to be geared within the relevant visual and spatial
setting and may reveal many new problems as well as solutions to old
problems.
2. A critical factor is to implement the auditory equipment that would actually be
available in simulated cockpit environments in order to work with the existing
facilities and adapt localization cues accordingly. For example, if the
headphones used tended to attenuate high frequency sounds, thus reducing the
pilot's front-back discrimination, then high frequency components of the sound
could be boosted to compensate.
3. Taking account of cockpit noise is essential in attempting to produce a useful
set of auditory cues. It is likely that the range of optimal signal types outlined
in this thesis will need to be rethought in light of the new auditory
environment. The fairly long-duration broadband signals that give the greatest
accuracy with head movements allowed may become lost in a consistently
noisy background. Studies on masking will undoubtedly be of use to such
research.
4. Research into the effects of high stimulus intensity on localization. Loud
cockpit noise will require a fairly intense signal, which must still remain
within a safe threshold. This area of research is much needed since no
information is available about the effects of near-threshold sounds on acuity.
The problem of stimulus intensity is strongly linked with the problems
surrounding stimulus type. The solution may be to analyse the noise in
different types of cockpits and generate a signal that is optimised in such noise
and could therefore be less intense. Perhaps even work into noise cancellation
would aid the process if intense signals were found to significantly hinder
localization accuracy.
5. Equipment refinement.
a) Experiments using head-tracked HRTF generated 3D stimuli should be
conducted with software capable of superior resolution - at least 1°.
It may well be that some institutions are already developing or in
possession of such equipment, in which case it will be readily available
in the near future. A realistic representation of how a sound behaves as
we move our head is critical if insight is to be gained into the role of
head movements using head-tracking equipment.
b) Also linked to refined technology is the need for full tracking of a
subject's head motion during such experiments. The software utilised in
this thesis could only account for head motion in two of the three
dimensions of azimuth, elevation and roll. Thus a choice was made to
incorporate azimuth and elevation, based on the findings of Thurlow et
al (1967), who found roll to play a minor part. Nevertheless, roll may
have a small effect and give a more realistic experience for listeners,
thus boosting acuity. It is essential to replicate, as accurately as
possible, everyday listening conditions.
REFERENCES
Attneave F (1959) Applications of information theory to psychology. Henry Holt and
Company - New York.
Batteau D W (1967) The role of the pinna in human localization. Proceedings of the
Royal Society 168, 158-180.
Begault D R & Wenzel E M (1991) Headphone localization of speech stimuli.
Proceedings of the Human Factors Society. San Francisco CA. September.
Bekesy G (1960) Experiments in Hearing. McGraw-Hill, New York.
Blauert J (1969) Sound localization in the median plane. Acustica, 22, 205-213.
Blauert J (1983) Spatial hearing: The psychophysics of human sound localization.
MIT Press: Cambridge, MA.
Butler R A (1969) Monaural and binaural localization of noise bursts vertically in the
median sagittal plane. Journal of Auditory Research 3, 230-235.
Butler R A & Humanski (1992) Localization of sound in the vertical plane with and
without high-frequency spectral cues. Perception and Psychophysics, 51 (2),
182-186.
Coleman P D (1962) Failure to localize the source distance of an unfamiliar sound.
Journal of the Acoustical Society of America, 34 (3), 345-346.
Durlach N I & Colburn H S (1978) Binaural Phenomena in "Handbook of
Perception", edited by E Carterette, Academic, New York, Vol. IV.
Durlach N I et al (1992) On the externalization of auditory images. Presence, 1 (2),
251-257.
Edwards E (1969) Information transmission. An introductory guide to the application
of the theory of information to the human sciences. Chapman and Hall.
Freedman S J & Fisher H G (1968) The role of the pinna in auditory localization. In:
Freedman S J (Ed) "The neuropsychology of spatially oriented behaviour."
Dorsey Press, Homewood, Illinois.
Gardner M B & Gardner R S (1973) Problem of localization in the median plane:
effect of pinnae cavity occlusion. Journal of the Acoustical Society of America,
53 (2), 400-408.
Gelfand S A (1990) Hearing. An introduction to psychological and physiological
acoustics. Marcel Dekker, Inc.
Giguere C & Abel S (1993) Sound localization: Effects of reverberation time, speaker
array, stimulus frequency and stimulus rise/decay. Journal of the Acoustical
Society of America, 94 (2), 769-776.
Good M D & Gilkey R H (1996) Sound localization in noise: The effect of
signal-to-noise ratio. Journal of the Acoustical Society of America, 99 (2), 1108-1117.
Hake H W & Garner W R (1951) The effect of presenting various numbers of discrete
steps on scale reading accuracy. Journal of Experimental Psychology, 42, 358-
366.
Hartmann W M & Wittenberg A (1996) On the externalization of sound images.
Journal of the Acoustical Society of America, 99 (6), 3678-3688.
Hebrank J & Wright D (1974a) Are two ears necessary for localization of sound
sources on the median plane? Journal of the Acoustical Society of America, 56,
935-938.
Hebrank J & Wright D (1974b) Spectral cues used in the localization of sound
sources on the median plane. Journal of the Acoustical Society of America, 56,
1829-1834.
Hirsch I J (1971) Masking of speech and auditory localization. Audiology, 10, 110-
114.
Jackson C V (1953) Visual factors in auditory localization. Quarterly Journal of
Experimental Psychology, 5, 52-65.
Loomis J M, Hebert C & Cicinelli J G (1990) Active localization of virtual sounds.
Journal of the Acoustical Society of America, 88, 1757-1764.
Lopez-Poveda E A (1996) The physical origin and physiological coding of pinna
based spectral cues. Doctoral Thesis, Loughborough University.
Lovelace E A & Anderson D M (1993) The role of vision in sound localization.
Perceptual and Motor Skills, 77, 843-850.
Makous J C & Middlebrooks J C (1990) Two dimensional sound localization by
human listeners. Journal of the Acoustical Society of America, 87 (5),
2188-2200.
Middlebrooks J C, Makous J C & Green D M (1989) Directional sensitivity of sound
pressure levels in the human ear canal. Journal of the Acoustical Society of
America, 86, 89-108.
Middlebrooks J C & Green D M (1991) Sound localization by human listeners.
Annual Review of Psychology, 42, 135-159.
Miller G A (1956) The magical number seven plus or minus two: some limits on our
capacity for processing information. Psychological Review, 63 (2), 81-97.
Mills A W (1958) On the minimum audible angle. Journal of the Acoustical Society
of America, 30 (4), 237-246.
Musicant A & Butler R (1984) The influence of pinnae-based spectral cues on sound
localization. Journal of the Acoustical Society of America 75, 1195-1200.
Oldfield S R & Parker S P A (1984a) Acuity of sound localization: a topography of
auditory space. I. Normal hearing conditions. Perception 13, 581-600.
Oldfield S R & Parker S P A (1984b) Acuity of sound localization: a topography of
auditory space. II. Pinna cues absent. Perception 13, 601-617.
Perrott D R (1984) Concurrent minimum audible angle: A re-examination of the
concept of auditory spatial acuity. Journal of the Acoustical Society of America,
75 (4), 1201-1206.
Pick H L, Warren D H & Hay J C (1969) Sensory conflict in judgements of spatial
direction. Perception and Psychophysics, 6, 203-205.
Pollack I & Rose M (1967) Effect of head movement on the localization of sounds in
the equatorial plane. Perception and Psychophysics, 2, 591-596.
Lord Rayleigh (1907) On our perception of sound direction. Philosophical Magazine, 13,
214-232.
Sandel T T, Teas D C, Feddersen W E & Jeffress L A (1955) Localization of sound
from single and paired sources. Journal of the Acoustical Society of America,
27 (5), 842-852.
Sayers B & Cherry C (1957) Mechanism of binaural fusion in the hearing of speech.
Journal of the Acoustical Society of America, 29, 973-987.
Schlegel P A (1994) Azimuth estimates by human subjects under free-field
and headphone conditions. Audiology 33, 93-116.
Searle D et al (1975) Binaural pinna disparity: another auditory localization cue.
Journal of the Acoustical Society of America, 57 (2), 448-455.
Shaw E A G & Teranishi R (1968) Sound pressure generated in an external ear replica
and real human ears by a nearby point source. Journal of the Acoustical Society
of America, 44, 240-249.
Shelton B R & Searle C L (1978) Two determinants of localization acuity in the
horizontal plane. Journal of the Acoustical Society of America, 64 (2), 689-691.
Shelton B R & Searle C L (1980) The influence of vision on the absolute
identification of sound-source position. Perception and Psychophysics, 28, 589-596.
Shelton B R, Rodger J C & Searle C L (1982) The relationship between head motion
and accuracy of free-field auditory localization. Journal of Auditory Research,
22, 1-7.
Siegel J A & Siegel W (1972) Absolute judgement and paired-associate learning:
Kissing cousins or identical twins? Psychological Review, 79 (4), 300-316.
Stevens S S & Newman E B (1936) The localization of actual sources of sound.
American Journal of Psychology, 48, 297-306.
Thurlow W R & Runge P S (1967) Effect of induced head movements on localization
of direction of sounds. Journal of the Acoustical Society of America, 42, 480-
488.
Thurlow W R et al (1967) Head movements during sound localization. Journal of the
Acoustical Society of America, 42 (2), 489-493.
van Soest J L (1929) Richtungshooren bij sinusvormige geluidstrillingen [Directional
hearing of sinusoidal sound waves]. Physica 9, 271-282.
Wallach H (1939) On sound localization. Journal of the Acoustical Society of
America, 10, 270-274.
Wallach H (1940) The role of head movements and vestibular and visual cues in
sound localization. Journal of Experimental Psychology, 27, 339-368.
Wenzel E M, Wightman F L & Kistler D J (1991) Localization with
non-individualized virtual acoustic display cues. In: Proceedings of CHI '91, ACM
Conference on Computer-Human Interaction. New York: ACM Press, pp 351-
359.
Wenzel E M et al (1993) Localization using non individualized head-related transfer
functions. Journal of the Acoustical Society of America, 94 (1), 111-123.
Wightman F L & Kistler D J (1989) Headphone simulation of free-field listening. 2:
Psychophysical validation. Journal of the Acoustical Society of America 85 (2),
868-878.
Wightman F L, Kistler D J & Perkins M E (1987) A New Approach to the Study of
Human Sound Localization. In: Directional Hearing, W A Yost & G Gourevitch
(Eds.) Springer-Verlag.
Woods W S & Kulkarni A (1992) Some examples of binaural recordings with KEMAR
in anechoic and reverberant environments. Unpublished, Department of
Biomedical Engineering, Boston University.
Wright D, Hebrank J H & Wilson B (1974) Pinna reflections as cues for localization.
Journal of the Acoustical Society of America, 56 (3), 957-962.
Young P T (1931) The role of head movements in auditory localization. Journal of
Experimental Psychology, XIV, 2, 95-124.
Appendix I: Pinnae photographs
APPENDIX 1
Pinnae Photographs
Photographs of the pinna supplied with the manikin and examples of the pinna moulds
made for use in this thesis.
The photographs show the following:
1. The standardized rubber pinnae supplied with the KEMAR. Shown are models
DB-065 (the larger, red, right pinna mould) and DB-061 (the smaller, pink, left
pinna mould).
2. Pinnae of subject AC, moulded using the technique described in Chapter 3. The
pinnae of volunteer AC were typically used for 'nonindividualized' pinna
conditions.
3. The 'infills' used as a no-pinna condition. These fit flush with the KEMAR's
head and were used in place of pinnae in a number of experiments.
[Photograph: Knowles-manufactured KEMAR pinnae, DB-065 (Right) and DB-061, shown against a centimetre scale.]
[Photograph: Moulded pinnae of subject AC (used for all nonindividualized pinnae conditions) (top)
and infills (bottom), labelled AC (Left), Left Infill and Right Infill, shown against a centimetre scale.]
Appendix 2: Calculation of transmitted information
APPENDIX 2
Calculation of transmitted information.
A typical subject's responses are given below:
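Stimulus-response frequencies (responses numbered 1 to 9; for each stimulus position the non-empty response cells are listed from left to right)

STIMULUS POSITION    RESPONSES    TOTALS
1                    5 1          6
2                    5 1          6
3                    3 2 1        6
4                    1 4          6
5                    5 1          6
6                    3 3          6
7                    1 4 1        6
8                    2 4          6
9                    2 4          6

RESPONSE TOTALS (responses 1 to 9): 13 4 2 4 6 1 8 15 1
GRAND TOTAL: 54

1.74 bits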
Calculation of transmitted information (HT) is based on 3 ancillary measures:
HS: Stimulus total frequencies are divided by the grand total (the sum of all stimulus
totals or, equivalently, of all response totals). This gives the value of $P_i$.
$\mathrm{HS} = -\sum_i P_i \log_2 P_i$
HR: Response total frequencies are divided by the grand total, yielding the value of
$P_i$.
$\mathrm{HR} = -\sum_i P_i \log_2 P_i$
HSR: Individual response frequencies are divided by the grand total to give $P_i$
(ignoring empty cells since they give zero values).
$\mathrm{HSR} = -\sum_i P_i \log_2 P_i$
HT = HS + HR - HSR. In this case transmitted information is 1.74 'bits' per stimulus.
$2^{\mathrm{HT}}$: Gives the number of alternative positions that can be identified without error.
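
The calculation can be checked with a short sketch using the frequencies tabulated above. The function and variable names are illustrative. One assumption is made and marked in the code: the row for stimulus position 4 shows only 5 of its 6 responses in the table as reproduced here, so the sketch assumes the remaining response fell in a separate cell with a frequency of 1. With that assumption the result agrees with the stated 1.74 bits. Only the cell frequencies are needed for HSR, not the columns in which they fall.

```python
from math import log2


def entropy_bits(frequencies, grand_total):
    """Return -sum(P * log2(P)) over the non-zero frequencies, with P = n / grand_total."""
    return -sum((n / grand_total) * log2(n / grand_total) for n in frequencies if n > 0)


def transmitted_information(stimulus_totals, response_totals, cells):
    """HT = HS + HR - HSR, in bits per stimulus."""
    grand_total = sum(stimulus_totals)
    hs = entropy_bits(stimulus_totals, grand_total)
    hr = entropy_bits(response_totals, grand_total)
    hsr = entropy_bits(cells, grand_total)
    return hs + hr - hsr


# Row (stimulus) totals and column (response) totals from the table.
STIMULUS_TOTALS = [6, 6, 6, 6, 6, 6, 6, 6, 6]
RESPONSE_TOTALS = [13, 4, 2, 4, 6, 1, 8, 15, 1]

# Non-empty cells, one line per stimulus position.
CELLS = [
    5, 1,      # position 1
    5, 1,      # position 2
    3, 2, 1,   # position 3
    1, 4, 1,   # position 4 (ASSUMPTION: the final 1 is not visible in the table)
    5, 1,      # position 5
    3, 3,      # position 6
    1, 4, 1,   # position 7
    2, 4,      # position 8
    2, 4,      # position 9
]

if __name__ == "__main__":
    ht = transmitted_information(STIMULUS_TOTALS, RESPONSE_TOTALS, CELLS)
    print(f"HT = {ht:.2f} bits, 2^HT = {2 ** ht:.2f} positions")
```

Because every stimulus position was presented equally often, HS is simply the base-2 logarithm of 9, about 3.17 bits, which is the maximum that could be transmitted with 9 positions.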
Appendix 3: Headphone and Tubephone Responses
APPENDIX 3
Responses of the headphones and tubephones to a click.
[Figure: responses of the headphones and tubephones to a click. One axis is labelled Amplitude (dB); the other carries tick values from 1.00 to 17226.00.]
Appendix 4: Trial Ordering
APPENDIX 4
Trial ordering - Chapter 8.
The following sequence was looped to form a continuum.
Subject   Trial 1    Trial 2    Trial 3    Trial 4    Trial 5    Trial 6
1         Az/click   Az/noise   Az/chips   El/click   El/noise   El/chips
2         El/click   El/chips   El/noise   Az/click   Az/chips   Az/noise
3         Az/noise   Az/chips   Az/click   El/noise   El/chips   El/click
4         El/noise   El/click   El/chips   Az/noise   Az/click   Az/chips
5         Az/chips   Az/click   Az/noise   El/chips   El/click   El/noise
6         El/chips   El/noise   El/click   Az/chips   Az/noise   Az/click
7         El/click   El/noise   El/chips   Az/click   Az/noise   Az/chips
8         Az/click   Az/chips   Az/noise   El/click   El/chips   El/noise
9         El/noise   El/chips   El/click   Az/noise   Az/chips   Az/click
10        Az/noise   Az/click   Az/chips   El/noise   El/click   El/chips
11        El/chips   El/click   El/noise   Az/chips   Az/click   Az/noise
12        Az/chips   Az/noise   Az/click   El/chips   El/noise   El/click
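
The balance of this ordering can be checked with a brief sketch. It reads the sequence above as twelve orderings of six trials, each trial pairing a judgement type (Az for azimuth, El for elevation) with a stimulus (click, noise or chips). The names `ORDER`, `order_for` and `position_counts` are illustrative, and treating "looped" as meaning that subject 13 restarts at ordering 1 is an assumption.

```python
from collections import Counter

# Trial orderings for subjects 1 to 12, transcribed from the sequence above.
ORDER = {
    1: ["Az/click", "Az/noise", "Az/chips", "El/click", "El/noise", "El/chips"],
    2: ["El/click", "El/chips", "El/noise", "Az/click", "Az/chips", "Az/noise"],
    3: ["Az/noise", "Az/chips", "Az/click", "El/noise", "El/chips", "El/click"],
    4: ["El/noise", "El/click", "El/chips", "Az/noise", "Az/click", "Az/chips"],
    5: ["Az/chips", "Az/click", "Az/noise", "El/chips", "El/click", "El/noise"],
    6: ["El/chips", "El/noise", "El/click", "Az/chips", "Az/noise", "Az/click"],
    7: ["El/click", "El/noise", "El/chips", "Az/click", "Az/noise", "Az/chips"],
    8: ["Az/click", "Az/chips", "Az/noise", "El/click", "El/chips", "El/noise"],
    9: ["El/noise", "El/chips", "El/click", "Az/noise", "Az/chips", "Az/click"],
    10: ["Az/noise", "Az/click", "Az/chips", "El/noise", "El/click", "El/chips"],
    11: ["El/chips", "El/click", "El/noise", "Az/chips", "Az/click", "Az/noise"],
    12: ["Az/chips", "Az/noise", "Az/click", "El/chips", "El/noise", "El/click"],
}


def order_for(subject):
    """Ordering for any subject number, looping back to ordering 1 after 12 (assumed)."""
    return ORDER[(subject - 1) % len(ORDER) + 1]


def position_counts(order):
    """For each trial position, count how often each condition occurs across subjects."""
    n_trials = len(next(iter(order.values())))
    return [Counter(seq[pos] for seq in order.values()) for pos in range(n_trials)]


if __name__ == "__main__":
    for pos, counts in enumerate(position_counts(ORDER), start=1):
        print(f"Trial {pos}: {dict(counts)}")
```

Across the twelve orderings each of the six conditions occupies each trial position exactly twice, and every subject completes all three trials of one judgement type before starting the other.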
Appendix 5A: Subject Instructions
APPENDIX 5A
Subject Instructions
You will be listening to sets of sounds (clicks) over the tubephones (the experimenter will assist you with inserting the tubephones). 6 sets will be presented in total: 3 now and 3 after a 10 minute break.
In front of you are your first 3 sets of response sheets. These are either for making "azimuth" or "elevation" judgements (the experimenter will tell you which ones you have and will demonstrate what they represent). Each set is made up of 25 click sounds which are spaced 5 seconds apart. Note that each response sheet comprises 25 individual small diagrams. Use a separate diagram for each of the 25 clicks. Make your response by placing a cross anywhere in or on the circle, that corresponds to where you think the sound is coming from. Try to judge each sound as quickly and accurately as possible. If you have difficulty, then make the best guess you can. If you do miss a sound, leave the diagram blank and move onto the next one.
Whilst you are listening it is important that you keep your head still and fixed on the cardboard spot on the wall in front of you. You can move your head to make your response, but return your head to the central position as quickly as possible. Remember that the sounds are only 5 seconds apart, so you will need to make your response fairly quickly.
When you are ready, the booth door will be closed and the sounds will begin after a few seconds. The door is not locked and you are free to leave at any time should you feel uncomfortable.
Appendix 5B: Subject Instructions
APPENDIX 5B
You will be presented with 54 clicks which are spaced at 4 second intervals. You must listen carefully and try to judge the location of the clicks as accurately as possible.
You should try to match your perception of each sound with one of the 9 target locations shown on the diagram in front of you. Then you must record that position next to each stimulus number on your response sheet (numbered 1 to 54).
The stimuli are spaced at 4 second intervals, so you will need to make your response fairly promptly and prepare for the next sound. Try to keep your head as still as possible and pointing straight ahead whilst you are listening to the sounds.
The booth door is not locked and you may leave at any time if you feel uncomfortable.
Appendix 5C: Subject Instructions
APPENDIX 5C
You will hear sets of white noise bursts (a "shhh" sound) which are 1-second in duration. You must try to judge the location of each of the sounds as accurately as you can. You should make your response by placing a cross on the diagram in front of you - a separate diagram should be used for each judgement you make.
There are two sets of judgement tasks - 'azimuth' and 'elevation'. For azimuth, there are 28 noise bursts in total and your response sheet has 28 corresponding diagrams. For elevation, there are 56 sounds in total, which again match the number of diagrams on your response sheet. Your experimenter will inform you which of these two tasks you will complete first. When the first trial is over, you will be given the response sheet for the second set of sounds.
It is imperative that whilst listening, you keep your head still and facing straight ahead. You can move to record your responses, but should return to a forward-facing position as soon as possible. The sounds are spaced at 4-second intervals, so you will need to make your responses quickly.
Try to respond to all sounds, and if you have difficulty then make the best judgement you can.
The booth door is not locked and you may leave at any time should you feel uncomfortable.
Appendix 5D.1: Subject Instructions
APPENDIX 5D.1
For the categorical response method.
You will be presented with 4 sequences each comprising 25 sounds. The 4 sequences will each have different intervals between the sounds. For some sequences the sounds will be presented very quickly, for others the interval between sounds may be longer. Whatever the spacing, it will be regular for the whole sequence.
You must try to judge the location of each sound source. You should keep your head still and pointing straight ahead whilst you listen, although you can move your head to record your response.
You should try to match, as accurately as possible, your perception of the sounds with one of the target letters that represent positions - shown on the diagram in front of you. Then you must record that position (letter) next to each stimulus number on your response sheet, which is numbered from 1 to 25.
Once you have heard 25 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the delay between sounds will be different. You will receive 4 response sheets in total.
The booth door will be shut before the experiment commences. However, it is not locked and you are free to leave at any time should you feel uncomfortable.
Appendix 5D.2: Subject Instructions
APPENDIX 5D.2
For the 'free' (non-categorical) response method.
You will be presented with 4 sequences each comprising 25 sounds. The 4 sequences will each have different intervals between the sounds. For some sequences the sounds will be presented very quickly, for others the interval between sounds may be longer. Whatever the spacing, it will be regular for the whole sequence.
You must try to judge the location of each sound source, as accurately as possible. You should keep your head still and pointing straight ahead whilst you listen, although you can move your head to record your response.
For the first stimulus you hear, put a "1" on your response diagram that matches where you think the sound came from. For the second stimulus you hear, put a "2", and for the third, a "3" etc. up to "25". If two stimuli appear to come from the same place, just put the number underneath/next to the first number, to form a diagonal (the experimenter will explain this more fully).
Once you have heard 25 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the delay between sounds will be different. You will receive 4 response sheets in total.
The booth door will be shut before the experiment commences. However, it is not locked and you are free to leave at any time should you feel uncomfortable.
Appendix 5E.1: Subject Instructions
APPENDIX 5E.1
For the 'free' response method. The red text was omitted for subjects not using the guidance ring in the booth.
Thank you for taking part in this experiment.
You will be presented with a set of 14 sounds (either clicks, white noise, or the word "chips"). You must listen carefully and try to judge the location of the sound source as accurately as possible. It may help to close your eyes whilst listening and it is imperative that you keep your head still and pointing straight ahead at the 0° mark at the onset of each sound. Using the ring surrounding you as guidance, you should write your response on the diagram in front of you.
For the first stimulus you hear, put a "1" on the diagram that matches where you think the sound came from (there is no need to put your response next to one of the markers). For the second stimulus you hear, put a "2", and for the third, a "3" etc. up to "14". If two stimuli appear to come from the same place, just put the number underneath/next to the first number, working in towards the centre, so that the location is the same, but the distance from the head gets closer (distance is not a variable in this experiment) - see diagram below.
The stimuli are spaced at 5 second intervals, so you will need to make your response fairly promptly and prepare for the next sound.
Once you have heard 14 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the stimulus sound will be different.
There will be 3 azimuth trials and 3 elevation trials. The experimenter will clarify the response procedure and trial details with you as the experiment progresses.
The booth door is not locked and you are free to leave at any time should you feel uncomfortable.
[Diagram: specimen response sheet showing a circle marked in 10° steps from 0° to 350°, with example response numbers written around it.]
Appendix 5E.2: Subject Instructions
APPENDIX 5E.2
For the categorical/guided method of response.
Thank you for taking part in this experiment.
You will be presented with a set of 14 sounds (either clicks, white noise, or the word "chips"). You must listen carefully and try to judge the location of the sound source as accurately as possible. It may help to close your eyes whilst listening and it is imperative that you keep your head still and pointing straight ahead at the onset of each sound. You should write your response on the sheet in front of you.
You should try to match, as accurately as possible, your perception of the sounds with one of the target positions shown on the diagram in front of you. Then you must record that position next to each stimulus number on your response sheet (1 to 14) (see diagram below for a specimen). The stimuli are spaced at 5 second intervals, so you will need to make your response fairly promptly and prepare for the next sound.
Once you have heard 14 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the stimulus sound will be different.
There will be 3 azimuth trials and 3 elevation trials. The experimenter will clarify the response procedure and trial details with you as the experiment progresses.
The booth door is not locked and you are free to leave at any time should you feel uncomfortable.
Stimulus Response
1
2
3
4
5
6
7
8
9
10
11
12
13
14
Appendix 5F: Subject Instructions
APPENDIX 5F
Thank you for taking part in this experiment.
You will be listening 'live' to a laboratory setting and your task is to judge (as accurately
as you can) the location of a number of specified sounds within that setting. The list
below shows the order in which those sounds are played. For each stimulus you should
write down where you perceive the sound source to be located (by using the stimulus
number - 1 to 6). The sequence will be played twice and you will be told when to listen
for the second set. Two response sheets are provided to record your answers.
The stimuli will occur at 15 second intervals which should leave you plenty of time to
respond to each sound. It is important that you try to keep your head as still as
possible and pointing straight ahead whilst you listen. It may also help to close your
eyes whilst listening.
It is important to note that whilst the experiment is taking place, you are likely to hear a
number of other sounds. You should not make a note of these. The 'other' sounds may
include typing, phone rings, printer, door opening and talking. Try to concentrate on
and listen specifically for the sounds you must locate.
After the experiment you will be fully debriefed.
The booth door is not locked and you are free to leave at any time should you feel
uncomfortable.
You will now be played a tape of each of the sounds to familiarise you with them. After
this, the experimenter will come in to see if you have any queries. The experiment will
then commence.
P.T.O.
Stimulus sounds to listen for:
Stimulus Number   Sound
1                 Metronome clicks (4 in total)
2                 Hand clap ([illegible] in quick succession)
3                 Xylophone ([illegible] strikes)
4                 Paper tearing
5                 Male voice saying the word "chips"
6                 Bunch of keys clinking
Appendix 5G: Subject Instructions
APPENDIX 5G
Thank you for taking part in this experiment.
You will be presented with 6 sets of 14 sounds (white noise - a "shhhhhhh" sound). You must listen carefully and try to judge the location of the sound source as accurately as possible.
You should write your response on the sheet in front of you.
For the first stimulus you hear, put a "1" on the diagram that matches where you think the sound came from. For the second stimulus you hear, put a "2" and for the third, a "3" and so on, up to "14". If two sounds appear to come from the same place, just place the number underneath/next to the first number, working in towards the centre in a diagonal (so that the position is the same, but the distance from the head gets closer; distance is not a variable in this experiment).
You will be given a new response sheet for each of the 4 trials.
The sounds are spaced 6 seconds apart, so you will need to make your response promptly and then re-position your head to the central position (yellow dot), fixing your eyes on the cross, and prepare for the next sound.
For two trials you will be asked to keep your head as still as possible whilst listening to the sounds. For two you should move your head to the right as soon as the sound begins (the experimenter will demonstrate this to you). For the other two you can move your head freely, to do whatever you feel helps you to judge the sound more accurately.
Reminders of the procedure will be given at the beginning of each trial to guide you.
Due to equipment faults, the first sound on every trial you do must be ignored. Start recording number 1 from the second sound you hear.