
This item was submitted to Loughborough's Research Repository by the author. Items in Figshare are protected by copyright, with all rights reserved, unless otherwise indicated.

Absolute auditory object localization

PLEASE CITE THE PUBLISHED VERSION

PUBLISHER

© Emily Shotter

LICENCE

CC BY-NC-ND 4.0

REPOSITORY RECORD

Shotter, Emily. 2019. “Absolute Auditory Object Localization”. figshare. https://hdl.handle.net/2134/10833.

https://lboro.figshare.com/

This item was submitted to Loughborough University as a PhD thesis by the author and is made available in the Institutional Repository (https://dspace.lboro.ac.uk/) under the following Creative Commons Licence conditions.

For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/


Absolute Auditory Object Localization

by

Emily Shotter

A Doctoral Thesis

Submitted in partial fulfilment of the requirements for the award of

Doctor of Philosophy

of Loughborough University

June 1997

© by Emily Shotter

Abstract

This thesis concerns the potential use of auditory virtual reality (AVR) in safety-critical situations. Localization accuracy is essential in many VR situations, such as simulated cockpits, where vision is fully occupied and targets must be signified acoustically. However, the errors reported for localizing 3D sounds vary considerably in the literature, and some studies (e.g. Wightman & Kistler, 1989; Wenzel et al., 1993) report fairly large errors. This thesis evaluates the use of acoustic cues to indicate the location of targets.

A Knowles Electronic Manikin for Acoustic Research (KEMAR) was used to examine the effects of individualized pinnae on localization accuracy. The results showed that listening through one's own pinnae rather than foreign pinnae provides little or no benefit. More surprisingly, substantial errors were observed in this study. This initial result drove a fundamental investigation into the source of the large angle errors.

The method of eliciting subject responses was then investigated. The findings established response method as an important methodological factor in localization experiments because of its significant effect on the results: error values can be halved when using a categorical method rather than an unguided (non-categorical) method, possibly because the categorical method constrains the subjects' response options. A further possible constraint on subject responses is the effect of memory in absolute judgement tasks. If the memory of one sound impinges on the judgement of subsequent sounds, the subject's judgement is constrained and the measurement of error may be contaminated. This effect was studied by introducing variable interstimulus delays that should affect memory to different extents. No obvious differences in accuracy were found, which rules out the interstimulus interval as a cause of the variability in reported angle errors.

Stimulus types were varied in an effort to maximise acuity. Broadband sounds are purported to give the smallest errors (e.g. Stevens & Newman, 1936; Sandel et al., 1955), and this investigation offered a unique comparison of long- and short-duration broadband and complex sounds. However, consistently high angle errors forced the inclusion of non-acoustic cues, such as vision and head movements, which reduced the error to between 0° and 7°.


Given the importance of vision demonstrated by this work, the implication for VR is that it is not advisable to implement an auditory cueing system that may conflict with vision or fail to be guided by it. Where high levels of accuracy are required, as is paramount in safety-critical situations, auditory localization is not sufficient as the sole cue to target location.

Scientific conclusion: acoustic cues alone (independent of context) cannot support accurate auditory localization.

Applications conclusion: it is not advisable to implement an auditory cueing system that is not guided by vision.


Acknowledgements

My sincere thanks go to Prof. Ray Meddis for all of his advice and support and for being an excellent supervisor.

I wish to express my gratitude to everybody in the Speech & Hearing Laboratory in Loughborough and the Hearing Research Laboratory in Essex: to Stuart Hunter for his technical support; to Enrique Lopez-Poveda for being a true companion, both academically and socially; to Lowel O'Mard for his expertise and help with all computer-based problems; and to Roel, for his company.

I would like to thank everyone in the Department of Human Sciences at Loughborough for all of the academic and administrative support I have received, especially during my final year away. Particular thanks go to the research students, not only for their high expectations of me, which provided motivation and encouragement, but also for their social succour.

My gratitude goes to all in the Department of Psychology at Essex University for making me feel so welcome and for making available all the facilities and technical support I needed.

I would like to thank my family for always believing in me and for giving me the freedom and opportunity to find my own way in my own time. Thank you for your enthusiasm and optimism. And finally, to Toby, for his unswerving belief, support, love and companionship.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Abbreviations and Acronyms
List of Figures
List of Tables

CHAPTER 1 General Introduction
1.1 Motivation
1.2 Objectives
1.3 Original contributions
1.4 Overview of the thesis

CHAPTER 2 Background and Literature Review
2.1 Introduction to localization
2.2 Pinna effects
2.3 Head movements
2.4 Vision

CHAPTER 3 Methodologies
3.1 Introduction
3.2 Headphones and tubephones
3.3 Pinna moulding
3.4 KEMAR recording procedures
3.5 Front-back correction

CHAPTER 4 The Role of the Pinna in Sound Localization
Abstract
Introduction
Method
Results
Discussion

CHAPTER 5 Localization Judgements in the Azimuthal Plane
Abstract
Introduction
Method
Results
Discussion

CHAPTER 6 Methodologies: Site of Recording, Playback Method and Pinna Effects
Abstract
Introduction
Method
Results
Discussion

CHAPTER 7 The Effect of Interstimulus Delay and Response Method on Localization Accuracy
Abstract
Introduction
Method
Results
Discussion

CHAPTER 8 The Effect of Stimulus Type and Response Method on Judgement Accuracy
Abstract
Introduction
Method
Results
Discussion

CHAPTER 9 'Live' Relay using the KEMAR Manikin
Abstract
Introduction
Method
Results
Discussion

CHAPTER 10 Vision and Head Movements in Localization
Abstract
Introduction
EXPERIMENT 1
Method
Results
Discussion
EXPERIMENT 2
Method
Results
Discussion

CHAPTER 11 Head Movements using the Head Tracker
Abstract
Introduction
Method
Results
Discussion

CHAPTER 12 General Discussion and Conclusions
12.1 Summary
12.2 Discussion and conclusions
12.2.1 Individualized pinnae
12.2.2 Pinnae/no pinnae
12.2.3 Stimulus type
12.2.4 Response method
12.2.5 Visual stimuli
12.2.6 Live/recorded stimuli
12.2.7 Head movements
12.3 Implications for VR
12.4 Proposals for future work

References

APPENDIX 1 Pinnae Photographs
APPENDIX 2 Calculation of transmitted information
APPENDIX 3 Responses of the headphones and tubephones to a click
APPENDIX 4 Trial ordering: Chapter 8
APPENDIX 5 Subject Instructions
APPENDIX 5A (Chapter 4)
APPENDIX 5B (Chapter 5)
APPENDIX 5C (Chapter 6)
APPENDIX 5D.1 (Chapter 7)
APPENDIX 5D.2 (Chapter 7)
APPENDIX 5E.1 (Chapter 8)
APPENDIX 5E.2 (Chapter 8)
APPENDIX 5F (Chapter 9)
APPENDIX 5G (Chapter 11)


List of Abbreviations and Acronyms

ANOVA: Analysis of Variance
AVR: Auditory Virtual Reality
DAT: Digital Audio Tape
dB: decibel
dB SPL: dB re 20 × 10⁻⁶ Pa
df: degrees of freedom (from analysis of variance)
ft: feet
g: grams
HRTF: Head-Related Transfer Function
Hz: hertz (cycles per second)
ILD: Interaural Level Difference
ISI: Interstimulus Interval
ITD: Interaural Timing Difference
KEMAR: Knowles Electronic Manikin for Acoustic Research
kHz: kilohertz
LVP: Lateral Vertical Plane
m: metres
MVP: Median Vertical Plane
Pa: pascals
secs: seconds
SPL: Sound Pressure Level
VR: Virtual Reality


List of Figures

CHAPTER 3

Figure 3.1: Diagram (not to scale) of the manikin at the centre of a wooden hoop (3 m in diameter) used to support the speakers in the horizontal plane. All speaker positions were fully adjustable. The hoop was supported on wooden struts, which slotted into heavy metal base units to stabilise and secure the construction.

Figure 3.2: Diagram (not to scale) of the speaker set-up for median plane (elevation) source locations. The arrow shows the direction of movement around the manikin. The range of possible speaker positions was -50° to +320° elevation (where 0° is straight ahead at ear level and 180° is directly behind).

Figure 3.3: Front-back correction of a sound source ("A") judged to be at position "B". The judgement is first shifted to the opposite hemisphere ("C"), then the angle error is calculated from this shifted position ("D").

CHAPTER 4

Figure 4.1a: Response diagram given to subjects for azimuth judgements (actual size). The head and horizontal plane are viewed from above. A separate diagram was used for each response, and subjects were free to put the cross anywhere on or within the circle. Distance was not a variable and was ignored in the results.

Figure 4.1b: Response diagram given to subjects for elevation judgements. The head is shown in profile, facing the median vertical plane. One diagram was used for each judgement.

Figure 4.2: Mean angle error values for azimuth judgements for all subjects combined. Data are shown both uncorrected and corrected for front-back azimuth errors. Statistically significant differences were found between the uncorrected and front-back corrected data, although no differences were found between the pinna conditions (ANOVA).

Figure 4.3: Mean angle error values for elevation judgements. The results are shown both uncorrected and corrected for front-back azimuth errors. No statistically significant differences were found either between uncorrected and front-back corrected data or between the different pinna conditions.

CHAPTER 5

Figure 5.1: Response diagram (actual size) given to subjects. For each stimulus sound heard, subjects were forced to place their judgement at one of the speaker locations (1-9). Subjects recorded their actual responses on a separate sheet.

Figure 5.2: Matrix showing the total transmission scores for all 16 subjects in the binaural headphone condition.

Figure 5.3: Information matrix showing the mean angle error values for individual source positions. The frequency of response and the summed error in degrees are given for each stimulus. (Note that the position numbers (1-9) listed along the top and down the side of the matrix are rotated through +90° to obtain only positive angle error values.) The mean error for each source position is given in the extreme right-hand column, with the total mean angle error shown below.

CHAPTER 6

Figure 6.1a: Response diagram used for azimuth trials. Subjects were instructed to mark a cross at the point of perceived sound origin.

Figure 6.1b: Response diagram given to subjects for elevation trials. Sound source location was indicated by placing a cross anywhere on the perimeter of the circle (distance was not a factor in this experiment).

Figure 6.2: The effect of the internal and external recording positions with headphone and tubephone playback on front-back corrected angle errors. ±2 standard errors are shown in each case.

Figure 6.3: Spectra of the original (internal and external) stimuli, with comparisons of playback through headphones and tubephones.

Figure 6.4: Diagrams of the original recording and playback positions. Internal original shows the microphone at the eardrum location of the manikin. External original is the signal received by the manikin with the microphone at the meatus entrance. The playback positions, to human subjects, show the tubephones at the eardrum and the headphones close to the meatus entrance.

CHAPTER 7

Figure 7.1a: Blank response diagram given to subjects for the non-categorical response condition. Subjects marked a cross on a separate diagram for each sound heard. The diagram is actual size.

Figure 7.1b: Guidance diagram (half actual size) provided to subjects in the categorical response condition. Subjects used the diagram in a forced-choice paradigm.

Figure 7.1c: Response sheet used in conjunction with the guidance diagram (Figure 7.1b). Subjects indicated a response letter from the guidance diagram next to each stimulus number. A new response sheet was provided for each interstimulus delay sequence.

Figure 7.2: Mean angle errors of the categorical and non-categorical response methods, broken down by interstimulus delay time. Statistically significant differences (p ≤ 0.01, ANOVA) exist between the two response methods for all interstimulus delay times, as shown by the ±2 standard error bars. There are no significant differences between the different interstimulus intervals within each response condition.

Figure 7.3: Errors by target angle for each of the interstimulus delay times for the categorical response condition. No statistically significant differences in judgement accuracy were found between target angles within each interstimulus interval condition (ANOVA). Random response values (0° to 90° range) are given for the categorical and non-categorical response conditions to illustrate chance levels.

Figure 7.4: Errors by target angle for all interstimulus delay times for the non-categorical response condition. There were no statistically significant differences in response accuracy between target angles within each interstimulus interval condition (ANOVA). Random response values for a full 360° range of possible responses are included to show chance levels.

CHAPTER 8

Figure 8.1a: Blank response diagram for azimuth, with 10° markings around the circumference, provided to subjects in the non-categorical condition, either with or without the judgement aid (marker strip).

Figure 8.1b: Blank response diagram for elevation, with 10° markings. This was the response sheet provided for the non-categorical response condition.

Figure 8.2a: Diagram of stimulus locations for azimuth. Each subject was provided with this guidance diagram for reference throughout the categorical condition of the experiment.

Figure 8.2b: Diagram showing the elevation stimulus locations. This diagram was provided throughout the categorical response condition for reference.

Figure 8.2c: Response table used in conjunction with the categorical response condition. The same sheet was provided for all azimuth and elevation trials (6 in total per subject).

CHAPTER 9

Figure 9.1: Response diagram provided to subjects (one for each pinna condition). The square represents the environment/room in which the stimuli are played, viewed from above. The head shows the manikin's position at the centre of the room. The dimensions are not to scale and no furniture or fittings are shown.

Figure 9.2: Mean angle errors (front-back corrected) for the two presentation conditions, live and recorded, for pinna and no pinna. ±2 standard error bars are shown.

Figure 9.3: Percentage of front-back errors for pinna and no pinna for the live and recorded presentations. ±2 standard error bars are shown. Although the standard errors are large, a statistically significant difference (p ≤ 0.05, related t-test) was found between pinna and no pinna for the live condition. There is also a small, statistically non-significant interaction between presentation method and pinna condition (ANOVA, F = 3.32, df = 34).

CHAPTER 10

Figure 10.1.1: Movement of the head when either restrained by a head clamp or held still but unrestrained. (NB: The y-axis does not represent absolute angles.) The measurements for the azimuth, elevation and roll dimensions were taken simultaneously by the Head Tracker. ±2 standard error bars are included. No statistically significant differences (at p ≤ 0.05, related t-test) were found between the different head restraint conditions.

Figure 10.2.1: Response diagram given to subjects in all conditions. Beside each stimulus number a response letter (A-G) had to be recorded, according to the perceived location of the sound source.

Figure 10.2.2a: Diagram given to subjects showing the speaker locations in the horizontal plane. Speakers were spaced at 30° intervals. Subjects heard each sound source and were required to choose one of the letters, which represented actual target locations.

Figure 10.2.2b: Diagram given to subjects representing 20° speaker spacing in the horizontal plane.

Figures 10.2.2a-d: Confusion matrices showing the pattern of responses for the free-field condition for (a) 30° speaker spacing with head fixed, (b) 30° speaker spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20° speaker spacing with head free.

Figures 10.2.2e-h: Confusion matrices for the ring playback condition for (a) 30° speaker spacing with head fixed, (b) 30° speaker spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20° speaker spacing with head free.

Figures 10.2.2i-l: Confusion matrices showing the response pattern for the booth playback condition for (a) 30° speaker spacing with head fixed, (b) 30° speaker spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20° speaker spacing with head free.

CHAPTER 11

Figure 11.1: Response diagram (actual size) given to subjects. Subjects marked the numbers 1 to 14 (the total number of stimuli per trial) on the diagram. A new response sheet was provided for each of the 6 trials.

Figure 11.2: Mean angle errors with the Head Tracker switched on and off for the three head motion conditions ('still' but without restraint, a controlled 'movement to the right' of 45° and 'free' movement). There was a statistically significant improvement for the 'move freely' condition with the Head Tracker On over all conditions with the Head Tracker Off, and also over the 'head move right' condition with the Head Tracker On (ANOVA, F = 6.99, df = 2).
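The front-back correction of Figure 3.3 and the chance levels of Figures 7.3 and 7.4 both rest on a wrapped angle-error measure. The Python sketch below illustrates them under stated assumptions; it is not the procedure described in Chapter 3. It assumes azimuths in degrees with 0° straight ahead, 90° to the right and 180° directly behind; it treats a response as a front-back confusion whenever it lies in the opposite front/back hemisphere from the target, and mirrors it about the interaural axis; and it takes the chance level as the mean error when a response is drawn uniformly from a set of options. All function names are hypothetical.

```python
import math
from typing import Sequence


def angular_error(target_deg: float, response_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = (response_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)


def hemisphere(azimuth_deg: float) -> int:
    """+1 for the front hemisphere, -1 for the back, 0 on the interaural axis."""
    c = math.cos(math.radians(azimuth_deg))
    if abs(c) < 1e-9:  # 90 and 270 degrees lie on the interaural axis
        return 0
    return 1 if c > 0 else -1


def mirror_front_back(azimuth_deg: float) -> float:
    """Reflect an azimuth about the interaural (left-right) axis."""
    return (180.0 - azimuth_deg) % 360.0


def front_back_corrected_error(target_deg: float, response_deg: float) -> float:
    """Angle error after the assumed front-back correction rule."""
    if hemisphere(target_deg) * hemisphere(response_deg) < 0:
        # Opposite hemispheres: shift the judgement ("B" -> "C" in Figure 3.3).
        response_deg = mirror_front_back(response_deg)
    # Error measured from the (possibly) shifted position ("D").
    return angular_error(target_deg, response_deg)


def chance_error(target_deg: float, options_deg: Sequence[float]) -> float:
    """Mean angle error if the response is drawn uniformly from options_deg."""
    if len(options_deg) == 0:
        raise ValueError("at least one response option is required")
    return sum(angular_error(target_deg, r) for r in options_deg) / len(options_deg)


if __name__ == "__main__":
    print(angular_error(30, 160))               # 130.0 (uncorrected)
    print(front_back_corrected_error(30, 160))  # 10.0
    print(chance_error(0, list(range(360))))    # 90.0 (1-degree grid, full circle)
```

Under these assumptions, a target at 30° judged at 160° scores 130° uncorrected but 10° after correction. A uniformly random response on a 1° grid around the full circle gives a chance error of 90° for any whole-degree target. A categorical condition would instead pass the short list of labelled positions as `options_deg`.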


List of Tables

CHAPTER 4

Table 4.1a: Analysis of variance for azimuth. There are no statistically significant differences between the three pinna conditions: own, nonindividualized and individualized. Data are corrected for front-back errors.

Table 4.1b: Analysis of variance for elevation (front-back corrected data). No statistically significant differences are found between the different pinna conditions: own, nonindividualized and individualized.

CHAPTER 5

Table 5.1: Individual information transmission scores (in 'bits') for all 16 subjects.

CHAPTER 6

Table 6.1a: Mean angle error values for headphone presentation with pinnae/no pinnae and internal/external microphone placement for both azimuth and elevation judgements. Statistically significant results (ANOVA, F = 3.96, df = 27) are in red.

Table 6.1b: Mean error values for tubephone presentation of stimuli. Results are shown for pinnae/no pinnae and internal/external microphone positions for azimuth and elevation judgements. Statistically significant differences (ANOVA, F = 3.96, df = 27) are given in red.

Table 6.2a: Analysis of variance table for azimuth with front-back confusions corrected. "p/np" represents the 'pinna' or 'no pinna' conditions (with or without), "hp/tp" refers to headphone or tubephone playback method and "i/e" represents internal or external microphone placement. Statistically significant effects are shown in bold type.

Table 6.2b: Analysis of variance table for elevation, front-back corrected data. "p/np" represents pinna condition (with or without), "hp/tp" refers to headphone or tubephone playback method and "i/e" represents internal or external microphone position. No statistically significant effects were found.

CHAPTER 7

Table 7.1: Starting points for all ten subjects, for both sequence (different interstimulus delays) and stimulus position within the sequence. Note that the ordering for 'sequence' is a quasi-random selection taken from the full set of 4! (4-factorial, i.e. 24) permutations. Each interstimulus delay is used at least once as a starting sequence, and two randomly picked sequence orders were then added to make 6 in total, the number required to give the full range of stimulus start points.

Table 7.1a: Analysis of variance table for all interstimulus intervals combined, showing the differences between response methods. A statistically significant improvement is found for the categorical response method (factor "category").

Table 7.1b: Analysis of variance table for the 2-second interstimulus interval condition. A statistically significant difference is found for response method (category) but not between the different target locations.

Table 7.1c: Analysis of variance table for the 5-second interstimulus interval condition. A statistically significant improvement for the categorical response method was noted.

Table 7.1d: Analysis of variance table for the 8-second interstimulus interval condition. A statistically significant difference was found for response method only.

Table 7.1e: Analysis of variance table for the 12-second interstimulus interval condition. Statistically significant improvements were found for the categorical response method.

CHAPTER 8

Table 8.1a: Summary of overall mean angle errors for front-back corrected azimuth results. There are 9 different subjects in each condition: non-categorical with a reference marker strip in the sound booth (azimuth only), non-categorical with no reference marker strip, and categorical.

Table 8.1b: Overall mean angle errors for elevation trials. The subjects in each condition are the same as those in the azimuth trials. Responses are corrected for front-back errors. The unusually large angle errors are little better than chance.

CHAPTER 9

Table 9.1: Analysis of variance for live and recorded presentations for pinna and no pinna. No statistically significant effects were found, although the interaction between pinna effects and presentation method only narrowly missed significance.

CHAPTER 10

Table 10.2.1a: Analysis of variance for all stimulus types combined, for the different speaker spacings ("spacing": 20° and 30°), playback locations ("place": free-field, ring and booth) and head restraint conditions ("movement": fixed (clamped) or free). Statistically significant results are shown in bold type.

Table 10.2.1b: Analysis of variance for the speech stimulus "Chips" for the two speaker spacings, three playback locations and two head restraint conditions. Statistically significant results are shown in bold type.

Table 10.2.1c: Analysis of variance for clicks, for the two speaker spacing conditions, the three playback locations and the two head restraint conditions. Statistically significant results are shown in bold type.

Table 10.2.1d: Analysis of variance for the noise stimulus for the different speaker spacings, playback locations and head restraint conditions. Statistically significant results are shown in bold.

Tables 10.2.2a-c: Mean angle error values for the free-field condition are shown in 10.2.2a. Averages are broken down by speaker spacing (30° and 20°), head restraint (either fixed in a clamp or free to move) and stimulus type ("Chips", clicks and noise). Identical breakdowns are given for subjects listening to pre-recorded stimuli in the original recording set-up (i.e. with a visual correlate) in 10.2.2b, and for pre-recorded stimuli played back in the booth in 10.2.2c.

Tables 10.2.3a-c: Information analysis for the free-field (a), ring playback (b) and booth playback (c) conditions. Breakdowns are given in terms of speaker spacing and head motion conditions. Transmitted information is in bits, and the corresponding number of reliably identified positions, out of a total of 7, is also shown.

CHAPTER 11

Table 11.1: Analysis of variance table for mean angle error values. Head Tracker status represents the Head Tracker On or Off. Motion refers to the three head movement conditions: head still, controlled head movement to the right and head allowed to move freely.
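Tables 10.2.3a-c, and the transmission scores of Chapter 5, express performance as transmitted information in bits together with a number of reliably identified positions. A common convention, assumed here rather than taken from Appendix 2, is that T bits correspond to 2^T perfectly identified categories. Under this convention, T is estimated as the mutual information between stimulus and response in a confusion matrix. The Python sketch below uses the plain maximum-likelihood estimate with no small-sample bias correction. The function names are hypothetical, and the thesis's own calculation may differ.

```python
import math
from typing import Sequence


def transmitted_information(counts: Sequence[Sequence[int]]) -> float:
    """Transmitted information T (bits) from a stimulus-by-response count matrix.

    Rows are stimulus positions and columns are response positions. Uses the
    plain maximum-likelihood estimate T = sum_ij p_ij * log2(p_ij / (p_i * p_j)),
    with no small-sample bias correction (an assumption; see lead-in).
    """
    n = sum(sum(row) for row in counts)
    if n == 0:
        raise ValueError("the confusion matrix contains no responses")
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    t = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:  # empty cells contribute nothing (0 * log 0 is taken as 0)
                # p_ij / (p_i * p_j) simplifies to c * n / (row_i * col_j)
                t += (c / n) * math.log2(c * n / (row_totals[i] * col_totals[j]))
    return t


def reliably_identified_positions(bits: float) -> float:
    """Equivalent number of perfectly identified categories, 2**T (assumed convention)."""
    return 2.0 ** bits


if __name__ == "__main__":
    # Hypothetical data: perfect identification of 7 positions, 10 presentations each.
    perfect = [[10 if i == j else 0 for j in range(7)] for i in range(7)]
    bits = transmitted_information(perfect)
    print(round(bits, 3), round(reliably_identified_positions(bits), 3))  # 2.807 7.0
```

For a perfect 7 × 7 diagonal matrix, this gives log2 7 ≈ 2.807 bits, so all 7 positions are reliably identified. A matrix in which every response is equally likely for every stimulus gives 0 bits, or one "position".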


Chapter 1: General Introduction

CHAPTER 1

General Introduction

1.1 MOTIVATION

Auditory virtual reality (AVR) is an integral part of the new simulated cockpit environments. However, within these virtual reality (VR) set-ups, the localization of auditory stimuli has been poor. This is a problem that cannot be afforded in safety-critical situations where auditory cues convey critical information, such as enemy target location or instrument warnings. Auditory information is particularly important during high-speed flight, since vision may be almost fully occupied, making sound a critical cue for directing attention. Yet, given the problems users have had in localizing the signals, there is a real danger that a signal will be misjudged and will confuse the user rather than aid them. There is clearly a need for a fundamental evaluation of the use of acoustic cues to indicate the location of targets.

The basic aim is to establish, through localization studies, the accuracy that can be expected in VR systems. Auditory localization is the process of determining the position of sound sources in our external environment using auditory cues. It occurs daily as a matter of course, but is usually linked to visual processes. Thus when we hear a bird singing in a tree, we will often direct our gaze in the rough direction of the song (a direction based very much on expectations and experience as well as on our ability to localize the sound) and use our eyes, not our ears, to pinpoint the source precisely.

Although a significant amount of localization research exists, the errors reported vary considerably between studies and there is little consistency in the literature. Initially there is a need to resolve some basic issues, such as the large mean angle errors repeatedly obtained by some researchers and the substantial rate of front-back confusions. These fundamental problems must be clarified before addressing the complex applications of 3-dimensional sound simulation.

To study auditory localization as a separate phenomenon, hearing must be separated from vision in an experimental situation. This is typically done either by use of a blindfold, by presenting sounds that are simulated directly through headphones, or by using sounds that have been pre-recorded and which are played back in an environment that does not replicate the experimental set-up.

To overcome the problem of misjudgements and front-back confusions, a recently developed piece of equipment called a 'head tracking device' can be incorporated into virtual displays of this kind. A Head Tracker monitors movements of the head and adjusts the auditory cue that is sent to the listener accordingly. The listener then perceives the location of the signal to change in accordance with their head movement, as it would for a real source. So far no experiments have investigated the effectiveness of head tracking devices. Such a study would provide valuable comparisons between the 'out of the head' sensations produced by 3D audio and by head-tracked 3D audio, highlighting which problems (if any) are solved by the introduction of head movements. These comparisons will be of significance not only to VR set-ups, but to the auditory field as a whole.
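The head-tracking principle described above can be sketched as a single coordinate update. This is a minimal illustration, assuming the tracker reports head yaw only and that azimuth is measured in degrees clockwise from straight ahead; the function names are hypothetical and do not describe any particular tracker's interface. With the tracker on, the synthesised direction is the source's room azimuth minus the head yaw, so the source stays fixed in the room as the head turns; with the tracker off, the cue is unchanged and the source turns with the head.

```python
def wrap_azimuth(az_deg):
    """Wrap an azimuth into the range (-180, 180] degrees."""
    wrapped = az_deg % 360.0
    return wrapped - 360.0 if wrapped > 180.0 else wrapped


def rendered_azimuth(source_az_deg, head_yaw_deg, tracker_on=True):
    """Azimuth, relative to the listener's nose, at which the cue is synthesised.

    source_az_deg: fixed room position of the virtual source (hypothetical convention:
                   degrees clockwise from the listener's initial facing direction).
    head_yaw_deg:  current head rotation reported by the tracker (positive = turned right).
    """
    if not tracker_on:
        # Cue ignores head movement, so the source turns with the listener.
        return wrap_azimuth(source_az_deg)
    # Cue compensates for head movement, so the source stays fixed in the room.
    return wrap_azimuth(source_az_deg - head_yaw_deg)
```

For a source straight ahead (0°) and a listener who turns 90° to the right, `rendered_azimuth(0, 90)` places the sound at -90° (on the left, as a real source would be), whereas `rendered_azimuth(0, 90, tracker_on=False)` leaves it straight ahead of the nose.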

1.2 OBJECTIVES

The primary objective is to evaluate the role of spectral cues for localizing sounds.

Attempts will also be made to identify and investigate the fundamental variables

underlying the localization process. The literature does contain a range of localization

research, although within it there are many contradictions, omissions and questionable

methodology in terms of its application to everyday auditory events. This research

will attempt to rectify and resolve many of these problems, some of which are

outlined below, in an attempt to demonstrate what is realistic and achievable in

'virtual' auditory simulations.

One method of generating VR sounds that produce a 3D localized image is to record them with a KEMAR (Knowles Electronics Manikin for Acoustic Research). Another method uses so-called 'head related transfer functions' (HRTF's), which are discussed later.

However, manikin recordings are a more direct method of reproducing the sound and are therefore more likely to retain what may be vital signal content. They also provide an accurate means of studying the usefulness of the spectral content of signals to localization, by retaining realistic spectral profiles in both ears (Durlach et al, 1992; Hartmann & Wittenberg, 1996). The KEMAR is an artificial figure of a head and torso, with removable pinnae and artificial ear canals, which are simulated using Zwislocki couplers. These are metal ear canal extensions that fit inside the ear recesses of the manikin and have fittings for microphones to be attached at the eardrum position. These microphones record stimuli played from speakers placed at various locations around a room.

Using this technique, a number of methodological issues will be investigated that may

have a substantial effect on localization accuracy.

• A fundamental aspect of this research is to reproduce everyday listening conditions where possible. Thus, all experiments will be conducted in a normally reverberant large room. A highly reverberant environment would make localization very difficult due to confounding incident and reflected waves, and an anechoic environment would cut out valuable (and typical) reflection and reverberation cues.

• For each experiment a fairly large sample of untrained subjects will be used in an attempt to produce results representative of the normal population. This contrasts with other auditory experiments, which have used a small number of subjects.

• Pinna-based spectral cues have been demonstrated to play an important role in elevation discrimination (e.g. Gardner & Gardner, 1973; Lopez-Poveda, 1996), and some researchers also deem them important for azimuth judgements (Freedman & Fisher, 1968). However, it is still unclear whether the pinnae are important for azimuth discrimination as well as elevation. Also, the fundamental question of whether we need our own pinnae to localize sound accurately remains unaddressed. The primary aim is to assess a listener's ability to localize with their own pinnae, with another person's pinnae or with no pinnae at all.

• Another major objective is to resolve the problems surrounding response method. The means of eliciting subject responses varies a great deal in localization studies, as does the methodology in general. Siegel & Siegel (1972) highlight two basic ways of measuring accuracy: allowing subjects either to respond with their own 'free' judgement, or to choose from a number of given categories. But within these types, methodologies vary considerably, which prevents direct comparisons and thus an understanding of the effect of response method. Because using categories involves a great deal of constraint and guidance, the method of response has a potentially large influence on judgement accuracy and requires a controlled comparison. Tied in with response method are the effects of different speaker spacings and of the total angular arc containing the sound sources. Shelton & Searle's (1978) results demonstrated these factors to be important, but theirs is the only study to do so. Therefore the effect of different speaker spacings will also be examined.

• The different stimulus types used in localization studies may have a large effect.

But from much of the literature, comparing different stimulus types directly is

difficult since it is impossible to disentangle their effect from other

methodological features. A controlled comparison of stimulus types is essential,

therefore, in attempting to provide the most easily identified signal in terms of its

position.

• The effect of head movements will be investigated using another method of sound simulation. The new method uses HRTF's, which convolve audio stimuli with pairs of filters that include ITD's, ILD's, pinna and ear canal effects. These effects are measured as the sound enters the ear from a variety of locations. Responses at resolutions finer than those actually measured are estimated using a linear interpolation technique. Thus the convolvotron (the convolving equipment) produces the same 'surround sound' effect without the need for making recordings, and variables such as stimulus duration and target locations can be more readily manipulated. However, this method also adds two computational steps that are unnecessary for manikin recordings: deriving the HRTF and generating new stimuli using these functions. Nevertheless, using stimuli generated in this way allows a Head Tracker to be incorporated into the convolving equipment to take account of head movements. This constitutes the final stage of the research, and comparing the HRTF data with the data from manikin recordings will be particularly valuable.

• Finally, a comparison of visually aided and visually unaided localization will be carried out. Throughout, the project aims to assess localization acuity without the aid of vision, that is, whether an acoustic stimulus provides powerful enough cues to location to be used on its own. Yet if the apparent accuracy of everyday abilities cannot be matched, a visual element may need to be introduced. It may well be that attempting to study auditory localization alone is akin to separating taste and smell, in which case its function may be fundamentally reduced if used as a sole cue. Attempts to measure the effect of vision will be conducted in a free-field setting, with and without a visual link to the sources of sound.
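The two HRTF steps named in the objectives above, interpolating between measured positions and convolving a stimulus with a left/right filter pair, can be sketched as follows. This is a minimal illustration assuming head-related impulse responses (HRIRs, the time-domain form of HRTF's) stored as a NumPy array indexed by measured azimuth; the array layout, function names and toy data are hypothetical and are not the convolvotron's actual implementation. Interpolating impulse responses sample by sample, as here, is the simplest form of linear interpolation.

```python
import numpy as np


def interpolate_hrir(azimuth, measured_az, hrirs):
    """Linearly interpolate a left/right HRIR pair between the two nearest
    measured azimuths.

    measured_az: ascending azimuths (degrees) at which HRIRs were measured.
    hrirs: array of shape (n_positions, 2, n_taps); index 0 = left ear, 1 = right ear.
    Azimuths outside the measured range are clamped to the nearest end
    (no wrap-around at 360 degrees in this sketch).
    """
    measured_az = np.asarray(measured_az, dtype=float)
    i = int(np.searchsorted(measured_az, azimuth))
    if i == 0:
        return hrirs[0]
    if i >= len(measured_az):
        return hrirs[-1]
    a0, a1 = measured_az[i - 1], measured_az[i]
    w = (azimuth - a0) / (a1 - a0)  # 0 at a0, 1 at a1
    return (1.0 - w) * hrirs[i - 1] + w * hrirs[i]


def render_binaural(mono, hrir_pair):
    """Convolve a mono stimulus with the left and right filters."""
    left = np.convolve(mono, hrir_pair[0])
    right = np.convolve(mono, hrir_pair[1])
    return np.stack([left, right])  # shape (2, len(mono) + n_taps - 1)


# Toy HRIRs (hypothetical): a unit impulse at tap 0 for 0 degrees, at tap 2 for 30 degrees.
measured = [0.0, 30.0]
toy_hrirs = np.zeros((2, 2, 4))
toy_hrirs[0, :, 0] = 1.0
toy_hrirs[1, :, 2] = 1.0

pair_15 = interpolate_hrir(15.0, measured, toy_hrirs)
stereo = render_binaural(np.array([1.0, 0.0, 0.0]), pair_15)
```

In this toy case the interpolated 15° filter is two half-height impulses (at taps 0 and 2) rather than a single impulse at tap 1, which illustrates why simple linear interpolation of impulse responses can smear timing cues when the measured positions are far apart.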

1.3 ORIGINAL CONTRIBUTIONS

This thesis offers a number of contributions to the field of auditory localization. The

main areas are outlined below.

• Localization in a normally reverberant environment is studied. Most studies using

pre-recorded stimuli are either conducted in an anechoic chamber or a sound

deadening room. Others have been conducted in the open air with few reflecting

surfaces nearby, to deliberately reduce echoes. This is because researchers are

generally interested in providing an uncluttered signal for localization. But this

does not represent what normally occurs. Whilst the incident wavefront is

paramount for accurate localization, the reflected waves play an important part in

determining distance and intensity (for example) and form part of our expected

and experienced auditory environment.

• To take account of the complex filtering characteristics of individual pinnae, pinnae modelled on a human subject are used in place of those supplied with the manikin. The manikin's pinnae are standardised, a characteristic rarely true of an individual's pinnae, and it is more valuable to use a genuine set of ears, which can be modelled using a rubber-based substance called Otoform¹.

• A thorough breakdown and exploration of the recording-playback relationship and

recording techniques is given in the thesis. This partly involves examining

different stages of the recording process for possible loss of important

information. It also covers the effects of procedural and presentation differences.

¹ See Chapter 3, Pinna Moulding section.



• An investigation into interstimulus interval provided important information about the constraints of memory in absolute judgement tasks. Siegel & Siegel (1972) argued that any memory of a previous sound would contaminate subsequent judgements, so that the task would no longer be one of absolute judgement. In this thesis, by manipulating the delay between sounds and noting the effect on accuracy, it was possible to establish whether some studies might show increased accuracy as a result of constraints imposed by memory of previous stimuli.

• The reporting of localization accuracy was examined by comparing the commonly used 'angle error' measure with an information theory approach. Angle error gives an average error value, whereas the information transmission rate is used to estimate the maximum number of locations that could be used without confusion. Information analysis may also be important if the significance of the information coming from a sound source were partly defined by its location.

• Controlled comparisons of response method and stimulus type, and the use of realistic, familiar and fairly long-duration stimuli, are much-needed areas of investigation. Although it is widely accepted that signals with a broad range of frequencies provide increased localization cues (e.g. Stevens & Newman, 1936; Sandel et al, 1955; Wightman & Kistler, 1993), there are a number of broadband and complex stimuli available. The auditory stimulus itself forms a fundamental part of every localization experiment, and yet there is no genuine indication of the effects that the choice of stimulus may have.

• A unique evaluation of the equipment used in VR auditory simulations was conducted. Whilst many studies report the effect of head movements or lack of head movements, these are usually conducted in free-field situations, which do not accurately represent VR environments (e.g. Stevens & Newman, 1936; Makous & Middlebrooks, 1990). Here, 'head-tracked' 3D audio was directly compared to non-head-tracked 3D audio, for sounds generated using HRTF's. This equipment closely represents the technology used in VR systems and, although this is a fast-moving field, necessary refinements and implementations are suggested.



• Visual cues are incorporated towards the end of the thesis. Although visual cues had deliberately been omitted throughout, it became clear that acoustic cues were insufficient to support accurate localization except within a context. Such a context is typically established by vision, and so the effects of a visual correlate were investigated.

1.4 OVERVIEW OF THE THESIS

A general background and review of the literature covering the area of localization is given in Chapter 2. This provides a framework for the thesis by highlighting the current status of the literature in several areas of localization research. However, the identification and investigation of unresolved elements or unidentified variables is reported in the individual experiment chapters. Chapter 2 not only covers recent work, but introduces the concept of localization and the development of many current theoretical issues.

A detailed explanation of several methodological procedures common to the majority (if not all) of the experimental chapters is given in Chapter 3. Firstly, the process of moulding individual pinnae is described. So-called 'individualized' pinnae can be used to replace the standard pinnae provided with a KEMAR manikin. Modelling the pinna involves not only manufacturing a perfect replica of the outer ear, but also fitting the model to the manikin and maintaining the correct dimensions where necessary. Secondly, KEMAR recording procedures are described. For all but one experiment, recordings are made using the manikin. Recordings involve placing the manikin in a normally reverberant large room amongst a number of speakers. These sound sources are arranged in the horizontal and sometimes the median vertical plane. Typically between five and nine sound sources are used, although the stimuli and other variables may change. Since the construction of the recording set-ups is generally consistent, this chapter gives the reader descriptions and, in some cases, diagrams of the apparatus and equipment used. Finally, the issue of front-back correction is covered. This occurs during the analysis phase of an experiment and can have a large impact on the results. A full description of the concept of 'angle error' and of front-back correction for azimuth and elevation is given.
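As an illustration of how angle error and front-back correction interact in the horizontal plane, the sketch below computes the unsigned angular difference between target and response azimuths and, optionally, resolves a front-back confusion by mirroring the response about the interaural (left-right) axis and keeping the smaller error. The azimuth convention (degrees clockwise from straight ahead), the function names and the "keep the smaller error" rule are assumptions made for illustration; the correction actually used in the thesis is the one described in Chapter 3, and elevation is not handled here.

```python
def angular_difference(a_deg, b_deg):
    """Smallest unsigned angle between two azimuths, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return 360.0 - d if d > 180.0 else d


def mirror_front_back(az_deg):
    """Reflect an azimuth about the interaural axis:
    30 deg (front right) <-> 150 deg (back right)."""
    return (180.0 - az_deg) % 360.0


def angle_error(target_deg, response_deg, correct_front_back=True):
    """Angle error for one trial, with optional (assumed) front-back correction."""
    raw = angular_difference(target_deg, response_deg)
    if not correct_front_back:
        return raw
    # Treat the response as a front-back confusion if its mirror image is closer.
    mirrored = angular_difference(target_deg, mirror_front_back(response_deg))
    return min(raw, mirrored)


def mean_angle_error(trials, correct_front_back=True):
    """trials: iterable of (target_deg, response_deg) pairs."""
    errors = [angle_error(t, r, correct_front_back) for t, r in trials]
    return sum(errors) / len(errors)
```

For three hypothetical trials `[(30, 150), (0, 350), (90, 60)]`, the uncorrected mean angle error is 160/3 ≈ 53.3° but the corrected mean is 40/3 ≈ 13.3°, because the first response is exactly the front-back mirror of its target. This is one way a correction applied at the analysis stage can have a large impact on the reported results.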



The initial investigation is concerned with the role of the pinna (or outer ear) in our ability to judge the locus of a sound source (Chapter 4). This fundamental issue is critical to AVR, which typically uses nonindividualized HRTF's. Thus, measurements are taken based on another person's pinnae, which is the most practical method since individual measurements would be grossly inefficient. However, if using nonindividualized pinnae greatly reduces the potential accuracy of judgements, then this could have serious implications for systems implementing this method of generating sounds. An experiment comparing individualized with nonindividualized and even no pinnae is critical in order to establish whether we need our own ears.

Manufacturing individualized pinnae is done by taking moulds of the pinna of five

subjects and making identical sets of manikin recordings using the different pinnae.

Each listener is asked to judge the apparent location of the recorded sounds using

either their own pinna or the pinna of the four other listeners. In addition, judgements

are made from recordings made with no pinnae - flat surfaces fitted into the manikin

ear recesses, with a hole at the meatus entrance position. If subjects are significantly

more accurate with their own pinnae then this could have costly implications for the

future of Virtual Reality simulations that rely heavily on auditory cueing.

The errors obtained in this study are surprisingly large, and even the use of individualized pinnae does little to improve matters. A more extensive examination of the fundamental elements underlying localization is required. But to begin with, it is necessary to establish exactly how many positions in the horizontal plane could be identified without confusion using acoustic cues alone. This would be particularly important if the significance of the information coming from a sound source were partly defined by its location. The problem is approached in Chapter 5 using information theory (e.g. Attneave, 1959; Edwards, 1969), where listeners are asked to identify the location of a pre-recorded broadband click presented over headphones or 'in-ear' tubephones (which deliver sound to the tympanum). The information transmission rate was obtained by asking listeners to judge the location of the sound from a fixed number of available response choices. The value in bits can be converted into an estimate of the maximum number of locations which could be used without error (2 raised to the power of the transmitted information). Information analysis further identifies the response accuracy at each source location, thus highlighting location-dependent response patterns.
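The conversion from bits to positions can be illustrated with a small sketch. It estimates transmitted information T from a stimulus-by-response count matrix as H(stimulus) + H(response) - H(stimulus, response), the standard maximum-likelihood estimate, and converts it to 2^T equally discriminable positions. The function names and toy matrices are hypothetical; with small numbers of trials this simple estimate is biased upwards, and the thesis's own calculation may include details not shown here.

```python
import math


def entropy_bits(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def transmitted_information(confusion):
    """confusion[i][j] = number of times stimulus position i was judged as response j."""
    stimulus_totals = [sum(row) for row in confusion]
    response_totals = [sum(col) for col in zip(*confusion)]
    joint = [c for row in confusion for c in row]
    return entropy_bits(stimulus_totals) + entropy_bits(response_totals) - entropy_bits(joint)


def reliable_positions(bits):
    """Number of positions that could be identified without confusion."""
    return 2 ** bits


# Perfect identification of 7 positions (10 trials each): T = log2(7) bits -> 7 positions.
perfect = [[10 if i == j else 0 for j in range(7)] for i in range(7)]
# Responses unrelated to the stimulus: T = 0 bits -> 1 position.
chance = [[1] * 7 for _ in range(7)]

t_perfect = transmitted_information(perfect)
t_chance = transmitted_information(chance)
```

The two toy matrices bracket the possible outcomes for a 7-position task: real listening data fall between 1 and 7 reliably identified positions, and the per-row structure of the matrix shows which source locations contribute most to the confusions.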


Some of the issues raised in Chapters 4 and 5, in addition to a number of new methodological questions, form the basis of Chapters 6 and 7. In an attempt to reduce the large number of errors, Chapter 6 explores the techniques used for recording and playback. The most common and simplest way to make a recording using a KEMAR manikin is to use microphones attached to a Zwislocki coupler at a location corresponding to the eardrum. The Zwislocki coupler simulates the ear canal. However, if playback is through headphones, this creates a mismatch between the site of recording and the site of playback. In effect, this method causes the sound to pass through the concha and meatus twice: once on the manikin during recording and again in the listener's own ear during playback. This problem can be solved in one of two ways. The playback can be through tubephones, in which case the sound is delivered close to the tympanum and matches the recording site. Alternatively, the recording can be made using small microphones placed at the external entrance to the meatus, close to the headphone playback site. Both approaches are explored. Results from an earlier chapter also drive further investigation into the role of the pinna.

Interstimulus interval and response method are two factors that may have an important effect on absolute judgement accuracy. Absolute judgement measures the ability of listeners to judge the position of discrete, isolated sounds. Yet reported studies rarely examine the extent to which the memory of a previous stimulus can affect subsequent judgements. This may occur where a response to one stimulus constrains the response to a subsequent stimulus, thus artificially reducing error values. This concept was first introduced by Siegel & Siegel (1972). Whether the memory of one stimulus interferes with another depends on the so-called 'interstimulus interval', the delay between the individual sounds in a sequence. In Chapter 7 the interstimulus interval is varied from one sequence to the next and judgement accuracy is examined. If shorter interstimulus delays show marked increases in judgement accuracy, then this would indicate a threshold below which memory has a strong influence. This would impose constraints on studies of absolute judgement by setting a minimum interstimulus interval.

A comparison of the methods of eliciting subject responses in a number of reported studies revealed that, for studies using a forced-choice or categorical method, the apparent accuracy was generally lower than for studies using no guidance or categories to choose from. Nevertheless, these studies are not directly comparable, since the methodology varies considerably. The experiment in Chapter 7 offers a controlled comparison of different response methods.



Chapter 8 further investigates the powerful effect that response method was found to produce in the previous chapter. This is combined with a unique comparison of the judgement accuracy of different stimulus types. Broadband or complex sounds were chosen in an attempt to reduce the high localization error and ascertain the optimal signal type for use in Virtual Reality displays. Whilst two of the sounds (clicks and white noise) are used widely in localization studies, a complex and relatively long-duration speech sound (the word "chips") is rare; vowels and vowel complexes are more common. However, listeners' familiarity and experience with complex speech sounds should promote maximum acuity, helping to reduce the consistently high errors obtained, particularly when judging elevation.

The results of Chapter 8 lead to a more thorough investigation of the sound reproduction process, in an attempt to pinpoint a possible source of high angle errors. Although a number of variables have been examined in an attempt to reduce error values, few of these refinements have had any effect. In Chapter 9, the recording process is eliminated as a source of error by conducting a 'live' relay of the sound through the manikin to a subject seated in a remote location and listening through high-fidelity tubephones. As a control, these 'live' trials are also recorded and played back to subjects from a tape, but in identical conditions.

The pinna, whose role for judgements in the horizontal plane remains unresolved, is

also investigated in this chapter. The manikin is therefore fitted with either no pinnae

or nonindividualized pinnae - a pair previously modelled on a human volunteer and

used throughout the thesis for nonindividualized pinna conditions.

The results show that the recording process does not lose any information, since the accuracy of judgements remains the same for live and recorded presentation. It therefore appears that listeners do not make sufficient use of the physical characteristics of the signal to obtain adequate localization cues. Consequently, Chapter 10 incorporates two major factors that have previously been omitted from all experiments: vision and head movements. It has been necessary to exclude these factors so far in order to ascertain the importance of spectral information alone.

Chapter 10 outlines two experiments. The first is concerned with the possibility that, for sounds recorded on a manikin and played back over headphones in a booth, any head movements made by the subject will confound the signal. This may be a cause of inflated angle errors and so must be investigated before head movements are fully incorporated. A Head Tracker is used to monitor the movement of a restrained (clamped) and an unrestrained still head. The results showed that the range of movement for a clamped and a non-clamped still head was almost identical, implying that small head movements made by subjects in the booth would not affect judgement accuracy.

Experiment 2 evaluates the effect of either having a restrained, clamped head or being able to move the head freely whilst listening. A comparison is also made between providing a visual link to the sound sources and listening to sounds with no visual correlate. Stimuli are either played in the free field, to assess the role of head movements and to include vision, or presented in a booth to eliminate the visual element.

Findings from the free-field investigation provided the motivation for Chapter 11. The role of head movements, which was not adequately resolved in the free-field study, is subjected in this chapter to a more valid and rigorous investigation. Localized sounds are generated in this experiment using HRTF's, not manikin recordings. Head movements are incorporated by using a magnetic head tracking device. This is a more faithful representation of the technique used to generate and present sound in Virtual Reality simulations. Three conditions are used to evaluate the effectiveness of head movements using such equipment. Subjects are able to move their head freely as desired in the first condition. The second requires subjects to make a controlled, specified movement, and in the third the head is kept still (although it is not physically restrained). For all of these conditions the Head Tracker is either switched on or off, to compare the effect of accounting for, or failing to account for, different movement patterns on localization acuity.

The final chapter (Chapter 12) summarises and concludes the work covered in the thesis. A number of issues raised by the thesis are discussed and some strengths and weaknesses are identified. Some outstanding areas of investigation are highlighted and suggestions for approaching these problems are given. The issues raised must be resolved in order to gain a complete understanding of the psychological and physiological factors involved in sound localization.


Chapter 2: Background and Literature Review

CHAPTER 2

Background and Literature Review

2.1 INTRODUCTION TO LOCALIZATION

With the advent of 'virtual reality' information systems such as simulated cockpit displays comes the need for accurate simulation of auditory information as well as of the more obvious visual elements. Perhaps the most fundamental auditory process in such systems is the localization of sounds, an everyday process of locating sounds within our external environment. Localization may be used either to direct visual gaze and so complement the visual experience, or as a sole cue for warning or information. Within current VR systems, localization has presented some problems, with accuracy falling short of apparent 'real life' capabilities. This is perhaps because the subtleties of localization are either taken for granted or overlooked. We still do not have a complete knowledge of all of the variables involved in localization.

One of the earliest theories of sound localization was introduced by Lord Rayleigh (1907). He recognised that if the wavelength of a sound was short relative to the size of a listener's head, then there would be a 'head-shadow' effect. This 'shadow' would cause a difference in level between the ear closest to the sound and the ear opposite the sound, an 'interaural level difference' (ILD). He also noted that the distance from the source to each of the two ears would differ, causing 'interaural timing differences' (ITD's). Rayleigh conducted an experiment with tuning forks and discovered that at low frequencies a listener was more sensitive to ITD's. This is because the wavelength is long enough to diffract around the head, leaving minimal ILD's. He thus hypothesized that our ability to localize is governed by ITD's at low frequencies and ILD's at high frequencies, an idea known as the "Duplex Theory".
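The head-size argument behind the Duplex Theory can be made concrete with a short calculation. The sketch below compares the wavelength λ = c/f of a tone with a typical head diameter, and estimates the ITD for a distant source using Woodworth's spherical-head approximation, ITD ≈ (a/c)(θ + sin θ). Neither the formula nor the parameter values (head radius a = 0.0875 m, speed of sound c = 343 m/s) come from the thesis; they are common textbook assumptions used here only for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees C


def wavelength_m(frequency_hz, c=SPEED_OF_SOUND_M_S):
    """Wavelength of a pure tone in metres."""
    return c / frequency_hz


def woodworth_itd_s(azimuth_deg, a=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Woodworth's spherical-head ITD (seconds) for a distant source at 0-90 degrees azimuth."""
    theta = math.radians(azimuth_deg)
    return (a / c) * (theta + math.sin(theta))


head_diameter = 2 * HEAD_RADIUS_M
for f in (250, 500, 1500, 4000):
    ratio = wavelength_m(f) / head_diameter
    print(f"{f:5d} Hz: wavelength {wavelength_m(f):.3f} m, {ratio:.1f} x head diameter")

print(f"Max ITD (source at 90 deg): {woodworth_itd_s(90) * 1e6:.0f} microseconds")
```

Under these assumptions a 500 Hz tone has a wavelength nearly four times the head diameter, so it diffracts around the head and leaves little level difference, whereas at 4000 Hz the wavelength is about half the head diameter and a head shadow can form. The largest ITD comes out at roughly 0.66 ms. The spherical model ignores the pinnae and torso, so it captures only the ITD part of the picture, not the spectral cues discussed below.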



A study by Stevens & Newman (1936) confirms Rayleigh's findings. They investigated localization of pure tone bursts on the roof of a building, the location being chosen to minimise reverberation and so give a more anechoic-type environment, but in free-field conditions. Subjects were required to estimate the location of the sound source in the horizontal plane for a variety of frequencies. Results showed that whilst sounds at mirror-image locations in front and behind were often indistinguishable, left-right judgements were usually reliably accurate, with an average error of ±14°. Stevens & Newman noted that large errors were rare at very low or high frequencies, but that in the midrange (around 3000 Hz) the error rate rose, indicating two different mechanisms for sound localization: one functioning at low frequencies and the other at high frequencies, with neither operating effectively in the midrange.

Sandel et al (1955) confirmed these findings regarding midrange inaccuracies using an anechoic chamber. They expanded on the midrange issue by arguing that errors tended to occur between 1500 and 5000 Hz and that the greatest errors occur at 1500 Hz, not 3000 Hz.

These timing and level differences of the incident wavefront between the two ears form part of the fundamental framework of directional hearing cues. However, they are by no means the sole constituents of our ability to localize. Timing and level differences alone only allow a left-right difference in location to be perceived. Moreover, these left-right locations will be heard intracranially (or 'inside the head'), because timing and level differences alone do not produce an externalised image. The factors involved in externalisation are discussed below. This intracranial perception is known as lateralization, and differs from localization, where the sounds are heard extracranially, or 'outside the head'.

Most early studies of localization attempted to simulate directional sounds over headphones by implementing ITD's and ILD's. However, they only achieved lateralization. In 'real' listening conditions, the sounds are filtered by the head, torso and outer ears, or 'pinnae', which cause subtle changes to the stimulus that must be accounted for in order to produce simulated 3-dimensional sounds.

Batteau (1967) recorded sounds using a metal tube (the width of a head) with microphones at either end, representing the two ears. The tube was either fitted with moulded pinnae or with nothing at all. When the recorded signals were played to subjects over headphones, the pinna condition produced localized sounds, but where nothing was used, the sounds were typically lateralized and very poorly judged in absolute terms. Thus the pinna appears to contribute not only to externalisation, but also to the general localization of a sound source.

Durlach et al (1992) outline other factors that contribute to externalisation, apart from the more obvious pinna (and more minor head and torso) cues. They argue that head movements can help to identify targets and prevent them from being located inside or very close to the head. This is done primarily by producing a binaural change in the stimulus that is 'natural' and corresponds closely to a listener's everyday experience. Holding the head still creates unnatural conditions with respect to the listener's head position and their expectations about a binaural signal, thus weakening externalisation.

Durlach et al also consider reverberation as a factor in externalisation. They propose

that reverberations somewhat reduce the resolution of direction. However, this

reduction is limited by the so-called 'precedence effect', where the auditory system

enhances perception of the incident wavefront and suppresses subsequent echoes.

Reverberation also aids judgement of distance (which aids externalisation), but only

in reverberant environments. In 'anechoic' (non-reverberant) settings, loudness is the

only available cue to distance, although it is unreliable because the listener must have

an awareness of the original intensity of the sound source. So although reflections are

considered to confuse the listener, they can enhance distance information. It should

be noted, however, that reverberation as a cue to distance can be difficult to resolve.

Woods & Kulkarni (1992) compared the perceived externalisation of manikin recordings made in either an anechoic or a reverberant setting. They found that sounds recorded in a reverberant room produced a far greater perception of externalisation than those recorded in an anechoic room. Even with the KEMAR's pinnae removed, the sounds were well externalised in reverberant conditions.

Hartmann & Wittenberg (1996) measured externalization using discrimination tasks for sounds simulated over headphones. They argued that localizing (as opposed to lateralizing) a sound depends on the ITDs of low-frequency components but not of high-frequency components, whereas ILDs in all frequency ranges were equally important. They also demonstrated that it is necessary to deliver a realistic spectrum to each ear: simply maintaining the interaural spectral level difference did not produce a well externalised sound.

Thus externalisation appears to depend on a number of features, most (but not all) of which are essential if 3-dimensional simulated (or pre-recorded) sounds are to be localized by listeners. However, Durlach et al recognise that externalisation is not wholly physical and that experimental methodology can have a small influence: internalised percepts can be limited or ruled out by constraining the available response choices.

2.2 PINNA EFFECTS

Batteau (1967) showed that the pinna plays some role in localization. He hypothesized that this was due to reflection of the sound by the pinna, causing a transformation that would be unique to the original source location. Blauert (1969) offered support for this view by suggesting that the pinna acts as a filter that attenuates or passes frequencies depending on their direction. Blauert (1983), Oldfield & Parker (1984a, b) and Wright et al (1974) all went on to establish the pinna as a direction-dependent filter that does indeed cause spectral changes to an incoming signal. Hebrank & Wright's (1974b) study also revealed that cancellation of reflected sound at certain frequencies by the part of the pinna known as the concha causes spectral notches that alone may provide azimuth and elevation cues.

Elevation discrimination has been hypothesized (e.g. Butler, 1969; Gardner & Gardner, 1973) to be the primary function of the pinna. Many studies (e.g. Rayleigh, 1907; Stevens & Newman, 1936; Sandel et al, 1955) have established that azimuth judgements are made mainly on the basis of the dominant interaural difference cues. However, for elevation, particularly of sounds that lie on the median plane, the sound reaches both ears at the same time and level (the ITDs and ILDs are essentially zero), since the source is equidistant from both ears. The pinna may therefore provide the principal cue for locating such sources.


Searle et al (1975) examined the role of the pinna by making physical measurements of the transfer function from a free-field source in the vertical plane to a microphone in a listener's ear canal. They indicated that there are two independent localization cues generated by the pinna. The first is a change in frequency response as a function of elevation in the median vertical plane (MVP), and the second is a disparity between left and right ear responses, which also changes with elevation angle. Independent psychophysical measurements indicate that these pinna cues are detectable by subjects and that both cues are used in vertical localization tasks.

Investigations into the specific effects of the pinna folds and dimensions have also been conducted by Gardner & Gardner (1973), who progressively occluded pinna cavities to see the effect on localization in the median plane. They demonstrated that localization ability does decrease with increasing occlusion, but that localization ability was not uniform across the median plane. Localization was better for signals in the anterior sector of the median plane than for those in the rear sector, and high-frequency signal content was found to be more important for accurate localization than low-frequency content. An experiment by Musicant & Butler (1984) has shown that high-frequency content is important because the pinna attenuates the components of a sound above approximately 9 kHz when the stimulus is played behind a subject. This allows listeners to make front-back distinctions, so providing valuable cues to azimuth as well as elevation and enhancing localization accuracy.

Butler & Humanski (1992) studied binaural localization of lowpass and highpass noise in the MVP. Seven speakers were positioned vertically in a sound-treated room at 15° intervals between 0° and 90°, 1.2 m from the head. They predicted that localization performance for lowpass signals would not differ from chance, but that for highpass signals performance would be significantly more accurate than chance. They argued that this increased accuracy for highpass noise would result from the availability of pinna cues at higher frequencies. Their results showed a mean error of 27° for the 3 kHz lowpass noise, but of just 8° for the 3 kHz highpass noise. Since the chance figure was 35°, their hypothesis was confirmed, reinforcing the view that the pinna plays a dominant role in MVP localization.
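How a chance figure of this kind can arise is sketched below in Python. The definition of chance used by Butler & Humanski is not given here, so the sketch assumes (as a hypothetical model) that targets occur equally often at each of the seven speakers and that a guessing listener picks any speaker with equal probability, independently of the target. Under that assumption the expected error is about 34.3°, close to the 35° quoted; the function name `chance_mean_error` is illustrative only.

```python
from itertools import product


def chance_mean_error(positions_deg):
    """Expected absolute error (degrees) for a listener who only guesses.

    Assumes every target position is used equally often and every
    response position is equally likely, independent of the target.
    """
    pairs = list(product(positions_deg, repeat=2))  # (target, response)
    return sum(abs(target - response) for target, response in pairs) / len(pairs)


# Seven speakers at 15 degree intervals from 0 to 90 degrees elevation.
speakers_deg = [15 * i for i in range(7)]
print(round(chance_mean_error(speakers_deg), 1))  # 34.3
```

A different guessing model (for example, one biased towards the middle speakers) would give a somewhat different chance figure.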



They also compared binaural and monaural localization of lowpass and highpass noise in the lateral vertical plane (LVP). For monaural localization, judgements of the highpass noise were significantly more accurate than those of the lowpass signals (errors of 23° compared to 33°). For binaural localization the same trend was observed, with errors of 6° for highpass and 9° for lowpass signals, both significantly smaller than in the monaural condition overall. They conclude that monaural spectral cues do contribute toward localization accuracy in the LVP up to around 45° elevation. Localizing throughout the LVP (beyond 45° elevation), however, requires interaural timing and level differences in addition to pinna cues.

For the majority of experiments investigating pinna cues, standardised pinnae are used, although in reality pinna shapes are unique. Freedman & Fisher (1968) investigated localization with individualized pinnae as part of their study. They proposed that a listener's perception may be hindered by using standard pinnae, because the considerable experience and practice we have with our own ears may be critical.

Their experiment compared using one's own pinnae with using nonindividualized (standard) pinnae and no pinnae. The individualized pinna condition involved subjects listening normally. The nonindividualized condition used 10 cm metal tubes to conduct the sound to the ears, with casts of pinnae at the ends of these tubes. The no pinna condition used sound conducted through the metal tubes only.

The first part of their study ruled out head movements in order to conduct a pure evaluation of the role of pinna cues. They found individualized and nonindividualized pinnae to give significantly greater accuracy than no pinnae (nonindividualized pinnae = ±31.6°, no pinnae = ±36.5°; surprisingly large errors that are not matched elsewhere in the literature). However, no differences were found between using one's own pinnae and standardised pinnae, implying that we do not require our own ears. A second experiment used the same conditions, but this time head movements were incorporated. Accuracy was similar to that in the condition with restricted head movements, but no differences were found between the different pinna conditions. Therefore, with head movement restricted the pinna appears to provide important localization cues, but when head movements are incorporated accuracy is the same both with and without pinnae. However, the overall error with head movements was still 22.5°, which is surprisingly large.


2.3 HEAD MOVEMENTS

Head movements are a factor in localization whose importance was first proposed by van Soest (1929). He argued that if one were to perceive a sound from straight ahead, the absence of ITDs and ILDs would make it very difficult to differentiate front from back (i.e. 0° sounds can often be indistinguishable from 180° sounds, and similarly with, say, 30° and 150° sounds). However, if the head is turned, say, to the right, a sound from straight ahead would reach the left ear first (whereas a sound from directly behind would reach the right ear first), enabling the listener to disambiguate its direction. As the majority of studies do not incorporate head movements (unless conducted 'live' in the free field, as opposed to being pre-recorded and presented over headphones), front-back azimuth confusions are commonplace (e.g. 12% reported by Wightman & Kistler, 1989; 26% by Wenzel et al, 1993).
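Van Soest's argument can be illustrated numerically. The sketch below uses a deliberately simple two-point model of the head, ITD = (d/c) sin θ, which ignores diffraction around the head; the ear spacing d = 0.175 m, the speed of sound c = 343 m/s and the function `itd_us` are assumptions for illustration, not values from the studies cited. With the head still, mirror-image front and back sources give identical ITDs; after a small head turn to the right, the ITD for the front source shifts towards the left ear and that for the back source towards the right ear, so the direction of change separates them.

```python
import math

EAR_SPACING_M = 0.175      # assumed distance between the two ears
SPEED_OF_SOUND_MS = 343.0  # assumed speed of sound in air (m/s)


def itd_us(source_az_deg, head_yaw_deg=0.0):
    """Approximate interaural time difference in microseconds.

    Positive values mean the sound reaches the right ear first.
    Azimuth runs clockwise: 0 = ahead, 90 = right, 180 = behind.
    head_yaw_deg > 0 means the head is turned to the right.
    Simple two-point model, ITD = (d / c) * sin(azimuth), with no head shadow.
    """
    relative_rad = math.radians(source_az_deg - head_yaw_deg)
    return 1e6 * EAR_SPACING_M / SPEED_OF_SOUND_MS * math.sin(relative_rad)


for front, back in [(0, 180), (30, 150)]:
    print(f"{front:>3} vs {back:>3} deg | head still: "
          f"{itd_us(front):6.1f} vs {itd_us(back):6.1f} us | "
          f"head turned 10 deg right: "
          f"{itd_us(front, 10):6.1f} vs {itd_us(back, 10):6.1f} us")
```

In this model the largest ITD (source at 90°) is about 510 µs; real heads give somewhat larger values because sound must travel around the head.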

Van Soest's findings were supported by Wallach (1939 & 1940), who demonstrated that moving the head during a sound provides cues for several lateral angles for the same sound source direction. He argued that this sequence of lateral angles accurately determines a particular location. An important part of this motion is also the disambiguation of so-called 'front-back confusions'. Wallach differs from many researchers (e.g. Gardner & Gardner, 1973; Musicant & Butler, 1984) in his belief that the pinnae are important in reducing front-back errors only in the absence of head movements, which is not a common occurrence in everyday listening.

The work of Young (1931) describes the importance of head movements in terms of externalisation. He studied the effect of either a still head or a moving head on the binaural stimulus pattern and found that where head movements are available, reliable and accurate 3-dimensional localizations are possible, but when head movements are ruled out, only restricted 2-dimensional judgements can be made.

Experiments by Pollack & Rose (1967) investigated the role of head movements in localization. In one condition they systematically varied the duration of the sound source and compared situations where head motion was either allowed (generally, or by turning to face the sound source) or restricted. Their results revealed that head movements do assist in localizing a sound source, but only one condition in their series of studies yielded a significant improvement: when the subjects turned to face the sound source. Turning to face the sound may aid localization because sounds are more accurately located in the midline (Mills, 1958; Perrott, 1984). Thus, if subjects did not bring the sound to the centre of their heads, stimuli would be judged 'off-centre' and may therefore be located less accurately. Yet other findings, such as those of Thurlow & Runge (1967), showed a clear improvement in localization for all positions when head movements were available, and did not report any specific movement or location conditions. However, Thurlow & Runge did find that the improvement, although statistically significant overall, was typically less than 30%. Yet head movement is still commonly restricted in studies to maintain consistent input. This may not be entirely representative of a "normal" hearing experience, but it provides a "pure" measure of human localization accuracy using only the spectral content of a signal and its transformations by the pinnae, head and torso.

2.4 VISION

Visual stimulation may also affect the apparent location of a sound. Like head movements, it is little researched and is often omitted from studies in order to examine more subtle physiological effects, such as location-dependent spectral changes.

One of the earliest absolute localization studies incorporating vision was conducted by Jackson (1953). He reports two experiments that compare the judgement accuracy for an auditory stimulus alone with that for an auditory stimulus accompanied by a visual stimulus. His first experiment used 5 bells placed along an arc in the frontal azimuth plane, spaced at 22.5° intervals. In the first condition, a bell was rung on its own and subjects had to report the location of the source from a number of options. In the second condition, the bell sound was accompanied by a light, independent of the sound source, shone either at the same or a different location. Subjects had to report the apparent position of both the bell and the light, and it was expected that the presence of the light would alter the perceived location of the bell. The addition of the light did increase accuracy from 46% to 60% when the bell and the light were in identical locations. However, this difference was not statistically significant.

In the second experiment, 7 whistles were placed 30° apart along the same azimuthal arc. These whistles were either played alone or were accompanied by an unrelated puff of steam, presented at either the same or a different location. As in the first experiment, the addition of vision increased accuracy, from 62% to 99%, when the whistle and steam were at the same position (a statistically significant improvement). Where the auditory and visual stimuli deviated by 20-30°, the proportion of correct responses to the auditory stimulus fell to 38% in the first experiment and just 3% in the second. Conversely, the proportion of responses misguided by the visual cue was 43% in the first experiment and 97% in the second, vision clearly overriding the auditory cues. At deviations of 45° or more, the number of correct responses to the auditory signal remained similar for experiment 1 but was higher for experiment 2, and responses to the visual stimuli decreased in both the bell and whistle experiments. Thus, once the two cues are sufficiently far apart, subjects correctly identify them as separate.

Jackson's study clearly shows that the effect of vision can be strong and even misleading when localizing a sound source. This is a good illustration of the so-called "ventriloquism effect", where the presence of a corresponding visual object can bias judgements of the perceived location of auditory objects (Pick et al, 1969).

Lovelace & Anderson's (1993) study also looked at the effect of vision on auditory location identification, but where no visual information was associated with the sounds. The apparent location of a speech stimulus was judged by pointing to a concealed target sound, with subjects either sighted (able to see an arc marked out with measurements in degrees, but not the sound sources themselves) or blindfolded. Blindfolded subjects made errors of ±6.8°, compared to an average error of ±3.79° for sighted subjects, a statistically significant difference.

These findings illustrate that the general presence of vision appears to increase our ability to localize. Since no visual link is provided with the sound source, general vision perhaps informs us of more subtle cues about our acoustic environment that aid localization. Indeed, these results offer support to Shelton & Searle (1980), whose research compared localization in the mere presence of a visual environment with localization in darkness. Their findings revealed that localization accuracy is marginally greater in the light than in the dark. However, Lovelace & Anderson went on to note that their observed improvement may simply reflect the use of vision to calibrate hand movement. Thus, the true role of vision in localization remains rather ambiguous and unresolved.



It is evident that a number of influential variables are encompassed within the localization process, although much ambiguity surrounds the precise role and contribution of each of these factors, as is clearly demonstrated by the diversity of results yielded by studies in this area. Without more reliable information about such processes, the application of audition to virtual reality systems will fall a long way short of producing the 'realistic' sound that is required. Hence these experiments set out to solidify and expand upon existing knowledge, with the aim of further improving and refining the auditory element of VR displays.


CHAPTER 3

Methodologies

3.1 INTRODUCTION

This chapter elaborates on some of the basic methodological processes involved in constructing a sound localization experiment. Whilst each chapter describes its own relevant procedures and techniques, there are methods which apply to all experiments and would benefit from a more in-depth explanation.

The first section describes the characteristics of headphone and tubephone listening.

In all experiments, pre-recorded or generated sounds are listened to through

headphones and/or tubephones.

The pinna moulding process is given in some detail, since throughout the thesis non-standard pinnae were used on the manikin to simulate a normal hearing experience more accurately.

The section on KEMAR recording procedures describes the experimental conditions and set-up, and gives a detailed description of the equipment that is used in almost all experiments.

Another technique referred to throughout the thesis is 'front-back correction'. The reasoning behind these corrections and an example of the technique for calculating front-back corrected errors are given.


3.2 HEADPHONES AND TUBEPHONES

The headphones used are Beyer Dynamic DT 48 'closed' headphones, which cut out most background sounds. These sit over the entire pinna and deliver sound within the concha, opposite the meatus entrance. Headphones are typically used to play back to subjects sounds that have been recorded using a manikin (see section 3.4 below). However, the simplest and most common method of making manikin recordings is to use fittings known as Zwislocki couplers. These hold the microphones in place at the eardrum position and create an artificial ear canal. Thus, headphones do not deliver sound to the point of recording: the recorded sound has already travelled down the artificial ear canal and then travels down the listener's own, producing a 'double travel' down the ear canal and additional concha resonance (discussed more fully in Chapter 6). In an attempt to deliver sound to the exact recording location, so-called 'in-ear' tubephones (Etymotic ER-2) were used in place of, or in addition to, headphones in several experiments. These are narrow tubes which are inserted into the ear to within 0.5 cm of the eardrum. They are held in place by a small foam earplug, which sits just inside the meatus. The tubephones were expected to increase judgement accuracy by retaining a more faithful reproduction of the original signal.

3.3 PINNA MOULDING

A silicone-based rubber called Otoform¹ was used for making individual pinna moulds that could replace the standard KEMAR pinnae².

Ethical clearance must be obtained before the process can begin. The subject is first required to undergo an examination of the outer ear, meatus and eardrum by a trained technician. If the subject reports any infection or discomfort, or if the inspection reveals any sign of infection, the process is terminated for that subject. If no problems arise up to this point, the procedure is fully explained to the subject. If they are comfortable with the procedure, a consent form is signed. The process of pinna moulding comprises the following stages:

1. Producing a 'negative' mould of the entire pinna area and meatus entrance.

2. Making a 'positive' impression of the pinna using the negative mould as a cast.

3. Making a mould of the left and right KEMAR pinna fittings.

4. Fitting the moulded pinna to the KEMAR mould.

¹ Otoform-K2, Condensation-Vulcanising Silicone Impression Material with Hardener, Cat. No. 071K2. By P. C. Werth Ltd., 45 Nightingale Lane, London, SW12 8SP, UK. Fax: 01816757577.

² See Appendix I.

3.3.1 Producing a 'negative' mould of the pinna area

The moulding composite for a single pinna mould is prepared by mixing 80 g of Otoform with 0.5 ml of hardener. The compound is then transferred into a 100 ml syringe. Each subject is prepared by placing a small foam 'otostop' into the ear canal, as far down as is comfortable for the subject but several millimetres from the tympanum. The otostop prevents any Otoform from coming into contact with the eardrum and causing damage. It is easily removed by pulling on the 2 cotton strands that are sewn into the foam and which hang roughly 3 cm outside the meatus entrance. These strands are adequately strong without being thick enough to interfere with the moulding process.

The subject's entire pinna is cleaned with cotton wool soaked in alcohol. With the subject's head tilted to one side, the Otoform compound is squeezed into the ear, starting at the meatus entrance and working outwards to fill the concha and pinna flange. The Otoform is then left for approximately 20 minutes, after which the cast can be removed fairly easily, although considerable care must be taken. The process is repeated for the second pinna, and the casts are left to harden fully for a further 48 hours.

3.3.2 Making a 'positive' impression of the pinna

The positive mould is produced using the fully hardened, but still flexible, negative pinna impression. The cast is lined with a thin film of petroleum jelly to prevent sticking, since the negative and positive moulds are made of the same substance. This time the composite comprises 30 g of Otoform and 0.15 ml of hardener. The Otoform mixture is again transferred into a 100 ml syringe and squeezed carefully into the entire cast (including the ear canal section), ensuring that no air bubbles form. The compound is left for 18 hours and is then removed slowly, taking care not to tear or distort the pinna mould. The positive impression is cleaned, and any minor imperfections or small tears can be repaired by smoothing in more Otoform mixture.

3.3.3 Making a mould of the KEMAR fittings

The KEMAR has two square recesses, one on either side of the head, into which the pinnae fit. The new moulds of a listener's ears must therefore be fused to a mould of this recess (a 'KEMAR-fit') before the whole pinna can be fitted to the manikin. First, a cast is made of the square recesses using the same Otoform compound that was used to manufacture the pinna cast (section 3.3.1). This cast includes the ear canal (simulated using a Knowles occluded-ear simulator, model DB-100). A KEMAR mould was then produced from this cast using a hard substance (Isopon Car Body Filler). This substance sets completely hard to produce an accurate representation of the manikin recesses that can be used for all subsequent pinna moulds. A KEMAR-fit is then manufactured by filling the hard cast with Otoform.

3.3.4 Fitting the moulded pinna to the KEMAR mould

The pinna mould must now be fixed to the KEMAR-fit. Having two separate casts is advantageous because the two can be fused together while maintaining the correct angle of the subject's pinna.

Excess Otoform is removed by cutting around the pinna impression. The hard KEMAR cast is then filled with more Otoform mixture (30 g Otoform, 0.15 ml of hardener) and the pinna impression is placed on top at the correct angle. The pinna is then pushed down into the Otoform, causing the mixture to overflow the KEMAR cast. A spatula is used to remove this overflow and smooth down the Otoform so that it fuses perfectly with the pinna mould. This is left to set for approximately 48 hours. Once it is removed, the ear canal entrance must be perforated (to remove a thin layer of Otoform) using a circular chisel with a diameter of 7.5 mm, to match the diameter of the ear canal simulator. The completed pinna moulds can then be fitted into the left or right recess on the manikin head. It should be noted that the two manikin recesses are not identical, and it was therefore critical to obtain two separate KEMAR casts.

3.3.5 Flat Pinna Replacements ('Infills')

Flat surfaces were frequently used to represent listening with no pinnae. These were manufactured by squeezing Otoform into the KEMAR moulds but without attaching a pinna mould. Thus a flat square surface, flush with the manikin's head, was produced, with a hole at the meatus entrance. This hole is slightly funnel-shaped to smooth the sound pathway from the head surface into the ear canal.

3.4 KEMAR RECORDING PROCEDURES

A KEMAR (Knowles Electronic Manikin for Acoustic Research) was used to make the stimulus recordings in all but one experiment (this single case used computer-generated HRTFs³). The fibreglass manikin consists of a head and torso positioned on a rotating base, which was manufactured by a technician at Loughborough University. The manikin stands 5 ft 10 in tall and has the approximate size, build and head dimensions of an average male. The manikin is also supplied with standardised left and right pinnae, made of a vulcanised rubber. These slot into recesses at either side of the manikin's head. Each recess has a hole at the centre, representing the meatus entrance. Inside the head, microphones can either be connected directly to the inside of the hole (meatus entrance position) or attached to the end of simulated ear canals (Zwislocki couplers) at the tympanum position.

For all experiments the manikin was placed at the centre of a large, normally reverberant room. Speakers were typically located around the manikin at various locations in the horizontal plane (all at ear height) or at various locations on the median vertical plane. For azimuth sound sources, separate (matched) speakers were used (see Figure 3.1). For elevation locations, a single speaker was used for the majority of experiments. This could be rotated to different positions on the median plane at a constant distance from the manikin (see Figure 3.2). Sounds were played from these speakers and received by the microphones in the manikin's head.

³ See Chapter 1 for a brief explanation of HRTFs and Chapter 11 for a full experimental demonstration and methodology.

The left and right microphones were fed into a pre-amplifier. From the pre-amplifier, one of two recording methods was used. The first used an amplifier to feed the sound through a pulse code modulator, which digitised the recording onto a Betamax video cassette. The second method used digital audio tape (DAT), fed directly from the pre-amplifier. From either the Betamax or DAT, sounds were transferred onto a computer for editing, using the Audiomedia software package. Editing involved isolating each stimulus and deleting any mistakes, talking, interruptions and miscellaneous noises that had occurred during the recording process. Sounds were then ordered as required and an interstimulus interval (ISI) of the relevant duration was inserted. The ISI consisted of a section of 'room silence', recorded during the stimulus recordings, to produce a continuous and consistent background noise.
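The ordering and ISI-insertion step can be sketched as follows. This is a hypothetical illustration of the idea only, not the Audiomedia procedure actually used: `assemble_sequence` and its parameters are invented names, each recording is assumed to be a list of samples (a sample may equally be a (left, right) pair for the two manikin channels), and the same stretch of room silence is reused for every gap.

```python
def assemble_sequence(stimuli, room_silence, isi_s, sample_rate):
    """Join edited stimuli into one sequence separated by room silence.

    stimuli: list of recordings, each a list of samples, in presentation order.
    room_silence: a recording of the room with no stimulus playing.
    isi_s: interstimulus interval in seconds.
    """
    gap_len = round(isi_s * sample_rate)
    if gap_len > len(room_silence):
        raise ValueError("room-silence recording is shorter than the ISI")
    gap = room_silence[:gap_len]  # keeps the background noise continuous

    sequence = []
    for index, stimulus in enumerate(stimuli):
        if index > 0:
            sequence.extend(gap)  # ISI only between stimuli
        sequence.extend(stimulus)
    return sequence


# Toy example with a hypothetical 48 kHz sample rate and a 0.5 s ISI.
fs = 48000
clicks = [[0.5, -0.5], [0.3, -0.3]]
room = [0.001] * fs  # one second of "room silence"
sequence = assemble_sequence(clicks, room, isi_s=0.5, sample_rate=fs)
print(len(sequence))  # 24004: two 2-sample clicks plus a 24000-sample gap
```

Filling the gaps with recorded room noise rather than digital zeros avoids an audible drop in background level between stimuli.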

Where stimuli were presented live and not in a pre-recorded form, the recording stage

was simply omitted⁴.

⁴ See Chapter 9, 'Method' section.


Figure 3.1: Diagram (not to scale) of the manikin in the centre of a wooden hoop (3 m in diameter), used to support the speakers in the horizontal plane. All speaker positions were fully adjustable. The hoop was supported on wooden struts, which slotted into heavy metal base units to stabilise and secure the construction.


Figure 3.2: Diagram (not to scale) of the speaker set-up for median plane (elevation) source locations. The arrow shows the direction of movement around the manikin. The range of possible speaker positions was -50° to +320° elevation (where 0° is straight ahead at ear level and 180° is directly behind).


3.5 FRONT-BACK CORRECTION

The uncorrected angle error is calculated by measuring the absolute distance (in degrees) between a subject's judgement and the true location of the sound source. The 'front-back corrected' angle error is calculated by shifting all judgements that are incorrectly placed in the front or rear hemifield to the opposite hemifield. Although front-back correction refers to shifting front-to-back as well as back-to-front, the latter is more common.

Typically, 0° is taken to mean directly in front of the subject, 90° to the right of the subject and 180° directly behind. Thus a target of 45° (marked "A" in Figure 3.3) that is judged by a subject to be at 110° ("B") would first be shifted to the front hemifield (in which the target lies), making it 70° (position "C"). This is done by reflecting it about the axis of symmetry defined by the line between 90° and 270° (marked "AXIS"). The distance of this shifted judgement from the target is then calculated to give the front-back corrected error of 25° ("D").

[Figure 3.3 diagram: TARGET (A), response (B), CORRECTED RESPONSE (C) and angle error (D), with the AXIS running through 270° and 90°]

Figure 3.3: Front-back correction of sound source ("A") judged to be at position "B". The judgement is first shifted to the opposite hemifield ("C"), then the angle error from this new shifted position is calculated ("D").
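The correction procedure above can be expressed as a short function. This is a sketch under stated assumptions rather than the scoring code actually used in the thesis: azimuths are in degrees measured clockwise from straight ahead, errors are taken as the shortest angular distance around the circle, and a response lying exactly on the 90°-270° axis is left uncorrected because it belongs to neither hemifield. The function names are illustrative. The final two lines reproduce the Figure 3.3 example.

```python
def angular_distance(a_deg, b_deg):
    """Shortest angle between two azimuths, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)


def hemifield(az_deg):
    """Return 'front' or 'back', or None for points on the 90/270 axis."""
    az = az_deg % 360
    if az in (90, 270):
        return None
    return "front" if az < 90 or az > 270 else "back"


def mirror_front_back(az_deg):
    """Reflect an azimuth about the 90-270 axis, e.g. 110 -> 70."""
    return (180 - az_deg) % 360


def front_back_corrected_error(target_deg, response_deg):
    """Angle error after moving a response that lies in the wrong hemifield
    into the target's hemifield; other responses are scored unchanged."""
    target_side = hemifield(target_deg)
    response_side = hemifield(response_deg)
    if target_side and response_side and target_side != response_side:
        response_deg = mirror_front_back(response_deg)
    return angular_distance(target_deg, response_deg)


# Worked example from Figure 3.3: target 45 deg, response 110 deg.
print(angular_distance(45, 110))            # 65 (uncorrected error)
print(front_back_corrected_error(45, 110))  # 25 (front-back corrected error)
```

The same function handles back-to-front shifts: a target at 200° judged to be at 340° is mirrored to 200° and scores a corrected error of 0°.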


CHAPTER 4

The Role of the Pinna in Sound Localization

ABSTRACT

Several studies have shown the pinna to assist in the localization of a sound source

(e.g. Batteau, 1967; Freedman & Fisher, 1968). The pinna is primarily considered to

facilitate elevation discrimination (e.g. Butler, 1969; Gardner & Gardner, 1973).

However, subtle pinna cues may be lost or hindered if using unfamiliar pinnae.

This study investigates the benefit of using individualized pinnae compared with nonindividualized pinnae or no pinnae at all. A KEMAR manikin was fitted with either moulds of each subject's pinnae or no pinnae (flat infills with a hole representing the meatus entrance). Clicks were digitally recorded using the manikin, with microphones placed at the internal meatus entrance. When played back over headphones in a sound-attenuating booth, the recordings gave a realistic 3D sensation. Subjects were then asked to identify the location of the clicks in the horizontal and vertical planes.

The angle errors for both azimuth and elevation judgements were unexpectedly high.

Simply producing 3D recordings and realistic pinnae is clearly not sufficient to

maximise localization accuracy.

A small but statistically non-significant benefit was found for individualized pinnae over nonindividualized and no pinnae for azimuth. This was as expected, since interaural timing and level differences are the dominant cues for azimuth discrimination. No effect was found for elevation either. This is surprising in view of the purported role of the pinna in elevation determination. However, the failure to obtain a significant result may relate to the small sample size, or perhaps to overall task difficulty (reported by subjects) masking any subtle pinna effects.


INTRODUCTION

The role of the pinna in localization is considered to be particularly important when judging the elevation of a sound source in the median plane (e.g. Butler, 1969; Gardner & Gardner, 1973). This proposal does seem reasonable, since there is a paucity of interaural time and level differences (ITDs and ILDs) for elevation in the median plane. However, the pinna cannot be ruled out as a factor in judging sound sources played in the horizontal plane. A study by Batteau (1967) illustrates this. He examined the effects of the pinnae on localization accuracy by recording sounds using microphones which were inserted into moulds of pinnae held onto a bar (representing the diameter of the head). The recorded sounds were played to the subjects via high-fidelity headphones. This resulted in the impression of the sounds being "out in space" and not lateralized within the head. Subjects were able to make reasonably accurate judgements of both left-right position and elevation. However, when the pinnae were removed, judgement accuracy was significantly reduced in both planes. Batteau reasoned that the role of the pinnae in localization was to produce numerous microsecond delay paths caused by the different pinna folds and cavities. The incoming signal is thus transformed by the pinna and interpreted by the listener to have originated at a particular point in space, depending upon this transformation.

Since then, studies have focused more upon the spectral transformations of the sound caused by the pinnae and less upon time delays. Wright et al (1974), for example, looked at the effect the pinna has on incoming sound, and found that pinna reflections cause spectral changes which may provide (at least partially) the cues necessary for localization. A number of similar studies (e.g. Blauert, 1983; Oldfield & Parker, 1984a) have further established the pinna as a direction-dependent filter that causes spectral changes which can be used as a cue to the location of a sound source. Supporting the notion that pinnae are useful for localization in every direction, and not just in the median plane, is the finding that the pinna attenuates the higher-frequency components of a sound, above approximately 9 kHz, when the stimulus is played behind a subject. This might enable listeners to make front-back distinctions, thus providing valuable cues to azimuth as well as elevation (e.g. Freedman & Fisher, 1968; Musicant & Butler, 1984). Shaw & Taranishi (1968) additionally showed that blocking the ear canal had little effect on the sensitivity to sound source azimuth measured in the ear canal at up to 12 kHz, indicating that the longitudinal resonance of the ear canal contributes little to direction dependence. This further implicates the pinna in sound source localization.

Investigations into the more specific effects of the pinna cavities and dimensions have been conducted by Gardner & Gardner (1973), who progressively occluded pinna cavities to see the effect on localization in the median plane. They demonstrated that localization ability decreases with increasing occlusion, but found that localization ability was not uniform: it was better for signals in the anterior sector of the median plane than in the rear sector, and high-frequency signal content was more important for accurate localization than low-frequency content.

The cues provided by the pinna appear to be subtle and complex and may therefore be altered by using foreign pinnae. Each person's pinna is unique in shape, yet many studies ignore this by using nonindividualized, or standardised, pinnae. Freedman & Fisher (1968) addressed this problem by measuring localization accuracy using individualized, nonindividualized or no pinnae. Sounds were either channelled through metal tubes (with and without pinna casts attached) or subjects listened normally. However, to test individualized against nonindividualized pinnae accurately, casts of the listener's own pinnae should also have been attached to the metal tubes. Nevertheless, they found no difference between the individualized and nonindividualized pinnae, but both gave an improvement over no pinnae at all.

This study addressed the issue of whether individualized pinnae give greater localization accuracy than nonindividualized pinnae and no pinnae. The methodology of Freedman & Fisher was improved upon by comparing identical listening conditions using different pinnae. Recordings were made using a KEMAR manikin fitted either with casts of the individual subject's pinnae or with a standard set of moulds taken from a non-participant listener. Finally, a set of infills was used to represent a no-pinna condition.

Subjects were required to identify the locus of clicks that had been digitally recorded

using the manikin and were played back over headphones. Judgement accuracy in

both the horizontal and median vertical planes was investigated.



METHOD

Subjects

Five male subjects were recruited by opportunity sampling. All were undergraduate students with no prior experience of auditory localization tasks. Subjects' ears were examined for infection using an otoscope.

Design

A 6 × 5 × 5 repeated-measures design was used. There were six listening conditions, identical in presentation but differing in recording condition: individualized (own) pinnae, no pinnae and nonindividualized pinnae. Each condition consisted of recordings made at 5 azimuths (0°, 40°, 80°, 140°, 180°) and 5 elevations (−50°, −25°, 0°, 25°, 50°). Thus 150 clicks were presented in total.

The stimuli for each of the 25 target locations (5 azimuths × 5 elevations) were recorded through the manikin using 5 different sets of pinnae1 (one for each of the 5 subjects) and using infills2 (no pinnae). For each subject, one of the five pinna sets was therefore their own and the remaining four were nonindividualized. Each set of 25 sounds was randomised, and the 6 listening conditions were presented to subjects in a random order.
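The randomisation in the Design can be sketched as follows. This is a minimal Python illustration assuming that the 25 locations are shuffled independently within each condition and that the condition order is shuffled per subject; the condition labels (`pinnae_S1` to `pinnae_S5`, `infills`) and the function names are hypothetical, not taken from the original study software.

```python
import random

# Target grid from the Design section (degrees).
AZIMUTHS = [0, 40, 80, 140, 180]
ELEVATIONS = [-50, -25, 0, 25, 50]

# Hypothetical labels: five pinna sets (one cast per subject) plus the infills.
CONDITIONS = [f"pinnae_S{k}" for k in range(1, 6)] + ["infills"]


def condition_type(condition, subject):
    """Classify a recording condition relative to the listening subject (1-5)."""
    if condition == "infills":
        return "no pinnae"
    return "own" if condition == f"pinnae_S{subject}" else "nonindividualized"


def build_schedule(seed=None):
    """Return (condition, azimuth, elevation) trials for one subject.

    Condition order is shuffled, and the 25 target locations are shuffled
    independently within each condition (an assumption about the procedure).
    """
    rng = random.Random(seed)
    conditions = CONDITIONS[:]
    rng.shuffle(conditions)
    schedule = []
    for cond in conditions:
        locations = [(az, el) for az in AZIMUTHS for el in ELEVATIONS]
        rng.shuffle(locations)
        schedule.extend((cond, az, el) for az, el in locations)
    return schedule


if __name__ == "__main__":
    trials = build_schedule(seed=3)
    print(len(trials))  # 6 conditions x 25 locations = 150 clicks
    print({condition_type(c, subject=3) for c, _, _ in trials})
```

Passing a different seed per subject gives each subject a different random order, as the Design describes.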

Stimuli

Broadband clicks (with cut-off frequencies of 1 KHz and 17 KHz) were generated

using a Masscomp computer3. The clicks were played through a Radio Spares Wide

Range 6" speaker placed on a 1.Sm wooden pole that was pivoted at the manikin and

could be adjusted to any elevation between _900 and +90° in front and behind.

1 See Chapter 3 for methods of pinna moulding.

2 See Appendix 1.

3 Although the Masscomp generates a flat-spectrum stimulus, the stimulus becomes distorted when played through the speaker and a non-flat spectrum is produced. The signal played through the speaker is therefore channelled back into the Masscomp, which then generates an inverse spectrum of the waveform such that, when played through the speaker again, a flat-spectrum click is obtained.
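Footnote 3 describes equalising the loudspeaker by measuring its output and generating an inverse spectrum. A minimal NumPy sketch of one common way to do this, a regularised frequency-domain inverse, is given below. The regularisation constant `eps`, the toy speaker response and the function names are assumptions for illustration; this is not the Masscomp's actual procedure.

```python
import numpy as np


def inverse_filter(measured, eps=1e-3):
    """Design an FIR filter whose spectrum approximately inverts `measured`.

    Uses the regularised inverse conj(H) / (|H|^2 + eps), which stays bounded
    where the measured response has deep notches.
    """
    n = len(measured)
    H = np.fft.rfft(measured)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(H_inv, n)


def apply_filter(signal, fir):
    """Circularly convolve two equal-length signals via the FFT."""
    n = len(signal)
    return np.fft.irfft(np.fft.rfft(signal) * np.fft.rfft(fir, n), n)


if __name__ == "__main__":
    # Toy "speaker" impulse response: direct sound plus one echo.
    speaker = np.zeros(256)
    speaker[0], speaker[1] = 1.0, 0.5
    pre_filtered_click = inverse_filter(speaker)
    # Playing the pre-filtered click through the speaker ~ convolving the two.
    result = apply_filter(speaker, pre_filtered_click)
    print(np.abs(np.fft.rfft(result)).round(3).min())
```

The regularisation trades exact inversion for stability: wherever the speaker's response is much stronger than `sqrt(eps)`, the equalised magnitude is close to 1.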



Azimuth positions were obtained by the use of a rotational device (with 1° azimuth

markings) built into the bottom of the manikin torso.

Brüel and Kjær 4134 0.5" microphones were placed at the eardrum position of the KEMAR manikin. The ear canals were replicated using Zwislocki couplers, each with a length of 2.3 cm. The stimuli were recorded at each of the 25 target locations with the 5 different sets of pinnae and with the infills (no-pinna condition), in a normally reverberant large room.

Stimuli were recorded onto a Betamax video cassette, using a pulse code modulator to digitise the recording. The recorded clicks were sampled into the "Audiomedia" sound editing package on a Macintosh computer and randomised, with a different random order for each subject. A 5-second interstimulus interval of 'room silence' was inserted.

Procedure

Stimuli were played through tubephones (Etymotic ER-2) in a sound-attenuating booth. The 6 conditions were each presented to subjects twice, once for the subject to make azimuth judgements and once to make elevation judgements. The order of vertical and horizontal localization tasks was counterbalanced between conditions: for half of the conditions subjects made azimuth judgements first, and for the remaining half they made elevation judgements first.
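The counterbalancing of task order can be expressed as a small helper. This Python sketch assumes that the conditions starting with azimuth are chosen at random, subject only to an exact half-and-half split; the source does not say how the halves were assigned, and the function name is hypothetical.

```python
import random


def task_orders(n_conditions=6, seed=None):
    """Return the (first task, second task) pair for each condition slot.

    Exactly half the conditions start with azimuth judgements and half with
    elevation judgements; which slots get which order is randomised.
    """
    if n_conditions % 2:
        raise ValueError("an even number of conditions is needed to split in half")
    rng = random.Random(seed)
    half = n_conditions // 2
    firsts = ["azimuth"] * half + ["elevation"] * half
    rng.shuffle(firsts)
    return [(first, "elevation" if first == "azimuth" else "azimuth") for first in firsts]


if __name__ == "__main__":
    for slot, order in enumerate(task_orders(seed=7), start=1):
        print(slot, order)
```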

Subjects were first provided with instructions.4 They were then given response sheets for azimuth and elevation (see Figures 4.1a and 4.1b) at the beginning of the first condition and told in which order to make judgements. After making a set of azimuth and elevation judgements for one condition, there was a 10-minute break to counteract any practice and boredom effects. Response sheets for subsequent conditions were provided at the end of the 10-minute break.

Before the onset of each stimulus, subjects were instructed to return their heads to a forward-facing position by fixating a cardboard spot straight ahead of them. They were told to keep their heads still during the stimulus but were permitted to move their heads after the stimulus to make their response.

4 See Appendix 5A.



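[Response diagram: the head viewed from above at the centre of a circle, with Front at the top and Back at the bottom]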

Figure 4.1a: Response diagram given to subjects for azimuth judgements (actual size). The head and horizontal plane are viewed from above. A separate diagram was used for each response, and subjects were free to put the cross anywhere on or within the circle. Distance was not a variable and was ignored in the results.

[Response diagram: the head in profile, with Up at the top and Down at the bottom]

Figure 4.1 b: Response diagram given to subjects for elevation judgements. The head is

shown in profile and facing the median vertical plane. One diagram was used for each

judgement.


Chapter 4: The Role of the Pinna in Sound Localization

RESULTS

Mean angle errors were calculated for all subjects for azimuth and elevation. The data are presented in Figures 4.2 and 4.3, where the mean values for both uncorrected and front-back corrected judgements are given.

The azimuth judgements produced error values of 16.4° for no pinnae, 16.7° for nonindividualized pinnae and 13.7° for own pinnae. However, analysis of variance (see Table 4.1a) revealed that these differences were not statistically significant. For elevation, the errors were much larger, even when front-back corrected, giving values of 42.9° for no pinnae, 44.4° for nonindividualized pinnae and 41.6° for own pinnae. Again, these differences were not statistically significant (see Table 4.1b).

The number of front-back errors was similar for all three pinna conditions: 8% for no pinnae, 9.2% for nonindividualized pinnae and 7.8% for own pinnae. These small differences were not statistically significant.
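The scoring behind these figures is not spelled out in this section. A common convention, sketched below in Python as an assumption rather than the procedure actually used, takes the smallest angular difference between target and response azimuth and treats a response as a front-back reversal when its mirror image about the interaural axis lies closer to the target; the corrected error then uses the mirrored response. Azimuths are in degrees with 0° straight ahead, and all function names are hypothetical.

```python
def angular_error(target, response):
    """Smallest absolute difference between two azimuths, in degrees (0-180)."""
    d = abs(target - response) % 360
    return min(d, 360 - d)


def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural (left-right) axis."""
    return (180 - azimuth) % 360


def score(target, response, correct_front_back=True):
    """Return (error, is_reversal) for a single azimuth judgement."""
    raw = angular_error(target, response)
    mirrored = angular_error(target, mirror_front_back(response))
    reversal = mirrored < raw
    error = mirrored if (correct_front_back and reversal) else raw
    return error, reversal


def summarize(trials, correct_front_back=True):
    """Mean angle error and percentage of front-back reversals.

    `trials` is a list of (target, response) azimuth pairs.
    """
    scored = [score(t, r, correct_front_back) for t, r in trials]
    mean_error = sum(e for e, _ in scored) / len(scored)
    pct_reversals = 100 * sum(rev for _, rev in scored) / len(scored)
    return mean_error, pct_reversals


if __name__ == "__main__":
    toy = [(0, 10), (40, 140), (140, 150)]  # made-up judgements
    print(summarize(toy, correct_front_back=False))
    print(summarize(toy, correct_front_back=True))
```

Reversals are counted identically in both modes; only the error assigned to a reversed response changes, which is what separates the "uncorrected" and "F/B corrected" series.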



Anova: Single Factor

SUMMARY
Groups                     Count   Sum      Average   Variance
own pinna                  5       67.64    13.53     14.21
nonindividualized pinna    5       80.32    16.06     12.11
no pinna                   5       81.80    16.36     0.62

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        24.23    2    12.12   1.35   0.30      3.89
Within Groups         107.77   12   8.98
Total                 132.00   14

Table 4.1a: Analysis of variance for azimuth judgements. There are no statistically

significant differences between the three different pinna conditions; own,

nonindividualized and no pinna. Data is corrected for front-back errors.

Anova: Single Factor

SUMMARY
Groups                     Count   Sum      Average   Variance
own pinna                  5       208.04   41.61     98.68
nonindividualized pinna    5       221.86   44.37     95.13
no pinna                   5       217.28   43.46     81.93

ANOVA
Source of Variation   SS        df   MS      F      P-value   F crit
Between Groups        19.81     2    9.91    0.11   0.90      3.89
Within Groups         1102.99   12   91.92
Total                 1122.80   14

Table 4.1b: Analysis of variance for elevation judgements. No statistically significant differences are found between the different pinna conditions; own, nonindividualized and no pinna.
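Both tables are standard single-factor ANOVAs, and their F ratios can be recomputed from the SUMMARY rows alone. The Python sketch below does this using the sample variances as printed. For the p-value it uses the closed-form survival function of an F distribution with 2 numerator degrees of freedom, (1 + 2F/d)^(−d/2), which applies here only because there are three groups. Function names are illustrative.

```python
def anova_from_summary(groups):
    """Single-factor ANOVA from (count, mean, sample variance) per group.

    Returns (SS_between, SS_within, df_between, df_within, F).
    """
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


def p_value_two_df(f_ratio, df_within):
    """Upper-tail p-value, valid only when the numerator df is exactly 2."""
    return (1 + 2 * f_ratio / df_within) ** (-df_within / 2)


if __name__ == "__main__":
    # Table 4.1a (azimuth): count, Sum/count, variance for own, nonindividualized, no pinna.
    azimuth = [(5, 67.64 / 5, 14.21), (5, 80.32 / 5, 12.11), (5, 81.80 / 5, 0.62)]
    ssb, ssw, dfb, dfw, f = anova_from_summary(azimuth)
    print(f"SSb={ssb:.2f} SSw={ssw:.2f} F({dfb},{dfw})={f:.2f} p={p_value_two_df(f, dfw):.2f}")
```

Run on the Table 4.1a values, this prints SSb=24.23 SSw=107.76 F(2,12)=1.35 p=0.30; the small difference from the tabulated 107.77 comes from rounding of the printed variances. The same call on the Table 4.1b rows reproduces F ≈ 0.11 and p ≈ 0.90.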


[Figure: Error Values for different Pinna Conditions for Azimuth. Mean angle error (°, axis 0 to 70) for No Pinnae, Nonindividualized Pinnae and Individualized Pinnae; series: Uncorrected and F/B Corrected.]

Figure 4.2: Mean angle error values for azimuth judgements for all subjects combined. Data are both uncorrected and corrected for front-back azimuth errors. Statistically significant differences were found between the uncorrected and front-back corrected data, although no differences were present for the different pinna conditions (ANOVA).


[Figure: Error Values for different Pinna Conditions for Elevation. Mean angle error (°, axis 0 to 70) for No Pinnae, Nonindividualized Pinnae and Individualized Pinnae; series: Uncorrected and F/B Corrected.]

Figure 4.3: Mean angle error values for elevation judgements. The results are both uncorrected and

corrected for front-back azimuth errors. No statistically significant differences were found either between

uncorrected and front-back corrected data, or between the different pinna conditions.



DISCUSSION

This study set out to examine the effect of using individualized, nonindividualized or no pinnae on localization judgements. The most immediately surprising outcome was the size of the angle errors. Several published studies have produced error values well below those obtained here (e.g. Makous & Middlebrooks, 1990; Stevens & Newman, 1936). However, fundamental differences exist between this study and such published studies, which may have an important bearing on the angle error.

Many studies that have reported particularly low error values have been free-field experiments. Makous & Middlebrooks (1990) achieved very low angle errors (±9°), and although Stevens & Newman's (1936) average (±14°) was similar to that of this study, their errors near the midline were around 5°. Both of these studies were conducted in the free field with head movements (but no vision) allowed. Although the contribution of head movements to localization is unresolved, studies have generally found them to increase acuity quite markedly (e.g. Pollack & Rose, 1967; Schlegel, 1994). Manikin recordings cannot reflect the listener's head movements, so the error would be expected to be substantially higher. However, the aim was to examine whether the spectral transformations caused by the pinna are sufficient to aid localization, and head movements would therefore have been a confounding variable. It should also be noted that published studies in which head movements were not incorporated (e.g. Wenzel et al, 1993; Wightman & Kistler, 1989) obtained errors even larger than those reported here (±26° and ±21° respectively).

The recordings in this study were made in a normally reverberant room. This ensured greater ecological validity than an anechoic setting and represented the task of the pinna in everyday hearing conditions. However, Giguere & Abel (1993) demonstrated that reverberation could reduce accuracy, even for sounds with a brief onset (such as clicks). Bekesy (1960) also showed that in a non-anechoic environment the spatial image of a sound became more diffuse depending on the distance of the sound source from the head. Since in this study the loudspeaker was over a metre away and the recordings incorporated reverberation, these may well have been factors contributing to errors of judgement.



For azimuth, the angle errors show a statistically significant improvement for front-back corrected data compared with the uncorrected data. For elevation, however, there is no real improvement when the correction is made. Vertical localization was a much harder task for subjects than horizontal localization, such that front-back correction made little difference. This task difficulty was reflected both in subjects' comments after the experiment and in the results obtained for all three conditions.

Regardless of actual error values, it was expected that using personalised pinnae would produce the greatest acuity. Since pinna shapes are unique, using unfamiliar pinnae may produce subtle differences in sound transformation and reduce localization accuracy. However, for both azimuth and elevation the variation in angle error between conditions was very small, although the general trend shows that using one's own pinnae produces a small improvement over using no pinnae at all. Nevertheless, ANOVA revealed all differences between conditions to be statistically non-significant; that is, no effect of pinna condition (no pinnae, nonindividualized pinnae or individualized pinnae) was found. This result contradicts published findings that argue for the importance of the pinna (over no pinna) in localization (e.g. Batteau, 1967; Freedman & Fisher, 1968; Musicant & Butler, 1984). Partial support, however, comes from Freedman & Fisher (1968): whilst they found an advantage for pinnae over no pinnae, they found no improvement with one's own pinnae over another person's pinnae. Nevertheless, the small sample size in this experiment may be the reason for obtaining very little difference between pinna conditions.

Finding no differences between the three conditions for azimuth was as expected, since for azimuth the main cues to location are obtained from interaural timing and level differences between the two ears. For elevation, the findings contradict those proponents of spectral theory (e.g. Blauert, 1969; Gardner & Gardner, 1973) who showed that pinnae are important for elevation discrimination. Such studies argue that the pinna is useful for elevation in the median plane, whereas this study used elevation stimuli that also varied in azimuth. The effects of the pinna were therefore combined with interaural differences, and their subtle influence may have been masked. However subtle the pinna effects may be, the results show only a small variation (if any) between individualized and nonindividualized pinnae. It therefore appears that personalised pinnae are not required to maintain accurate judgement.


The pinnae also aid front-back azimuth distinction by attenuating high-frequency sound when the source is behind the listener. The number of front-back confusions should therefore be smaller for the conditions where pinnae are used, compared with using no pinnae. However, all three conditions show similar numbers of front-back errors. Furthermore, the range of 7.8% to 9.2% is small compared with some absolute localization studies (e.g. Good & Gilkey, 1996, 30% overall mean; Wenzel et al, 1993, 26% overall mean), although Wightman & Kistler (1989) obtained values similar to those in this study (11% when averaged). The small percentage of azimuth reversals supports the notion that these are high-fidelity and realistic sounds. Also, whilst subjects find it difficult to pinpoint the locus of the stimulus, they can identify the quadrant in which it lies.

This study has demonstrated that, despite a small improvement for individualized pinnae, one's own pinnae do not produce any real benefit, and there is no justification for the time-consuming and costly procedure of constructing individual moulds. It also shows that certain methodological characteristics must be reported alongside angle errors from absolute judgement tasks, since results are strongly affected by a number of variables (for example, whether the task is free-field or recorded and whether the environment is anechoic or reverberant). It may be that in this particular task pinna cues were not being fully utilised by the listener. However, in other tasks, such as interpreting the effects of head movement, they may be critical.



CHAPTER 5

Localization Judgements in the Azimuthal Plane.

ABSTRACT

Absolute auditory localization in the horizontal plane was investigated for sound sources played in a normally reverberant environment, using recordings made with a KEMAR manikin. A 65 dB sharp-onset, flat-spectrum click, with cut-off frequencies at 1 kHz and 17 kHz, was recorded at 9 locations in the frontal horizontal plane. When the recordings were played back through headphones at 4-second intervals, a mean of 1.74 bits of information was transmitted, corresponding to an average of 3.34 source locations that can be reliably judged without error in a 180° arc. These results indicate that even rich localization cues are not enough to generate auditory images that are consistently associated with the objective locations of the stimuli within an azimuthal quadrant.



INTRODUCTION

Developments in our ability to engineer 'virtual' sounds have been accompanied by the construction of new working environments where the operator's auditory world can be completely manufactured and controlled by computer. This raises a number of practical and theoretical issues concerning the human listener's ability to process information delivered through this new medium.

The basis of the technique is the application of spectral transforms to a sound to generate new auditory inputs for the left and right ear. By using appropriate transforms the sound can be localized 'outside of the head', in contrast to normal stereophonic images, which are always perceived, unrealistically, to be lateralized inside the skull (Gelfand, 1990). Unique transforms are applied for each different location of the sound source, a technique that achieves a considerable sense of realism. There are many potential applications of this method, some of which require the listener to locate a sound source accurately in space. Below we consider how effectively a listener can utilise location information when the stimuli are delivered using this approach.
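The spectral-transform technique amounts to filtering one source signal separately for each ear. A minimal NumPy sketch, assuming time-domain head-related impulse responses (HRIRs), is shown below. The toy HRIR pair is hypothetical and encodes only an interaural delay and level difference; measured HRIRs would also carry the pinna's direction-dependent spectral filtering.

```python
import numpy as np


def spatialize(mono, hrir_left, hrir_right):
    """Return a (2, N) array: the mono signal filtered for the left and right ear."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


def toy_hrir_pair(itd_samples, ild_db, length=64):
    """Hypothetical HRIR pair for a source on the listener's left.

    The left (near) ear gets an undelayed unit impulse; the right (far) ear
    gets the impulse delayed by itd_samples and attenuated by ild_db.
    """
    near = np.zeros(length)
    near[0] = 1.0
    far = np.zeros(length)
    far[itd_samples] = 10 ** (-ild_db / 20)
    return near, far


if __name__ == "__main__":
    click = np.zeros(100)
    click[0] = 1.0
    # 13 samples is about 0.29 ms at a 44.1 kHz sampling rate.
    stereo = spatialize(click, *toy_hrir_pair(itd_samples=13, ild_db=6.0))
    print(stereo.shape, int(np.argmax(stereo[1])), round(float(stereo[1].max()), 3))
```

A new pair of filters is needed for every source location, which is why each location receives its own transform.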

Wightman & Kistler (1989) evaluated simulated sound source judgements by asking subjects to identify the apparent location of clicks generated artificially using the subjects' individual 'head-related transfer functions' (HRTFs). These HRTFs had been created earlier by measuring the spectral characteristics of sounds arriving at the subjects' left and right ears from a range of locations. Subjects' mean judgements showed a very high correlation (0.982) with the intended location of the sound, indicating that HRTFs are a high-fidelity means of simulating real sounds. Indeed, similar correlations (0.95 being the lowest) were obtained by Wenzel et al (1993) using nonindividualized HRTFs and wideband noise bursts.

These studies, however, showed that single judgements were often well off-target, even though in the long run the averaged judgements were accurate. In both studies the judgements yielded large mean absolute angle errors: ±21.1° for Wightman & Kistler and approximately ±26° for Wenzel et al. These do not represent a limitation of the simulation technique, because Wenzel et al were able to show that subjects were equally poor in free-field situations with actual sound sources. Wenzel et al also report high rates of front-back confusions for both virtual sound presentations and


free-field stimuli which, if left uncorrected, would further increase the reported error rate. Such azimuth confusions are a common facet of localization studies, particularly where stimuli are located on or near the median plane (Blauert, 1983). Such surprisingly poor performance by listeners is an important characteristic of human sensory judgement and needs to be taken fully into account when designing 'virtual reality' devices for use with human operators.

The present study was driven by the simple question of how many different auditory locations could be identified without confusion by human operators in the horizontal plane. This would be important if the significance of the information coming from a sound source were partly defined by its location. The problem was approached using information theory (see Attneave, 1959; Edwards, 1969): subjects were asked to identify the location of a click presented over headphones or tubephones. Nine different locations were used and the number and type of confusions noted. The information transmission rate was used as an estimate of the maximum number of locations that could be used without confusion (Attneave, 1959, p. 68).
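The information-transmission estimate can be computed directly from a stimulus × response confusion matrix of counts. The Python sketch below uses the standard decomposition T = H(S) + H(R) − H(S, R), with observed frequencies as probability estimates and no small-sample bias correction. It illustrates the general method and is not necessarily the exact calculation used in this thesis; the function names are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def transmitted_information(matrix):
    """Transmitted information T (bits) from a stimulus-by-response count matrix.

    T = H(stimulus) + H(response) - H(stimulus, response).
    """
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    cells = [c for row in matrix for c in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(cells)


def equivalent_categories(bits):
    """Number of categories that could be identified without error: 2 ** T."""
    return 2 ** bits


if __name__ == "__main__":
    perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
    guessing = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
    print(transmitted_information(perfect), transmitted_information(guessing))
    print(round(equivalent_categories(1.74), 2))
```

Perfect identification of three positions transmits log2(3) ≈ 1.58 bits, while responses unrelated to the stimulus transmit none; 2^T converts a score back into an equivalent number of error-free positions.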

Stimuli were digitally recorded using microphones placed at the entrance of the auditory canal of a KEMAR manikin. By playing these back directly to subjects it was possible to avoid the two computational steps of deriving HRTFs and generating new stimuli using these functions. In this way the fidelity of the click presentations should be increased. The manikin was fitted with artificial pinnae which had been moulded in the laboratory from the ears of a volunteer.

The stimuli were presented over headphones for the majority of subject trials. However, we took the opportunity to study the effect of using recently developed high-fidelity tubephones, which introduce sound directly into the ear canal. They therefore create a sound which should not be subject to the unwanted resonance effects that may be generated in the region of the concha when using headphones.

As a further control, the listener's ability to localise these sounds using only monaural versions of the clicks (i.e. one ear only) was explored. The monaural spectra do vary considerably as a function of location, and it was thought possible that locations might be distinguishable purely on the basis of monaural stimulation. The stimuli used were flat-spectrum clicks recorded in a normally reverberant room. This was intended to provide a rich range of cues, including interaural level and phase differences in spectral profiles and temporal cues originating from modest reverberation effects.


METHOD

Subjects

An opportunity sample of 14 untrained undergraduate students and 2 academic staff of varying ages was used. For the tubephone condition, 2 subjects from the headphone trials were used along with 6 members of academic staff, all of whom were inexperienced listeners. Four of this group were again used for the monaural presentations.

Design

Subjects heard 54 recordings of flat-spectrum clicks made with a KEMAR manikin (6 repetitions of each of the 9 positions). These randomly presented stimuli were located at one of nine different source positions varying in azimuth between ±90° in the frontal plane. Subjects were asked to identify at which of the nine locations they perceived the sound source to be located.

There were three different conditions: the main trial using binaural headphone presentation (n=16), binaural ear canal tubephone presentation to check headphone reproduction fidelity (n=8), and monaural presentation, as a control to examine spectral content as a localization cue (n=4).

Stimuli

The basic stimulus was a 65 dB flat-spectrum click, with cut-off frequencies at 1 kHz and 17 kHz. This sharp-onset, broadband signal gives an optimum indication of source location through inclusion of ITD and IID cues (see Moore, 1989). The click was generated using a Masscomp computer1 and played through a Kef C35 8" speaker placed on a speaker stand. Brüel and Kjær 4134 0.5" microphones were placed into the removable pinnae of a KEMAR manikin at the external entrance of the ear canal. Although the microphones were placed inside the head, the Zwislocki coupler was not used. A sound level meter (Brüel and Kjær 2203) with a 0.5" microphone was used to measure the received stimulus intensity at the manikin's ear.

1 See Chapter 3, Method section.

Stimuli were recorded in a normally reverberant room, with recordings made at 9 angles (0°, 23°, 45°, 68°, 90°, 270°, 293°, 315° and 338°), 1 elevation (1.6 m, the ear height of the manikin) and 1 distance of 1.83 m (6 ft). All recordings were made in the front hemisphere to eliminate the problem of front-back confusions.

Stimuli were recorded onto a Betamax video cassette, using a pulse code modulator to digitise the recording. The recorded clicks were transferred to the "Audiomedia" sound editing package on an Apple Macintosh computer to be isolated and re-recorded onto the Betamax cassette. Each of the 6 repetitions of the set of 9 clicks was re-recorded in a fixed randomised sequence, identical for each of the 16 subjects. A 4-second silence was inserted between stimuli to provide adequate response time.
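A fixed randomised sequence of this kind can be generated once from a seeded random number generator and then reused for every subject. The Python sketch below assumes that each of the 6 repetitions is a separate shuffle of the 9 positions (the source does not say whether the 54 clicks were shuffled in blocks or as a whole); the seed value and function name are arbitrary.

```python
import random

POSITIONS = list(range(1, 10))  # response positions 1-9 (see Figure 5.1)
ISI_SECONDS = 4                 # silence inserted between stimuli


def fixed_sequence(repetitions=6, seed=1):
    """Return a reproducible list of positions: one shuffled block per repetition."""
    rng = random.Random(seed)  # same seed -> same sequence for every subject
    sequence = []
    for _ in range(repetitions):
        block = POSITIONS[:]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


if __name__ == "__main__":
    seq = fixed_sequence()
    print(len(seq), seq[:9])
    # Total silence between the 54 clicks, ignoring click duration.
    print((len(seq) - 1) * ISI_SECONDS, "seconds")
```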

Procedure

The sound was channelled into a sound-attenuating booth through headphones (Beyer Dynamic DT 48) or tubephones (Etymotic ER-2), and the volume was adjusted such that the stimulus level received by subjects during playback was identical to that received by the manikin during recording. Each subject was then seated in the booth and given a set of instructions2 and a response sheet which included a diagrammatic representation of the stimulus positions, numbered 1 to 9 (see Figure 5.1). Subjects were told to map any stimuli they might hear behind them to the corresponding position in the frontal plane.

2 See Appendix 5B.



[Response diagram: positions 1 to 9 arranged in a semicircle in front of the listener's head]

Figure 5.1: Response diagram (actual size) given to subjects. For each stimulus heard, subjects were forced to place their judgement at one of the speaker locations (1 to 9). Subjects recorded their actual responses on a separate sheet.



RESULTS

Headphones

Individual subject responses were transcribed into confusion matrices, and the transmitted information was calculated3 for each subject (Table 5.1).

The mean was 1.74 bits, with a standard deviation of 0.17 and a standard error of 0.042. The range of 1.55 to 2.20 bits corresponds to a range of 2.93 to 4.60 (mean 3.34) source positions that subjects could reliably locate without error in a 180° arc.
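The summary statistics and their conversion to a number of positions follow from simple arithmetic. The Python sketch below computes the mean, sample standard deviation and standard error of per-subject scores, and converts the mean to an equivalent number of positions as 2 raised to the mean number of bits, which is how 1.74 bits maps to 3.34 positions (2^1.74 ≈ 3.34). The scores in the usage line are made up, and the function name is illustrative.

```python
import math
import statistics


def summarize_bits(bits_per_subject):
    """Mean, sample SD, standard error and 2 ** mean for per-subject scores (bits)."""
    mean = statistics.mean(bits_per_subject)
    sd = statistics.stdev(bits_per_subject)
    se = sd / math.sqrt(len(bits_per_subject))
    return mean, sd, se, 2 ** mean


if __name__ == "__main__":
    print(summarize_bits([1.5, 1.7, 1.9, 2.1]))  # made-up scores
```

With the reported standard deviation of 0.17 bits and n = 16, the standard error is 0.17/√16 ≈ 0.042 bits, matching the value given above.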

Subject   Bits      Subject   Bits
1         –         9         –
2         –         10        –
3         –         11        1.92
4         –         12        1.56
5         1.58      13        1.55
6         1.74      14        1.65
7         1.83      15        1.70
8         1.66      16        2.20

Table 5.1: Individual information transmission scores (in 'bits') for all 16 subjects.

3 See Appendix 2



Tubephones

Binaural tubephone trials yielded a mean transmission value of 1.72 bits, with a standard deviation of 0.33 bits and a standard error of 0.12 bits, for the 8 subjects tested. This corresponds to reliable judgement of a mean of 3.29 source positions, within 0.02 bits of the headphone condition.

Monaural Control

The control condition gave an average of 0.49 bits, a value little better than chance judgements, corresponding to accurate placement of 1.4 sound sources.

Pattern of Confusions

Figure 5.2 shows the total combined confusion matrix for all 16 subjects. It can be seen that general target areas on the left or right are distinguishable, as is the target in the midline, yet there is very little accuracy for individual positions beyond this.

Figure 5.3 shows a breakdown of angle error at each of the target locations. The lowest error (8.44°) is indeed straight ahead at 0°, and the highest error (31.4°) is at 90° to the right.



RESPONSES (positions 1 to 9)

Actual position   Response counts (occupied cells, left to right)
1                 34  35  15  10  1   1
2                 33  41  14  8
3                 9   31  35  21
4                 4   12  23  48  9
5                 2   3   67  21  1   2
6                 27  32  29  8
7                 2   2   11  30  34  17
8                 1   9   24  45  17
9                 2   12  23  38  20

Transmitted information: 1.74 bits

Figure 5.2: Confusion matrix showing the combined response frequencies for all 16 subjects in the binaural headphone condition.



Position         1       2      3      4      5      6       7       8       9
Angle (°)        −90     −68    −45    −23    0      23      45      68      90
Presentations    96      96     96     96     96     96      96      96      96
Mean error (°)   24.84   –      –      –      8.44   26.72   20.86   14.53   31.4

Total Mean Error = ±19.36°

Figure 5.3: Information matrix showing the mean angle error values for individual source

positions. The frequency of response is given for each stimulus. The mean error for each

source position is given in the extreme right hand column, with the total mean angle error

shown below.
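A frequency-weighted mean angle error of this kind can be computed from each row of the response matrix. The Python sketch below assumes that every response is scored at the nominal angle of the chosen position (the angles listed in Figure 5.3) and that the total mean is weighted by the number of presentations; the function names and the example row are hypothetical.

```python
# Nominal azimuths (degrees) of positions 1-9, as listed in Figure 5.3.
ANGLES = [-90, -68, -45, -23, 0, 23, 45, 68, 90]


def position_error(target_index, response_counts):
    """Mean absolute angle error for one source position (0-based index)."""
    target = ANGLES[target_index]
    total = sum(response_counts)
    weighted = sum(count * abs(ANGLES[j] - target) for j, count in enumerate(response_counts))
    return weighted / total


def overall_error(matrix):
    """Mean absolute angle error over all presentations in a 9 x 9 count matrix."""
    total_error = sum(position_error(i, row) * sum(row) for i, row in enumerate(matrix))
    return total_error / sum(sum(row) for row in matrix)


if __name__ == "__main__":
    # Made-up responses to the straight-ahead click (position 5, index 4).
    row = [0, 0, 0, 1, 2, 1, 0, 0, 0]
    print(position_error(4, row))  # 11.5
```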



DISCUSSION

The results clearly show that absolute judgement of the locations of click sound sources under these conditions is very poor. The average transmission rate of 1.74 bits can be construed as a maximum of 3.34 locations which can be identified without confusion. If the 180° frontal azimuthal plane is divided into 3.34 sections (each 54° wide), we can roughly estimate the absolute angular error of judgement to be ±27°. Thus target stimuli must be a minimum of 54° apart to ensure accurate location judgement.
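The estimate in the previous paragraph is a rough geometric argument: divide the frontal arc into 2^T equal sectors and take half a sector as the error. A short Python sketch of that arithmetic is shown below; the function name is illustrative.

```python
def sector_estimate(bits, arc_degrees=180.0):
    """Return (number of sectors, sector width, +/- error) for T bits over an arc."""
    sectors = 2 ** bits
    width = arc_degrees / sectors
    return sectors, width, width / 2


if __name__ == "__main__":
    sectors, width, half = sector_estimate(1.74)
    print(f"{sectors:.2f} sectors, {width:.1f} deg wide, about +/-{half:.1f} deg")
```

For 1.74 bits this prints 3.34 sectors, 53.9 deg wide, about +/-26.9 deg, which the text rounds to 54° and ±27°.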

Inspection of Figure 5.2 clarifies this point. The centrally located click is identified with little confusion with its neighbours, but clicks to the left or the right side are heavily confused with nearby locations. It is as if the system can distinguish, in absolute terms, left from centre from right, and little more. The very small individual differences across listeners suggest that this may be a reliable aspect of absolute judgement of sound source location using this technique.

These findings deviate markedly from those of Wightman & Kistler (1989), who found a high correlation between the intended and objective locations of simulated sound sources using HRTFs, and not just a left-centre-right distinction. However, Wightman & Kistler's data are derived from points that each represent the centroid of at least 6 judgements. These individual judgements may be considerably off-target but scattered about a central point, so that when averaged they produce a mean judgement close to the actual target position.
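The averaging argument can be made concrete with a small numerical sketch. The six judgements below are invented for illustration (they are not Wightman & Kistler's data), the function names are hypothetical, and the sketch treats azimuth as a linear quantity, which is reasonable only for targets well away from the ±180° wrap-around.

```python
import statistics


def mean_absolute_error(judgements, target):
    """Average of the individual (unsigned) judgement errors, in degrees."""
    return statistics.fmean(abs(j - target) for j in judgements)


def centroid_error(judgements, target):
    """Error of the mean judgement (the centroid), in degrees."""
    return abs(statistics.fmean(judgements) - target)


# Hypothetical judgements (deg azimuth) scattered symmetrically about a 45 deg target
target = 45.0
judgements = [20.0, 70.0, 30.0, 60.0, 40.0, 50.0]

print(mean_absolute_error(judgements, target))  # 15.0
print(centroid_error(judgements, target))       # 0.0
```

Each judgement is off by 15° on average, yet the centroid lands exactly on the target, which is the pattern described in the paragraph above.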

Indeed, Wightman & Kistler obtained a large mean angle error of ±21.1°, consistent with the mean angle error of ±19.36° shown in Figure 5.3. This figure also agrees fairly well with Wenzel et al's (1993) value of ±26°, obtained using both free-field and headphone-delivered stimuli, but is somewhat higher than Stevens and Newman's (1936) value of ±14°; that value, however, was obtained in the free field. The results are also higher than those of the previous chapter, for reasons that are unclear. Absolute judgement of pre-recorded sounds (which involves labelling the location of a sound source) is very much poorer than our ability to say whether two successive sounds originate from the same or separate locations, the 'minimum audible angle' (e.g. Mills, 1958; Perrott, 1984).


Chapter 5: Localization Judgements in the Azimuthal Plane

The comparison of headphones and tubephones showed no significant difference for this particular experimental arrangement. Indeed, they gave almost exactly the same result and served as a useful replication of the main findings, although subjects were much less happy with the tubephones because of the discomfort of wearing them. This similarity of results is somewhat surprising. Appendix 3 shows the response of the tubephones, particularly up to 10 KHz, to be more faithful to the original stimulus than that of the headphones. Between 10 and 13 KHz the headphone response lies closer to the original stimulus, and above this frequency the tubephones again correspond more closely. On this basis one would expect the tubephones to give more accurate judgements; however, because the recordings were made at the external entrance of the meatus, it may be that, of the two, the headphones present the stimulus at the point nearest to the original recording position.

The control condition using only monaural stimuli had to be abandoned after four subjects because listeners were unable to make any useful distinctions among the stimuli. Our initial concern that the head-related spectral differences between the stimuli might act as a cue was not borne out by the results.

The binaural stimuli used were rich in localization cues by virtue of being recorded using a manikin with exact copies of human pinnae in a normally reverberant room. It is clear from the results, however, that these cues are not enough to generate auditory images that are consistently associated with the objective locations of the stimuli within an azimuthal quadrant. Under less controlled conditions, the listener would normally be exposed to the sound for longer than the duration of a click and would be able to take advantage of head movements to facilitate 'triangulation' of the source. In everyday life, such judgements are normally made together with visual stimulation, which gives the impression that auditory absolute judgement of sound source location is better than these results indicate.


Chapter 6: Methodologies: Site of Recording, Playback Method, and Pinna Effects

CHAPTER 6

Methodologies: Site of Recording, Playback Method and Pinna Effects.

ABSTRACT

Recordings made using a KEMAR manikin can be used to assess absolute localization accuracy in the absence of head movements. Judgements using this technique have been shown to be consistently poor (on average ±25°). The following study systematically explores various aspects of the recording and playback techniques used. These include the site of recording (eardrum versus meatus entrance), the playback position (headphones for the meatus entrance and tubephones for the eardrum) and whether or not pinnae were used.

Digital recordings were made of 1-second white noise bursts at 7 azimuths (0° to 180°), all at 0° elevation, and at 7 equally spaced elevations (-45° to +90°) in the median plane. These were made under two conditions: the manikin was fitted either with pinnae or with 'infills', a flat surface with a hole representing the meatus entrance.

Subjects were presented with these pre-recorded noise bursts over tubephones or headphones and were required to judge the sound source location. It was expected that the tubephones would produce the most accurate localization judgements for sources recorded at the eardrum, because the recording and playback positions would correspond. Similarly, for headphones, the most accurate judgements were expected with stimuli recorded at the meatus entrance. However, the expected relationship between recording and playback position was not upheld, and no statistically significant effects of recording or playback position were found.



It was further hypothesized that pinnae would produce a significant reduction in judgement errors for elevation trials, compared with infills, but would have little effect on azimuth, because other factors, such as interaural time and intensity differences, play the principal role there. In the absence of pinnae, statistically significant reductions in accuracy were indeed observed for elevation judgements but not for azimuth judgements.

The fact that results were not affected by recording or playback location demonstrates the robustness of this large mean angle error. These results therefore have implications for human interface systems using virtual auditory sounds, since absolute localization of a sound source seems to be greatly reduced in the absence of certain cues, such as head movements or a visual correlate.



INTRODUCTION

Increasingly, published studies of absolute localization, in addition to those reported in previous chapters, are showing large overall mean angle errors. This seems to conflict not only with other published localization studies but also with our apparent abilities in the free field. Stevens & Newman (1936), for example, investigated localization of pure tones in the free field and found mean angle errors of ±14°. Similarly, Shelton & Searle (1978) obtained mean errors of ±3° using white noise bursts, Makous & Middlebrooks' (1990) average errors were between ±2° and ±20° for broadband sounds, and Schlegel's (1994) averages ranged from ±4° to ±10° for pure tones, white noise and clicks.

By contrast, some studies that have compared headphone-presented and free-field stimuli have shown that error values can be high for both. Wenzel et al (1993) compared headphone-presented broadband stimuli with free-field presentations of the same stimuli. They found an overall mean error of approximately ±26° for headphones and ±24° for the free field, a surprising result in the light of other free-field experiments. Yet Wightman & Kistler (1989) had previously obtained very much the same results in a very similar study.

This inconsistency in mean angle error values may be explained by differences in design and procedure as well as, perhaps, by the inclusion of variables that have hindered the localization process. Therefore, this study systematically investigated various aspects of the recording and playback techniques used.

For this absolute localization task (in the azimuth and elevation planes), a 1-second white noise stimulus was used. This broadband sound provides a signal with a wide range of frequency components to aid perceptual judgement. The accuracy with which these judgements can be made may depend on a number of factors. One fundamental question is whether playing sounds back at the same location at which they were recorded improves accuracy. This will be explored by making recordings with microphones placed both at the meatus entrance and at the eardrum. Playing back the pre-recorded stimuli through either headphones (which deliver sound to the outer ear) or tubephones (which deliver sound to the eardrum) should then provide a correspondence with the original recording position, thus more accurately replicating free-field listening.



The effects of using pinnae or no pinnae were examined with the specific aim of illustrating the pinna's proposed function in elevation discrimination, through improved accuracy when it is present. For this purpose, either non-individualized pinnae modelled on a human subject or 'infills' (a flat surface with a hole representing the meatus entrance) were used.



METHOD

Subjects

An opportunity sample of 9 untrained undergraduate students and academic staff (6 male and 3 female) took part.

Design

Subjects made 168 judgements based on recordings of white noise bursts made using a KEMAR manikin. These stimuli were presented as two identical blocks of trials, each consisting of 84 stimuli: 28 sounds varying in azimuth plus 56 sounds varying in elevation. Subjects listened to one block through headphones and the other through 'in-ear' tubephones, with the order counterbalanced.

The seven azimuth recordings were all made at 0° elevation and were spaced 30° apart between 0° and 180° (the right hemisphere). At each of these locations, recordings were made with and without pinnae and with microphones placed at the internal and external meatus entrances.

Elevation recordings were made at 0° and 45° azimuth at seven locations lying between -45° and +90° at approximately 22.5° intervals. Again, for each location, recordings were made with and without pinnae and with microphones at the internal and external meatus entrances.
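The stimulus counts in the design can be checked by enumerating the recording conditions. This is a sketch under stated assumptions: the elevation values are taken as exact 22.5° steps (the text says "approximately"), and the labels are hypothetical names rather than the file names used in the study.

```python
from itertools import product

AZIMUTHS = [0, 30, 60, 90, 120, 150, 180]                  # deg, all at 0 deg elevation
ELEVATIONS = [-45.0, -22.5, 0.0, 22.5, 45.0, 67.5, 90.0]   # deg, assumed exact 22.5 deg steps
ELEVATION_AZIMUTHS = [0, 45]                               # deg
PINNA_CONDITIONS = ["pinna", "infill"]
MIC_POSITIONS = ["internal", "external"]

# Each tuple: (plane, azimuth, elevation, pinna condition, microphone position)
azimuth_set = [
    ("azimuth", az, 0.0, pinna, mic)
    for az, pinna, mic in product(AZIMUTHS, PINNA_CONDITIONS, MIC_POSITIONS)
]
elevation_set = [
    ("elevation", az, el, pinna, mic)
    for az, el, pinna, mic in product(
        ELEVATION_AZIMUTHS, ELEVATIONS, PINNA_CONDITIONS, MIC_POSITIONS
    )
]
block = azimuth_set + elevation_set  # one block of trials

print(len(azimuth_set), len(elevation_set), len(block), 2 * len(block))
# 28 56 84 168
```

The product of 7 azimuths × 2 pinna conditions × 2 microphone positions gives the 28 azimuth sounds, and 2 azimuths × 7 elevations × 2 × 2 gives the 56 elevation sounds, so two blocks yield the 168 judgements.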

Stimuli

The stimulus was a 1-second white noise burst, ramped over 25 msec, with cut-off frequencies of 20 Hz and 20,000 Hz. The noise burst was generated using the Sussex Synthesizer package on an Apple Macintosh and played through a Radio Spares Wide Range 6" speaker mounted on a 1.5 m pole pivoted at the manikin, which could be rotated through 360° in either the azimuth or the elevation dimension. Brüel and Kjær 4134 ½-inch microphones were placed either at the internal position (Zwislocki coupler) or at the external entrance of the ear canal of a KEMAR manikin, which was fitted either with removable pinnae or with infills (no pinnae). For recordings at the ear canal entrance, the microphones were held in place by the external meatus hole in the pinna or infill, without distorting the rubber mould.
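For readers reproducing the stimulus, a ramped noise burst of this kind can be sketched in a few lines of Python with NumPy. This is not the Sussex Synthesizer procedure used in the study: the 44.1 kHz sample rate, the raised-cosine ramp shape and the peak normalisation are assumptions, the function name is hypothetical, and no explicit 20 Hz to 20 kHz band-pass filter is applied (at 44.1 kHz the upper limit is set by the 22.05 kHz Nyquist frequency instead).

```python
import numpy as np


def ramped_noise_burst(duration_s=1.0, ramp_s=0.025, fs=44100, seed=0):
    """White noise burst with raised-cosine onset and offset ramps.

    duration_s: total burst length in seconds (1 s in the study)
    ramp_s:     length of each ramp in seconds (25 ms in the study)
    fs:         sample rate in Hz (assumed; not stated in the study)
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    noise = rng.standard_normal(n)
    noise /= np.max(np.abs(noise))  # normalise peak amplitude to 1

    n_ramp = int(round(ramp_s * fs))
    # Raised-cosine ramp rising from 0 towards 1 over n_ramp samples
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    envelope = np.ones(n)
    envelope[:n_ramp] = ramp
    envelope[-n_ramp:] = ramp[::-1]  # mirror-image offset ramp
    return noise * envelope


burst = ramped_noise_burst()
print(burst.shape)  # (44100,)
```

The ramps avoid the audible click that an abrupt onset or offset would add to the burst.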

Recordings were made in a normally reverberant room with the manikin and speaker placed near its centre. Stimuli were pulse-code modulated using a Betamax video recorder and then transferred to the "Audiomedia" sound-editing package. The noise bursts were isolated and re-recorded onto the Betamax cassette in a random order, sub-divided into azimuth and elevation sequences, with a 4-second interstimulus interval.
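The randomisation step can be sketched as follows. The function name and stimulus labels are hypothetical, and the sketch assumes the 4-second interstimulus interval is the silent gap between the end of one 1-second burst and the start of the next (the text does not say whether it is measured offset to onset or onset to onset).

```python
import random


def build_sequence(stimulus_ids, stim_dur_s=1.0, isi_s=4.0, seed=None):
    """Shuffle stimuli into a random order and assign onset times (s).

    Assumes isi_s is the silent gap between the offset of one stimulus
    and the onset of the next.
    """
    order = list(stimulus_ids)
    random.Random(seed).shuffle(order)
    step = stim_dur_s + isi_s  # onset-to-onset spacing
    return [(stim, i * step) for i, stim in enumerate(order)]


# Azimuth and elevation sequences are randomised separately, as in the study
azimuth_sequence = build_sequence([f"az{k}" for k in range(28)], seed=1)
elevation_sequence = build_sequence([f"el{k}" for k in range(56)], seed=2)
print(azimuth_sequence[:3])
```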

Procedure

The sound was channelled into a sound-attenuating booth and presented through headphones (Beyer Dynamic DT 48) and tubephones (Etymotic ER-2). Each subject was seated in the booth and given instructions to read through¹. When ready to commence, subjects were given blank diagrams, different for azimuth and elevation (see Figures 6.1a & b), on which to mark the perceived location of each stimulus. Headphone and tubephone presentations were separated by a one-week period, after which debriefing took place.

1 See Appendix Se.



[Response diagram marked Front and Back.]

Figure 6.1a: Response diagram used for azimuth trials. Subjects were instructed to mark a cross at the point of perceived sound origin.

[Response diagram marked Up and Down.]

Figure 6.1b: Response diagram given to subjects for elevation trials. Sound source location was indicated by placing a cross anywhere on the perimeter of the circle (distance was not a factor in this experiment).



RESULTS

The absolute angular error across all variables for azimuth was 66.7° for uncorrected data and 24.6° with front-back errors corrected². For elevation, the mean errors were 55.3° for uncorrected results and 44.6° when front-back corrected. As for azimuth, stimuli presented in the median plane at mirror-image positions in front and behind produce the same interaural time and level differences. Confusions will occur as a result, and front-back correction therefore eliminates these errors.
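Front-back correction is described in Chapter 3 and is not restated here, so the following is only a sketch of one common way of doing it for azimuth: if the response falls in the opposite front/back hemifield from the target, it is reflected about the interaural axis (azimuth θ becomes 180° - θ) before the angular error is computed. The convention assumed is 0° straight ahead, 90° to the right and 180° directly behind; the function names are hypothetical.

```python
def angular_distance(a, b):
    """Smallest unsigned difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_front(azimuth):
    """True if the azimuth lies in the frontal hemifield (0 deg = straight ahead)."""
    azimuth %= 360.0
    return azimuth < 90.0 or azimuth > 270.0


def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural (left-right) axis."""
    return (180.0 - azimuth) % 360.0


def corrected_error(target, response):
    """Angular error after resolving a front-back reversal, if there is one."""
    if is_front(target) != is_front(response):
        response = mirror_front_back(response)
    return angular_distance(target, response)


# A response at 150 deg to a target at 30 deg is a pure front-back reversal
print(angular_distance(30.0, 150.0))  # 120.0 uncorrected
print(corrected_error(30.0, 150.0))   # 0.0 after correction
```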

Tables 6.1a and 6.1b show that more detailed differences exist when the overall means are broken down by presentation method (headphones or tubephones), pinna or no pinna, and internal or external microphone position.

For both headphones and tubephones, a statistically significant difference (ANOVA, see Table 6.2b) was found between elevation judgements with and without pinnae after correction for front-back errors. However, the expected interaction between microphone location and playback method was not found (see Figure 6.2). For azimuth, no statistically significant differences were found between pinna and no pinna, nor was there an interaction between microphone location and playback method (see Table 6.2a).

Figure 6.3 shows the spectra of the original and playback signals (at 0° azimuth, 0° elevation, with pinnae). 'Left External Original' refers to the signal played through the loudspeaker and received (in the left channel) by the manikin with the microphones at the external meatus entrance. 'Left Internal Original' is the same signal recorded by the manikin but with the microphones placed at the eardrum. 'Left Headphone' is the stimulus received by the manikin when the pre-recorded signal is played back through headphones. Similarly, 'Left Tubephone' refers to the signal received when it is played back through tubephones (see Figure 6.4 for the headphone/tubephone and internal/external microphone placements).

The tubephones give a high-fidelity reproduction of the original signal. The headphones, on the other hand, show poor reproduction of both the externally and internally recorded signals.

2 See Chapter 3, section on front-back correction.



Headphones

AZIMUTH                   mean    se                  mean    se
(uncorrected)   PINNA     64.6    16.3    INTERNAL    65.39   16.5
                NO PINNA  66.1    18.4    EXTERNAL    65.29   18.2
(f/b corrected) PINNA     23.71   2.2     INTERNAL    24.52   2.4
                NO PINNA  23.19   2.3     EXTERNAL    22.37   2.1

ELEVATION                 mean    se                  mean    se
(uncorrected)   PINNA     48.06   3.7     INTERNAL    52.30   4.5
                NO PINNA  59.06   5.5     EXTERNAL    54.82   5.1
(f/b corrected) PINNA     40.19   3.0     INTERNAL    45.71   4.1
                NO PINNA  48.89   3.0     EXTERNAL    43.36   4.4

Table 6.1a: Mean angle error values for headphone presentation with pinnae/no pinnae and internal/external microphone placement for both azimuth and elevation judgements. Statistically significant results (ANOVA, F = 3.96, df = 27) are in red.



Tubephones

AZIMUTH                   mean    se                  mean    se
(uncorrected)   PINNA     70.79   17.9    INTERNAL    68.16   17.5
(f/b corrected) PINNA     25.58   2.6     INTERNAL    26.51   2.6

ELEVATION                 mean    se                  mean    se
(uncorrected)   PINNA     54.83   3.8     INTERNAL    59.14   3.7
(f/b corrected) PINNA     42.45   3.1     INTERNAL    45.80   3.3

[NO PINNA and EXTERNAL rows not recoverable.]

Table 6.1b: Mean error values for tubephone presentation of stimuli. Results are shown for pinnae/no pinnae and internal/external microphone positions for azimuth and elevation judgements. Statistically significant differences (ANOVA, F = 3.96, df = 27) are given in red.



Factor  Type   Levels  Values
p/np    fixed  2       1 2
hp/tp   fixed  2       1 2
i/e     fixed  2       1 2

Analysis of Variance for Azim

Source      DF   SS        MS      F     P
p/np         1     8.80     8.80   0.10  0.753
hp/tp        1    36.16    36.16   0.41  0.525
i/e          1    97.26    97.26   1.10  0.298
hp/tp*i/e    1     3.30     3.30   0.04  0.847
Error       51  4495.62    88.15
Total       55  4641.14

MEANS

p/np   N   Azim
1      28  24.646
2      28  23.854

hp/tp  N   Azim
1      28  23.446
2      28  25.054

i/e    N   Azim
1      28  25.568
2      28  22.932

Table 6.2a: Analysis of variance table for azimuth with front-back confusions corrected. "p/np" represents 'pinna' or 'no pinna' conditions (with or without), "hp/tp" refers to headphone or tubephone playback method and "i/e" represents internal or external microphone placement. No statistically significant effects were found.
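The F ratios in Table 6.2a follow from the tabled sums of squares: each mean square is SS/DF, and F is the effect mean square divided by the error mean square. A minimal Python check (the function name is hypothetical):

```python
def anova_f(ss_effect, df_effect, ss_error, df_error):
    """F ratio = (SS_effect / DF_effect) / (SS_error / DF_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# Table 6.2a (azimuth, front-back corrected): every effect has 1 DF
AZIMUTH_SS = {"p/np": 8.80, "hp/tp": 36.16, "i/e": 97.26, "hp/tp*i/e": 3.30}
SS_ERROR, DF_ERROR = 4495.62, 51

for source, ss in AZIMUTH_SS.items():
    print(f"{source:10s} F = {anova_f(ss, 1, SS_ERROR, DF_ERROR):.2f}")
# p/np       F = 0.10
# hp/tp      F = 0.41
# i/e        F = 1.10
# hp/tp*i/e  F = 0.04
```

These reproduce the tabled F values. Applied to the pinna row of Table 6.2b, anova_f(1462.3, 1, 43537.4, 107) gives about 3.59 rather than the tabled 3.96, so that entry may be worth checking against the original analysis output.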



Factor  Type   Levels  Values
p/np    fixed  2       1 2
hp/tp   fixed  2       1 2
i/e     fixed  2       1 2

Analysis of Variance for Elev

Source      DF    SS        MS       F     P
p/np          1    1462.3   1462.3   3.96  0.050
hp/tp         1      18.0     18.0   0.04  0.834
i/e           1     156.3    156.3   0.38  0.537
hp/tp*i/e     1       0.0      0.0   0.00  0.997
Error       107   43537.4    406.9
Total       111   45174.0

MEANS

p/np   N   Elev
1      56  41.323
2      56  48.550

hp/tp  N   Elev
1      56  44.536
2      56  45.338

i/e    N   Elev
1      56  46.118
2      56  43.755

Table 6.2b: Analysis of variance table for elevation, front-back corrected data. "p/np" represents pinna condition (with or without), "hp/tp" refers to headphone or tubephone playback method and "i/e" represents internal or external microphone position. Statistically significant effects are shown in bold type.


[Figure: Relationship between Recording and Playback Location for Azimuth and Elevation. Angle error (axis 0 to 40) by recording location (Internal, External) for Headphones (f/b corrected) and Tubephones (f/b corrected).]

Figure 6.2: The effect of the internal and external recording positions with headphone and tubephone playback for front-back corrected angle errors. ±2 standard errors are shown in each case.


[Figure: Spectra of Original and Playback Signals. Level (axis -60 to 60) against frequency for Left External Original, Left Internal Original, Left Headphone and Left Tubephone.]

Figure 6.3: Spectra of original (internal and external) stimuli with comparisons of playback through headphones and tubephones (for sound recorded at the internal position).


[Diagram panels: Left External Original (signal received by microphone); Left (internal) Original (signal received by microphone); Left Tubephone (signal playback); Left Headphone (signal playback); Recorded Sound.]

Figure 6.4: Diagrams of original recording and playback positions. '(Internal) Original' shows the microphone at the eardrum location of the manikin. 'External Original' is the signal received by the manikin with the microphone at the meatus entrance. The playback positions (to human subjects) show the tubephones at the eardrum and the headphones close to the meatus entrance.



DISCUSSION

The overall mean angle errors are, once again, surprisingly large. Although they agree with studies of a similar nature (e.g. Wightman & Kistler, 1989; Wenzel et al, 1993), they do not correspond to our abilities in the free field (Stevens & Newman, 1936; Shelton & Searle, 1978; Makous & Middlebrooks, 1990; Schlegel, 1994). Furthermore, this finding is not unique to one condition: the errors are large both with and without pinnae and regardless of recording and playback position.

The physical characteristics of the stimuli at different recording and playback positions offer some support for this finding. The original 'internal' signal should produce a close match to the tubephone-presented stimulus, since the recording and playback positions are similar. Indeed, this was the case up to 7.5 KHz, but between 7.5 and 15 KHz there is some separation of the two signals, of the order of 4-8 dB (see Figure 6.3). The fidelity of tubephone reproduction is, however, far greater than that of headphones. Headphones would not be expected to match the original (internally recorded) signal, since the recording and playback positions differ. But they also fail to produce a high-fidelity reproduction of the externally recorded original stimulus, where the recording and playback positions do correspond.
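A level difference of the kind quoted above (4-8 dB between 7.5 and 15 KHz) can be estimated from two recordings of the same stimulus by comparing their power within a frequency band. The sketch below is one simple way to do this with NumPy's FFT; it is not the analysis used to produce Figure 6.3, and the function name, the stand-in signals and the band-averaging choice are assumptions.

```python
import numpy as np


def band_level_difference_db(x, y, fs, f_lo, f_hi):
    """Level of x relative to y (dB) within the band f_lo <= f < f_hi.

    Power is averaged over the FFT bins in the band before taking the
    ratio, so narrow spectral notches do not dominate the result.
    """
    n = min(len(x), len(y))
    power_x = np.abs(np.fft.rfft(x[:n])) ** 2
    power_y = np.abs(np.fft.rfft(y[:n])) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return 10.0 * np.log10(power_x[band].mean() / power_y[band].mean())


fs = 44100
rng = np.random.default_rng(0)
original = rng.standard_normal(fs)   # 1 s of stand-in "original" noise
playback = 0.5 * original            # a copy at half the amplitude
print(round(band_level_difference_db(original, playback, fs, 7500, 15000), 2))  # 6.02
```

Halving the amplitude quarters the power, so the sketch reports 10 log10(4) ≈ 6.02 dB in any band.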

Yet the hypothesis that headphones would match the externally recorded signal was based on the assumption that headphones would deliver the sound to the point of recording. In reality, they sit a small distance away from it and therefore produce additional resonance around the concha. Since the concha causes important reflections and cancellations before the sound enters the ear canal, this 'double travel' through the pinna can have a substantial effect. Indeed, Figure 6.3 shows that headphone playback considerably boosts lower frequencies (between 2 and 4 KHz) and attenuates higher frequencies (between 4 and 10 KHz), perhaps indicating the effect of additional concha resonance. Playing the externally recorded sounds through tubephones should actually give greater accuracy than headphones (although no difference is apparent), because even though some travel through the ear canal is lost in this process, its effect on the sound is minimal.

So the listener should be more accurate with the tubephones, regardless of recording position. But as the experimental data show, there are no real differences between headphones and tubephones. Even where slight differences do exist, the trend is opposite to that expected, with headphones having the edge on accuracy. Perhaps the subtle spectral improvements of tubephone delivery are being nullified by overall task difficulty.

Pinnae, as expected, reduced the number of errors for elevation judgements. The result was replicated in both the headphone and tubephone conditions, demonstrating its robustness. This statistically significant finding supports the hypothesis that the pinnae are necessary for discriminating locations in this plane. Another important function of the pinnae is distinguishing front from back in the horizontal plane.

The number of front-back azimuth reversals was large: approximately 43% with pinnae and 33% without pinnae. This difference is not statistically significant, but the results do not reflect the expected pattern. One would expect the pinnae to reduce the number of front-back azimuth confusions, in accordance with a number of studies that have demonstrated the pinnae to be functional in distinguishing front from back (e.g. Musicant & Butler, 1984).

There was also no difference between headphones and tubephones, both giving front-back errors of around 40%. This finding is surprising, especially in view of the apparent 'double travel' of sound through the pinna when using headphones. It may be that the frequencies altered by this 'double travel' are either too high or too low to be of any significance to the listener.

This study has produced a surprising set of results in view of the large angle error values obtained. However, the fact that the results were not affected by recording or playback location demonstrates the robustness of this finding. There may nevertheless have been some factors whose overriding influence inflated the overall error values. The lack of a visual reference, and a possible discrepancy between the recording and playback environments, could have substantially increased task difficulty. Playing back the signals in conditions identical to those in which they were recorded could prove to be an essential element in our ability to localize accurately. In addition, confusion may have resulted from inadvertent head movements. Although subjects were instructed to keep their heads as still as possible, even slight head movements are likely. These may conflict with the signal, which would not undergo the appropriate transformations relative to the subject's head movement, since the sounds were recorded on a still manikin.


Nevertheless, the results obtained, although large, remain similar to the findings of other studies investigating absolute localization (e.g. Wenzel et al, 1993, errors of ±26°; Wightman & Kistler, 1989, ±21°). Clearly, problems with accuracy would arise in man-machine interfaces incorporating simulated 3-dimensional sounds where head movements, and perhaps visual references, are not incorporated.

Chapter 7: Effect of Interstimulus Delay and Response Method on Localization Accuracy

CHAPTER 7

The Effect of Interstimulus Delay and Response Method on Localization Accuracy.

ABSTRACT

Studies of absolute localization test the accuracy of listeners' judgements of the position of discrete, isolated sounds. Yet most experiments present more than one stimulus, which may conflict with the concept of absolute judgement. Reported studies, however, rarely examine the point at which the memory of a stimulus affects subsequent judgements by introducing a constraint and therefore artificially reducing error values, a concept first introduced by Siegel & Siegel (1972).

This study looks at the effect of different interstimulus durations (2, 5, 8 and 12 seconds) on localization acuity. Subjects heard white noise bursts recorded using a KEMAR manikin with microphones located at the eardrum position. Sounds were recorded in the frontal horizontal plane at 5 locations between 0° and 90° and played back over headphones in a sound-attenuating booth.

No consistent effect (ANOVA) was observed across interstimulus interval conditions. Therefore, memory did not appear to be involved in the response to current stimuli.

The method of eliciting subject responses was also investigated. Half of the subjects used a forced-choice (categorical) method and half were given a non-categorical method, allowing them to respond anywhere in a 360° horizontal plane. The categorical response method produced a statistically significant (p ≤ 0.01, ANOVA) improvement in judgement accuracy over the non-categorical method. Therefore, part of the explanation for the high angle errors clearly lies with the method of collecting responses.



INTRODUCTION

Considerable variability in absolute angular error has been found for studies of localization reported in previous chapters and in the literature (e.g. Stevens & Newman, 1936, ±14°; Wightman & Kistler, 1989, ±21°; Makous & Middlebrooks, 1990, ±1.5°-16°). A number of influential factors have been identified in the process of ascertaining angle error, such as the use of pinnae, the stimulus type and whether sounds are simulated or free-field. However, one important and unspecified variable in absolute localization is the point at which the task becomes absolute, as opposed to relative. Since relative judgements are more accurate (e.g. Mills, 1958; Perrott, 1984), this may be a confounding factor.

Absolute localization is judging the position of a single sound source without using any cues from a previous sound. But the critical point at which judgements cease to be relative is not established, since published studies either use very different interstimulus times or fail to report them at all. Studies discussed in earlier chapters used either a 4- or 5-second interstimulus delay, adequate time for the subject to respond yet brief enough to hold their attention. However, it may be that even a 4-second interval is too short to prevent one sound from affecting the response to subsequent sounds.

The present study was motivated by the original attempts to establish localization accuracy in terms of information theory (see Chapter 5). Miller (1956) asserted the suitability of information theory for establishing the channel capacity of absolute judgement. Information analysis, according to Miller, is unaffected by memory span or retention interval, factors which may make absolute judgement indistinguishable from relative or 'paired associate' tasks. Absolute judgement, he argued, is limited by the amount of information in a stimulus, whereas memory span is affected by the number of items. Thus, on this view, information-theoretic measures are unaffected by practice effects.

However, Siegel and Siegel (1972) criticise Miller's view. They demonstrate that judgement accuracy is limited not by the amount of information in a signal but by the failure of subjects to hold several successive stimuli in memory. Thus, if signals are closely spaced, information transmission will increase. This reveals a weakness of information theory studies: their measurements can be contaminated by the memory of recent judgements. The aim of this experiment was therefore to study this effect by introducing variable delays that should affect memory to different extents.

Another influence whose effect has not been measured is the method of eliciting subject responses. Published studies have used a number of methods, such as head pointing (Makous & Middlebrooks, 1990), reporting target co-ordinates (Wenzel et al, 1993) and naming speaker positions (Stevens & Newman, 1936). All of these studies have reported very different angle error values. Experiments discussed in Chapters 3 to 5 used a mixture of categorical (forced-choice) and non-categorical (a blank diagram for subjects to mark, with no guidance as to target locations) methods. A comparison of these studies alone shows a tendency for categorical response methods to produce lower angle errors (approximately ±10-16°, Chapter 3) than studies using non-categorical techniques (±26° in Chapter 5), although Chapter 4 used a categorical method and produced a mean error of ±19°. Evidently, these studies are not directly comparable, and a controlled comparison is required.

Hake & Garner (1951) used an information analysis approach to study the effects of discrete and continuous scales on error rates. They found that subjects who responded by identifying discrete steps made fewer errors than those given a free choice, particularly when the number of possible choices was small (of the order of five).

This experiment will use five sound sources located in the right frontal azimuth plane. Subjects will be provided either with a forced-choice (categorical) response method or with a non-categorical method, in which no guidance as to target angle is given. It is expected that categorisation will yield significantly lower angle errors than non-categorical judgements.

The interstimulus delay times used will span a range from just long enough to allow subjects to respond to as long as maintained subject attention allows. The briefer the interstimulus duration, the more likely it is that the memory of the previous sound will remain and constrain subsequent judgements. The stimulus heard by subjects will be a 1-second white noise burst, since pure tones, as used by Perrott (1984) and Mills (1958), are more difficult to judge by virtue of having only a single frequency component rather than a broad range.


METHOD

Subjects

Ten subjects were taken from an opportunity sample. All were undergraduate students, 6 male and 4 female.

Design

The stimulus recordings were made at 5 azimuths in the right hemisphere (0°, 23°, 45°, 68° and 90°), all at 0° elevation. Subjects heard the recordings in sequences of 25 stimuli (5 repetitions of the sound at each location). The interstimulus delay times for these sequences were 2, 5, 8 and 12 seconds. Four sequences in total were presented in a fixed quasi-random order, which enabled each listener to begin at a different point in the sequence. In addition, the order in which the sequences were presented was varied. The ordering of sequences and stimulus starting points is shown below in Table 7.1.

Subject Number   Sequence Order        Stimulus Start Point
1, 7             2s, 5s, 8s, 12s       1, 7, 13, 19
2, 8             5s, 2s, 12s, 8s       2, 8, 14, 20
3, 9             8s, 12s, 5s, 2s       3, 9, 15, 21
4, 10            12s, 8s, 2s, 5s       4, 10, 16, 22
5                2s, 8s, 12s, 5s       5, 11, 17, 23
6                5s, 8s, 2s, 12s       6, 12, 18, 24

Table 7.1: Starting points for all ten subjects, for both the sequence order (different interstimulus delays) and the stimulus position within each sequence. The sequence orders are a quasi-random selection from the 24 (4!) possible orderings of the four delays: each interstimulus delay is used at least once as the starting sequence, and two further randomly picked orders were added to make six in total, the number required to cover the full range of stimulus start points.
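Read row by row, Table 7.1 follows a simple pattern: the subjects in row g begin the four sequences at stimuli g, g + 6, g + 12 and g + 18. The sketch below reproduces that pattern. As an assumption not stated in the text, it treats a late start point as a rotation of the fixed 25-stimulus order (wrapping back to the beginning), so that every listener still hears all 25 stimuli; the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

N_SEQUENCES = 4  # 2, 5, 8 and 12 s interstimulus delays
STEP = 6         # offset between successive start points, read off Table 7.1


def start_points(group: int) -> List[int]:
    """1-based start points across the four sequences for table row 1-6."""
    if not 1 <= group <= 6:
        raise ValueError("group must be between 1 and 6")
    return [group + STEP * i for i in range(N_SEQUENCES)]


def rotate(order: Sequence[T], start: int) -> List[T]:
    """Begin a fixed stimulus order at a 1-based start point, wrapping round (assumed)."""
    i = start - 1
    return list(order[i:]) + list(order[:i])


if __name__ == "__main__":
    for g in range(1, 7):
        print(g, start_points(g))
```

Under this reading, the six rows cover start points 1 to 24, which is why six sequence orders were needed rather than four.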



Procedure

Stimuli were played over headphones (Beyer Dynamic DT 48) in a sound-attenuating booth. Following instructions¹, subjects were either provided with a response sheet and asked to mark the perceived location with a cross (the non-categorical method) or asked to choose one of the specified locations (the categorical method) (see Figures 7.1a, b & c). Between sequences, subjects had a brief break, during which new response sheets were provided.

¹ See Appendix 5D.


[Figure 7.4 plot: Errors by Target Angle for the Non-Categorical Response Method. x-axis: Target Angle (°), at 0, 23, 45, 68 and 90; y-axis: angle error (°), 0 to 80. Series: 2 secs, 5 secs, 8 secs, 12 secs, random responses.]

Figure 7.4: Errors by target angle for all interstimulus delay times for the non-categorical response condition. There were no statistically significant differences in response accuracy between the target angles within each interstimulus interval condition (ANOVA). Random response values for a full 360° range of possible responses are included to show chance levels.



DISCUSSION

This study set out to investigate whether interstimulus duration had an effect on localization accuracy. It was hypothesized that the shorter the interstimulus delay, the more likely it was that judgements would be constrained (and thus made more accurate) by the memory of previous stimuli. The results were analysed separately for the categorical and non-categorical response conditions.

Both conditions show an improvement for the 5-second interstimulus interval. If this effect were the result of improved retention of previous stimuli, one would expect any interstimulus interval below 5 seconds to show the same pattern, since memory would only be enhanced for more closely spaced sounds. This pattern is not evident, however: the 2-second condition shows a drop in judgement accuracy compared with the 5-second condition. It may be that, although the 2-second condition does produce strong retention that would constrain subsequent judgements, the response time is too short for subjects to respond accurately. Many subjects reported that there was insufficient time to record a response and prepare for the next stimulus. Although all subjects responded to every stimulus, they may fairly often have been forced to guess in order to keep up with the sequence.

What does show a clear effect, however, is response method. As Figure 7.2 shows, the categorical (forced-choice) method produces a large improvement over the non-categorical method, and this improvement is maintained at every interstimulus delay time. This supports Hake & Garner's (1951) finding that there are fewer errors with scales using discrete steps. The categorical method constrains judgements in two ways. First, it gives the subject an awareness of the range of source locations. Second, many errors that would occur with a non-categorical method become 0° with a categorical method, since the subject is forced to choose the nearest (often correct) category. Eliciting responses in this way rules out potentially huge errors by eliminating judgements that might otherwise be placed well outside the range. Hence its validity as a method of assessing true localization ability is questionable.

The breakdown of judgements into errors by target angle (Figures 7.3 & 7.4) shows surprisingly different trends for the two response methods. The categorical method shows increased accuracy at 0°. Sounds emanating from around the midline typically show the greatest accuracy (Stevens & Newman, 1936; Mills, 1958; Shelton & Searle, 1978; Middlebrooks et al, 1989; Makous & Middlebrooks, 1990), because a fixed angular difference produces a larger interaural timing difference near the midline (Gelfand, 1990). In addition, stimuli presented at 45° and 68°, which show the largest errors in this study, lie within the cone of confusion, where judgement accuracy decreases.
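The point that a fixed angular step changes the interaural time difference most near the midline can be illustrated with Woodworth's spherical-head approximation, ITD(θ) = (a/c)(θ + sin θ) for 0° ≤ θ ≤ 90°. This model is not used in the thesis; it is a textbook approximation swapped in here for illustration, and the head radius a = 8.75 cm and speed of sound c = 343 m/s are assumed values.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a distant source; valid for 0-90 degrees azimuth."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def itd_step_us(azimuth_deg: float, step_deg: float = 1.0) -> float:
    """ITD change (microseconds) when the source moves step_deg further from the midline."""
    return (woodworth_itd(azimuth_deg + step_deg) - woodworth_itd(azimuth_deg)) * 1e6


if __name__ == "__main__":
    for target in (0, 23, 45, 68, 89):
        print(f"{target:>2} deg: {itd_step_us(target):.1f} us per degree")
```

Under these assumptions, a 1° step at the midline changes the ITD by roughly twice as much as the same step close to 90°, which is consistent with the greater acuity reported for midline sources.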

However, Siegel & Siegel (1972) argue that stimuli at the ends of a scale are judged more accurately than those in the middle. They term this phenomenon 'end-point effects', which arise because subjects who perceive a sound beyond either extreme of the scale attribute their judgement to the nearest end point. The end points therefore have a higher chance of being judged correctly, and this may explain the pattern shown by the categorical data in this study. However, the categorical method used in Chapter 4 also showed the greatest accuracy at 0°, where 0° was not an end point but lay in the middle of the scale. It should also be noted that in this experiment only one end point shows a dramatic increase in accuracy. The 90° end point shows values similar to those at 23°, so the cone of confusion may well be the major reason for the increased errors in the middle of the range.

What is clear from the inclusion of the random responses is that the categorical data are not artifactual. Random responses were generated that corresponded either to the available category choices (categorical random responses) or to any value within the target range of 0° to 90° (non-categorical random responses). The similarity of these two curves shows that the values obtained are not a facet of categorisation.

The non-categorical response method shows much higher angle errors than the categorical response method. But whilst the values seem high, the graph shows that the results are well below chance. In terms of where the greatest accuracy occurs, the pattern is almost the opposite of that in the categorical condition. These results are puzzling, but one explanation for the increased error at 0° may be that several 0° targets were judged to be to the left of the subject. Several subjects made judgements between 270° and 359°, despite the fact that no sources were located in this region. A 270° judgement constitutes an error of 90°, which, when made more than once or twice per subject, produces a large increase in the overall error for the 0° target position.
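The arithmetic above depends on measuring angle error the short way round the circle, so that a 270° response to a 0° target scores 90° rather than 270°. A minimal sketch of that calculation follows; the function names are hypothetical, and azimuth is assumed to run clockwise from 0° (front) through 90° (right) to 359°.

```python
def angle_error(target_deg: float, response_deg: float) -> float:
    """Smallest absolute angular difference between target and response, in degrees (0-180)."""
    diff = abs(target_deg - response_deg) % 360
    return min(diff, 360 - diff)


def mean_error(target_deg: float, responses) -> float:
    """Mean angle error over a subject's responses to one target position."""
    errors = [angle_error(target_deg, r) for r in responses]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(angle_error(0, 270), angle_error(0, 359), mean_error(0, [0, 270]))
```

For example, `mean_error(0, [0, 270])` gives 45°: a single left-hemisphere judgement is enough to dominate the mean for the 0° target.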



These misplaced judgements are perhaps due to presenting the sounds in one hemisphere only. The right ear, receiving all the direct sound, becomes somewhat saturated, whilst the left ear, receiving no direct sound, becomes over-sensitive. Thus, when a sound is presented in the midline, it is artificially 'shifted' to the left because of an imbalance between the two ears. A second possible reason may be response bias. Subjects may feel uncomfortable placing all judgements to one side, and so expect some sounds to the left. Sounds located at 0° are the closest to these expectations and thus are placed incorrectly by the listener (NB: no stimuli other than 0° sources were placed in the left quadrant).

The manipulation of interstimulus interval in this study has failed to reveal a clear point below which judgement accuracy increases substantially. Memory therefore did not appear to influence responses to the current stimuli. Part of the explanation for the high angle errors, however, clearly lies with the method of collecting responses. The introduction of discrete categories constrains error both by limiting the subject's response range and by informing subjects of the actual target locations. Categorisation clearly reduces the ambiguity in judging 'virtual reality' sounds, and so it is a factor that should be considered when attempting to establish 'true' localization ability.


Chapter 8: Effect of Stimulus Type and Response Method on Judgement Accuracy

CHAPTER 8

The Effect of Stimulus Type and Response Method on Judgement Accuracy.

ABSTRACT

This experiment first attempts to evaluate the most effective stimulus for localization tasks by comparing three different stimulus types, and second, examines two different methods of eliciting subject responses.

Subjects were required to localize clicks, white noise or speech in one of three response conditions. The first was an unguided method, in which subjects were given a blank diagram with 10° markings around the outside. The second used the same diagram, but subjects were seated in front of a marker strip with 10° azimuth markings that matched the 10° markings on the response sheet. This was to establish whether subjects had difficulty relating 3-dimensional perception to a 2-dimensional response sheet. The third was a categorical method, in which subjects were forced to choose from a number of specified locations.

The speech stimulus and clicks produced the greatest accuracy for azimuth, showing a statistically significant (p ≤ 0.05, related t-test) improvement over white noise. However, no significant differences were found for elevation, where subjects found the task in general considerably more difficult.

The categorical response method gave large improvements in judgement accuracy for all three stimulus types in the horizontal plane, but had little effect on elevation. The lack of any effect of either stimulus type or response method on elevation judgements seems to indicate that subtle elevation cues are being lost. The overall task difficulty may be overriding any other effects, such as response method or stimulus type. Nevertheless, for azimuth judgements, the means of eliciting responses and the stimulus type are clearly important in determining localization accuracy.



INTRODUCTION

Experiments have shown that a listener's ability to localize a sound source depends on a number of factors. In previous (unpublished) work, we found that the overall mean angle error can range from ±8° to ±27° in different experiments. Response method¹ seemed to play an important part. However, since previous studies confounded different response methods with different stimulus types, these error values are difficult to evaluate.

Localization studies using pure tones are thought to produce the greatest error, since only one frequency is present. Depending on that frequency, either interaural timing differences (ITDs) or interaural level differences (ILDs) are effectively utilised, but not both. Noise or speech, on the other hand, contain many frequency components (or, in the case of white noise, all of them), allowing both ITDs and ILDs to play a part in the localization process and so increasing judgement acuity.

Makous & Middlebrooks (1990) looked at free-field localization using broadband

sounds which varied in both the vertical and horizontal planes. Subjects reported the

location of the sound source by orienting their head towards it. The angle errors

obtained were between 1.5° and 16.3° when averaged across all six subjects.

Begault & Wenzel (1991) used a speech stimulus that was filtered using nonindividualized Head Related Transfer Functions (HRTFs). They asked subjects to identify azimuth and elevation by reporting spatial co-ordinates (e.g. "up 30, right 10") and found the mean angle error to be ±27° across all subjects. They compared this value with their earlier study (Wenzel et al, 1991), in which broadband noise produced approximately the same mean angle errors. Whilst these error values are much larger than those of Makous & Middlebrooks, there is some evidence (Freedman & Fisher, 1968) that head movements increase accuracy, a variable eliminated by Wenzel's technique.

Wenzel et al (1993) demonstrated that free-field and headphone listening were comparable, and that both produced mean angle errors much greater than those obtained by Makous & Middlebrooks.

¹ See Chapter 7.



HRTFs, therefore, have not proved to be an effective means of simulating free-field listening. The use of a KEMAR manikin may replicate the free-field situation more accurately by removing the extra step of deriving the stimulus, a complex variable that could be contributing to the large errors found with HRTFs. Furthermore, dummy-head recordings eliminate head movements, making it possible to investigate the location information derived from spectral information alone.

Stimulus type also appears to influence localization accuracy. A number of different stimulus types have been used in the experiments of earlier chapters. Chapter 3 used clicks, and a mean error of ±19° was obtained using a category system for subject responses. In Chapter 4, white noise bursts were used in the azimuth and elevation planes, with mean angle errors of ±24° to ±27° for azimuth and ±26° to ±30° for elevation with an unguided response method.

It is unclear which of the broadband stimuli (clicks, speech or white noise) provides the most effective cues for localization in either plane. If familiarity plays a part (as demonstrated by Coleman, 1962, in distance judgement tasks), one might expect speech to give the greatest accuracy. On the other hand, white noise should logically give the greatest accuracy, since its flat, full-bandwidth spectrum offers the auditory system a greater opportunity to analyse the reflections and refractions caused by the pinna and the environment.

To address this problem, the following experiment uses a controlled comparison of stimulus types. A 1-second white noise burst, a click and a speech stimulus, 'chips' (chosen for its broad-spectrum components), will be compared in the azimuth and elevation dimensions. The experiment also uses three different response methods: (1) categorical, (2) unguided (no indication of stimulus locations) and (3) unguided with an azimuth judgement aid.

The judgement aid used in the third method was a marker strip placed in the booth around the subject at eye level. The strip had equally spaced markings on it, allowing subjects to refer to marked points on their response diagram. Using the marker strip would establish whether subjects have difficulty mapping a 3-dimensional perception of the sound onto a 2-dimensional response sheet. If use of the judgement aid produced significantly fewer errors, it would suggest that listeners need some point of reference between the booth surroundings and their response diagram. An equivalent marker strip for elevation was not incorporated at this stage, since the necessary dimensions would have been more awkward for the subject to view and so less helpful. If the marker strip proved to be effective, a second phase would be run with an elevation judgement aid included.



METHOD

Subjects

An opportunity sample of 27 undergraduate and postgraduate students (20 males and 7 females) was used. Ages ranged from 18 to 46.

Design

Two experiments were presented:

1. Azimuth

2. Elevation

Within each experiment were the following conditions:

a) 3 response methods

b) 3 stimulus types

c) 7 locations (2 repetitions of each).

Each experiment consisted of three trials, each recorded using a different sound stimulus: white noise, a click, or speech (the word "chips"). The stimulus recordings were made using a KEMAR manikin with microphones placed at the internal meatus entrance. Azimuth recordings were made at 7 azimuths, all at 0° elevation, spaced 30° apart between 0° and 180°. Elevation recordings were made at 0° and 45° azimuth, at seven locations lying between −45° and +90° at 22.5° intervals.

Stimuli were randomised within each trial, and the two experiments were presented to subjects in quasi-random order.

Response Methods

All subjects listened to the stimuli through tubephones, but were divided into three

response groups.

1. Non-categorical (marker).

For the first condition, subjects were given a blank response diagram with 10° markings around the circumference (see Figures 8.1a & b), which corresponded to 10° markings on the judgement aid erected in the booth. This allowed subjects to relate what they perceived 'in space' to the diagram in front of them. No information about stimulus target location was given in this condition.

2. Non-categorical (no marker).

The second group was given the same diagram, but no judgement aid was

used. Again, no stimulus target locations were indicated.

3. Categorical.

The final group used a categorical response method, which involved choosing a response position from a number of specified (actual) target locations provided diagrammatically (see Figures 8.2a, b & c).

Stimuli

The stimuli were a 1-second white noise burst, a click and a speech stimulus. The click was generated using a Masscomp computer² and played through a speaker (Radio Spares Wide Range 6"). The white noise stimulus was produced using the Sussex Synthesizer software on an Apple Macintosh computer. The speech stimulus was a recording of a male voice reciting the word "chips". Microphones (Brüel and Kjær 4134, 0.5-inch) were placed at the inside end of a Zwislocki coupler, at the eardrum position of a KEMAR manikin fitted with (nonindividualized) pinnae.

For azimuth, all stimulus sounds were recorded using a ring, 3m in diameter, with the

7 speakers (Radio Spares, SQ, 3") placed around it in the specified locations. The

manikin was placed in the centre of this ring, with the speakers at its ear height.

Elevation recordings were made using a single speaker (Radio Spares Wide Range 6")

attached to a wooden pole at a distance of 1.5m. This was pivoted at the manikin and

rotated to the relevant position.

² The stimulus produced by the Masscomp had a non-flat spectrum; when played through the speaker, the resulting stimulus had a flat spectrum.



Each set of stimuli (white noise, clicks and speech, for azimuth and elevation) was digitally recorded onto digital audio tape and then edited using "Audiomedia". The stimuli were divided by type and subdivided into azimuth and elevation trials, and a 6-second interstimulus delay, using 'room silence', was added. The trials were then randomised and re-recorded onto the Betamax cassette. Stimuli were then played back to subjects over tubephones (Etymotic ER-2).

Procedure

Subjects listened to the stimuli whilst seated in a sound-attenuating booth. Each subject was provided with response sheets corresponding to their response group (see Figures 8.1a & b and 8.2a, b & c) and given instructions (see Appendices 5E.1 & 5E.2). Each subject began at a different point in the experimental trial sequence. Thus subject 1 would start with azimuth and one of the three stimulus types, while subject 2 would begin with elevation and a different stimulus type³. Between each of the three azimuth trials (one for each stimulus type) and the three trials for elevation, the experimenter entered the booth and gave the subject a new response sheet (6 in total). This was intended to reduce practice and boredom effects.

³ See Appendix 4 for trial sequences.



[Response sheet headed Subject, Sequence and Trial, with a circular azimuth diagram marked at 10° intervals from 0° to 350°.]

Figure 8.1a: Blank response diagram for azimuth with 10° markings around the circumference, for subjects in the non-categorical condition, either with or without the judgement aid (marker strip).



[Response sheet headed Subject, Sequence and Trial, with a circular elevation diagram marked at 10° intervals.]

Figure 8.1b: Blank response diagram for elevation, with 10° markings. This was the

response sheet provided for the non-categorical response condition.



[Diagram: Azimuth Positions, showing the target azimuths from 0° to 180°.]

Figure 8.2a: Diagram of stimulus locations for azimuth. Each subject was provided with

this guidance diagram for reference throughout the categorical condition of the experiment.



[Diagram: Elevation Positions, showing the target elevations from −45° to 90°.]

Figure 8.2b: Diagram showing the elevation stimulus locations. This diagram was provided

throughout the categorical response condition for reference.



[Response sheet headed Subject, Sequence and Trial, with a two-column table (Stimulus, Response) numbered 1 to 14.]

Figure 8.2c: Response table used in conjunction with the categorical response condition.

The same sheet was provided for all azimuth and elevation trials (6 in total per subject).



RESULTS

Overall mean angle errors, corrected for front-back errors, were calculated for the three stimulus types and for each response method (see Tables 8.1a & b below). For azimuth judgements, the categorical method of response produced a considerably lower error rate than either of the non-categorical methods (significant at p ≤ 0.05 for all three stimulus types, unrelated t-test), but not for elevation.

To establish the effect of having knowledge of the speaker locations, responses in the

non-categorical condition were grouped into categories identical to those used in the

categorical method (see Tables 8.2a & b).
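One way to band non-categorical responses into the categorical response set is to assign each response to the nearest target azimuth, measured around the circle. The thesis does not state the band edges, so nearest-category assignment is an assumption here, as is applying it before any front-back correction; the category list matches the seven azimuths (0° to 180° in 30° steps) described in the Method, and the function names are hypothetical.

```python
AZIMUTH_CATEGORIES = [0, 30, 60, 90, 120, 150, 180]  # target azimuths in degrees


def angle_error(target_deg: float, response_deg: float) -> float:
    """Smallest absolute angular difference, in degrees (0-180)."""
    diff = abs(target_deg - response_deg) % 360
    return min(diff, 360 - diff)


def nearest_category(response_deg: float, categories=AZIMUTH_CATEGORIES) -> int:
    """Category closest to a free response; ties go to the earlier category in the list."""
    return min(categories, key=lambda c: angle_error(c, response_deg))


def grouped_error(target_deg: float, response_deg: float) -> float:
    """Angle error after banding the response into the categorical response set."""
    return angle_error(target_deg, nearest_category(response_deg))
```

With this rule, any response on the left (181° to 359°) is mapped to 0° or 180°, whichever is closer (ties going to 0°), which mimics the way a forced choice removes judgements that fall outside the target range.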

In contrast to response method, stimulus type makes very little difference. This is true for all but one condition: azimuth judgements using the non-categorical response method (with no judgement aid) show a statistically significant increase in error when judging noise, compared with clicks or 'chips'. All other differences between stimulus types are minimal and not statistically significant using related t-tests.



The subjects were divided into two groups, each of which heard identical sets of stimuli but used a different method of response. The first group used a categorical system, a forced choice from a diagrammatic representation of the actual target positions (see Figure 7.1b). Letters were assigned to each of the 5 target positions, and subjects noted their judgements on a grid (see Figure 7.1c). The second group were given a blank diagram representing the 360° horizontal plane surrounding the subject (see Figure 7.1a), and listeners were asked to mark a cross at the perceived location of the sound source. Hence, in this non-categorical response method, no guidance was given as to the range or exact positions of the speakers.

Stimuli

A 1-second white noise burst, with cut-off frequencies of 0 and 19 kHz, was generated using the Sussex Synthesizer package on an Apple Macintosh II.

Sounds were recorded in a normally reverberant room using microphones (Brüel and Kjær 4134, 0.5-inch) placed at the internal meatus entrance of a KEMAR manikin, using a Zwislocki coupler. Stimuli were played in the frontal azimuth plane through a speaker (KEF 8") placed on a stand at the manikin's ear height. The speaker was moved around the manikin, at a fixed distance of 1.5 m, to the specified locations.

The interstimulus delay comprised 'room silence', identical to the ambient sound of the room in which the stimuli were recorded, so that the background sound on the tape remained the same for the duration of the trial. This was done by recording the quiet background noise in the recording room for 15 seconds, using the same manikin set-up as for the stimulus recordings.

The 25 stimulus sounds were digitally recorded onto a Betamax cassette and then transferred to the Audiomedia sound editing software on an Apple Macintosh II computer. The sounds were edited to produce four playlists, each with a different interstimulus interval.
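Each playlist can be thought of as the same 25 one-second stimuli separated by gaps of room silence whose length depends on the condition. The sketch below computes onset times and total durations under the assumption that a gap follows every stimulus except the last. It also checks each gap against the length of the recorded room silence, read here as 15 seconds. The constant and function names are hypothetical.

```python
STIMULUS_S = 1.0                  # 1-second white noise burst
N_STIMULI = 25                    # stimuli per sequence
ROOM_SILENCE_S = 15.0             # length of the recorded background noise (as read here)
DELAYS_S = (2.0, 5.0, 8.0, 12.0)  # interstimulus delays for the four playlists


def onset_times(delay_s: float) -> list[float]:
    """Onset of each stimulus in seconds from playlist start, with gaps only between stimuli."""
    if delay_s > ROOM_SILENCE_S:
        raise ValueError("delay is longer than the recorded room silence")
    return [i * (STIMULUS_S + delay_s) for i in range(N_STIMULI)]


def playlist_duration(delay_s: float) -> float:
    """Total playlist length: the last onset plus one stimulus."""
    return onset_times(delay_s)[-1] + STIMULUS_S


if __name__ == "__main__":
    for d in DELAYS_S:
        print(f"{d:>4} s delay: {playlist_duration(d):.0f} s in total")
```

Under these assumptions the playlists run from 73 s (2-second delay) to 313 s (12-second delay), and every gap can be cut from a single stretch of recorded room silence.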


[Response diagram representing the 360° horizontal plane around the listener, labelled Front and Back.]

Figure 7.1a: Blank response diagram given to subjects for the non-categorical response

condition. Subjects marked a cross on a separate diagram for each sound heard. The

diagram is actual size.

[Guidance diagram: the five target positions, labelled A to E.]

Figure 7.1b: Guidance diagram (half actual size) provided to subjects in the categorical response condition. Subjects used the diagram in a forced-choice paradigm.



[Response sheet headed Subject, Sequence and Start No., with a two-column table (Stimulus, Response) numbered 1 to 25.]

Figure 7.1c: Response sheet used in conjunction with the guidance diagram (Figure 7.1b). Subjects indicated a response letter from the guidance diagram next to each stimulus number. A new response sheet was provided for each interstimulus delay sequence.



RESULTS

Angle errors were calculated for all subjects; for the non-categorical response condition, these values were front-back corrected². The mean overall error was ±20° (front-back corrected) for the non-categorical method and ±8.2° for the categorical method.
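The front-back correction itself is described in Chapter 3 and is not reproduced here. A common convention, assumed in this sketch rather than taken from the thesis, is to reflect a response about the interaural (left-right) axis, mapping θ to 180° − θ, and to score whichever of the original or reflected response lies closer to the target. The function names are hypothetical.

```python
def angle_error(target_deg: float, response_deg: float) -> float:
    """Smallest absolute angular difference, in degrees (0-180)."""
    diff = abs(target_deg - response_deg) % 360
    return min(diff, 360 - diff)


def front_back_mirror(azimuth_deg: float) -> float:
    """Reflect an azimuth about the interaural axis (0 <-> 180; 90 and 270 unchanged)."""
    return (180 - azimuth_deg) % 360


def front_back_corrected_error(target_deg: float, response_deg: float) -> float:
    """Error after forgiving a front-back reversal (assumed correction rule)."""
    return min(angle_error(target_deg, response_deg),
               angle_error(target_deg, front_back_mirror(response_deg)))
```

The reflection leaves left-right errors untouched: a 270° response to a 0° target still scores 90°, which is why the misplaced 0° judgements discussed in this chapter inflate the error even after correction.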

When response methods are compared, there is a large difference, as shown in Figure 7.2. Errors for the non-categorical method were typically more than double those of the categorical method, a statistically significant result (p ≤ 0.01, ANOVA) for all interstimulus delay times (see Tables 7.1a-e for ANOVA tables showing all interstimulus intervals combined and each interstimulus interval separately).

Figures 7.3 & 7.4 show mean angle errors averaged to give an overview of values for

each sequence (interstimulus delay time) at each target angle, for both the categorical

and free response methods. The two response methods show very different patterns

of judgement error, with the categorical condition showing increased accuracy at 0°

and 90° and the free condition giving the greatest accuracy between 45° and 68°.

Chance errors were added by generating a random range of possible response values

within the Excel analysis spreadsheet. Ten sets of random values were calculated and

the average taken. The results are superimposed onto each graph of judgement error

by target angle for the different response methods (see Figures 7.3 & 7.4).
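The chance curves can be reproduced with a short simulation. The thesis states only that ten sets of random responses were generated in Excel and averaged; the number of responses per set, the uniform distributions and the random seed used below are assumptions, and the function names are hypothetical. Categorical chance draws from the five target angles, while non-categorical chance draws uniformly from 0° to 90° (as in Figure 7.3) or over the full 360° (as in Figure 7.4).

```python
import random

TARGETS = [0, 23, 45, 68, 90]  # target azimuths in degrees


def angle_error(target_deg, response_deg):
    """Smallest absolute angular difference, in degrees (0-180)."""
    diff = abs(target_deg - response_deg) % 360
    return min(diff, 360 - diff)


def draw_categorical(rng):
    return rng.choice(TARGETS)


def draw_range_0_90(rng):
    return rng.uniform(0, 90)


def draw_full_circle(rng):
    return rng.uniform(0, 360)


def chance_error(target_deg, draw, n_sets=10, n_per_set=5, seed=1):
    """Average over n_sets simulated sets of the mean error of random responses to one target."""
    rng = random.Random(seed)
    set_means = []
    for _ in range(n_sets):
        errors = [angle_error(target_deg, draw(rng)) for _ in range(n_per_set)]
        set_means.append(sum(errors) / n_per_set)
    return sum(set_means) / n_sets


if __name__ == "__main__":
    for t in TARGETS:
        print(t, round(chance_error(t, draw_categorical), 1),
              round(chance_error(t, draw_range_0_90), 1),
              round(chance_error(t, draw_full_circle), 1))
```

With many draws per set, the full-circle curve settles near 90° for every target, whereas the 0° to 90° and categorical curves are lowest at 45° and highest at the ends of the range.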

² See Chapter 3, "Methodologies".



Factor | Levels | Values
category | 2 | 1, 2
angle | 5 | 1, 2, 3, 4, 5
interval | 4 | 1, 2, 3, 4

Analysis of Variance for ISI

Source | DF | Seq SS | Adj SS | Adj MS | F | P
category | 1 | 8602.8 | 8602.8 | 8602.8 | 88.58 | 0.000
angle | 4 | 637.9 | 637.9 | 159.5 | 1.64 | 0.165
interval | 3 | 417.0 | 417.0 | 139.0 | 1.43 | 0.235
Error | 191 | 18549.4 | 18549.4 | 97.1
Total | 199 | 28207.1

Means for ISI

category | Mean | Stdev
1 | 8.224 | 0.9855
2 | 21.341 | 0.9855

angle | Mean | Stdev
1 | 18.130 | 1.5582
2 | 14.908 | 1.5582
3 | 13.496 | 1.5582
4 | 14.274 | 1.5582
5 | 13.105 | 1.5582

interval | Mean | Stdev
2-sec | 16.332 | 1.3937
5-sec | 16.090 | 1.3937
8-sec | 13.082 | 1.3937
12-sec | 13.626 | 1.3937

Table 7.1 a: Analysis of variance table for all interstimulus intervals combined showing the

differences between different response methods. A statistically significant improvement is

found for the different (category) response methods (shown in bold type).
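The F ratios in Table 7.1a follow directly from the tabulated sums of squares: each mean square is SS/DF, and F is the effect mean square divided by the error mean square. The check below recomputes them from the table values only; it does not re-run the ANOVA on the raw data, which are not reproduced here.

```python
ERROR_DF, ERROR_SS = 191, 18549.4
TOTAL_SS = 28207.1

# (degrees of freedom, sequential sum of squares) for each factor in Table 7.1a
EFFECTS = {
    "category": (1, 8602.8),
    "angle": (4, 637.9),
    "interval": (3, 417.0),
}


def f_ratios(effects, error_df, error_ss):
    """Return {factor: F} with F = (SS / DF) / (SS_error / DF_error)."""
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (df, ss) in effects.items()}


if __name__ == "__main__":
    for name, f in f_ratios(EFFECTS, ERROR_DF, ERROR_SS).items():
        print(f"{name:<8} F = {f:.2f}")
```

The same arithmetic applies to the per-interval tables, for example 1632.49 / (4056.48 / 44) ≈ 17.71 for response category in the 12-second condition. If SciPy is available, `scipy.stats.f.sf(F, df_effect, df_error)` gives the corresponding p-value.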



Factor | Levels | Values
category | 2 | 1, 2
angle | 5 | 1, 2, 3, 4, 5

Analysis of Variance for 2-sec

Source | DF | Seq SS | Adj SS | Adj MS | F | P
category | 1 | 3424.61 | 3424.61 | 3424.61 | 43.78 | 0.000
angle | 4 | 151.11 | 151.11 | 37.78 | 0.48 | 0.748
Error | 44 | 3442.17 | 3442.17 | 78.23
Total | 49 | 7017.89

Means for 2-sec

category | Mean | Stdev
1 | 8.056 | 1.769
2 | 24.608 | 1.769

angle | Mean | Stdev
1 | 16.500 | 2.797
2 | 18.580 | 2.797
3 | 13.540 | 2.797
4 | 15.480 | 2.797
5 | 17.560 | 2.797

Table 7.1b: Analysis of variance table for the 2-second interstimulus interval condition. A statistically significant difference (bold type) is found for response method (category) but not between the different target locations.




Analysis of Variance for 5-sec

Source | DF | Seq SS | Adj SS | Adj MS | F | P
category | 1 | 1958.1 | 1958.1 | 1958.1 | 11.93 | 0.001
angle | 4 | 263.0 | 263.0 | 65.7 | 0.40 | 0.807
Error | 44 | 7221.1 | 7221.1 | 164.1
Total | 49 | 9442.2

Means for 5-sec

category | Mean | Stdev
1 | 9.832 | 2.562
2 | 22.348 | 2.562

angle | Mean | Stdev
1 | 20.060 | 4.051
2 | 15.980 | 4.051
3 | 16.420 | 4.051
4 | 13.090 | 4.051
5 | 14.900 | 4.051

Table 7.1c: Analysis of variance table for the 5-second interstimulus interval condition. A statistically significant improvement (bold type) for the categorical response method was noted.


Factor | Levels | Values
category | 2 | 1, 2
angle | 5 | 1, 2, 3, 4, 5

Analysis of Variance for 8-sec

Source | DF | Seq SS | Adj SS | Adj MS | F | P
category | 1 | 1791.61 | 1791.61 | 1791.61 | 26.79 | 0.000
angle | 4 | 538.42 | 538.42 | 134.61 | 2.01 | 0.109
Error | 44 | 2942.80 | 2942.80 | 66.88
Total | 49 | 5272.83

Means for 8-sec

category | Mean | Stdev
1 | 7.096 | 1.636
2 | 19.068 | 1.636

angle | Mean | Stdev
1 | 17.460 | 2.586
2 | 12.690 | 2.586
3 | 13.580 | 2.586
4 | 14.320 | 2.586
5 | 7.360 | 2.586

Table 7.1d: Analysis of variance table for the 8-second interstimulus interval condition. A statistically significant difference (bold type) was found for response method only.




Analysis of Variance for 12-sec

Source | DF | Seq SS | Adj SS | Adj MS | F | P
category | 1 | 1632.49 | 1632.49 | 1632.49 | 17.71 | 0.000
angle | 4 | 368.15 | 368.15 | 92.04 | 1.00 | 0.419
Error | 44 | 4056.48 | 4056.48 | 92.19
Total | 49 | 6057.12

Means for 12-sec

category | Mean | Stdev
1 | 7.912 | 1.920
2 | 19.340 | 1.920

angle | Mean | Stdev
1 | 18.500 | 3.036
2 | 12.380 | 3.036
3 | 10.445 | 3.036
4 | 14.205 | 3.036
5 | 12.600 | 3.036

Table 7.1e: Analysis of variance table for the 12-second interstimulus interval condition. Statistically significant improvements (bold type) were found for the categorical response method.


[Figure 7.2 plot: Angle Errors of Categorical and Non-Categorical Response Methods. x-axis: Interstimulus Delay (2, 5, 8 and 12 secs); y-axis: mean angle error (°), 0 to 35. Series: categorical mean error, non-categorical mean error, with error bars.]

Figure 7.2: Chart showing the mean angle errors of the categorical and non-categorical response methods, broken down by interstimulus delay time. Statistically significant differences (p ≤ 0.01, ANOVA) exist between the two response methods at all interstimulus delay times, as shown by the ±2 standard error bars. There are no significant differences between the different interstimulus intervals within each response condition.

[Figure 7.3 plot: Errors by Target Angle for the Categorical Response Method. x-axis: Target Angle (°), at 0, 23, 45, 68 and 90; y-axis: angle error (°), 0 to 80. Series: 2 secs, 5 secs, 8 secs, 12 secs, categorical random responses, non-categorical random responses.]

Figure 7.3: Errors by target angle for each of the interstimulus delay times for the categorical response condition. No statistically significant differences were found in judgement accuracy between target angles within each interstimulus interval condition (ANOVA). Random response values (0° to 90° range) are given for the categorical and non-categorical response conditions to illustrate chance levels.


Non-Categorical (marker) 24.1 Non-Categorical (no marker) 22.1

24.8 20.2

25.1

27.4 24.7 23.2

15.0

Table 8.1a: Summary of overall mean angle errors for front-back corrected azimuth results. There are 9 different subjects in each condition: non-categorical (with a reference marker strip in the sound booth, for azimuth only), non-categorical (with no reference marker strip) and categorical.

Non-Categorical (azim. marker only) 45.9 Non-Categorical (no marker) 46.2

B·3 43.5

42.1

43.2

46.4 44.3 43.3

Table 8.1b: Overall mean angle errors for elevation trials. The subjects in each condition are the same as those in the azimuth trials. Responses are corrected for front-back errors. The unusually large angle errors are little better than chance.


Tables 8.2a & b: Mean angle errors for the non-categorical condition showing 'actual' and 'grouped' responses. 'Grouped responses' are mean errors of the actual non-categorical data when banded into categories identical to those used in the categorical method.


DISCUSSION

This experiment looked at the difference in localization accuracy of three stimulus types: white noise, clicks and speech. This was done using three different response methods: non-categorical, non-categorical with a response-mapping guide (a marker strip in the horizontal plane) and categorical. Tables 8.1a & b show the mean angle errors for the three stimulus types in each response condition.

For azimuth judgements, clicks and speech produced significantly fewer errors than white noise. For elevation, however, white noise produced the greatest accuracy, although the difference was very small. This was surprising, since subjects reported clicks to be substantially more difficult to localize than either white noise or speech. In fact, speech was deemed to be the stimulus most easily localized in both the vertical and horizontal planes, although these reports were not upheld by the numerical results.

For azimuth, the categorical response method produced statistically significant improvements for all three stimulus types (clicks, noise and speech) compared with the non-categorical response method (for front-back corrected data). This result was as expected. In the non-categorical method, subjects had no indication of the target locations and were given a free response range of 360°. Indeed, in this condition subjects placed some stimuli well outside the true range, increasing the mean angle error considerably. With the categorical method, subjects were aware of the target locations and their response choices were confined to those locations. Therefore, the large errors that arise from judging outside the actual range of speakers do not occur.

The effect of having knowledge of speaker positions can be tested by grouping the non-categorical data into categories identical to those used in the categorical method. Thus, non-categorical responses are grouped into 30° bands that fall symmetrically about the target location. This should reduce the mean error values, since categories 'pull in' outliers to a fixed, correct response. However, in this case there was only a 1° improvement (see Tables 8.2a & b) in the overall mean angle errors for all three stimulus types. This clearly demonstrates that responses in the non-categorical condition were off-target by more than half a category width (15°), and therefore fell outside the 'correct' category grouping. Thus, prior knowledge of the target positions can almost halve the mean angle error.
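The grouping analysis can be sketched as follows. The category centres, the 30° spacing and the trial values are hypothetical (they are not the thesis's speaker layout or data), responses are assumed to be front-back corrected already, and each response is assigned to the nearest category centre. The sketch shows why banding helps only responses that lie within half a band (15°) of their target.

```python
# Sketch: banding continuous (non-categorical) responses into the categories
# of the categorical method, then recomputing the mean absolute error.
# ASSUMPTIONS: hypothetical category centres 30 degrees apart (bands of
# +/-15 degrees); responses already front-back corrected; invented trials.

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def band(response: float, centres) -> float:
    """Return the category centre nearest to a continuous response."""
    return min(centres, key=lambda c: angular_difference(response, c))


def mean_error(trials) -> float:
    return sum(angular_difference(r, t) for t, r in trials) / len(trials)


def grouped_mean_error(trials, centres) -> float:
    return sum(angular_difference(band(r, centres), t) for t, r in trials) / len(trials)


centres = [0.0, 30.0, 60.0, 90.0]                                   # hypothetical
trials = [(30.0, 38.0), (60.0, 84.0), (0.0, 20.0), (90.0, 70.0)]    # (target, response)
print(mean_error(trials), grouped_mean_error(trials, centres))
```

Here the continuous mean error is 18.0°, but banding raises it to 22.5°: only the response 8° from its target lands in the correct band, while the three responses 20° to 24° off are pulled into a neighbouring category.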

Elevation judgements were not improved by the categorical method. Thus, unlike the azimuth result, the lack of improvement when the non-categorical elevation responses were grouped into categories was not surprising. The angle errors for elevation were large and subjects clearly had difficulty in distinguishing the individual positions, a fact that seems to far outweigh the method of response. For elevation judgements, there are no interaural time and intensity cues that provide strong location information. We rely more heavily on pinna cues for judging elevation; these are finer and less robust cues, which may be more easily affected by other variables. The absence of a visual correlate, or a loss of information in the recording process, may have weakened the available elevation cues.

The marker strip was expected to improve the accuracy of judgements (in the horizontal plane), since it should strengthen the link between 3-dimensional perception and a 2-dimensional response sheet. It seems reasonable to assume that perceiving a 3-dimensional sound and attempting to pinpoint its location accurately is not well served by providing subjects with a 2-dimensional response diagram. Problems may arise when trying to relate an 'immersive' perception to a 'god's-eye' judgement. Hence the judgement aid was erected at eye level, so that subjects could locate the sound source, attribute a location to it (using the markings on the strip) and then correlate that with the appropriate point on their response diagram. However, there was no difference in error between using the marker strip or not, which argues against the hypothesis that listeners have problems matching perception with response.

This study has shown that, for azimuth judgements, the shorter-duration stimuli produced the greatest accuracy. Subject head movements may be confounding longer signals (noise), thus confusing the subject and increasing the error. For shorter stimuli, such as clicks, the stimulus is too brief for head movements to have any noticeable effect. For elevation, however, no such effect was noted; here it is likely that the overall task difficulty has overridden more subtle effects such as differences between stimulus types. These effects may also be compounded by the fact that noise provides only spectral information, which is useful only for elevation. This would also explain why noise increased the error for azimuth judgements but not for elevation judgements.



In the horizontal plane, the substantial improvement in accuracy when using a categorical response method shows that this method removes much of the ambiguity inherent in localization tasks. Indeed, mean angle errors are almost halved using this technique, a finding which could extend to the large error values reported in similar published studies (e.g. Begault & Wenzel, 1991; Wenzel et al., 1991) and which could explain the small errors reported by other studies (e.g. Stevens & Newman, 1936; Shelton & Searle, 1978). Thus it is questionable whether using a categorical system really tests the ability of the auditory system to resolve spatial ambiguity. This makes response method an important factor when determining localization accuracy for use in an 'open-field' virtual interface.


Chapter 9: Live Relay using the KEMAR Manikin

CHAPTER 9

'Live' Relay using the KEMAR Manikin.

ABSTRACT

Presenting pre-recorded sounds made using a KEMAR manikin has not produced the

localization accuracy of some reported free-field studies (e.g. Makous &

Middlebrooks, 1990; Stevens & Newman, 1936). A number of variables have been

investigated in an attempt to reduce the consistently large angle errors obtained,

although these refinements have had little effect.

One major factor not yet examined is the recording process. The steps involved in

creating a pre-recorded play list may cause some information to be lost, despite the

fact that high-fidelity digital recordings are made.

The following experiment compares a 'live' relay of stimuli presented in the horizontal

plane, with recorded presentations of these live trials. Stimuli were played to the

manikin and subjects heard these sounds as they were presented, but through

tubephones channelled into a sound-attenuating booth. It was expected that a direct

relay of the sounds, cutting out the recording stage, would increase the judgement

accuracy. The pinna, whose role for judgements in the horizontal plane is unresolved,

was also investigated. This was done by presenting the trials either through

nonindividualized pinnae, modelled on a listener, or no pinnae (infills).

The mean angle errors were similar for both the live and recorded conditions: ±24.7° and ±19.9° respectively, when corrected for front-back errors. The difference was not statistically significant at the 0.05 level (related t-test). These values are high, but consistent with previously reported experiments. It is clear from these results that the recording



INTRODUCTION

Studies reported in earlier chapters have identified a number of variables that may be

influential (to a varying extent) in the error values obtained in localization

experiments. These have included response method, stimulus type, relationship

between recording and playback location, and whether or not pinnae are used.

Although the experiments have been successful in identifying such factors as

influential in the localization process, there are still variables that remain unidentified.

It is the lower angle errors reported by some studies that have brought about an awareness of other variables which are likely to be playing an important role. However, the studies presenting lower error values use different methodologies, which in themselves have helped to highlight where these other contributors may lie. For example, the studies by Makous & Middlebrooks (1990) and Stevens & Newman (1936), which report overall mean errors of ±9° and ±14° respectively, are both free-field studies in which head movements are allowed, thus implicating head movements as a contributory factor.

However, it is first necessary to examine all other variables that may be playing a part

in judgement accuracy, before moving on to free-field studies to incorporate head

movements or indeed, vision. One other possible contributor within a pre-recorded

method is the recording process. Many free-field studies give markedly smaller errors

than the results obtained in previous chapters. Perhaps important information is lost

by recording the sounds and playing them back from a tape, despite the fact that they

are high-fidelity digital recordings.

The aim of this experiment was therefore to conduct a live relay through the manikin

such that subjects heard stimuli as they happened, and not from a pre-recorded tape.

The sounds were also presented in a richer auditory environment by including some

low-level background sounds in an attempt to establish whether this aided

localization. However, to ensure that any observed improvement was a direct result

of cutting out the recording stage, and not the addition of background noise, the live

trials were also digitally recorded and presented from a tape as a comparison - both

with the live trials and with previous studies.

Most of the sounds chosen differ from those used in previous experiments, so that a wider variety of complex sounds could be investigated. The sounds also differ from each other, so that the subject can distinguish each one. Since there was background activity, the stimuli had to be readily identifiable and of sufficient duration to attract attention and allow time for judgements to be made. This gives a more realistic portrayal of everyday listening, since sounds vary in type as well as location amidst other sounds.

In Chapters 4 and 6 the role of the pinna was investigated and found to have little

influence on the discrimination of sounds varying in azimuth. However, some

reported studies (e.g. Batteau, 1967; Freedman & Fisher, 1968) assert that the pinna is

functional in all dimensions. A pinna versus no pinna comparison is included in this

study to serve as further clarification of its function in the horizontal plane.



METHOD

Subjects

Nine undergraduate students (age range 18 to 32), 4 males and 5 females, took part. All were inexperienced listeners and reported normal hearing.

Design

Subjects heard stimuli relayed through a KEMAR manikin in two conditions: live and recorded. Each condition comprised two identical sets of stimuli, which were relayed with and without pinnae. The pinna and no-pinna trials were given as separate blocks, and the blocks were presented in a counterbalanced order across subjects. The stimuli were played in the following fixed, randomised locations in the horizontal plane:

1. 180° - Metronome (4 clicks, total duration approximately 1.5 seconds)

2. 81° - Hand claps (2 claps, duration approximately 1 second)

3. 341° - Xylophone (2 strikes, duration approximately 1 second)

4. 258° - Paper tearing (duration approximately 2 seconds)

5. 0° - Male speech, the word "Chips" (duration 1.4 seconds)

6. 128° - Bunch of keys rattling (duration 1 second).

Stimuli

Six different stimuli were used: metronome clicks, hand claps, xylophone strikes, paper tearing, speech (the word "chips") and rattling keys. These were presented at the locations specified in the Design, at a fixed distance of 1.5 m from the manikin.

The stimuli were presented in one of two conditions:

1. The main condition was a live performance of each of the stimuli in turn at various positions around a KEMAR manikin. The stimuli were played in a normally reverberant large room which was a working laboratory. This involved people typing, doors occasionally opening, quiet talking or whispering, printer and computer noise and general movement of the four occupants around the room. The manikin was placed at the centre of the room and the six stimuli were played around the manikin at equal distances from it. The manikin had microphones (Brüel & Kjær 4134, ½-inch) placed at the eardrum location, which were held in place using Zwislocki couplers. The microphones were fed through a Brüel & Kjær power supply and pre-amp (Rotel RC-850) to an amplifier (Marantz PM-45). From the amplifier, tubephones (Etymotic ER-2) were used to feed the sound into the sound-attenuating booth to the subject, who listened to the stimuli live as they were played to the manikin.

2. For the pre-recorded condition, a selection of these live trials was recorded using a Digital Audio Tape (DAT) player (Sony TCD-D7) fed from the pre-amp. Thus a selection of trials that might have subtly different background or presentation characteristics was available for pre-recorded presentation.

Procedure

Subjects were seated in the booth and, after receiving the instructions¹, were played examples of all six stimuli that had been pre-recorded onto digital audio tape in a normally reverberant quiet room. This recording served to familiarise subjects with the stimulus sounds, so that there would be no confusion about the descriptions of each sound provided in the instructions. Subjects were then provided with two response sheets (see Figure 9.1), one for each pinna condition.

For those in the live condition, the six sounds were played by the experimenter around the manikin in the specified order, with a male laboratory technician speaking one of the stimuli (the word "chips"). This procedure was followed for the first pinna condition; then, after 30 seconds, the experimenter said: "the second phase will begin in 10 seconds". This was the listener's cue to prepare for the second set of stimuli (different pinna condition) and to mark their responses on the second response sheet.

For trials that were being recorded, the DAT player was started after the booth door was closed and was left running until the subject had left the booth. This ensured that subjects in the pre-recorded condition experienced a situation identical to that of subjects in the live condition. Thus subjects in the recorded condition were unaware at the outset that they were listening to a recording of a live situation. All subjects were fully debriefed after the experiment.

¹ See Appendix SF


Figure 9.1: Response diagram provided to subjects (one for each pinna condition). The square represents the environment/room in which the stimuli are played, viewed from above. The head shows the manikin's position at the centre of the room. The dimensions are not to scale and no furniture or fittings are shown.



RESULTS

A two-way within-subjects analysis of variance was used to analyse the results (see Table 9.1). Absolute angular error for uncorrected judgements was ±63.1° for the live condition and ±59.4° for the recorded condition, with pinna and no pinna combined. When corrected for front-back errors, the results were ±24.7° for the live presentation and ±19.9° for the recorded presentation, again for pinna and no pinna combined. Figure 9.2 shows the mean error values (front-back corrected) for each condition for pinna and no pinna.

For the live condition, front-back errors were made on 30% of judgements with pinnae and 48% without pinnae, a statistically significant difference (p ≤ 0.05, related t-test). For the recorded condition, pinnae and no pinnae produced similar values of 46% and 42% respectively, a difference that was not statistically significant (see Figure 9.3).
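A minimal sketch of how uncorrected errors, front-back corrected errors and front-back percentages can be derived from raw judgements. The thesis does not spell out its correction rule here, so treating a response as a front-back confusion when it lies nearer the target's mirror image across the interaural axis than the target itself is an assumption, as is the azimuth convention. The six targets are the azimuths listed in the Design; the responses are invented for illustration.

```python
# Sketch of front-back correction for horizontal-plane judgements.
# ASSUMPTIONS: azimuth in degrees, 0 = straight ahead, increasing clockwise;
# a response counts as a front-back confusion when it lies nearer the target's
# mirror image across the interaural axis than the target itself.

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths (0 to 180 degrees)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def front_back_mirror(azimuth: float) -> float:
    """Reflect an azimuth across the interaural (left-right) axis."""
    return (180.0 - azimuth) % 360.0


def is_front_back_confusion(target: float, response: float) -> bool:
    return (angular_difference(response, front_back_mirror(target))
            < angular_difference(response, target))


def corrected_error(target: float, response: float) -> float:
    """Error after reflecting a confused response back into the correct hemifield."""
    return min(angular_difference(response, target),
               angular_difference(response, front_back_mirror(target)))


def summarise(trials):
    """Mean uncorrected error, mean corrected error and % front-back confusions."""
    n = len(trials)
    uncorrected = sum(angular_difference(r, t) for t, r in trials) / n
    corrected = sum(corrected_error(t, r) for t, r in trials) / n
    confusions = 100.0 * sum(is_front_back_confusion(t, r) for t, r in trials) / n
    return uncorrected, corrected, confusions


# Targets are the six Design azimuths; the responses are invented.
trials = [(180.0, 10.0), (81.0, 95.0), (341.0, 330.0),
          (258.0, 270.0), (0.0, 175.0), (128.0, 60.0)]
print(summarise(trials))
```

For these invented responses the mean error falls from 75.0° uncorrected to about 8.3° after correction, with four of the six responses (about 67%) classed as front-back confusions. The 258° trial, equidistant from its target and the mirror image, is not counted as a confusion under this rule.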



Anova: Two-Factor With Replication

SUMMARY          Live      Recorded   Total
Pinna
  Count          9         9          18
  Average        29.63     45.83      37.73
  Variance       331.75    95.52      270.56
No Pinna
  Count          9         9          18
  Average        48.15     41.67      44.91
  Variance       586.45    381.87     466.79
Total
  Count          18        18
  Average        38.89     43.75
  Variance       522.86    229.25

ANOVA
Source of Variation   SS         df   MS        F      P-value   F crit
Pinna/No Pinna        463.51     1    463.51    1.33   0.26      4.15
Live/Recorded         212.67     1    212.67    0.61   0.44      4.15
Interaction           1157.64    1    1157.64   3.32   0.08      4.15
Within                11164.72   32   348.90
Total                 12998.54   35

Table 9.1: Analysis of variance for live and recorded presentations for pinna and no pinna (the cell averages correspond to the front-back error percentages reported above). No statistically significant effects were found, although the interaction between pinna condition and presentation method approached significance (p = 0.08).
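As a consistency check, the F ratios in Table 9.1 can be recomputed from the printed cell means and variances alone, assuming a balanced design with n = 9 per cell. This is a sketch, not the original analysis; small differences from the printed sums of squares arise because the published means are rounded to two decimals. Like the printed table, it pools the cell variances into a single 'Within' term rather than modelling each subject as a repeated measure. P-values are not recomputed (that would need an F-distribution routine, for example from SciPy); the F ratios can be compared with the printed F crit of 4.15.

```python
# Sketch: balanced two-factor ANOVA with replication recomputed from the
# cell summaries printed in Table 9.1 (n = 9 per cell).

N_PER_CELL = 9
# (pinna condition, presentation) -> (mean, variance), copied from Table 9.1
cells = {
    ("pinna", "live"): (29.63, 331.75),
    ("pinna", "recorded"): (45.83, 95.52),
    ("no pinna", "live"): (48.15, 586.45),
    ("no pinna", "recorded"): (41.67, 381.87),
}


def two_way_anova(cells, n):
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    grand = sum(m for m, _ in cells.values()) / len(cells)  # valid for balanced cells
    row_mean = {r: sum(cells[r, c][0] for c in cols) / len(cols) for r in rows}
    col_mean = {c: sum(cells[r, c][0] for r in rows) / len(rows) for c in cols}
    ss_rows = n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_cols = n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_int = n * sum((cells[r, c][0] - row_mean[r] - col_mean[c] + grand) ** 2
                     for r in rows for c in cols)
    ss_within = (n - 1) * sum(v for _, v in cells.values())  # pooled cell variances
    df_rows, df_cols = len(rows) - 1, len(cols) - 1
    df_int = df_rows * df_cols
    df_within = len(cells) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "Pinna/No Pinna": (ss_rows, df_rows, ss_rows / df_rows / ms_within),
        "Live/Recorded": (ss_cols, df_cols, ss_cols / df_cols / ms_within),
        "Interaction": (ss_int, df_int, ss_int / df_int / ms_within),
        "Within": (ss_within, df_within, None),
    }


for source, (ss, df, f) in two_way_anova(cells, N_PER_CELL).items():
    print(source, round(ss, 2), df, None if f is None else round(f, 2))
```

The recomputed F ratios (1.33, 0.61 and 3.32, on 1 and 32 degrees of freedom) agree with the printed table.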


[Figure 9.2 plot: Mean Error for Live and Recorded Presentations. x-axis: Pinna Condition (Pinna, No Pinna); y-axis: mean angle error, 0 to 35; series: Live, Recorded.]

Figure 9.2: Mean angle errors (front-back corrected) for the two presentation conditions, live and recorded, for pinna and no pinna. ±2 standard error bars are shown.


[Figure 9.3 plot: Front/Back Errors. x-axis: Pinna Condition (Pinna, No Pinna); y-axis: percentage of front-back errors, 0 to 80; series: Live, Recorded.]

Figure 9.3: Percentage of front-back errors for pinna and no pinna for the live and recorded presentations. ±2 standard error bars are shown. Although the standard errors are large, a statistically significant difference (p ≤ 0.05, related t-test) was found between pinna and no pinna for the live condition. There is also a small, statistically non-significant interaction between presentation method and pinna condition (ANOVA, F(1, 32) = 3.32, p = 0.08).



DISCUSSION

The mean front-back corrected angle errors of ±24.7° for the live presentation and ±19.9° for the recorded presentation are consistent with previous measurements in this thesis, but high in comparison with some reported localization studies (e.g. Makous & Middlebrooks, 1990; Stevens & Newman, 1936), although they do reproduce the findings of other studies with a more similar methodology (e.g. Freedman & Fisher, 1968; Schlegel, 1994; Wenzel et al., 1993; Wightman & Kistler, 1989). Contrary to expectation, cutting out the step of recording to and playing back from digital audio tape did not increase judgement accuracy. In fact, the recorded condition gave a smaller angle error but more front-back errors, although neither difference was statistically significant.

The high errors, therefore, do not seem to be a result of the recording process, since the live condition gave similar results. This leaves three alternatives: problems with the manikin's relay of the sound, the absence of visual cues, or the absence of head movements may be inflating the angle errors. Visual cues might include knowledge of room size and surfaces as well as the more obvious links to the perceived sound sources. Indeed, whilst subjects reported a strong sense of presence for both the live and recorded sounds, the number of judgement errors clearly indicates that elements are missing which might have increased the accuracy considerably.

For the pinna/no pinna comparison, no significant differences were revealed. However, for the live presentations, the use of pinnae did produce a marked drop in front-back errors (statistically significant, p ≤ 0.05, related t-test) compared with no pinnae. The results in earlier chapters had indicated that no difference could be expected and, furthermore, the pinna is only a secondary cue for azimuth discrimination. However, the literature is conflicting regarding the role of the pinna for localization in the horizontal plane, and these results clearly show that the pinna cannot be ruled out as an influential cue for azimuth localization. Indeed, the finding for the live condition that using pinnae resulted in fewer front-back errors lends strong support to the findings of Batteau (1967) and Freedman & Fisher (1968) that the pinna is useful for localization in all dimensions. However, only the live presentation produced this finding. For the recorded condition the values were reversed, with pinnae giving more front-back errors, although the difference was not significant.



The role of the pinna in determining horizontal locations clearly remains unresolved. Whilst it does not appear useful in aiding precise determination of locus, it did help to resolve front-back errors in the live condition, which plays some part in the process of making azimuth judgements. The remainder, and majority, of this process relies on interaural differences.

The sense of presence, mentioned earlier, was widely reported by the subjects for both

the live and recorded conditions. Listeners reported feeling very much as if they were

sitting in a 'busy' or 'active' room. Such reports are likely to be a result of including

background sounds, since no such sensations had been reported for previous

experiments. Yet these background sounds clearly did not aid the localization

process.

This study set out to investigate the effect of recording on localization judgements, and has revealed that the recording process does not confound judgement accuracy. Indeed, the angle errors remain high despite eliminating the recording stage, although these errors are consistent with earlier reported experiments. Although the manikin-recorded and live sounds give a strong sense of presence, it is clear that other important elements are missing from these presentations. A number of factors have been investigated, and perhaps the most obvious of the elements that remain are vision and head movements, which now require investigation.


Chapter 10: Vision and Head Movements in Localization

CHAPTER 10

Vision and Head Movements in Localization.

ABSTRACT

Findings from previous chapters have shown that the spectral content of an auditory signal is not used accurately by listeners to obtain information about a sound source location. Optimising the stimulus type and using individualized pinnae have had little effect on the results. While response method has been shown to halve the error values, the results have failed to match those reported by some free-field experiments (e.g. Makous & Middlebrooks, 1990).

One possibility is that any head movements made by the subject will confound the signal when sounds are recorded on a manikin and played back over headphones in a booth. Experiment 1 uses a head tracker to monitor the movement of a restrained (clamped) head and of an unrestrained but still head. The results showed no statistically significant differences (ANOVA) between the two conditions, indicating that the small head movements made by subjects in the booth would be unlikely to affect judgement accuracy.

In Experiment 2, vision and head movements are investigated using sounds presented in three listening conditions. Subjects listened to three stimulus types (speech, clicks and noise) presented in the horizontal plane in (a) a free-field condition, (b) a pre-recorded condition with a visual correlate or (c) a pre-recorded condition with no visual correlate. Head movements were allowed for half of the subjects; the other half were kept stationary by use of a head clamp.

The pre-recorded stimuli played back with no visual correlate produced mean angle errors (±10.8° when corrected for front-back errors) similar to those obtained in previous chapters employing similar test arrangements. The pre-recorded condition with a visual correlate gave smaller errors than had previously been obtained (±3.7°, front-back corrected). The free-field results were surprisingly accurate, with errors of ±0.3°. No improvement was noted for subjects in the 'head motion' condition.

These findings do not rule out the importance of head movements in localization, but the accuracy of subject judgements with the addition of vision was so high that any head movement cues were too subtle to be detected.


INTRODUCTION

Vision and head movements have purposely been omitted from earlier studies to reveal whether the information provided by spectral cues alone is sufficient for localization. However, results have been consistently poor despite changes in variables such as stimulus type, number of sources, speaker span and the use of personalised pinnae (see Chapters 4 to 8). Only a change in response method caused a significant decrease in angle errors, but even these values are high compared with some free-field studies.

One explanation for these large errors is that spontaneous head movements during

playback of recordings made using a still manikin may confound the signal. If a

subject moves during a signal, the percept does not remain in the same fixed position

in space but moves with the subject. Even though a subject is instructed to remain

still during experimental trials, small movements may result in confusing information

and poor judgements.
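The mechanism can be made concrete with a little azimuth arithmetic. This is an illustrative sketch under stated assumptions (azimuths in degrees, 0 = straight ahead, positive = to the right; the recording encodes a fixed head-relative azimuth because the manikin did not move), not a model used in the thesis.

```python
# Sketch: a head turn during headphone playback of a static manikin recording.
# ASSUMPTIONS: azimuth in degrees, 0 = straight ahead, positive = clockwise
# (to the right); head yaw measured the same way; the recording encodes a
# fixed head-relative azimuth because the manikin did not move.

def wrap(angle: float) -> float:
    """Map any angle to the range [0, 360)."""
    return angle % 360.0


def free_field_head_relative(source_world: float, head_yaw: float) -> float:
    """In the free field the source stays put, so turning the head right
    moves the source leftwards relative to the head."""
    return wrap(source_world - head_yaw)


def headphone_image_world(recorded_head_relative: float, head_yaw: float) -> float:
    """Over headphones the cues stay fixed to the head, so the image turns
    with the head and its position in the room shifts by the head yaw."""
    return wrap(recorded_head_relative + head_yaw)


# Source recorded at 30 degrees; the listener turns 10 degrees to the right.
print(free_field_head_relative(30.0, 10.0))  # 20.0: free-field cues change with the turn
print(headphone_image_world(30.0, 10.0))     # 40.0: headphone image moves with the head
```

In the free field a 10° head turn changes the interaural cues in a way consistent with a stationary source; over headphones the cues do not change at all, so the image is displaced by the full head rotation.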

Experiment 1 examines whether head movements confound the signals in pre-recorded trials. A head-tracking device was used to gauge how much subjects move their heads, either when restrained by a head clamp or when unrestrained but intentionally holding their heads still. The Head Tracker can be used to monitor azimuth, elevation and roll relative to a base unit.

Pollack & Rose (1967) carried out a series of studies to establish the role of head movements in localization. The only condition that showed a clear and significant improvement in acuity was the head-motion condition, in which subjects were presented with a signal that remained on until the head was aligned with the locus of the source.

Schlegel (1994) looked at free-field and headphone azimuth estimates of white noise and clicks. Subjects, either blindfolded or wearing goggles, were required to turn and face the sound, and their position was recorded. Free-field azimuth errors were around ±3° when averaged across all angles. For locations at the sides, judgements were off target by up to 10°, but in the midline errors were around 0°. Schlegel was also interested in response method and, as a comparison of motor and cognitive tasks, he asked subjects in a separate series of trials to report the location verbally in 5° classes instead of turning to face the sound (although the head was not fixed). The responses were equally accurate for both the motor and verbal tasks, with the standard deviations being generally higher for the verbal task.

Headphone estimates of stimuli generated using HRTFs were considerably poorer than free-field judgements, with large numbers of overestimates. Sixty percent of subjects were off-target by 20° and 32% were off-target by 40°. In fact, some subjects overestimated angles by an astonishing 50°. Schlegel gives positive bias as the reason for slight overestimation, since a 5-10° systematic overestimation has been reported by Oldfield & Parker (1984a, b) and is evident in Schlegel's own free-field condition. However, he argues that errors of the order of 40-50° cannot be explained by positive bias alone, indicating failure to take account of head movements as a likely cause.

Since head movements have been shown to be useful in the free-field, Experiment 2

incorporates two 'head motion' conditions. Subjects will either have their heads

restrained by the use of a head clamp, or they will be allowed to move their heads

freely. For the free-field condition, this should help establish whether or not head

movements contribute to localization. For subjects listening in the pre-recorded

stimulus conditions (made with the manikin), a comparison of a restrained or

unrestrained head will determine whether head movements confuse subjects by

confounding the signal.

Shelton et al. (1982) looked at the role of vision in sound localization where the sound sources themselves could not be seen. Their free-field study, which allowed head movements, involved subjects reporting the location of narrow-band sounds by pressing a button on a control box that was held out of sight in the lap. They found that under normal seeing conditions localization was significantly more accurate than when vision was obscured by goggles.

Lovelace & Anderson (1993) based their study of vision in sound localization on the findings of Shelton et al. They investigated the possible benefit of general vision during the presentation of sounds where the targets themselves could not be seen (similar to Shelton et al.). Subjects were required to localize a 2-second speech noise by pointing to the perceived origin of the unseen sound source. Speakers were spaced at 10° intervals in the front left quadrant, with a cloth separating the speakers from the subject. All subjects took part in the two conditions: eyes closed (with a blindfold) and eyes open. A statistically significant increase in error was found for the no-vision condition compared with the vision condition, from ±3.79° to ±6.18°. However, a second experiment revealed that it is not the presence of vision per se that improves accuracy. When subjects were asked to point their closed eyes towards the sound, then open their eyes and verbally report the position, the accuracy was higher than when a finger was pointed with eyes permanently open. So it is likely that vision in this case was simply used to calibrate hand movement. Lovelace & Anderson therefore do not offer support for Shelton et al.'s findings, which leaves the role of vision in localization unresolved.

Visual information was added in Experiment 2 by seating the subjects in front of the speakers in the free-field listening condition. Visual cues were also added to a pre-recorded listening condition, which was created by placing the manikin amongst the speakers (in place of the subject for the free-field condition) and recording the experimental trial. The subjects were then seated where the manikin had been and the pre-recorded sounds were delivered through headphones, rather than through the speakers themselves. This was compared with a third listening condition in which the same pre-recorded sounds were heard, but with no visual link to the sound sources.

Apart from vision and head motion, a third factor was varied. Since error may be a function of the distance between individual speakers, two speaker spacing intervals were chosen. The greater interval (30°) lay outside the mean error of 25° established from all previous studies: if subjects are an average of 25° off-target, then 30° speaker spacing should produce highly accurate judgements. The smaller interval (20°) was inside the mean error of 25° and thus should produce a very poor target-response mapping.



EXPERIMENT 1

METHOD

Subjects

Five male postgraduate students and academic staff served as volunteers.

Design

A repeated measures design was used to compare the degree of movement with:

1. a restrained head, using a head clamp¹

2. an unrestrained, still head.

The order of the head restraint measuring conditions was counterbalanced.

Procedure

The subject was seated in a chair at the centre of a wooden hoop, with 7 speakers

placed on the hoop in front of the subject². The subjects were instructed to keep their

heads as still as possible and to keep their eyes fixed on the speaker directly in front

of them, until the experiment was finished. Subjects were not informed of the exact

length of the experiment to prevent them from anticipating the time and moving their

head before the experiment had finished. They were simply told to remain still for a

couple of minutes.

The Head Tracker³ took measurements of the three dimensions of head movement: azimuth (side-to-side), elevation (up-and-down tipping) and roll (pivoting). These measurements were taken simultaneously at 1-second intervals for a period of 1 minute. During this time subjects were seated in a normally reverberant room in front of the apparatus used in Experiment 2 (see Experiment 2 Method). No sounds were played, although there was a background noise level of 50 dB SPL, measured using a sound level meter (Brüel & Kjær 2203).

¹ See Experiment 2 Method section for a description of the head clamp.

² A full description of the hoop and speaker set-up is given in the Apparatus section of Experiment 2.

³ See Chapter 11 for a full description of the Head Tracker.

RESULTS

For azimuth, elevation and roll, the self-restraint condition (no clamp) produced a slightly greater range of movement than the forced-restraint (clamp) condition. However, these differences were not statistically significant (p > 0.05, related t-test; see Figure 10.1.1).
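The comparison above can be sketched as follows, assuming each subject's "range of movement" in a dimension is the maximum minus the minimum of the sixty 1-second Head Tracker readings (the text does not define it exactly) and that the related t-test is the standard paired test on per-subject differences. The function names and the five pairs of values are hypothetical illustrations, not the thesis data.

```python
import math
import statistics


def movement_range(samples):
    """Range (max - min) of one tracked dimension over a recording, in degrees."""
    return max(samples) - min(samples)


def related_t(cond_a, cond_b):
    """Related (paired) t statistic and degrees of freedom for matched samples,
    e.g. one movement range per subject under each restraint condition."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    sd = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical azimuth ranges (deg) for five subjects; not the thesis data.
free_head = [2.0, 1.5, 3.0, 2.5, 1.0]
clamped = [1.0, 1.5, 2.0, 2.0, 1.0]
t, df = related_t(free_head, clamped)
print(f"t({df}) = {t:.2f}")
```

With SciPy installed, `scipy.stats.ttest_rel(free_head, clamped)` should give the same t statistic together with a two-sided p-value.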

[Figure 10.1.1: Differences in Motion between a Restrained and Unrestrained Head. Bar chart of head movement by plane of movement (azimuth, elevation, roll) for Free Head and Fixed Head.]

Figure 10.1.1: Movement of the head when either restrained by a head clamp or held still but unrestrained. (NB: The y-axis does not represent absolute angles.) The measurements for azimuth, elevation and roll dimensions were taken simultaneously by the Head Tracker. ±2 standard error bars are included. No statistically significant differences (p > 0.05, related t-test) between the different head restraint conditions were found.


DISCUSSION

The results show that there is no statistically significant difference in head motion between simply instructing subjects to keep their head still and fixing their head in a clamp, despite a slightly greater range of movement for the unrestrained head in all three movement dimensions. It should be noted, however, that these results demonstrate how small spontaneous movements are when a subject is deliberately keeping their head still, not that head movements per se are unimportant. Indeed, Thurlow & Runge (1967) stress the significance of head movements, asserting that head turning is a spontaneous action performed by most listeners when attempting to determine the location of a sound source.

In a free-field situation, where head movements are freely made, a large improvement in localization acuity may well be demonstrated; this will be investigated in Experiment 2. However, where only small changes in head position occur, no effect would be expected. This is consistent with the findings of Makous & Middlebrooks (1990), who showed that movements on the order of 1° were effectively the same as a stationary head.

Thus the movements that subjects make whilst listening to stimuli recorded using a manikin are not large enough to have any effect on the perception of those stimuli. This has positive implications for the large angle errors reported in previous chapters: they are unlikely to have resulted from subjects having an unrestrained head, with head movement confounding the signal.


EXPERIMENT 2

METHOD

Subjects

Participants were 48 undergraduate and postgraduate student volunteers (18 males

and 30 females). All reported having normal hearing and had no previous experience

of hearing experiments.

Design

A 3 × 2 × 2 × 3 design was used:

3 listening conditions: a) free field (see Apparatus section), b) pre-recorded stimuli in the free-field set-up (with a visual correlate) and c) pre-recorded stimuli played back in the booth (no visual correlate).

2 head motion conditions: head either restrained by a clamp or able to move freely.

2 speaker spacings: 30° and 20°.

3 stimulus types: a) speech ("chips"), b) clicks and c) white noise.

Four different subjects were used for each combination of listening condition, head motion condition and speaker spacing condition (4 × 3 × 2 × 2), giving a total of 48 subjects. All subjects heard all stimulus types.

For the free-field condition, sounds were presented in the horizontal plane at 7 locations (-90°, -60°, -30°, 0°, 30°, 60°, 90° in the 30° speaker spacing condition and -60°, -40°, -20°, 0°, 20°, 40°, 60° in the 20° speaker spacing condition).
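The cell structure and speaker layout can be enumerated as a quick check on the counts given above. This is a minimal sketch: the labels and the helper `speaker_azimuths` are hypothetical, and which side negative azimuths fall on is an assumption, since the text does not say.

```python
from itertools import product

LISTENING = ("free-field", "ring playback", "booth playback")
HEAD_MOTION = ("clamped", "free")
SPACINGS_DEG = (30, 20)
STIMULI = ("chips", "clicks", "noise")  # within-subjects: every subject hears all three
SUBJECTS_PER_CELL = 4
REPETITIONS = 2


def speaker_azimuths(spacing_deg, n_speakers=7):
    """Azimuths (deg) of n equally spaced speakers centred on 0 deg."""
    half = n_speakers // 2
    return [k * spacing_deg for k in range(-half, half + 1)]


# Between-subjects cells: listening condition x head motion x speaker spacing.
cells = list(product(LISTENING, HEAD_MOTION, SPACINGS_DEG))
total_subjects = SUBJECTS_PER_CELL * len(cells)
# One trial per stimulus type: each speaker location presented twice.
stimuli_per_trial = REPETITIONS * len(speaker_azimuths(30))

for spacing in SPACINGS_DEG:
    print(f"{spacing} deg spacing:", speaker_azimuths(spacing))
print(len(cells), "cells,", total_subjects, "subjects,",
      stimuli_per_trial, "stimuli per trial,", len(STIMULI), "trials per subject")
```

Running it prints the two sets of seven azimuths listed above and confirms 12 between-subjects cells (48 subjects at four per cell) and 14 stimuli per trial.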



For the pre-recorded conditions, the manikin was placed where the subject had been

and the experiment was recorded. These digitally recorded sounds were played back

to subjects whilst sitting amongst the speakers or in the booth.

Stimuli

Stimuli were either a 1-second white noise burst, a broad-spectrum click, or speech (the word "chips"), all identical to those used in Chapter 8. The sounds were played through 7 speakers (Radio Spares, 8 Ω, 3") placed on a wooden hoop (see Apparatus). The 3 stimulus types constituted separate trials, each of which comprised 14 stimuli (2 repetitions of the signal at 7 speaker locations).

For the two pre-recorded conditions, the sequence of trials was recorded onto Digital Audio Tape (DAT) using microphones (Brüel & Kjær 4134, 0.5") placed at the eardrum position (using a Zwislocki coupler) of a KEMAR manikin.

Apparatus

Condition 1 - Free-Field

A wooden hoop, 3 m in diameter and 3" in depth, was located 1.1 m off the ground, supported by wooden struts. The subject was seated at the centre of the hoop and the speakers were attached to the inside of the hoop, facing the subject and in front of them (referred to as the 'ring' set-up). The subject's chair was adjusted such that the speakers were at ear height. For subjects in the 'restrained head' condition, a head clamp was placed behind the subject's chair. This consisted of an upright wooden pole with a semi-circular head support at the top, into which the subject's head was placed. The head was held firmly in the support by means of a thick adjustable canvas strap secured around the forehead. Subjects in the 'free head' condition were encouraged to move their heads freely once the signal had begun, but were to return their heads to a central position after each response and during the stimulus onset. The central position was defined by the 0° azimuth speaker, with which subjects were instructed to align their nose.


Chapter 10: Vision and Head Movements in Localization

Condition 2 - Pre-recorded with a visual correlate (in the ring)

Subjects were seated in the ring set-up, identically to those in Condition 1. The only difference was that subjects were listening to recorded, not live, stimuli. These recordings were made by placing the manikin at the centre of the wooden hoop, where the subject had been placed in the free-field condition. The experimental trial was then played from the speakers, as for Condition 1, and recorded by the manikin. Subjects were then placed where the manikin had been and listened to these pre-recorded stimuli over headphones, while remaining visually within the free-field set-up.

Condition 3 - Pre-recorded in the booth (no visual correlate)

Subjects were seated in a soundproof booth and heard the same recordings as in Condition 2. A diagrammatic representation of the speaker locations was provided, showing either 30° or 20° spacing (see Figures 10.2.2a & b).

Procedure

Each subject was provided with three response sheets, one for each stimulus type (see Figure 10.2.1). The stimuli were played in a fixed randomised sequence, with a 5-second interstimulus interval. These were controlled through a switch-box, operated by the experimenter in another room. Each subject began the sequence at a different point⁴.

⁴ Each subject heard a different permutation of the three stimulus types. However, since 8 subjects were used in each condition and only 6 permutations of 3 stimulus types are possible, two of the sequences were heard twice. Ordering was similar to that shown in Appendix G, for the azimuth trials only.
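The counterbalancing in footnote 4 can be sketched as cycling subjects through the 3! = 6 orderings of the stimulus types. Which two sequences were actually heard twice is not stated; reusing the first two, as below, is a hypothetical choice for illustration only, and `assign_orders` is an invented helper.

```python
from itertools import permutations

STIMULUS_TYPES = ("chips", "clicks", "noise")


def assign_orders(n_subjects):
    """Give each subject one ordering of the stimulus types, cycling through all
    3! = 6 permutations; beyond six subjects, orders are reused from the start."""
    orders = list(permutations(STIMULUS_TYPES))
    return [orders[i % len(orders)] for i in range(n_subjects)]


orders = assign_orders(8)  # 8 subjects per condition, as in footnote 4
for subject, order in enumerate(orders, start=1):
    print(subject, " -> ".join(order))
```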

[Figure 10.2.1: Response sheet with fields for Subject, Sequence and Trial, followed by a Stimulus/Response table with rows numbered 1 to 14.]

Figure 10.2.1: Response diagram given to subjects in all conditions. Beside each stimulus number a response letter (A - G) had to be recorded, according to the perceived location of the sound source.


[Figure 10.2.2a: Azimuth Positions. Speaker positions labelled with the letters A to G.]

Figure 10.2.2a: Diagram given to subjects showing the speaker locations in the horizontal plane. Speakers were spaced at 30° intervals. Subjects heard each sound source and were required to choose one of the letters, which represented actual target locations.


[Figure 10.2.2b: Azimuth Positions. Speaker positions labelled with letters.]

Figure 10.2.2b: Diagram given to subjects representing 20° speaker spacing in the horizontal plane.


RESULTS

Error values for all subjects were averaged to give overall means for the three presentation conditions. In the free-field condition the mean angle error was ±0.3°. For pre-recorded stimuli in the ring, the mean error was ±3.7° and for the pre-recorded stimuli in the booth, the error was ±10.8°. A statistically significant difference was found between the three listening conditions (p ≤ 0.01, ANOVA).

For each of the three listening conditions, values were broken down into speaker spacing, head restraint and stimulus type, as shown in Tables 10.2.2a, b & c. A significant effect of speaker spacing was also found for the clicks stimulus, where the mean for the 20° spacing was 5.4° and for the 30° spacing was 7.3° (p ≤ 0.05, ANOVA). However, no other significant effects of spacing or head restraint were observed (see Tables 10.2.1a-d).

The angle errors shown in Tables 10.2.2a, b & c are typically smaller for the 20° speaker spacing condition than for the 30° spacing condition, or roughly equal to them. This contradicts the hypothesis that the 20° spacing would increase judgement errors.

However, angle error may not be useful in determining whether there was greater

confusion in one condition. If a subject is one speaker off-target, with 20° spacing a

20° error is obtained, whereas a 30° error is obtained for the 30° spacing. Thus,

whilst there may be more confusion with the 20° spacing condition, the 30° condition

may give larger errors and falsely indicate that the 30° condition was more poorly

judged.
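The scaling argument can be made concrete with a small sketch, assuming the reported angle error is the mean absolute difference between target and response azimuth. The same hypothetical pattern of one-speaker confusions yields an angle error 1.5 times larger at 30° spacing than at 20°, even though the confusion itself is identical. The matrix and the function `mean_angle_error` are illustrative, not taken from the thesis.

```python
def mean_angle_error(confusions, spacing_deg):
    """Mean absolute angle error (deg) from a square confusion matrix of counts.

    confusions[i][j] counts responses at speaker j when speaker i was the target.
    Speakers are equally spaced, so a response k speakers away costs k * spacing_deg.
    """
    total = sum(sum(row) for row in confusions)
    weighted = sum(
        count * abs(i - j) * spacing_deg
        for i, row in enumerate(confusions)
        for j, count in enumerate(row)
    )
    return weighted / total


# Hypothetical pattern: 7 targets x 24 responses, with 4 responses to the first
# target landing one speaker away (illustrative only, not thesis data).
pattern = [[24 if i == j else 0 for j in range(7)] for i in range(7)]
pattern[0] = [20, 4, 0, 0, 0, 0, 0]

for spacing in (20, 30):
    print(spacing, round(mean_angle_error(pattern, spacing), 2))
```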

Means for the different stimulus types were taken across all speaker spacing and head restraint conditions. For clicks the overall mean was higher (±6.38°) than for "Chips" (±4.12°) and noise (±4.33°). This is the opposite trend to the comparison of the same three stimulus types in Chapter 8 (±19.6° for clicks, which is lower than the ±20.8° and ±22.5° for "Chips" and noise). (NB: The values in the earlier experiment are higher because a different response method was used. Also, the errors for this study include the low free-field error values.) The differences between the stimulus types in this experiment and in Chapter 8, however, are not statistically significant.

Information analysis is a method that examines the genuine degree of confusion in such cases. The results are given in Tables 10.2.3a, b & c. Confusion matrices for each group show the pattern of responses (Figures 10.2.2a-l). A near-perfect transmission value of 2.76 bits was obtained overall for the free-field condition; a perfect set of responses would give a value of 2.81 bits (log₂ 7, for seven equally likely positions). For the ring playback condition there was a mean value of 2.23 bits and for the booth condition an overall mean of 1.51 bits.
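Transmitted information here is the mutual information between target and response positions. The text does not give the estimator used, so the sketch below assumes the standard plug-in estimate from the confusion-matrix counts, with no small-sample bias correction; `transmitted_information` is a hypothetical helper. It reproduces the 2.81-bit ceiling for seven positions and gives zero when responses are unrelated to targets.

```python
import math


def transmitted_information(confusions):
    """Transmitted information (bits) between target and response, estimated from
    a confusion matrix of counts (rows = target position, columns = response)."""
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    bits = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing (0 * log 0 is taken as 0)
                # p(x,y) * log2[p(x,y) / (p(x) p(y))], written with raw counts
                bits += (count / total) * math.log2(count * total / (row_sums[i] * col_sums[j]))
    return bits


# Perfect identification of 7 positions, 24 responses each.
perfect = [[24 if i == j else 0 for j in range(7)] for i in range(7)]
# Responses spread evenly over all positions: no information transmitted.
guessing = [[3] * 7 for _ in range(7)]

print(round(transmitted_information(perfect), 2))   # log2(7), the 2.81-bit ceiling
print(round(transmitted_information(guessing), 2))
```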

Factor     Levels  Values
spacing    2       1 2
place      3       1 2 3
movement   2       1 2

Analysis of Variance for all sound

Source     DF   Seq SS    Adj SS    Adj MS     F      P
spacing     1     582.8     582.8     582.8    1.65   0.205
place       2   11977.8   11977.8    5988.9   17.00   0.000
movement    1     665.3     665.3     665.3    1.89   0.176
Error      43   15144.4   15144.4     352.2
Total      47   28370.3

Unusual Observations for allsound

Obs.  allsound   Fit      Stdev.Fit  Residual  St.Resid
23    151.430    45.824   6.057      105.606   5.95R

R denotes an obs. with a large st. resid.

Means for allsound

place        Mean   Stdev
free-field    0.3   3.485
ring          3.7   3.485
booth        10.8   3.485

Table 10.2.1a: Analysis of variance for all stimulus types combined, for different speaker spacings ("spacing": 20° and 30°), playback locations ("place": free-field, ring and booth) and head restraint conditions ("movement": fixed (clamped) or free). Statistically significant results are shown in bold type.


Factor     Levels  Values
spacing    2       1 2
place      3       1 2 3
movement   2       1 2

Analysis of Variance for "chips"

Source     DF   Seq SS    Adj SS    Adj MS     F      P
spacing     1     10.89     10.89     10.89    0.83   0.366
place       2    661.67    661.67    330.84   25.33   0.000
movement    1      7.19      7.19      7.19    0.55   0.462
Error      43    561.60    561.60     13.06
Total      47   1241.35

Unusual Observations for "chips"

Obs.  "chips"   Fit       Stdev.Fit  Residual  St.Resid
11    17.1400    3.7498   1.1664     13.3902   3.91R
14    15.0000    4.5240   1.1664     10.4760   3.06R
23    17.1400   10.0590   1.1664      7.0810   2.07R

R denotes an obs. with a large st. resid.

Means for "chips"

place   Mean     Stdev
1       0.1787   0.9035
2       3.6606   0.9035
3       9.1956   0.9035

Table 10.2.1b: Analysis of variance for the speech stimulus "Chips" for the two different speaker spacings, three playback locations and two head restraint conditions. Statistically significant results are shown in bold type.



Analysis of Variance for clicks

Source     DF   Seq SS    Adj SS    Adj MS     F      P
spacing     1     47.68     47.68     47.68    5.70   0.021
place       2   1089.05   1089.05    544.52   65.06   0.000
movement    1     23.49     23.49     23.49    2.81   0.101
Error      43    359.91    359.91      8.37
Total      47   1520.13

Unusual Observations for clicks

Obs.  clicks    Fit      Stdev.Fit  Residual  St.Resid
14    15.0000   7.7687   0.9337      7.2313    2.64R
36    10.0000   4.3762   0.9337      5.6238    2.05R
38     0.0000   5.7754   0.9337     -5.7754   -2.11R
39    12.8600   5.7754   0.9337      7.0846    2.59R

Means for clicks

place   Mean      Stdev
1        0.7588   0.7233
2        6.0725   0.7233
3       12.4113   0.7233

Table 10.2.1c: Analysis of variance for clicks, for the two speaker spacing conditions, the three playback locations and two head restraint conditions. Statistically significant results are shown in bold type.



Analysis of Variance for noise

Source     DF   Seq SS    Adj SS    Adj MS     F      P
spacing     1     194.2     194.2     194.2    0.76   0.388
place       2    2709.7    2709.7    1354.9    5.32   0.009
movement    1     333.6     333.6     333.6    1.31   0.259
Error      43   10960.8   10960.8     254.9
Total      47   14198.3

Unusual Observations for noise

Obs.  noise     Fit      Stdev.Fit  Residual  St.Resid
23    119.290   21.657   5.153      97.633    6.46R

Means for noise

place   Mean      Stdev
1        0.0000   3.991
2        2.4181   3.991
3       17.0094   3.991

Table 10.2.1d: Analysis of variance for the noise stimulus for the different speaker spacings, playback locations and head restraint conditions. Statistically significant results are shown in bold.


Tables 10.2.2a-c (shown on the next page): Mean angle error values (°) for the free-field condition are shown in 10.2.2a. Averages are broken down into speaker spacing (30° and 20°), head restraint (either fixed in a clamp or free to move) and stimulus type ("Chips", Clicks and Noise). Identical breakdowns are given for subjects listening to pre-recorded stimuli in the original recording set-up (i.e. with a visual correlate) in 10.2.2b, and for pre-recorded stimuli played back in the booth in 10.2.2c.

10.2.2a Free-Field

30° Spacing    "Chips"   Click   Noise
Head Fixed       0.0      0.7     0.0
Head Free        0.0      2.1     0.0
MEANS            0.0      1.4     0.0

20° Spacing    "Chips"   Click   Noise
Head Fixed       0.7      0.4     0.0
Head Free        0.0      0.0     0.0
MEANS            0.4      0.2     0.0

10.2.2b Ring Playback

30° Spacing    "Chips"   Click   Noise
Head Fixed       3.2      7.5     2.3
Head Free        3.8      5.9     2.1
MEANS            3.5      6.7     2.2

20° Spacing    "Chips"   Click   Noise
Head Fixed       1.8      5.7     2.9
Head Free        3.2      4.6     1.8
MEANS            2.5      5.2     2.3

10.2.2c Booth Playback

30° Spacing    "Chips"   Click   Noise
Head Fixed      12.3     16.1    13.9
Head Free        7.0     11.8     7.0
MEANS            9.6     13.9    10.4

20° Spacing    "Chips"   Click   Noise
Head Fixed       9.3     11.4    10.0
Head Free        8.2     10.4    12.1
MEANS            8.7     10.9    11.1


(a) Free-Field

Speaker Spacing   Head Motion   Transmitted Information (bits)   Number of Positions
30°               fixed         2.75                             6.7
30°               free          2.71                             6.6
20°               fixed         2.75                             6.7
20°               free          2.81                             7
Overall mean: 6.75

(b) Ring

Speaker Spacing   Head Motion   Transmitted Information (bits)   Number of Positions
30°               fixed         2.22                             4.7
30°               free          2.27                             4.8
20°               fixed         2.45                             5.5
20°               free          1.96                             3.9
Overall mean: 4.73

(c) Booth

Speaker Spacing   Head Motion   Transmitted Information (bits)   Number of Positions
30°               fixed         1.45                             2.7
30°               free          1.75                             3.4
20°               fixed         1.37                             2.6
20°               free          1.47                             2.8
Overall mean: 2.88

Tables 10.2.3a-c: Information analysis for the free-field (a), ring playback (b) and booth playback (c) conditions. Breakdowns are given in terms of speaker spacing and head motion conditions. Transmitted information is in bits, and the corresponding number of reliably identified positions, from the total of 7, is also shown.
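The "Number of Positions" column appears to convert transmitted information T back into an equivalent number of perfectly identifiable alternatives, N = 2^T; this reading is inferred from the tabulated values rather than stated in the text. On that reading, each "Overall mean" is the average of the four N values in its sub-table, for example:

```latex
N = 2^{T}, \qquad
2^{2.75} \approx 6.7, \quad 2^{2.22} \approx 4.7, \quad 2^{1.45} \approx 2.7, \quad 2^{\log_2 7} = 7,
\qquad
\bar{N}_{\text{free-field}} = \tfrac{1}{4}\,(6.7 + 6.6 + 6.7 + 7) = 6.75
```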

Figures 10.2.2a-d (shown on page 152): Confusion matrices showing the pattern of responses for the free-field condition for (a) 30° speaker spacing with head fixed, (b) 30° speaker spacing with head free, (c) 20° speaker spacing with head fixed and (d) 20° speaker spacing with head free.

Figures 10.2.2e-h (shown on page 153): Confusion matrices for the ring playback condition for (e) 30° speaker spacing with head fixed, (f) 30° speaker spacing with head free, (g) 20° speaker spacing with head fixed and (h) 20° speaker spacing with head free.

Figures 10.2.2i-l (shown on page 154): Confusion matrices showing the response pattern for the booth playback condition for (i) 30° speaker spacing with head fixed, (j) 30° speaker spacing with head free, (k) 20° speaker spacing with head fixed and (l) 20° speaker spacing with head free.

[Figures 10.2.2a-d: Confusion matrices for the free-field condition, panels (a) to (d). Rows: target position; columns: Response Position.]

[Figures 10.2.2e-h: Confusion matrices for the ring playback condition, panels (e) to (h). Rows: target position; columns: Response Position.]

[Figures 10.2.2i-l: Confusion matrices for the booth playback condition, panels (i) to (l). Rows: target position; columns: Response Position.]

DISCUSSION

The free-field localization task gave very accurate results, with angle error values no larger than about ±2°. This result seemed to be unaffected by changes in speaker spacing or subject head motion conditions. However, there is a strong floor effect in the free-field results, where several subjects obtained 0° error, so any effect of the different conditions would be masked. Localization of pre-recorded signals in the ring set-up produced higher errors than the free-field presentation (overall mean of ±3.7° for pre-recorded and ±0.3° for free-field, a statistically significant difference, ANOVA).

A significant difference (p ≤ 0.01, ANOVA) was again found between the ring presentation and the booth presentation of pre-recorded sounds (overall mean of ±10.8° for the booth compared to the ring value of ±3.7°). Indeed, the error values obtained for the booth condition in this study are comparable to those of previous studies that have used the same (categorical) method of playing back in the booth (where mean errors have ranged from ±8° to ±19°).

The three different stimulus types showed no statistically significant differences. However, for all three listening conditions clicks produced the greatest error. Whilst all three are broadband sounds, clicks are by far the shortest in duration, and subjects reported the greatest difficulty in localizing this stimulus. Such brief signals may not give subjects adequate time to glean useful information, and therefore longer-duration sounds should be used to optimise performance.

Speaker spacing showed only a small effect, with the angle error for the 30° spacing being higher than for the 20° spacing. This was contrary to the hypothesis that the errors for the 20° spacing would be higher, since subjects are more likely to be confused if the sound sources are closer together. As previous chapters have shown, 25° spacing is necessary, on average, to give reliable localization accuracy. Thus if the spacing is less than 25°, the error is likely to increase.

However, when attempting to ascertain confusion within a task, angle error provides only limited information. For the same pattern of confusions, angle errors obtained with the speakers spaced at 20° intervals will be smaller than those obtained with the speakers spaced 30° apart, because each one-speaker error is worth 20° rather than 30°. Information analysis is therefore more appropriate, since it gives a measure of confusion within a task rather than the overall error. The degree of confusion for all presentation conditions is similar for both speaker spacings. This shows that a similar type and number of errors were made for the 20° (mean 2.14 bits) and 30° (mean 2.19 bits) spacings, even though the angle errors for the 30° spacing are higher. For the free-field condition, however, it is unclear whether any differences in confusion exist, since there is an apparent floor effect.

Thus the only large effect was between the different presentation conditions. The most obvious cause for the discrepancy in angle error values between the free-field and the ring presentations is that the ring uses recorded sounds delivered over headphones. This implies that some information is lost in the recording process. This may have been a result of using non-individualized pinnae, although previously this has been shown to have little effect (see Chapter 4). It may also be due to more subtle effects produced by an artificial ear canal or a hollow torso cavity.

Head motion is the other main difference between the ring and the free field. However, the results show that restraining the head had no effect in the free field, which is surprising given the number of studies that have stressed the importance of head movements.

Pollack & Rose (1967) have shown that moving the head improves localization in a number of ways. Firstly, movement allows listeners to orient their head towards a sound and, as studies have shown (e.g. Stevens & Newman, 1936; Makous & Middlebrooks, 1990; Schlegel, 1994), people can determine location and distinguish sounds more accurately in the midline. Thus by centring the head on the sound source, judgements are likely to be more accurate. Secondly, movement increases acuity because we can judge the relative positions of sounds more accurately than their absolute positions, as highlighted by the findings of Minimum Audible Angle studies (e.g. Mills, 1958; Perrott, 1984; Perrott et al, 1993). By moving the head, the sound becomes a continually changing stimulus which can be located more exactly by the listener. Finally, moving the head eliminates front-rear azimuth confusions by disambiguating the identical interaural timing and level differences that occur when a sound is at mirror-image positions in front of or behind the listener.
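The front-back ambiguity, and how a head turn breaks it, can be illustrated with Woodworth's spherical-head approximation to the interaural time difference. This model is not used in the thesis; the head radius and speed of sound are assumed textbook values, and the azimuth sign convention is an assumption. Sources at 30° and 150° (mirror images about the interaural axis) give identical ITDs with the head still, but after a 10° turn to the right the ITD of the front source shrinks while that of the rear source grows.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed typical adult head radius
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air at about 20 deg C


def woodworth_itd_us(azimuth_deg):
    """ITD in microseconds from Woodworth's spherical-head approximation for a
    horizontal-plane source; azimuth 0 is straight ahead, positive to the right."""
    # Lateral angle from the median plane: front and rear mirror positions coincide.
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    return 1e6 * HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (lateral + math.sin(lateral))


def itd_after_turn(source_deg, head_yaw_deg):
    """ITD once the head has rotated by head_yaw_deg (positive = to the right)."""
    return woodworth_itd_us(source_deg - head_yaw_deg)


for source in (30, 150):  # mirror-image positions in front of and behind the listener
    before = itd_after_turn(source, 0)
    after = itd_after_turn(source, 10)
    print(f"source {source:3d} deg: {before:5.0f} us -> {after:5.0f} us after a 10 deg turn")
```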

But whilst Pollack & Rose found that turning to face a sound source significantly reduced the error rate, this only occurred with a sustained signal that allowed subjects time to orient their head towards it. For short-duration sounds, where the listener does not have time to face the sound source, there is no evidence that the information gained from movement is any more beneficial than when the head is stationary.

This might explain the finding that head movements had no effect, since none of the signals was longer than 1 second. But the presence of vision may also explain this result. In cases where moving the head has clearly been an aid to localization (e.g. Pollack & Rose, 1967; Makous & Middlebrooks, 1990; Stevens & Newman, 1936), subjects have been unsighted (either with closed eyes, blindfolded or hidden from the speakers by acoustically transparent cloth). In this study, where vision is incorporated, subjects may be using visual information as the dominant cue, overriding the cues provided by head movement. This is supported by the finding that the visual (ring) presentation produced more accurate performance than the booth presentation, although the other differences between the two playback settings may also have had an effect.

For example, playing sounds back in a room congruent with the recording environment may have increased accuracy. Playing sounds back in a sound-attenuating booth denies subjects knowledge of the original recording room acoustics, which may confuse the listener. Therefore, the increase in acuity in this study for pre-recorded stimuli may not be wholly due to providing a visual element (the presence of speakers). It is more likely to be a combination of playing back with visual aids in an acoustically similar environment. The effect of having identical room acoustics for recording and playback could be determined by providing subjects with a diagrammatic representation of the speakers in the recording room, rather than in the booth.

This study has found vision to play a major role in enhancing auditory localization. However, head movements have not been ruled out as a useful cue. Indeed, they may play an important part in localization, but subjects were so accurate with the addition of vision that any benefit from head movement was too small to be detected. Perhaps in the absence of vision, head movements would have the same highly beneficial effect on accuracy. But in this study, a visual correlate appears to be the major factor in increasing judgement accuracy.

These findings have important implications for VR implementations. If high judgement accuracy relies on vision, then it is unlikely that auditory cues alone can be used to identify targets. Auditory information can only be used to supplement or guide vision.

Chapter 11: Head Movements using the Head Tracker

CHAPTER 11

Head Movements using the Head Tracker.

ABSTRACT

The importance of head movements in localization tasks has been emphasised by many researchers (e.g. Van Soest, 1929; Young, 1931; Wallach, 1939, 1940; Thurlow & Runge, 1967; Wightman et al, 1987). Previous attempts in this thesis to assess head movements (Chapter 10) were not effective in isolating head motion, and no effect was found. Furthermore, those attempts did not incorporate the technology used in auditory virtual reality.

This study attempts to establish the contribution of head movements for head-tracked and non-head-tracked HRTF-generated stimuli, the method typically used in VR. Listeners were asked to judge the location of white noise bursts presented in the horizontal plane. For each of three head motion conditions (still, controlled movement or free movement) the Head Tracker was either switched on or off, thus either accounting or not accounting for head movements.

The overall angle errors were high: ±19.5° with the Head Tracker on and ±21° with the Head Tracker off, for all head movement conditions combined. Only one head-tracked motion condition showed an improvement over any of the non-head-tracked conditions: when subjects were able to move their heads freely (a 6° improvement, which was statistically significant; ANOVA, F = 6.99, df = 2).

These findings offer support for the literature. While head movements can improve accuracy substantially (when subjects can move their heads as desired), head motion does not produce near-perfect accuracy. However, equipment constraints were identified which may have increased the error values. Once the spatial resolution of current technology has been refined, the potential for reducing error is likely to be much higher.

INTRODUCTION

Head movement is a fundamental method of reducing the ambiguities relating to a sound source location (e.g. Van Soest, 1929; Young, 1931; Hirsch, 1971). Primarily, head motion is argued to increase accuracy by disambiguating front-rear azimuth confusions (e.g. Wightman et al, 1987), although the work of Young (1931) showed that moving the head resulted in a changing binaural stimulus pattern, which in itself aided localization accuracy.

Wallach (1939, 1940) argues a similar case to Young. He demonstrates that if one turns one's head whilst a sound is being delivered, then cues are obtained for several lateral angles for the same sound source direction. He shows that "Geometrically, a sequence of lateral angles obtained in this manner completely determines a given direction". Wallach also argues that head movements are the primary means of disambiguating front-back confusion, that the pinna plays a part only in the absence of head movements (which is infrequent in ordinary listening conditions), and thus that front-back discrimination is only a minor role of the pinna.
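Wallach's geometric point can be sketched by inverting a sequence of lateral angles, taken here as the angle between the source direction and the median plane. A single lateral angle fits both a front and a rear azimuth, but the sequence observed while the head turns fits only one. The brute-force 1° search and the helper names are illustrative assumptions, not Wallach's procedure.

```python
import math


def lateral_angle_deg(source_az_deg, head_yaw_deg):
    """Lateral angle (deg from the median plane) of a horizontal-plane source when
    the head is turned by head_yaw_deg; positive = to the right (assumed)."""
    relative = math.radians(source_az_deg - head_yaw_deg)
    return math.degrees(math.asin(math.sin(relative)))


def infer_azimuth(observed, yaws, candidates=range(-180, 180)):
    """Pick the candidate azimuth whose predicted lateral angles, at the given head
    yaws, best match the observed ones (least squares over a 1-deg grid)."""
    def mismatch(az):
        return sum((lateral_angle_deg(az, y) - o) ** 2 for o, y in zip(observed, yaws))
    return min(candidates, key=mismatch)


source = 150  # behind and to the right of the listener
one_look = [lateral_angle_deg(source, 0)]
turning = [0, 10, 20]
several_looks = [lateral_angle_deg(source, y) for y in turning]

print(infer_azimuth(one_look, [0]))           # 30 or 150: a front-back ambiguity
print(infer_azimuth(several_looks, turning))  # 150: the sequence resolves it
```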

The significance of head movements in localization was also emphasised by Blauert

(1983). He argued that if head movements are available, then all cues acquired from

monaural signal characteristics are overridden and motion becomes the dominant cue.

Support for this proposal was offered by Makous & Middlebrooks (1990), who obtained very low mean angle errors (between 1.5° and 16.3°) in their free-field study, in which head movements were used.

This study attempts to isolate and investigate the effect of head movements on

localization accuracy. In Chapter 10, where head movements and vision were

investigated together, the effects of head motion were overshadowed by visual cues.

Furthermore, the study was conducted in the free-field, which fundamentally differs

from simulated 3-dimensional sound.

Sounds were generated using head-related transfer functions (HRTFs). The HRTF determines the set of filter coefficients that models the 3D sound: for any given source position, a specific set of filter coefficients is produced based on the source azimuth and/or elevation. Head movements can then be incorporated by adding a head-tracking device. This device monitors the head position, which is read by a computer and fed back to the sound-generating equipment. The new position is then accounted for by producing a new set of HRTF filters, and the process is repeated for each new head position. This method of generating sound and monitoring head movement is akin to that used in current virtual reality set-ups. The equipment should therefore give a realistic representation of what can be expected from the motion cues provided by this technology, but without the visual element.
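The head-tracking loop described above can be sketched as follows. The 15° HRTF grid, nearest-neighbour selection and all names here (`HeadPose`, `track` and so on) are hypothetical; the text does not describe how the HTI selects or interpolates filters, and practical systems often interpolate between measured HRTFs rather than snapping to the nearest one. Roll is ignored, matching the azimuth-and-elevation tracking used in this study.

```python
from dataclasses import dataclass

GRID_STEP_DEG = 15  # hypothetical angular spacing of the stored HRTF set


@dataclass
class HeadPose:
    yaw_deg: float    # azimuth rotation reported by the tracker (positive = right, assumed)
    pitch_deg: float  # elevation tilt reported by the tracker (positive = up, assumed)


def wrap_deg(angle):
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative(source_az, source_el, pose):
    """Source direction relative to the head. Roll is ignored; subtracting pitch
    from elevation is a small-angle simplification."""
    return wrap_deg(source_az - pose.yaw_deg), source_el - pose.pitch_deg


def nearest_grid_point(az, el, step=GRID_STEP_DEG):
    """Snap a head-relative direction to the nearest point of a regular grid; the
    (az, el) pair would key into a hypothetical table of left/right ear filters."""
    return wrap_deg(round(az / step) * step), round(el / step) * step


def track(source_az, source_el, poses):
    """Yield (pose, grid point) whenever a tracker reading calls for new filters."""
    current = None
    for pose in poses:
        point = nearest_grid_point(*head_relative(source_az, source_el, pose))
        if point != current:  # only reload filters when the selection changes
            current = point
            yield pose, point


# A source fixed at 60 deg azimuth while the head turns towards it.
readings = [HeadPose(0, 0), HeadPose(2, 0), HeadPose(20, 0), HeadPose(45, 0)]
for pose, point in track(60, 0, readings):
    print(f"yaw {pose.yaw_deg:>3} deg -> use filters for {point}")
```

The small 2° movement leaves the selected filters unchanged, while larger turns switch to new grid points, which is one way to picture why very small head movements carry little usable cue at coarse HRTF resolution.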

Thurlow et al (1967) looked at the types of head movements listeners made when asked to localize a sound source. Subjects' movements were filmed and later examined in terms of rotation (side-to-side), tip (up and down) and pivoting (roll, which raises one ear and lowers the other). A combined rotate-tip movement was the most frequent, performed by a mean of 66% of subjects. The most common single action was rotation (with a mean of 36° total movement). The least common was pivoting: only 4% of subjects showed this pattern, with an average movement range of 10°, although there is less scope for movement in this plane. The Head Tracker has restricted capabilities and can only account for movement in 2 dimensions simultaneously. Therefore, for the head motion conditions, azimuth (rotation) and elevation (tipping) were tracked and roll (pivot) was ignored. Thurlow et al's findings suggest that this should have very little effect on the results.

Indeed, as Wallach (1939, 1940) argues, any movement that is taken into account should give some benefit. It will disambiguate front from rear, and it will also provide a changing stimulus that increases localization cues.

This experiment will not just assess accuracy with and without head movements. Subjects will be allowed to move their heads 'freely' whilst listening, since this will allow them to exercise their 'natural' and experienced method of locating sounds. However, if subjects are simply told to move their heads in a head movement condition, each subject will move their head differently. Thus, a controlled movement condition will be used in addition to a 'free movement' condition, to maintain a consistent and comparable set of head movements. The subject will be instructed to move their head 45° to the right at the onset of each stimulus. Since all sounds are between 0° and 180°, this is perhaps the movement that most closely mimics the natural action in such a situation, because we naturally turn towards, or in the direction of, a sound source (Thurlow & Runge, 1967; Pollack & Rose, 1967).


METHOD

Subjects

Ten subjects were recruited by opportunity sampling. All were postgraduate students (6 male and 4 female) between 22 and 40 years of age. All reported normal hearing, and 3 had previously taken part in one hearing experiment.

Design

A 2 × 3 repeated measures design was used. Seven sound sources were played, one at each of 7 locations (0°, 30°, 60°, 90°, 120°, 150°, 180°) in the horizontal plane, all at head height. Each source was presented twice to form a trial of 14 sounds. Each trial was played in a fixed randomised sequence, and all subjects listened to these trials under 6 conditions:

Head Tracker On - subject's head remains still

- subject's head moves to the right

- subject's head moves freely

Head Tracker Off - subject's head remains still

- subject's head moves to the right

- subject's head moves freely

Trials were counterbalanced to reduce practice effects.
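The design above specifies a fixed randomised order for the 14 sounds and counterbalanced conditions, but it does not state how the order or the counterbalancing was produced. The sketch below is one plausible arrangement, offered only as an illustration. It assumes a seeded shuffle (the seed value is arbitrary and hypothetical) and a simple cyclic rotation of the 6 conditions across subjects. With 10 subjects and 6 conditions, such a rotation balances serial position only within each block of 6 consecutive subjects.

```python
import random

LOCATIONS = [0, 30, 60, 90, 120, 150, 180]  # azimuths in degrees, from the Design section
CONDITIONS = [
    (tracker, motion)
    for tracker in ("Head Tracker On", "Head Tracker Off")
    for motion in ("still", "moves right", "moves freely")
]


def fixed_trial_sequence(seed=1):
    """Each location twice (14 sounds) in one pseudo-random order, identical on every call."""
    sequence = LOCATIONS * 2
    random.Random(seed).shuffle(sequence)  # fixed seed gives a fixed "randomised" order
    return sequence


def condition_order(subject_index):
    """Rotate the 6 conditions cyclically; over any 6 consecutive subjects,
    each condition appears once in each serial position."""
    k = subject_index % len(CONDITIONS)
    return CONDITIONS[k:] + CONDITIONS[:k]


if __name__ == "__main__":
    print(fixed_trial_sequence())
    for subject in range(3):
        print(subject, [f"{t} / {m}" for t, m in condition_order(subject)])
```

A Latin square would be the more usual choice for a full counterbalancing scheme. The cyclic rotation is used here only because it is the simplest arrangement to show.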

Stimuli

The stimulus was a 1-second Gaussian noise burst with 25 ms onset and offset ramps. The noise was generated using a C function, from a set of stimulus generation routines on an AP2 computer card. The noise sequence was then transferred to a DSP module (located in a Power SDAC unit, Tucker-Davis Technologies). There it was convolved with the HRTF corresponding to the target location and, where relevant, the head position as read from a Head Tracker (Polhemus 3Space Isotrak II). The SDAC unit then converted the signal from digital to analogue form, to be played through headphones.
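The stimulus chain described above (noise burst, ramps, HRTF convolution) can be sketched in NumPy. This is an illustration only, not a reproduction of the AP2/Tucker-Davis routines. The 44.1 kHz sample rate and the raised-cosine ramp shape are assumptions, since the text gives only the 25 ms ramp duration. The head-related impulse responses passed to `spatialize` are hypothetical placeholders for the measured filters.

```python
import numpy as np


def gaussian_noise_burst(duration_s=1.0, ramp_s=0.025, fs=44100, seed=None):
    """Gaussian noise with raised-cosine onset/offset ramps (ramp shape assumed)."""
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    n_ramp = int(round(ramp_s * fs))
    noise = rng.standard_normal(n)
    # Rises from 0 towards 1 over n_ramp samples.
    ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    envelope = np.ones(n)
    envelope[:n_ramp] = ramp          # onset ramp
    envelope[-n_ramp:] = ramp[::-1]   # offset ramp (mirror image of the onset)
    return noise * envelope


def spatialize(signal, hrir_left, hrir_right):
    """Filter a mono signal with a left/right head-related impulse response pair."""
    left = np.convolve(signal, hrir_left)
    right = np.convolve(signal, hrir_right)
    return np.stack([left, right])  # shape (2, len(signal) + len(hrir) - 1)


if __name__ == "__main__":
    burst = gaussian_noise_burst(seed=0)
    # Placeholder two-tap "HRIRs"; real filters would come from the HRTF bank.
    stereo = spatialize(burst, np.array([1.0, 0.5]), np.array([0.0, 1.0]))
    print(burst.shape, stereo.shape)
```

Full-length convolution (`np.convolve` in its default mode) lengthens the output by the filter length minus one, which is why the stereo signal is slightly longer than the burst.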

The sound sources were played randomly from a play list, based on the 'stimulus parameters' and 'experiment parameters' entered into the computer that controlled the Head Tracker and convolving apparatus. The stimulus parameters set four variables. Firstly, the stimulus duration was set to 1 second, to allow comparison with previous experiments that had used a noise stimulus. Secondly, the interstimulus interval was set to 6 seconds, giving the subject adequate time to perform any head movement necessary and record their judgement. Thirdly, the attenuation was set so that stimuli were presented at 65 dB SPL. Lastly, the parameters determined whether the Head Tracker was switched on or off.

The experiment parameters were the stimulus locations themselves (as specified in the

Design section).

Procedure

Subjects listened to the sounds over headphones (Sennheiser HD-414) whilst seated in a normally reverberant quiet room. Each subject was provided with instructions¹ and 6 response sheets (see Figure 11.1), one for each trial. Listeners were asked to bring their head to a forward-facing position by aligning it with a yellow cardboard spot (1" in diameter). The spot was fixed to a vertical surface 1.5 ft in front of the subject. A fully adjustable chair was provided to ensure that the spot was at eye level and that the correct distance was maintained.

Each trial was initiated from a computer keyboard. It was imperative that the subject's head was facing the forward position when this key was pressed, because the Head Tracker took the head position at that moment as its relative 0° azimuth reference. Subjects were verbally reminded of this at the beginning of every trial.

¹ See Appendix 50


For the 'head still' condition, the subject could move after stimulus offset to record their response, and then returned their head to the central position. For the 'head moves right' condition, as soon as the sound began, the subject turned their head to a 45° location (the centre of a specially positioned computer screen). After recording their response, they were again required to fixate the yellow spot. For the 'head moves freely' condition, the subject was free to move their head as desired once the stimulus had begun, but returned to the central position for the onset of the next stimulus.

Trials were presented at intervals of approximately 1 minute, the time taken to reset the computer to run a new sequence with different parameters. During the experiment, subjects were unaware of the Head Tracker function and of all other stimulus and experiment parameters, but they were fully debriefed after completing the 6 trials.

[Figure 11.1: response diagram, with Front at the top, Back at the bottom, and Left and Right on either side of a central point]

Figure 11.1: Response diagram (actual size) given to subjects. Subjects marked the numbers 1 to 14 on the diagram, one for each stimulus in a trial. A new response sheet was provided for each of the 6 trials.


RESULTS

Mean angle errors were calculated by averaging error values across subjects. For the Head Tracker On condition the overall mean error was ±19.5°, and for the Head Tracker Off condition it was ±21°. A more detailed breakdown of the findings is shown in Figure 11.2. A within-subjects analysis of variance (see Table 11.1) was used to analyse the results.
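Figure 11.2 plots front/back-corrected mean errors, but the chapter does not spell out the correction rule. The sketch below assumes one common approach, purely as an illustration. Error is measured the short way round the circle. When mirroring a response about the interaural (90°-270°) axis brings it closer to the target, the mirrored response is scored instead. The function names are hypothetical.

```python
from statistics import mean


def angular_error(target, response):
    """Absolute difference in degrees, taken the short way round the circle."""
    d = abs(target - response) % 360
    return min(d, 360 - d)


def mirror_front_back(azimuth):
    """Reflect an azimuth about the interaural (90-270 degree) axis."""
    return (180 - azimuth) % 360


def corrected_error(target, response):
    """Error after resolving a front-back confusion (assumed rule: score the mirror if closer)."""
    return min(angular_error(target, response),
               angular_error(target, mirror_front_back(response)))


def mean_error(pairs, correct_front_back=True):
    """Mean absolute angle error over (target, response) pairs, in degrees."""
    err = corrected_error if correct_front_back else angular_error
    return mean(err(t, r) for t, r in pairs)


if __name__ == "__main__":
    # Illustrative pairs only, not data from this experiment.
    pairs = [(30, 150), (90, 100)]
    print(mean_error(pairs, correct_front_back=False), mean_error(pairs))
```

With this rule, the response of 150° to a 30° target counts as a pure front-back reversal and contributes zero corrected error. Other published criteria differ, for example by excluding targets near 90°.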

With free head movements allowed and the Head Tracker On, accuracy was significantly higher than for all other conditions (ANOVA, F = 8.39, df = 2). For the Head Tracker On condition, there were large differences between the head motion conditions. The angle error for the 'head move right' condition (±27°) was significantly higher than the errors for the 'still' and 'move freely' conditions (±16.7° and ±14.9° respectively). This pattern of results was not replicated with the Head Tracker Off, where all head movement conditions showed similar errors.

A statistically significant interaction (ANOVA, F = 6.6, df = 2) was found between Head Tracker status (On/Off) and head motion condition (see Figure 11.2).

The number of front-back errors was calculated to establish whether there was a difference between the two Head Tracker conditions (On and Off). With the Head Tracker switched on, front-back confusions accounted for 11% of responses. This value rose to 15% with the Head Tracker switched off, a statistically significant increase (p < 0.05, unrelated t-test).
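Percentages like those above depend on the criterion used to class a response as a front-back confusion, and that criterion is not stated. The sketch below assumes a simple rule, purely for illustration: a response is counted as a confusion if its mirror image about the interaural axis lies closer to the target than the response itself. The function names and the example pairs are hypothetical.

```python
def angular_error(a, b):
    """Absolute angular difference in degrees, the short way round the circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def is_front_back_confusion(target, response):
    """Assumed rule: a confusion if the response's mirror image about the
    interaural axis lies closer to the target than the response itself."""
    mirrored = (180 - response) % 360
    return angular_error(target, mirrored) < angular_error(target, response)


def confusion_percentage(pairs):
    """Percentage of (target, response) pairs classed as front-back confusions."""
    pairs = list(pairs)
    confusions = sum(is_front_back_confusion(t, r) for t, r in pairs)
    return 100.0 * confusions / len(pairs)


if __name__ == "__main__":
    # Illustrative pairs only: two reversals (0->180, 150->30) out of four responses.
    print(confusion_percentage([(0, 180), (30, 40), (150, 30), (90, 90)]))
```

Responses near the interaural axis (90°) are their own mirror image, so this rule never counts them as confusions. This is one reason published criteria vary.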



ANALYSIS OF VARIANCE FOR ERRORS

Source          DF        SS        MS       F       P
HT status        1      2.77      2.77    0.08   0.785
Motion           2    616.52    308.26    8.39   0.001
Interaction      2    485.38    242.69    6.60   0.003
Error           54   1985.05     36.76
Total           59   3089.72

Mean errors (°)
HT status:  On 20.6     Off 21.0
Motion:     Head Still 19.2     Head Move R 25.3     Head Free 17.9

[Individual 95% confidence-interval plots for each set of means not reproduced]

Table 11.1: Analysis of variance table showing mean angle error values. HT status represents the Head Tracker On or Off. Motion refers to the three head movement conditions: head still, controlled head movement to the right, and head allowed to move freely. The statistically significant effects are motion and the HT status × motion interaction.
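As a check on the arithmetic in Table 11.1, each mean square is the sum of squares divided by its degrees of freedom. Each F ratio is the effect mean square divided by the error mean square. The sums of squares and degrees of freedom of the effects and error add up to the totals. The sketch below simply recomputes those quantities from the transcribed SS and DF values. It assumes, as the table layout indicates, that all three effects are tested against the single pooled error term. It does not re-analyse the raw data, which are not given here.

```python
# Sums of squares and degrees of freedom transcribed from Table 11.1.
TABLE = {
    "HT status": (2.77, 1),
    "Motion": (616.52, 2),
    "Interaction": (485.38, 2),
}
SS_ERROR, DF_ERROR = 1985.05, 54
SS_TOTAL, DF_TOTAL = 3089.72, 59


def f_ratios(table, ss_error, df_error):
    """Return {source: (MS, F)} with MS = SS / df and F = MS_effect / MS_error."""
    ms_error = ss_error / df_error
    return {src: (ss / df, (ss / df) / ms_error) for src, (ss, df) in table.items()}


if __name__ == "__main__":
    for source, (ms, f) in f_ratios(TABLE, SS_ERROR, DF_ERROR).items():
        print(f"{source:12s} MS = {ms:7.2f}  F = {f:5.2f}")
    ss_sum = sum(ss for ss, _ in TABLE.values()) + SS_ERROR
    df_sum = sum(df for _, df in TABLE.values()) + DF_ERROR
    print(f"SS check: {ss_sum:.2f} vs {SS_TOTAL}; df check: {df_sum} vs {DF_TOTAL}")
```

The recomputed F ratios (0.08, 8.39 and 6.60) match the table. The p-values would additionally need the F distribution with (1, 54) or (2, 54) degrees of freedom, for example from a statistics library.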

[Figure 11.2: line plot titled 'Mean (Front/Back Corrected) Errors', showing mean angle error (axis 0 to 35) for Head Tracker ON and Head Tracker OFF across the head motion conditions Still, Move Right and Move Freely]

Figure 11.2: Mean angle errors with the Head Tracker switched on and off for the three head motion conditions: 'still' but without restraint, a controlled 45° 'movement to the right', and 'free' movement. With the Head Tracker On, the 'move freely' condition showed a statistically significant improvement over all conditions with the Head Tracker Off. It also showed a statistically significant improvement over the 'head move right' condition with the Head Tracker On (ANOVA, F = 8.39, df = 2). There was also a statistically significant interaction between HT status and head motion (ANOVA, F = 6.6, df = 2).


DISCUSSION

The incorporation of head movements through the use of a Head Tracker has produced an improvement in judgement accuracy, but not to the degree that was anticipated. With the Head Tracker On, only one head movement condition showed an improvement over the Head Tracker Off conditions: the condition in which the head could be moved 'freely'. Furthermore, even in this (statistically significant) case, the angle error remained surprisingly high. The benefits provided by visual cues, reported in Chapter 10, are clearly not matched by head movement cues alone. But although the angle errors in general have remained fairly high, a clear benefit has been demonstrated where the listener is able to move their head as desired. 'Unnatural' or controlled motion does not aid localization any more than keeping the head still. In fact, in many cases unnatural movements appear to be confusing and make judgements more difficult than when head movements are not incorporated. This is highlighted by the statistically significant interaction between head motion condition and Head Tracker status (On or Off).

Despite the high errors, the pattern of results offers support to some published studies that report an improvement with head movements, but not to the degree of producing almost perfect accuracy (e.g. Hirsch, 1971; Blauert, 1983; Wightman et al, 1987). In particular, it closely matches Thurlow & Runge's (1967) study, which reported a reduction in error with head movements of only 30%. The reduction in this study is also approximately 30%, from 21° to 15°.
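The approximate 30% figure follows directly from the two mean errors quoted in this chapter: 21° without head tracking, and 15° (more precisely 14.9°) with tracked free head movement.

```latex
\frac{21^\circ - 15^\circ}{21^\circ} \approx 0.29 \approx 30\%,
\qquad
\frac{21^\circ - 14.9^\circ}{21^\circ} \approx 0.29
```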

One puzzling result was the difference between the 'still' head conditions with the Head Tracker On and Off. It was assumed that these two conditions would produce the same, or very similar, results: if the head is still, no head movements are accounted for, which makes the two conditions apparently identical. However, accuracy rose for the Head Tracker On condition, although the rise just failed to reach statistical significance. This implies that additional cues may have been obtained from the very small head movements that some subjects made. That explanation was thought unlikely, however, because such small head movements (of the order of 1-2° in the head still condition) have been shown to have no effect (e.g. Makous & Middlebrooks, 1990; see also Chapter 10).


One other explanation for the difference between the two 'head still' conditions may be the limited resolution of the head-tracked system. Although the Head Tracker has a resolution of 1°, the HRTF bank has only a 5° resolution, which causes perceptible jumps or slight irregularities in movement. The small movements of less than 5° (actually of the order of 1-2°) made by subjects should therefore have no effect, since the HRTF resolution requires a 5° movement in either direction to cause a perceptible change. But if those movements are about the 0° point (which they often were), then a noticeable change would occur, since the transition from 359° to 0° is where a 5° 'jump' occurs. Thus, even though these small movements should not have had an effect, the very fact that the 1-2° movements spanned the 0° azimuth position meant that they were exaggerated and affected accuracy. One might therefore expect that, with improved resolution, no 'jumps' would be perceived and the 'head still' conditions for the Head Tracker On and Off would be very similar.
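The argument above depends on how a 1° tracker reading is mapped onto the 5° HRTF bank, which the chapter does not specify. The sketch below compares two hypothetical mapping rules for a source straight ahead (0°) while the head wobbles by 1-2° either side of the reference. The first truncates downwards to the grid, which would produce the described 5° jump whenever the head crosses 0°. The second rounds to the nearest grid point, which would not. Neither rule is claimed to be the one the equipment used.

```python
import math

STEP = 5  # HRTF bank spacing in degrees, as stated in the text


def relative_azimuth(source_az, head_yaw):
    """Source azimuth relative to the head, wrapped to [0, 360)."""
    return (source_az - head_yaw) % 360


def truncated_hrtf(rel_az, step=STEP):
    """Hypothetical rule: take the grid point at or below the requested angle."""
    return (math.floor(rel_az / step) * step) % 360


def rounded_hrtf(rel_az, step=STEP):
    """Hypothetical rule: take the nearest grid point."""
    return (round(rel_az / step) * step) % 360


def largest_jump(rule, source_az, head_yaws):
    """Largest change in selected HRTF azimuth between consecutive tracker readings."""
    chosen = [rule(relative_azimuth(source_az, yaw)) for yaw in head_yaws]
    steps = [min(abs(a - b) % 360, 360 - abs(a - b) % 360) for a, b in zip(chosen, chosen[1:])]
    return max(steps)


# 1-2 degree head movements about the 0 degree reference.
wobble = [-2, -1, 0, 1, 2, 1, 0, -1]

if __name__ == "__main__":
    print("truncate:", largest_jump(truncated_hrtf, 0, wobble))
    print("round:   ", largest_jump(rounded_hrtf, 0, wobble))
```

Under truncation, a 1° turn to the right moves the relative source position from 0° to 359°, and the selected HRTF jumps from 0° to 355°. Rounding only moves the grid boundaries to ±2.5°, so jumps still occur, just not at the 0° reference. Interpolating between HRTFs removes the boundary altogether.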

The number of front-back errors was calculated with and without head movements. A statistically significant reduction in the number of front-back confusions was found with the Head Tracker switched on. Combined with the overall improvement in angle error, this front-back error reduction supports Wightman et al's (1987) theory, which claims that head movements reduce errors by preventing azimuth confusions. But if head movements only aid localization through the reduction of front-back errors, then correcting the data for these errors should nullify the effect of the head movements. The 'head movement' and 'no head movement' data should then be similar. Yet this is not the case, demonstrating that head movements are performing a more complex function than simply correcting for confusions. Such findings may lend support to Wallach (1939, 1940) and Young (1931), who assert that head movements play a more extensive role than simply resolving azimuth confusions.

The resolution of the equipment clearly presents problems for auditory virtual reality. To establish fully the contribution of head movements as a cue, the (5°) resolution must be refined to produce small, perceptibly smooth changes. Certainly, it is essential to provide HRTF interpolations that match the capabilities of the tracking equipment (usually 1° resolution). Nevertheless, this study goes some way towards highlighting the potential value of head movements in localization.
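One simple way to approximate 1° steps from a 5° bank is to interpolate linearly between the impulse responses of the two neighbouring measured directions. This is offered as a hedged sketch, not a recommended method. Naive time-domain interpolation of this kind blurs interaural time delays, and practical systems often interpolate delay and spectral shape separately. The bank here is a hypothetical dictionary mapping grid azimuths to equal-length impulse-response arrays.

```python
import numpy as np


def interpolate_hrir(bank, azimuth, step=5):
    """Linearly interpolate an impulse response for any azimuth.

    `bank` is assumed to map grid azimuths (0, step, ..., 360 - step) to
    equal-length numpy arrays measured at those directions.
    """
    azimuth = azimuth % 360
    lower = int(azimuth // step) * step
    upper = (lower + step) % 360          # wraps 355 -> 0
    weight = (azimuth - lower) / step     # 0 at the lower grid point, towards 1 near the upper
    return (1.0 - weight) * bank[lower] + weight * bank[upper]


if __name__ == "__main__":
    # Toy bank: each "response" is a constant array equal to its azimuth,
    # so the interpolated value is easy to check by eye.
    bank = {az: np.full(4, float(az)) for az in range(0, 360, 5)}
    print(interpolate_hrir(bank, 12))   # between the 10 and 15 degree entries
    print(interpolate_hrir(bank, 357))  # between 355 and 0, across the wrap-around
```

Because the weight changes continuously with azimuth, a 1° head movement produces a proportionally small change in the filter rather than a 5° step.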


CHAPTER 12

General Discussion and Conclusions

12.1 SUMMARY

The localization experiments reported in this thesis have provided an insight into AVR. The major problem surrounding these 'virtual' sounds was their (apparently unrealistically) large angle errors, which would cause real problems in safety-critical situations. The thesis provides a fundamental evaluation of the importance of acoustic cues in locating targets.

Manikin recordings have been used extensively to assess localization acuity. A

manikin provides a direct and accurate means of achieving 3-dimensional sound that

isolates auditory cues (head movements, vision and pinna cues can be eliminated as

variables). A number of basic factors were then investigated in terms of their

contribution to localization.

The significance of pinna-based spectral cues was assessed in terms of making azimuth and elevation judgements. The accuracy of locating sounds with one's own pinnae was measured against using another person's pinnae or no pinnae at all. The conclusions had a strong bearing on the future cost of producing AVR sounds and on subsequent experiments.

Different response methods and stimulus types were used in several experiments

throughout the thesis. However, disparities in the results led to a controlled

comparison of these variables, to clear up ambiguities that surrounded these issues.

A systematic investigation of recording and playback techniques further contributed to the extensive examination of factors that might be involved in localization. Yet since these manipulations did little to reduce the angle error, these variables were clearly not the major cues being utilised by the listener. The non-acoustic cues of head movement and vision were incorporated and examined. A discussion of the major findings and conclusions of all of these investigations is outlined in the sections below (major conclusions are underlined for ease of reference). Some of the limitations of the work are also presented, together with possible improvements and suggestions for future research.

12.2 DISCUSSION AND CONCLUSIONS

12.2.1 Individualized pinnae

A comparison of individualized, nonindividualized and no pinnae for azimuth and elevation judgements is reported in Chapter 4. The results showed a small but statistically insignificant increase in accuracy when using one's own pinnae. For azimuth judgements, this increase was 3°, which might have been significant with a larger sample size. The effect for elevation was much smaller, only 1.3°, which is less likely to give a significant result with a larger sample. These results support Freedman & Fisher (1968), who similarly found no difference between individualized and nonindividualized pinnae for azimuth and elevation judgements. Therefore, in terms of maximising cues in AVR, using pinnae will undoubtedly help, but the time-consuming and costly procedure of manufacturing individualized pinnae was not found to be worthwhile.

12.2.2 Pinnae/no pinnae

Chapter 4 provides the first comparison in this thesis of pinnae and no pinnae for azimuth and elevation judgements. For azimuth judgements, using pinnae (either individualized or nonindividualized) gave similar results to no pinnae (0.3° difference). This highlights the dominance of interaural timing and level differences for azimuth discrimination. These results were replicated in Chapter 6, where the pinna also showed no effect for azimuth. In Chapter 9, which looks only at azimuth judgements, a larger improvement was found when using pinnae, but the overall 3° difference was not significant. For elevation, however, a strong effect was found in Chapter 6. The pinna significantly increased judgement accuracy in the vertical plane (by 7°), perhaps demonstrating the primary function of the pinna. In Chapter 4, the difference between pinna and no pinna for elevation was also larger than for azimuth, by 5°, although this result just failed to reach statistical significance. In this particular case, where the subjects reported problems with localizing the stimuli, overall task difficulty may have masked the subtle pinna effects. Generally, however, these findings lend support to Freedman & Fisher (1968), who showed an improvement for pinnae over no pinnae for elevation judgements (as in Chapters 6 and 9). However, as mentioned above, using individualized rather than nonindividualized pinnae did not improve accuracy further.

The large angle errors in Chapter 4 provide further support for Freedman & Fisher's findings. Initially, Freedman & Fisher's results seemed questionable because of their untypically high errors (an overall average of around ±34° without head movements). However, the even higher mean angle error (±43°) in Chapter 4 reinforces their result. These sets of results could be an indicator of judgement accuracy using pinna cues alone in the vertical plane, although it may also be that pinna cues are not being fully utilised in this particular task. When Freedman & Fisher incorporated head movements, their mean error with pinnae in the vertical plane fell to 22.5°. Thus, with the addition of cues such as head movement, the role of the pinna could be critical.

12.2.3 Stimulus type

Controlled comparisons of different stimulus types (speech, clicks and white noise) were reported in Chapters 8 and 10. Chapter 8 found that for azimuth, speech produced the highest accuracy and noise the lowest, although this difference was not statistically significant. Elevation showed the opposite effect, with noise producing higher accuracy than either clicks or speech, but this result was also statistically insignificant. The results are surprising given that the majority of subjects found the click stimulus significantly more difficult to localize than either the noise or speech sounds. Part of this result might be explained by expectation. A speech sound is a known stimulus, and we are familiar with its movement around us in the horizontal plane. For elevation, however, a voice that varies only in height, and is presented from higher elevations than usual, behaves untypically, which may make the sounds harder to place. It should be noted, though, that the differences between stimulus types are subtle and not significant, and may therefore be due to individual variation.

Chapter 10, which compares the same 3 stimuli as Chapter 8 but for azimuth only and in the free-field, also showed that clicks gave the greatest accuracy. Among the other experiments in the thesis, those using noise (e.g. Chapters 6 and 7, roughly 24° and 20° on average) gave higher errors than those using clicks (e.g. Chapters 4 and 5, overall means of 15° and 19°). However, such comparisons are only possible between experiments with certain similar methodological features. Variables such as response method (discussed below in section 12.2.4) have a strong effect that overrides stimulus type. Thus, for experiments using a similar response method, this thesis has generally found clicks to give the lowest errors for azimuth judgements. For elevation, noise gives marginally greater accuracy than clicks or speech.

12.2.4 Response method

Investigations into the effects of response method have highlighted an important variable in localization studies. Published literature has used a wide variety of response techniques (e.g. Stevens & Newman, 1936, categorical; Pollack & Rose, 1967, head alignment; Wenzel et al, 1993, verbal co-ordinate reporting; Lovelace & Anderson, 1993, hand pointing). Even so, the effects of different response methods had not been determined.

Experiments reported in Chapters 4 to 11 have either adopted a categorical method or allowed subjects to make a 'free', non-categorical judgement. Categorical judgements, by their very nature, give cues to the locations of the targets and hence guide the subjects' judgements. A non-categorical method of eliciting responses allows for a completely unknown number and placement of sound sources, resulting in 'true' placement of the perceived sound locations. In a controlled comparison in Chapter 7, the categorical method produced a very large improvement in accuracy (from ±20° to ±8°), and knowledge of the speaker positions may have been the reason. Chapter 8 also showed a reduction in error from 24° to 15° when using a categorical method for making azimuth judgements. For elevation judgements, however, the effect was minimal, with a difference of just 2°. Subjects reported considerable difficulty when judging elevation, which is likely to have outweighed any effect of response method. Experiments in this thesis have shown response method to be highly influential for localization in the horizontal plane. These findings should make response technique a fundamental consideration in future localization research.

12.2.5 Visual stimuli

A number of acoustic and non-acoustic cues have been isolated, manipulated and investigated in an attempt to increase accuracy. Yet none of these, either alone or in combination, appears to offer sufficient localization cues to obtain free-field accuracy (e.g. Makous & Middlebrooks, 1990, between ±1.6° and ±16° on average). It was not until vision was included (Chapter 10) that there was a significant drop in error. For pre-recorded sounds, the addition of a visual context reduced the error from ±11° to ±4°. Free-field presentation of the sounds reduced the error further, to ±0.3°. The discrepancy between pre-recorded sounds with a visual link and free-field presentation may be a result of using nonindividualized pinnae, although this has previously been shown to have only a small effect (Chapter 4). It may also be due to more subtle effects produced by the KEMAR's artificial ear canal or hollow torso cavity. Nevertheless, providing a visual context or link to the sound sources drastically reduces judgement error, even for sounds recorded using a manikin and presented over headphones.

The work of researchers such as Jackson (1953), and my own free-field study in which vision was dominant, shows that the visual and auditory systems must complement each other and work together to avoid confusion. For small deviations between visual and auditory stimuli (up to at least 30°), vision will be dominant, and even when the deviation between an auditory and a visual stimulus is 90°, vision can have some effect. Thus, where 100% accuracy is required, it is not possible to implement an auditory cueing system alone, since it may conflict with, or fail to be guided by, vision.

12.2.6 Live/recorded stimuli

A KEMAR has been used to assess localization accuracy in the majority of experiments reported. An important contribution of this thesis has been the examination of the process of recording stimuli using this technique. There is a clear discrepancy in the literature between studies of localization that have used simulated 3-dimensional sounds and those that have been conducted in the free-field. The latter have produced far more accurate results (e.g. Stevens & Newman, 1936; Makous & Middlebrooks, 1990) than those using pre-recorded or generated stimuli (e.g. Wightman & Kistler, 1989; Wenzel et al, 1993; Chapters 4-9 and 11). The free-field might incorporate a very different set of acoustic cues from a 'virtual' auditory environment. However, these cues could never be identified with certainty, because the free-field studies differed a great deal in methodology.

So what is it about the simulation process that inevitably results in higher errors? In Chapter 9, sounds were presented 'live' through the manikin instead of being recorded digitally and played back from tape. This showed that little information (if any) is lost in the recording process. Although live presentation improved accuracy by 5° overall, the effect was not statistically significant. This indicates that the recordings are a high-fidelity reproduction of the original signal. Pinna cues were also investigated in this chapter (see section 12.2.2) and were found to have little effect on the overall accuracy in the absence of other factors.

12.2.7 Head movements

Head movements were initially examined in a free-field set-up in Chapter 10. It was hoped that the cues provided by head motion would produce the same high level of accuracy as visual context cues. However, there was no significant difference in accuracy between restraining the head in a clamp (±0.30°) and allowing it to move freely (±0.35°). There was, though, a strong floor effect, so the lack of statistical significance is likely to be a methodological artefact.

In Chapter 11, head-tracked HRTFs were used to account for head movements. This is an accurate representation of the technique used in VR systems and was expected to give a good indication of potential accuracy. For each of 3 different head motion conditions, the Head Tracker was either switched on or off. It was therefore possible to assess not only whether head movements aid localization but also whether certain types of movement are preferred. Overall accuracy was low (±20°), and there was only a 2° improvement for head-tracked over non-head-tracked conditions. But where subjects moved their head 'naturally' (as opposed to a specified controlled movement or no movement at all), there was a large (statistically significant) improvement in accuracy for head-tracked over non-head-tracked stimuli, from ±21° to ±14.9°.


Clearly, head movements will only reduce error significantly where subjects are allowed to move their heads as is natural and typical for them. Nevertheless, incorporating head motion using HRTFs, without a visual context, does not produce the near-perfect accuracy of free-field listening with a visual context (Chapter 10).

The ability to monitor all 3 dimensions of head movement may slightly improve judgement accuracy. For head-tracked 3D audio sounds, only azimuth (side-to-side rotation) and elevation (up-down tipping) movements were accounted for; roll (pivoting) was ignored due to software limitations. The decision to omit roll, rather than either azimuth or elevation, was based on findings by Thurlow et al (1967), who found roll to be the least performed, or least necessary, movement in making localization judgements.

12.3 IMPLICATIONS FOR VR

In a system where auditory warning cues work alone, this thesis has shown that potentially large misjudgements will undoubtedly occur. What is apparent from the findings is the importance of including, at the very least, a visual context. Certainly in terms of producing almost perfect accuracy, which is paramount in safety-critical situations, auditory localization is not sufficient as a sole cue to location.

The problems encountered in VR are not helped by the limitations of the equipment. The resolution of head-tracked, HRTF-generated stimuli will almost certainly cause problems for AVR unless it can be refined. No doubt these refinements will be achieved in the relatively near future, since VR is a fast-moving field. The technology requires at least 1° resolution to give a more realistic representation of the way a sound behaves when we move our head. This will also give a more accurate assessment of the effects of head movements on accuracy.

Training of pilots might improve matters, although the effects of training may be minimal, because many other factors in cockpits will hinder the available localization cues. Cockpit noise, excess auditory information and headphone quality may limit localization performance to a degree that perhaps even extensive training cannot overcome. Certainly, for situations where the user has little or no time to become accustomed to the equipment and sound sources, the localization errors reported in Chapter 11 should be taken as true indicators of the accuracy that can be expected for auditory virtual reality sounds.

AVR systems might be effective for drawing attention to instruments in the cockpit (where vision is available), but not to remote, unseen objects or targets, especially where front-back errors may intrude.

12.4 PROPOSALS FOR FUTURE WORK

Attempts have been made to study the fundamental processes that contribute to

auditory localization. Rigorous investigations of acoustic and non-acoustic cues have

been successful in increasing our understanding of how we localize sounds. But

despite the progress made, the work is by no means complete. As ever there remain

some puzzling aspects of localization which require further investigation if they are to

be resolved. It is hoped that the work set out in this thesis has laid the groundwork for

future research within this field.

Throughout the experiments, suggestions have been made for future directions and for specific areas of necessary research. Outlined below is a summary of these ideas, together with some new proposals based on the conclusions drawn in Sections 12.2.1 to 12.2.5 above.

1. Experiments in a cockpit set-up with the relevant visual and tactile cues. Applying the major findings of this thesis to a simulated cockpit environment will provide a direct test of the validity of the results. It will allow important research to be conducted within the relevant visual and spatial setting, and may reveal many new problems as well as solutions to old ones.

2. A critical factor is to use the auditory equipment that would actually be available in simulated cockpit environments, in order to work with the existing facilities and adapt localization cues accordingly. For example, if the headphones used tended to attenuate high-frequency sounds, thus reducing the pilots' front-back discrimination, then the high-frequency components of the sound could be boosted to compensate.

3. Taking account of cockpit noise is essential in attempting to produce a useful set of auditory cues. It is likely that the range of optimal signal types outlined in this thesis will need to be rethought in light of the new auditory environment. The fairly long-duration broadband signals that give the greatest accuracy when head movements are allowed may become lost in a consistently noisy background. Studies of masking will undoubtedly be of use to such research.

4. Research into the effects of high stimulus intensity on localization. Loud cockpit noise will require a fairly intense signal, which must still remain within a safe limit. This area of research is much needed, since no information is available about the effects on acuity of sounds close to that limit. The problem of stimulus intensity is strongly linked with the problems surrounding stimulus type. The solution may be to analyse the noise in different types of cockpit and generate a signal that is optimised for such noise and could therefore be less intense. Work on noise cancellation might also aid the process if intense signals were found to significantly hinder localization accuracy.

5. Equipment refinement.

a) Experiments using head-tracked, HRTF-generated 3D stimuli should be conducted with software capable of finer resolution (at least 1°). It may well be that some institutions are already developing or in possession of such equipment, in which case it will be readily available in the near future. A realistic representation of how a sound behaves as we move our head is critical if insight is to be gained into the role of head movements using head-tracking equipment.

b) Also linked to refined technology is the need for full tracking of a subject's head motion during such experiments. The software used in this thesis could only account for head motion in two of the three dimensions of azimuth, elevation and roll. Azimuth and elevation were therefore chosen, based on the finding of Thurlow et al (1967) that roll plays only a minor part. Nevertheless, roll may have a small effect and, by giving listeners a more realistic experience, may improve acuity. It is essential to replicate everyday listening conditions as accurately as possible.
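Item 2 above suggests boosting the high-frequency components of a signal to offset headphones that attenuate them. One simple way to tilt a spectrum towards high frequencies is a first-order pre-emphasis filter, y[n] = x[n] - αx[n-1]. The sketch below assumes this filter and an illustrative α = 0.9; neither comes from the thesis, and a practical compensation filter would instead be designed from the measured headphone response.

```python
def pre_emphasis(signal, alpha=0.9):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    Gain is (1 - alpha) at 0 Hz and (1 + alpha) at the Nyquist frequency,
    so high frequencies are boosted relative to low ones. alpha = 0.9 is an
    illustrative value, not a parameter taken from the thesis.
    """
    out = []
    previous = 0.0  # x[-1] is taken to be zero
    for sample in signal:
        out.append(sample - alpha * previous)
        previous = sample
    return out


if __name__ == "__main__":
    dc = [1.0] * 8                              # lowest frequency (constant signal)
    nyquist = [(-1.0) ** n for n in range(8)]   # highest frequency (+1, -1, ...)
    print("steady-state gain at 0 Hz:   ", abs(pre_emphasis(dc)[-1]))
    print("steady-state gain at Nyquist:", abs(pre_emphasis(nyquist)[-1]))
```

Because the filter also changes the overall level (with α = 0.9 its gain runs from 0.1 at 0 Hz to 1.9 at the Nyquist frequency), the output level would need readjusting, which ties in with the intensity concerns raised in item 4.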



REFERENCES

Attneave F (1959) Applications of information theory to psychology. Henry Holt and Company, New York.

Batteau D W (1967) The role of the pinna in human localization. Proceedings of the Royal Society, 168, 158-180.

Begault D R & Wenzel E M (1991) Headphone localization of speech stimuli. Proceedings of the Human Factors Society, San Francisco, CA, September.

Bekesy G (1960) Experiments in Hearing. McGraw-Hill, New York.

Blauert J (1969) Sound localization in the median plane. Acustica, 22, 205-213.

Blauert J (1983) Spatial hearing: The psychophysics of human sound localization. MIT Press, Cambridge, MA.

Butler R A (1969) Monaural and binaural localization of noise bursts vertically in the median sagittal plane. Journal of Auditory Research, 3, 230-235.

Butler R A & Humanski (1992) Localization of sound in the vertical plane with and without high-frequency spectral cues. Perception and Psychophysics, 51 (2), 182-186.

Coleman P D (1962) Failure to localize the source distance of an unfamiliar sound. Journal of the Acoustical Society of America, 34 (3), 345-346.

Durlach N I & Colburn H S (1978) Binaural phenomena. In: Carterette E (Ed) "Handbook of Perception", Vol. IV. Academic Press, New York.

Durlach N I et al (1992) On the externalization of auditory images. Presence, 1 (2), 251-257.


Edwards E (1969) Information transmission: An introductory guide to the application of the theory of information to the human sciences. Chapman and Hall.

Freedman S J & Fisher H G (1968) The role of the pinna in auditory localization. In: Freedman S J (Ed) "The neuropsychology of spatially oriented behaviour." Dorsey Press, Homewood, Illinois.

Gardner M B & Gardner R S (1973) Problem of localization in the median plane: effect of pinnae cavity occlusion. Journal of the Acoustical Society of America, 53 (2), 400-408.

Gelfand S A (1990) Hearing: An introduction to psychological and physiological acoustics. Marcel Dekker, Inc.

Giguere C & Abel S (1993) Sound localization: Effects of reverberation time, speaker array, stimulus frequency and stimulus rise/decay. Journal of the Acoustical Society of America, 94 (2), 769-776.

Good M D & Gilkey R H (1996) Sound localization in noise: The effect of signal-to-noise ratio. Journal of the Acoustical Society of America, 99 (2), 1108-1117.

Hake H W & Garner W R (1951) The effect of presenting various numbers of discrete steps on scale reading accuracy. Journal of Experimental Psychology, 42, 358-366.

Hartmann W M & Wittenberg A (1996) On the externalization of sound images.

Hebrank J & Wright D (1974a) Are two ears necessary for localization of sound sources on the median plane? Journal of the Acoustical Society of America, 56, 935-938.

Hebrank J & Wright D (1974b) Spectral cues used in the localization of sound sources on the median plane. Journal of the Acoustical Society of America, 56, 1829-1834.


Hirsch I J (1971) Masking of speech and auditory localization. Audiology, 10, 110-114.

Jackson C V (1953) Visual factors in auditory localization. Quarterly Journal of Experimental Psychology, 5, 52-65.

Loomis J M, Hebert C & Cicinelli J G (1990) Active localization of virtual sounds. Journal of the Acoustical Society of America, 88, 1757-1764.

Lopez-Poveda E A (1996) The physical origin and physiological coding of pinna-based spectral cues. Doctoral Thesis, Loughborough University.

Lovelace E A & Anderson D M (1993) The role of vision in sound localization. Perceptual and Motor Skills, 77, 843-850.

Makous J C & Middlebrooks J C (1990) Two-dimensional sound localization by human listeners. Journal of the Acoustical Society of America, 87 (5), 2188-2200.

Middlebrooks J C, Makous J C & Green D M (1989) Directional sensitivity of sound pressure levels in the human ear canal. Journal of the Acoustical Society of America, 86, 89-108.

Middlebrooks J C & Green D M (1991) Sound localization by human listeners. Annual Review of Psychology, 42, 135-159.

Miller G A (1956) The magical number seven plus or minus two: some limits on our capacity for processing information. Psychological Review, 63 (2), 81-97.

Mills A W (1958) On the minimum audible angle. Journal of the Acoustical Society of America, 30 (4), 237-246.

Musicant A & Butler R (1984) The influence of pinnae-based spectral cues on sound localization. Journal of the Acoustical Society of America, 75, 1195-1200.

Oldfield S R & Parker S P A (1984a) Acuity of sound localization: a topography of auditory space. I. Normal hearing conditions. Perception, 13, 581-600.


Oldfield S R & Parker S P A (1984b) Acuity of sound localization: a topography of auditory space. II. Pinna cues absent. Perception, 13, 601-617.

Perrott D R (1984) Concurrent minimum audible angle: A re-examination of the concept of auditory spatial acuity. Journal of the Acoustical Society of America, 75 (4), 1201-1206.

Pick H L, Warren D H & Hay J C (1969) Sensory conflict in judgements of spatial direction. Perception and Psychophysics, 6, 203-205.

Pollack I & Rose M (1967) Effect of head movement on the localization of sounds in the equatorial plane. Perception and Psychophysics, 2, 591-596.

Lord Rayleigh (1907) On our perception of sound direction. Philosophical Magazine, 13, 214-232.

Sandel T T, Teas D C, Feddersen W E & Jeffress L A (1955) Localization of sound from single and paired sources. Journal of the Acoustical Society of America, 27 (5), 842-852.

Sayers B & Cherry C (1957) Mechanism of binaural fusion in the hearing of speech. Journal of the Acoustical Society of America, 29, 973-987.

Schlegel P A (1994) Azimuth estimates by human subjects under free-field and headphone conditions. Audiology, 33, 93-116.

Searle D et al (1975) Binaural pinna disparity: another auditory localization cue.

Shaw E A G & Teranishi R (1968) Sound pressure generated in an external ear replica and real human ears by a nearby point source. Journal of the Acoustical Society of America, 44, 240-249.

Shelton B R & Searle C L (1978) Two determinants of localization acuity in the horizontal plane. Journal of the Acoustical Society of America, 64 (2), 689-691.


Shelton B R & Searle C L (1980) The influence of vision on the absolute identification of sound-source position. Perception and Psychophysics, 28, 589-596.

Shelton B R, Rodger J C & Searle C L (1982) The relationship between head motion and accuracy of free-field auditory localization. Journal of Auditory Research, 22, 1-7.

Siegel J A & Siegel W (1972) Absolute judgement and paired-associate learning: Kissing cousins or identical twins? Psychological Review, 79 (4), 300-316.

Stevens S S & Newman E B (1936) The localization of actual sources of sound. American Journal of Psychology, 48, 297-306.

Thurlow W R & Runge P S (1967) Effect of induced head movements on localization of direction of sounds. Journal of the Acoustical Society of America, 42, 480-488.

Thurlow W R et al (1967) Head movements during sound localization. Journal of the Acoustical Society of America, 42 (2), 489-493.

van Soest J L (1929) Richtingshooren bij sinusvormige geluidstrillingen [Directional hearing of sinusoidal sound waves]. Physica, 9, 271-282.

Wallach H (1939) On sound localization. Journal of the Acoustical Society of America, 10, 270-274.

Wallach H (1940) The role of head movements and vestibular and visual cues in sound localization. Journal of Experimental Psychology, 27, 339-368.

Wenzel E M, Wightman F L & Kistler D J (1991) Localization with non-individualized virtual acoustic display cues. In: Proceedings of CHI '91, ACM Conference on Computer-Human Interaction. ACM Press, New York, pp 351-359.

Wenzel E M et al (1993) Localization using non-individualized head-related transfer functions. Journal of the Acoustical Society of America, 94 (1), 111-123.


Wightman F L & Kistler D J (1989) Headphone simulation of free-field listening. II: Psychophysical validation. Journal of the Acoustical Society of America, 85 (2), 868-878.

Wightman F L, Kistler D J & Perkins M E (1987) A new approach to the study of human sound localization. In: Yost W A & Gourevitch G (Eds) "Directional Hearing." Springer-Verlag.

Woods W S & Kulkarni A (1992) Some examples of binaural recordings with KEMAR in anechoic and reverberant environments. Unpublished, Department of Biomedical Engineering, Boston University.

Wright D, Hebrank J H & Wilson B (1974) Pinna reflections as cues for localization.

Young P T (1931) The role of head movements in auditory localization. Journal of Experimental Psychology, 14 (2), 95-124.


Appendix 1: Pinnae photographs

APPENDIX 1

Pinnae Photographs

Photographs of the pinnae supplied with the manikin and examples of the pinna moulds made for use in this thesis.

The photographs show the following:

1. The standardized rubber pinnae supplied with the KEMAR. Shown are models DB-065 (the larger, red, right pinna mould) and DB-061 (the smaller, pink, left pinna mould).

2. Pinnae of subject AC, moulded using the technique described in Chapter 3. The pinnae of volunteer AC were typically used for 'nonindividualized' pinna conditions.

3. The 'infills' used as a no-pinna condition. These fit flush with the KEMAR's head and were used in place of pinnae in a number of experiments.

[Photograph: Knowles-manufactured KEMAR pinnae, right (DB-065) and left (DB-061), with centimetre scales.]

[Photograph: Moulded pinnae of subject AC (used for all nonindividualized pinnae conditions) (top) and left and right infills (bottom), with centimetre scales.]

Appendix 2: Calculation of transmitted information

APPENDIX 2

Calculation of transmitted information.

A typical subject's responses are given below:

                         RESPONSE POSITIONS
STIMULUS      1    2    3    4    5    6    7    8    9   TOTALS
POSITIONS
    1         5    1                                           6
    2         5    1                                           6
    3         3    2    1                                      6
    4                   1    4    1                            6
    5                             5    1                       6
    6                                       3    3             6
    7                                       1    4    1        6
    8                                       2    4             6
    9                                       2    4             6
TOTALS       13    4    2    4    6    1    8   15    1       54

Transmitted information: 1.74 bits

Calculation of transmitted information ($H_T$) is based on three ancillary measures:

$H_S$: Each stimulus total frequency is divided by the grand total (the total number of responses, here 54) to give $p_i$:

$$H_S = -\sum_i p_i \log_2 p_i$$

$H_R$: Each response total frequency is divided by the grand total to give $p_j$:

$$H_R = -\sum_j p_j \log_2 p_j$$

$H_{SR}$: Each individual cell frequency is divided by the grand total to give $p_{ij}$ (empty cells are ignored, since they contribute zero):

$$H_{SR} = -\sum_i \sum_j p_{ij} \log_2 p_{ij}$$

$H_T = H_S + H_R - H_{SR}$. In this case transmitted information is 1.74 bits per stimulus.

$2^{H_T}$ gives the number of alternative positions that can be identified without error; here $2^{1.74} \approx 3.3$.
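The calculation of transmitted information described above can be sketched in Python as follows. This is a minimal illustration, assuming the responses are stored as a stimulus-by-response frequency matrix; the function names are hypothetical, and the example matrices are toy data rather than the subject's responses. It computes the three ancillary measures HS, HR and HSR from the row totals, column totals and individual cells, then returns HT together with the equivalent number of error-free positions.

```python
import math


def entropy(counts, total):
    """Return -sum(p * log2(p)) with p = count / total, skipping empty cells."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def transmitted_information(matrix):
    """Return (H_T, 2**H_T) for a stimulus-by-response frequency matrix.

    matrix[i][j] is the number of times stimulus position i received
    response position j.
    """
    grand_total = sum(sum(row) for row in matrix)
    stimulus_totals = [sum(row) for row in matrix]
    response_totals = [sum(column) for column in zip(*matrix)]

    h_s = entropy(stimulus_totals, grand_total)
    h_r = entropy(response_totals, grand_total)
    h_sr = entropy((cell for row in matrix for cell in row), grand_total)

    # Transmitted information cannot be negative; clamp floating-point noise.
    h_t = max(h_s + h_r - h_sr, 0.0)
    return h_t, 2 ** h_t


if __name__ == "__main__":
    # Toy data: three positions, six presentations each.
    perfect = [[6, 0, 0], [0, 6, 0], [0, 0, 6]]   # every response correct
    guessing = [[2, 2, 2], [2, 2, 2], [2, 2, 2]]  # responses unrelated to stimulus
    for name, m in (("perfect", perfect), ("guessing", guessing)):
        h_t, positions = transmitted_information(m)
        print(f"{name}: H_T = {h_t:.2f} bits, 2^H_T = {positions:.2f}")
```

With perfect identification of three positions, HT = log2(3), about 1.58 bits, and 2^HT = 3; when responses are unrelated to the stimulus, HT = 0 and 2^HT = 1, i.e. no position can be identified reliably.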


Appendix 3: Headphone and Tubephone Responses

APPENDIX 3

Responses of the headphones and tubephones to a click.

[Figure: headphone and tubephone responses to a click, amplitude (dB) plotted against frequency.]

Appendix 4: Trial Ordering

APPENDIX 4

Trial ordering: Chapter 8.

The following sequence was looped to form a continuum.

Subject   Trial 1    Trial 2    Trial 3    Trial 4    Trial 5    Trial 6
1         Az/click   Az/noise   Az/chips   El/click   El/noise   El/chips
2         El/click   El/chips   El/noise   Az/click   Az/chips   Az/noise
3         Az/noise   Az/chips   Az/click   El/noise   El/chips   El/click
4         El/noise   El/click   El/chips   Az/noise   Az/click   Az/chips
5         Az/chips   Az/click   Az/noise   El/chips   El/click   El/noise
6         El/chips   El/noise   El/click   Az/chips   Az/noise   Az/click
7         El/click   El/noise   El/chips   Az/click   Az/noise   Az/chips
8         Az/click   Az/chips   Az/noise   El/click   El/chips   El/noise
9         El/noise   El/chips   El/click   Az/noise   Az/chips   Az/click
10        Az/noise   Az/click   Az/chips   El/noise   El/click   El/chips
11        El/chips   El/click   El/noise   Az/chips   Az/click   Az/noise
12        Az/chips   Az/noise   Az/click   El/chips   El/noise   El/click

Az = azimuth judgements; El = elevation judgements.

Appendix 5A: Subject Instructions

APPENDIX 5A

Subject Instructions

You will be listening to sets of sounds (clicks) over the tubephones (the experimenter will assist you with inserting the tubephones). Six sets will be presented in total: 3 now and 3 after a 10-minute break.

In front of you are your first 3 sets of response sheets. These are either for making "azimuth" or "elevation" judgements (the experimenter will tell you which ones you have and will demonstrate what they represent). Each set is made up of 25 click sounds which are spaced 5 seconds apart. Note that each response sheet comprises 25 individual small diagrams. Use a separate diagram for each of the 25 clicks. Make your response by placing a cross anywhere in or on the circle that corresponds to where you think the sound is coming from. Try to judge each sound as quickly and accurately as possible. If you have difficulty, then make the best guess you can. If you do miss a sound, leave the diagram blank and move on to the next one.

Whilst you are listening, it is important that you keep your head still and facing the cardboard spot on the wall in front of you. You can move your head to make your response, but return your head to the central position as quickly as possible. Remember that the sounds are only 5 seconds apart, so you will need to make your response fairly quickly.

When you are ready, the booth door will be closed and the sounds will begin after a few seconds. The door is not locked and you are free to leave at any time should you feel uncomfortable.


Appendix 5B: Subject Instructions

APPENDIX 5B

You will be presented with 54 clicks which are spaced at 4 second intervals. You must listen carefully and try to judge the location of the clicks as accurately as possible.

You should try to match your perception of each sound with one of the 9 target locations shown on the diagram in front of you. Then you must record that position next to each stimulus number on your response sheet (numbered 1 to 54).

The stimuli are spaced at 4 second intervals, so you will need to make your response fairly promptly and prepare for the next sound. Try to keep your head as still as possible and pointing straight ahead whilst you are listening to the sounds.

The booth door is not locked and you may leave at any time if you feel uncomfortable.


Appendix 5C: Subject Instructions

APPENDIX 5C

You will hear sets of white noise bursts (a "shhh" sound), each 1 second in duration. You must try to judge the location of each of the sounds as accurately as you can. You should make your response by placing a cross on the diagram in front of you; a separate diagram should be used for each judgement you make.

There are two sets of judgement tasks - 'azimuth' and 'elevation'. For azimuth, there are 28 noise bursts in total and your response sheet has 28 corresponding diagrams. For elevation, there are 56 sounds in total, which again match the number of diagrams on your response sheet. Your experimenter will inform you which of these two tasks you will complete first. When the first trial is over, you will be given the response sheet for the second set of sounds.

It is imperative that whilst listening, you keep your head still and facing straight ahead. You can move to record your responses, but should return to a forward-facing position as soon as possible. The sounds are spaced at 4-second intervals, so you will need to make your responses quickly.

Try to respond to all sounds, and if you have difficulty then make the best judgement you can.

The booth door is not locked and you may leave at any time should you feel uncomfortable.


Appendix 5D.1: Subject Instructions

APPENDIX 5D.1

For the categorical response method.

You will be presented with 4 sequences each comprising 25 sounds. The 4 sequences will each have different intervals between the sounds. For some sequences the sounds will be presented very quickly, for others the interval between sounds may be longer. Whatever the spacing, it will be regular for the whole sequence.

You must try to judge the location of each sound source. You should keep your head still and pointing straight ahead whilst you listen, although you can move your head to record your response.

You should try to match, as accurately as possible, your perception of the sounds with one of the target letters representing positions on the diagram in front of you. Then you must record that position (letter) next to each stimulus number on your response sheet, which is numbered from 1 to 25.

Once you have heard 25 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the delay between sounds will be different. You will receive 4 response sheets in total.

The booth door will be shut before the experiment commences. However, it is not locked and you are free to leave at any time should you feel uncomfortable.


Appendix 5D.2: Subject Instructions

APPENDIX 5D.2

For the 'free' (non-categorical) response method.

You will be presented with 4 sequences each comprising 25 sounds. The 4 sequences will each have different intervals between the sounds. For some sequences the sounds will be presented very quickly, for others the interval between sounds may be longer. Whatever the spacing, it will be regular for the whole sequence.

You must try to judge the location of each sound source, as accurately as possible. You should keep your head still and pointing straight ahead whilst you listen, although you can move your head to record your response.

For the first stimulus you hear, put a "1" on your response diagram that matches where you think the sound came from. For the second stimulus you hear, put a "2", and for the third, a "3" etc. up to "25". If two stimuli appear to come from the same place, just put the number underneath/next to the first number, to form a diagonal (the experimenter will explain this more fully).

Once you have heard 25 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the delay between sounds will be different. You will receive 4 response sheets in total.

The booth door will be shut before the experiment commences. However, it is not locked and you are free to leave at any time should you feel uncomfortable.


Appendix 5E.1: Subject Instructions

APPENDIX 5E.1

For the 'free' response method. The red text was omitted for subjects not using the guidance ring in the booth.

Thank you for taking part in this experiment.

You will be presented with a set of 14 sounds (either clicks, white noise, or the word "chips"). You must listen carefully and try to judge the location of the sound source as accurately as possible. It may help to close your eyes whilst listening and it is imperative that you keep your head still and pointing straight ahead at the 0° mark at the onset of each sound. Using the ring surrounding you as guidance, you should write your response on the diagram in front of you.

For the first stimulus you hear, put a "1" on the diagram that matches where you think the sound came from (there is no need to put your response next to one of the markers). For the second stimulus you hear, put a "2", and for the third, a "3" etc. up to "14". If two stimuli appear to come from the same place, just put the number underneath/next to the first number, working in towards the centre, so that the location is the same but the distance from the head gets closer (distance is not a variable in this experiment). See the diagram below.

The stimuli are spaced at 5 second intervals, so you will need to make your response fairly promptly and prepare for the next sound.

Once you have heard 14 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the stimulus sound will be different.

There will be 3 azimuth trials and 3 elevation trials. The experimenter will clarify the response procedure and trial details with you as the experiment progresses.

The booth door is not locked and you are free to leave at any time should you feel uncomfortable.

[Diagram: specimen response circle marked in 10° steps from 0° to 350°, with example numbered responses placed around it.]

Appendix 5E.2: Subject Instructions

APPENDIX 5E.2

For the categorical/guided method of response.

Thank you for taking part in this experiment.

You will be presented with a set of 14 sounds (either clicks, white noise, or the word "chips"). You must listen carefully and try to judge the location of the sound source as accurately as possible. It may help to close your eyes whilst listening and it is imperative that you keep your head still and pointing straight ahead at the onset of each sound. You should write your response on the sheet in front of you.

You should try to match, as accurately as possible, your perception of the sounds with one of the target positions shown on the diagram in front of you. Then you must record that position next to each stimulus number on your response sheet (1 to 14) (see diagram below for a specimen). The stimuli are spaced at 5 second intervals, so you will need to make your response fairly promptly and prepare for the next sound.

Once you have heard 14 sounds, the experimenter will come in and give you a new sheet. You must then repeat the procedure, except that the stimulus sound will be different.

There will be 3 azimuth trials and 3 elevation trials. The experimenter will clarify the response procedure and trial details with you as the experiment progresses.

The booth door is not locked and you are free to leave at any time should you feel uncomfortable.

Stimulus   Response
1
2
3
4
5
6
7
8
9
10
11
12
13
14

Appendix 5F: Subject Instructions

APPENDIX 5F


You will be listening 'live' to a laboratory setting, and your task is to judge (as accurately as you can) the location of a number of specified sounds within that setting. The list below shows the order in which those sounds are played. For each stimulus you should write down where you perceive the sound source to be located (by using the stimulus number, 1 to 6). The sequence will be played twice and you will be told when to listen for the second set. Two response sheets are provided to record your answers.

The stimuli will occur at 15-second intervals, which should leave you plenty of time to respond to each sound. It is important that you try to keep your head as still as possible and pointing straight ahead whilst you listen. It may also help to close your eyes whilst listening.

It is important to note that whilst the experiment is taking place, you are likely to hear a number of other sounds. You should not make a note of these. The 'other' sounds may include typing, phone rings, a printer, a door opening and talking. Try to concentrate on and listen specifically for the sounds you must locate.

After the experiment you will be fully debriefed.

The booth door is not locked and you are free to leave at any time should you feel uncomfortable.

You will now be played a tape of each of the sounds to familiarise you with them. After this, the experimenter will come in to see if you have any queries. The experiment will then commence.

Stimulus sounds to listen for:

Stimulus number   Sound
1                 Metronome clicks (4 in total)
2                 Hand clapping (in quick succession)
3                 Xylophone (strikes)
4                 Paper tearing
5                 Male voice saying the word "chips"
6                 Bunch of keys

Appendix 5G: Subject Instructions

APPENDIX 5G


You will be presented with 6 sets of 14 sounds (white noise, a "shhhhhhh" sound). You must listen carefully and try to judge the location of the sound source as accurately as possible.

You should write your response on the sheet in front of you.

For the first stimulus you hear, put a "1" on the diagram that matches where you think the sound came from. For the second stimulus you hear, put a "2", and for the third, a "3" and so on, up to "14". If two sounds appear to come from the same place, just place the number underneath/next to the first number, working in towards the centre in a diagonal, so that the position is the same but the distance from the head gets closer (distance is not a variable in this experiment).

You will be given a new response sheet for each of the 6 trials.

The sounds are spaced 6 seconds apart, so you will need to make your response promptly and then re-position your head to the central position (yellow dot), fixing your eyes on the cross, and prepare for the next sound.

For two trials you will be asked to keep your head as still as possible whilst listening to the sounds. For two you should move your head to the right as soon as the sound begins (the experimenter will demonstrate this to you). For the other two you can move your head freely, to do whatever you feel helps you to judge the sound more accurately.

Reminders of the procedure will be given at the beginning of each trial to guide you.

Due to equipment faults, the first sound on every trial you do must be ignored. Start recording number 1 from the second sound you hear.

