
Disruption of Short-term Recognition Memory for Tones: Streaming or Interference?

D.M. Jones, W.J. Macken, C. Harries

University of Wales, Cardiff, U.K.

A sequence of auditory stimuli interpolated between the initial presentation of a tone and a comparison tone impairs recognition performance. Notably, the impairment is much less with interpolated speech than with tones. Six experiments converge on the conclusion that this pattern of impairment is due more to the organization of the interpolated sequence than to its similarity to the to-be-remembered standard. Factors that contribute to the coherence of the interpolated sequence into a stream distinct from the initial tone are primary determinants of the level of impairment. This is demonstrated by manipulating factors that contribute to the coherence of the interpolated sequence by the action of temporal, spatial, timbral, and tonal attributes. However, the relative immunity of recognition performance to the interpolation of unprocessed digit sequences is not explained wholly by such coherence.

THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 1997, 50A (2), 337–357

Requests for reprints should be sent to Dylan Jones, School of Psychology, PO Box 901, University of Wales, Cardiff, CF1 3YG. Email: jonesdm@cardiff.uk.ac

The work reported here was supported by a project grant from the Economic and Social Research Council. Thanks are due to Seth Schwartz for his help in running Experiment 1. The development of the experimental series benefited from discussions with Clive Frankish. Revisions to the manuscript were helped greatly by Robyn Boyle and Karen Howes.

© 1997 The Experimental Psychology Society

The judgement of the similarity of pitch between two successive test tones is critically dependent on the presence and nature of irrelevant sound interpolated between the tones. Two interpretations of this effect are contrasted in this paper: One considers that the disruption is produced by interference, that is, that it depends on the similarity of representations within memory of the test tones and the irrelevant tones; the other supposes that the interference is the product of the degree to which the interpolated material is perceptually integrated with the test tones. These issues are addressed using a methodology first employed by Deutsch (1970). The paradigm is one in which memory for the pitch of tones over a period of a few seconds is assessed. Subjects first hear a standard tone, followed by a sequence of sounds that they are instructed to ignore, and then, after a short interval, a comparison tone. They are asked to judge whether the standard tone and the comparison tone are of the same or different pitch. The main variable of interest is the different types of sound in the interpolated sequence. In the study by Deutsch (1970), interpolated spoken digits and interpolated tones produced markedly different effects on the pitch judgement task: Speech produced errors on 2% of trials, but tones produced errors on 32% of trials. The lower level of disruption by speech led Deutsch (1970; see also Deutsch, 1984, for a summary) to suppose that recognition memory was related to the degree of similarity between the standard tone and the interpolated material.

The very marked difference produced by speech and non-speech also suggested to Deutsch (1970) that speech and non-speech are functionally distinct within memory. This interpretation was reinforced by Pechmann and Mohr (1992), who viewed their findings using the Deutsch (1970) paradigm as supporting a modified working-memory model in which non-speech and speech were functionally distinct, stored respectively in a "tonal loop" and an "articulatory loop". By both these accounts the marked degree of interference by interpolated tones is explained in terms of retroactive interference with the standard tone by the subsequent interpolated tones, with the suggestion that this interference is especially low in the case of speech because distinctiveness is not merely structural but may also be functional.

The main thrust of the present paper is that the results of Deutsch (1970) and Pechmann and Mohr (1992) can also be explained in terms of a processing system within which representations in memory are organized according to the principles of auditory scene analysis (Bregman, 1990). Such an approach focuses on the characteristics of sounds that lead to their grouping and coherence. For example, temporal proximity or spectral similarity can be used to organize materials into perceptual streams corresponding to distinct entities in space and over time. It is argued that such organization is reflected in the representation within memory, which, in turn, has repercussions for the accessibility of events within and between streams. Thus, organizational factors will not only separate different streams of sound so that they are distinct in phenomenal terms, but will also determine organization in memory and consequently determine the ease of retrieval from memory. This object- and stream-formation framework has already proved useful in the context of interference effects in short-term serial recall (e.g. Jones, Macken, & Murray, 1993). The current series of experiments seeks to extend its applicability to the phenomena of interference with pitch memory as investigated by Deutsch (1970).

From this streaming standpoint, two factors not controlled in the experimental paradigm used by Deutsch (1970) may have resulted in a misleading picture of how interpolated items interfere with memory for pitch. By controlling for these factors in new experiments, results consistent with an auditory scene analysis framework and at variance with the interference framework should emerge. The first factor is that in the original paradigm the timing of the standard tone in relation to the interpolated sequence of tones encouraged the perception of the standard as an element of the interpolated sequence. Specifically, the interval between the standard tone and the first interpolated tone was the same as that between successive interpolated tones (300 msec). This timing is likely to lead to all those items (both standard and interpolated) being organized into a single higher-order perceptual unit. Failure to isolate the standard tone would prejudice the process of matching it to the comparison tone and therefore lower the recognition rate. This is based on the generalization that the more attributes successive stimuli have in common, the more likely they are to be grouped together, a phenomenon that leads to difficulty in isolating information about single events within a sequence (see Bregman & Rudnicky, 1975, for an illustration). For example, if a sequence of stimuli share both timing and timbre, even though they may differ in pitch, the stimuli will be bound together more firmly than if they shared only timing. With speech, therefore, the grouping of the standard tone with the interpolated items will be less compelling, as it is distinct in timbre, though not in timing, from the interpolated sequence. The effect of interpolated tones can therefore be accounted for in part by the difficulty of segregating perceptually the standard tone from the tones in the interpolated sequence with which it is grouped. If this account of the greater disruption produced by interpolated tones is correct, increasing the interval between the standard and the first stimulus of the interpolated set should significantly reduce disruption.

The second confounding factor is that in the comparison of speech with non-speech by Deutsch (1970) the degree of acoustic variation of the interpolated materials was not controlled appropriately. As a result, the digit sequence contained items that were not only distinct from the standard tone, but were also more distinct spectrally from one another than were the items within the sequence of interpolated tones. Additionally, in the materials used by Deutsch, the tone sequences varied in frequency over an octave, but the speech stimuli shared a common fundamental frequency. Speech by its nature differs acoustically from pure tones; however, given that an important distinction rests on this confounding, it seems reasonable to argue that a more exacting analysis of the effects of different characteristics of speech is required. Specifically, such a test should match the variation in both speech and non-speech stimuli along one dimension while controlling all other stimulus dimensions. For example, it might be desirable to match the frequency of tones to the fundamental frequency of each speech stimulus. In this way the degree of variation in pitch within the interpolated sequence would be varied to the same degree for the different classes of material.

The following sequence of experiments explores the pattern of interference in pitch memory as investigated by Deutsch (1970) and Pechmann and Mohr (1992) with a view to establishing the applicability of a streaming framework to the results.

EXPERIMENT 1

Experiment 1 is a replication (with minor variations) and extension of the main conditions of Deutsch (1970). The central aim of this experiment was to test two propositions: first, that the timing of the standard tone in relation to the sequence of interpolated tones will determine, in part, the degree of disruption via its effect on grouping; and second, that when the differences within each type of interpolated sequence are made comparable as far as is practicable, sequences of speech when compared to sequences of non-speech will produce equivalent degrees of disruption of memory for tones. When both of these factors are controlled, it is argued, a more accurate picture of the mechanisms of disruption of memory for pitch will emerge.

Two forms of manipulation were examined in Experiment 1. One involved changing the interval between the standard and the first item in the interpolated sequence.¹ Two conditions were contrasted: In one case, the interval was the same as that separating the items in the interpolated sequence (350 msec), in the other it was appreciably longer (1500 msec). On the basis of the streaming hypothesis, it was predicted that recognition memory for the standard would be superior when the interval between the standard and the interpolated list is longer, but that this improvement may depend on the role of other factors (including the acoustic composition of the interpolated list).

¹ Thanks are due to Peter Bailey for pointing to an unpublished study reported as a footnote in the paper of Deutsch (1978) in which the results of a similar experiment are described briefly.

In order to investigate the role of the composition of the interpolated list and its joint action with the standard-list interval, four types of interpolated sequence were used. Two of these were also used in the Deutsch (1970) experiment: pure tones at different frequencies and sequences of random digits. Two further conditions were added: In the first, a sequence of cello notes having the same pitch as the sequence of pure tones was used to compare the effect of having the interpolated sequence distinct in timbre from both the standard and the comparison sounds. Semal and Demany (1991) found that manipulating the timbre of the interpolated series did not improve recognition performance; however, in their version of the Deutsch paradigm, the interval between the standard and the first item of the interpolated series was longer (at 500 msec) than the interval between members of the interpolated set (300 msec), thus possibly serving as a basis for segregating the standard. Additional cues to segregation based on the timbre of the interpolated set may not have been useful under these conditions. This suggests that the role of timbre will be revealed by its interaction with the interval between the standard and the first item of the interpolated list. Although listeners trade temporal and timbral information, within limits, it seems plausible that when timing does not serve as a strong basis for segregation (as in the 350-msec interval condition of Experiment 1) the effect of distinctiveness of timbre will be made more likely. A similar logic applies to the second additional condition in which a single token of the spoken digit "one" is repeated.

Changing the timing between the standard and the interpolated list would only affect conditions in which the interpolated items were tones, not those comprising speech or cello notes, as the timbral difference between these interpolated items and the standard tone means that the standard is not likely to be grouped with them anyway.

In Experiment 1 there are slight differences to the procedure adopted by Deutsch (1970) and Pechmann and Mohr (1992), namely that all the sounds were 350 msec long (not 200 msec) and in the interpolated sequence there were nine items, rather than six, separated by 350 msec, rather than 300 msec. These values were adopted following pilot studies and were designed to prevent effects of timing between the standard and the interpolated sequence being masked by ceiling effects. These changes are relatively minor, and it seems safe to regard such differences between the studies as unlikely to prejudice the hypotheses under test.

Method

Subjects

Male and female student volunteers were recruited for the study. Each reported normal hearing. Subjects were screened for their ability to discriminate between pure tone stimuli a semitone apart. A criterion of 7 correct discriminations in 10 presentations was used. Of the subjects, 8 failed to meet the criterion; 20 subjects undertook the full experimental procedure after passing the test.

Auditory Materials

All sounds were recorded and edited digitally to 16-bit resolution and sampled at 48 kHz. The sounds were tape-recorded onto a high-quality cassette tape recorder for playback during the experiment.

Each trial consisted of a standard pure tone (lasting 350 msec), followed, after an interval of either 350 msec or 1500 msec, by an interpolated sequence of nine items, each of which was 350 msec long and separated from its neighbours by silence of 350 msec. Following a further interval of 1500 msec, the comparison tone (lasting 350 msec) was presented. Care was taken to tailor, by digital editing, the rise and fall time of the tones so that no artefactual clicks were produced (each tone rose and fell its full amplitude in 50 msec). An interval of 10 sec followed the comparison tone, during which the subject was required to mark the judgement "same" or "different" on a response blank. On half the trials standard tones and comparison tones were the same; on the other half they were a semitone apart (randomly either higher or lower).
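The trial structure described above can be laid out as a simple timeline. The following is a minimal sketch, not the authors' presentation software: the function and constant names are illustrative, and the only values used are the durations stated in the text. It shows where each sound begins and ends for either standard-to-list interval.

```python
# Illustrative sketch of one Experiment 1 trial; names are hypothetical.
ITEM_MS = 350             # duration of every sound (standard, interpolated, comparison)
INTER_ITEM_GAP_MS = 350   # silence between neighbouring interpolated items
N_INTERPOLATED = 9        # items in the interpolated sequence
RETENTION_GAP_MS = 1500   # silence between the last interpolated item and the comparison


def trial_timeline(standard_gap_ms):
    """Return (label, onset_ms, offset_ms) for each sound in one trial."""
    events = [("standard", 0, ITEM_MS)]
    onset = ITEM_MS + standard_gap_ms
    for i in range(N_INTERPOLATED):
        events.append((f"interpolated_{i + 1}", onset, onset + ITEM_MS))
        onset += ITEM_MS + INTER_ITEM_GAP_MS
    # The loop advanced past one extra inter-item gap; step back to the last offset.
    last_offset = onset - INTER_ITEM_GAP_MS
    comparison_onset = last_offset + RETENTION_GAP_MS
    events.append(("comparison", comparison_onset, comparison_onset + ITEM_MS))
    return events


short_trial = trial_timeline(350)    # standard shares the list's timing
long_trial = trial_timeline(1500)    # standard is temporally isolated
```

Under these stated durations, the comparison tone ends 8500 msec after trial onset in the 350-msec condition and 9650 msec after trial onset in the 1500-msec condition; the 10-sec response interval follows in both cases.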

Four types of interpolated sequence were assembled: (a) a sequence of pure tones drawn randomly from the set of 12 semitones in the range C4 (262 Hz) to B4 (494 Hz); (b) spoken digits, the integers 1 to 9, recorded individually in a male voice in a monotone at a pitch corresponding to C4 (262 Hz) and thereafter assembled using digital editing into a quasi-random sequence in which no digit was repeated; (c) a repeated syllable spoken in a random sequence of fundamental frequencies in the same range as for tones; in this case, a single token of the word "one" was spoken at a pitch corresponding to C4 (262 Hz) and was subsequently transformed using digital signal processing techniques using Digidesign's SoundEdit software so that the same token was played at each of the pitches corresponding to those of the pure tones without a change in the duration of the stimulus; (d) bowed cello notes at fundamental frequencies in the same range as that for tones (taken from commercially recorded digital samples using SampleCell software). The pitches of standard and test tones were chosen from those not already selected as interpolated items.
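For readers who want the full set of 12 tone frequencies, the sketch below derives them under an assumption the text does not state: equal temperament with A4 tuned to 440 Hz. The two endpoints it produces (262 Hz and 494 Hz after rounding) agree with the values given above, but the intermediate values should be read as an illustration rather than as the frequencies actually used. All names are hypothetical.

```python
# Assumption (not stated in the paper): equal temperament, A4 = 440 Hz.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_frequency(midi_note, a4_hz=440.0):
    """Frequency in Hz of a MIDI note number under equal temperament."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


# MIDI note 60 is C4; the 12 semitones C4..B4 are notes 60..71.
tone_set = {
    f"{name}4": round(semitone_frequency(60 + i))
    for i, name in enumerate(NOTE_NAMES)
}


def semitone_neighbours(midi_note):
    """Frequencies one semitone below and above a tone, as on 'different' trials."""
    return semitone_frequency(midi_note - 1), semitone_frequency(midi_note + 1)
```

A "different" trial then corresponds to a comparison tone at one of the two values returned by `semitone_neighbours` for the standard.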

Each of the speech tokens was edited to last 350 msec (thus making them the same physical length as the non-speech tokens) by means of a routine in Digidesign's SoundEdit software in which the length of the token could be reduced without any changes in any other feature such as the pitch of the token. Individually, transformed tokens were not perceptibly different from untransformed tokens.

Procedure

Subjects were given written instructions in which they were asked to judge the similarity of the standard and the comparison tones by responding "same" or "different" as quickly and accurately as possible. They were asked to ignore the interpolated material and were told explicitly that they would not be tested on its contents. The conditions were presented in random order from trial to trial, and each subject undertook 160 trials in all, 20 in each condition, half the trials having the correct response "same" and half having the correct response "different".
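The design arithmetic (4 interpolated materials by 2 intervals gives 8 conditions, and 8 conditions of 20 trials give 160 trials) can be made concrete with a short sketch. This is an illustration only: it assumes that "same" and "different" trials were balanced within each condition, which the text does not state explicitly, and the function name and seeding are hypothetical.

```python
import random

MATERIALS = ["tones", "digits", "repeated syllable", "cello"]
INTERVALS_MS = [350, 1500]
TRIALS_PER_CONDITION = 20


def build_trial_list(seed=None):
    """Return a randomly ordered list of trial descriptions for one subject."""
    rng = random.Random(seed)
    half = TRIALS_PER_CONDITION // 2
    trials = []
    for material in MATERIALS:
        for interval_ms in INTERVALS_MS:
            # Assumption: half "same", half "different" within each condition.
            for correct in ["same"] * half + ["different"] * half:
                trials.append(
                    {"material": material, "interval_ms": interval_ms, "correct": correct}
                )
    rng.shuffle(trials)  # conditions vary randomly from trial to trial
    return trials


trials = build_trial_list(seed=1)
```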

Results and Discussion

Percentage correct responses for this and all other experiments in the series are given in Table 1. Statistical analysis by ANOVA revealed no overall effect of the interval between the standard and the interpolated list, F(1, 19) = 4.02, p > .05. There was however a significant effect of interpolated material, F(3, 57) = 28.52, p < .0001, and a significant interaction between interpolated material and interval, F(3, 57) = 4.31, p < .01.

As Deutsch (1970) and Pechmann and Mohr (1992) found, the difference between tones and the digit sequence was highly significant, both in the 350-msec and the 1500-msec interval conditions: F(1, 57) = 89.44, p < .0001, and F(1, 57) = 23.27, p < .001, with tones showing much greater disruption of recognition. However, the contrast between the cello sequence and the repeated syllable sequence was clearly non-significant, F < 1 in both the 350-msec and the 1500-msec interval conditions. Interpolated tones led to poorer performance than the repeated syllable sequence for the 350-msec interval, F(1, 57) = 13.76, p < .001, but not for the 1500-msec interval, F < 1. The same pattern was also obtained with the comparison between interpolated tones and interpolated cello: F(1, 57) = 7.74, p < .01 for the 350-msec interval, and F < 1 for the 1500-msec interval. In sum, although speech made up of a sequence of spoken digits shows the effect found by Deutsch, when speech and non-speech are matched in terms of between-stimulus acoustic change within a sequence, the difference between speech and non-speech is non-significant.

TABLE 1
Summary of Results for the Experiments in the Series

                                               Interval
                                         350 msec   1,500 msec
Experiment 1   tones                        57          67
               digits                       82          80
               repeated syllable            67          66
               cello                        64          68
Experiment 2   9-tones                      60
               17-tones                     67
Experiment 3   same ear                     52
               different ear                60
               both ears                    60
Experiment 4   digits/common F0             82
               digits/changing F0           65
               one digit/changing F0        74
Experiment 5   cello                        62
               digits                       86
               instrument/repeated          66
               instrument/changing          56
Experiment 6   tone                         63
               digits                       84
               reversed digits              79
               glides                       73

In relation to the effect of timing, the beneficial effect on recognition of extending the interval between the standard and the interpolated sequence from 350 msec to 1500 msec was restricted to the case of interpolated tones, F(1, 19) = 14.45, p < .0004. Extending the interval from 350 msec to 1500 msec had no effect on sequences with interpolated cello, interpolated repeated syllable, or interpolated spoken digits, F(1, 19) = 1.69, p > .05, F(1, 19) = 0.08, p > .05, and F(1, 19) = 0.70, p > .05, respectively. Overall, these results support the streaming hypothesis and identify the role of confounding factors in the original Deutsch (1970) study. The results also speak to the interplay of timing and timbre in streaming: Only when the standard was bound perceptually to the interpolated list by timing did the action of timbre serve to isolate the standard, thereby enhancing recognition performance.
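The size of the interval effect for each type of interpolated material can be read directly from the Experiment 1 rows of Table 1. The sketch below simply subtracts the 350-msec percentage from the 1500-msec percentage; it is descriptive arithmetic on the tabled means, not a reanalysis, and the variable names are illustrative.

```python
# Percentage correct from Table 1, Experiment 1: (350 msec, 1,500 msec).
exp1_percent_correct = {
    "tones": (57, 67),
    "digits": (82, 80),
    "repeated syllable": (67, 66),
    "cello": (64, 68),
}

# Change in percentage points when the standard-to-list interval is lengthened.
interval_benefit = {
    material: long_pct - short_pct
    for material, (short_pct, long_pct) in exp1_percent_correct.items()
}

largest = max(interval_benefit, key=interval_benefit.get)
```

The difference is 10 percentage points for tones, against 4 for cello, -1 for the repeated syllable, and -2 for digits, which matches the pattern of significance reported above.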

One possible alternative interpretation is that increasing the time interval between the standard and the interpolated sequences might provide subjects with additional time to encode more accurately the standard tone before it is degraded by the subsequent interpolated tones, leading to improved pitch recognition. This seems unlikely. The interval of 350 msec is longer than that which usually gives rise to auditory backward recognition masking (e.g. Kallman & Massaro, 1979), and so an explanation of the high level of disruption found with interpolated tones in these terms would seem to be implausible (see also Kallman, Cameron, Beckstead, & Joyce, 1987). Furthermore, the longer interval of 1500 msec is within the time range that gives rise to the perception of single isolated auditory events, rather than grouped items (Fraisse, 1978).

Overall, the results of Experiment 1 are most plausibly interpreted in terms of an auditory scene analysis approach, within which speech and non-speech have similar effects. First, there was no difference in terms of the disruption produced by matched speech and non-speech sounds. Second, the effect of the timing of the standard and the interpolated sequence was limited to tones, implying that the original finding of Deutsch (1970) was subject to the effect of perceptual grouping. The results of Experiment 1 suggest strongly that speech and non-speech sounds are equipotent in their capacity to disrupt memory for sound, and that speech is not a sufficient condition for low interference. In the next two experiments, we further explore the role of grouping manipulations in determining pitch recognition performance.

EXPERIMENT 2

The effects of timing of the standard tone in Experiment 1 point to the importance of the organization of the interpolated list. This indicates the possible value of construing the disruption of pitch memory in the Deutsch (1970) paradigm in terms of the processes by which items are organized into higher-order units, or streams, rather than in terms of a degradation in the representation of the standard tone by retroactive interference due to the similarity of the interpolated list to the standard tone. A further test of this proposition is mounted in Experiment 2 by manipulating the number of tones in the interpolated sequence. We compare the case in which 9 tones are presented in the interpolated sequence (as in Experiment 1) with the case where the number of tones is increased to 17, without changing the interval within which these tones are presented. From the viewpoint of retroactive interference, it would be expected that an increase in the number of similar stimuli should lead to an increase in interference, which would be signalled by a decrease in recognition performance. But if, as is strongly suggested by Experiment 1, grouping processes influence the degree of disruption produced by interpolated tones, the effect of nearly doubling the number of interpolated tones (while keeping the total duration of the interpolated sequence constant) will be to isolate perceptually the interpolated tones from the test tone. As the number of interpolated tones is doubled, the standard no longer shares the inter-item timing of the sequence as a whole. Therefore, the streaming hypothesis makes the opposite prediction to that of the interference hypothesis, namely that nearly doubling the number of tones in the interpolated interval will increase the likelihood of correct recognition.

Method

Subjects

Twenty-one undergraduates, each of whom reported normal hearing and passed the tone test described in Experiment 1, volunteered to take part and were paid an honorarium. Nine potential subjects failed the tone test.

Materials and Procedure

Only tones were used as interpolated materials in this study, but the overall nature of the task and the frequency range of the tones were identical to the corresponding conditions of Experiment 1. The control condition was identical to the 350-msec interpolated tones ("tones 350") condition of Experiment 1. In the experimental condition the test tone was separated by 350 msec from an interpolated set of 17 tones, nearly twice the number used in the control condition. In no case was a tone immediately followed by another of the same frequency within the interpolated sequence. Although in the 17-tone condition one tone finished as another started, the rise and fall times of the tones (as in Experiment 1), coupled with a change of frequency at each tone in the sequence, meant that they were phenomenally separate and distinct. In both conditions the comparison tone occurred, as in Experiment 1, 1500 msec after the end of the interpolated list.

In each condition, 20 trials were presented, with "same" and "different" responses equal in number appearing in random order. The procedure was as in Experiment 1.
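The claim that 17 tones fit into the same interval as 9 tones can be checked with simple arithmetic. The sketch below assumes that each tone in the 17-tone condition still lasts 350 msec, which is consistent with the statement that one tone finished as another started but is not given explicitly; the function name is illustrative.

```python
ITEM_MS = 350  # assumed tone duration in both conditions


def sequence_duration_ms(n_items, gap_ms):
    """Total duration of a sequence of n_items tones separated by gap_ms of silence."""
    return n_items * ITEM_MS + (n_items - 1) * gap_ms


nine_tone_ms = sequence_duration_ms(9, 350)       # 9 tones plus 8 silent gaps
seventeen_tone_ms = sequence_duration_ms(17, 0)   # 17 abutting tones, no gaps
```

Both sequences last 5950 msec: the eight 350-msec silences of the control condition are exactly filled by eight additional tones, so the within-list interstimulus interval falls from 350 msec to zero while the overall duration is unchanged.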

Results and Discussion

Results were analysed in terms of percentage of correct responses. Performance in the pure-tone control condition (60% correct) was comparable to that found in the "tones 350" condition of Experiment 1 (in which the level was 57% correct). In addition, recognition performance was significantly better with 17 interpolated tones (67% correct) than with 9 interpolated tones, F(1, 20) = 6.05, p < .025.

The results of Experiment 2 were directly in line with the streaming hypothesis and at variance with the retroactive interference hypothesis: Almost doubling the number of interpolated tones did not increase disruption; rather, it was reduced. The effect was one roughly comparable in magnitude to that found by extending the interval between the standard and the interpolated tones in Experiment 1. Any account based on retroactive interference by interpolated sounds would be hard-pressed to predict that an increase in the number of interpolated tones would improve recognition of the test tone.

Although the results of Experiment 2 support the general thrust of the argument developed in Experiment 1, there is residual ambiguity about the outcome of Experiment 2, stemming from a confounding of interstimulus interval and the number of interpolated stimuli. As the number of interpolated stimuli was nearly doubled, the interstimulus interval within the interpolated list was reduced from 350 msec to zero, but the interval between the standard and the first item of the interpolated sequence remained at 350 msec. This ambiguity cannot be resolved fully; therefore subsequent experiments in the current series seek evidence using other stimulus manipulations that help converge on the streaming hypothesis.

Thus far, two different techniques of manipulating streaming have provided convergent evidence for the usefulness of the auditory scene analysis framework in examining disruption of memory for pitch in the Deutsch paradigm. Experiment 3 seeks to extend this theme using another type of streaming manipulation.

EXPERIMENT 3

In this experiment the role of streaming processes in determining disruption of pitch memory is examined by varying the spatial location from which the interpolated items are presented. This represents an extension of a previous study by Deutsch (1978) in which a comparison was made between conditions whereby standard and comparison tones were presented monaurally, and interpolated tones were presented either to the same ear as standard and comparison tones or to the other ear. Conditions in which the interpolated tones were presented contralaterally to the standard and the comparison led to significantly better performance than conditions where the interpolated tones were presented ipsilaterally.

This result may be seen as supporting the interference view of the impairment of pitch memory, in the sense that the degradation of the representation of the standard tone by the subsequent occurrence of interpolated tones may be attenuated or eliminated by assigning those interpolated tones to a different ear-specific representation.² However, another possible interpretation, one in line with a streaming approach, is that spatial location serves as a powerful segregation cue to disembed the standard from the early items in the interpolated sequence. The conditions used by Deutsch (1978) do not allow discrimination between these two possibilities. In Experiment 3, the possibility that the finding of Deutsch (1978) is better conceived of in grouping terms is explored by adding a third condition to her design. As well as presenting interpolated tones both ipsilaterally and contralaterally to the standard and comparison tones, we also include a condition in which the interpolated tones are presented to both ears simultaneously. Because it is presented binaurally, the interpolated sequence will be perceived as originating somewhere in the centre of the head, thus forming a distinct perceptual group relative to the standard. From the point of view of the streaming hypothesis, therefore, this condition would be expected to give rise to an improvement in recognition equivalent to that found with contralateral presentation of the interpolated tones. From the viewpoint of the ear-specific channel hypothesis, such a condition would be expected to be similar in its effect to ipsilateral presentation, as interpolated tones are still presented to the same ear as the standard and the comparison and therefore enter the same ear-specific channel.

² In fact Deutsch (1978) interpreted her results in an attentional rather than in a retroactive interference framework. Her results are compatible with a retroactive interference explanation, however.

Method

Subjects

Twenty subjects were selected on the basis that they passed two screening tests. The first was the pitch discrimination test used in Experiments 1 and 2. The second involved testing subjects for their ability to discriminate the location of presented tones. Subjects were played 15 tones: five to the left ear, five to the right ear, and five to both ears at once. They were required to make a three-alternative forced-choice decision as to whether tones occurred in the left, right, or centre locations. Only subjects who scored 13 or more correct out of 15 went on to the experiment proper, and 7 potential subjects were rejected as a result of these two tests.

Apparatus/Materials

The 12 pure tones at semitone intervals of the scale C4 to B4 were generated as described in Experiment 1. These tones were assembled to provide three different types of trial sequence. The temporal features of all three types of sequence were identical, with a standard tone, lasting 350 msec, followed after 350 msec by nine 350-msec interpolated tones, each separated from its neighbours by 350 msec. After a silent interval of 1500 msec at the end of the interpolated sequence, a comparison tone, also lasting 350 msec, was presented. On half the trials, the comparison was the same pitch as the standard; on the other half, it differed by a semitone (half were higher and half lower than the standard). The frequencies of the tones in the sequence were selected in the same way as the tone sequences described in Experiment 1.
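The temporal structure of a trial described above can be laid out as a simple event list. The following is a minimal sketch, not the authors' software: the durations come from the text, whereas the function name, the event labels, and the choice of milliseconds from standard onset as the time base are assumptions made for illustration.

```python
# Durations are those given in the Apparatus/Materials text;
# all names below are illustrative only.
TONE_MS = 350          # standard, interpolated, and comparison tone duration
GAP_MS = 350           # silence after the standard and between interpolated tones
RETENTION_MS = 1500    # silence between the last interpolated tone and the comparison
N_INTERPOLATED = 9


def trial_timeline():
    """Return (label, onset_ms, offset_ms) events for one trial."""
    events = [("standard", 0, TONE_MS)]
    onset = TONE_MS + GAP_MS
    for i in range(N_INTERPOLATED):
        events.append((f"interpolated_{i + 1}", onset, onset + TONE_MS))
        onset += TONE_MS + GAP_MS
    last_offset = events[-1][2]
    comparison_onset = last_offset + RETENTION_MS
    events.append(("comparison", comparison_onset, comparison_onset + TONE_MS))
    return events


if __name__ == "__main__":
    for label, on, off in trial_timeline():
        print(f"{label:>15}: {on:5d}-{off:5d} ms")
```

On this reading of the timing, the comparison tone begins 8150 msec after the onset of the standard, so the retention period is a little over 8 seconds.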

Three types of sequence were generated. In the same-ear condition, the standard was presented to a single ear, followed by the interpolated sequence to the same ear, followed by the comparison tone also to the same ear. In the different-ear condition, the interpolated tones were presented to a different ear to that of the standard and the comparison tones. In the both-ears condition, standard and comparison tones were both presented monaurally to the same ear, but the interpolated tones were presented stereophonically to both ears. The amplitude of the binaurally presented interpolated sequences was set in pilot trials such that they were of equivalent loudness to the monaurally presented tones. Tones presented to a single ear were presented to the left ear on half the trials and to the right ear on the other half of trials.

The sequences were stored as “snd” resources on a Macintosh IIcx for playback in a predetermined random order via a HyperCard program during the experiment. Subjects wore stereo headphones throughout the screening tests and experiment.

Design and Procedure

Subjects were presented with 60 trials in all, 20 for each of the three conditions (same-ear, different-ear, both-ears). Conditions changed randomly from trial to trial. In other respects the procedure was the same as Experiment 1.
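A trial schedule of this kind might be sketched as follows. This is an illustrative reconstruction only: the text specifies 20 trials per condition, equal numbers of same and different comparisons, and equal use of the two ears, but how these factors were crossed within a condition is not stated, so the crossing used here (and every name in the code) is an assumption.

```python
import random

CONDITIONS = ("same-ear", "different-ear", "both-ears")


def route(condition, target_ear):
    """Ears receiving (standard/comparison tones, interpolated tones)."""
    other = "right" if target_ear == "left" else "left"
    interpolated = {
        "same-ear": (target_ear,),
        "different-ear": (other,),
        "both-ears": ("left", "right"),
    }[condition]
    return (target_ear,), interpolated


def build_schedule(seed=0):
    """Build 60 randomly ordered trials, 20 per condition.

    Half the comparisons are 'same'; the rest are split between a semitone
    higher and lower. Ear assignment is balanced within each condition
    (assumed; the paper gives only the overall proportions).
    """
    rng = random.Random(seed)
    trials = []
    for condition in CONDITIONS:
        comparisons = ["same"] * 10 + ["higher"] * 5 + ["lower"] * 5
        ears = ["left", "right"] * 10
        rng.shuffle(ears)
        for comparison, ear in zip(comparisons, ears):
            target, interpolated = route(condition, ear)
            trials.append({
                "condition": condition,
                "comparison": comparison,
                "target_ears": target,
                "interpolated_ears": interpolated,
            })
    rng.shuffle(trials)  # conditions change randomly from trial to trial
    return trials
```

Note that in the both-ears condition the interpolated tones still reach the ear that receives the standard, which is the property on which the two hypotheses make different predictions.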

Results and Discussion

Subjects’ responses were calculated in terms of percentage correct for each condition. Means were as follows: 52% in the same-ear condition; 60% in the different-ear condition; and 60% in the both-ears condition. There was a significant overall effect of condition, F(2, 38) = 4.82, p < .02. Planned comparisons revealed significantly better performance in both the different-ear and the both-ears conditions relative to the same-ear condition, F(1, 38) = 7.00, p < .02, and F(1, 38) = 7.45, p < .01, respectively. There was no difference between different-ear and both-ears conditions, F < 1.
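The degrees of freedom reported here are those expected for a one-way repeated-measures design with 20 subjects and three conditions. The short sketch below shows the arithmetic; it assumes a standard within-subjects ANOVA with no sphericity correction, which the text does not state explicitly, and the function name is hypothetical.

```python
def rm_anova_df(n_subjects, n_conditions):
    """Degrees of freedom (effect, error) for a one-way repeated-measures ANOVA."""
    df_effect = n_conditions - 1
    df_error = (n_subjects - 1) * (n_conditions - 1)
    return df_effect, df_error


# Experiment 3: 20 subjects, 3 ear conditions
print(rm_anova_df(20, 3))  # -> (2, 38)
```

Each planned comparison has one degree of freedom in the numerator and is tested against the same error term, which gives the F(1, 38) values reported above.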

These results converge with those of Experiments 1 and 2 in providing further support for the streaming hypothesis of interference effects in pitch memory. If the marked level of disruption found when interpolated tones are presented in the same spatial location as the standard tone was due to interference within an ear-specific channel, then the both-ears condition should have led to the same degree of disruption as the same-ear condition. Instead, the results support the argument that disruption effects in pitch memory may be viewed usefully in the context of the perceptual processes whereby items and events are organized into higher-order perceptual groups, or streams (see, for example, Bregman, 1990). To the extent that such processes allow the standard and the interpolated tones to be perceived as distinct entities, then recognition performance will be enhanced. The results also serve to reinforce those of Kallman et al. (1987), who found the superiority of the different-ear condition was restricted to a procedure in which conditions were blocked rather than randomized. The results of the current series of experiments show that the effect also occurs with randomized conditions.

Thus far, the results have suggested that the auditory scene analysis framework provides a useful way of approaching the findings of Deutsch (1970, 1978) and Pechmann and Mohr (1992). Within the present series, grouping manipulations (by time, by timbre, by location, and by number) have been shown to modulate the interference of pitch memory caused by interpolated tones. In the three final experiments we return to the question, first posed in Experiment 1, of why spoken digits produce such low levels of disruption. As has already been mentioned, such stimuli differ from the tone sequences in potentially critical ways: First, each item in the sequence is spectrally distinct from its neighbours in a way that the tones are not; and second, the spoken digits do not change in overall pitch in the way that the tones do. In the following experiments, the influence that these factors may have on modulating the disruptive capacity of interpolated sequences is investigated.

EXPERIMENT 4

A potentially critical difference between the interpolated tones and the digits used by Deutsch (1970) was that whereas the tones varied in frequency from item to item, all digits shared a common fundamental frequency (that is, they were spoken at the same overall pitch). In Experiment 4, we examine the possibility that this common overall fundamental frequency contributes to the low level of disruption of pitch memory found with such digit sequences. From the streaming point of view, we may hypothesize that a shared fundamental frequency among items increases the likelihood that such items will be grouped together coherently, thus facilitating the perceptual isolation of the standard tone from the first few items in the interpolated sequence. In Experiment 4, the effect of interpolated digit sequences that share a common fundamental is compared with sequences that change in fundamental frequency from item to item. Changing the fundamental frequency should reduce the tendency for items to be grouped together coherently. From the streaming view, changing the fundamental frequency from digit to digit may reduce the strength with which such items may be grouped together separately from the standard, therefore increasing the level of disruption of pitch memory. The technique used in this experiment is to change the frequency of a token using digital signal processing techniques, allowing all other characteristics (such as duration and intonation) to remain fixed while the pitch is changed. Phenomenally the effect of this is to make the digits sound as if they are spoken by different voices.

In order to examine these possibilities, Experiment 4 employs conditions in which the content of the interpolated sequences is manipulated in three ways: (a) the usual spoken digit condition as used in Experiment 1; (b) a condition that involves presentation of the nine spoken digits, but in this case each item is shifted in pitch to 1 of 12 semitone steps; and (c) a single spoken digit (“one”) shifted in pitch in the same way as in Condition (b). Conditions b and c, therefore, share a common range of pitches but differ in terms of the variation in timbre; Condition c has a common timbre throughout the interpolated sequence, but in Condition b the timbre varies. Repeating the same speech token, albeit at different pitches, should also increase the likelihood that such a sequence will form a more coherent group than sequences that have both pitch and timbral change (as in Condition b). Therefore, from the point of view of the grouping hypothesis, we would expect Condition c to produce less disruption than Condition b. Thus, the conditions examined in Experiment 4 may be seen within the streaming framework as representing three levels of coherence in the interpolated items, with repeated digits sharing a common fundamental being the most coherent, and pitch shifted versions of all the digits the least coherent.

Method

Subjects

Eighteen subjects participated in the experiment for an honorarium. They were selected on the basis that they passed the pitch discrimination test as described in Experiment 1; 4 potential subjects were rejected on this basis.

Auditory Materials

Three types of sequence were constructed: (a) digits 1–9/common F0, identical to those used in the “digits 350” condition used in Experiment 1; (b) digits 1–9/changing F0, in which each of the digits 1 to 9 was digitally processed such that each had a fundamental frequency corresponding to one of the semitone steps in the octave C4 to B4; and (c) single digit/changing F0, in which the spoken digit “one” was digitally copied and pitch-shifted to each of the semitone steps of the C4 to B4 octave. The pitch shifting technique was the same as that used in Experiment 1. The overall order of the sequences was the same as that used in the 350-msec conditions of Experiment 1.
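The target fundamentals for the pitch-shifted tokens can be illustrated with a short calculation. This sketch assumes equal temperament with A4 = 440 Hz, which the text does not specify, and it covers only the target frequencies and scaling ratios, not the signal processing used to shift the recorded digits; the function names are hypothetical.

```python
NOTE_NAMES = ["C4", "C#4", "D4", "D#4", "E4", "F4",
              "F#4", "G4", "G#4", "A4", "A#4", "B4"]


def semitone_frequencies(a4_hz=440.0):
    """Equal-tempered frequencies (Hz) for C4..B4; A4 lies 9 semitones above C4."""
    return {name: a4_hz * 2 ** ((i - 9) / 12) for i, name in enumerate(NOTE_NAMES)}


def shift_ratio(from_hz, to_hz):
    """Factor by which a token's F0 must be scaled to reach the target F0."""
    return to_hz / from_hz


if __name__ == "__main__":
    for name, hz in semitone_frequencies().items():
        print(f"{name:>3}: {hz:7.2f} Hz")
```

Under these assumptions each semitone step multiplies the fundamental by 2^(1/12), about 1.059, so the 12 steps span just under one octave.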

Design and Procedure

Twenty sequences of each type were constructed and randomly ordered on a digital tape for

playback through headphones during the experiment. The general procedure was the same as used in

Experiment 1.

Results and Discussion

Subjects’ responses were scored as percentage correct in each condition. The means for the three conditions were as follows: digits 1–9/common F0, 82%; digits 1–9/changing F0, 65%; and single-digit/changing F0, 74%. A repeated-measures ANOVA revealed a significant main effect of condition, F(2, 34) = 18.78, p < .0001. Planned comparisons indicated that digits 1–9/common F0 led to better recognition performance than either the digits 1–9/changing F0 condition or the single-digit/changing F0 condition, F(1, 34) = 37.47, p < .0001, and F(1, 34) = 7.85, p < .01, respectively. There was also a significant difference between the digits 1–9/changing F0 and the single-digit/changing F0 conditions, F(1, 34) = 11.02, p < .005, with digits 1–9/changing F0 producing a greater degree of disruption of pitch recognition.

These results clearly indicate that one feature of spoken digit sequences giving rise to low levels of disruption is that the items share a common fundamental frequency. Changing the F0 breaks up the coherence of such sequences. Presenting such sequences with each item at a different frequency leads to a substantial increase in their disruptive capacity, of the order of 17 percentage points more errors (82% vs. 65% correct). The results also suggest that this is due, at least in part, to the effect that changing the frequency from item to item has on the strength with which such sequences form coherent groups. By repeating the same item at different frequencies (single-digit/changing F0), performance is improved relative to the digits 1–9/changing F0 by approximately 9 percentage points (74% vs. 65% correct), an improvement that we suggest is due to the increased likelihood with which sequences may be coherently grouped, facilitating the perceptual isolation of the standard tone.

EXPERIMENT 5

Experiment 4 focused on some of the aspects of speech sounds that may give rise to low levels of disruption. In Experiment 5 attention is turned to non-speech sounds. In particular, questions are posed relating to whether the low level of disruption produced by spoken digits can also be produced by non-speech sounds that share the characteristics of spectral change from item to item while retaining a common fundamental frequency. Within the current series of experiments it has been shown already that speech (repeated syllables) can be made to exhibit effects similar to those of non-speech sounds (repeated cello notes) in Experiment 1. For analytic purposes, it would be useful also to demonstrate the complementary effect, that is, to show that non-speech sounds can produce effects similar to complex and varying speech. This would show that speech was not a necessary condition, augmenting the evidence of earlier experiments in the current series that demonstrated that speech was not a sufficient condition for low levels of disruption.

In Experiment 5, two conditions are introduced to examine this possibility. In one, the interpolated sounds are made up of a range of different instruments playing the same note, thus varying in timbre but not in pitch. This is contrasted with a condition in which the same range of instruments play different notes: specifically, the range of pitch change used in previous experiments in semitone steps within an octave range. The comparison of these two conditions should indicate whether change in pitch and timbral qualities within a sequence of non-speech sounds gives rise to the same type of effect as has been shown with speech sounds in Experiment 4. As control, conditions identical to the “cello” and “digits” (at 350-msec interval between the standard and the interpolated sequence) conditions of Experiment 1 are also used.

In sum, Experiment 5 poses two questions: Does varying a common timbre augment or reduce the grouping, and does an additional change of pitch reduce coherence still further?

Method

Subjects

Twenty subjects, screened by a test for tone discrimination (see above), were paid an honorarium for taking part. Nine potential subjects were rejected on the basis of the pitch discrimination test.

Procedure

The general procedure was identical to that used in Experiment 2. Four conditions were used in which interpolated materials were constructed from the following sounds: (a) cello, which was identical to the “cello 350” condition of Experiment 1; (b) digits, identical to the “digits 350” condition of Experiment 1; and two new conditions: (c) an instruments/repeated-pitch condition comprising sequences of instruments (cello [bowed], guitar, French horn, saxophone, pipe-organ, flute, piano, glockenspiel, trumpet) playing the same note (C4) in random order; and (d) an instruments/changing-pitch condition using the same randomly ordered sounds, but in addition the notes varied in the same range as those for “cello”, that is, 12 notes in semitone steps for the octave C4 and above.

The instrumental sounds were based on digitized samples of sound effects stored on compact disk (Digidesign SampleCell). These were subsequently pitch-shifted and edited (as in previous experiments in this series) using Digidesign SoundTools digital signal processing software. The timing of the stimuli was as in the “350” conditions of Experiment 1, that is, with the standard spaced 350 msec ahead of the first interpolated sound and each of the interpolated sounds spaced at 350 msec.

Subjects were presented with 20 trials per condition, as in previous experiments.

Results and Discussion

Results were analysed in terms of the percentage of correct responses in each condition. The means for the four conditions were as follows: instruments/repeated-pitch, 66%; instruments/changing-pitch, 56%; cello, 62%; and spoken digits, 86%. An overall ANOVA showed the effect of the type of interpolated material to be highly significant, F(3, 57) = 26.89, p < .0001. Planned comparisons indicated, just as shown with spoken digits in Experiment 4, that changing the pitch from item to item in the instrument sequences led to an increase in errors relative to the instruments that share a common fundamental frequency, F(1, 57) = 5.56, p < .03, although the instruments/repeated-pitch still produced more errors than the spoken digits condition, F(1, 57) = 34.77, p < .0001.

The results of Experiment 5 also speak to the matter of timbre. The level of performance in the instruments/changing F0 condition was as poor as that found for tones in Experiment 1. The main difference between this use of instrumental sound and its use in Experiment 1 is that timbre changed from one item in the interpolated sequence to the next. Added to the loss of coherence in timbre is the effect of change of frequency.

The substantial difference between the disruption produced by the instruments/repeated F0 condition and the spoken digit condition indicates that common fundamental frequency and spectral change from item to item in the interpolated sequence are not in themselves sufficient to give rise to the very low levels of disruption of pitch recognition found with digit sequences. Nonetheless, the comparison between instruments/changing F0 and instruments/repeated F0 in the present experiment, as well as the comparison between spoken digits and pitch shifted digits in Experiment 4, indicates that a shared fundamental frequency among items leads to a decrease in disruption, probably, we would argue, because a shared fundamental frequency among items increases the likelihood that such items will be grouped together coherently.

However, the question remains as to what other characteristics of spoken digit sequences lead to very low levels of interference. One possibility is that even though each spoken digit is spectrally distinct from others, they still share many attributes, other than fundamental frequency, which may also contribute to the likelihood of their being grouped together. As the digits were all spoken in the same voice and therefore share certain acoustic qualities in a way in which different instrument sounds do not, it seems possible that such items are more likely to be grouped together. Such an explanation would appear to be plausible in the light of previous results that have highlighted the importance of grouping processes in modulating the disruptive capacity of interpolated sounds (see Jones & Macken, 1995; Jones, Macken, & Murray, 1993). An alternative explanation may be that the semantic content of the speech serves to bind the items more coherently together. Thus there may be some “top-down” processes that serve to integrate the items more coherently into a separate perceptual group. We explore these possibilities in the next experiment.

EXPERIMENT 6

Experiment 6 evaluates the effect of sequences of non-speech sounds constructed in a way likely to increase the binding together of the items. Coherence within the interpolated sequence was achieved by generating a continuous glide varying randomly in frequency. Continuous glides were produced by first low-pass filtering pink noise at a very low frequency and then using this randomly varying signal to drive a voltage-controlled oscillator. The oscillator then generated a sound with a pitch in proportion to the voltage driving it. This glide was then interrupted regularly by short periods of silence. Hence, a sequence of discrete items was created with each item pointing in frequency to the next one in the sequence. Bregman and Dannenbring (1973) examined the effect of frequency glides joining two steady-state tones at different frequencies on the tendency for such items to be formed into a single stream. They found that even when such glides contained a silent interruption, they still increased the tendency for the sounds to be integrated into a single stream. Thus, the way in which the glide at the offset of one sound pointed to the glide at the onset of the next sound served to bind the tones more coherently together. The glides used in Experiment 6 resemble those of Bregman and Dannenbring (1973) in that the end of each glide segment points in frequency to the beginning of the next one. If the tendency for interpolated items to be grouped together coherently is an important determinant of pitch recognition performance, then low levels of disruption should be expected for such sequences. An interpolated tone condition is also included as a baseline.

A clear prediction is that reversed digits will not have the semantic content of forward speech, and therefore, if the meaning of the items is important in determining the effect of interpolated digits, an effect on recognition memory by reversing the items in the interpolated list would be expected. Forward and reversed digits share the same long-term spectrum, but in reversing a spoken digit the ordering of periodic and non-periodic features together with changes in the shape of the formant trajectories will mean that the sequence will sound rather different to that in normal English. One possibility, therefore, is that the ordering or trajectory of such features will reduce the coherence of the interpolated sequence, thereby increasing the degree of disruption of recognition memory.

Method

Subjects

Male and female student volunteers were each subjected to the screening test described for Experiment 1. Of the subjects, 5 failed the test, and the remaining 18 participated in the experiment.

Procedure

As before, subjects were presented with 20 trials per condition. Four conditions, differing with respect to the content of the interpolated list, were presented: tones, random digits, reversed random digits, and pitch glides. The tones condition was identical to that used in the “tones 350” condition of Experiment 1. The digits were also those used in Experiment 1. Digits were reversed by means of SoundDesigner software and edited to produce the same timing as in Experiment 1. The glides condition was constructed by generating a pitch glide and then editing silences so that 350 msec of glide was alternated with 350 msec of silence.

To construct the pitch glides, pink noise from a Consilium Industri noise generator (Model PNG 11) was low-pass filtered by a Barr and Stroud filter (Type EF3) set to pass frequencies below 0.7 Hz, with an increase in attenuation of 24 dB/octave above that point. The resulting randomly varying signal was amplified and served as a control voltage for a Farnell DS61 oscillator. As the voltage fed to it varied in a random manner, so did the pitch of the sound produced by the oscillator. Due to the variation of loudness with pitch, the glides resulted in a percept that varied in both pitch and loudness. The range over which the pitch varied (the depth of modulation, covering roughly 500–900 Hz with a mean of 650 Hz) was adjusted to give a discernible degree of variability while at the same time ensuring that the excursions of pitch at the lower bound to frequency did not lead to the perception of silence due to the particular insensitivity of the ear to low frequencies (see Jones, Macken, & Murray, 1993). This continuous glide was then edited. Periods of silence were produced by editing the continuous glides using Digidesign SoundDesigner software. Each set of interpolated material contained a continuous glide from which portions had been removed and silence of the same duration had been substituted.
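A digital analogue of this analogue signal chain might look like the sketch below. It is an approximation under several stated assumptions rather than a reconstruction of the original apparatus: white Gaussian noise stands in for the pink noise source, four cascaded one-pole filters stand in for the 24 dB/octave low-pass filter, the control signal is rescaled so that the pitch spans exactly 500 to 900 Hz (the reported mean of 650 Hz is not enforced), and the sample rate, seed, and all names are arbitrary choices.

```python
import math
import random


def lowpass_noise(n, sample_rate, cutoff_hz, stages=4, seed=0):
    """White Gaussian noise passed through cascaded one-pole low-pass filters.

    Each stage rolls off at roughly 6 dB/octave, so four stages approximate
    the 24 dB/octave slope described in the text.
    """
    rng = random.Random(seed)
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = [0.0] * stages
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        for k in range(stages):
            state[k] += alpha * (x - state[k])
            x = state[k]
        out.append(x)
    return out


def gated_glide(n_segments=9, segment_s=0.35, sample_rate=8000,
                f_lo=500.0, f_hi=900.0, cutoff_hz=0.7, seed=0):
    """Return (samples, freqs): glide segments alternating with equal silences."""
    seg = round(segment_s * sample_rate)
    n = seg * (2 * n_segments - 1)
    control = lowpass_noise(n, sample_rate, cutoff_hz, seed=seed)
    lo, hi = min(control), max(control)
    span = (hi - lo) or 1.0
    # Map the slowly varying control signal onto the oscillator frequency range.
    freqs = [f_lo + (f_hi - f_lo) * (c - lo) / span for c in control]
    samples, phase = [], 0.0
    for i, f in enumerate(freqs):
        # The phase advances during silences too, so the underlying glide is
        # continuous and each segment "points" towards the start of the next.
        phase += 2.0 * math.pi * f / sample_rate
        audible = (i // seg) % 2 == 0
        samples.append(math.sin(phase) if audible else 0.0)
    return samples, freqs
```

The key design point is that silence is substituted for portions of one continuous glide rather than separate glides being generated for each item, which mirrors the editing procedure described above.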

Results and Discussion

Subjects’ responses were calculated in terms of percentage correct for each type of interpolated material. The means for each condition were as follows: tones, 63%; digits, 84%; reversed digits, 79%; and glides, 73%. An ANOVA carried out on the data revealed a significant overall effect of interpolated material, F(3, 51) = 11.70, p < .0001. Planned comparisons indicated the usual effect of greater disruption from tones than from digits, F(1, 51) = 31.87, p < .0001. However, reversing the digits had no reliable effect on their disruptive capacity relative to forward versions of the same stimuli, F(1, 51) = 1.94, p > .05, nor was there a significant difference between reversed digits and glides, F(1, 51) = 2.37, p > .05, although glides did produce significantly more disruption than forward digits, F(1, 51) = 8.60, p < .01. Furthermore, the glides produced significantly less disruption than interpolated tones, F(1, 51) = 7.36, p < .01.
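As a rough consistency check, the probability bounds quoted for the planned comparisons can be recovered from the F ratios alone. The sketch below relies on the identity that an F with one numerator degree of freedom is the square of a t statistic, and integrates the t density numerically with Simpson's rule; it is an illustration using only the standard library, not the analysis procedure used by the authors, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_from_f_1df(f_value, df_error, steps=2000):
    """Two-tailed p for F(1, df_error), using F = t**2 and Simpson's rule.

    `steps` must be even.
    """
    t = math.sqrt(f_value)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    area = total * h / 3  # P(0 < T < t)
    return 1.0 - 2.0 * area


if __name__ == "__main__":
    for label, f in [("tones vs digits", 31.87), ("digits vs reversed", 1.94),
                     ("reversed vs glides", 2.37), ("glides vs digits", 8.60),
                     ("glides vs tones", 7.36)]:
        print(f"{label:>20}: F(1, 51) = {f:5.2f}, p = {p_from_f_1df(f, 51):.4f}")
```

Run on the values above, the computed probabilities fall on the same side of the .05 and .01 thresholds as reported in the text.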

These results indicate clearly that semantic processing is not an appreciable factor in the effect of interpolated digits, as reversed digits do not produce significantly more disruption. This result also suggests that spectral features shared by individual items uttered in the same voice serve to bind items into a coherent perceptual group separate from the standard tone, thus facilitating pitch recognition. However, the disruptive role of different voices is not examined directly here and would seem to be a useful line of follow-up work. The importance of such cues to perceptual binding is also indicated by the relatively low levels of interference produced by glides. It is argued that such glides achieve coherence by virtue of the underlying continuity of the pitch glide, interrupted by silence. That is, the impression of “common trajectory” (or in Gestalt terms, “good continuation”) is maintained, even in the discontinuous stimulus. Certainly, these glides produce considerably less disruption than other non-speech stimuli used in previous experiments. For example, in Experiment 5, whereas the instruments/repeated F0 sequences produced 20% more errors than spoken digits, the glides used in the present experiment produced only approximately 11% more errors than spoken digits.

One possible confounding factor is the average pitch of the pitch glides. At 650 Hz, the average is appreciably above that of the standard and higher than that of any of the interpolated stimuli in the current experimental series. Although remote pitches in the interpolated sequence have been associated with decreased disruption (Semal & Demany, 1991; Experiment 1), the level of performance with glides is still appreciably worse than with digits spoken at frequencies relatively close to the standard.

Taken together with the results of Experiments 4 and 5, Experiment 6 provides further support for the applicability of the auditory scene analysis framework to disruptive effects in pitch memory by indicating that such disruption may be determined by varying the extent to which interpolated items form a coherent perceptual entity.

GENERAL DISCUSSION

Before embarking on a discussion of the broader implications of the findings, it may be useful to summarize the main outcomes. Experiment 1 replicated the major feature of the studies of Deutsch (1970) that a digit sequence produced much less interference than a tone sequence when interpolated between the standard and a recognition test. The use of a voice (repeated “one” with each item in the sequence at a different pitch) alone was insufficient to produce the low level of disruption found with a string of digits; instead, the effects were very much like those of a sequence of cello notes produced with the same range of frequency variation. Speech is therefore not a sufficient condition for the low degrees of disruption typically found with unprocessed digit sequences. Increasing the interval between the standard and the first item in the interpolated set improved performance only for interpolated tones, in line with the idea that part of the difficulty in making the recognition judgement was isolating the standard from subsequent sounds rather than retroactive interference by these sounds.

By examining the effect of nearly doubling the number of interpolated tones in

Experiment 2 , two contrasting predictions could be adjudicated: If the effect of inter-

polated sequences on memory performance was largely due to retroactive interference,

the increased number of interpolated tones should produce a sharp decline in recognition

performance, but if the effect was largely due to grouping, the interpolated list would be

more distinct from the standard, and recognition should improve markedly. The results

favoured the latter view. Experiment 3 further demonstrated the importance of grouping

processes by manipulating the spatial location of the interpolated tones relative to the

standard and comparison. Both binaural and contralateral presentation of interpolated

tones produced less disruption than ipsilateral presentation , but they did not differ from

each other in their disruptive capacity. The results therefore cannot be ascribed to the

action of an ear-speci ® c store; rather, they are in line with the idea that spatial location

may serve to differentiate the standard from the interpolated list. The results of Experi-

ment 4 suggested that an important feature of spoken digits leading to the low level of

disruption cau sed by such interpolated items is the common pitch of all items in the

sequence: D igit sequences sharing a common frequency caused less disruption than do

those changing in frequency (see further discussion ). Experiment 5 showed a similar

pattern with non-speech sounds. Interpolated sequences constructed from different

instrument sounds sharing a common fundamental frequency produced less disruption

of pitch recognition than did sequences of instrument sounds changing in fundamental

frequency from instrument to instrument. However, these stimuli still produced

substantially more disruption than did spoken digits. Reversing the digits (thus rendering them

meaningless while maintaining a common spectrum) increased disruption, but not by an

appreciable amount. Whether this was an effect of reducing meaning is doubtful. It is

more likely that the process of reversal disrupted dynamic cues to coherence. The

importance of such dynamic cues in binding a stimulus sequence was illustrated in

Experiment 6 by the relatively low levels of disruption produced by discrete frequency

glides lying on a common frequency trajectory.

Overall, these results highlight the importance of grouping processes in determining

how representations are formed in memory. It is the grouping of the material into a

coherent whole, enabling the perceptual isolation of the standard, which determines

subsequent recognition accuracy. The characteristics of the interpolated sequence will

determine the organization of its elements into a coherent whole. The notion of grouping

also fits neatly with other findings in the area of pitch recognition, including those of

Deutsch. For example, Deutsch (1980) found that interpolated sequences constructed

from monotonic ascending or descending sequences of tones produced less disruption

than did the same tones ordered randomly. This outcome, we would argue, is due to the

effect such a manipulation will have on the tendency for the interpolated items to form a

coherent perceptual group. In another experiment, Deutsch showed that this beneficial

effect of monotonic ordering of interpolated tones was attenuated by increasing the range

over which the items varied from one to two octaves. Such a manipulation would be

expected to decrease the tendency for coherent grouping, as frequency proximity is a

powerful cue to stream formation (Bregman, 1990).

Widening the spectra of the interpolated tones, thus causing them phenomenally to

sound more "noisy", has been shown to improve recognition performance compared to

pure tones (Semal & Demany, 1993, Experiment 4). Such effects of timbre are in line with

the results of the current series of experiments, but inconsistent with the results of an

earlier study by Semal and Demany (1991, Experiment 1). However, evidence from a

range of other paradigms converges to suggest the importance of timbre. For example,

pitch and timbre do not completely dissociate if subjects have to categorize the pitch of a

single tone as quickly as possible (Crowder, 1989; Krumhansl & Iverson, 1992) or if

successive comparisons have to be made between tones with same or different timbres

(Melara & Marks, 1990; Singh & Hirsh, 1992). Given that stimulus features other than

timbre seem to point to the role of stream segregation in memory for pitch within the

Deutsch paradigm, it would be rather surprising on logical grounds if timbre were an

exception to the general rule, particularly in light of its pre-eminent role in perceptual

segregation in studies of auditory perception (see, for example, van Noorden, 1975; Singh,

1987).

With a broad view of the present series of experiments, the evidence strongly suggests

that interpolated speech is not a sufficient condition for low levels of disruption, but the

case that it is not a necessary condition has also not been fully fleshed out. Granting a

special status for speech usually also invokes the notion of speech modules or special

modes of processing (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967;

Liberman & Mattingly, 1985), a step that lacks parsimony and is attended by a

number of logical difficulties (see Pisoni, 1978, for an overview). Certainly, the results

of Experiment 4 of the current series of experiments do not allow a strong version of

the modular action of speech to be entertained. In that experiment, speech stimuli were

shown to produce disruption similar to that of tones and more than that of pitch glides.

This result alone is enough to cast doubt on the necessary status of speech: A strict

version of modularity would restrict its action to speech at an abstract level and would

be independent of particular instantiations such as those produced by a change of voice.

However, the case is not a conclusive one, and further work systematically assessing

factors such as spectral continuity is needed before speech is granted special status.

Whatever the precise status of speech, the present series points to the pre-eminent role

of organizational factors and, in more abstract terms, to the intimacy of the relationship

between perception and memory. Generally, in models of memory, it is assumed that

perception is logically prior to and separate from memory. An emerging alternative

view is the proceduralist position. As viewed by Crowder (1989, 1993), a leading

proponent of this view, the proceduralist position "denies that information is retained in

memory stores in a similar way to receptacles"; instead, "retention is a natural

consequence of the information processing that was originally engaged by the experience

in question" (1993, p. 115). Although Crowder (1993) sought to align the proceduralist

view with a neuropsychological instantiation, it is entirely possible that instead of viewing

distinctiveness of representation anatomically, it can be viewed propositionally, that is,

in terms of some abstract representation of the rules of organization. It follows that those

rules governing organization determine the distinctiveness of representation in memory.

In fact, previous accounts of the procedural view of memory have explicitly excluded the

results of Deutsch (1970), Pechmann and Mohr (1992), and by inference those of Semal

and Demany (1991, 1993), regarding them as one of a small number of special cases

calling for a modular architecture. Indeed, Crowder drew specific attention to their

results, claiming that they "bespeak a tidy modularity in memory" (1993, p. 124) as

they reflect the action of sensory storage systems that fall outside the procedural

mechanisms of short-term memory. On the basis of the foregoing experiments, exclusion from

the general proceduralist framework seems unwarranted.

The results of the foregoing series of experiments, by illustrating the importance of

grouping, have thrown into relief the importance of perceptual factors in determining

recognition performance. A challenge for subsequent work will be the fleshing-out of the

factors that determine the coherence of sound sequences and their implications for

performance in a variety of memory tasks.

REFERENCES

Bregman, A.S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.

Bregman, A.S., & Dannenbring, G. (1973). The effect of continuity on auditory stream segregation. Perception and Psychophysics, 13, 308–312.

Bregman, A.S., & Rudnicky, A.I. (1975). Auditory segregation: Stream or streams? Journal of Experimental Psychology: Human Perception and Performance, 1, 263–267.

Crowder, R.G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 15, 472–478.

Crowder, R.G. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 112–145). Oxford: Oxford University Press.

Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168, 1604–1605.

Deutsch, D. (1978). Interference in pitch memory as a function of ear of input. Quarterly Journal of Experimental Psychology, 30A, 283–287.

Deutsch, D. (1980). The processing of structured and unstructured tonal sequences. Perception and Psychophysics, 28, 381–389.

Deutsch, D. (1984). Memory for nonverbal auditory information: A link between behavioral and physiological studies. In L.R. Squire & N. Butters (Eds.), Neuropsychology of memory (pp. 45–54). New York: Guilford.

Fraisse, P. (1978). Time and rhythm perception. In E.C. Carterette & M.P. Friedman (Eds.), Handbook of perception. Vol. 8: Perceptual coding (pp. 189–233). New York: Academic Press.

Jones, D.M., & Macken, W.J. (1995). Organizational factors in the effect of irrelevant speech: The role of spatial location and timing. Memory and Cognition, 23, 192–200.

Jones, D.M., Macken, W.J., & Murray, A.C. (1993). Disruption of visual short-term memory by changing-state auditory stimuli: The role of segmentation. Memory and Cognition, 21, 318–328.

Kallman, H.J., Cameron, P.A., Beckstead, J.W., & Joyce, E. (1987). Ear of input as determinant of pitch-memory interference. Memory and Cognition, 15, 454–460.

Kallman, H.J., & Massaro, D.W. (1979). Similarity effects in backward recognition masking. Journal of Experimental Psychology: Human Perception and Performance, 5, 110–128.

Krumhansl, C.L., & Iverson, P. (1992). Perceptual interactions between musical pitch and timbre. Journal of Experimental Psychology: Human Perception and Performance, 18, 739–751.

Liberman, A.M., Cooper, G.S., Shankweiler, D.P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.

Liberman, A.M., & Mattingly, I.G. (1985). The motor theory of speech perception revised. Cognition, 21, 1–36.

Melara, R.D., & Marks, L.E. (1990). Perceptual primacy of dimensions: Support for a model of dimensional interaction. Journal of Experimental Psychology: Human Perception and Performance, 16, 398–414.

Pechmann, T., & Mohr, G. (1992). Interference in memory for tonal pitch: Implications for a working memory model. Memory and Cognition, 20, 78–98.

Pisoni, D.B. (1978). Speech perception. In W.K. Estes (Ed.), Handbook of learning and cognitive processes: Vol. 6. Linguistic functions in cognitive theory (pp. 167–233). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Semal, C., & Demany, L. (1991). Dissociation of pitch from timbre in auditory short-term memory. Journal of the Acoustical Society of America, 69, 2404–2410.

Semal, C., & Demany, L. (1993). Further evidence for an autonomous processing of pitch in auditory short-term memory. Journal of the Acoustical Society of America, 94, 1315–1322.

Singh, P.G. (1987). Perceptual organisation of complex-tone sequences: A trade-off between pitch and timbre? Journal of the Acoustical Society of America, 89, 1890–1991.

Singh, P.G., & Hirsh, I.J. (1992). Influence of spectral locus and F0 changes on the pitch and timbre of complex tones. Journal of the Acoustical Society of America, 82, 886–899.

van Noorden, L.P.A.S. (1975). Temporal coherence in the perception of tone sequences. Doctoral dissertation, Technische Hogeschool, Eindhoven, The Netherlands.

Original manuscript received 28 June 1994

Accepted revision received 5 August 1996
