Disruption of Short-term Recognition
Memory for Tones: Streaming or Interference?
D.M. Jones, W.J. Macken, C. Harries
University of Wales, Cardiff, U.K.
A sequence of auditory stimuli interpolated between the initial presentation of a tone and a
comparison tone impairs recognition performance. Notably, the impairment is much less with
interpolated speech than with tones. Six experiments converge on the conclusion that this
pattern of impairment is due more to the organization of the interpolated sequence than to its
similarity to the to-be-remembered standard. Factors that contribute to the coherence of the
interpolated sequence into a stream distinct from the initial tone are primary determinants of
the level of impairment. This is demonstrated by manipulating factors that contribute to the
coherence of the interpolated sequence by the action of temporal, spatial, timbral, and tonal
attributes. However, the relative immunity of recognition performance to the interpolation of
unprocessed digit sequences is not explained wholly by such coherence.
The judgement of the similarity of pitch between two successive test tones is critically
dependent on the presence and nature of irrelevant sound interpolated between the tones.
Two interpretations of this effect are contrasted in this paper: One considers that the
disruption is produced by interference, that is, that it depends on the similarity of representations
within memory of the test tones and the irrelevant tones; the other supposes that the
disruption is the product of the degree to which the interpolated material is perceptually
integrated with the test tones. These issues are addressed using a methodology first
employed by Deutsch (1970). The paradigm is one in which memory for the pitch of
tones over a period of a few seconds is assessed. Subjects first hear a standard tone,
followed by a sequence of sounds that they are instructed to ignore, and then, after a short
interval, a comparison tone. They are asked to judge whether the standard tone and the
comparison tone are of the same or different pitch. The main variable of interest is the
type of sound in the interpolated sequence. In the study by Deutsch (1970),
interpolated spoken digits and interpolated tones produced markedly different effects on
the pitch judgement task: Speech produced errors on 2% of trials, but tones produced
errors on 32% of trials. The lower level of disruption by speech led Deutsch (1970; see
also Deutsch, 1984, for a summary) to suppose that recognition memory was related to
the degree of similarity between the standard tone and the interpolated material.

THE QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 1997, 50A (2), 337–357
Requests for reprints should be sent to Dylan Jones, School of Psychology, PO Box 901, University of Wales,
Cardiff, CF1 3YG. Email: jonesdm@cardiff.uk.ac
The work reported here was supported by a project grant from the Economic and Social Research Council.
Thanks are due to Seth Schwartz for his help in running Experiment 1. The development of the experimental
series benefited from discussions with Clive Frankish. Revisions to the manuscript were helped greatly by Robyn
Boyle and Karen Howes.
© 1997 The Experimental Psychology Society

The very marked difference produced by speech and non-speech also suggested to
Deutsch (1970) that speech and non-speech are functionally distinct within memory. This
interpretation was reinforced by Pechmann and Mohr (1992), who viewed their findings
using the Deutsch (1970) paradigm as supporting a modified working-memory model in
which non-speech and speech were functionally distinct, stored respectively in a "tonal
loop" and an "articulatory loop". By both these accounts the marked degree of interference
by interpolated tones is explained in terms of retroactive interference with the
standard tone by the subsequent interpolated tones, with the suggestion that this interference
is especially low in the case of speech because distinctiveness is not merely
structural but may also be functional.
The main thrust of the present paper is that the results of Deutsch (1970) and
Pechmann and Mohr (1992) can also be explained in terms of a processing system within
which representations in memory are organized according to the principles of auditory
scene analysis (Bregman, 1990). Such an approach focuses on the characteristics of
sounds that lead to their grouping and coherence. For example, temporal proximity or
spectral similarity can be used to organize materials into perceptual streams corresponding
to distinct entities in space and over time. It is argued that such organization is
reflected in the representation within memory, which, in turn, has repercussions for
the accessibility of events within and between streams. Thus, organizational factors will
not only separate different streams of sound so that they are distinct in phenomenal
terms, but will also determine organization in memory and consequently determine the
ease of retrieval from memory. This object- and stream-formation framework has already
proved useful in the context of interference effects in short-term serial recall (e.g. Jones,
Macken, & Murray, 1993). The current series of experiments seeks to extend its
applicability to the phenomena of interference with pitch memory as investigated by
Deutsch (1970).
From this streaming standpoint, two factors not controlled in the experimental paradigm
used by Deutsch (1970) may have resulted in a misleading picture of how interpolated
items interfere with memory for pitch. By controlling for these factors in new
experiments, results consistent with an auditory scene analysis framework and at variance
with the interference framework should emerge. The first factor is that in the original
paradigm the timing of the standard tone in relation to the interpolated sequence of tones
encouraged the perception of the standard as an element of the interpolated sequence.
Specifically, the interval between the standard tone and the first interpolated tone was the
same as that between successive interpolated tones (300 msec). This timing is likely to
lead to all those items (both standard and interpolated) being organized into a single
higher-order perceptual unit. Failure to isolate the standard tone would prejudice the
process of matching it to the comparison tone and therefore lower the recognition rate.
This is based on the generalization that the more attributes successive stimuli have in
common, the more likely they are to be grouped together, a phenomenon that leads to
difficulty in isolating information about single events within a sequence (see Bregman &
Rudnicky, 1975, for an illustration). For example, if a sequence of stimuli share both
timing and timbre, even though they may differ in pitch, the stimuli will be bound
together more firmly than if they shared only timing. With speech, therefore, the grouping
of the standard tone with the interpolated items will be less compelling, as it is distinct
in timbre, though not in timing, from the interpolated sequence. The effect of interpolated
tones can therefore be accounted for in part by the difficulty of segregating perceptually
the standard tone from the tones in the interpolated sequence with which it is
grouped. If this account of the greater disruption produced by interpolated tones is
correct, increasing the interval between the standard and the first stimulus of the interpolated
set should significantly reduce disruption.
The second confounding factor is that in the comparison of speech with non-speech by
Deutsch (1970) the degree of acoustic variation of the interpolated materials was not
controlled appropriately. As a result, the digit sequence contained items that were not
only distinct from the standard tone, but were also more distinct spectrally from one
another than were the items within the sequence of interpolated tones. Additionally, in the
materials used by Deutsch, the tone sequences varied in frequency over an octave, but the
speech stimuli shared a common fundamental frequency. Speech by its nature differs
acoustically from pure tones; however, given that an important distinction rests on this
confounding, it seems reasonable to argue that a more exacting analysis of the effects of
different characteristics of speech is required. Specifically, such a test should match the
variation in both speech and non-speech stimuli along one dimension while controlling all
other stimulus dimensions. For example, it might be desirable to match the frequency of
tones to the fundamental frequency of each speech stimulus. In this way the degree of
variation in pitch within the interpolated sequence would be the same for
the different classes of material.
The following sequence of experiments explores the pattern of interference in pitch
memory as investigated by Deutsch (1970) and Pechmann and Mohr (1992) with a view to
establishing the applicability of a streaming framework to the results.
EXPERIMENT 1
Experiment 1 is a replication (with minor variations) and extension of the main conditions
of Deutsch (1970). The central aim of this experiment was to test two propositions: first,
that the timing of the standard tone in relation to the sequence of interpolated tones will
determine, in part, the degree of disruption via its effect on grouping; and second, that
when the differences within each type of interpolated sequence are made comparable as
far as is practicable, sequences of speech when compared to sequences of non-speech will
produce equivalent degrees of disruption of memory for tones. When both of these
factors are controlled, it is argued, a more accurate picture of the mechanisms of disruption
of memory for pitch will emerge.
Two forms of manipulation were examined in Experiment 1. One involved changing
the interval between the standard and the first item in the interpolated sequence.¹ Two
conditions were contrasted: In one case, the interval was the same as that separating the
items in the interpolated sequence (350 msec); in the other, it was appreciably longer
(1500 msec). On the basis of the streaming hypothesis, it was predicted that recognition
memory for the standard would be superior when the interval between the standard and
the interpolated list is longer, but that this improvement may depend on the role of other
factors (including the acoustic composition of the interpolated list).

¹ Thanks are due to Peter Bailey for pointing to an unpublished study reported as a footnote in the paper of
Deutsch (1978) in which the results of a similar experiment are described briefly.

In order to investigate the role of the composition of the interpolated list and its joint
action with the standard-list interval, four types of interpolated sequence were used. Two
of these were also used in the Deutsch (1970) experiment: pure tones at different frequencies
and sequences of random digits. Two further conditions were added: In the first,
a sequence of cello notes having the same pitch as the sequence of pure tones was used to
assess the effect of having the interpolated sequence distinct in timbre from both the
standard and the comparison sounds. Semal and Demany (1991) found that manipulating
the timbre of the interpolated series did not improve recognition performance; however,
in their version of the Deutsch paradigm, the interval between the standard and the first
item of the interpolated series was longer (at 500 msec) than the interval between members
of the interpolated set (300 msec), thus possibly serving as a basis for segregating the
standard. Additional cues to segregation based on the timbre of the interpolated set may
not have been useful under these conditions. This suggests that the role of timbre will be
revealed by its interaction with the interval between the standard and the first item of the
interpolated list. Although listeners trade temporal and timbral information, within limits,
it seems plausible that when timing does not serve as a strong basis for segregation (as in
the 350-msec interval condition of Experiment 1) the effect of distinctiveness of timbre
will be made more likely. A similar logic applies to the second additional condition in
which a single token of the spoken digit "one" is repeated.
Changing the timing between the standard and the interpolated list would only affect
conditions in which the interpolated items were tones, not those comprising speech or
cello notes, as the timbral difference between these interpolated items and the standard
tone means that the standard is not likely to be grouped with them anyway.
In Experiment 1 there are slight differences from the procedure adopted by Deutsch
(1970) and Pechmann and Mohr (1992), namely that all the sounds were 350 msec long
(not 200 msec) and in the interpolated sequence there were nine items, rather than six,
separated by 350 msec, rather than 300 msec. These values were adopted following pilot
studies and were designed to prevent effects of timing between the standard and the
interpolated sequence being masked by ceiling effects. These changes are relatively minor,
and it seems safe to regard such differences between the studies as unlikely to prejudice
the hypotheses under test.
Method
Subjects
Male and female student volunteers were recruited for the study. Each reported normal hearing.
Subjects were screened for their ability to discriminate between pure tone stimuli a semitone apart. A
criterion of 7 correct discriminations in 10 presentations was used. Of the subjects, 8 failed to meet
the criterion; 20 subjects undertook the full experimental procedure after passing the test.
Auditory Materials
All sounds were recorded and edited digitally to 16-bit resolution and sampled at 48 kHz.
The sounds were tape-recorded onto a high-quality cassette tape recorder for playback during the
experiment.
Each trial consisted of a standard pure tone (lasting 350 msec), followed, after an interval of either
350 msec or 1500 msec, by an interpolated sequence of nine items, each of which was 350 msec long
and separated from its neighbours by silence of 350 msec. Following a further interval of 1500 msec,
the comparison tone (lasting 350 msec) was presented. Care was taken to tailor, by digital editing, the
rise and fall time of the tones so that no artefactual clicks were produced (each tone rose and fell its
full amplitude in 50 msec). An interval of 10 sec followed the comparison tone, during which the
subject was required to mark the judgement "same" or "different" on a response blank. On half the
trials standard tones and comparison tones were the same; on the other half they were a semitone
apart (randomly either higher or lower).
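The trial structure just described can be laid out as a simple timeline. The following Python sketch is illustrative only: the function name `trial_timeline` and the event labels are assumptions introduced here and are not part of the original procedure; the durations are those stated in the text.

```python
# Illustrative sketch of one Experiment 1 trial timeline (all times in msec).
# Names and labels are hypothetical; durations are taken from the Method text.

ITEM_MS = 350          # duration of every sound (standard, interpolated, comparison)
GAP_MS = 350           # silence between neighbouring interpolated items
N_INTERPOLATED = 9     # number of items in the interpolated sequence
RETENTION_MS = 1500    # silence between the last interpolated item and the comparison
RESPONSE_MS = 10_000   # response interval following the comparison tone


def trial_timeline(standard_gap_ms):
    """Return a list of (label, onset_ms, duration_ms) events for one trial."""
    events = [("standard", 0, ITEM_MS)]
    t = ITEM_MS + standard_gap_ms
    for i in range(N_INTERPOLATED):
        events.append((f"interpolated_{i + 1}", t, ITEM_MS))
        t += ITEM_MS
        if i < N_INTERPOLATED - 1:
            t += GAP_MS  # no gap is added after the final interpolated item
    t += RETENTION_MS
    events.append(("comparison", t, ITEM_MS))
    t += ITEM_MS
    events.append(("response", t, RESPONSE_MS))
    return events


for gap in (350, 1500):
    comparison_onset = trial_timeline(gap)[-2][1]
    print(f"standard-to-list interval {gap} msec: comparison onset at {comparison_onset} msec")
```

Under these assumptions the comparison tone begins 8150 msec after trial onset in the 350-msec condition and 9300 msec after trial onset in the 1500-msec condition, so the two interval conditions differ only in the silence that follows the standard.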
Four types of interpolated sequence were assembled: (a) a sequence of pure tones drawn randomly
from the set of 12 semitones in the range C4 (262 Hz) to B4 (494 Hz); (b) spoken digits, the integers
1 to 9, recorded individually in a male voice in a monotone at a pitch corresponding to C4 (262 Hz)
and thereafter assembled using digital editing into a quasi-random sequence in which no digit was
repeated; (c) a repeated syllable spoken in a random sequence of fundamental frequencies in the same
range as for tones; in this case, a single token of the word "one" was spoken at a pitch corresponding
to C4 (262 Hz) and was subsequently transformed using digital signal processing techniques in
Digidesign's SoundEdit software so that the same token was played at each of the pitches corresponding
to those of the pure tones without a change in the duration of the stimulus; (d) bowed cello notes
at fundamental frequencies in the same range as that for tones (taken from commercially recorded
digital samples using SampleCell software). The pitches of standard and test tones were chosen from
those not already selected as interpolated items.
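The frequency range quoted for the tone set, C4 (262 Hz) to B4 (494 Hz), is consistent with twelve equal-tempered semitones. The sketch below reproduces those end points; it assumes standard A4 = 440 Hz tuning, which the text does not state, and the names `A4_HZ`, `NOTE_NAMES`, and `semitone_frequencies` are introduced here purely for illustration.

```python
# Equal-tempered frequencies for the 12 semitones C4..B4.
# Assumption: A4 = 440 Hz tuning; the paper reports only the two end points.

A4_HZ = 440.0
NOTE_NAMES = ["C4", "C#4", "D4", "D#4", "E4", "F4",
              "F#4", "G4", "G#4", "A4", "A#4", "B4"]


def semitone_frequencies():
    """Map each note name to its frequency in Hz (A4 lies 9 semitones above C4)."""
    return {name: A4_HZ * 2 ** ((i - 9) / 12) for i, name in enumerate(NOTE_NAMES)}


freqs = semitone_frequencies()
for name, hz in freqs.items():
    print(f"{name}: {hz:.0f} Hz")
```

Adjacent entries differ by a factor of 2^(1/12), which is the one-semitone step separating the standard from the comparison tone on "different" trials.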
Each of the speech tokens was edited to last 350 msec (thus making them the same physical length
as the non-speech tokens) by means of a routine in Digidesign's SoundEdit software in which the
length of the token could be reduced without any changes in any other feature such as the pitch of the
token. Individually, transformed tokens were not perceptibly different from untransformed tokens.
Procedure
Subjects were given written instructions in which they were asked to judge the similarity of the
standard and the comparison tones by responding "same" or "different" as quickly and accurately as
possible. They were asked to ignore the interpolated material and were told explicitly that they would
not be tested on its contents. The conditions were presented in random order from trial to trial, and
each subject undertook 160 trials in all, 20 in each condition, half the trials having the correct
response "same" and half having the correct response "different".
Results and Discussion
Percentage correct responses for this and all other experiments in the series are given in
Table 1. Statistical analysis by ANOVA revealed no overall effect of the interval between
the standard and the interpolated list, F(1, 19) = 4.02, p > .05. There was however a
significant effect of interpolated material, F(3, 57) = 28.52, p < .0001, and a significant
interaction between interpolated material and interval, F(3, 57) = 4.31, p < .01.
As Deutsch (1970) and Pechmann and Mohr (1992) found, the difference between
tones and the digit sequence was highly significant, both in the 350-msec and the 1500-msec
interval conditions: F(1, 57) = 89.44, p < .0001, and F(1, 57) = 23.27, p < .001,
with tones showing much greater disruption of recognition. However, the contrast
between the cello sequence and the repeated syllable sequence was clearly non-significant,
F < 1 in both the 350-msec and the 1500-msec interval conditions. Interpolated tones led
to poorer performance than the repeated syllable sequence for the 350-msec interval,
F(1, 57) = 13.76, p < .001, but not for the 1500-msec interval, F < 1. The same pattern
was also obtained with the comparison between interpolated tones and interpolated cello:
F(1, 57) = 7.74, p < .01 for the 350-msec interval, and F < 1 for the 1500-msec interval.
In sum, although speech made up of a sequence of spoken digits shows the effect found
by Deutsch, when speech and non-speech are matched in terms of between-stimulus
acoustic change within a sequence, the difference between speech and non-speech is
non-significant.
TABLE 1
Summary of Results for the Experiments in the Series (percentage correct)

                                              Interval
Experiment      Condition                350 msec   1,500 msec
Experiment 1    tones                       57          67
                digits                      82          80
                repeated syllable           67          66
                cello                       64          68
Experiment 2    9-tones                     60
                17-tones                    67
Experiment 3    same ear                    52
                different ear               60
                both ears                   60
Experiment 4    digits/common F0            82
                digits/changing F0          65
                one digit/changing F0       74
Experiment 5    cello                       62
                digits                      86
                instrument/repeated         66
                instrument/changing         56
Experiment 6    tone                        63
                digits                      84
                reversed digits             79
                glides                      73

In relation to the effect of timing, the beneficial effect on recognition of extending the
interval between the standard and the interpolated sequence from 350 msec to 1500 msec
was restricted to the case of interpolated tones, F(1, 19) = 14.45, p < .0004. Extending the
interval from 350 msec to 1500 msec had no effect on sequences with interpolated cello,
interpolated repeated syllable, or interpolated spoken digits, F(1, 19) = 1.69, p > .05,
F(1, 19) = 0.08, p > .05, and F(1, 19) = 0.70, p > .05, respectively. Overall, these results
support the streaming hypothesis and identify the role of confounding factors in the
original Deutsch (1970) study. The results also speak to the interplay of timing and timbre
in streaming: Only when the standard was bound perceptually to the interpolated list by
timing did the action of timbre serve to isolate the standard, thereby enhancing recognition
performance.
One possible alternative interpretation is that increasing the time interval between the
standard and the interpolated sequences might provide subjects with additional time to
encode more accurately the standard tone before it is degraded by the subsequent interpolated
tones, leading to improved pitch recognition. This seems unlikely. The interval of
350 msec is longer than that which usually gives rise to auditory backward recognition
masking (e.g. Kallman & Massaro, 1979), and so an explanation of the high level of
disruption found with interpolated tones in these terms would seem to be implausible
(see also Kallman, Cameron, Beckstead, & Joyce, 1987). Furthermore, the longer interval
of 1500 msec is within the time range that gives rise to the perception of single isolated
auditory events, rather than grouped items (Fraisse, 1978).
Overall, the results of Experiment 1 are most plausibly interpreted in terms of an
auditory scene analysis approach, within which speech and non-speech have similar
effects. First, there was no difference in terms of the disruption produced by matched
speech and non-speech sounds. Second, the effect of the timing of the standard and the
interpolated sequence was limited to tones, implying that the original finding of Deutsch
(1970) was subject to the effect of perceptual grouping. The results of Experiment 1
suggest strongly that speech and non-speech sounds are equipotent in their capacity to
disrupt memory for sound, and that speech is not a sufficient condition for low interference.
In the next two experiments, we further explore the role of grouping manipulations
in determining pitch recognition performance.
EXPERIMENT 2
The effects of timing of the standard tone in Experiment 1 point to the importance of the
organization of the interpolated list. This indicates the possible value of construing the
disruption of pitch memory in the Deutsch (1970) paradigm in terms of the processes by
which items are organized into higher-order units, or streams, rather than in terms of a
degradation in the representation of the standard tone by retroactive interference due to
the similarity of the interpolated list to the standard tone. A further test of this proposition
is mounted in Experiment 2 by manipulating the number of tones in the interpolated
sequence. We compare the case in which 9 tones are presented in the interpolated
sequence (as in Experiment 1) with the case where the number of tones is increased to
17, without changing the interval within which these tones are presented. From the
viewpoint of retroactive interference, it would be expected that an increase in the number
of similar stimuli should lead to an increase in interference, which would be signalled by a
decrease in recognition performance. But if, as is strongly suggested by Experiment 1,
grouping processes influence the degree of disruption produced by interpolated tones,
the effect of nearly doubling the number of interpolated tones (while keeping the total
duration of the interpolated sequence constant) will be to isolate perceptually the interpolated
tones from the test tone. As the number of interpolated tones is nearly doubled, the
standard no longer shares the inter-item timing of the sequence as a whole. Therefore, the
streaming hypothesis makes the opposite prediction to that of the interference hypothesis,
namely that nearly doubling the number of tones in the interpolated interval
will increase the likelihood of correct recognition.
Method
Subjects
Twenty-one undergraduates, each of whom reported normal hearing and passed the tone test
described in Experiment 1, volunteered to take part and were paid an honorarium. Nine potential
subjects failed the tone test.
Materials and Procedure
Only tones were used as interpolated materials in this study, but the overall nature of the task and
the frequency range of the tones were identical to the corresponding conditions of Experiment 1.
The control condition was identical to the 350-msec interpolated tones ("tones 350") condition of
Experiment 1. In the experimental condition the test tone was separated by 350 msec from an
interpolated set of 17 tones, nearly twice the number used in the control condition. In no case
was a tone immediately followed by another of the same frequency within the interpolated sequence.
Although in the 17-tone condition one tone finished as another started, the rise and fall times of the
tones (as in Experiment 1), coupled with a change of frequency at each tone in the sequence, meant
that they were phenomenally separate and distinct. In both conditions the comparison tone occurred,
as in Experiment 1, 1500 msec after the end of the interpolated list.
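The claim that 17 abutting tones fill the same interval as 9 spaced tones can be checked with a short calculation. This is a minimal sketch that assumes each tone in the 17-tone condition also lasted 350 msec, which the text implies but does not state outright; the helper name `sequence_duration_ms` is hypothetical.

```python
# Total duration of an interpolated sequence (msec).
# Assumption: tones in the 17-tone condition last 350 msec, as in the control condition.

def sequence_duration_ms(n_items, item_ms=350, gap_ms=350):
    """Duration of n_items sounds separated by n_items - 1 silent gaps."""
    return n_items * item_ms + (n_items - 1) * gap_ms


nine_tone = sequence_duration_ms(9, gap_ms=350)      # control: 350-msec gaps
seventeen_tone = sequence_duration_ms(17, gap_ms=0)  # experimental: tones abut
print(nine_tone, seventeen_tone)
```

Both conditions come to 5950 msec on these assumptions, so the manipulation changes the number of tones and the inter-item timing while leaving the length of the retention interval unchanged.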
In each condition, 20 trials were presented, with "same" and "different" responses equal in
number appearing in random order. The procedure was as in Experiment 1.
Results and Discussion
Results were analysed in terms of percentage of correct responses. Performance in the
pure-tone control condition (60% correct) was comparable to that found in the "tones
350" condition of Experiment 1 (in which the level was 57% correct). In addition,
recognition performance was significantly better with 17 interpolated tones (67% correct)
than with 9 interpolated tones, F(1, 20) = 6.05, p < .025.
The results of Experiment 2 were directly in line with the streaming hypothesis and at
variance with the retroactive interference hypothesis: Almost doubling the number of
interpolated tones did not increase disruption; rather, it was reduced. The effect was one
roughly comparable in magnitude to that found by extending the interval between the
standard and the interpolated tones in Experiment 1. Any account based on retroactive
interference by interpolated sounds would be hard-pressed to predict that an increase in
the number of interpolated tones would improve recognition of the test tone.
Although the results of Experiment 2 support the general thrust of the argument
developed in Experiment 1, there is residual ambiguity about the outcome of Experiment
2, stemming from a confounding of interstimulus interval and the number of interpolated
stimuli. As the number of interpolated stimuli was nearly doubled, the interstimulus interval
within the interpolated list was reduced from 350 msec to zero, but the interval between
the standard and the first item of the interpolated sequence remained at 350 msec. This
ambiguity cannot be resolved fully; therefore subsequent experiments in the current
series seek evidence using other stimulus manipulations that help converge
on the streaming hypothesis.
Thus far, two different techniques of manipulating streaming have provided convergent
evidence for the usefulness of the auditory scene analysis framework in examining
disruption of memory for pitch in the Deutsch paradigm. Experiment 3 seeks to extend
this theme using another type of streaming manipulation.
EXPERIMENT 3
In this experiment the role of streaming processes in determining disruption of pitch
memory is examined by varying the spatial location from which the interpolated items are
presented. This represents an extension of a previous study by Deutsch (1978) in which a
comparison was made between conditions whereby standard and comparison tones were
presented monaurally, and interpolated tones were presented either to the same ear as
standard and comparison tones or to the other ear. Conditions in which the interpolated
tones were presented contralaterally to the standard and the comparison led to significantly
better performance than conditions where the interpolated tones were presented
ipsilaterally.
This result may be seen as supporting the interference view of the impairment of pitch
memory, in the sense that the degradation of the representation of the standard tone by
the subsequent occurrence of interpolated tones may be attenuated or eliminated by
assigning those interpolated tones to a different ear-specific representation.² However,
another possible interpretation (one in line with a streaming approach) is that spatial
location serves as a powerful segregation cue to disembed the standard from the early
items in the interpolated sequence. The conditions used by Deutsch (1978) do not allow
discrimination between these two possibilities. In Experiment 3, the possibility that the
finding of Deutsch (1978) is better conceived of in grouping terms is explored by adding a
third condition to her design. As well as presenting interpolated tones both ipsilaterally
and contralaterally to the standard and comparison tones, we also include a condition in
which the interpolated tones are presented to both ears simultaneously. Because it is
presented binaurally, the interpolated sequence will be perceived as originating somewhere
in the centre of the head, thus forming a distinct perceptual group relative to the
standard. From the point of view of the streaming hypothesis, therefore, this condition
would be expected to give rise to an improvement in recognition equivalent to that found
with contralateral presentation of the interpolated tones. From the viewpoint of the
ear-specific channel hypothesis, such a condition would be expected to be similar in its effect
to ipsilateral presentation, as interpolated tones are still presented to the same ear as the
standard and the comparison and therefore enter the same ear-specific channel.

² In fact Deutsch (1978) interpreted her results in an attentional rather than in a retroactive interference
framework. Her results are compatible with a retroactive interference explanation, however.

Method
Subjects
Twenty subjects were selected on the basis that they passed two screening tests. The first was the
pitch discrimination test used in Experiments 1 and 2. The second involved testing subjects for their
ability to discriminate the location of presented tones. Subjects were played 15 tones: five to the left
ear, five to the right ear, and five to both ears at once. They were required to make a three-alternative
forced-choice decision as to whether tones occurred in the left, right, or centre locations. Only
subjects who scored 13 or more correct out of 15 went on to the experiment proper, and 7 potential
subjects were rejected as a result of these two tests.
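To give a sense of how stringent the localization criterion is, the chance probability of reaching it can be computed from the binomial distribution. This is a rough sketch rather than an analysis reported in the paper: it assumes a subject who guesses independently on each of the 15 tones with probability 1/3 of being correct, and the helper name `p_at_least` is hypothetical.

```python
# Probability of meeting the 13-of-15 criterion by guessing alone.
# Assumptions: independent guesses, three equally likely alternatives (p = 1/3).
from math import comb


def p_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))


p_chance = p_at_least(13, 15, 1 / 3)
print(f"P(13 or more correct by guessing) = {p_chance:.2e}")
```

Under these assumptions a pure guesser would pass roughly 3 times in 100,000, so subjects who met the criterion were very unlikely to have done so without being able to localize the tones.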
Apparatus/Materials
The 12 pure tones at semitone intervals of the scale C4 to B4 were generated as described in
Experiment 1. These tones were assembled to provide three different types of trial sequence. The
temporal features of all three types of sequence were identical, with a standard tone, lasting 350 msec,
followed after 350 msec by nine 350-msec interpolated tones, each separated from its neighbours by
350 msec. After a silent interval of 1500 msec at the end of the interpolated sequence, a comparison
tone, also lasting 350 msec, was presented. On half the trials, the comparison was the same pitch as
the standard; on the other half, it differed by a semitone (half were higher and half lower than the
standard). The frequencies of the tones in the sequence were selected in the same way as the tone
sequences described in Experiment 1.
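The temporal structure just described can be laid out as a simple timeline. The following is a minimal sketch, not the authors' procedure: the durations come from the text, whereas the function name, the labels, and the reading that the 1500-msec silence runs from the offset of the last interpolated tone are assumptions made for illustration.

```python
# Durations are taken from the text; all names here are hypothetical.
TONE_MS = 350             # standard, interpolated, and comparison tones
GAP_MS = 350              # silence between successive tones
N_INTERPOLATED = 9        # number of interpolated tones
RETENTION_GAP_MS = 1500   # silence before the comparison tone


def trial_timeline():
    """Return (label, onset_ms, offset_ms) for every tone in one trial."""
    events = [("standard", 0, TONE_MS)]
    onset = TONE_MS + GAP_MS
    for i in range(N_INTERPOLATED):
        events.append((f"interpolated_{i + 1}", onset, onset + TONE_MS))
        onset += TONE_MS + GAP_MS
    # Assumption: the 1500-msec interval is measured from the offset
    # of the last interpolated tone.
    last_offset = events[-1][2]
    comparison_onset = last_offset + RETENTION_GAP_MS
    events.append(("comparison", comparison_onset, comparison_onset + TONE_MS))
    return events


if __name__ == "__main__":
    for label, on, off in trial_timeline():
        print(f"{label:16s} {on:5d} -> {off:5d} msec")
```

Under these assumptions the comparison tone begins a little over 8 sec after the onset of the standard, which agrees with the paradigm's description as one assessing memory for pitch over a period of a few seconds.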
Three types of sequence were generated. In the same-ear condition, the standard was presented to
a single ear, followed by the interpolated sequence to the same ear, followed by the comparison tone
also to the same ear. In the different-ear condition, the interpolated tones were presented to a
different ear to that of the standard and the comparison tones. In the both-ears condition, standard
and comparison tones were both presented monaurally to the same ear, but the interpolated tones
were presented stereophonically to both ears. The amplitude of the binaurally presented interpolated
sequences was set in pilot trials such that they were of equivalent loudness to the monaurally
presented tones. Tones presented to a single ear were presented to the left ear on half the trials
and to the right ear on the other half of trials.
The sequences were stored as “snd” resources on a Macintosh IIcx for playback in a predetermined
random order via a HyperCard program during the experiment. Subjects wore stereo headphones
throughout the screening tests and experiment.
Design and Procedure
Subjects were presented with 60 trials in all, 20 for each of the three conditions (same-ear,
different-ear, both-ears). Conditions changed randomly from trial to trial. In other respects the
procedure was the same as Experiment 1.
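One way to picture this design is as a shuffled list of 60 trial descriptions. The sketch below is illustrative only: the condition names and trial counts come from the text, but the dictionary fields, the function name, and the decision to balance left and right ears within each condition (the text states only that single-ear tones went to each ear on half the trials) are assumptions.

```python
import random

CONDITIONS = ("same-ear", "different-ear", "both-ears")
TRIALS_PER_CONDITION = 20


def build_trials(seed=None):
    """Return 60 trial descriptions in random order (hypothetical layout)."""
    rng = random.Random(seed)
    trials = []
    for condition in CONDITIONS:
        for k in range(TRIALS_PER_CONDITION):
            # Assumption: ear of standard/comparison is balanced within condition.
            target_ear = "left" if k % 2 == 0 else "right"
            if condition == "same-ear":
                interpolated_ear = target_ear
            elif condition == "different-ear":
                interpolated_ear = "right" if target_ear == "left" else "left"
            else:
                interpolated_ear = "both"
            trials.append({
                "condition": condition,
                "target_ear": target_ear,
                "interpolated_ear": interpolated_ear,
            })
    rng.shuffle(trials)  # conditions change randomly from trial to trial
    return trials


if __name__ == "__main__":
    for trial in build_trials(seed=1)[:5]:
        print(trial)
```

Fixing the seed reproduces the same order on every run, which is one way to obtain a predetermined random order of the kind mentioned for playback.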
Results and Discussion
Subjects’ responses were calculated in terms of percentage correct for each condition.
Means were as follows: 52% in the same-ear condition; 60% in the different-ear condition;
and 60% in the both-ears condition. There was a significant overall effect of condition,
F(2, 38) = 4.82, p < .02. Planned comparisons revealed significantly better
performance in both the different-ear and the both-ears conditions relative to the
same-ear condition, F(1, 38) = 7.00, p < .02, and F(1, 38) = 7.45, p < .01, respectively. There
was no difference between different-ear and both-ears conditions, F < 1.
These results converge with those of Experiments 1 and 2 in providing further support
for the streaming hypothesis of interference effects in pitch memory. If the marked level
of disruption found when interpolated tones are presented in the same spatial location as
the standard tone was due to interference within an ear-specific channel, then the both-ears
condition should have led to the same degree of disruption as the same-ear condition.
Instead, the results support the argument that disruption effects in pitch memory may be
viewed usefully in the context of the perceptual processes whereby items and events are
organized into higher-order perceptual groups, or streams (see, for example, Bregman,
1990). To the extent that such processes allow the standard and the interpolated tones to
be perceived as distinct entities, then recognition performance will be enhanced. The
results also serve to reinforce those of Kallman et al. (1987), who found the superiority of
the different-ear condition was restricted to a procedure in which conditions were blocked
rather than randomized. The results of the current series of experiments show that the
effect also occurs with randomized conditions.
Thus far, the results have suggested that the auditory scene analysis framework provides
a useful way of approaching the findings of Deutsch (1970, 1978) and Pechmann
and Mohr (1992). Within the present series, grouping manipulations (by time, by timbre,
by location, and by number) have been shown to modulate the interference of pitch
memory caused by interpolated tones. In the three final experiments we return to the
question, first posed in Experiment 1, of why spoken digits produce such low levels of
disruption. As has already been mentioned, such stimuli differ from the tone sequences in
potentially critical ways: First, each item in the sequence is spectrally distinct from its
neighbours in a way that the tones are not; and second, the spoken digits do not change in
overall pitch in the way that the tones do. In the following experiments, the influence that
these factors may have on modulating the disruptive capacity of interpolated sequences is
investigated.
EXPERIMENT 4
A potentially critical difference between the interpolated tones and the digits used by
Deutsch (1970) was that whereas the tones varied in frequency from item to item, all
digits shared a common fundamental frequency (that is, they were spoken at the same
overall pitch). In Experiment 4, we examine the possibility that this common overall
fundamental frequency contributes to the low level of disruption of pitch memory found
with such digit sequences. From the streaming point of view, we may hypothesize that a
shared fundamental frequency among items increases the likelihood that such items will
be grouped together coherently, thus facilitating the perceptual isolation of the standard
tone from the first few items in the interpolated sequence. In Experiment 4, the effect of
interpolated digit sequences that share a common fundamental is compared with
sequences that change in fundamental frequency from item to item. Changing the
fundamental frequency should reduce the tendency for items to be grouped together
coherently. From the streaming view, changing the fundamental frequency from digit to digit
may reduce the strength with which such items may be grouped together separately from
the standard, therefore increasing the level of disruption of pitch memory. The technique
used in this experiment is to change the frequency of a token using digital signal processing
techniques, allowing all other characteristics (such as duration and intonation) to
remain fixed while the pitch is changed. Phenomenally the effect of this is to make the
digits sound as if they are spoken by different voices.
In order to examine these possibilities, Experiment 4 employs conditions in which the
content of the interpolated sequences is manipulated in three ways: (a) the usual spoken
digit condition as used in Experiment 1; (b) a condition that involves presentation of the
nine spoken digits, but in this case each item is shifted in pitch to 1 of 12 semitone steps;
and (c) a single spoken digit (“one”) shifted in pitch in the same way as in Condition (b).
Conditions b and c, therefore, share a common range of pitches but differ in terms of the
variation in timbre; Condition c has a common timbre throughout the interpolated
sequence, but in Condition b the timbre varies. Repeating the same speech token, albeit
at different pitches, should also increase the likelihood that such a sequence will form a
more coherent group than sequences that have both pitch and timbral change (as in
Condition b). Therefore, from the point of view of the grouping hypothesis, we would
expect Condition c to produce less disruption than Condition b. Thus, the conditions
examined in Experiment 4 may be seen within the streaming framework as representing
three levels of coherence in the interpolated items, with repeated digits sharing a common
fundamental being the most coherent, and pitch-shifted versions of all the digits the least
coherent.
Method
Subjects
Eighteen subjects participated in the experiment for an honorarium. They were selected on the
basis that they passed the pitch discrimination test as described in Experiment 1; 4 potential subjects
were rejected on this basis.
Auditory Materials
Three types of sequence were constructed: (a) digits 1-9/common F0, identical to those used in
the “digits 350” condition used in Experiment 1; (b) digits 1-9/changing F0, in which each of the
digits 1 to 9 was digitally processed such that each had a fundamental frequency corresponding to one
of the semitone steps in the octave C4 to B4; and (c) single digit/changing F0, in which the spoken
digit “one” was digitally copied and pitch-shifted to each of the semitone steps of the C4 to B4
octave. The pitch-shifting technique was the same as that used in Experiment 1. The overall order of
the sequences was the same as that used in the 350-msec conditions of Experiment 1.
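For orientation, the twelve semitone steps of the octave C4 to B4 can be converted to frequencies. The sketch below assumes twelve-tone equal temperament with A4 = 440 Hz; the text does not state the tuning reference used, so both the reference pitch and the function name are assumptions.

```python
NOTE_NAMES = ["C4", "C#4", "D4", "D#4", "E4", "F4",
              "F#4", "G4", "G#4", "A4", "A#4", "B4"]
A4_HZ = 440.0  # assumption: concert pitch; not stated in the text


def semitone_frequencies():
    """Map each note name to its equal-tempered frequency in Hz."""
    a4_index = NOTE_NAMES.index("A4")
    return {
        name: A4_HZ * 2 ** ((i - a4_index) / 12)
        for i, name in enumerate(NOTE_NAMES)
    }


if __name__ == "__main__":
    for name, hz in semitone_frequencies().items():
        print(f"{name:4s} {hz:7.2f} Hz")
```

Each step multiplies frequency by the twelfth root of 2 (about 1.0595), so under this assumption the whole set spans roughly 262 Hz to 494 Hz.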
Design and Procedure
Twenty sequences of each type were constructed and randomly ordered on a digital tape for
playback through headphones during the experiment. The general procedure was the same as used in
Experiment 1.
Results and Discussion
Subjects’ responses were scored as percentage correct in each condition. The means for
the three conditions were as follows: digits 1-9/common F0, 82%; digits 1-9/changing
F0, 65%; and single-digit/changing F0, 74%. A repeated-measures ANOVA revealed a
significant main effect of condition, F(2, 34) = 18.78, p < .0001. Planned comparisons
indicated that digits 1-9/common F0 led to better recognition performance than either the
digits 1-9/changing F0 condition or the single-digit/changing F0 condition, F(1, 34) =
37.47, p < .0001, and F(1, 34) = 7.85, p < .01, respectively. There was also a significant
difference between the digits 1-9/changing F0 and the single-digit/changing F0 conditions,
F(1, 34) = 11.02, p < .005, with digits 1-9/changing F0 producing a greater degree
of disruption of pitch recognition.
These results clearly indicate that one feature of spoken digit sequences giving rise to
low levels of disruption is that the items share a common fundamental frequency. Changing
the F0 breaks up the coherence of such sequences. Presenting such sequences with
each item at a different frequency leads to a substantial increase in their disruptive
capacity, to the order of a 17% increase in errors. The results also suggest that this is
due, at least in part, to the effect that changing the frequency from item to item has on the
strength with which such sequences form coherent groups. By repeating the same item at
different frequencies (single-digit/changing F0), performance is improved relative to the
digits 1-9/changing F0 by approximately 9%, an improvement that we suggest is due to
the increased likelihood with which sequences may be coherently grouped, facilitating the
perceptual isolation of the standard tone.
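The two differences quoted above follow directly from the condition means reported for Experiment 4, treating errors as 100 minus percentage correct. The short sketch below only reproduces that arithmetic; the function names are hypothetical, and the differences are expressed in percentage points.

```python
# Mean percentage correct reported for Experiment 4.
MEANS = {
    "digits 1-9/common F0": 82,
    "digits 1-9/changing F0": 65,
    "single-digit/changing F0": 74,
}


def error_rate(condition):
    """Percentage of incorrect responses (100 minus percentage correct)."""
    return 100 - MEANS[condition]


def error_difference(condition_a, condition_b):
    """Errors in condition_a minus errors in condition_b, in percentage points."""
    return error_rate(condition_a) - error_rate(condition_b)


if __name__ == "__main__":
    # Cost of changing F0 across the nine digits.
    print(error_difference("digits 1-9/changing F0", "digits 1-9/common F0"))
    # Benefit of repeating a single digit while F0 still changes.
    print(error_difference("digits 1-9/changing F0", "single-digit/changing F0"))
```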
EXPERIMENT 5
Experiment 4 focused on some of the aspects of speech sounds that may give rise to low
levels of disruption. In Experiment 5 attention is turned to non-speech sounds. In
particular, questions are posed relating to whether the low level of disruption produced
by spoken digits can also be produced by non-speech sounds that share the characteristics
of spectral change from item to item while retaining a common fundamental
frequency. Within the current series of experiments it has been shown already that
speech (repeated syllables) can be made to exhibit effects similar to those of non-speech
sounds (repeated cello notes) in Experiment 1. For analytic purposes, it would be useful
also to demonstrate the complementary effect, that is, to show that non-speech sounds
can produce effects similar to complex and varying speech. This would show that
speech was not a necessary condition, augmenting the evidence of earlier experiments
in the current series that demonstrated that speech was not a sufficient condition for
low levels of disruption.
In Experiment 5, two conditions are introduced to examine this possibility. In one, the
interpolated sounds are made up of a range of different instruments playing the same
note, thus varying in timbre but not in pitch. This is contrasted with a condition in which
the same range of instruments play different notes: specifically, the range of pitch change
used in previous experiments in semitone steps within an octave range. The comparison
of these two conditions should indicate whether change in pitch and timbral qualities
within a sequence of non-speech sounds gives rise to the same type of effect as has been
shown with speech sounds in Experiment 4. As control, conditions identical to the
“cello” and “digits” (at 350-msec interval between the standard and the interpolated
sequence) conditions of Experiment 1 are also used.
In sum, Experiment 5 poses two questions: Does varying a common timbre augment
or reduce the grouping, and does an additional change of pitch reduce coherence still
further?
Method
Subjects
Twenty subjects, screened by a test for tone discrimination (see above), were paid an honorarium
for taking part. Nine potential subjects were rejected on the basis of the pitch discrimination test.
Procedure
The general procedure was identical to that used in Experiment 2. Four conditions were used in
which interpolated materials were constructed from the following sounds: (a) cello, which was
identical to the “cello 350” condition of Experiment 1; (b) digits, identical to the “digits 350”
condition of Experiment 1; and two new conditions: (c) an instruments/repeated-pitch condition
comprising sequences of instruments (bowed cello, guitar, French horn, saxophone, pipe-organ,
flute, piano, glockenspiel, trumpet) playing the same note (C4) in random order; and (d) an
instruments/changing-pitch condition using the same randomly ordered sounds, but in addition the notes
varied in the same range as those for “cello”, that is, 12 notes in semitone steps for the octave C4
and above.
The instrumental sounds were based on digitized samples of sound effects stored on compact disk
(Digidesign SampleCell). These were subsequently pitch-shifted and edited (as in previous
experiments in this series) using Digidesign SoundTools digital signal processing software. The timing of
the stimuli was as in the “350” conditions of Experiment 1, that is, with the standard spaced
350 msec ahead of the first interpolated sound and each of the interpolated sounds spaced at
350 msec.
Subjects were presented with 20 trials per condition, as in previous experiments.
Results and Discussion
Results were analysed in terms of the percentage of correct responses in each condition.
The means for the four conditions were as follows: instruments/repeated-pitch, 66%;
instruments/changing-pitch, 56%; cello, 62%; and spoken digits, 86%. An overall
ANOVA showed the effect of the type of interpolated material to be highly significant,
F(3, 57) = 26.89, p < .0001. Planned comparisons indicated, just as shown with spoken
digits in Experiment 4, that changing the pitch from item to item in the instrument
sequences led to an increase in errors relative to the instruments that share a common
fundamental frequency, F(1, 57) = 5.56, p < .03, although the instruments/repeated-pitch
condition still produced more errors than the spoken digits condition, F(1, 57) = 34.77, p <
.0001.
The results of Experiment 5 also speak to the matter of timbre. The level of performance
in the instruments/changing F0 condition was as poor as that found for tones in
Experiment 1. The main difference between this use of instrumental sound and its use in
Experiment 1 is that timbre changed from one item in the interpolated sequence to the
next. Added to the loss of coherence in timbre is the effect of change of frequency.
The substantial difference between the disruption produced by the instruments/
repeated F0 condition and the spoken digit condition indicates that common fundamental
frequency and spectral change from item to item in the interpolated sequence are not in
themselves sufficient to give rise to the very low levels of disruption of pitch recognition
found with digit sequences. Nonetheless, the comparison between instruments/changing
F0 and instruments/repeated F0 in the present experiment, as well as the comparison
between spoken digits and pitch-shifted digits in Experiment 4, indicates that a shared
fundamental frequency among items leads to a decrease in disruption, probably, we
would argue, because a shared fundamental frequency among items increases the
likelihood that such items will be grouped together coherently.
However, the question remains as to what other characteristics of spoken digit
sequences lead to very low levels of interference. One possibility is that even though
each spoken digit is spectrally distinct from others, they still share many attributes (other
than fundamental frequency) which may also contribute to the likelihood of their being
grouped together. As the digits were all spoken in the same voice and therefore share
certain acoustic qualities in a way in which different instrument sounds do not, it seems
possible that such items are more likely to be grouped together. Such an explanation
would appear to be plausible in the light of previous results that have highlighted the
importance of grouping processes in modulating the disruptive capacity of interpolated
sounds (see Jones & Macken, 1995; Jones, Macken, & Murray, 1993). An alternative
explanation may be that the semantic content of the speech serves to bind the items
more coherently together. Thus there may be some “top-down” processes that serve to
integrate the items more coherently into a separate perceptual group. We explore these
possibilities in the next experiment.
EXPERIMENT 6
Experiment 6 evaluates the effect of sequences of non-speech sounds constructed in a way
likely to increase the binding together of the items. Coherence within the interpolated
sequence was achieved by generating a continuous glide varying randomly in frequency.
Continuous glides were produced by first low-pass filtering pink noise at a very low
frequency and then using this randomly varying signal to drive a voltage-controlled
oscillator. The oscillator then generated a sound with a pitch in proportion to the voltage
driving it. This glide was then interrupted regularly by short periods of silence. Hence, a
sequence of discrete items was created with each item pointing in frequency to the next
one in the sequence. Bregman and Dannenbring (1973) examined the effect of frequency
glides joining two steady-state tones at different frequencies on the tendency for such
items to be formed into a single stream. They found that even when such glides contained
a silent interruption, they still increased the tendency for the sounds to be integrated into
a single stream. Thus, the way in which the glide at the offset of one sound pointed to the
glide at the onset of the next sound served to bind the tones more coherently together.
The glides used in Experiment 6 resemble those of Bregman and Dannenbring (1973) in
that the end of each glide segment points in frequency to the beginning of the next one. If
the tendency for interpolated items to be grouped together coherently is an important
determinant of pitch recognition performance, then low levels of disruption should be
expected for such sequences. An interpolated tone condition is also included as a baseline.
A clear prediction is that reversed digits will not have the semantic content of forward
speech, and therefore, if the meaning of the items is important in determining the effect
of interpolated digits, an effect on recognition memory by reversing the items in the
interpolated list would be expected. Forward and reversed digits share the same long-term
spectrum, but in reversing a spoken digit the ordering of periodic and non-periodic
features together with changes in the shape of the formant trajectories will mean that the
sequence will sound rather different to that in normal English. One possibility, therefore,
is that the ordering or trajectory of such features will reduce the coherence of the
interpolated sequence, thereby increasing the degree of disruption of recognition memory.
Method
Subjects
Male and female student volunteers were each subjected to the screening test described for
Experiment 1. Of the subjects, 5 failed the test, and the remaining 18 participated in the experiment.
Procedure
As before, subjects were presented with 20 trials per condition. Four conditions, differing with
respect to the content of the interpolated list, were presented: tones, random digits, reversed random
digits, and pitch glides. The tones condition was identical to that used in the “tones 350” condition of
Experiment 1. The digits were also those used in Experiment 1. Digits were reversed by means of
SoundDesigner software and edited to produce the same timing as in Experiment 1. The glides
condition was constructed by generating a pitch glide and then editing silences so that 350 msec
of glide was alternated with 350 msec of silence.
To construct the pitch glides, pink noise from a Consilium Industri noise generator (Model PNG
11) was low-pass filtered by a Barr and Stroud filter (Type EF3) set to pass frequencies below 0.7 Hz,
with an increase in attenuation of 24 dB/octave above that point. The resulting randomly varying
signal was amplified and served as a control voltage for a Farnell DS61 oscillator. As the voltage fed to
it varied in a random manner, so did the pitch of the sound produced by the oscillator. Due to the
variation of loudness with pitch, the glides resulted in a percept that varied in both pitch and
loudness. The range over which the pitch varied (the depth of modulation, covering roughly 500-900 Hz
with a mean of 650 Hz) was adjusted to give a discernible degree of variability while at the
same time ensuring that the excursions of pitch at the lower bound to frequency did not lead to the
perception of silence due to the particular insensitivity of the ear to low frequencies (see Jones,
Macken, & Murray, 1993). This continuous glide was then edited. Periods of silence were produced
by editing the continuous glides using Digidesign SoundDesigner software. Each set of interpolated
material contained a continuous glide from which portions had been removed and silence of the same
duration had been substituted.
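The construction of the interrupted glide can be approximated digitally. The following is a rough software analogue and not a reconstruction of the analogue apparatus: white Gaussian noise is substituted for the pink-noise source, four cascaded one-pole low-pass stages stand in for the 24 dB/octave filter, and the control signal is simply rescaled to span 500-900 Hz (so the 650 Hz mean is not enforced). The sample rate, the number of segments, and all names are likewise assumptions.

```python
import math
import random

SAMPLE_RATE = 8000               # Hz; hypothetical, kept low so the sketch runs quickly
CUTOFF_HZ = 0.7                  # low-pass cutoff given in the text
LOW_HZ, HIGH_HZ = 500.0, 900.0   # approximate pitch range given in the text
SEGMENT_MS = 350                 # glide and silence durations given in the text
N_SEGMENTS = 9                   # assumption: one glide segment per interpolated item


def control_signal(n_samples, rng):
    """Slowly varying random signal: white noise through four one-pole stages.

    Assumption: white noise replaces pink noise, and the four-stage cascade
    only approximates a 24 dB/octave roll-off above the cutoff.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * CUTOFF_HZ / SAMPLE_RATE)
    state = [0.0] * 4
    out = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for k in range(4):
            state[k] += alpha * (x - state[k])
            x = state[k]
        out.append(x)
    return out


def make_glide(seed=0):
    """Return (instantaneous frequencies, gated waveform samples)."""
    rng = random.Random(seed)
    seg = SAMPLE_RATE * SEGMENT_MS // 1000
    n = seg * (2 * N_SEGMENTS - 1)   # nine glide segments separated by eight silences
    ctrl = control_signal(n, rng)
    lo, hi = min(ctrl), max(ctrl)
    span = (hi - lo) or 1.0
    # Assumption: rescale the control signal to cover the stated range.
    freqs = [LOW_HZ + (c - lo) / span * (HIGH_HZ - LOW_HZ) for c in ctrl]
    samples, phase = [], 0.0
    for i, f in enumerate(freqs):
        # The oscillator keeps running during silences, so each segment
        # resumes on the trajectory of the underlying continuous glide.
        phase += 2.0 * math.pi * f / SAMPLE_RATE
        gate_on = (i // seg) % 2 == 0
        samples.append(math.sin(phase) if gate_on else 0.0)
    return freqs, samples


if __name__ == "__main__":
    freqs, samples = make_glide(seed=1)
    print(len(samples) / SAMPLE_RATE, "sec;", round(min(freqs)), "to", round(max(freqs)), "Hz")
```

Gating a continuously running oscillator, rather than synthesizing each segment separately, mirrors the description of a continuous glide from which portions were removed and replaced by silence.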
Results and Discussion
Subjects’ responses were calculated in terms of percentage correct for each type of
interpolated material. The means for each condition were as follows: tones, 63%; digits,
84%; reversed digits, 79%; and glides, 73%. An ANOVA carried out on the data revealed
a significant overall effect of interpolated material, F(3, 51) = 11.70, p < .0001. Planned
comparisons indicated the usual effect of greater disruption from tones than for digits,
F(1, 51) = 31.87, p < .0001. However, reversing the digits had no reliable effect on their
disruptive capacity relative to forward versions of the same stimuli, F(1, 51) = 1.94, p >
.05, nor was there a significant difference between reversed digits and glides, F(1, 51) =
2.37, p > .05, although glides did produce significantly more disruption than forward
digits, F(1, 51) = 8.60, p < .01. Furthermore, the glides produced significantly less
disruption than interpolated tones, F(1, 51) = 7.36, p < .01.
These results indicate clearly that semantic processing is not an appreciable factor in
the effect of interpolated digits, as reversed digits do not produce significantly more
disruption. This result also suggests that spectral features shared by individual items
uttered in the same voice serve to bind items into a coherent perceptual group separate
from the standard tone, thus facilitating pitch recognition. However, the disruptive role of
different voices is not examined directly here and would seem to be a useful line of
follow-up work. The importance of such cues to perceptual binding is also indicated by
the relatively low levels of interference produced by glides. It is argued that such glides
achieve coherence by virtue of the underlying continuity of the pitch glide, interrupted by
silence. That is, the impression of “common trajectory” (or in Gestalt terms, “good
continuation”) is maintained, even in the discontinuous stimulus. Certainly, these glides
produce considerably less disruption than other non-speech stimuli used in previous
experiments. For example, in Experiment 5, whereas the instruments/repeated F0
sequences produced 20% more errors than spoken digits, the glides used in the present
experiment produced only approximately 11% more errors than spoken digits.
One possible confounding factor is the average pitch of the pitch glides. At 650 Hz, the
average is appreciably above that of the standard and higher than that of any of the
interpolated stimuli in the current experimental series. Although remote pitches in the
interpolated sequence have been associated with decreased disruption (Semal & Demany,
1991; Experiment 1), the level of performance with glides is still appreciably worse than
digits spoken at frequencies relatively close to the standard.
Taken together with the results of Experiments 4 and 5, Experiment 6 provides further
support for the applicability of the auditory scene analysis framework to disruptive effects
in pitch memory by indicating that such disruption may be determined by varying the
extent to which interpolated items form a coherent perceptual entity.
GENERAL DISCUSSION
Before embarking on a discussion of the broader implications of the findings, it may be
useful to summarize the main outcomes. Experiment 1 replicated the major feature of the
studies of Deutsch (1970) that a digit sequence produced much less interference than a
tone sequence when interpolated between the standard and a recognition test. The use of
a voice (repeated “one” with each item in the sequence at a different pitch) alone was
insufficient to produce the low level of disruption found with a string of digits; instead,
the effects were very much like those of a sequence of cello notes produced with the same
range of frequency variation. Speech is therefore not a sufficient condition for the low
degrees of disruption typically found with unprocessed digit sequences. Increasing the
interval between the standard and the first item in the interpolated set improved performance
only for interpolated tones, in line with the idea that part of the difficulty in making
the recognition judgement was isolating the standard from subsequent sounds rather than
retroactive interference by these sounds.
By examining the effect of nearly doubling the number of interpolated tones in
Experiment 2, two contrasting predictions could be adjudicated: If the effect of interpolated
sequences on memory performance was largely due to retroactive interference,
the increased number of interpolated tones should produce a sharp decline in recognition
performance, but if the effect was largely due to grouping, the interpolated list would be
more distinct from the standard, and recognition should improve markedly. The results
favoured the latter view. Experiment 3 further demonstrated the importance of grouping
processes by manipulating the spatial location of the interpolated tones relative to the
standard and comparison. Both binaural and contralateral presentation of interpolated
tones produced less disruption than ipsilateral presentation, but they did not differ from
each other in their disruptive capacity. The results therefore cannot be ascribed to the
action of an ear-specific store; rather, they are in line with the idea that spatial location
may serve to differentiate the standard from the interpolated list. The results of Experiment 4
suggested that an important feature of spoken digits leading to the low level of
disruption caused by such interpolated items is the common pitch of all items in the
sequence: Digit sequences sharing a common frequency caused less disruption than did
those changing in frequency (see further discussion). Experiment 5 showed a similar
pattern with non-speech sounds. Interpolated sequences constructed from different
instrument sounds sharing a common fundamental frequency produced less disruption
of pitch recognition than did sequences of instrument sounds changing in fundamental
frequency from instrument to instrument. However, these stimuli still produced substantially
more disruption than did spoken digits. Reversing the digits (thus rendering them
meaningless while maintaining a common spectrum) increased disruption, but not by an
appreciable amount. Whether this was an effect of reducing meaning is doubtful. It is
more likely that the process of reversal disrupted dynamic cues to coherence. The
importance of such dynamic cues in binding a stimulus sequence was illustrated in
Experiment 6 by the relatively low levels of disruption produced by discrete frequency
glides lying on a common frequency trajectory.
Overall, these results highlight the importance of grouping processes in determining
how representations are formed in memory. It is the grouping of the material into a
coherent whole, enabling the perceptual isolation of the standard, which determines
subsequent recognition accuracy. The characteristics of the interpolated sequence will
determine the organization of its elements into a coherent whole. The notion of grouping
also fits neatly with other findings in the area of pitch recognition, including those of
Deutsch. For example, Deutsch (1980) found that interpolated sequences constructed
from monotonic ascending or descending sequences of tones produced less disruption
than did the same tones ordered randomly. This outcome, we would argue, is due to the
effect such a manipulation will have on the tendency for the interpolated items to form a
coherent perceptual group. In another experiment, Deutsch showed that this beneficial
effect of monotonic ordering of interpolated tones was attenuated by increasing the range
over which the items varied from one to two octaves. Such a manipulation would be
expected to decrease the tendency for coherent grouping, as frequency proximity is a
powerful cue to stream formation (Bregman, 1990).
Widening the spectra of the interpolated tones, thus causing them phenomenally to
sound more “noisy”, has been shown to improve recognition performance compared to
pure tones (Semal & Demany, 1993, Experiment 4). Such effects of timbre are in line with
the results of the current series of experiments, but inconsistent with the results of an
earlier study by Semal and Demany (1991, Experiment 1). However, evidence from a
range of other paradigms converges to suggest the importance of timbre. For example,
pitch and timbre do not completely dissociate if subjects have to categorize the pitch of a
single tone as quickly as possible (Crowder, 1989; Krumhansl & Iverson, 1992) or if
successive comparisons have to be made between tones with same or different timbres
(Melara & Marks, 1990; Singh & Hirsh, 1992). Given that stimulus features other than
timbre seem to point to the role of stream segregation in memory for pitch within the
Deutsch paradigm, it would be rather surprising on logical grounds if timbre were an
exception to the general rule, particularly in light of its pre-eminent role in perceptual
segregation in studies of auditory perception (see for example, van Noorden, 1975; Singh,
1987).
With a broad view of the present series of experiments, the evidence strongly suggests
that interpolated speech is not a sufficient condition for low levels of disruption, but the
case that it is not a necessary condition has also not been fully fleshed out. Granting a
special status for speech usually also invokes the notion of speech modules or special
modes of processing (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967;
Liberman & Mattingly, 1985), a step that lacks parsimony and is attended by a
number of logical difficulties (see Pisoni, 1978, for an overview). Certainly, the results
of Experiment 4 of the current series of experiments do not allow a strong version of
the modular action of speech to be entertained. In that experiment, speech stimuli were
shown to produce disruption similar to that of tones and more than that of pitch glides.
This result alone is enough to cast doubt on the necessary status of speech: A strict
version of modularity would restrict its action to speech at an abstract level and would
be independent of particular instantiations such as those produced by a change of voice.
However, the case is not a conclusive one, and further work systematically assessing
factors such as spectral continuity is needed before speech is granted special status.
Whatever the precise status of speech, the present series points to the pre-eminent role
of organizational factors and, in more abstract terms, to the intimacy of the relationship
between perception and memory. Generally, in models of memory, it is assumed that
perception is logically prior to and separate from memory. An emerging alternative
view is the proceduralist position. As viewed by Crowder (1989, 1993), a leading
proponent of this view, the proceduralist position "denies that information is retained in
memory stores in a similar way to receptacles"; instead, "retention is a natural
consequence of the information processing that was originally engaged by the experience
in question" (1993, p. 115). Although Crowder (1993) sought to align the proceduralist
view with a neuropsychological instantiation, it is entirely possible that instead of viewing
distinctiveness of representation anatomically, it can be viewed propositionally, that is,
in terms of some abstract representation of the rules of organization. It follows that those
rules governing organization determine the distinctiveness of representation in memory.
In fact, previous accounts of the procedural view of memory have explicitly excluded the
results of Deutsch (1970), Pechmann and Mohr (1992), and by inference those of Semal
and Demany (1991, 1993), regarding them as one of a small number of special cases
calling for a modular architecture. Indeed, Crowder drew specific attention to their
results, claiming that they "bespeak a tidy modularity in memory" (1993, p. 124) as
they reflect the action of sensory storage systems that fall outside the procedural
mechanisms of short-term memory. On the basis of the foregoing experiments, exclusion from
the general proceduralist framework seems unwarranted.
The results of the foregoing series of experiments, by illustrating the importance of
grouping, have thrown into relief the importance of perceptual factors in determining
recognition performance. A challenge for subsequent work will be the fleshing-out of the
factors that determine the coherence of sound sequences and their implications for
performance in a variety of memory tasks.
REFERENCES
Bregman, A.S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Bregman, A.S., & Dannenbring, G. (1973). The effect of continuity on auditory stream segregation.
Perception and Psychophysics, 13, 308–312.
Bregman, A.S., & Rudnicky, A.I. (1975). Auditory segregation: Stream or streams? Journal of
Experimental Psychology: Human Perception and Performance, 1, 263–267.
Crowder, R.G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human
Perception and Performance, 15, 472–478.
Crowder, R.G. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The
cognitive psychology of human audition (pp. 112–145). Oxford: Oxford University Press.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168,
1604–1605.
Deutsch, D. (1978). Interference in pitch memory as a function of ear of input. Quarterly Journal of
Experimental Psychology, 30A, 283–287.
Deutsch, D. (1980). The processing of structured and unstructured tonal sequences. Perception and
Psychophysics, 28, 381–389.
Deutsch, D. (1984). Memory for nonverbal auditory information: A link between behavioral and
physiological studies. In L.R. Squire & N. Butters (Eds.), Neuropsychology of memory (pp. 45–54).
New York: Guilford.
Fraisse, P. (1978). Time and rhythm perception. In E.C. Carterette & M.P. Friedman (Eds.), Handbook
of perception. Vol. 8: Perceptual coding (pp. 189–233). New York: Academic Press.
Jones, D.M., & Macken, W.J. (1995). Organizational factors in the effect of irrelevant speech: The role of
spatial location and timing. Memory and Cognition, 23, 192–200.
Jones, D.M., Macken, W.J., & Murray, A.C. (1993). Disruption of visual short-term memory by
changing-state auditory stimuli: The role of segmentation. Memory and Cognition, 21, 318–328.
Kallman, H.J., Cameron, P.A., Beckstead, J.W., & Joyce, E. (1987). Ear of input as determinant of
pitch-memory interference. Memory and Cognition, 15, 454–460.
Kallman, H.J., & Massaro, D.W. (1979). Similarity effects in backward recognition masking. Journal of
Experimental Psychology: Human Perception and Performance, 5, 110–128.
Krumhansl, C.L., & Iverson, P. (1992). Perceptual interactions between musical pitch and timbre.
Journal of Experimental Psychology: Human Perception and Performance, 18, 739–751.
Liberman, A.M., Cooper, G.S., Shankweiler, D.P., & Studdert-Kennedy, M. (1967). Perception of the
speech code. Psychological Review, 74, 431–461.
Liberman, A.M., & Mattingly, I.G. (1985). The motor theory of speech perception revised. Cognition,
21, 1–36.
Melara, R.D., & Marks, L.E. (1990). Perceptual primacy of dimensions: Support for a model of
dimensional interaction. Journal of Experimental Psychology: Human Perception and Performance, 16,
398–414.
Pechmann, T., & Mohr, G. (1992). Interference in memory for tonal pitch: Implications for a working
memory model. Memory and Cognition, 20, 78–98.
Pisoni, D.B. (1978). Speech perception. In W.K. Estes (Ed.), Handbook of learning and cognitive
processes: Vol. 6. Linguistic functions in cognitive theory (pp. 167–233). Hillsdale, NJ: Lawrence
Erlbaum Associates, Inc.
Semal, C., & Demany, L. (1991). Dissociation of pitch from timbre in auditory short-term memory.
Journal of the Acoustical Society of America, 69, 2404–2410.
Semal, C., & Demany, L. (1993). Further evidence for an autonomous processing of pitch in auditory
short-term memory. Journal of the Acoustical Society of America, 94, 1315–1322.
Singh, P.G. (1987). Perceptual organisation of complex-tone sequences: A trade-off between pitch and
timbre? Journal of the Acoustical Society of America, 89, 1890–1991.
Singh, P.G., & Hirsh, I.J. (1992). Influence of spectral locus and F0 changes on the pitch and timbre of
complex tones. Journal of the Acoustical Society of America, 82, 886–899.
van Noorden, L.P.A.S. (1975). Temporal coherence in the perception of tone sequences. Doctoral
dissertation, Technische Hogeschool, Eindhoven, The Netherlands.
Original manuscript received 28 June 1994
Accepted revision received 5 August 1996