
Björn Lindblom: Developmental origins of adult phonology


Developmental origins of adult phonology

The interplay between phonetic emergents and the evolutionary adaptations of

sound patterns

Björn Lindblom

Stockholm University, Sweden

1. Introduction

Fundamental to linguistic methodology is the distinction between the abstract

structure of an utterance, its form, and its behavioral expression, its substance. The

traditional division of labor between phonology and phonetics derives from that

distinction (Fischer-Jørgensen 1975). A crucial step in the history of the discipline

was taken by stipulating that form (la langue) take precedence over substance (la

parole) (Saussure 1916).

However, as students of speech sounds endeavor to increase the explanatory

adequacy of their descriptions, it is becoming increasingly clear that the assumption

of the ‘logical priority of linguistic form’, left essentially intact since de Saussure, is

counter-productive to that goal. One of the aims of this introduction is to reexamine

this time-honored assumption.

An area where the consequences of the doctrine are particularly evident is

language acquisition where, logically, the priority of linguistic form cannot be applied

in any plausible way, since at the onset of development there is no form. We shall end

our introductory remarks by concluding that, if accounting for how children learn

their native sound systems is to be part of explanatory linguistics, the doctrine of

‘form first, then substance’ must be rejected and replaced by another paradigm. The

question of what that framework should be will be considered in the second part of

the paper.



1.1 The ‘inescapable’ dogma of 20th century linguistics

The roles of phonology and phonetics are schematically diagrammed in Figure 1.

The starting point is spoken samples from a given language observed by ear and

specified in terms of the elements of a universal phonetic alphabet. This provides raw

materials for functional analyses in which judgements of contrast by native

informants play a crucial role.

Experimental phonetics contributes physical descriptions of the perceptually

relevant correlates of the phonological units. In principle, these specifications can be

translated into audible, and therefore perceptually testable, form by means of speech

synthesis. The key notion is that speech performance is to be analyzed as a realization

of underlying, grammatically determined aspects of sound. Phonology aims at

discovering those aspects, whereas phonetics describes how, once defined, linguistic

structure is actualized by the speaker and how it is recovered from the signal by the

listener.

The significant phrase here is ‘once defined’, since, traditionally, physical

observations cannot precede functional analyses: Linguistic form must take

precedence over phonetic substance.

natural speech patterns

⇓ functional analyses [phonology (form)]

⇓ formal units, rules

⇓ instrumental analyses [phonetics (substance)]

⇓ physical correlates of perceptually relevant attributes

Figure 1 The traditional division of labor between phonetics and phonology.



A glimpse of the origins of the form-substance distinction can be obtained by

considering the difficulties that the founders of the International Phonetic Association

must have had in their attempts to create a phonetic alphabet, e.g., the problems of

‘phonetic variability’ and ‘phonetic detail’.

What speaking style should phonological analyses be based on? Suppose numerous

instances are recorded of the ‘same’ German utterance, e.g. ‘mit dem Wagen’, ranging

from clear �� to more casual forms such as �� (Kohler

1990). A standard move has been to exclude such style-dependent variations from the

domain of phonology proper. As stated by Jakobson & Halle:

"When analyzing the pattern of phonemes or distinctive features composing them, one must recur to the fullest, optimal code at the command of the given speakers." (Jakobson and Halle 1968:413-414).

Second, consider the problem of ‘phonetic detail’. Should tenth be represented as

��, ��, or with even more detail? Sweet concluded:

"It is necessary to have an alphabet which indicates only those broader distinctions of sound which actually correspond to distinctions of meaning in a given language." (Sweet 1877:103-104).

It has been claimed that one of de Saussure’s major contributions was:

"…. to focus the attention of the linguist on the system of regularities and relations which support the differences among signs, rather than on the details of individual sound and meaning in and of themselves ...... For Saussure, the detailed information accumulated by phoneticians is of only limited utility for the linguist, since he is primarily interested in the ways in which sound images differ, and thus does not need to know everything the phonetician can tell him ...... By this move, then, linguists could be emancipated from their growing obsession with phonetic detail." (Anderson 1985:41-42, italics ours).

The form-substance distinction ‘solves’ the problems of phonetic detail and

phonetic variability by invoking a process of abstraction and idealization. It replaces

variable and context-dependent behavioral data by invariant and context-free entities,

such as phonemes and allophones. Phonetic substance is stripped away as



linguistically irrelevant so as to uncover the phonologically significant structure

assumed to be embedded in that substance. In other words, progress is achieved by

making phonological structure independent of its behavioral use.

As mentioned, physical observation can never precede functional analysis. For an

illustration consider the following observations. When the Swedish word ‘nolla’

(‘zero’) is played backwards, native Swedes hear ‘hallon’ (‘raspberry’) rather than the

non-word ‘allon’ (Lindblom 1980)1. Spectrograms indicate that, when spoken as a

citation form, ‘nolla’ has expiration noise at the end of the final vowel. This noise is

heard as a speech sound when the tape runs backwards, but not when the word is

presented in the forward direction. Another finding is that, when the name ‘Anna’ is

played backwards, native Swedes hear ‘Hanna’ rather than ‘Anna’ which indicates

that the perceptual asymmetry of the ‘nolla-hallon’ example is likely to originate in

auditory processing (e.g., differences between forward and backward masking) rather

than in language-specific lexical access. The point of these examples is that the same

physical pattern (the utterance-final noise) is a linguistically significant event in one

situation, but not in the other. To the phonologist,

"… nothing in the physical event ... tells us what is worth measuring and what is not." (Anderson 1985:41)

The ‘nolla-hallon’ demonstration2 conforms with the widespread conviction that

making phonetic measurements, no matter how comprehensive, would not help the

phonologist, since only the ear and the brain of the native speaker can determine what

1 These words have accent II, the ‘grave’ accent. Produced as citation forms their F0 contours are fall-rise patterns which will remain fall-rise patterns also when reversed.

2 The full account of the nolla-hallon effect (Lindblom 1980) has a second part which relates the observed perceptual asymmetry to the fact that the world’s languages seem to prefer using the glottal aspirated [h] in syllable-initial over syllable-final position. If, as we suggest, the nolla-hallon asymmetry is linked to universal characteristics of human hearing, this speech-independent auditory property could be invoked to explain the typological observations as well. However, predicting such patterns would require a non-traditional framework that puts ‘substance’ first and ‘form’ second (see further discussion below).



is of linguistic relevance in a speech signal. It is observations of this sort that have led

linguists to stipulate that form must come first, then substance.

This distinction has been central for all of 20th century linguistics. Linguists have

left it intact assuming their primary concern to be with the individual native speaker's

competence (mental grammar, tacit knowledge), not with performance (its behavioral

instantiations):

"It seems natural to suppose that the study of actual linguistic performance can be seriously pursued only to the extent that we have a good understanding of the generative grammars that are acquired by the learner and put to use by the speaker or hearer. The classical Saussurean assumption of the logical priority of the study of langue (and the generative grammars that describe it) seems quite inescapable." (Chomsky 1964:52, italics ours).

1.2 The focus of phonetics: “Given the units, what are the phonetic correlates”?

As these remarks suggest, sound structure is postulated, not observed in the

laboratory. Nonetheless, experimental phoneticians have accepted the ‘logical priority

of form’ since, without an analysis of utterances into some kind of abstract units, it

would be difficult to make sense of laboratory records. Therefore the following 30-

year-old handbook statement on the relationship between phonetics and phonology

continues to be a valid description of how speech sounds are analyzed.

"…. a combination of a strictly structural approach on the form level with an auditorily based description on the substance level will be the best basis for a scientific analysis of the expression when manifested as sound. This description has to start by the functional analysis, then it must establish in auditory terms the distinctions used for separating phonemic units, and finally, by means of appropriate instruments, find out which acoustic and physiological events correspond to these different units. The interplay between the different sets of phenomena will probably for a long time remain a basic problem in phonetic research." (Malmberg 1968:15)

We conclude that, in keeping with the ‘inescapable’ dogma, the focus of phonetics

is placed on describing how postulated phonological units are realized in production

and how they are recovered from the signal in perception. In short, “given the units,

what are the phonetic (behavioral) correlates?”



In suggesting that progress was made by defining phonological structure as

independent of on-line use and by relegating the study of individual, situational and

style-dependent variations to phonetics and other performance-oriented disciplines,

the preceding account is unlikely to be controversial. However, as soon as issues of

behavioral realism and explanatory adequacy are raised, problems arise and consensus

tends to disappear.

2. Why a new paradigm is needed

2.1 The behavioral realism of linguistic form

Let us begin by mentioning two classical issues that remain unresolved despite

decades of experimental work. They are closely linked to accepting the priority of

linguistic form: (i) the question of the ‘psychological reality’ of linguistic units and

rules, and (ii) the issue of ‘phonetic invariance’.

The ‘psychological reality’ issue derives from the fact that the formal constructs of

linguistic analyses are postulated rather than observed. Data on e.g., the alphabet,

speech errors (Fromkin 1973), word games, synchronic and diachronic phonology

(Halle 1964)3 have been used as evidence for segmental organization and discrete

units as psychologically genuine phenomena. However, this evidence is only indirect

and that leaves room for alternative interpretations. Therefore, it is not surprising that

phoneticians and psycho-linguists differ as to how compelling that evidence really is

(Ladefoged 1984).

The ‘phonetic invariance’ issue has a similar source. It arises from the fact that

natural speech patterns exhibit extensive and complex individual, situational and stylistic

variations, and from the assumption that formal linguistic units, stripped of variability, can

by hypothesis be upgraded from ‘operationally defined’ to ‘behaviorally real’. On the



one hand, a variable phonetic reality - on the other, context-free invariant linguistic

representations. The mismatch between the two generates the invariance issue. Again

there is indirect evidence but so far there has never been a direct demonstration of

phonetic invariance as a physical observable (Perkell & Klatt 1986).

Are these two long-standing problems indications of significant, but reparable

cracks in the theoretical edifice of linguistic science? Or are they irremediable

consequences of the ‘priority of linguistic form’ portending a paradigm shift? Lacking

solutions, these difficulties have given some speech researchers second thoughts

concerning the behavioral status of phonological units as invariant and context-free. A

case in point is the recent interest in ‘exemplar models’ of speech perception (more

anon).

2.2 Explanatory adequacy in phonology and the form-substance distinction

Few linguists would currently interpret the form-substance distinction so

rigorously as to claim that phonological units and processes are totally arbitrary,

empty logical phenomena4. Recall the following afterthought in chapter nine of The

sound pattern of English:

“The entire discussion in this book suffers from a fundamental theoretical inadequacy. …….. The problem is that our approach to features, to rules, and to evaluation has been overly formal. ….. In particular, we have not made any use of the fact that features have intrinsic content.” (Chomsky and Halle 1968:400; italics ours).

Contemporary phonology presents numerous developments (e.g., ‘Optimality

Theory’, ‘Grounded Phonology’, ‘Laboratory Phonology’) indicating that attempts are

being made to link the description of sound patterns more tightly to the production

3 “Almost every insight gained by modern linguistics from Grimm’s law to Jakobson’s distinctive features depends crucially on the assumption that speech is a sequence of discrete entities.” (Halle 1964).

4 In “Why phonology isn't ‘natural’”, Anderson (1981) acknowledges a role for performance-based accounts, but places them outside linguistics proper, since they fail to deal with the aspects that ought to interest the linguist the most, viz., the formal idiosyncrasies of Language per se. On that view, what counts as a ‘real’ linguistic explanation is one that deals with the functionally inexplicable.



and perception of speech. It appears clear that phonetics and phonology are

undergoing a rapprochement. Such ‘second thoughts’ imply a softening of the

condition that, like the rest of grammar, sound structure be autonomous and

independent of language use.

Do these developments indicate that a more productive fine-tuning of the

phonetics/phonology division of labor is under way? Or are they signs of a growing

realization that accepting the priority of form creates an impasse that unnecessarily

deprives linguistics of explanatory power?

2.3 The logical priority of ‘la parole’ in the study of language acquisition

The perspective of child phonology further underscores how real and serious these

questions are.

On the one hand, it appears reasonable to expect a phonological theory with

explanatory ambitions to aim at accounting for how children develop the sound

structure of their native languages. In response to the question, ‘Where does

phonology come from?’ linguistics would provide a developmental answer instead of

claiming that it is largely determined by our genetic endowment (=nativism)5, or that

it has to be postulated given analysis methods and the observations themselves

(=curve-fitting). On the other hand, honoring the ‘priority of linguistic form’ does not

make sense in the case of phonetic learning because at the onset of development there

is no ‘form’.

We are forced to conclude that research in this area does not conform to the game

plan of Figure 1 and that the focus question of traditional phonetics, ‘given the units

5 “This analysis into features could not plausibly be said to have been learned, for there are surely few experiences in the life of a normal individual who is not a professional linguist or a phonetician that would lead her/him to develop a system of features for classifying speech sounds. One is, therefore, led to assume that the speech-analysing system is part of our genetic endowment..." (Halle & Stevens 1979:339-340). "And similar correlations between articulatory activity and acoustic signal



what are the phonetic correlates’, is utterly problematic. It would seem preferable to

rephrase it as: “Given the child’s behavior what are the units?”. Consequently, the

‘inescapable’ dogma of 20th century linguistics does not apply to language

acquisition.

If accounting for how children learn their native sound systems is to be part of

explanatory linguistics, the priority of form must be rejected and another paradigm

must be found. How could such a framework be developed?

3. Modeling phonological development as emergent computation

Methodological conditions, long-term goals and hypotheses.

The present section programmatically sketches fragments of a theory of emergent

phonology. The following ground rules provide the key to escaping the explanatory

impasse imposed by the priority of linguistic form:

1. Phonological structure must not prematurely be assumed to be genetically pre-specified. Rather it should be deduced from the child’s experience and minimal assumptions about ‘initial knowledge’. In the technical sense of the term, it should be derived as emergent behavior.

2. Phonological structure should not be postulated simply because the entities or processes are suggested by the data to be explained. Always seek independent motivations in ‘first principles’ and avoid mere ‘curve fitting’.

Restated, the first rule says that nativism should be replaced by emergent

computation. The second is an anti-circularity condition. In summary, the common

message is ‘deduce rather than postulate’!

The following presentation is made up of several hypotheses. (i) Cumulative

perceptual experience is complex but shows lawful effects of emergent categorization.

(ii) Motor learning unfolds according to a criterion of minimum energy consumption.

This universal physiological constraint puts the child within closer reach of the

articulatory patterns of its native phonology. (iii) Reuse of perceptual and motor

are genetically provided for each of the nineteen or so features that make up the universal set of phonetic features" (Halle & Stevens 1991:10).



patterns is favored owing to a metabolic constraint on memory formation. These ideas

exemplify possible roles that listening, speaking and learning might have in the

shaping of sound systems and suggest domains where behaviorally motivated ‘first

principles’ might be sought. Ambient input interacts with all three. (iv) Languages

exhibit (a tangled fabric of) adaptations to the proposed processes, e.g., patterns of

perceptual contrast, articulatory ease and combinatorial coding of submorphemic

elements such as ‘features’ and ‘segments’. (v) Furthermore, it is assumed that,

although socio-cultural evolution sometimes opposes the effects of the above-

mentioned factors, linguistic systems nevertheless retain those adaptations owing to

the blind phonetic ‘editing’ unwittingly performed by speakers, listeners and learners

during on-line language use (Lindblom, Guion, Hura, Moon & Willerman 1995).

Conceivably it might be objected that the present suggestions drastically

overestimate the role of functional constraints at the expense of formal factors. As a

brief response to that objection we note that obviously the complexity of the

functionalism vs. formalism issue is considerable. Therefore a computational

approach will be necessary. Our position is that any hypothesis propounded - whether

formal or functional, articulatory or perceptual - must eventually be evaluated using

‘first principles’ simulations in an integrated manner that give all the component

hypotheses a fair chance to compete and be numerically evaluated. Accordingly,

whether we favor a functionalist or a formalist stance the agenda becomes identical. A

framework of emergent computation will be needed in either case.

4. Listening

Emergent effects of cumulative perceptual experience

Studies of unscripted speech are beginning to draw attention to the drastic

modifications that phonetic forms frequently suffer under natural, non-laboratory



conditions (Kohler this volume). The variability of infant-directed speech with its

emotive coloring and lively prosody (Fónagy 1983, Fernald 1984), appears to be

similar to what is found in adult-to-adult styles (Kuhl et al 1997, Davis & Lindblom

1994, Ulla Sundberg 1998). It is a near certainty that the invariance issue would need

to be resolved also for Baby Talk. How do children segment the legato flow of signal

information into words? How do they factor out emotive and stylistic transforms

(Johan Sundberg this volume)? In short, how do they manage to build their

phonologies from a complex input?

Machine learning including neural network theory appears relevant to that

question, particularly the ‘unsupervised’ approaches (Hinton & Sejnowski 1999). For

instance, work has been done to study how ‘structure’ can be derived from complex

inputs such as human faces. There is neuro-physiological evidence indicating that the

brain represents whole objects in terms of component parts (Wachsmuth, Oram &

Perrett 1994, Logothetis & Sheinberg 1996). Lee & Seung (1999) describe an attempt

to simulate such behavior computationally. They designed an algorithm that learned

to analyze faces non-holistically, i.e., it automatically parsed them into parts

resembling “several versions of mouths, noses and other facial parts” (p 789). The

term ‘parts’ is here used to refer to “entities that allow objects to be reassembled using

purely additive combinations” (Mel 1999).
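
Lee & Seung's (1999) algorithm is non-negative matrix factorization (NMF), which approximates a non-negative data matrix V by a product WH of non-negative factors, so that objects can only be rebuilt by adding parts together. As a minimal sketch of that idea, and not a reproduction of the face experiment, the following pure-Python code applies multiplicative updates for a squared-error objective (an assumption on our part; the original study may have optimized a different objective). The toy matrix and all function names are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def sq_error(v, w, h):
    """Squared reconstruction error between V and the product WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep signs.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element by element
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element by element
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: each column is an additive mix of two hidden parts.
PARTS = [[1, 0], [1, 0], [0, 1], [0, 1]]   # 4 features x 2 parts
MIX = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 1]]   # 2 parts x 5 objects
V = matmul(PARTS, MIX)

W, H = nmf(V, rank=2)
print("squared error after fitting:", sq_error(V, W, H))
```

Because every entry of W and H stays non-negative, a column of V can be approximated only by adding columns of W, never by cancelling one against another, which is the sense of ‘parts’ quoted above.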

The field of automatic speech recognition also offers results rich in implications

for phonetics and phonology. It turns out that currently the best-performing systems

are not based on extensive a priori knowledge about phonetic structure. They do

surprisingly well simply by exploiting statistical regularities in the speech signal.

‘Units’ are derived automatically as the stored data form clusters and as patterns of

‘tied states’ become defined (Young & Woodland 1993).



These and other findings raise the question whether, in processing speech,

children’s brains come up with units-based representations in a similarly unsupervised

fashion. If so, perceptual phonological units would not represent abstract ‘form’ but

simply the emergent patterning of the phonetic information.
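
To make the notion of unsupervised ‘units’ concrete, the sketch below clusters unlabeled two-dimensional tokens with plain k-means. It is only an illustration under stated assumptions: the tokens are synthetic, the two cluster centers (given here as hypothetical F1/F2 values in Hz) and their spreads are invented for the example, and k-means stands in for the far richer statistical machinery of actual recognition systems.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centroids, labels)."""
    # Farthest-point initialisation: deterministic and well spread.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    labels = [min(range(k), key=lambda i: dist2(p, centroids[i]))
              for p in points]
    return centroids, labels

rng = random.Random(1)

def sample(center, n):
    """Draw n synthetic tokens around a hypothetical (F1, F2) center."""
    return [(rng.gauss(center[0], 40.0), rng.gauss(center[1], 100.0))
            for _ in range(n)]

# No category labels are given to the algorithm.
tokens = sample((300.0, 2300.0), 30) + sample((700.0, 1200.0), 30)
centroids, labels = kmeans(tokens, k=2)
print(centroids)
```

The two groups are recovered from the distribution of the tokens alone, which is the sense in which ‘units’ can be said to emerge rather than to be stipulated in advance.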

The philosophy of the work just exemplified is reminiscent of so-called exemplar-

based models of perception and learning, a paradigm explored for some time by

psychologists (Estes 1993). Exemplar models make minimal assumptions about

‘initial conditions’ and are therefore not guilty of ‘resolving’ issues such as the

variability problem by postulating unknown, innate mechanisms. They make the most

of the signal and its complex, but lawfully structured, variability before positing

abstract hypothetical decoding mechanisms.

Johnson & Mullenix (1997) compare traditional and exemplar-based approaches to

speech perception. They point out that classical accounts assume representations (e.g.,

phoneme-sized units) to be simple (context-free invariants). The task of deriving such

units from the speech signal calls for complex processes capable of extracting

invariants. Mechanisms of this type have been proposed – e.g., the ‘phonetic module’

of the motor theory (Liberman & Mattingly 1985), the ‘smart mechanisms’ of direct

realism (Fowler 1986, 1994) and the ‘top-down’ processes (reconstructive rules,

inference making and hypothesis testing) of cognitively oriented approaches. The

details of how they operate still need to be spelled out.

Exemplar accounts adopt the opposite perspective. They assume representation to

be complex and mapping to be simple. Categories form as emergent products of

cumulative phonetic experience. A key point is that, although the variability of speech

signals is extensive, it is highly systematic. Exemplar models capitalize on this fact

by storing stimulus information along with its immediate signal context. As more data



accumulate, systematic co-variations among stimulus dimensions gradually appear.

The system can be said to use context to sort and disambiguate the variability. As a

result, speech sounds have complex and contextually embedded representations unlike

abstract phonetic segments. However, with sound systems shaped by distinctiveness

constraints (Diehl & Lindblom in press, Johnson this volume) and a perceptual space

of sufficiently large dimensionality, the integrity of sound-meaning relationships

should have a fair chance of being maintained.6
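
The role of context in an exemplar account can be sketched as follows. This is a toy illustration, not any published model: each stored exemplar is a (cue, context) pair with a category label, similarity decays exponentially with distance (an assumption borrowed from the general exemplar literature), and all numerical values, labels and function names are hypothetical. The cue value 2.0 is deliberately ambiguous between the two categories unless the context is stored with it.

```python
import math

def support(token, exemplars, sensitivity=2.0):
    """Summed similarity of a token to the stored exemplars of each label."""
    totals = {}
    for stored, label in exemplars:
        sim = math.exp(-sensitivity * math.dist(token, stored))
        totals[label] = totals.get(label, 0.0) + sim
    return totals

def classify(token, exemplars):
    """Return the label with the largest summed similarity."""
    totals = support(token, exemplars)
    return max(totals, key=totals.get)

# Hypothetical memory: (cue, context) pairs. Category "X" has a low cue
# value in context 0.0 and a mid value in context 5.0; category "Y" has
# a mid value in context 0.0 and a high value in context 5.0.
EXEMPLARS = (
    [((c, 0.0), "X") for c in (0.9, 1.0, 1.1)] +
    [((c, 5.0), "X") for c in (1.9, 2.0, 2.1)] +
    [((c, 0.0), "Y") for c in (1.9, 2.0, 2.1)] +
    [((c, 5.0), "Y") for c in (2.9, 3.0, 3.1)]
)

# With context stored, the same cue value maps to different categories.
in_context_0 = classify((2.0, 0.0), EXEMPLARS)
in_context_5 = classify((2.0, 5.0), EXEMPLARS)

# With context stripped away, the two categories receive equal support.
CUE_ONLY = [((cue,), label) for (cue, _context), label in EXEMPLARS]
without_context = support((2.0,), CUE_ONLY)

print(in_context_0, in_context_5, without_context)
```

The mapping itself is trivially simple (a sum of similarities); the disambiguation comes entirely from the richness of what is stored, which is the trade-off between complex representation and simple mapping described above.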

5. Speaking

5.1 Clues from the study of non-speech motor processes

‘Articulatory ease’ is sometimes informally invoked to explain both phonetic and

phonological observations. It has a certain common-sense appeal but admittedly its

current status is controversial. Ladefoged’s position (1990) is (i) that it is language-

dependent, (ii) that it cannot be measured and (iii) that therefore appeals to it are

unscientific. In a paper on assimilation, usually seen as an articulatory process, Ohala

(1990) rejects articulatory factors in favor of a perceptual account. He argues that

articulatory ease is likely to play a marginal role in shaping sound patterns and that

invoking it makes explanations teleological.

As warnings against uncritical use of articulatory ease such statements are well

taken, but, in the broader context of experimental biology, they appear overly

pessimistic.

This field presents a large literature on the energetics of locomotion in various

species. Quantitative data are available on how humans and dogs walk and run, birds

and bumblebees fly and how fish swim. A standard way of presenting results is to plot

6 For an illustration of this reasoning see the simplified exemplar-based account (Lindblom 1990, 1996) of phonetic learning in Japanese quail (Kluender et al 1997). For attempts to implement exemplar learning computationally see Lacerda (1995).



the amount of energy that the subject expends against traveling speed. The energy

used is inferred from measurements of oxygen consumption made for subjects under

steady-state conditions and therefore in an aerobic mode of oxygen uptake (McNeill

Alexander 1992). A typical example of this research is the study by Hoyt & Taylor

(1981) who measured energy consumption for horses walking, trotting and galloping.

The subjects were observed as they moved freely and at speeds controlled by a

treadmill. The energy used expressed per unit distance traveled and plotted against

traveling speed formed U-shaped curves with distinct minima. Significantly, these

minima were found to occur at speeds that subjects spontaneously adopted when

moving freely and unconstrained by the speed of the treadmill. Such findings rest

solidly on a large body of physiological studies (McArdle, Katch & Katch 1996) and

have been reported for a number of species. Experimental biologists interpret them to

suggest that locomotion is shaped by a criterion of ‘minimum energy expenditure’.
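
Why energy per unit distance should form a U-shaped curve can be seen from a deliberately simple worked example. Assume, purely for illustration, that metabolic power is a fixed baseline plus a term growing with the square of speed, P(v) = p0 + c·v². Energy per unit distance is then P(v)/v = p0/v + c·v, which is large at low speeds (the baseline is paid over a short distance) and at high speeds (the speed-dependent term dominates), with a minimum at v = sqrt(p0/c). The power law and the parameter values below are hypothetical and are not taken from Hoyt & Taylor's measurements.

```python
import math

BASELINE_POWER = 100.0   # hypothetical p0, in watts
SPEED_COEFF = 25.0       # hypothetical c, in watts per (m/s)^2

def cost_per_distance(speed):
    """Energy per unit distance (J/m) under the assumed power law."""
    power = BASELINE_POWER + SPEED_COEFF * speed ** 2
    return power / speed

# Scan speeds from 0.1 to 6.0 m/s and pick the cheapest one.
speeds = [0.1 * i for i in range(1, 61)]
best_speed = min(speeds, key=cost_per_distance)

# Closed-form minimum of p0/v + c*v.
analytic_speed = math.sqrt(BASELINE_POWER / SPEED_COEFF)

print(best_speed, analytic_speed)
```

The numerical scan and the closed-form expression agree, and the cost rises on either side of the minimum. On this reading, the empirical finding is that freely moving subjects adopt speeds near such minima.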

5.2 Why should speech movements be different?

Are speech movements and whole body movements similarly organized? Since

energy costs for speech are likely to be small in comparison with those of locomotion,

it might be argued that they play no major role at all in shaping phonetic movements.

It is true that, until speech energy costs can be reliably measured, we have no basis for

settling that issue satisfactorily. However, evolution’s tendency towards parsimony

would make us expect the same rules to apply for small as for big movements.

Among phoneticians, it is widely believed that both speech and sound patterns

have many characteristics that are most readily accounted for in terms of production

constraints. Conceivably, we will ultimately be able to show that many of them derive

from a minimum energy expenditure condition. For instance, in running speech,

prosodic modulations and speaking styles produce both strong elaborated and weak



reduced forms. In our opinion, this segmental dynamics is an obvious candidate for an

analysis based on energetics. Similarly, looking typologically at phonological

systems, we observe a clear preference for low-cost motor patterns (Lindblom 1983).

We hypothesize that minimization of energy expenditure plays a causal role in:

• the absence of vegetative movements and mouth sounds;
• determining the feature composition of phonetic segments (e.g., why are /i/ and /u/ universally ‘close’ vowels?);
• constraining the universal organization of syllabic & phonotactic structure;
• the patterning of diachronic and synchronic lenition and fortition processes;
• shaping the system-dependent selection of phonetic values in segment inventories;
• ….

5.3 Recognizing the ‘DOF problem’ for speech production

Motor systems offer their users an extremely rich set of possibilities for executing

a given task. In principle, there is an infinite number of trajectories that a movement

from one point to another could take. This motoric embarrassment of riches is

technically known as the Degrees of Freedom (DOF) problem. Solving the DOF

problem means selecting a unique movement from a very large search space. As the

following example will show, speech production offers talkers countless possibilities

for any given task which makes DOF a very real issue also for the phonetician.

Articulatory modeling (Lindblom & Sundberg 1971, Maeda 1991) has shown that

there is a continuous trade-off between jaw opening and tongue raising in producing a

given vowel, e.g., an /i/. In principle there is an infinite number of ways in which a

given /i/ formant pattern could be produced.

The normal way of making this vowel is to raise the jaw and adopt a moderately

palatal tongue shape. However, it has been experimentally demonstrated that, when

speakers are asked to produce a normal-sounding /i/ with an atypically large jaw

opening maintained by a ‘bite-block’ (Lindblom et al 1979), their output does not

approach the /�/-like quality predicted by area-function-to-formant calculations. In

fact, subjects are able to match the normal quality and the formant pattern of the



vowel quite closely, a result that clearly indicates a compensatory mode of

articulation. X-ray data (Gay et al 1981) have confirmed this interpretation showing

that, for bite-block /i/:s, subjects compensate by raising the tongue higher than normal

into a super-palatal position. This case is typical of many situations arising in

articulatory modeling. It shows that the DOF problem definitely also applies to

speech.
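
The jaw-tongue trade-off and the bite-block compensation can be caricatured in a few lines. The sketch assumes a strictly linear toy geometry in which the gap between tongue and palate equals a neutral gap plus the jaw opening minus the tongue raising relative to the jaw; this relation, the function names and all values in millimetres are hypothetical and are far simpler than the articulatory models cited above.

```python
NEUTRAL_GAP_MM = 10.0   # hypothetical gap with jaw closed and tongue at rest

def palatal_gap(jaw_opening_mm, tongue_raise_mm):
    """Tongue-to-palate gap in the assumed linear toy geometry."""
    return NEUTRAL_GAP_MM + jaw_opening_mm - tongue_raise_mm

def tongue_raise_for(target_gap_mm, jaw_opening_mm):
    """Tongue raising (relative to the jaw) that yields the target gap."""
    return NEUTRAL_GAP_MM + jaw_opening_mm - target_gap_mm

TARGET_GAP_MM = 3.0     # hypothetical constriction for an /i/-like vowel

# Normal production: small jaw opening, moderate tongue raising.
normal_jaw = 5.0
normal_tongue = tongue_raise_for(TARGET_GAP_MM, normal_jaw)

# Bite-block production: the jaw is held open, so the tongue must do more.
blocked_jaw = 25.0
blocked_tongue = tongue_raise_for(TARGET_GAP_MM, blocked_jaw)

print(normal_tongue, palatal_gap(normal_jaw, normal_tongue))
print(blocked_tongue, palatal_gap(blocked_jaw, blocked_tongue))
```

Any jaw opening can be paired with a tongue raising that gives the same gap, so the target alone does not determine the movement. That indeterminacy is the DOF problem in miniature.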

5.4 From walking to talking

In order to see how articulatory models might handle DOF let us briefly return to

some recent computational research on human walking (Anderson & Pandy 1999,

Pandy this volume). It offers an interesting solution to this problem.

The human body is represented as a 3-D model of the musculo-skeletal system.

The upper part consists of a rigid torso without arms. The lower part has 23 degrees of

freedom controlled by a set of 54 muscles. Attempts were made to simulate the

normal human gait cycle. The findings indicate that the model walks at a forward

velocity of 81 m/min, a value typical of human subjects (Ralston 1976). Predicted

displacements of anatomical structures were quantitatively similar to experimental

observations. Muscle coordination patterns were consistent with EMG data from

human subjects. Metabolic energy was expended at a rate comparable to that for

human walking. A compelling impression of the model’s realism is obtained from a

video demonstration of the performance of the model. The normal gait of the model

skeleton is presented and compared with average measurements from human subjects.

It shows the model walking in an extremely human-like fashion.

With so many muscles and mechanical dimensions, this model has a significant

DOF problem. Particularly relevant to discussions of articulatory ease in phonetics is

the fact that the results were obtained when a performance criterion of least metabolic



cost (= minimum heat production) was used. We can interpret the success of the

simulations as implying that the optimization criterion drastically reduces the search

space and makes it possible for the algorithm to identify unique and optimal

movement trajectories for each subtask during the gait cycle.

Summarizing so far, we note (i) that minimum energy consumption evidently helps

solve the DOF problem for non-speech movements and (ii) that this problem also

arises for speech modeling. In the light of those observations it seems justified to

assume that energy costs ought to play an important role in shaping both on-line

speech and sound systems.

In order to get a preliminary idea of what energy costs might be for speech

movements, a simplified model of the mandible was constructed (Lindblom et al

1999). The jaw was represented by a system defined by its mass (m), damping (b) and

elasticity (k). The mass was equal to 250 g. Critical damping and a resonance

frequency of 5 Hz were assumed. The energy required to drive the system was

calculated as a function of frequency for sinusoidal jaw movement of 10 mm

[Figure 2 plot. Abscissa: frequency of jaw movement (Hz), 0-15. Ordinate: energy cost per distance traveled (joules/meter), 0-50.]

Figure 2. The energy required to drive a biomechanical model of the jaw as a

function of frequency for a sinusoidal movement of 10 mm amplitude.



amplitude. The results, plotted with energy per distance on the ordinate and frequency

of jaw movement on the abscissa, indicated a U-shaped function with a distinct

minimum, similar to an ‘upside-down’ resonance curve and not unlike the locomotion

findings reviewed earlier.
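The parameters stated for the jaw model (mass 250 g, critical damping, resonance frequency 5 Hz, sinusoidal movement of 10 mm amplitude) are enough to recover the implied stiffness and damping coefficients of a standard mass-spring-damper system. The sketch below is only an illustration under stated assumptions: it reads the 5 Hz figure as the undamped natural frequency, and it computes the amplitude of the driving force needed for a sinusoidal displacement. It does not reproduce the energy measure plotted in Figure 2, whose exact definition is not given here. The function names are hypothetical.

```python
import math

def jaw_model_parameters(mass_kg=0.250, resonance_hz=5.0, damping_ratio=1.0):
    """Return (k, b) for a mass-spring-damper system.

    Assumption: resonance_hz is the undamped natural frequency,
    so k = m * w0**2 and b = 2 * zeta * sqrt(k * m) (zeta = 1 is critical).
    """
    omega0 = 2.0 * math.pi * resonance_hz
    k = mass_kg * omega0 ** 2
    b = 2.0 * damping_ratio * math.sqrt(k * mass_kg)
    return k, b

def driving_force_amplitude(freq_hz, amplitude_m=0.010, mass_kg=0.250,
                            resonance_hz=5.0, damping_ratio=1.0):
    """Peak force needed to impose x(t) = A * sin(w * t) on the system.

    From m*x'' + b*x' + k*x = F(t): |F| = A * sqrt((k - m*w**2)**2 + (b*w)**2).
    """
    k, b = jaw_model_parameters(mass_kg, resonance_hz, damping_ratio)
    w = 2.0 * math.pi * freq_hz
    return amplitude_m * math.sqrt((k - mass_kg * w ** 2) ** 2 + (b * w) ** 2)

k, b = jaw_model_parameters()
print(f"stiffness k = {k:.2f} N/m, damping b = {b:.3f} N*s/m")
for f in (1.0, 5.0, 10.0):
    print(f"{f:4.1f} Hz: peak driving force = {driving_force_amplitude(f):.3f} N")
```

At the natural frequency the inertial and elastic terms cancel, so the driving force only has to overcome damping.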

6. Learning to speak

6.1 Articulatory boot-strapping: ’Easy-way-sounds-OK’.

The work of MacNeilage (1998) draws attention to the prominent role of the

mandible in babbling and early speech. He argues convincingly that speech did not

have to develop a new rhythm generator for the production of syllables. By the

evolutionary process of continuity and tinkering it made conservative use of existing

central pattern generators, namely those already developed for vegetative purposes.

“… speech makes use of the same brainstem pattern generator that ingestive cyclicities do, and … control structures for speech purposes are, in part at least, shared with those of ingestion.” (MacNeilage 1998, p 503)

This helps explain the universal fact that virtually every utterance of every speaker

of every one of the world’s languages exhibits syllabic organization - that is, involves

a mandibular open-close movement. It also sheds light on why, both motorically and

sensorily, the jaw and the area around the mouth opening are particularly salient

regions of the vocal tract (Lindblom & Lubker 1985) and are therefore likely to be

explored early on.

Let us supplement this scenario with a few remarks based on energetics. Suppose

that talking is like walking. In other words, young children vocalizing behave exactly

like subjects walking and running in preferring energetically low-cost movements. If

so, their vocal systems would tend to be activated at the minimum points of the U-

shaped curves of their articulators. To further simplify this view of early vocal

behavior, let us limit the degrees of freedom of the production mechanism to the jaw



because of its vegetative salience. What would the articulatory and acoustic

characteristics of opening and closing the jaw at minimum energy cost be like?

Metaphorically, it would be given by the minimum value of the U-shaped curve and

correspond to an open-close alternation near the jaw’s resonance frequency.

Combining this movement with phonation would produce a quasi-syllabic acoustic

output resembling [bababa]. In other words, least effort applied to the jaw would

produce an utterance not unlike canonical babbling.

MacNeilage is right in making us wonder why open-close alternations should be so

ubiquitous in spoken language. We can restate the facts and interpretations presented

so far: (1) The low-energy articulatory search (start pianissimo!) is limited to only a

fragment of the child’s phonetic space (= mandibular oscillation). (2) It helps the child

spontaneously bump into many articulatory patterns used by the ambient phonology

(= ‘proto-syllables’) by significantly narrowing the alternative possibilities.

Could steps (1) and (2) be generalized and incorporated into a more comprehensive

model of phonetic learning? Do they constitute a general boot-strapping strategy for

discovering native articulatory patterns? An affirmative answer would be possible if it

could be shown that:

(a) The DOF problem for speech is solved in the same way as it is solved for non-speech movements. That would produce a strong statistical bias in favor of low-cost motor patterns.

(b) Many aspects of the world’s phonologies are low-cost motor patterns.

(c) By cultural evolution the world’s phonologies could in principle have developed biologically less optimal motor patterns than they use now, but have done so only to a limited extent.

In our opinion there is a strong probability that all three claims are correct. The

reason is, we suggest, that sound patterns are adapted for phonetic development. Low-

cost motor patterns are retained so as to accommodate the child’s energy-efficient

search by providing ambient reinforcement of the child’s efforts (Davis &



MacNeilage this volume). The phrase ‘easy-way-sounds-OK’ captures the nature of

this boot-strapping. Phonologically organized speech presupposes the specialized

ability of vocal imitation (Studdert-Kennedy 1998, this volume). The present account

suggests that imitating is supplemented in important ways by mechanisms of motor

emergence. As articulations are fortuitously discovered the ‘easy way’ and confirmed

by the ambient input, perceptuo-motor links get established to budding perceptual

categories.

6.2 Where do phonological units come from?

The preceding discussion has concentrated on substantive aspects. In this final

section we address the possibility of behavioral origins (as opposed to the pre-

specification) of a formal universal of linguistic structure, e.g. the combinatorial

coding of discrete units.

To address this topic, we will describe a game based on a simple algorithm that

automatically analyzes holistic patterns into smaller elements and then re-uses those

elements. The phenomenon of re-use implies combinatorial organization. In keeping

with the spirit of the proposed phonetics/phonology program, the point is that the

derived units are emergent consequences of system growth and that they do not come

pre-specified. We suggest that this mechanism is formally similar to what goes on in

lexical development.

Phonetically the holistic patterns can be pictured as articulatory and/or auditory

patterns. The segmentation into smaller elements defines the ‘units’. Re-use of those

units is promoted by the fact that memory storage is associated with a biochemical

cost. This cost is hypothesized to derive from the energy metabolism of memory

formation (Gonzales-Lima 1992) and is an increasing function of the novelty of the



stored materials. Since novelty is expensive, holistic coding is disfavored whereas

parts-based re-use is not.

At this point a short summary is needed of some simplified neurobiological facts

about how memories are encoded. Learning causes the brain to change physically.

This change is activity-dependent. Active neural tissue contains more energy-rich

substances. Hence, learning costs metabolic energy. Such conclusions have been

drawn from histochemical analyses of brain tissue. Cytochrome oxidase is a substance

used as a marker of metabolic capacity. The mitochondrial amount of this enzyme is

assumed to reflect the functional level of activity in the neuron. More active neurons

have more cytochrome oxidase and more active regions within a neuron have more

mitochondria (Wong-Riley 1989).

Gonzales-Lima (1992) reports experiments in which rats were trained to associate

reward with an auditory stimulus (FM signal 1-2 kHz). After training for eleven days

the brains of experimental and control animals were examined for cytochrome

oxidase contents in their auditory neostriatum. The experimental group showed

significantly increased amounts of cytochrome oxidase. The proposed interpretation is

that the memory of the conditioning stimulus changes the neurons activated by the

task. This change takes the form of an increase in their metabolic capacity. Fuel

(’potential energy’) is available should a demand arise for their activation (e.g.,

recall). This is reminiscent of other familiar examples of activity-dependent change,

e.g., callused hands and bigger muscles.

These results suggest that a principle of ‘minimal incremental storage’ may be

embodied in the neural metabolism of memory formation. If so, it would mean that

patterns containing more information (more ’bits’ in the information theory sense) are

energetically more costly, and therefore take longer, to commit to memory. Here



we do not confidently claim that this is the process underlying phonetic learning. Our

objective is rather to demonstrate that a formal-looking property such as

combinatorial coding could in principle readily arise for functional reasons. The mere

possibility of such an account should make us wary of ‘inescapable’ conclusions

about arbitrary formal idiosyncrasies.

It is of course important that an account of emerging structure be completely non-

teleological. To say that children acquire large vocabularies because of the advantages

of combinatorial coding is to make a teleological argument. What has to be argued

instead is that combinatorial coding comes into existence owing to the fortuitous

coincidence of several factors. Once that happens, that mode of organization is

reinforced by its functional advantages. We hypothesize that one of the causal factors

behind the ability to code up to 100,000 words or more (Miller 1977) is a metabolic

constraint on memory formation.

6.3 The nepotism game: ‘close relatives get promoted’.

Imagine a 10-by-10 matrix with 100 cells. The point of the game is to choose a

sequence of n points located in the matrix so that a ‘cost criterion’ is minimized. We

consider two alternative definitions of ‘cost’:

(1) For every new cell we pay 1 unit!

(2) For every new coordinate specification (row or column) we pay 0.5 units!

A single item costs one unit on either measure. For the first criterion, the cost is

equal to n units regardless of the cells selected. In the case of rule (2), costs can be cut

by selecting a cell in a previously activated row and/or column. As n (system size)

increases, numerous opportunities for re-use arise. Figure 3 below shows a situation

with six points sequentially chosen according to the second measure. Selected cells

are marked in black. When a choice is made, the other cells of that row and column



become available at half price (0.5 units). This is indicated by the shading. Zero cost

is associated with cells at intersections of already committed rows and columns. The

example in Figure 3 costs 6 units when we pay per cell (first measure), but only 2.5

units when selections are priced by coordinate specifications as in (2).

Figure 3. Selecting a sequence of n matrix cells in accordance with a cost criterion.

We conclude that rule (1) corresponds to Gestalt coding and that, in conjunction

with cost minimization, rule (2) forces the system to go combinatorial.
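The two cost definitions of the game can be made concrete with a short sketch. The code below is an illustration only: the function names and the particular cell selections are hypothetical, and the six-cell example is merely one arrangement (two rows by three columns) that yields the totals quoted in the text, not necessarily the one shown in the figure.

```python
def cost_per_cell(cells):
    """Rule (1): every new cell costs 1 unit."""
    return float(len(set(cells)))

def cost_per_coordinate(cells):
    """Rule (2): every new row or column specification costs 0.5 units."""
    seen_rows, seen_cols = set(), set()
    total = 0.0
    for row, col in cells:
        if row not in seen_rows:      # row not yet committed: pay 0.5
            seen_rows.add(row)
            total += 0.5
        if col not in seen_cols:      # column not yet committed: pay 0.5
            seen_cols.add(col)
            total += 0.5
    return total

# Hypothetical selection re-using 2 rows and 3 columns of the 10-by-10 matrix.
reused = [(2, 1), (2, 4), (2, 7), (6, 1), (6, 4), (6, 7)]
# Hypothetical selection with no shared rows or columns (holistic coding).
scattered = [(i, i) for i in range(6)]

print(cost_per_cell(reused), cost_per_coordinate(reused))        # 6.0 2.5
print(cost_per_cell(scattered), cost_per_coordinate(scattered))  # 6.0 6.0
```

Under rule (2) the scattered selection is as expensive as under rule (1), whereas the selection that shares rows and columns costs 2.5 units. A single cell costs one unit on either measure.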

6.4 Self-segmentation and the emergence of articulatory ‘re-use’.

To explore what this exercise might tell us about speech, let us interpret the matrix

as a crude articulatory space and replace rows and columns by continuous parameters,

say the phase and amplitude of elementary oscillatory movement. Along a third

dimension we specify the articulator performing the movement. A given point in this

3-D space represents a Gestalt motor score.

Further suppose that a given child consistently uses forms sounding like [��,

�� and ��. In the articulatory space these forms are represented by three

points whose coordinates specify the movement parameters: e.g., three amplitude

values for the open-close movement of the jaw, two positions (front and back) for the



rest/target alternation of the tongue etc. In standard notation (but without implying

any segmental organization), the jaw-tongue parameters form the following matrix:

tongue positions

_�_�

jaw openings _�_�

_�_�

Each of these specifications is linked to its own type of anatomically distinct

oscillatory closure movement: d_d_, m_m_, and b_b_.

The nepotism principle (NEP) literally states that a re-combination of all these

hidden “component” movements is favored by the memory constraint. If NEP were

consistently and mechanically implemented, it would yield the following additional

potential re-use patterns for jaw-tongue movement:

tongue positions

_�_�

jaw openings _�_�

_�_�

Moreover, it would put a number of forms in a state of ‘readiness’, e.g., [��,

[��, [��, [��, [��, ��, ��, ��, ��, ��,

��, ��, ��, ��, ��. Again no segmental organization is implied.

How does this re-use come about? How are the “component movements” identified?

The quotation marks around “component” are important, since so far we have little

reason to treat phonetic forms as anything but Gestalts.
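A mechanical application of NEP amounts to taking every combination of the parameter values already present in the stored forms. The sketch below illustrates this under loudly stated assumptions: the three starting forms, and the labels for jaw opening and tongue position, are hypothetical placeholders chosen only to match the counts given in the text (three closure types, three jaw amplitudes, two tongue positions). No segmental organization is implied by the tuples.

```python
from itertools import product

# Hypothetical Gestalt forms: (closure articulator, jaw opening, tongue position).
attested = {
    ("d", "small", "front"),
    ("m", "medium", "back"),
    ("b", "large", "front"),
}

def recombine(forms):
    """Return every combination of the parameter values found in `forms`."""
    closures = sorted({f[0] for f in forms})
    jaw_openings = sorted({f[1] for f in forms})
    tongue_positions = sorted({f[2] for f in forms})
    return set(product(closures, jaw_openings, tongue_positions))

all_forms = recombine(attested)
ready = all_forms - attested   # forms put in a state of 'readiness'

print(len(all_forms), len(ready))   # 18 15
for form in sorted(ready):
    print(form)
```

With these placeholder values the full recombination has 3 × 3 × 2 = 18 members, of which 15 are not among the three starting forms.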



As a first step, we note that the vocal tract consists of several independently

controllable structures. In other words, although early vocalizations do not arise from

phoneme-like control signals, the system producing them is in fact anatomically

‘segmented’.

Second, we observe that, in many cases, neural representations are somatotopically

organized (Kandel et al 1991), which means that the brain stores individual

motor and sensory activities in specific locations with anatomical identity preserved

(cf notion of homunculus). Both of these circumstances play a crucial role in the

proposed self-segmentation process.

Faced with the task of producing ambient forms not yet acquired, the child must

solve the problem of assembling new motor programs. NEP predicts that the speed

and accuracy of imitation, spontaneous use and recall will depend significantly on

whether or not the new form shares “component” movements with old forms.

Assembling a new motor score is assisted by overlap with previously encoded

patterns even if those patterns are part of unanalyzed wholes and have not yet been

‘defined’ as separate motor entities. For developmental and typological evidence

supporting this suggestion see review in Lindblom (1998).

We propose that, in part, the NEP bias makes the child engage in spontaneous

articulatory re-use and that, in part, the native language favors forms that match the output of

NEP. Learners can thus use NEP to find ‘hidden’ structure. Behavioral conditions

make certain patterns more functional than others. Languages are molded by those

functional constraints. They adapt to them incorporating fossils of naturalness in their

architecture and by so doing they become more learnable and easier to use.



7. Summary

How do children find the ‘hidden’ structure of speech? This question presupposes

that ‘structure’ is something disembodied. In other words, it is seen as embedded in an

incomplete, degraded, noisy and infinitely variable signal. That is the traditional, but,

in our view, not necessarily correct view. Instead the following approach is

advocated.

Phonetic variations are far from random. They are patterned in principled ways

because of perceptual distinctiveness, articulatory dynamics and VT acoustics (Fant

1960, Stevens 1998). A cumulatively growing, exemplar-based phonetic memory

should go a long way towards revealing that patterning to the child. In such a model

‘categories’ do not resemble the neat, operationally defined units of classical

phonemic analysis, since their correlates are likely to be strongly contextually

embedded, in a sense ‘hidden’. However, over time, variability would get sorted and

disambiguated by context and by the cues providing semantic and situational labeling.

‘Mapping simple, representation complex!’

One source of information for perceptual labeling is articulatory. Research on non-

speech offers the phonetician valuable clues as to how motor processes operate. The

role of metabolic cost in solving the DOF problem is a case in point. We have made

the parsimonious assumption that speech movements are organized like other

movements. Therefore energetics should be relevant. From that conclusion we were

led to propose a two-part hypothesis: Easy-way-sounds-OK! It says (1) that children

initially explore their vocal resources in an energetically low-cost mode and (2) that

sound patterns have adapted to reward that behavior. This is a kind of ‘conspiracy’ that

makes children stumble on motorically motivated phenomena in the ambient language

such as syllabic organization. It also establishes motor links to perceptual forms

(together with imitation).



A related scenario was sketched for the development of the phonemically coded

lexicon. We suggested that a linguistic system with featural and phonemic

recombination humors learners whose memories charge a metabolic fee for storage. If

that fee increases with the number of bits (amount of information) to be stored, it

follows that patterns that do not share materials (Gestalts) are costly, whereas patterns

with overlap are cheaper. Somatotopic organization and VT anatomy were found to

impose an unsupervised segmentation of this overlap into articulator-specific

parameters. This is the process that leads the child to the ‘phonetic gesture’ (Studdert-

Kennedy this volume, Carré this volume). Metabolically controlled re-use is thus

launched and paves the way for cognitively driven and combinatorial vocabulary

growth. These considerations favor the view that phonemic coding is an adaptive

emergent rather than a formal idiosyncrasy of our genetic endowment for Language.

Emergent phonology is proposed to promote a new vision of the relationship

between phonetics and phonology. By substituting it for the traditional division of

labor, we would get away from Chomsky’s ‘inescapable dogma’.

The distinctions between form/substance and competence/performance should be

abandoned having served their historical purpose. There is no split between phonetics

and phonology because, from the developmental point of view, phonology remains

behavior. Phonology differs qualitatively from phonetics in that it represents a new,

more complex and higher level of organization of that behavior. For the child,

phonology is not abstract. Its foundation is an emergent patterning of phonetic

content. The starting point is the behavior. ‘Structure’ unfolds from it. Therefore the

issue of ‘psychological reality’ does not arise. Similarly, explanations need not be

limited to post-hoc experimental justifications for postulated formal phenomena but



are integrated into the theory’s predictions. Behavioral realism and explanatory

adequacy are given free rein.



8. References

Anderson S R (1981): "Why phonology isn't 'natural'", Linguistic Inquiry 12:493-

539.

Anderson S R (1985): Phonology in the twentieth century, Chicago:Chicago

University Press.

Anderson F C & Pandy M G (1999): “A dynamic optimization solution for vertical

jumping in three dimensions”, Computer Methods in Biomechanics and

Biomedical Engineering, 1-31.

Carré R (2000): "", Phonetica, this volume.

Chomsky N (1964): "Current trends in linguistic theory" 50-118 in Fodor J A & Katz

J J (eds): The structure of language, New York:Prentice-Hall.

Chomsky N & Halle M (1968): The sound pattern of English, New

York:Harper&Row.

Davis B L & Lindblom B (1994). “Prototype formation in speech development and

phonetic variability in Baby Talk”, in Lacerda F, von Hofsten C & Heiman M

(in press): Emerging Cognitive Abilities in Early Infancy, LEA:Hillsdale, NJ.

Diehl R L and Lindblom B (in press): “Explaining the structure of feature and

phoneme inventories”, chapter to appear in Greenberg S and Ainsworth W

(eds): Speech Processing in the Auditory System, Springer Handbook of

Auditory Research (SHAR).

Estes W K (1993): “Concepts, categories, and psychological science”, Psychological

Science 4: 143-153.

Fernald A (1984): “The perceptual and affective salience of mothers' speech to

infants”, 5-29 in Feagans L, Garvey C & Golinkoff R (eds): The origins and

growth of communication, New Brunswick:Ablex.

Fant G (1960): The acoustic theory of speech production, The Hague:Mouton.



Fischer-Jørgensen E (1975): Trends in phonological theory: A historical introduction,

Copenhagen:Akademisk forlag.

Fónagy I (1983): La Vive Voix, Paris:Payot.

Fowler C A (1986): "An event approach to the study of speech perception from a

direct-realist perspective", Journal of Phonetics 14(1): 3-28.

Fowler C A (1994): "Speech perception: Direct realist theory", 4199-4203 in Asher R

E (ed): Encyclopedia of Language and Linguistics, Pergamon:New York.

Fromkin V (1973): Speech errors as linguistic evidence, The Hague:Mouton.

Gay T, Lindblom B & Lubker J (1981): "Production of bite-block vowels: Acoustic

equivalence by selective compensation", J Acoust Soc Am 69:802-810.

Gonzales-Lima F (1992): “Brain imaging of auditory learning functions in rats:

Studies with fluorodeoxyglucose autoradiography and cytochrome oxidase

histochemistry”, 39-109 in Gonzales-Lima F, Finkenstädt T & Scheich H

(eds): Advances in metabolic mapping techniques for brain imaging of

behavioral and learning functions, NATO ASI Series D:68, Dordrecht:

Kluwer.

Halle M (1964): “On the bases of phonology”, 604-612 in Fodor J A & Katz J J (eds):

The structure of language, New York:Prentice-Hall.

Halle M & Stevens K N (1979): "Some reflections on the theoretical bases of

phonetics", 335-353 in Lindblom B & Öhman S (eds): Frontiers of speech

communication research, London:Academic Press.

Halle M & Stevens K N (1991): "Knowledge of language and the sounds of speech",

1-19 in Sundberg J, Nord L & Carlson R (eds): Music, language, speech and

brain, Houndmills, Basingstoke, England:MacMillan.



Hinton G E & Sejnowski T J (1999): Unsupervised learning: Foundations of neural

computation, MIT Press:Cambridge, MA.

Hoyt D F & Taylor C R (1981): “Gait and the energetics of locomotion in horses”,

Nature 292: 239.

Jakobson R & Halle M (1968): "Phonology in relation to phonetics", 411-449 in

Malmberg B (ed): Manual of phonetics, Amsterdam:North-Holland.

Jakobson R, Fant G & Halle M (1952/69): Preliminaries to speech analysis,

Cambridge, MA: MIT Press.

Johnson K & Mullenix J (1997): “Complex representations used in speech processing:

Overview of the book”, 1-8 in Johnson K & Mullenix J (eds): Talker

variability in speech processing, Academic Press.

Johnson K (2000): “Adaptive dispersion in vowel perception”, Phonetica, this

volume.

Kandel E, Schwartz J & Jessel T (1991): Principles of neural science (3rd edition),

New York:Elsevier.

Kluender K R, Diehl R & Killeen P (1987): "Japanese quail can learn phonetic

categories", Science 237: 1195-1197.

Kohler K (1990): "Segmental reduction in connected speech in German: Phonological

facts and phonetic explanations", 69-92 in Hardcastle W J & Marchal A (eds):

Speech production and speech modeling, Dordrecht:Kluwer.

Kohler K (2000): "Investigating unscripted speech: Implications for phonetics and

phonology", Phonetica, this volume.

Kuhl P K, Andruski J E, Chistovich I A, Chistovich L A, Koshevnikova E V, Ryskina

V L, Stolyarova E I, Sundberg U & Lacerda F (1997): “Cross-language



analysis of phonetic units in language addressed to infants”, Science 277: 684-

686.

Lacerda F (1995): “The perceptual magnet effect: An emergent consequence of

exemplar-based phonetic memory”, Proceedings ICPhS Stockholm, vol 2,

140-147.

Ladefoged P (1984): "’Out of chaos comes order’: Physical, biological and structural

patterns in phonetics", Van den Broecke M P R & Cohen A (eds):

Proceedings of the Xth International Congress of Phonetic Sciences, Vol IIB,

83-95.

Ladefoged P (1990): “Some reflections on the IPA”, Journal of Phonetics 18:335-

346.

Lee D D & Seung H S (1999): “Learning the parts of objects by non-negative matrix

factorization”, Nature 401, 788-791.

Liberman A & Mattingly I (1985): "The motor theory of speech perception revised,"

Cognition 21: 1-36.

Lindblom B (1980): "The goals of phonetics, its unification and application",

Phonetica 37:7-26.

Lindblom B (1983): “Economy of speech gestures”, in P F MacNeilage (ed): Speech

Production, 217-246, Springer Verlag:New York.

Lindblom B (1990): "Explaining phonetic variation: A sketch of the H&H theory",

403-439 in Hardcastle W & Marchal A (eds): Speech Production and Speech

Modeling, Dordrecht:Kluwer.

Lindblom B (1996): “Role of articulation in speech perception: Clues from

production”, J Acoust Soc Am 99(3):1683-1692.



Lindblom B (1998): “Systemic constraints and adaptive change in the formation of

sound structure”, 242-264 in Hurford J R, Studdert-Kennedy M & Knight C

(eds): Approaches to the evolution of language, Cambridge:CUP.

Lindblom B & Sundberg J (1971): "Acoustical consequences of lip, tongue, jaw and

larynx movement", J Acoust Soc Am 50:1166-1179.

Lindblom B, Lubker J & Gay T (1979): "Formant frequencies of some fixed-mandible

vowels and a model of speech programming by predictive simulation", J

Phonetics 7: 147-162.

Lindblom B & Lubker J (1985): “The speech homunculus and a problem of phonetic

linguistics”, 169-192 in Fromkin V A (ed): Phonetic Linguistics, Academic

Press: London.

Lindblom B, Guion S, Hura S, Moon S-J & Willerman R (1995): "Is sound change

adaptive?", Rivista di Linguistica 7.1, 5-37.

Lindblom B, Davis J, Brownlee S, Moon S-J & Simpson Z (1999): “Energetics in

phonetics and phonology”, to appear in Fujimura O et al (eds): Linguistics and

Phonetics, Ohio State University.

Logothetis N K & Sheinberg D J (1996): “Visual object recognition”, Annu Rev

Neurosci 19, 577-621.

Maeda S (1991): "On articulatory and acoustic variabilities", J of Phonetics 19:321-

331.

Malmberg B (1968): “The linguistic bases of phonetics”, Manual of phonetics,

Amsterdam:North-Holland.

McNeill Alexander R (1992): The Human Machine, New York:Columbia University

Press.



MacNeilage P F (1998): “The frame/content theory of evolution of speech

production”, Behavioral and Brain Sciences 21, 499-546.

MacNeilage P F & Davis B L (2000): “Deriving speech from non-speech: A view

from ontogeny”, Phonetica, this volume.

McArdle W D, Katch F I & Katch V L (1996): Exercise physiology, 4th ed,

Baltimore:Williams&Wilkins.

Mel B W (1999): “Think positive to find parts”, Nature 401, 759-760.

Miller G A (1977): Spontaneous apprentices, Seabury Press: New York.

Ohala J J (1990): “The phonetics and phonology of aspects of assimilation”, 258-275

in Kingston J & Beckman M (eds): Papers in laboratory phonology: Vol 1.

Between grammar and the physics of speech, Cambridge:CUP.

Pandy M (2000): "", Phonetica, this volume.

Perkell J & Klatt D (1986). Invariance and variability of speech processes,

LEA:Hillsdale, NJ.

Ralston H J (1976): “Energetics of human walking”, 77-98 in Herman R M, Grillner

S, Stein P S G & Stuart D G (eds): Neural control of locomotion, Plenum

Press:New York.

de Saussure F (1916): Cours de linguistique générale, Paris:Payot.

Stevens K N (1998): Acoustic phonetics, Cambridge:M.I.T. Press.

Studdert-Kennedy M (1998): “Evolutionary implications of the particulate principle:

Imitation and the dissociation of phonetic form from semantic function”, in

Knight C, Studdert-Kennedy M & Hurford J R (eds): The emergence of

language: Social function and the origins of linguistic form, Cambridge:CUP.

Studdert-Kennedy M (2000): "Imitation and the emergence of segments", Phonetica,

this volume.



Sundberg J (2000): "Emotive transforms", Phonetica, this volume.

Sundberg U (1998): Mother tongue – Phonetic aspects of infant-directed speech, Ph

D dissertation, Stockholm University.

Sweet H (1877): Handbook of phonetics, Oxford:Henry Frowde.

Young S J & Woodland P C (1993): “The use of tying in continuous speech

recognition”, Proc Eurospeech 93, 2203-2206.

Wachsmuth E, Oram M W & Perrett D J (1994): “Recognition of objects and their

components: responses of single units in the temporal cortex of the macaque”,

Cereb Cortex 4, 509-522.

Wong-Riley M T T (1989): “Cytochrome oxidase: An endogenous metabolic marker

for neuronal activity”, Trends Neurosci 12(3):94-101.

9. Acknowledgements

This research is supported by grant number BCS-9901021 from the National

Science Foundation, Washington, D.C.