Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
A model of word and sentence intonation
Öhman, S.
journal: STL-QPSR; volume: 9; number: 2-3; year: 1968; pages: 006-011
http://www.speech.kth.se/qpsr
B. A MODEL OF WORD AND SENTENCE INTONATION* S.E.G. Öhman
Many investigators have suggested that the fundamental frequency contours of utterances can be divided into two components, namely, word intonation and sentence intonation, approximately as shown in Fig. I-B-1.
The rightmost box is supposed to generate the fundamental frequency contour, f0(t), as a result of three major factors, namely, the vocal cord tension (here denoted by g(t)), articulatory interactions (indicated above the box), and acoustic interactions (indicated below the box).
The articulatory interactions result from the secondary effects on the vibrations of the vocal cords that are associated with the production of high vowels, voiceless consonants, and glottal stops. The acoustic interactions result from the secondary fluctuations in the supra- and subglottal pressures that are due to the varying degrees of closure of the mouth. The vocal cord tension, g(t), is supposed to be the sum of two signals, namely, a sentence intonation component, gs(t), and a word intonation component, gw(t), so that g(t) = gs(t) + gw(t).
In languages that have word stress, such as English and German, the box labelled "Word intonation filter" may be assumed to generate the appropriate word stress pitch pattern, and the box labelled "Sentence intonation filter" to generate the slow phrase contour on which the stress fluctuations are superimposed. In what follows, however, I shall assume that the word tones of the Scandinavian languages are controlled by the lower box in Fig. I-B-1 and that both the basic phrase contours and the superimposed stress inflections are produced by the upper box. In a more general model it is really necessary to have three inputs: one for the basic phrase contour, one for the stress inflection, and one for the word tones. Since my purpose here is to discuss the Scandinavian word tones only, I shall make the simplification just mentioned.
* Verbatim version of paper presented at the 6th ICA, Tokyo 1968.
[Fig. I-B-1. Functional model of larynx control. Blocks: sentence intonation inputs and sentence intonation filter; word intonation inputs and word intonation filter; larynx model. Side inputs to the larynx model: articulatory interaction signal (above) and acoustic interaction signal (below).]
Qualitative analysis of a large number of pitch patterns of real utterances suggests that the fundamental frequency signal is the response of a relatively sluggish system to a sequence of relatively simple level-changing commands, as indicated in the leftmost part of Fig. I-B-1. The neural signals that reach the laryngeal muscles are not entirely of this simple nature, but the step function model has considerable analytical advantages and seems to be quite satisfactory for the purposes of a functional description.
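One way to picture the step-function input is as a list of level changes, each with an onset time, whose running sum gives a piecewise-constant command signal. The Python sketch below is only an illustration of that reading: the `(onset, change)` pair representation, the helper name `step_input`, and the example timings are hypothetical and are not taken from the paper.

```python
def step_input(commands, t):
    """Value at time t of a piecewise-constant command signal.

    `commands` is a list of (onset_s, level_change) pairs; the signal is the running
    sum of every level change whose onset is at or before t.
    """
    return sum(delta for onset, delta in commands if t >= onset)


# Hypothetical command sequence: raise the level by 1.0 at 0.10 s, lower it by 1.0 at 0.60 s.
commands = [(0.10, 1.0), (0.60, -1.0)]
samples = [step_input(commands, k * 0.05) for k in range(16)]  # 0.00 ... 0.75 s
```

The signal holds its level between commands, which is what makes a handful of steps enough to describe a whole phrase.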
It may in fact be assumed that each of the control branches of our model has the simple characteristics summarized in Fig. I-B-2. That is, the command generators smooth the step function inputs just like a critically damped third-order linear filter, and, moreover, the analogous vocal-cord tension is exponentially related to the fundamental frequency output. Unfortunately, I don't have time now to go into the experimental data on which these particular assumptions are based.
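A critically damped third-order linear filter can be taken here as H(s) = α³/(s + α)³ (three coincident real poles), whose unit-step response is 1 - exp(-αt)(1 + αt + (αt)²/2). The sketch below evaluates that closed form and, because the filter is linear, gets the response to a step sequence by superposition. The exponential link is assumed to take the form f0 = F·exp(g) with a hypothetical base frequency F; the paper states only that the relation is exponential, and the function names and numbers are illustrative.

```python
import math


def step_response(t, alpha):
    """Unit-step response of the critically damped third-order filter alpha**3 / (s + alpha)**3.

    Closed form 1 - exp(-x) * (1 + x + x**2 / 2) with x = alpha * t; zero before the step.
    """
    if t <= 0.0:
        return 0.0
    x = alpha * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def filtered_commands(commands, t, alpha):
    """Filter output for a list of (onset_s, level_change) steps, by superposition."""
    return sum(delta * step_response(t - onset, alpha) for onset, delta in commands)


def f0_from_tension(g, f_base=100.0):
    """ASSUMED exponential mapping f0 = f_base * exp(g); f_base (Hz) is hypothetical."""
    return f_base * math.exp(g)


# Usage with hypothetical values: rise of 0.4 at 0.1 s, fall of 0.6 at 0.6 s, alpha = 20 per s.
commands = [(0.1, 0.4), (0.6, -0.6)]
contour = [f0_from_tension(filtered_commands(commands, k * 0.01, alpha=20.0)) for k in range(100)]
```

The response starts with zero slope and curvature, which is what gives the third-order filter its sluggish, S-shaped onset compared with a first-order smoother.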
The model developed so far can be criticized on the grounds that most functions of relevance to speech production can be synthesized by means of smoothed step function sequences. It is clear, moreover, that only a small number of steps are necessary in order to reproduce the Scandinavian accents. This is illustrated in Fig. I-B-3. Here we see the pitch contour of the utterance [sɛja mànnen ijɛn] as produced by a speaker of the Stockholm dialect. The word [mànnen] has the grave tone or accent. With the acute accent it would sound [mánnen]. No Scandinavian dialect has more than two tones.
The filled circles represent the period-by-period pitch measurements, and the solid line is the smoothed response of the model to the step function input shown at the bottom. The vertical lines indicate acoustic segment boundaries.
At the beginning, the curve shows a small inflection, which is probably due to the unstressed and hence reduced grave accent of the frame word [sɛja]. Then follows the falling-rising pitch pattern typical of the unreduced grave accent. The falling end contour of the phrase is visible in the rightmost part of the graph.
[Fig. I-B-2. Functional model of laryngeal control in intonation: detailed specification of one of the channels shown in Fig. I-B-1 (stages: neuro-motor command, vocal cord "tension", fundamental frequency).]
Although the synthesized curve matches the data well and only a few commands are used for the synthesis, the step function input is not particularly revealing from the phonetic point of view. All we learn, essentially, is that it is all right to set the filter constants in such a way that the f0 output reaches 90% of the target pitch level in about 250 msec in response to a single step input. This can be seen more clearly in Fig. I-B-4, which shows an utterance having the other Stockholm tonal accent, the acute one. It sounds like [sɛja mánnen ijɛn].
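The 90%-in-250-msec figure can be turned into a filter constant. The sketch below assumes the filter is α³/(s + α)³, with unit-step response 1 - exp(-αt)(1 + αt + (αt)²/2), and applies the 90% criterion to the filter output rather than to f0 in Hz (an assumption, since an exponential mapping to f0 makes the two differ). A bisection then finds α; the helper name `alpha_for_rise` is hypothetical.

```python
import math


def step_response(t, alpha):
    """Unit-step response of the assumed filter alpha**3 / (s + alpha)**3 (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    x = alpha * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def alpha_for_rise(fraction=0.9, rise_time=0.25, lo=1e-3, hi=1e3, iters=200):
    """Bisection for the alpha at which the step response reaches `fraction` at `rise_time`.

    At a fixed t > 0 the response grows monotonically with alpha, so the root is unique.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if step_response(rise_time, mid) < fraction:
            lo = mid  # too slow: the root lies at a larger alpha
        else:
            hi = mid  # fast enough: the root lies at a smaller alpha
    return 0.5 * (lo + hi)


alpha = alpha_for_rise()
print(f"alpha = {alpha:.2f} 1/s, time constant = {1000.0 / alpha:.0f} ms")
```

Under these assumptions the solver returns α of roughly 21 s⁻¹, i.e. a time constant 1/α of about 47 ms; the 90% point falls at αt ≈ 5.3.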
The step function input is much simpler here than in the previous case, but on the other hand, the model response does not reproduce the data equally well. Note, in particular, the small pitch deflection during the [m], which is very systematic and cannot be explained by articulatory or acoustic influence from the consonant. It is of course possible to make up for this mismatch by adding a few more steps at the input, but this would only reduce the possibility of interpreting the input pattern in phonetic terms. Evidently, the present model allows us too much freedom, and we need some more general principle by which the library of admissible inputs can be defined and constrained so that ad hoc introductions of step commands can be avoided. The aim of the work reported on in this paper was to look for constraining criteria of this sort among the Scandinavian accent patterns.
Fig. I-B-5 gives a summary of the acute and grave pitch contours of one hundred Scandinavian dialects as measured by the German phonetician E. A. Meyer several decades ago. Each subgraph presents, from left to right: the name of the dialect, the acute pattern, and the grave pattern. As you can see, very many varieties exist. In Danish, for instance, the acute accent is a glottal stop in the middle of the vowel, and on the island of Gotland, as well as in several Dalarna dialects, the two accents, though completely distinct to the native speakers, are closely similar. They may sound like: acute accent [sɛja mánnen ijɛn] versus grave [sɛja mànnen ijɛn].
[Fig. I-B-5. Schematic acute and grave accent patterns of a hundred Scandinavian dialects according to E. A. Meyer: Die Intonation im Schwedischen, part II. Vertical axis: fundamental frequency (cps).]
Now, to get on with the analysis, the following assumption is introduced. For every stressed word, a single step of positive amplitude is entered at the input of the sentence intonation filter. This step is made to start at the beginning of the first consonant of the stressed syllable (see Fig. I-B-6).
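The stress rule just stated maps directly onto a list of step commands. In this hypothetical sketch each word carries the onset time of the first consonant of its stressed syllable, or `None` when it is unstressed; the word list, the 0.21 s timing, the amplitude, and the helper name `stress_steps` are invented for illustration.

```python
def stress_steps(words, amplitude):
    """One positive sentence-intonation step per stressed word.

    `words` holds (label, onset_s) pairs, where onset_s is the start of the first consonant
    of the word's stressed syllable, or None for an unstressed word.
    Returns (onset_s, amplitude) command pairs for the sentence intonation filter.
    """
    return [(onset, amplitude) for _label, onset in words if onset is not None]


# Hypothetical segmentation of the frame [dɛ va mó:nen ja sa], with only the test word stressed.
words = [("de", None), ("va", None), ("monen", 0.21), ("ja", None), ("sa", None)]
commands = stress_steps(words, amplitude=0.3)
```

Because the onset is tied to a segment boundary, the step timing is no longer a free parameter, which is the kind of constraint the preceding paragraphs ask for.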
The consequences of this assumption are seen in the left part of Fig. I-B-6. The wriggly curves are measured data, and the smooth curves are calculated model responses. The upper display represents the acute word [mó:nen] embedded in the sentence frame [dɛ va mó:nen ja sa], and the lower display corresponds to the grave word [mò:nen] in [dɛ va mò:nen ja sa]. For technical reasons the pitch curve goes to zero during the voiceless consonant [s], where, in fact, it is undefined.
A positive-going sentence intonation step representing the stress, according to our previous assumption, is introduced at the beginning of the [m], and a negative step representing the end contour is introduced later in the sentence, both in the acute and in the grave cases. The curves marked with the letter E show the difference between the calculated and the measured contours.
Note that the error curves have a negative dip in both the acute and the grave cases. If our assumption that stress is a positive sentence intonation step is correct, then these dips must be the word intonation components of the two patterns, since they represent the residue after elimination of the sentence intonation component.
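The residue argument can be checked on synthetic data. The sketch below works in the tension domain, where the two components add, and assumes both filters have the form k³/(s + k)³. It builds a "measured" contour from a sentence component (stress step plus negative end step) and a negative word pulse, subtracts the sentence-only model response, and locates the dip of the residue. All parameter values and function names are hypothetical, and E is taken as measured minus calculated so that a dip below the model comes out negative.

```python
import math


def step_response(t, rate):
    """Unit-step response of the assumed filter rate**3 / (s + rate)**3 (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def sentence_component(t, alpha, A, t_stress, C, t_end):
    """Positive stress step A at t_stress and negative end-contour step -C at t_end."""
    return A * step_response(t - t_stress, alpha) - C * step_response(t - t_end, alpha)


def word_component(t, beta, B, t2, D):
    """Negative pulse of depth B and duration D starting at t2, through the word filter."""
    return -B * (step_response(t - t2, beta) - step_response(t - t2 - D, beta))


# Hypothetical parameter values (tension units and seconds).
sent = dict(alpha=20.0, A=0.4, t_stress=0.20, C=0.6, t_end=0.80)
word = dict(beta=30.0, B=0.3, t2=0.35, D=0.10)

times = [k * 0.005 for k in range(241)]  # 0 to 1.2 s
measured = [sentence_component(t, **sent) + word_component(t, **word) for t in times]
calculated = [sentence_component(t, **sent) for t in times]
error = [m - c for m, c in zip(measured, calculated)]  # E = measured - calculated
dip_index = min(range(len(error)), key=lambda i: error[i])
dip_time = times[dip_index]
```

In this idealized case the residue equals the word component exactly, with its minimum about 0.13 s after the pulse onset. With real data the residue would also carry measurement noise and the articulatory and acoustic interaction effects.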
The right part of the figure shows the result of introducing an appropriately shaped negative pulse at the input of the word intonation filter. Almost perfect matches are obtained if this pulse is made to occur early in the acute case and late in the grave case. From the point of view of the present descriptive model, these negative pulses are the Stockholm tonal accents.
The proposal shown in Fig. I-B-7 therefore suggests itself: the Stockholm tonal accents can be synthesized with a sentence intonation step and a word intonation pulse that are coarticulated in the appropriate manner. The parameters of this model are the amplitude of the step, marked A; the depth of the pulse, marked B; the duration of the pulse, marked D; and the timing of the pulse, marked t2. Furthermore, the possibly different time constants of the sentence intonation filter and the word intonation filter, to be denoted by α and β, respectively, are also of relevance.
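Collecting the parameters just listed, the tension for one accented word can be sketched as g(t) = A·u_α(t - t1) - B·[u_β(t - t2) - u_β(t - t2 - D)], where u_k is the unit-step response of the corresponding filter and t1 is the stress onset. This form, the filter shape k³/(s + k)³, the exponential mapping to f0 with a hypothetical base frequency, and every numeric value below are assumptions for illustration, not values reported in the paper. The acute and grave settings differ only in the pulse timing t2.

```python
import math


def step_response(t, rate):
    """Unit-step response of the assumed filter rate**3 / (s + rate)**3 (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def accent_tension(t, A, B, D, t2, alpha, beta, t1=0.0):
    """g(t) = g_s(t) + g_w(t): stress step A at t1 (sentence filter, alpha) plus a
    negative word pulse of depth B, duration D, onset t2 (word filter, beta)."""
    g_s = A * step_response(t - t1, alpha)
    g_w = -B * (step_response(t - t2, beta) - step_response(t - t2 - D, beta))
    return g_s + g_w


def accent_f0(t, params, f_base=100.0):
    """ASSUMED exponential mapping to f0 in Hz; f_base is hypothetical."""
    return f_base * math.exp(accent_tension(t, **params))


common = dict(A=0.4, B=0.3, D=0.10, alpha=20.0, beta=30.0)
acute = dict(common, t2=0.00)  # early pulse (hypothetical timing)
grave = dict(common, t2=0.15)  # late pulse (hypothetical timing)
times = [k * 0.005 for k in range(121)]  # 0 to 0.6 s
acute_f0 = [accent_f0(t, acute) for t in times]
grave_f0 = [accent_f0(t, grave) for t in times]
```

With these values the acute setting lies below the grave one early in the word (t = 0.1 s), while the grave setting dips lower later (t = 0.3 s), which mirrors the early/late pulse contrast described above.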
[Fig. I-B-6. Comparison of Stockholm accent patterns (panels: ACUTE ACCENT: STOCKHOLM and GRAVE ACCENT: STOCKHOLM; fundamental frequency in Hz versus time, 0 to 1.0 sec) with curves calculated by means of the intonation model. The pulses marked I, IS, and IW represent model outputs with the same input commands that were used to match the empirical data but with the model constants α and β both set to 1000.]
[Fig. I-B-7. Input commands for Swedish accents: a sentence intonation step (amplitude A) and a word intonation pulse (depth B, duration D, timing t2), plotted against time.]
Some of the possibilities of this model are shown in Fig. I-B-8. In the curve family marked A, the depth of the word accent pulse is zero, and the model responses for varying stress step amplitudes are shown. The curve family marked α is similar, except that here the amplitude is fixed and the filter time constant is changed.
In the curve families marked B, β, and D, the stress step amplitude is zero, and the depth, time constant, and duration of the word accent pulses are systematically varied.
Finally, in the curve family marked t2, a typical stress step response has been combined with a typical accent pulse response for various timings of the latter pulse.
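Curve families like these can be regenerated with a small parameter sweep. The sketch below is a hypothetical reconstruction: the baseline values, sweep ranges, the filter shape k³/(s + k)³, and the placement of the stress step at t = 0 are all assumptions. Following the text, it zeroes the pulse depth for the A and α families and zeroes the step amplitude for the B, β, and D families.

```python
import math


def step_response(t, rate):
    """Unit-step response of the assumed filter rate**3 / (s + rate)**3 (0 for t <= 0)."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def tension(t, A, B, D, t2, alpha, beta):
    """Stress step at t = 0 (sentence filter) plus negative word pulse (word filter)."""
    g_s = A * step_response(t, alpha)
    g_w = -B * (step_response(t - t2, beta) - step_response(t - t2 - D, beta))
    return g_s + g_w


# Hypothetical baseline values (not taken from the paper).
BASE = dict(A=0.4, B=0.3, D=0.10, t2=0.10, alpha=20.0, beta=30.0)


def family(name, values, **overrides):
    """One curve per value of parameter `name`; other parameters from BASE plus overrides."""
    times = [k * 0.01 for k in range(81)]  # 0 to 0.8 s
    curves = {}
    for v in values:
        p = dict(BASE, **overrides)
        p[name] = v
        curves[v] = [tension(t, **p) for t in times]
    return times, curves


fam_A = family("A", [0.2, 0.4, 0.6], B=0.0)
fam_alpha = family("alpha", [10.0, 20.0, 40.0], B=0.0)
fam_B = family("B", [0.1, 0.2, 0.3], A=0.0)
fam_beta = family("beta", [15.0, 30.0, 60.0], A=0.0)
fam_D = family("D", [0.05, 0.10, 0.20], A=0.0)
fam_t2 = family("t2", [0.0, 0.1, 0.2])
```

Since the model is linear in the tension domain, each family isolates one parameter's effect; only the t2 family mixes the two components, which is where the timing contrast shows up.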
I have pointed to the importance of the relative timing of the accent commands for the grave/acute tonal contrast in the Stockholm dialect (cf. t2 in Fig. I-B-8). A similar situation seems to obtain in all Scandinavian dialects. This is indicated in Fig. I-B-9.
This figure shows a rearrangement of Meyer's data. Again, each subgraph shows, from left to right: the name of the dialect, the acute pattern, and the grave pattern. It will be seen here that, as we go from dialect to dialect, the acoustic reflexes of the acute and the grave accent pulses are gradually shifted, as a pair it seems, either to the right or to the left, depending on which direction we follow in the orbit.
Starting in Dalarna, for instance, the grave accent first shows up as a little groove in the onset ramp of the stress pattern, and this groove penetrates toward the right as we move upward toward Stockholm, whereafter it moves toward the end of the word until it finally drowns, in the Baltic Sea it would seem...
On the other hand, starting once more in Dalarna, the acute accent first shows up as a groove on the tail of the pitch pattern, and this groove penetrates gradually toward the left in the word as we move toward Denmark, where it shows up as a glottal stop; it then continues toward the beginning of the word until finally it loses itself among the wolves of Lapland.
[Fig. I-B-8. Accent model: effects of six parameters. Model responses to input commands of Fig. I-B-7. Explanation in text.]
Every Scandinavian dialect is close to one or another of these patterns. This is an interesting fact which, although not completely understood as yet, appears to support at least the gross outlines of the model summarized here.
Acknowledgments
I should like to acknowledge the valuable cooperation of my colleague, Johan Liljencrants, in the writing of some of the computer programs required for this work.
References
E. A. Meyer: Die Intonation im Schwedischen, Teil I (Stockholm 1937).
B. Malmberg: Sydsvensk ordaccent (Lund 1953).