
Dept. for Speech, Music and Hearing

Quarterly Progress and Status Report

A model of word and sentence intonation

Öhman, S.

journal: STL-QPSR, volume: 9, number: 2-3, year: 1968, pages: 006-011

http://www.speech.kth.se/qpsr

http://www.speech.kth.se


B. A MODEL OF WORD AND SENTENCE INTONATION*

S.E.G. Öhman

Many investigators have suggested that the fundamental frequency contours of utterances can be divided into two components, namely, word intonation and sentence intonation, approximately as shown in Fig. I-B-1.

The rightmost box is supposed to generate the fundamental frequency contour, f0(t), as a result of three major factors, namely, the vocal cord tension (here denoted by g(t)), articulatory interactions (indicated above the box), and acoustic interactions (indicated below the box).

The articulatory interactions result from the secondary effects on the vibrations of the vocal cords which are associated with the production of high vowels, voiceless consonants, and glottal stops. The acoustic interactions result from the secondary fluctuations in the supra- and subglottal pressures which are due to the varying degrees of closure of the mouth. The vocal cord tension, g(t), is supposed to be the sum of two signals, namely, a sentence intonation component, gs(t), and a word intonation component, gw(t).

In languages that have word stress, such as English and German, the box labelled "Word intonation filter" may be assumed to generate the appropriate word stress pitch pattern, and the box labelled "Sentence intonation filter" to generate the slow phrase contour on which the stress fluctuations are superimposed. In what follows, however, I shall assume that the word tones of the Scandinavian languages are controlled by the lower box in Fig. I-B-1 and that both the basic phrase contours and the superimposed stress inflections are produced by the upper box. In a more general model it is really necessary to have three inputs, one for the basic phrase contour, one for the stress inflection, and one for the word tones, but since my purpose here is to discuss the Scandinavian word tones only, I shall make the simplification just mentioned.

* Verbatim version of paper presented at the 6th ICA, Tokyo 1968.

Fig. I-B-1. Functional model of larynx control (sentence intonation inputs feeding a sentence intonation filter, word intonation inputs feeding a word intonation filter, and a larynx model subject to articulatory and acoustic interaction).

STL-QPSR 2-3/1968 7.

Qualitative analysis of a large number of pitch patterns of real utterances suggests that the fundamental frequency signal is the response of a relatively sluggish system to a sequence of relatively simple level-changing commands, as indicated in the leftmost part of Fig. I-B-1. The neural signals that reach the laryngeal muscles are not entirely of this simple nature, but the step function model has considerable analytical advantages and seems to be quite satisfactory for the purposes of a functional description.

It may in fact be assumed that each of the control branches of our model has the simple characteristics summarized in Fig. I-B-2.

That is, the command generators smooth the step function inputs just like a critically damped third-order linear filter, and moreover, the analogous vocal cord tension is exponentially related to the fundamental frequency output. Unfortunately, I don't have time now to go into the experimental data on which these particular assumptions are based.
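As a minimal sketch of one control branch under stated assumptions: the text specifies a critically damped third-order linear filter but not its equations, so the code below assumes the standard form α³/(s + α)³, treats the smoothed output as the analogue of the vocal cord tension g(t), and maps it to f0 through an exponential with a hypothetical reference frequency F_REF. Because the filter is linear, the response to a sequence of step commands is simply the sum of delayed, scaled unit-step responses.

```python
import math

F_REF = 100.0  # hypothetical reference frequency in Hz (assumption, not given in the paper)


def step_response(t: float, alpha: float) -> float:
    """Unit-step response of alpha**3 / (s + alpha)**3, a critically damped third-order low-pass."""
    if t <= 0.0:
        return 0.0
    x = alpha * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def tension(t: float, steps: list[tuple[float, float]], alpha: float) -> float:
    """Smoothed 'tension' g(t) for step commands given as (onset in s, amplitude).

    The filter is linear, so delayed and scaled unit-step responses simply add up.
    """
    return sum(a * step_response(t - t0, alpha) for t0, a in steps)


def f0(t: float, steps: list[tuple[float, float]], alpha: float) -> float:
    """Fundamental frequency, assumed to be an exponential function of the tension."""
    return F_REF * math.exp(tension(t, steps, alpha))


if __name__ == "__main__":
    commands = [(0.10, 0.4), (0.60, -0.6)]  # illustrative rise followed by a fall
    for ms in range(0, 1001, 100):
        print(f"{ms:4d} ms  {f0(ms / 1000.0, commands, alpha=20.0):6.1f} Hz")
```

Here alpha acts as a rate constant in s⁻¹: larger values make the branch less sluggish, and very large values make the output follow the step commands almost exactly.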

The model developed so far can be criticized on the ground that most functions of relevance to speech production can be synthesized by means of smoothed step function sequences. It is clear, moreover, that only a small number of steps are necessary in order to reproduce the Scandinavian accents. This is illustrated in Fig. I-B-3.

Here we see the pitch contour of the utterance [sɛja mànnen ijɛn] as produced by a speaker of the Stockholm dialect. The word [mànnen] has the grave tone or accent. With the acute accent it would sound [mánnen]. No Scandinavian dialect has more than two tones.

The filled circles represent the period-by-period pitch measurements, and the solid line is the smoothed response of the model to the step function input shown at the bottom. The vertical lines indicate acoustic segment boundaries.

At the beginning, the curve shows a small inflection which is probably due to the unstressed and hence reduced grave accent of the frame word [sɛja]. Then follows the falling-rising pitch pattern typical of the unreduced grave accent. The falling end contour of the phrase is visible in the rightmost part of the graph.

Fig. I-B-2. Detailed specification of one of the channels shown in Fig. I-B-1 (functional model of laryngeal control in intonation: neuro-motor command, vocal cord "tension", fundamental frequency).


Although the synthesized curve matches the data well, and although only a few commands are used for the synthesis, the step function input is not particularly revealing from the phonetic point of view. All we learn, essentially, is that it is all right to set the filter constants in such a way that the f0 output reaches 90 % of the target pitch level in about 250 msec in response to a single step input. This can be seen more clearly in Fig. I-B-4, which shows an utterance having the other Stockholm tonal accent, the acute one. It sounds like [sɛja mánnen ijɛn].

The step function input is much simpler here than in the previous case, but on the other hand, the model response does not reproduce the data equally well. Note, in particular, the small pitch deflection during the [m], which is very systematic and cannot be explained by articulatory or acoustic influence from the consonant. It is of course possible to make up for this mismatch by adding a few more steps at the input, but this would only reduce the possibility of interpreting the input pattern in phonetic terms. Evidently, the present model allows us too much freedom, and we need some more general principle by which the library of admissible inputs can be defined and constrained so that ad hoc introductions of step commands can be avoided. The aim of the work reported on in this paper was to look for constraining criteria of this sort among the Scandinavian accent patterns.
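The rise-time criterion stated earlier for the Stockholm data (f0 reaching 90 % of the target pitch level in about 250 msec after a single step) can be turned into a filter constant. The sketch below assumes the form α³/(s + α)³ for the critically damped third-order filter and, as a further assumption, applies the 90 % criterion to the filter output (the log-frequency "tension") rather than to f0 in Hz. Because the step response at a fixed time grows monotonically with α, a simple bisection finds the value.

```python
import math


def step_response(t: float, alpha: float) -> float:
    """Unit-step response of the assumed filter alpha**3 / (s + alpha)**3."""
    if t <= 0.0:
        return 0.0
    x = alpha * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def alpha_for_rise(fraction: float = 0.9, rise_time: float = 0.25) -> float:
    """Return alpha (1/s) such that step_response(rise_time, alpha) equals fraction.

    At fixed t the response increases monotonically with alpha, so bisection converges.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    lo, hi = 1e-6, 1e4
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if step_response(rise_time, mid) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha = alpha_for_rise()
print(f"alpha = {alpha:.2f} 1/s, time constant 1/alpha = {1000.0 / alpha:.0f} ms")
```

Under these assumptions α comes out at about 21 s⁻¹, i.e. a time constant 1/α of roughly 47 msec.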

Fig. I-B-5 gives a summary of the acute and grave pitch contours of one hundred Scandinavian dialects as measured by the German phonetician E. A. Meyer several decades ago. Each subgraph presents, from left to right, the name of the dialect, the acute pattern, and the grave pattern. As you can see, very many varieties exist. In Danish, for instance, the acute accent is a glottal stop in the middle of the vowel, and on the island of Gotland, as well as in several Dalarna dialects, the two accents, though completely distinct to the native speakers, are closely similar. They may sound like: acute accent, [sɛja mánnen ijɛn], versus grave, [sɛja mànnen ijɛn].

Fig. I-B-5. Schematic acute and grave accent patterns of a hundred Scandinavian dialects according to E. A. Meyer: Die Intonation im Schwedischen, part II. (Vertical axis: fundamental frequency, cps.)

Now, to get on with the analysis, the following assumption is introduced. For every stressed word, a single step of positive amplitude is entered at the input of the sentence intonation filter. This step is made to start at the beginning of the first consonant of the stressed syllable (see Fig. I-B-6).
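The placement rule can be written as a small function that turns a word segmentation into step commands for the sentence intonation filter. This is only a sketch: the Word record, the onset times in the example, and the optional negative end-contour step (described for Fig. I-B-6) are illustrative assumptions, not measurements from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    # Onset (s) of the first consonant of the stressed syllable; None for unstressed words.
    stressed_onset: Optional[float] = None


def sentence_commands(words: list[Word], stress_amplitude: float,
                      end_time: Optional[float] = None,
                      end_amplitude: float = 0.0) -> list[tuple[float, float]]:
    """Step commands (onset in s, amplitude) for the sentence intonation filter.

    One positive step per stressed word, placed at the onset of the first consonant of its
    stressed syllable, plus an optional negative step for the phrase-final fall.
    """
    if stress_amplitude <= 0.0:
        raise ValueError("stress steps must have positive amplitude")
    commands = [(w.stressed_onset, stress_amplitude) for w in words if w.stressed_onset is not None]
    if end_time is not None:
        if end_amplitude >= 0.0:
            raise ValueError("the end-contour step must be negative")
        commands.append((end_time, end_amplitude))
    return sorted(commands)


# Hypothetical timings for the frame [dɛ va mó:nen ja sa]; only the test word carries stress.
frame = [Word("dɛ"), Word("va"), Word("mó:nen", stressed_onset=0.25), Word("ja"), Word("sa")]
print(sentence_commands(frame, 0.5, end_time=0.70, end_amplitude=-0.7))
```

Keeping the stress step tied to a segmental landmark (the consonant onset) is what removes the freedom to place steps ad hoc.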

The consequences of this assumption are seen in the left part of Fig. I-B-6. The wriggled curves are measured data and the smooth curves are calculated model responses. The upper display represents the acute word [mó:nen] embedded in a sentence frame, [dɛ va mó:nen ja sa], and the lower display corresponds to the grave word [mò:nen] in [dɛ va mò:nen ja sa]. For technical reasons the pitch curve goes to zero during the voiceless consonant [s], where, in fact, it is undefined.

A positive-going sentence intonation step representing the stress according to our previous assumption is introduced at the beginning of the [m], and a negative step representing the end contour is introduced later in the sentence, both in the acute and in the grave cases. The curves marked with the letter E show the difference between the calculated and the measured contours.

Note that the error curves have a negative dip both in the acute and in the grave cases. If our assumption that stress is a positive sentence intonation step is correct, then these dips must be the word intonation components of the two patterns, since they represent the residue after elimination of the sentence intonation component.
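The residue argument can be sketched numerically. The code assumes α³/(s + α)³ and β³/(s + β)³ filters, log-frequency units, and a residue defined as measured minus modelled, so that a negative word intonation pulse shows up as a negative dip (the paper does not state the sign convention of its E curves). The "measured" contours are synthetic, built with a hidden pulse, so the example only shows that the residue isolates the word intonation component and that a least-squares grid search over the pulse timing t2 recovers an early pulse in an acute-like case and a late one in a grave-like case. The grid search is a swapped-in fitting method; the paper does not say how its matches were obtained.

```python
import math

ALPHA = 20.0  # illustrative rate constant of the sentence intonation filter (1/s)
BETA = 20.0   # illustrative rate constant of the word intonation filter (1/s)


def step_response(t: float, rate: float) -> float:
    """Unit-step response of rate**3 / (s + rate)**3."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def sentence_component(t: float, steps: list[tuple[float, float]]) -> float:
    """Filtered sentence intonation steps, in log-frequency units."""
    return sum(a * step_response(t - t0, ALPHA) for t0, a in steps)


def word_pulse(t: float, t2: float, depth: float, duration: float) -> float:
    """Filtered negative pulse of the given depth, starting at t2 and lasting `duration` s."""
    return -depth * (step_response(t - t2, BETA) - step_response(t - t2 - duration, BETA))


def residue(times: list[float], measured: list[float], steps: list[tuple[float, float]]) -> list[float]:
    """Measured contour minus the modelled sentence intonation component."""
    return [m - sentence_component(t, steps) for t, m in zip(times, measured)]


def fit_pulse_timing(times: list[float], res: list[float], depth: float, duration: float,
                     candidates: list[float]) -> float:
    """Least-squares grid search for the pulse onset t2 that best explains the residue."""
    def sse(t2: float) -> float:
        return sum((r - word_pulse(t, t2, depth, duration)) ** 2 for t, r in zip(times, res))
    return min(candidates, key=sse)


# Synthetic demonstration only: the "measured" contours contain a hidden pulse.
times = [i / 200 for i in range(300)]       # 0 to 1.495 s in 5 ms steps
steps = [(0.25, 0.5), (0.90, -0.7)]         # stress step, then end-contour step
candidates = [i / 100 for i in range(150)]  # trial pulse onsets, 0.00 to 1.49 s
results = {}
for label, true_t2 in (("acute-like", 0.25), ("grave-like", 0.45)):
    measured = [sentence_component(t, steps) + word_pulse(t, true_t2, 0.2, 0.15) for t in times]
    res = residue(times, measured, steps)
    results[label] = (min(res), fit_pulse_timing(times, res, 0.2, 0.15, candidates))
    print(f"{label}: deepest residue {results[label][0]:.3f}, fitted t2 = {results[label][1]:.2f} s")
```

With real data the residue would also contain measurement noise and articulatory effects, so the fitted depth and duration would have to be estimated along with t2.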

The right part of the figure shows the result of introducing an appropriately shaped negative pulse at the input of the word intonation filter. Almost perfect matches are obtained if this pulse is made to occur early in the acute case and late in the grave case. From the point of view of the present descriptive model, these negative pulses are the Stockholm tonal accents.

The proposal shown in Fig. I-B-7 therefore suggests itself. The Stockholm tonal accents can be synthesized with a sentence intonation step and a word intonation pulse which are coarticulated in the appropriate manner. The parameters of this model are: the amplitude of the step, marked A; the depth of the pulse, marked B; the duration of the pulse, marked D; and the timing of the pulse, marked t2. Furthermore, the possibly different time constants of the sentence intonation filter and the word intonation filter, to be denoted by α and β, respectively, are also of relevance.
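The accent model of Fig. I-B-7 can be sketched as follows. The simplifying assumptions are that the stress step starts at t = 0 and t2 is measured from it, that both filters have the critically damped third-order form with rate constants α and β, and that f0 is an exponential of the summed filter outputs with a hypothetical baseline F_REF. All numerical parameter values are illustrative.

```python
import math
from dataclasses import dataclass

F_REF = 100.0  # hypothetical baseline frequency in Hz (not from the paper)


def step_response(t: float, rate: float) -> float:
    """Unit-step response of the critically damped third-order filter rate**3 / (s + rate)**3."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


@dataclass(frozen=True)
class AccentParams:
    # Field names mirror the symbols of Fig. I-B-7 and the text.
    A: float      # amplitude of the sentence intonation (stress) step
    B: float      # depth of the word intonation pulse
    D: float      # duration of the pulse (s)
    t2: float     # onset of the pulse, measured from the stress step (s); assumed reference
    alpha: float  # rate constant of the sentence intonation filter (1/s)
    beta: float   # rate constant of the word intonation filter (1/s)


def accent_f0(t: float, p: AccentParams) -> float:
    """f0 at time t (s) for a stress step at t = 0 plus a negative word intonation pulse."""
    sentence = p.A * step_response(t, p.alpha)
    word = -p.B * (step_response(t - p.t2, p.beta) - step_response(t - p.t2 - p.D, p.beta))
    return F_REF * math.exp(sentence + word)


acute = AccentParams(A=0.4, B=0.25, D=0.10, t2=0.00, alpha=20.0, beta=20.0)  # early pulse
grave = AccentParams(A=0.4, B=0.25, D=0.10, t2=0.20, alpha=20.0, beta=20.0)  # late pulse
for ms in range(0, 601, 50):
    t = ms / 1000.0
    print(f"{ms:3d} ms  acute {accent_f0(t, acute):6.1f} Hz  grave {accent_f0(t, grave):6.1f} Hz")
```

Setting alpha and beta to 1000, as in the caption of Fig. I-B-6, makes both filters nearly instantaneous, so the output essentially traces the input commands themselves.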

Fig. I-B-6. Comparison of Stockholm accent patterns with curves calculated by means of the intonation model. The pulses marked I, IS, and IW represent model outputs with the same input commands that were used to match the empirical data but with the model constants α and β both set to 1000. (Upper panel: acute accent, Stockholm; lower panel: grave accent, Stockholm; frequency in Hz against time, 0 to 1.0 sec.)

Fig. I-B-7. Input commands for Swedish accents: a sentence intonation step of amplitude A and a word intonation pulse of depth B, duration D, and timing t2, plotted against time.


Some of the possibilities of this model are shown in Fig. I-B-8. In the curve family marked A, the depth of the word accent pulse is zero, and the model responses for varying stress step amplitudes are shown. The curve family marked α is similar, except that here the amplitude is fixed and the filter time constant is changed.

In the curve families marked B, β, and D, the stress step amplitude is zero, and the depth, time constant, and duration of the word accent pulses are systematically varied.

Finally, in the curve family marked t2, a typical stress step response has been combined with a typical accent pulse response for various timings of the latter pulse.
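Curve families like those of Fig. I-B-8 can be sketched by varying one parameter at a time while keeping the others fixed. The code assumes the Fig. I-B-7 structure with a stress step at t = 0, a negative word pulse at t2, and critically damped third-order filters; the parameter values are illustrative. For the t2 family it reports where the groove contributed by the word pulse is deepest, which moves later as the pulse is delayed.

```python
import math
from dataclasses import dataclass, replace


def step_response(t: float, rate: float) -> float:
    """Unit-step response of rate**3 / (s + rate)**3."""
    if t <= 0.0:
        return 0.0
    x = rate * t
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


@dataclass(frozen=True)
class AccentParams:
    A: float      # stress step amplitude
    B: float      # word pulse depth
    D: float      # word pulse duration (s)
    t2: float     # word pulse onset, measured from the stress step (s)
    alpha: float  # sentence intonation filter rate constant (1/s)
    beta: float   # word intonation filter rate constant (1/s)


def log_contour(t: float, p: AccentParams) -> float:
    """Log-f0 relative to the baseline: smoothed stress step plus smoothed negative pulse."""
    sentence = p.A * step_response(t, p.alpha)
    word = -p.B * (step_response(t - p.t2, p.beta) - step_response(t - p.t2 - p.D, p.beta))
    return sentence + word


def curve_family(base: AccentParams, name: str, values: list[float],
                 times: list[float]) -> dict[float, list[float]]:
    """Sample the contour for each value of one parameter, holding the others at `base`."""
    family = {}
    for v in values:
        p = replace(base, **{name: v})
        family[v] = [log_contour(t, p) for t in times]
    return family


times = [i / 200 for i in range(200)]  # 0 to 0.995 s in 5 ms steps
base = AccentParams(A=0.4, B=0.25, D=0.10, t2=0.0, alpha=20.0, beta=20.0)
stress_only = [log_contour(t, replace(base, B=0.0)) for t in times]

groove_times = []
for t2, curve in curve_family(base, "t2", [0.0, 0.1, 0.2, 0.3], times).items():
    deviation = [c - s for c, s in zip(curve, stress_only)]  # the word-pulse contribution
    k = min(range(len(deviation)), key=deviation.__getitem__)
    groove_times.append(times[k])
    print(f"t2 = {t2:.1f} s: groove deepest at {times[k]:.3f} s")
```

The same curve_family call with "A", "alpha", "B", "beta", or "D" as the parameter name produces the other five families.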

I have pointed to the importance of the relative timing of the accent commands for the grave/acute tonal contrast in the Stockholm dialect (cf. t2 in Fig. I-B-8). A similar situation seems to obtain in all Scandinavian dialects. This is indicated in Fig. I-B-9.

This figure shows a rearrangement of Meyer's data. Again, each subgraph shows, from left to right, the name of the dialect, the acute pattern, and the grave pattern. It will be seen here that, as we go from dialect to dialect, the acoustic reflexes of the acute and the grave accent pulses are gradually shifted, as a pair it seems, either to the right or to the left depending on which direction we follow in the orbit.

Starting in Dalarna, for instance, the grave accent first shows up as a little groove in the onset ramp of the stress pattern, and this groove penetrates toward the right as we move upward toward Stockholm, whereafter it moves toward the end of the word until it finally drowns, in the Baltic Sea it would seem...

On the other hand, starting once more in Dalarna, the acute accent first shows up as a groove on the tail of the pitch pattern, and this groove penetrates gradually toward the left in the word as we move toward Denmark, where it shows up as a glottal stop, and then it continues toward the beginning of the word until finally it loses itself among the wolves of Lapland.

Fig. I-B-8. Accent model: effects of six parameters. Model responses to input commands of Fig. I-B-7. Explanation in text.


Every Scandinavian dialect is close to one or another of these patterns. This is an interesting fact which, although not completely understood as yet, appears to support at least the gross outlines of the model summarized here.

Acknowledgments

I should like to acknowledge the valuable cooperation of my colleague, Johan Liljencrants, in the writing of some of the computer programs required for this work.

References

E. A. Meyer: Die Intonation im Schwedischen, Teil I (Stockholm 1937).

B. Malmberg: Sydsvensk ordaccent (Lund 1953).
