Overview Of the human speech system

[1] The main components of the human speech system are the lungs, trachea (windpipe), larynx, pharyngeal cavity (throat), oral or buccal cavity (mouth) and nasal cavity (nose). Normally the pharyngeal and the oral cavity are grouped into one unit called the oral tract (often called the vocal tract). The nasal cavity is normally called the nasal tract. The exact placement of the main organs is shown in figure 1.1. A simplified schematic of the speech system is shown in figure 1.2.

Figure 1.1: [1] Schematic view of human speech production.

Figure 1.2: [1] Block model of human speech production.

1.1 Speech production

As seen in figure 1.2, muscle force is used to press air from the lungs through the larynx (more specifically, the glottis). The vocal cords then vibrate and interrupt the airflow, producing a quasi-periodic pressure wave. The pressure impulses are called pitch impulses. The frequency of the pressure signal is the pitch frequency or fundamental frequency. A typical sound pressure function is shown in figure 1.3. The frequency of the pressure signal is the part that defines the speech melody. If we spoke with a constant pitch frequency the speech would sound monotonous, but normally the frequency changes continuously. The frequency of the vocal cords is determined by several factors: the tension exerted by the muscles, their mass and their length. These factors vary between sexes and according to age.

Figure 1.3: [3] Typical impulse sequence (sound pressure function).

The pressure impulses stimulate the air in the oral tract and, for certain sounds, also the nasal tract. When the cavities resonate, they radiate a sound wave, which is the speech signal. Both tracts (oral and nasal) act as resonators with characteristic resonance frequencies called formant frequencies. It is possible to change the shape of the cavities of the mouth by moving the jaw, tongue, velum, lips and mouth. Because of this we can pronounce a great many different sounds.

1.2 Voiced/unvoiced speech

Speech can be divided into two classes, voiced and unvoiced. The difference between the two is the use of the vocal cords and the vocal tract (mouth and lips). When voiced sounds are pronounced, both the vocal cords and the vocal tract are used. Because the vocal cords vibrate, it is possible to find the fundamental frequency of the speech. In contrast, the vocal cords are not used when pronouncing unvoiced sounds, so no fundamental frequency can be found in unvoiced speech. In general, all vowels are voiced sounds. Examples of unvoiced sounds are /sh/, /s/ and /p/.
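One common way to look for a fundamental frequency in a frame is to search for a peak in its autocorrelation. The sketch below is only an illustration of that idea, not the method used in the "pitch detection" worksheet; the function names, the search range of 60 to 400 Hz and the threshold of 0.3 are assumptions made for this example.

```python
def autocorrelation(frame, lag):
    """Sum of products of the frame with itself shifted by `lag` samples."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0, threshold=0.3):
    """Return an estimated fundamental frequency in Hz, or None.

    None means no clear periodicity was found, which is treated here as
    "unvoiced". f_min, f_max and threshold are illustrative assumptions.
    """
    energy = autocorrelation(frame, 0)
    if energy == 0.0:
        return None  # silent frame, nothing to measure
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalise by the frame energy so the threshold is scale independent.
        value = autocorrelation(frame, lag) / energy
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None or best_value < threshold:
        return None
    return sample_rate / best_lag
```

A strongly periodic frame gives a clear autocorrelation peak at the pitch period, while a noise-like frame should stay below the threshold and return None.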

There are different ways to detect whether speech is voiced or unvoiced. As mentioned earlier, the fundamental frequency can be used to detect the voiced and unvoiced parts of speech. Another way is to calculate the energy in the signal (per signal frame): there is more energy in a voiced sound than in an unvoiced sound. Figure 1.4 shows a speech signal divided into three parts: the original, a voiced part and an unvoiced part. The figure shows that there is more energy in the voiced part than in the unvoiced part.
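The energy criterion can be sketched as follows. This is a minimal illustration assuming non-overlapping frames and a fixed, hand-picked threshold; the function names and the threshold value are hypothetical and would need tuning for real recordings.

```python
def frame_signal(samples, frame_length):
    """Split samples into consecutive non-overlapping frames (remainder dropped)."""
    return [samples[i:i + frame_length]
            for i in range(0, len(samples) - frame_length + 1, frame_length)]


def frame_energy(frame):
    """Energy of one frame: the sum of its squared sample values."""
    return sum(x * x for x in frame)


def classify_frames(samples, frame_length, energy_threshold):
    """Label each frame 'voiced' if its energy exceeds the threshold, else 'unvoiced'."""
    return ["voiced" if frame_energy(f) > energy_threshold else "unvoiced"
            for f in frame_signal(samples, frame_length)]
```

With a sampling rate of 8 kHz, for example, a frame length of 160 samples corresponds to 20 ms. Note that a pure energy test cannot tell unvoiced speech from silence, so in practice it is usually combined with another criterion such as the fundamental frequency.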

Figure 1.4: Spectrogram of speech signal. Three panels (original, voiced, unvoiced), each showing frequency (kHz) against time (sec).

1.3 Model of human speech

Based on the overview above, a model of human speech production is shown in figure 1.5a. The mouth and nose can be described as a time-varying filter. The idea behind linear prediction (LPC) is to minimize the sum of the squared differences between the original speech and the estimated speech. The LPC coefficients are normally estimated in frames of 20 ms. The reason 20 ms is chosen is that a speech signal is considered quasi-stationary over this time frame. The transfer function of the time-varying filter is given in equation 1.1.

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}} \qquad (1.1)$$

In the transfer function the linear prediction coefficients are represented by a_k, G is the gain, and p is the number of coefficients. The theory of the LPC is described in the worksheet "Line Spectrum Pairs". To estimate proper speech, the model must be able to decide whether the speech is voiced or unvoiced. This is done with a pitch detector (described in the worksheet "pitch detection"). The pitch detector also finds the fundamental frequency, which controls the impulse train. The resulting speech model is shown in figure 1.5b.
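As an illustration of how the coefficients a_k can be obtained by minimizing the squared prediction error, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. This is one standard approach and is an assumption here; the worksheets referred to above may use a different formulation. The function names are hypothetical.

```python
def autocorr(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n + k] for n in range(len(frame) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_p given autocorrelation r.

    Returns (a, error): a[k-1] holds a_k in the predictor
    s[n] ~ sum_k a_k * s[n-k], and error is the remaining
    prediction-error energy.
    """
    a = [0.0] * order
    error = r[0]
    for i in range(order):
        if error <= 0.0:
            break  # silent or perfectly predictable frame
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        error *= 1.0 - k * k
    return a, error


def lpc(frame, order):
    """Estimate `order` linear prediction coefficients for one frame."""
    return levinson_durbin(autocorr(frame, order), order)
```

In this formulation the gain G in equation 1.1 is commonly derived from the returned prediction-error energy, although the exact scaling depends on how the excitation is normalised.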

Figure 1.5: [4] a) Model of human speech production. b) Speech model for an electrical system.
Block labels in the figure: Lungs, Sound Pressure, Vocal Cords, Voiced Excitation (quasi-periodic excitation signal), Unvoiced Excitation (noise-like excitation signal), weights a and 1-a, Mouth/Nose, Excitation, Articulation, Speech; Tone Generator (pulse train), Noise Generator, Fundamental Frequency, Voiced/unvoiced Decision, Energy, Variable LPC filter, Filter Coefficients, Speech.
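The electrical speech model can be sketched in a few lines: a pulse train or a noise source is selected by the voiced/unvoiced decision and passed through the all-pole filter of equation 1.1. The code below is a simplified illustration under stated assumptions: the function names, the 8 kHz sampling rate and the uniform noise source are choices made for this example, and the filter state is not carried over between frames.

```python
import random


def impulse_train(num_samples, period):
    """Tone generator: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Noise generator: uniform samples in [-1, 1] (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def all_pole_filter(excitation, a, gain):
    """Time-domain form of equation 1.1:
    s[n] = gain * e[n] + sum_k a[k-1] * s[n-k].
    """
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                s += a_k * out[n - k]
        out.append(s)
    return out


def synthesize_frame(voiced, pitch_hz, a, gain, sample_rate=8000, frame_ms=20):
    """Produce one frame of speech-like signal from the model parameters."""
    num_samples = sample_rate * frame_ms // 1000
    if voiced:
        excitation = impulse_train(num_samples, round(sample_rate / pitch_hz))
    else:
        excitation = white_noise(num_samples)
    return all_pole_filter(excitation, a, gain)
```

For example, `synthesize_frame(True, 100, [1.3, -0.6], 1.0)` would produce one 20 ms voiced frame with a 100 Hz pulse train as excitation; the coefficient values here are arbitrary and chosen only to give a stable filter.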

1.4 Literature list

[1] http://ispl.korea.ac.kr/~wikim/research/speech.html

[2] http://www.kt.tu-cottbus.de/speech-analysis/

[3] http://www.student.chula.ac.th/~47704705/2-2.html

[4] http://www.kt.tu-cottbus.de/speech-analysis/tech.html
