
Proceedings of the International Conference on Vibration, Sound and System Dynamics Penang, 2 August 2017


Evaluation of Speech Intelligibility of Malay Words in Terms of Reverberation in Medium Classroom

Mokhtar Harun, Nadia Listari Abdul Rahim*, Khairunnisa Mohd Yusof, Mohamad Ngasri Dimon, Puspa Inayat Khalid, Siti Zaleha Abdul Hamid
Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor, MALAYSIA.

Nazli Che Din
Faculty of Built Environment, University of Malaya, Kuala Lumpur, MALAYSIA.

Abstract: It is generally accepted that reverberation in a classroom affects the speech intelligibility (SI) score. However, reverberation has yet to be quantified in terms of SI parameters. The purpose of this paper is to quantify the effect of reverberation (in terms of reverberation time, RT60) on the SI parameters Clarity (C50), Definition (D50) and Speech Transmission Index for Public Address systems (STIPA). Four classrooms were chosen for the study, with RT60 of 1.1 s (Room 1 and Room 2), 1.5 s (Room 3) and 1.1 s (Room 4). Six words covering four manners of articulation were played and recorded in the four classrooms. The recorded sound waveforms were analyzed using established formulae and the DIRAC software. The average clarity for Room 1 and Room 2 is similar, while Room 4 has a higher C50 than Room 3. D50 gives good definition at the critical distance for the same words in Room 1 and Room 2, and for different words in Room 3 and Room 4. STIPA gives good SI for the same words in Room 1 and Room 2, and for different words in Room 3 and Room 4. Thus, rooms with the same RT60 give the same SI scores.

Keywords: Speech intelligibility, reverberation, clarity, definition, STI


1 INTRODUCTION

The goal of delivering speech is to ensure acceptable speech intelligibility, in terms of the clarity of the speech heard by the audience. Speech intelligibility is the level of understanding of a word under a certain condition. The intelligibility of speech can be affected by noise from ventilation equipment such as fans and air conditioners. If the noise from ventilation equipment is treated so that its level becomes low and bearable, the intelligibility of speech in that room can be restored.

In addition, SI in a room can be affected by the reverberant level of the room. The reverberant sound level is quantified in terms of reverberation time (RT60). SI is inversely proportional to RT60 and directly proportional to signal-to-noise ratio (SNR) (Hodgson and Nosal, 2001).

A lecture hall is finite in dimension, and speech is the main element of concern in it. This study investigates the speech intelligibility in four lecture rooms, two of which have the same volume while the other two have different volumes (Room 1 and Room 2 are 369 m3 each, and Room 3 and Room 4 are 634 m3 and 562 m3 respectively). The volume of a room is one of the factors that influence its reverberant level (Mikulski and Radosz, 2011). The finite dimensions of these rooms contribute to the reflection of sound waves, which produces reverberation (Everest, 2001). Reverberation affects the acoustic performance of the room, which is measured in terms of acoustic parameters.

Table 1. Recommended RT60 values in classrooms (Mikulski and Radosz, 2011).

Room type: Classrooms with volume V = 30 – 1000 m3
Optimum reverberation time, Topt (s): Topt = 0.32 log(V) – 0.17
Tolerance range for reverberation time, T:
    0.65 Topt < T < 1.2 Topt (for 125 and 4000 Hz bands)
    0.8 Topt < T < 1.2 Topt (for 250, 500, 1000 and 2000 Hz bands)
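As a rough illustration of Table 1, the sketch below evaluates the optimum reverberation time and its tolerance range for the room volumes quoted in the introduction. It assumes that log in Table 1 is the base-10 logarithm; the function names are hypothetical and not part of the original study.

```python
import math


def optimum_rt(volume_m3: float) -> float:
    """Optimum reverberation time Topt (s) per Table 1.

    Assumption: 'log' in Table 1 is the base-10 logarithm.
    """
    if not 30 <= volume_m3 <= 1000:
        raise ValueError("Table 1 only covers volumes of 30 to 1000 m3")
    return 0.32 * math.log10(volume_m3) - 0.17


def tolerance_range(volume_m3: float, band_hz: int) -> tuple:
    """Lower and upper limits of T (s) for one octave band, per Table 1."""
    t_opt = optimum_rt(volume_m3)
    if band_hz in (125, 4000):
        lower_factor = 0.65
    elif band_hz in (250, 500, 1000, 2000):
        lower_factor = 0.8
    else:
        raise ValueError("Table 1 lists only the 125 Hz to 4000 Hz octave bands")
    return lower_factor * t_opt, 1.2 * t_opt


# Room volumes quoted in the introduction.
for name, volume in (("Room 1 / Room 2", 369), ("Room 3", 634), ("Room 4", 562)):
    low, high = tolerance_range(volume, 1000)
    print(f"{name}: Topt = {optimum_rt(volume):.2f} s, "
          f"tolerance at 1000 Hz = {low:.2f} to {high:.2f} s")
```

Under this reading of Table 1, the result can be compared directly with the measured RT60 values reported later in the paper.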

To define the acoustic parameters, the energy decay curve (EDC) obtained from the impulse response is used. The time taken for the EDC to decay by 60 dB is called the reverberation time. A lower RT60 gives better SI in the room because there is less overlap masking (Sodsri, 2012). Given the volume of the room (V), the surface area of the room (S) and the absorption coefficient of the materials used in the room (α), RT60 can be estimated with Sabine's formula, as shown in Equation (1).



$$RT_{60} = \frac{0.161\,V}{S\,\alpha} \qquad (1)$$
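A minimal sketch of Sabine's formula follows. It generalizes the single product of S and α to a sum over surfaces, which is the usual way the formula is applied when materials differ; the surface areas and absorption coefficients in the example are hypothetical and are not measurements from this study.

```python
def sabine_rt60(volume_m3: float, surfaces) -> float:
    """Estimate RT60 (s) with Sabine's formula, RT60 = 0.161 V / sum(S_i * alpha_i).

    surfaces: iterable of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical example: a 369 m3 room (the volume of Room 1) with made-up surfaces.
example_surfaces = [
    (120.0, 0.05),  # floor, hard finish (assumed values)
    (120.0, 0.30),  # ceiling tiles (assumed values)
    (160.0, 0.08),  # walls (assumed values)
]
print(f"Estimated RT60: {sabine_rt60(369, example_surfaces):.2f} s")
```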

RT60 plays a different role in different types of space. For instance, a classroom needs a low RT60 to prevent the delivered speech from being affected by reflected sound, while an orchestra hall needs a higher RT60 for the music to sound livelier, as shown in Figure 1.

Figure 1. RT60 for different room purposes (Rossing, 2014).

Sitting at the front of the class, which is approximately at the critical distance, gives better speech clarity than sitting in the middle or at the back of the class near the back wall (Sodsri, 2012). A solid wall reflects more sound and thus produces a higher RT60. A higher RT60 gives lower SI quality (Andrijasevic et al., 2012).

Critical distance (dc) is defined as the distance from the sound source at which the direct and reflected energies are equal (Borwick, 2001). Reflected sound is small at locations near dc. The critical distance in metres can be calculated using Equation (2), where V is the volume of the room and RT60 is the reverberation time of the room (Hall, 1993).

$$d_c\,(\mathrm{m}) = 0.1\sqrt{\frac{V}{\pi\,(RT_{60})}} \qquad (2)$$
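The following sketch evaluates the critical distance for the four rooms. It assumes that Equation (2) has the common form dc = 0.1 sqrt(V / (π RT60)), and it uses the room volumes from the introduction and the RT60 values quoted in the abstract; the function name is hypothetical.

```python
import math


def critical_distance(volume_m3: float, rt60_s: float) -> float:
    """Critical distance in metres, assuming dc = 0.1 * sqrt(V / (pi * RT60))."""
    if volume_m3 <= 0 or rt60_s <= 0:
        raise ValueError("volume and RT60 must be positive")
    return 0.1 * math.sqrt(volume_m3 / (math.pi * rt60_s))


# (volume in m3, RT60 in s) as quoted in the paper.
rooms = {
    "Room 1": (369, 1.1),
    "Room 2": (369, 1.1),
    "Room 3": (634, 1.5),
    "Room 4": (562, 1.1),
}
for name, (volume, rt60) in rooms.items():
    print(f"{name}: dc = {critical_distance(volume, rt60):.2f} m")
```

With these inputs the sketch gives values within a few centimetres of the critical distances reported in the results section (1.04 m, 1.2 m and 1.3 m), which supports this reading of Equation (2).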

Another parameter used to analyse acoustic performance is clarity (C50). C50 is the logarithmic ratio of the sound energy received within the first 50 ms to the energy arriving after 50 ms (Kassim et al., 2015). Reflected sound that arrives within 50 ms reinforces the direct sound, while reflected sound that arrives later masks it and degrades the SI of the delivered speech (Kristiansen et al., 2011). A C50 higher than 0 dB is optimal for SI (Berardi, 2012). The SI rating for C50 is shown in Table 1.

Table 1. Relationship between C50 and SI (Shimokura et al., 2014)

Bad            Poor                   Fair                  Good                 Excellent
C50 < -7 dB    -7 dB < C50 < -2 dB    -2 dB < C50 < 2 dB    2 dB < C50 < 7 dB    C50 > 7 dB

C50 can be obtained by calculation using Equation (3)

below (Rossing, 2014).

$$C_{50} = 10\log\left(e^{\,1.104/RT_{60}} - 1\right) \qquad (3)$$
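The sketch below evaluates C50 from RT60, assuming Equation (3) reads C50 = 10 log10(exp(1.104 / RT60) - 1). The constant 1.104 is taken from the equation as printed, and the function name is hypothetical.

```python
import math


def c50_from_rt60(rt60_s: float, constant: float = 1.104) -> float:
    """C50 in dB from RT60 in seconds, following Equation (3) as printed."""
    if rt60_s <= 0:
        raise ValueError("RT60 must be positive")
    return 10.0 * math.log10(math.exp(constant / rt60_s) - 1.0)


# RT60 values measured at the critical distance (from Table 6).
for rt60 in (1.10, 1.58, 0.44):
    print(f"RT60 = {rt60:.2f} s gives C50 = {c50_from_rt60(rt60):.1f} dB")
```

Rounded to whole numbers, these inputs give 2 dB, 0 dB and 11 dB, which agree with the corresponding entries reported in Table 7.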

Acoustic performance can also be analysed using definition (D50). D50 is similar to C50 but is expressed in % instead of dB. It is the early sound energy within the first 50 ms after the direct sound arrives, expressed as a percentage of the total sound energy (Kassim et al., 2015). Figure 2 shows the relationship between D50 and SI; D50 needs to exceed 20% for good SI. SI increases as D50 increases (Kassim et al., 2015).

Figure 2. D50 against SI (Mikulski and Radosz, 2011)
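Because C50 is the ratio of early to late energy and D50 is, in its standard definition, the ratio of early to total energy, the two can be converted into each other. The sketch below shows that conversion; it assumes both quantities use the same 50 ms boundary and the same impulse response, and the function names are hypothetical.

```python
import math


def c50_from_d50(d50_percent: float) -> float:
    """Convert D50 (%) to C50 (dB): C50 = 10 log10(D50 / (1 - D50))."""
    d = d50_percent / 100.0
    if not 0.0 < d < 1.0:
        raise ValueError("D50 must lie strictly between 0 % and 100 %")
    return 10.0 * math.log10(d / (1.0 - d))


def d50_from_c50(c50_db: float) -> float:
    """Convert C50 (dB) to D50 (%), the inverse of c50_from_d50."""
    ratio = 10.0 ** (c50_db / 10.0)  # early energy divided by late energy
    return 100.0 * ratio / (1.0 + ratio)


print(f"D50 = 50 % corresponds to C50 = {c50_from_d50(50):.1f} dB")
print(f"D50 = 20 % corresponds to C50 = {c50_from_d50(20):.1f} dB")
```

Under these assumptions, equal early and late energy (D50 of 50%) corresponds to a C50 of 0 dB.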

SI quality can also be measured using the Speech Transmission Index (STI). It measures SI by taking into account the background noise and the reverberant level during sound transmission (Borwick, 2001). The STI value ranges from 0 to 1, where 0 indicates bad SI and 1 indicates excellent SI, as shown in Table 2. The desired STI value for lecture halls is at least 0.6, so that speech is highly comprehensible during transmission (Rossing, 2014).

Table 2. STI meter for SI (Mikulski and Radosz, 2011)

STI                                    0-0.3            0.30-0.45   0.45-0.60   0.60-0.75   0.75-1.0
Subjective speech clarity evaluation   Unintelligible   Poor        Fair        Good        Excellent
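Table 2 can be read as a simple lookup. The sketch below is one way to express it; since the table does not say which class a boundary value belongs to, the code assumes that each lower bound is inclusive (so 0.60 counts as Good). The function name is hypothetical.

```python
def sti_rating(sti: float) -> str:
    """Map an STI or STIPA value to the subjective rating of Table 2.

    Assumption: lower bounds are inclusive, upper bounds exclusive.
    """
    if not 0.0 <= sti <= 1.0:
        raise ValueError("STI must lie between 0 and 1")
    upper_bounds = [
        (0.30, "Unintelligible"),
        (0.45, "Poor"),
        (0.60, "Fair"),
        (0.75, "Good"),
    ]
    for upper, label in upper_bounds:
        if sti < upper:
            return label
    return "Excellent"


for value in (0.25, 0.44, 0.55, 0.60, 0.72, 0.80):
    print(f"STI {value:.2f}: {sti_rating(value)}")
```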



STIPA is used to measure SI when a Public Address (PA) system is installed. It allows accurate testing with a portable instrument. The STIPA scale is the same as the STI scale.

SI quality also depends on the manner of articulation (MoA). MoA for Malay words can be categorized into seven types: plosive, fricative, affricate, nasal, trill, lateral and approximant (Table 3).

Table 3. Manner of Articulation (Tench, 2011).

Type of MoA    Definition                                                                                   Example Malay word
Plosive        The air flow is stopped at some point and released suddenly                                  tadi
Fricative      Air is forced through a narrow gap to produce a hissing sound                                sama
Affricate      Speech sound consisting of a plosive and a fricative articulated at the same place           jika
Nasal          Produced through the nose with the mouth closed                                              mana
Trill          Articulated with a rapid flutter of the tongue against the palate                            rasa
Lateral        The air passage in the centre is blocked                                                     lari
Approximant    The airflow is blocked by a small amount                                                     warna

To investigate the SI in the lecture rooms, alveolar, plosive, fricative and nasal phonemes were chosen based on a previous study. The chosen words are tabulated in Table 4.

Table 4. Words used in the experiments

MoA chosen   Selected word
Plosive      DAPAT
Plosive      TAHUN
Nasal        NADI
Fricative    SAYA
Vowel        AMAN
Vowel        USIA

2 METHODOLOGY

The six Malay words chosen are AMAN, DAPAT, NADI, SAYA, TAHUN and USIA. These words are phonetically balanced, and their manners of articulation (MoA) are plosive, nasal, fricative and vowel.

The sample rooms used in this study are four lecture rooms in Fakulti Kejuruteraan Elektrik (FKE), Universiti Teknologi Malaysia Johor Bahru (UTM Skudai): P05-105, P05-104, P16 room Demo 2 and P16 room BKT 2. The top view and the side view of each room were obtained. An example is shown in Table 5.

Table 5. Layout of sample room

Side view Top view

RT60 was measured using a SOLO sound meter. The RT60 of each room was measured at the critical distance and at 2 m, 4 m and 6 m from the lecture stage, where the sound source was placed. For this study, there should be no statistically significant difference in RT60 between Room 1 and Room 2, to confirm that they have the same RT60, while the difference in RT60 between Room 3 and Room 4 needs to be statistically significant.

The speech intelligibility of the recorded words was

then analyzed by using DIRAC software. The steps

are shown in the flowchart in Figure 3 below.

Figure 3. Overall flow of project

3 RESULTS AND DISCUSSION

The results of measuring SI were collected in four

parameters which are RT60, C50, D50 and STIPA.

From Equation (2), critical distance obtained in this

experiment is 1.04m for both Room 1 and Room 2,

1.2m for Room 3 and 1.3m for Room 4.

Steps in the flowchart of Figure 3:
a) Literature review on related research
b) Understanding MoA of Malay words
c) Selecting 6 Malay words with 2 syllables for 4 types of MoA
d) Finding four sample rooms, two with the same dimensions and two with different dimensions
e) Measuring RT60 in all sample rooms by using SOLO
f) Recording the selected words in the sample rooms at the critical distance, 2 m, 4 m and 6 m by using RPA
g) Analyzing results using DIRAC software



(a) Paired t-test, Hypothesized Difference = 0

Pair                         Mean Diff.   DF   t-Value   P-Value
RT60 Room 1, RT60 Room 2     .023         3    .940      .4166

(b) Paired t-test, Hypothesized Difference = 0

Pair                         Mean Diff.   DF   t-Value   P-Value
RT60 Room 3, RT60 Room 4     .230         3    3.860     .0307

Table 6. RT60 at various critical distances at one-octave center frequency 250 Hz to 2000 Hz.

RT60 (s) by frequency (Hz)

Sample room   Distance (m)   250    500    1000   2000
Room 1        1.04           1.18   1.10   1.10   1.10
              2.00           1.28   1.10   1.10   1.20
              4.00           1.41   1.16   1.09   1.13
              6.00           1.44   1.22   1.08   1.12
              Average        1.33   1.15   1.09   1.14
Room 2        1.04           1.58   1.25   1.08   1.14
              2.00           1.45   1.10   1.06   1.10
              4.00           1.24   0.95   1.09   1.08
              6.00           1.15   0.95   1.14   1.09
              Average        1.36   1.06   1.09   1.10
Room 3        1.20           0.44   1.58   1.63   1.36
              2.00           0.32   1.47   1.55   1.33
              4.00           0.35   1.38   1.47   1.27
              6.00           0.50   1.62   1.43   1.2
              Average        0.45   1.48   1.50   1.33
Room 4        1.30           0.11   1.10   1.24   1.18
              2.00           0.24   1.35   1.10   1.11
              4.00           0.36   1.35   1.10   1.08
              6.00           0.26   1.54   1.28   1.39
              Average        0.23   1.35   1.10   1.16

Table 6 shows the RT60 of all sample rooms, measured at four distinct locations in each room. The average values of RT60 were taken at 250 Hz, 500 Hz, 1000 Hz and 2000 Hz, as acoustical characteristics such as the noise reduction coefficient (NRC) are defined over those frequencies.

Figure 4. Graph of RT60 for Room 1 and Room 2 (plot of RT60 in s against frequency in Hz at 250, 500, 1000 and 2000 Hz; series: Room 1, Room 2, Room 3 and Room 4)

Figure 4 shows that the RT60 of Room 1 and Room 2 are similar, at 1.1 s, while those of Room 3 and Room 4 differ, at 1.5 s and 1.1 s respectively (taken at 1000 Hz). The similarity and the difference are confirmed by a statistical significance test, a paired-comparison test run in the STATVIEW software, as indicated by the p-value.

Figure 5. (a) Paired-comparison test between RT60 at Room 1 and RT60 at Room 2. (b) Paired-comparison test between RT60 at Room 3 and RT60 at Room 4

Figure 5(a) and Figure 5(b) show the paired-comparison tests for RT60 in Room 1, Room 2, Room 3 and Room 4, computed using the STATVIEW software. A difference is statistically insignificant when the p-value is greater than 0.05. The p-value for RT60 between Room 1 and Room 2 is 0.4166, which is greater than 0.05. Thus, the RT60 of the two rooms are considered similar, as the difference between them is statistically insignificant. In contrast, the p-value for RT60 between Room 3 and Room 4 is 0.0307, which is less than 0.05, indicating that the two rooms differ in RT60. The value of RT60 is taken at 1000 Hz, as acoustic measurements usually refer to that frequency.
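The paired-comparison test can be sketched in a few lines. The example below assumes that the test pairs the four band-averaged RT60 values of Table 6 (which is consistent with the reported 3 degrees of freedom), and it compares the t statistic with 3.182, the two-tailed 5% critical value of Student's t for 3 degrees of freedom, instead of computing a p-value. Because Table 6 values are rounded, the results differ slightly from the STATVIEW output.

```python
import math


def paired_t(sample_a, sample_b):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(variance / n)
    return mean_diff, t_stat, n - 1


# Average RT60 (s) at 250, 500, 1000 and 2000 Hz, from Table 6.
room1 = [1.33, 1.15, 1.09, 1.14]
room2 = [1.36, 1.06, 1.09, 1.10]
room3 = [0.45, 1.48, 1.50, 1.33]
room4 = [0.23, 1.35, 1.10, 1.16]

T_CRITICAL_DF3 = 3.182  # two-tailed, alpha = 0.05, 3 degrees of freedom

for label, a, b in (("Room 1 vs Room 2", room1, room2),
                    ("Room 3 vs Room 4", room3, room4)):
    mean_diff, t_stat, dof = paired_t(a, b)
    verdict = "significant" if abs(t_stat) > T_CRITICAL_DF3 else "not significant"
    print(f"{label}: mean diff = {mean_diff:.3f}, t = {t_stat:.3f}, "
          f"DF = {dof}, {verdict} at the 5% level")
```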

C50 is calculated for further analysis of SI in all

sample rooms. It can be obtained by substituting the

RT60 value into Equation (3).

Table 7. C50 at various critical distances at one-octave center frequency 250 Hz to 2000 Hz

C50 (dB) by frequency (Hz)

Sample room   Distance (m)    250   500   1000   2000
Room 1        1.04            2     2     2      2
              2.00            1     2     2      2
              4.00            1     2     2      2
              6.00            1     2     3      2
              Average RT60    1     2     2      2
Room 2        1.04            0     2     3      2
              2.00            1     2     3      2
              4.00            2     3     2      3
              6.00            2     3     2      2
Room 3        1.20            11    0     0      1
              2.00            15    1     0      1
              4.00            14    1     1      1
              6.00            9     0     1      2
Room 4        1.30            49    2     2      2
              2.00            20    1     2      2
              4.00            13    1     2      3
              6.00            18    0     1      1

*All values of C50 have been rounded to the nearest whole number.*

Figure 6. Graph of C50 against frequency at average RT60 (plot of C50 in dB against frequency in Hz at 250, 500, 1000 and 2000 Hz; series: Room 1, Room 2, Room 3 and Room 4)

The results for C50 are tabulated in Table 7 and illustrated in Figure 6, which show that the lowest value of C50 is 0 dB. This indicates that all rooms have good SI, as the clarity is greater than -2 dB. The average C50 for Room 1 and Room 2 is the same, at 2 dB, while Room 3 and Room 4 differ, at 4 dB and 7 dB respectively. Thus, rooms with the same RT60 yield the same SI score. C50 is also lower at lower frequencies for Room 1 and Room 2, which indicates that words starting with a consonant have better clarity than words starting with a vowel, while Room 3 and Room 4 give a higher C50 at lower frequencies, indicating that words starting with a vowel have better SI than words starting with a consonant.
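The room averages quoted above can be reproduced from Table 7. The sketch below assumes that each average is taken over all 16 entries of a room (four distances by four frequency bands) and then rounded to a whole number; the variable names are hypothetical.

```python
# C50 (dB) from Table 7: one row per distance, columns are 250, 500, 1000, 2000 Hz.
C50_TABLE = {
    "Room 1": [[2, 2, 2, 2], [1, 2, 2, 2], [1, 2, 2, 2], [1, 2, 3, 2]],
    "Room 2": [[0, 2, 3, 2], [1, 2, 3, 2], [2, 3, 2, 3], [2, 3, 2, 2]],
    "Room 3": [[11, 0, 0, 1], [15, 1, 0, 1], [14, 1, 1, 1], [9, 0, 1, 2]],
    "Room 4": [[49, 2, 2, 2], [20, 1, 2, 2], [13, 1, 2, 3], [18, 0, 1, 1]],
}


def room_average(rows) -> float:
    """Mean of every entry in a room's distance-by-frequency grid."""
    values = [value for row in rows for value in row]
    return sum(values) / len(values)


average_c50 = {room: round(room_average(rows)) for room, rows in C50_TABLE.items()}
for room, value in average_c50.items():
    print(f"{room}: average C50 = {value} dB")
```

Note that the averages for Room 3 and Room 4 are dominated by the large 250 Hz entries; at 500 Hz and above, all four rooms lie between 0 dB and 3 dB.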

Table 8. D50 at various critical distances at one-octave center frequency 250 Hz to 2000 Hz

D50 (%) by word

Sample room   Distance (m)   AMAN   DAPAT   NADI   SAYA   TAHUN   USIA
Room 1        1.04           25     18      0      18     39      1
              2.00           29     13      0      17     33      0
              4.00           10     7       0      13     46      0
              6.00           12     12      0      16     0       0
Room 2        1.04           28     16      2      22     40      1
              2.00           32     12      0      17     47      1
              4.00           58     8       0      11     27      0
              6.00           70     12      72     72     37      68
Room 3        1.20           32     3       9      0      20      11
              2.00           22     12      0      0      3       57
              4.00           12     4       0      0      4       17
              6.00           7      1       0      0      2       23
Room 4        1.30           21     6       6      39     6       52
              2.00           33     20      0      2      8       67
              4.00           10     5       0      0      3       80
              6.00           4      1       1      0      6       49

Table 8 shows the values of D50 for all sample rooms. To achieve good SI quality, D50 needs to exceed 20%. In Room 1, the words AMAN and TAHUN give a D50 greater than 20% at distances up to 2 m and 4 m respectively, while in Room 2 they do so at all distances. These words give better SI than the other words. The word NADI, whose phoneme is nasal, shows the lowest D50 in both rooms, which indicates that the nasal is the phoneme most affected in reverberant rooms. Thus, rooms with the same RT60 give the same quality of SI for each phoneme.
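The 20% criterion can be applied to Table 8 mechanically. The sketch below lists, for Room 1 and Room 2, the distances at which a given word exceeds the threshold; the data are copied from Table 8 and the function name is hypothetical.

```python
WORDS = ["AMAN", "DAPAT", "NADI", "SAYA", "TAHUN", "USIA"]

# D50 (%) from Table 8, keyed by room and then by distance (m).
D50_TABLE = {
    "Room 1": {
        1.04: [25, 18, 0, 18, 39, 1],
        2.00: [29, 13, 0, 17, 33, 0],
        4.00: [10, 7, 0, 13, 46, 0],
        6.00: [12, 12, 0, 16, 0, 0],
    },
    "Room 2": {
        1.04: [28, 16, 2, 22, 40, 1],
        2.00: [32, 12, 0, 17, 47, 1],
        4.00: [58, 8, 0, 11, 27, 0],
        6.00: [70, 12, 72, 72, 37, 68],
    },
}


def distances_above(room: str, word: str, threshold: float = 20.0):
    """Distances (m) at which the word's D50 exceeds the threshold in that room."""
    column = WORDS.index(word)
    return [distance for distance, row in D50_TABLE[room].items()
            if row[column] > threshold]


for room in D50_TABLE:
    for word in ("AMAN", "TAHUN", "NADI"):
        print(f"{room}, {word}: D50 > 20% at {distances_above(room, word)} m")
```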

Room 3 and Room 4 give good SI in terms of definition for the word USIA, which starts with a vowel, while the worst SI is for the word SAYA (fricative) in Room 3 and NADI (nasal) in both rooms. Thus, rooms with different RT60 give different D50 scores for different types of phoneme: the fricative is the most affected phoneme in Room 3, while reverberation most affects the fricative and nasal phonemes in Room 4.

Table 9. STIPA value for all sample rooms

STIPA by word

Sample room   Distance (m)   AMAN   DAPAT   NADI   SAYA   TAHUN   USIA
Room 1        1.04           0.55   0.64    0.72   0.57   0.66    0.57
              2.00           0.52   0.60    0.65   0.58   0.60    0.55
              4.00           0.50   0.53    0.58   0.52   0.59    0.46
              6.00           0.53   0.49    0.65   0.44   0.58    0.49
Room 2        1.04           0.57   0.63    0.72   0.57   0.67    0.55
              2.00           0.54   0.61    0.67   0.59   0.60    0.55
              4.00           0.49   0.55    0.58   0.46   0.59    0.46
              6.00           0.49   0.47    0.56   0.46   0.58    0.50
Room 3        1.20           0.51   0.57    0.66   0.53   0.57    0.45
              2.00           0.48   0.51    0.56   0.45   0.46    0.41
              4.00           0.47   0.44    0.51   0.39   0.44    0.40
              6.00           0.36   0.39    0.47   0.43   0.44    0.45
Room 4        1.30           0.48   0.52    0.61   0.50   0.62    0.55
              2.00           0.44   0.47    0.55   0.45   0.41    0.45
              4.00           0.46   0.46    0.51   0.44   0.47    0.49
              6.00           0.43   0.38    0.44   0.39   0.44    0.40

Table 9 shows the STIPA values for Room 1, Room 2, Room 3 and Room 4. The words DAPAT, NADI and TAHUN give good SI at the critical distance in both Room 1 and Room 2. The word SAYA gives poor SI at 6 m in both rooms. As the distance increases, the SI drops. The nasal and plosive phonemes give good SI, while the fricative phoneme gives poor SI as it is affected by the reverberation in the rooms.

STIPA indicates poor SI for the words DAPAT, TAHUN and SAYA at 4 m and 6 m in Room 3, while in Room 4 the word with poor SI is SAYA. This shows that the plosive and fricative phonemes are the most affected in Room 3, while only the fricative phoneme is affected in Room 4. Thus, rooms with the same reverberation time have the same SI score, and vice versa.
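As a cross-check of the STIPA discussion, the sketch below picks out the words that reach the Good band of Table 2 (STIPA of at least 0.60, with the boundary treated as inclusive by assumption) at the critical distance of each room. The values are copied from Table 9 and the function name is hypothetical.

```python
WORDS = ["AMAN", "DAPAT", "NADI", "SAYA", "TAHUN", "USIA"]

# STIPA at the critical distance of each room, from Table 9.
STIPA_AT_CRITICAL_DISTANCE = {
    "Room 1": [0.55, 0.64, 0.72, 0.57, 0.66, 0.57],
    "Room 2": [0.57, 0.63, 0.72, 0.57, 0.67, 0.55],
    "Room 3": [0.51, 0.57, 0.66, 0.53, 0.57, 0.45],
    "Room 4": [0.48, 0.52, 0.61, 0.50, 0.62, 0.55],
}

GOOD_THRESHOLD = 0.60  # lower edge of the "Good" band in Table 2


def good_words(room: str):
    """Words whose STIPA at the critical distance is at least GOOD_THRESHOLD."""
    values = STIPA_AT_CRITICAL_DISTANCE[room]
    return [word for word, value in zip(WORDS, values) if value >= GOOD_THRESHOLD]


for room in STIPA_AT_CRITICAL_DISTANCE:
    print(f"{room}: {', '.join(good_words(room))}")
```

With this threshold, Room 1 and Room 2 return the same three words, while Room 3 and Room 4 return different sets.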

4 CONCLUSION

From analysis of the results, it is concluded that:

a) The RT60 of Room 1 and Room 2 are not significantly different, both being 1.1 s, while the RT60 of Room 3 and Room 4 are significantly different, at 1.5 s and 1.1 s respectively.

b) The C50 value exceeds -2 dB at all distances in all rooms, which indicates good SI. Rooms with the same RT60 yield the same SI score in terms of clarity, and vice versa.

c) In terms of D50, reverberation most affects the nasal phoneme in Room 1 and Room 2, the fricative phoneme in Room 3, and the nasal and fricative phonemes in Room 4.

d) For STIPA, the nasal and plosive phonemes give good SI in Room 1, Room 2 and Room 4, while only the nasal phoneme gives good SI in Room 3.

e) Rooms with similar RT60 give the same quality of SI at all distances.

ACKNOWLEDGMENTS

This research is supported by the Ministry of

Education (MOE) Malaysia, Universiti Teknologi

Malaysia and Fundamental Research Grant Scheme

(FRGS) by Universiti Teknologi Malaysia (Vote. No.

4F870).

REFERENCES

Everest, F. A. (2001). Master Handbook of Acoustics. McGraw-Hill.

Mikulski, W., Radosz, J. (2011). "Acoustics of Classrooms in Primary Schools - Results of the Reverberation Time and the Speech Transmission Index Assessments in Selected Buildings". Archives of Acoustics, 36(4), 777-793.

Sodsri, C. (2012). "Effects of Classroom Reverberation and Listener's Locations to Speech Intelligibility". 2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, Phetchaburi, pp. 1-4.

Rossing, T. (2014). Springer Handbook of Acoustics. 2nd Edition. New York: Springer Dordrecht Heidelberg.

Andrijasevic, A., Ivancevic, B., Horvat, M. (2012). "Evaluation of Speech Intelligibility in Two Acoustically Different Spaces Using Logatome Test and Measured Impulse Response". Engineering Review, 32, 78-85.

Borwick, J. (2001). Loudspeaker and Headphone Handbook. 3rd Edition. Woburn, Massachusetts: Reed Educational and Professional Publishing LTD.

Hall, D. E. (1993). Basic Acoustics. 2nd Edition. New York: John Wiley and Sons.

Kassim, D. H., Putra, A., Nor, M. J. M. (2015). "The Acoustical Characteristics of the Sayyidina Abu Bakar Mosque, UTEM". Journal of Engineering Science and Technology, 10(1), 97-110.

Shimokura, R., Matsui, T., Takaki, T., Nishimura, T., Yamanaka, T., Hosoi, H. (2014). "Evaluation of Speech Intelligibility in Short-Reverberant Sound Fields". Auris Nasus Larynx, 41(4), 343-349.

Tench, P. (2011). Transcribing the Sound of English. Cambridge University Press.

Hodgson, M., Nosal, E. M. (2001). "Effect of noise and occupancy on optimal reverberation times for speech intelligibility in classrooms". Journal of Acoustical Society of America, 111(2), 931-939.

Kristiansen, J., Lund, S. P., Nielsen, P. M., Persson, R., Shibuya, H. (2011). "Determinants of noise annoyance in teachers from schools with different classroom reverberation times". Journal of Environmental Psychology, 31, 383-392.

Berardi, U. (2012). "A Double Synthetic Index to Evaluate the Acoustics of Churches". Archives of Acoustics, 37(4), 521-528.
