Proceedings of the International Conference on Vibration, Sound and System Dynamics Penang, 2 August 2017
Evaluation of Speech Intelligibility of Malay Words in Terms of Reverberation
in Medium Classroom
Mokhtar Harun, Nadia Listari Abdul Rahim*, Khairunnisa Mohd Yusof, Mohamad Ngasri Dimon,
Puspa Inayat Khalid, Siti Zaleha Abdul Hamid
Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Johor, MALAYSIA.
Nazli Che Din
Faculty of Built Environment, University of Malaya, Kuala Lumpur, MALAYSIA.
Abstract: It is generally accepted that reverberation in a classroom affects the speech intelligibility (SI) score. However,
the effect of reverberation has yet to be quantified in terms of SI parameters. The purpose of this paper is to quantify the
effect of reverberation (in terms of reverberation time, RT60) on the SI parameters Clarity (C50), Definition (D50) and
Speech Transmission Index for Public Address systems (STIPA). Four classrooms were chosen for the study, with RT60 of
1.1 s (Room 1 and Room 2), 1.5 s (Room 3) and 1.1 s (Room 4). Six words covering four manners of articulation
were played and recorded in the four classrooms. The recorded sound waveforms were analyzed using established
formulae and the DIRAC software. The average clarity for Room 1 and Room 2 is similar, while Room 4 has higher C50
than Room 3. D50 gives good definition at the critical distance for the same words in Room 1 and Room 2, and for different
words in Room 3 and Room 4. STIPA gives good SI for the same words in Room 1 and Room 2, and for different words in
Room 3 and Room 4. Thus, rooms with the same RT60 give the same SI scores.
Keywords: Speech intelligibility, reverberation, clarity, definition, STI
1 INTRODUCTION
The goal of delivering speech is to ensure acceptable
speech intelligibility, in terms of the clarity of the
speech heard by the audience. Speech intelligibility is
the level of understanding of a word under a given
condition. The intelligibility of speech can be affected
by noise from ventilation equipment such as fans and
air conditioners. If noise from ventilation equipment
is treated so that the noise level becomes low and
bearable, the intelligibility of speech in that room
can be restored.
In addition, SI in a room can be affected by the
reverberant level of the room. Reverberant sound
level is quantified in terms of reverberation time
(RT60). SI is inversely related to reverberation time
(RT60) and directly related to signal-to-noise ratio
(SNR) (Hodgson and Nosal, 2001).
A lecture hall is finite in dimension, and speech is
its main acoustic concern. This study investigates the
speech intelligibility in four lecture rooms, two of
which have the same volume and two of which have
different volumes (Room 1 and Room 2 are 369 m3
each, while Room 3 and Room 4 are 634 m3 and
562 m3 respectively). The volume of a room is one of
the factors that influence the reverberant level in the
room (Mikulski and Radosz, 2011). The finite
dimensions of these rooms contribute to the reflection
of sound waves, which then produces reverberation
(Everest, 2001). Reverberation affects the acoustic
performance of the room, which is measured in terms
of acoustic parameters.
Table 1. Recommended RT60 values in classrooms (Mikulski and Radosz, 2011).

| Room type | Optimum reverberation time, Topt (s) | Tolerance range for reverberation time, T |
|---|---|---|
| Classrooms with volume V = 30 – 1000 m3 | Topt = 0.32 log(V) – 0.17 | 0.65 Topt < T < 1.2 Topt (for 125 and 4000 Hz bands); 0.8 Topt < T < 1.2 Topt (for 250, 500, 1000 and 2000 Hz bands) |
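As an illustration of how Table 1 could be applied to the room volumes used in this study, the following minimal sketch evaluates Topt and the tolerance range. It assumes that "log" in Table 1 is the base-10 logarithm; the function names `optimum_rt` and `tolerance_range` are hypothetical and do not come from the cited reference.

```python
import math

# Octave bands (Hz) that Table 1 gives tolerance ranges for.
OUTER_BANDS = (125, 4000)
INNER_BANDS = (250, 500, 1000, 2000)


def optimum_rt(volume_m3):
    """Optimum reverberation time Topt (s), assuming log means log10."""
    if not 30 <= volume_m3 <= 1000:
        raise ValueError("Table 1 covers classrooms with V = 30 to 1000 m3")
    return 0.32 * math.log10(volume_m3) - 0.17


def tolerance_range(volume_m3, band_hz):
    """Return (lower, upper) limits for T (s) in the given octave band."""
    if band_hz in OUTER_BANDS:
        lower_factor = 0.65
    elif band_hz in INNER_BANDS:
        lower_factor = 0.8
    else:
        raise ValueError("band not listed in Table 1")
    t_opt = optimum_rt(volume_m3)
    return lower_factor * t_opt, 1.2 * t_opt


# Room volumes quoted in the Introduction.
for room, volume in [("Room 1 and Room 2", 369), ("Room 3", 634), ("Room 4", 562)]:
    low, high = tolerance_range(volume, 1000)
    print(f"{room}: Topt = {optimum_rt(volume):.2f} s, "
          f"tolerance at 1000 Hz = {low:.2f} to {high:.2f} s")
```

Under the base-10 assumption, a 369 m3 room gives Topt of about 0.65 s.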
In order to define the acoustic parameters, the energy
decay curve (EDC) derived from the impulse response
is used. The time taken for the EDC to decay by 60 dB
is called the reverberation time. A lower RT60 gives
better SI in the room because the effect of overlap
masking is smaller (Sodsri, 2012). Given the volume of
the room (V), the surface area of the room (S) and the
absorption coefficient of the materials used in the room
(α), RT60 can be estimated using Sabine’s formula, as
shown in Equation (1).
$$RT_{60} = \frac{0.161\,V}{S\,\alpha} \qquad (1)$$
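A minimal sketch of Sabine's estimate is given below, assuming the usual form RT60 = 0.161 V / (S α) with V in m3 and S in m2. The surface area and absorption coefficient in the example are hypothetical illustration values, not measurements from this study; only the 369 m3 volume and the 1.1 s target are taken from the paper.

```python
def sabine_rt60(volume_m3, surface_m2, alpha):
    """Sabine estimate of RT60 (s) from volume, surface area and mean absorption."""
    if surface_m2 <= 0 or alpha <= 0:
        raise ValueError("surface area and absorption coefficient must be positive")
    return 0.161 * volume_m3 / (surface_m2 * alpha)


def required_absorption(volume_m3, target_rt60_s):
    """Total absorption S*alpha (m2) needed to reach a target RT60."""
    if target_rt60_s <= 0:
        raise ValueError("target RT60 must be positive")
    return 0.161 * volume_m3 / target_rt60_s


# Hypothetical surface area (330 m2) and mean absorption (0.16) for a 369 m3 room.
print(f"Estimated RT60: {sabine_rt60(369, 330, 0.16):.2f} s")

# Absorption implied by an RT60 of 1.1 s in a 369 m3 room.
print(f"Required absorption: {required_absorption(369, 1.1):.1f} m2")
```

The second call inverts the same formula, which is a convenient way to see how much total absorption a measured RT60 implies.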
RT60 plays a different role in different types of space.
For instance, a classroom needs a low RT60 to
prevent the delivered speech from being affected by
reflected sound, while an orchestra hall needs a higher
RT60 for the music to sound livelier, as shown in
Figure 1 below.
Figure 1. RT60 in different room purposes (Rossing,
2014).
A listener sitting at the front of the class,
approximately at the critical distance, will experience
better speech clarity than one in the middle or at the
back of the class, near the back wall (Sodsri, 2012).
The solid wall gives stronger reflection of sound, thus
producing higher RT60. Higher RT60 gives lower SI
quality (Andrijasevic et al., 2012).
Critical distance (dc) is defined as the distance from
the sound source at which the direct and reflected
energies are equal (Borwick, 2001). At locations closer
to the source than dc, the reflected sound is small
relative to the direct sound. The critical distance, in
metres, can be calculated using Equation (2) below,
where V is the volume of the room and RT60 is the
reverberation time of the room (Hall, 1993).
$$d_c\,(\mathrm{m}) = 0.1\sqrt{\frac{V}{\pi\,RT_{60}}} \qquad (2)$$
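The critical distance calculation can be sketched as follows, assuming Equation (2) has the form dc = 0.1 sqrt(V / (π RT60)). This form is an assumption recovered from the text; it reproduces, to within rounding, the critical distances reported in Section 3. The function name `critical_distance` is hypothetical.

```python
import math


def critical_distance(volume_m3, rt60_s):
    """Critical distance (m), assuming dc = 0.1 * sqrt(V / (pi * RT60))."""
    if volume_m3 <= 0 or rt60_s <= 0:
        raise ValueError("volume and RT60 must be positive")
    return 0.1 * math.sqrt(volume_m3 / (math.pi * rt60_s))


# Volumes from the Introduction and RT60 values from the abstract.
rooms = {
    "Room 1": (369, 1.1),
    "Room 2": (369, 1.1),
    "Room 3": (634, 1.5),
    "Room 4": (562, 1.1),
}

for name, (volume, rt60) in rooms.items():
    print(f"{name}: dc = {critical_distance(volume, rt60):.2f} m")
```

Small differences from the reported values are expected because the RT60 inputs here are rounded to one decimal place.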
Another parameter used to analyse acoustic
performance is clarity (C50). C50 is the logarithmic
ratio of the sound energy received within the first
50 ms to the energy arriving after 50 ms (Kassim et
al., 2015). Reflected sound that arrives before 50 ms
reinforces the direct sound, while sound arriving later
than that masks it and degrades the SI of the delivered
speech (Kristiansen et al., 2011). C50 higher than 0 dB
is optimal for SI (Berardi, 2012). The relationship
between C50 and SI is shown in Table 1.
Table 1. Relationship between C50 and SI (Shimokura et al., 2014)

| Bad | Poor | Fair | Good | Excellent |
|---|---|---|---|---|
| C50 < -7 dB | -7 dB < C50 < -2 dB | -2 dB < C50 < 2 dB | 2 dB < C50 < 7 dB | C50 > 7 dB |
C50 can be obtained by calculation using Equation (3)
below (Rossing, 2014).
$$C_{50} = 10\log_{10}\left(e^{1.104/RT_{60}} - 1\right) \qquad (3)$$
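A minimal sketch of the C50 calculation and its SI rating follows. It assumes that Equation (3) reads C50 = 10 log10(exp(1.104 / RT60) - 1); the constant 1.104 is read from the extracted equation and is an assumption, although it reproduces the rounded values in Table 7. The rating limits are those of the C50 table above, and assigning boundary values to the higher category is also an assumption, since the table uses strict inequalities.

```python
import math


def c50_from_rt60(rt60_s):
    """C50 (dB), assuming C50 = 10*log10(exp(1.104 / RT60) - 1)."""
    if rt60_s <= 0:
        raise ValueError("RT60 must be positive")
    return 10 * math.log10(math.exp(1.104 / rt60_s) - 1)


def si_rating_from_c50(c50_db):
    """SI rating from C50; boundary values go to the higher category (assumption)."""
    if c50_db < -7:
        return "Bad"
    if c50_db < -2:
        return "Poor"
    if c50_db < 2:
        return "Fair"
    if c50_db < 7:
        return "Good"
    return "Excellent"


# RT60 values taken from the measurements reported in Table 6.
for rt60 in (1.10, 1.50, 0.44):
    c50 = c50_from_rt60(rt60)
    print(f"RT60 = {rt60:.2f} s -> C50 = {c50:.1f} dB ({si_rating_from_c50(c50)})")
```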
Acoustic performance can also be analysed by
definition (D50). D50 is similar to C50 but is
expressed in % instead of dB. It is the ratio of the
early sound energy, arriving within the first 50 ms
after the direct sound, to the total sound energy
(Kassim et al., 2015). Figure 2 shows the relationship
between D50 and SI; D50 needs to exceed 20% for
good SI. SI increases with D50 (Kassim et al., 2015).
Figure 2. D50 against SI (Mikulski and Radosz, 2011)
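Because C50 is the early-to-late energy ratio and D50 is the early-to-total energy ratio for the same 50 ms limit, the two are linked by C50 = 10 log10(D50 / (1 - D50)), with D50 expressed as a fraction. The sketch below shows this conversion; the function names are hypothetical.

```python
import math


def c50_from_d50(d50_percent):
    """Convert D50 (%) to C50 (dB) using C50 = 10*log10(D / (1 - D))."""
    d = d50_percent / 100.0
    if not 0 < d < 1:
        raise ValueError("D50 must be strictly between 0 and 100 %")
    return 10 * math.log10(d / (1 - d))


def d50_from_c50(c50_db):
    """Convert C50 (dB) back to D50 (%)."""
    ratio = 10 ** (c50_db / 10)  # early-to-late energy ratio
    return 100 * ratio / (1 + ratio)


# The 20 % threshold quoted for good SI, and the 50 % point where C50 = 0 dB.
for d50 in (20, 50):
    print(f"D50 = {d50} % -> C50 = {c50_from_d50(d50):.2f} dB")
```

The 20% threshold quoted above corresponds to a C50 of about -6 dB under this relation.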
SI quality can also be measured using the Speech
Transmission Index (STI). It measures SI by accounting
for background noise and reverberant level during
sound transmission (Borwick, 2001). The STI value
ranges from 0 to 1, where 0 indicates bad SI and 1
indicates excellent SI, as shown in Table 2 below. The
desired STI value for lecture halls is at least 0.6, to
give a high level of comprehension during speech
transmission (Rossing, 2014).
Table 2. STI meter for SI (Mikulski and Radosz, 2011)

| STI | 0-0.3 | 0.30-0.45 | 0.45-0.60 | 0.60-0.75 | 0.75-1.0 |
|---|---|---|---|---|---|
| Subjective speech clarity evaluation | Unintelligible | Poor | Fair | Good | Excellent |
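The STI scale in Table 2 can be expressed as a small lookup, sketched below. Table 2 lists shared endpoints (for example 0.45 closes one band and opens the next), so placing each boundary value in the higher category is an assumption, as are the function names.

```python
def sti_rating(sti):
    """Subjective rating for an STI or STIPA value, following Table 2."""
    if not 0 <= sti <= 1:
        raise ValueError("STI must lie between 0 and 1")
    if sti < 0.30:
        return "Unintelligible"
    if sti < 0.45:
        return "Poor"
    if sti < 0.60:
        return "Fair"
    if sti < 0.75:
        return "Good"
    return "Excellent"


def meets_lecture_hall_target(sti, target=0.6):
    """True when the value reaches the 0.6 target quoted for lecture halls."""
    return sti >= target


for value in (0.72, 0.55, 0.44):
    print(f"STI = {value:.2f}: {sti_rating(value)}, "
          f"meets target: {meets_lecture_hall_target(value)}")
```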
STIPA is used to measure SI when a Public Address
(PA) system is installed. It allows accurate testing
with a portable instrument. The STIPA scale is the
same as the STI scale.
SI quality also depends on the Manner of Articulation
(MoA). MoA for Malay words can be categorized into
seven types (Table 3): plosive, fricative, affricate,
nasal, trill, lateral and approximant.
Table 3. Manner of Articulation (Tench, 2011).

| Type of MoA | Definition | Example Malay word |
|---|---|---|
| Plosive | The air flow is stopped at some point and released suddenly | tadi |
| Fricative | Air is forced through a narrow gap to produce a hissing sound | sama |
| Affricate | Speech sound consisting of a plosive and a fricative articulated at the same place of articulation | jika |
| Nasal | Produced through the nose with the mouth closed | mana |
| Trill | Articulation with a rapid flutter of the tongue against the palate | rasa |
| Lateral | Air passage in the centre is blocked | lari |
| Approximant | The airflow is blocked by a small amount | warna |
In order to investigate the SI in the lecture rooms,
alveolar, plosive, fricative and nasal phonemes were
chosen based on a previous study. The chosen words
are tabulated in Table 4 below.
Table 4. Words used in the experiments

| MoA chosen | Selected word |
|---|---|
| Plosive | DAPAT |
| Plosive | TAHUN |
| Nasal | NADI |
| Fricative | SAYA |
| Vowel | AMAN |
| Vowel | USIA |
2 METHODOLOGY
The six Malay words chosen are AMAN, DAPAT,
NADI, SAYA, TAHUN and USIA. These words are
phonetically balanced, with manners of articulation
(MoA) of plosive, nasal, fricative and vowel.
The sample rooms used in this study are four lecture
rooms in Fakulti Kejuruteraan Elektrik (FKE),
Universiti Teknologi Malaysia Johor Bahru (UTM
Skudai): P05-105, P05-104, P16 room Demo 2 and
P16 room BKT 2. The top view and the side view of
the rooms were obtained. An example is shown in
Table 5 below.
Table 5. Layout of sample room
Side view Top view
RT60 was measured using a SOLO sound meter.
The RT60 of each room was measured at the critical
distance, 2 m, 4 m and 6 m from the lecture stage,
where the sound source was placed. For this study to
be valid, there should be no statistically significant
difference in RT60 between Room 1 and Room 2,
while the difference in RT60 between Room 3 and
Room 4 needs to be statistically significant.
The speech intelligibility of the recorded words was
then analyzed by using DIRAC software. The steps
are shown in the flowchart in Figure 3 below.
Figure 3. Overall flow of project
3 RESULTS AND DISCUSSION
The results of measuring SI were collected in four
parameters which are RT60, C50, D50 and STIPA.
From Equation (2), critical distance obtained in this
experiment is 1.04m for both Room 1 and Room 2,
1.2m for Room 3 and 1.3m for Room 4.
Steps shown in the flowchart of Figure 3:
1) Literature review on related research
2) Understanding MoA of Malay words
3) Selecting 6 Malay words with 2 syllables for 4 types of MoA
4) Finding four sample rooms, two of which have the same dimensions while the others have different dimensions
5) Measuring RT60 in all sample rooms by using SOLO
6) Recording the selected words in both rooms at critical distance, 2m, 4m and 6m by using RPA
7) Analyzing results using DIRAC software
(a) Paired t-test, Hypothesized Difference = 0

| Pair | Mean Diff. | DF | t-Value | P-Value |
|---|---|---|---|---|
| RT60 Room 1, RT60 Room 2 | .023 | 3 | .940 | .4166 |

(b) Paired t-test, Hypothesized Difference = 0

| Pair | Mean Diff. | DF | t-Value | P-Value |
|---|---|---|---|---|
| RT60 Room 3, RT60 Room 4 | .230 | 3 | 3.860 | .0307 |
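Table 6. RT60 at various critical distances at one-octave center frequency 250 Hz to 2000 Hz.

RT60 (s) by octave-band frequency:

| Sample room | Distance (m) | 250 Hz | 500 Hz | 1000 Hz | 2000 Hz |
|---|---|---|---|---|---|
| Room 1 | 1.04 | 1.18 | 1.10 | 1.10 | 1.10 |
| Room 1 | 2.00 | 1.28 | 1.10 | 1.10 | 1.20 |
| Room 1 | 4.00 | 1.41 | 1.16 | 1.09 | 1.13 |
| Room 1 | 6.00 | 1.44 | 1.22 | 1.08 | 1.12 |
| Room 1 | Average | 1.33 | 1.15 | 1.09 | 1.14 |
| Room 2 | 1.04 | 1.58 | 1.25 | 1.08 | 1.14 |
| Room 2 | 2.00 | 1.45 | 1.10 | 1.06 | 1.10 |
| Room 2 | 4.00 | 1.24 | 0.95 | 1.09 | 1.08 |
| Room 2 | 6.00 | 1.15 | 0.95 | 1.14 | 1.09 |
| Room 2 | Average | 1.36 | 1.06 | 1.09 | 1.10 |
| Room 3 | 1.20 | 0.44 | 1.58 | 1.63 | 1.36 |
| Room 3 | 2.00 | 0.32 | 1.47 | 1.55 | 1.33 |
| Room 3 | 4.00 | 0.35 | 1.38 | 1.47 | 1.27 |
| Room 3 | 6.00 | 0.50 | 1.62 | 1.43 | 1.2 |
| Room 3 | Average | 0.45 | 1.48 | 1.50 | 1.33 |
| Room 4 | 1.30 | 0.11 | 1.10 | 1.24 | 1.18 |
| Room 4 | 2.00 | 0.24 | 1.35 | 1.10 | 1.11 |
| Room 4 | 4.00 | 0.36 | 1.35 | 1.10 | 1.08 |
| Room 4 | 6.00 | 0.26 | 1.54 | 1.28 | 1.39 |
| Room 4 | Average | 0.23 | 1.35 | 1.10 | 1.16 |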
Table 6 shows the RT60 of all sample rooms,
measured at four distinct locations in each room.
The average values of RT60 were taken at 250 Hz,
500 Hz, 1000 Hz and 2000 Hz because acoustical
characteristics such as the noise reduction coefficient
(NRC) are defined at those frequencies.
[Plot: RT60 (s) against frequency (Hz) at 250, 500, 1000 and 2000 Hz for Room 1, Room 2, Room 3 and Room 4; vertical axis from 0 to 1.6 s]
Figure 4. Graph of RT60 for Room 1 and Room 2
The graphical data in Figure 4 above show that the
RT60 of Room 1 and Room 2 are similar, at 1.1 s,
while those of Room 3 and Room 4 differ, at 1.5 s
and 1.1 s respectively (taken at 1000 Hz). The
similarity and the difference are supported by a
statistical significance test, a paired-comparison test
performed using STATVIEW software, as indicated by
the p-value.
Figure 5. (a) Paired-comparison test between RT60 at
Room 1 and RT60 at Room 2. (b) Paired-comparison test between RT60 at Room 3 and RT60 at Room 4
Figure 5(a) and Figure 5(b) above show the
paired-comparison tests for RT60 at Room 1, Room 2,
Room 3 and Room 4, performed using STATVIEW
software. For a difference to be statistically
insignificant, the p-value needs to be greater than
0.05. The p-value for RT60 between Room 1 and
Room 2 is 0.4166, which is greater than 0.05. Thus,
the RT60 of both rooms are considered similar, as the
difference is statistically insignificant. In contrast,
the p-value for RT60 between Room 3 and Room 4 is
0.0307, which is less than 0.05, indicating that the
two rooms differ in RT60. The value of RT60 is taken
at 1000 Hz, as acoustic measurements usually refer to
that frequency.
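The paired-comparison test can be sketched without a statistics package, as below. This is not the STATVIEW procedure itself: it is a plain paired t-test applied to the rounded per-band average RT60 values in Table 6, so the Room 1 and Room 2 figures differ slightly from those in Figure 5(a). The p-value function is the closed-form Student's t distribution for exactly 3 degrees of freedom and does not generalise to other sample sizes; all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (mean difference, degrees of freedom, t-value) for paired samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return mean(diffs), n - 1, t_value


def two_sided_p_df3(t_value):
    """Two-sided p-value of Student's t with exactly 3 degrees of freedom."""
    x = abs(t_value) / math.sqrt(3)
    cdf = 0.5 + (x / (1 + x * x) + math.atan(x)) / math.pi
    return 2 * (1 - cdf)


# Average RT60 (s) at 250, 500, 1000 and 2000 Hz, copied from Table 6.
room1 = [1.33, 1.15, 1.09, 1.14]
room2 = [1.36, 1.06, 1.09, 1.10]
room3 = [0.45, 1.48, 1.50, 1.33]
room4 = [0.23, 1.35, 1.10, 1.16]

for label, a, b in [("Room 1 vs Room 2", room1, room2),
                    ("Room 3 vs Room 4", room3, room4)]:
    mean_diff, df, t_value = paired_t(a, b)
    p_value = two_sided_p_df3(t_value)
    verdict = "different" if p_value < 0.05 else "not significantly different"
    print(f"{label}: mean diff = {mean_diff:.3f}, DF = {df}, "
          f"t = {t_value:.3f}, p = {p_value:.4f} ({verdict})")
```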
C50 is calculated for further analysis of SI in all
sample rooms. It can be obtained by substituting the
RT60 value into Equation (3).
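Table 7. C50 at various critical distances at one-octave center frequency 250 Hz to 2000 Hz

C50 (dB) by octave-band frequency:

| Sample room | Distance (m) | 250 Hz | 500 Hz | 1000 Hz | 2000 Hz |
|---|---|---|---|---|---|
| Room 1 | 1.04 | 2 | 2 | 2 | 2 |
| Room 1 | 2.00 | 1 | 2 | 2 | 2 |
| Room 1 | 4.00 | 1 | 2 | 2 | 2 |
| Room 1 | 6.00 | 1 | 2 | 3 | 2 |
| Room 1 | Average RT60 | 1 | 2 | 2 | 2 |
| Room 2 | 1.04 | 0 | 2 | 3 | 2 |
| Room 2 | 2.00 | 1 | 2 | 3 | 2 |
| Room 2 | 4.00 | 2 | 3 | 2 | 3 |
| Room 2 | 6.00 | 2 | 3 | 2 | 2 |
| Room 2 | Average RT60 | 1 | 3 | 2 | 2 |
| Room 3 | 1.20 | 11 | 0 | 0 | 1 |
| Room 3 | 2.00 | 15 | 1 | 0 | 1 |
| Room 3 | 4.00 | 14 | 1 | 1 | 1 |
| Room 3 | 6.00 | 9 | 0 | 1 | 2 |
| Room 3 | Average RT60 | 10 | 0 | 0 | 1 |
| Room 4 | 1.30 | 49 | 2 | 2 | 2 |
| Room 4 | 2.00 | 20 | 1 | 2 | 2 |
| Room 4 | 4.00 | 13 | 1 | 2 | 3 |
| Room 4 | 6.00 | 18 | 0 | 1 | 1 |
| Room 4 | Average RT60 | 21 | 1 | 2 | 2 |

*All values of C50 have been rounded off to the nearest integer.*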
[Plot: C50 (dB) against frequency (Hz) at 250, 500, 1000 and 2000 Hz for Room 1, Room 2, Room 3 and Room 4; vertical axis from 0 to 25 dB]
Figure 6. Graph of C50 against frequency at average RT60
The results for C50, tabulated in Table 7 and
illustrated in Figure 6 above, show that the lowest
value of C50 is 0 dB. This indicates that all rooms
have good SI, as the clarity is greater than -2 dB.
The average C50 for Room 1 and Room 2 is the same,
at 2 dB, while the averages for Room 3 and Room 4
differ, at 4 dB and 7 dB respectively. Thus, rooms
with the same RT60 yield the same SI score. It is
also shown that C50 is lower at lower frequencies for
Room 1 and Room 2, which indicates that words
starting with a consonant have better clarity than
words starting with a vowel, while Room 3 and
Room 4 give higher C50 at lower frequencies,
indicating that words starting with a vowel have
better SI than words starting with a consonant.
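Table 8. D50 for each word at various distances

D50 (%) by word:

| Sample room | Distance (m) | AMAN | DAPAT | NADI | SAYA | TAHUN | USIA |
|---|---|---|---|---|---|---|---|
| Room 1 | 1.04 | 25 | 18 | 0 | 18 | 39 | 1 |
| Room 1 | 2.00 | 29 | 13 | 0 | 17 | 33 | 0 |
| Room 1 | 4.00 | 10 | 7 | 0 | 13 | 46 | 0 |
| Room 1 | 6.00 | 12 | 12 | 0 | 16 | 0 | 0 |
| Room 2 | 1.04 | 28 | 16 | 2 | 22 | 40 | 1 |
| Room 2 | 2.00 | 32 | 12 | 0 | 17 | 47 | 1 |
| Room 2 | 4.00 | 58 | 8 | 0 | 11 | 27 | 0 |
| Room 2 | 6.00 | 70 | 12 | 72 | 72 | 37 | 68 |
| Room 3 | 1.20 | 32 | 3 | 9 | 0 | 20 | 11 |
| Room 3 | 2.00 | 22 | 12 | 0 | 0 | 3 | 57 |
| Room 3 | 4.00 | 12 | 4 | 0 | 0 | 4 | 17 |
| Room 3 | 6.00 | 7 | 1 | 0 | 0 | 2 | 23 |
| Room 4 | 1.30 | 21 | 6 | 6 | 39 | 6 | 52 |
| Room 4 | 2.00 | 33 | 20 | 0 | 2 | 8 | 67 |
| Room 4 | 4.00 | 10 | 5 | 0 | 0 | 3 | 80 |
| Room 4 | 6.00 | 4 | 1 | 1 | 0 | 6 | 49 |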
Table 8 above shows the values of D50 for all sample
rooms. To achieve good SI quality, D50 needs to
exceed 20%. In Room 1, the words AMAN and TAHUN
give D50 greater than 20% at distances up to 2 metres
and 4 metres respectively, while in Room 2 they do so
at all distances. These words give better SI than the
other words. The word NADI, whose phoneme is nasal,
shows the lowest D50 in both rooms, which indicates
that it is the phoneme most affected by reverberation.
Thus, rooms with the same RT60 give the same
quality of SI for each phoneme.
Room 3 and Room 4 give good SI in terms of
definition for the word USIA, which starts with a
vowel, while the worst SI is for the word SAYA
(fricative) in Room 3 and NADI (nasal) in both
rooms. Thus, rooms with different RT60 give
different D50 scores for different types of phonemes:
the fricative phoneme is the most affected in Room 3,
while reverberation most affects the fricative and
nasal phonemes in Room 4.
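The 20% criterion can be checked directly against the tabulated data. The sketch below copies the Room 1 and Room 2 rows of Table 8 and lists the distances at which a word exceeds the threshold; the data layout and the function name `distances_with_good_definition` are hypothetical choices made for illustration.

```python
D50_THRESHOLD = 20  # percent, the value quoted for good SI

WORDS = ["AMAN", "DAPAT", "NADI", "SAYA", "TAHUN", "USIA"]

# D50 (%) per word, keyed by distance (m), copied from Table 8.
D50_TABLE = {
    "Room 1": {
        1.04: [25, 18, 0, 18, 39, 1],
        2.00: [29, 13, 0, 17, 33, 0],
        4.00: [10, 7, 0, 13, 46, 0],
        6.00: [12, 12, 0, 16, 0, 0],
    },
    "Room 2": {
        1.04: [28, 16, 2, 22, 40, 1],
        2.00: [32, 12, 0, 17, 47, 1],
        4.00: [58, 8, 0, 11, 27, 0],
        6.00: [70, 12, 72, 72, 37, 68],
    },
}


def distances_with_good_definition(room, word):
    """Distances (m) at which the word's D50 exceeds the 20 % threshold."""
    column = WORDS.index(word)
    return [distance for distance, row in D50_TABLE[room].items()
            if row[column] > D50_THRESHOLD]


for room in D50_TABLE:
    for word in ("AMAN", "TAHUN", "NADI"):
        print(room, word, distances_with_good_definition(room, word))
```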
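Table 9. STIPA value for all sample rooms

STIPA by word:

| Sample room | Distance (m) | AMAN | DAPAT | NADI | SAYA | TAHUN | USIA |
|---|---|---|---|---|---|---|---|
| Room 1 | 1.04 | 0.55 | 0.64 | 0.72 | 0.57 | 0.66 | 0.57 |
| Room 1 | 2.00 | 0.52 | 0.60 | 0.65 | 0.58 | 0.60 | 0.55 |
| Room 1 | 4.00 | 0.50 | 0.53 | 0.58 | 0.52 | 0.59 | 0.46 |
| Room 1 | 6.00 | 0.53 | 0.49 | 0.65 | 0.44 | 0.58 | 0.49 |
| Room 2 | 1.04 | 0.57 | 0.63 | 0.72 | 0.57 | 0.67 | 0.55 |
| Room 2 | 2.00 | 0.54 | 0.61 | 0.67 | 0.59 | 0.60 | 0.55 |
| Room 2 | 4.00 | 0.49 | 0.55 | 0.58 | 0.46 | 0.59 | 0.46 |
| Room 2 | 6.00 | 0.49 | 0.47 | 0.56 | 0.46 | 0.58 | 0.50 |
| Room 3 | 1.20 | 0.51 | 0.57 | 0.66 | 0.53 | 0.57 | 0.45 |
| Room 3 | 2.00 | 0.48 | 0.51 | 0.56 | 0.45 | 0.46 | 0.41 |
| Room 3 | 4.00 | 0.47 | 0.44 | 0.51 | 0.39 | 0.44 | 0.40 |
| Room 3 | 6.00 | 0.36 | 0.39 | 0.47 | 0.43 | 0.44 | 0.45 |
| Room 4 | 1.30 | 0.48 | 0.52 | 0.61 | 0.50 | 0.62 | 0.55 |
| Room 4 | 2.00 | 0.44 | 0.47 | 0.55 | 0.45 | 0.41 | 0.45 |
| Room 4 | 4.00 | 0.46 | 0.46 | 0.51 | 0.44 | 0.47 | 0.49 |
| Room 4 | 6.00 | 0.43 | 0.38 | 0.44 | 0.39 | 0.44 | 0.40 |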
Table 9 above shows the STIPA values for Room 1,
Room 2, Room 3 and Room 4. The words DAPAT,
NADI and TAHUN give good SI at the critical
distance in both Room 1 and Room 2. The word SAYA
gives poor SI at 6 metres in both rooms. As the
distance increases, the SI drops. The nasal and
plosive phonemes give good SI, while the fricative
phoneme gives poor SI, as it is affected by the
reverberation in the rooms.
The STIPA values indicate poor SI for the words
DAPAT, TAHUN and SAYA at 4 m and 6 m in Room 3,
while in Room 4 poor SI is found for the word SAYA.
This shows that the plosive and fricative phonemes are
the most affected in Room 3, while only the fricative
phoneme is affected in Room 4. Thus, rooms with the
same reverberation time give the same SI score, and
vice versa.
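The comparison of STIPA values at the critical distance can be reproduced with a short filter over the first row of each room in Table 9, as sketched below. The 0.60 threshold is the lower limit of the "Good" band in Table 2; the data layout and the function name `words_with_good_si` are hypothetical.

```python
WORDS = ["AMAN", "DAPAT", "NADI", "SAYA", "TAHUN", "USIA"]

# STIPA per word at each room's critical distance, copied from Table 9.
STIPA_AT_CRITICAL_DISTANCE = {
    "Room 1": [0.55, 0.64, 0.72, 0.57, 0.66, 0.57],
    "Room 2": [0.57, 0.63, 0.72, 0.57, 0.67, 0.55],
    "Room 3": [0.51, 0.57, 0.66, 0.53, 0.57, 0.45],
    "Room 4": [0.48, 0.52, 0.61, 0.50, 0.62, 0.55],
}


def words_with_good_si(room, threshold=0.60):
    """Words whose STIPA at the critical distance reaches the 'Good' band."""
    values = STIPA_AT_CRITICAL_DISTANCE[room]
    return [word for word, value in zip(WORDS, values) if value >= threshold]


for room in STIPA_AT_CRITICAL_DISTANCE:
    print(f"{room}: {', '.join(words_with_good_si(room))}")
```

With these data, Room 1 and Room 2 return the same three words, while Room 3 and Room 4 return different sets.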
4 CONCLUSION
From the analysis of the results, it is concluded that:
a) The difference in RT60 between Room 1 and
Room 2 is not statistically significant, with RT60
of 1.1 s for both, while the difference between
Room 3 and Room 4 is statistically significant,
with RT60 of 1.5 s and 1.1 s respectively.
b) The C50 value exceeds -2 dB at all distances in
all rooms, which indicates good SI. Rooms with
the same RT60 yield the same SI score in terms
of clarity, and vice versa.
c) In terms of D50, reverberation most affects the
nasal phoneme in Room 1 and Room 2, the
fricative phoneme in Room 3, and the nasal and
fricative phonemes in Room 4.
d) In terms of STIPA, the nasal and plosive phonemes
give good SI in Room 1, Room 2 and Room 4, while
only the nasal phoneme gives good SI in Room 3.
e) Rooms with similar RT60 give the same quality
of SI at all distances.
ACKNOWLEDGMENTS
This research is supported by the Ministry of
Education (MOE) Malaysia, Universiti Teknologi
Malaysia and Fundamental Research Grant Scheme
(FRGS) by Universiti Teknologi Malaysia (Vote. No.
4F870).
REFERENCES
Everest, F. A. (2001). Master Handbook of Acoustics.
McGraw-Hill.
Mikulski, W., Radosz, J. (2011). “Acoustics of
Classrooms in Primary Schools- Results of the
Reverberation Time and the Speech Transmission
Index Assessments in Selected Buildings”, Archives
of Acoustics, 36 (4), 777-793.
Sodsri, C. (2012). “Effects of Classroom
Reverberation and Listener’s Locations to Speech
Intelligibility”, 2012 9th International Conference on
Electrical Engineering/Electronics, Computer,
Telecommunications and Information Technology,
Phetchaburi, 2012, pp. 1-4.
Rossing, T. (2014). Springer Handbook of Acoustics.
2nd Edition. New York: Springer Dordrecht
Heidelberg.
Andrijasevic, A., Ivancevic, B., Horvat, M. (2012).
“Evaluation of Speech Intelligibility in Two
Acoustically Different Spaces Using Logatome Test
and Measured Impulse Response”, Engineering
Review. Vol. 32. 78-85.
Borwick, J. (2001). Loudspeaker and Headphone
Handbook. 3rd Edition. Woburn, Massachusetts: Reed
Educational and Professional Publishing Ltd.
Hall, D. E. (1993). Basic Acoustics. 2nd Edition. New
York: John Wiley and Sons.
Kassim, D. H., Putra, A., Nor, M. J. M. (2015). “The
Acoustical Characteristics of the Sayyidina Abu Bakar
Mosque, UTEM”. Journal of Engineering Science and
Technology, 10 (1), 97 – 110.
Shimokura, R., Matsui, T., Takaki, T., Nishimura, T.,
Yamanaka T., Hosoia, H. (2014). “Evaluation of
Speech Intelligibility in Short-Reverberant Sound
Fields.” Auris Nasus Larynx, 41(4), 343-349.
Tench, P. (2011). Transcribing the Sound of English.
Cambridge University Press.
Hodgson, M., Nosal, E. M. (2001). “Effect of noise
and occupancy on optimal reverberation times for
speech intelligibility in classrooms.” Journal of
Acoustical Society of America, 111(2), 931-939.
Kristiansen, J., Lund, S. P., Nielsen, P.M., Persson,
R., Shibuya, H. (2011). “Determinants of noise
annoyance in teachers from schools with different
classroom reverberation times”. Journal of
Environment Psychology, 31, 383-392.
Berardi, U. (2012). “A Double Synthetic Index to
Evaluate the Acoustics of Churches.” Archives of
Acoustics, 37(4), 521-528.