Acoustic Representation of BODO and RABHA Phonemes


Jyotismita Talukdar et al., International Journal of Computing, Communications and Networking, 1(1), July – August, 1-9


@ 2012, IJCCN All Rights Reserved

ABSTRACT

In this paper we study the spectral features of Bodo and Rabha phonemes using formant frequencies and cepstral coefficients. Significant variation of the cepstral coefficients is observed among the Bodo vowels. For male Bodo speakers, the cepstral variation is maximum for the vowel /o/ and minimum for /u/; for female Bodo speakers, it is maximum for /o/ and minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients is maximum for /u/ and minimum for /o/ in male speakers, and maximum for /o/ and minimum for /e/ in female speakers. This observation may be helpful in determining the sex of both Bodo and Rabha speakers. The range of variation of the cepstral coefficients for male speakers lies within 3.8177 > $C_{\mathrm{Bodo}}$ > 1.1523 and 8.1329 > $C_{\mathrm{Rabha}}$ > 2.0579; for female speakers it lies within 1.9578 > $C_{\mathrm{Bodo}}$ > 0.9276 and 7.6546 > $C_{\mathrm{Rabha}}$ > 2.4127. That is, the variation of cepstral features is smaller for Bodo vowels (male: 2.6654; female: 1.0302) than for Rabha vowels (male: 6.0750; female: 5.2419), so the former are more stable than the latter. The investigation has also shown that the range of formant frequency is maximum for isolated vowels, and that the formant frequency decreases when the vowels are placed in the nucleus of a structure such as CV, VC or CVC.

Keywords: Acoustic Representation, Phonemes, Cepstral Features

1. INTRODUCTION

The Bodos and the Rabhas are early ethnic and linguistic communities settled in the North-Eastern part of India. The Bodos belong to a larger ethnic group called the Bodo-Kachari. Racially, they belong to a Mongoloid stock of the Indo-Mongoloids or Indo-Tibetans. Mythologically, according to Dr. Suniti Kumar Chatterjee, a well-known historian, the Bodos are “the Offspring of son of Vishnu and mother Earth”, who were termed Kiratas during the epic period. They are recognized as a plains tribe in the Sixth Schedule of the Constitution of India. Historically, there are different views on the early migration of the Mongolians into the North-Eastern part of India. According to Grierson’s “The Linguistic Survey of India”, the Mongolians who settled in old Assam migrated from the banks of the Hoang-Ho and Yangtze rivers, then scattered and dwelt along different river banks of the state. The upper courses of the Yangtze and Hoang-Ho in North-West China were the original home of the Tibeto-Burman races. The hierarchy of the Bodo community is shown in the following figure.

Hierarchy of Bodo & Rabha Languages

Jyotismita Talukdar (1), Nabankur Pathak (2)
(1) Asia Institute of Technology, Bangkok, Thailand
(2) Gauhati University, India

International Journal of Computing, Communications and Networking, Volume 1, No. 1, July – August 2012
Available online at http://warse.org/pdfs/ijccn01112012.pdf


Speech Data Collection for Acoustic Representation

Typically, spoken language data can be classified by mode of speech, medium of recording, language, dialect and environment. In the present study, speech data were collected from native speakers of the Bodo and Rabha languages who are fluent in speaking and writing the language. Male and female speakers aged between 15 and 30 years, with a pleasant and good voice quality, were chosen to record the data. The speakers were recorded one at a time. They were instructed to read each word or sentence naturally, without emotion or expression, and to speak clearly while keeping their normal speaking rate and volume. To keep the recording consistent in both phonetic and prosodic terms (within the framework of symbolic prosody), an expert in acoustic phonetics supervised the recording. The average recording time was about 4 hours, over 3 recording sessions, for each speaker (male and female).

The following data sets were recorded for the analysis of the cepstral coefficients of vowel phonemes and the formant frequencies of some selected Bodo and Rabha words: (i) Bodo and Rabha vowel phonemes, for cepstral analysis; (ii) selected word sets of V, CV, VC and CVC structure in both languages, for formant analysis.

The recording was done in the audio editing software Cool Edit Pro and the analysis was done in MATLAB 7.1. Each digitized utterance is blocked into 50 frames of 20 millisecond (ms) duration. Every frame contains 441 samples (which corresponds to a sampling rate of 22.05 kHz), and 20 cepstral coefficients are calculated for each frame. The spectral characteristics of six Bodo and Rabha vowels were investigated for male and female speakers. Approximately 12 samples were averaged to obtain one coefficient. First, the 10th frame of all utterances of male and female speakers was considered for analysis. The variation of the cepstral coefficients for the Bodo and Rabha vowels of the selected speakers is shown in Table 1 and Table 2 and depicted in Figures 3 & 4 and Figures 6 & 7. However, continuous frame-wise analysis showed that frames 2, 4, 6 and 8 for the Bodo speakers (Figure 5) and frames 9, 14, 16 and 17 for the Rabha speakers (Figure 8) exhibit distinct variation of the cepstral coefficients between male and female speakers.
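The frame blocking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 22.05 kHz sampling rate is inferred from 441 samples per 20 ms frame, the frames are taken to be non-overlapping, and the function name `split_into_frames` is hypothetical (the original analysis was done in MATLAB).

```python
def split_into_frames(samples, sample_rate_hz=22050, frame_ms=20, max_frames=50):
    """Block a signal into consecutive, non-overlapping frames.

    Assumptions (not stated in the paper): no overlap between frames,
    and any samples beyond `max_frames` frames are simply dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000))  # 441 at 22.05 kHz
    n_frames = min(max_frames, len(samples) // frame_len)
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of a dummy (silent) signal yields 50 frames of 441 samples each.
signal = [0.0] * 22050
frames = split_into_frames(signal)
```

With these settings, one second of audio fills exactly the 50 frames mentioned in the text.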

2. LPC ANALYSIS

Linear prediction is a method of signal source modelling that is dominant in speech signal processing and widely applied in other areas. Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques. The glottis (the space between the vocal cords) produces the sound, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat, the mouth and the nasal cavity) forms the tube, which is characterized by its resonance frequencies, called formants. The basic problem of the LPC system is to determine the formants from the speech signal. The solution is a difference equation that expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, hence the name Linear Predictive Coding. The coefficients of the difference equation (the prediction coefficients) characterize the formants, so the LPC system needs to estimate these coefficients. The estimation is made by minimizing the mean square error between the predicted signal and the actual signal.

The basic idea behind the LPC model is that a given speech sample at time n, $s(n)$, can be approximated as a linear combination of the past p speech samples (Rabiner & Juang, 1993), such that

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p) \tag{1}$$

where the coefficients $a_1, a_2, \ldots, a_p$ are assumed to be constant over the speech analysis frame. Equation (1) can be converted to an equality by including an excitation term $G u(n)$:

$$s(n) = \sum_{i=1}^{p} a_i s(n-i) + G u(n) \tag{2}$$

where $u(n)$ is the normalized excitation and $G$ is the gain of the excitation. Expressing equation (2) in the z-domain, we get the relation

$$S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G U(z) \tag{3}$$

leading to the transfer function

$$H(z) = \frac{S(z)}{G U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)} \tag{4}$$

This model is based on our knowledge that the actual excitation function for speech is essentially either a quasi-periodic pulse train (for voiced speech sounds) or a random noise source (for unvoiced sounds).
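To make the all-pole model of equation (2) concrete, the following minimal Python sketch runs the difference equation directly. The function name `synthesize` and the single-coefficient example are illustrative assumptions, not part of the original study.

```python
def synthesize(a, excitation, gain=1.0):
    """Run s(n) = sum_{i=1..p} a_i * s(n-i) + G * u(n).

    `a` holds the predictor coefficients a_1..a_p (a[0] is a_1);
    samples before n = 0 are taken to be zero.
    """
    p = len(a)
    s = []
    for n, u in enumerate(excitation):
        past = sum(a[i] * s[n - 1 - i] for i in range(p) if n - 1 - i >= 0)
        s.append(past + gain * u)
    return s


# Impulse response of a hypothetical first-order model with a_1 = 0.5.
impulse = [1.0] + [0.0] * 4
response = synthesize([0.5], impulse)
```

For this first-order example each output sample is half of the previous one, which is the geometric decay expected from a single real pole at z = 0.5.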


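The relation between $s(n)$ and $u(n)$ is defined (based on the speech production model, Figure-1.1) as

$$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G u(n) \tag{5}$$

We consider the linear combination of past speech samples as the estimate $\tilde{s}(n)$, defined as

$$\tilde{s}(n) = \sum_{k=1}^{p} a_k s(n-k) \tag{6}$$

The prediction error, $e(n)$, is defined as

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k) \tag{7}$$

and the error transfer function is

$$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k} \tag{8}$$

The basic problem of linear prediction analysis is to determine the set of predictor coefficients $\{a_k\}$ directly from the speech signal, so that the spectral properties of the digital filter match those of the speech waveform within the analysis window. To set up the equations that must be solved to determine the predictor coefficients, we define the short-term speech and error segments at time n as

$$s_n(m) = s(n+m) \tag{9}$$

$$e_n(m) = e(n+m) \tag{10}$$

and seek to minimize the mean squared error signal at time n,

$$E_n = \sum_{m} e_n^2(m) \tag{11}$$

Using equations (9) and (10), we can write

$$E_n = \sum_{m} \left[ s_n(m) - \sum_{k=1}^{p} a_k s_n(m-k) \right]^2 \tag{12}$$

To solve equation (12) for the predictor coefficients, we set

$$\frac{\partial E_n}{\partial a_i} = 0, \quad i = 1, 2, \ldots, p \tag{13}$$

giving

$$\sum_{m} s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \hat{a}_k \sum_{m} s_n(m-i)\, s_n(m-k) \tag{14}$$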

The terms $\sum_{m} s_n(m-i)\, s_n(m-k)$ are the short-term covariance of $s_n(m)$, i.e.,

$$\phi_n(i,k) = \sum_{m} s_n(m-i)\, s_n(m-k) \tag{15}$$

so that equation (14) can be expressed in compact notation as

$$\phi_n(i,0) = \sum_{k=1}^{p} \hat{a}_k\, \phi_n(i,k), \quad i = 1, 2, \ldots, p \tag{16}$$

which describes a set of p equations in p unknowns. It is readily shown that the minimum mean-squared error, $\hat{E}_n$, can be expressed as

$$\hat{E}_n = \sum_{m} s_n^2(m) - \sum_{k=1}^{p} \hat{a}_k \sum_{m} s_n(m)\, s_n(m-k) = \phi_n(0,0) - \sum_{k=1}^{p} \hat{a}_k\, \phi_n(0,k) \tag{17}$$

Thus the minimum mean-squared error consists of a fixed term and a term that depends on the predictor coefficients. To solve equation (16) for the optimum coefficients $\hat{a}_k$, we have to compute $\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$, and then solve the resulting set of p simultaneous equations. One method for solving these equations and computing the coefficients is the autocorrelation method.

The LPC-Cepstral Coefficient

In the present study, LPC-based cepstral coefficients and phonetically important parameters are used as feature vectors. A cepstral weighted feature vector is obtained for each frame by block processing of continuous speech signals. The analog speech waveform is first sampled and quantized by an analog-to-digital converter. To spectrally flatten the signal, the speech signal is passed through a pre-emphasis stage, a first order digital filter whose transfer function is given by

$$H(z) = 1 - \tilde{a} z^{-1}, \quad \text{with } 0.9 \le \tilde{a} \le 1.0 \tag{19}$$

Consecutive speech samples are grouped into a single frame. To reduce the undesired effect of the Gibbs phenomenon, each frame is multiplied by a window function (Hamming window), which is given by (Proakis & Manolakis, 2004; Talukdar, 2010)

$$w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

where N is the number of samples in a block. Each frame of the windowed signal, $\tilde{x}(n)$, is then autocorrelated to give

$$r(m) = \sum_{n=0}^{N-1-m} \tilde{x}(n)\, \tilde{x}(n+m), \quad m = 0, 1, 2, \ldots, p \tag{20}$$

where the highest autocorrelation lag, p, is the order of the LPC analysis.
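The front-end steps described above (pre-emphasis, Hamming windowing, autocorrelation) and the autocorrelation method for the predictor coefficients can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB code: the pre-emphasis coefficient 0.95 is an assumed value, the Levinson-Durbin recursion is the standard way of solving the autocorrelation normal equations (the paper does not say which solver was used), and all function names are hypothetical.

```python
import math


def pre_emphasize(x, a=0.95):
    """First-order pre-emphasis y(n) = x(n) - a * x(n-1); a = 0.95 is assumed."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def autocorrelate(x, p):
    """Autocorrelation r(m) = sum_n x(n) x(n+m) for lags m = 0..p."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations for a_1..a_p.

    Returns (a, err), where a[k-1] is a_k in s(n) ~ sum_k a_k s(n-k)
    and err is the minimum prediction error energy.
    """
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


# Sanity check on an autocorrelation sequence r(m) = 0.5**m, which
# corresponds to a first-order process with a_1 = 0.5.
coeffs, residual = levinson_durbin([1.0, 0.5, 0.25], p=2)
```

In a full pipeline, each 441-sample frame would be passed through `pre_emphasize`, multiplied sample by sample with `hamming(441)`, and fed to `autocorrelate` before calling `levinson_durbin`.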

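a. LPC Parameter Conversion to Cepstral Coefficients

The LPC cepstral coefficients are a set of values that have been found to be a more robust and reliable feature set for speech recognition than the LPC coefficients. These coefficients are obtained recursively as follows:

$$c_0 = \ln \sigma^2$$

where $\sigma^2$ is the gain term in the LPC model,

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad 1 \le m \le p \tag{21}$$

$$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad m > p \tag{22}$$

Equation (22) gives the cepstral coefficients beyond the LPC order p, i.e. $c_{p+1}, c_{p+2}, \ldots$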

Generally, a cepstral representation with $Q > p$ coefficients is used, where $Q \approx (3/2)p$.
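The recursion from LPC coefficients to cepstral coefficients can be written compactly in code. The sketch below assumes the standard form of the recursion for an all-pole model with $A(z) = 1 - \sum_k a_k z^{-k}$; the function name `lpc_to_cepstrum` and the test values are illustrative only.

```python
import math


def lpc_to_cepstrum(a, gain_sq, n_ceps):
    """Convert LPC coefficients a_1..a_p (a[0] is a_1) to c_0..c_{n_ceps}.

    For m <= p:  c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}
    For m >  p:  c_m =       sum_{k=m-p}^{m-1} (k/m) c_k a_{m-k}
    """
    p = len(a)
    c = [math.log(gain_sq)] + [0.0] * n_ceps
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c


# For a single pole at z = 0.5 the cepstrum is known in closed form:
# c_n = 0.5**n / n, which gives a quick check of the recursion.
cepstrum = lpc_to_cepstrum([0.5], gain_sq=1.0, n_ceps=3)
```

The closed-form check works because the log of $1/(1 - 0.5 z^{-1})$ expands as $\sum_n (0.5^n / n) z^{-n}$.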

Table 1: Range of variation of the cepstral coefficients for male and female Bodo speakers

Vowel | Male Max. | Male Min. | Male range of variation | Female Max. | Female Min. | Female range of variation
/o/ | 2.2237 | -1.5940 | 3.8177 | 1.9492 | -0.0086 | 1.9578
/a/ | 1.6260 | -0.9615 | 2.5875 | 0.9492 | -0.0641 | 1.0133
/i/ | 1.1528 | -0.1253 | 1.2781 | 0.9059 | -0.0217 | 0.9276
/e/ | 1.2355 | -0.6532 | 1.8887 | 0.9847 | -0.0578 | 1.0425
/u/ | 1.0922 | -0.0601 | 1.1523 | 1.1385 | 0.0690 | 1.2075
/w/ | 1.1832 | -0.1541 | 1.3373 | 1.1843 | -0.1674 | 1.3517

Figure 1. Cepstral characteristics of Bodo vowels for male speaker (panels /o/, /a/, /i/, /e/, /u/, /w/; x-axis: cepstral coefficient, y-axis: amplitude in dB)

Figure 2. Cepstral characteristics of Bodo vowels for female speaker (panels /o/, /a/, /i/, /e/, /u/, /w/; x-axis: cepstral coefficient, y-axis: amplitude in dB)


Figure 3. Distinction between Bodo Male & Female speaker in frame no 2,4,16 & 8

Table 2: Range of variation of the cepstral coefficients for male and female Rabha speakers

Vowel | Male Max. | Male Min. | Male range of variation | Female Max. | Female Min. | Female range of variation
/o/ | 1.0057 | -1.0522 | 2.0579 | 3.9045 | -3.7501 | 7.6546
/a/ | 1.4964 | -1.8083 | 3.3047 | 2.0135 | -2.0784 | 4.0919
/i/ | 1.4086 | -1.8085 | 3.2171 | 1.9864 | -1.9832 | 3.9696
/e/ | 2.1054 | -2.2054 | 4.3108 | 0.9164 | -1.4963 | 2.4127
/u/ | 3.4942 | -4.6387 | 8.1329 | 1.0839 | -1.6952 | 2.7791
/w/ | 2.4834 | -1.0627 | 3.5461 | 1.7201 | -0.8801 | 2.6002
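The "range of variation" columns in Table 1 and Table 2 are the difference between the maximum and minimum cepstral coefficient for each vowel, and the spreads quoted in the abstract for male Bodo and Rabha speakers (2.6654 and 6.0750) are the difference between the largest and smallest of those ranges. The short Python sketch below reproduces this arithmetic from the male columns of the two tables; the variable and function names are illustrative.

```python
# (max, min) cepstral coefficients for male speakers, copied from the tables.
table1_male = {
    "/o/": (2.2237, -1.5940), "/a/": (1.6260, -0.9615), "/i/": (1.1528, -0.1253),
    "/e/": (1.2355, -0.6532), "/u/": (1.0922, -0.0601), "/w/": (1.1832, -0.1541),
}
table2_male = {
    "/o/": (1.0057, -1.0522), "/a/": (1.4964, -1.8083), "/i/": (1.4086, -1.8085),
    "/e/": (2.1054, -2.2054), "/u/": (3.4942, -4.6387), "/w/": (2.4834, -1.0627),
}


def variation_ranges(table):
    """Range of variation per vowel: maximum minus minimum coefficient."""
    return {vowel: round(mx - mn, 4) for vowel, (mx, mn) in table.items()}


def spread(ranges):
    """Difference between the largest and smallest per-vowel range."""
    return round(max(ranges.values()) - min(ranges.values()), 4)


ranges1 = variation_ranges(table1_male)
ranges2 = variation_ranges(table2_male)
spread1 = spread(ranges1)  # corresponds to the 2.6654 quoted for Bodo males
spread2 = spread(ranges2)  # corresponds to the 6.0750 quoted for Rabha males
```

The vowel with the largest range is /o/ in Table 1 and /u/ in Table 2, in line with the observations reported for male speakers.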

Figure 4. Distinction between the male and female Rabha speaker in frame nos. 9, 14, 16 & 17

Figure 5. Cepstral characteristics of Rabha vowels for male speaker

Figure 6. Cepstral characteristics of Rabha vowels for female speaker

b. Formant Estimation of BODO and RABHA Phonemes

Formant frequencies are the distinguishing frequency components of human speech. They are the resonance frequencies of the vocal tract at which the energy concentration is maximum during vowel utterance, so a vowel can be qualitatively distinguished by its frequency components. Generally, three formant frequencies (F1, F2 and F3) are considered for the perception and discrimination of vowels by a listener (Kewley-Port, 1982, 1983). A variety of approaches, such as formant tracking, articulatory models and auditory models, have been used for the analysis and synthesis of speech. The formant tracking method, based on Linear Predictive Coding (LPC), has received considerable attention. In this digital technique, the entire frequency range is divided into a fixed number of segments, each of which represents one formant frequency. A 2nd order resonator with specific boundaries is defined for each segment k. The predictor polynomial, defined as the Fourier transform of the corresponding 2nd order predictor, is given by (Welling and Ney, 1998):

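$$A_k(e^{j\omega}) = 1 - \alpha_k e^{-j\omega} - \beta_k e^{-2j\omega} \tag{23}$$

where $\alpha_k$ and $\beta_k$ are real-valued predictor coefficients. Therefore, from equation (23) we get

$$\left|A_k(e^{j\omega})\right|^2 = \left(1 - \alpha_k e^{-j\omega} - \beta_k e^{-2j\omega}\right)\left(1 - \alpha_k e^{j\omega} - \beta_k e^{2j\omega}\right) \tag{24}$$

$$= 1 + \alpha_k^2 + \beta_k^2 - 2\alpha_k(1 - \beta_k)\cos\omega - 2\beta_k\cos 2\omega \tag{25}$$

The parameter $\beta_k$ determines the bandwidth of the resonator. The formant frequency $\varphi_k$ is given by

$$\varphi_k = \arccos\left(-\frac{\alpha_k (1 - \beta_k)}{4\beta_k}\right) \tag{26}$$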
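As an illustration of how a formant frequency can be read off a second-order resonator, the sketch below assumes the predictor polynomial has the form $A(e^{j\omega}) = 1 - \alpha e^{-j\omega} - \beta e^{-2j\omega}$ and takes the formant as the frequency that minimises $|A(e^{j\omega})|^2$, which gives $\cos\varphi = -\alpha(1-\beta)/(4\beta)$. The sampling rate of 22.05 kHz is inferred from the frame size quoted earlier, and the resonator parameters and the function name are hypothetical.

```python
import math


def resonator_formant_hz(alpha, beta, sample_rate_hz):
    """Frequency (Hz) minimising |A|^2 for A = 1 - alpha e^{-jw} - beta e^{-2jw}.

    Setting d|A|^2/dw = 0 gives cos(w) = -alpha * (1 - beta) / (4 * beta).
    """
    cos_phi = -alpha * (1.0 - beta) / (4.0 * beta)
    cos_phi = max(-1.0, min(1.0, cos_phi))  # guard against rounding outside [-1, 1]
    return math.acos(cos_phi) * sample_rate_hz / (2.0 * math.pi)


# Hypothetical resonator with pole radius r and pole angle at 500 Hz:
# A(z) = 1 - 2 r cos(theta) z^-1 + r^2 z^-2, so alpha = 2 r cos(theta), beta = -r^2.
fs = 22050
r, f0 = 0.98, 500.0
theta = 2.0 * math.pi * f0 / fs
alpha, beta = 2.0 * r * math.cos(theta), -r * r
estimate = resonator_formant_hz(alpha, beta, fs)
```

For a narrow-band resonator (pole radius close to 1) the estimate lies close to the pole frequency; the two coincide exactly only when the pole is on the unit circle.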

Table 3: Formant frequency estimation of Bodo words (frequencies in Hz)

Structure V (isolated vowels)
Speaker | Formant | /o/ | /a/ | /i/ | /e/ | /u/ | /w/
Female | F1 | 319.1 | 380.3 | 411.3 | 387.5 | 249.6 | 292.7
Female | F2 | 833.3 | 1194.5 | 2409.8 | 2240.8 | 997.7 | 1527.2
Female | F3 | 3030.4 | 3650.4 | 2911.8 | 3165.0 | 3044.3 | 3165.3
Male | F1 | 309.3 | 343.8 | 394.6 | 384.9 | 244.7 | 206.4
Male | F2 | 764.0 | 1172.0 | 2341.6 | 2178.1 | 837.5 | 1147.1
Male | F3 | 2748.8 | 2494.5 | 3002.4 | 3577.1 | 3690.6 | 2486.9

Structure VC
Speaker | Formant | /or/ (fire) | / / (I) | /ich/ (pain) | /un/ (back side) | /ul/ (confuse) | /em/ (bed)
Female | F1 | 326.4 | 326.1 | 293.3 | 300.5 | 347.2 | 311.4
Female | F2 | 1623.4 | 1717.5 | 2371.3 | 1424.7 | 2353.1 | 2452.7
Female | F3 | 3023.8 | 3006.2 | 3455.9 | 3276.9 | 2853.5 | 2765.3
Male | F1 | 539.1 | 714.0 | 299.3 | 280.2 | 442.8 | 398.5
Male | F2 | 2293.9 | 2365.5 | 2932.2 | 2240.0 | 2544.9 | 1265.7
Male | F3 | 3242.6 | 3199.6 | 3189.1 | 2636.7 | 3350.7 | 2435.8

Structure CV
Speaker | Formant | /hw/ (to give) | /bu/ (to swell) | /ru/ (to boil) | / / (to beat) | /be/ (this) | /gi/ (to fear)
Female | F1 | 320.7 | 382.1 | 311.1 | 337.6 | 354.6 | 334.8
Female | F2 | 1687.9 | 1661.1 | 1623.5 | 1853.7 | 1699.5 | 1617.9
Female | F3 | 3120.24 | 3077.1 | 3445.5 | 2996.8 | 3001.65 | 2947.7
Male | F1 | 494.4 | 690.1 | 633.0 | 375.5 | 283.4 | 393.0
Male | F2 | 2109.8 | 2545.8 | 2386.2 | 2536.2 | 2250.1 | 2223.5
Male | F3 | 3216.3 | 3355.9 | 3298.9 | 2842.9 | 3220.0 | 3287.7

Structure CVC
Speaker | Formant | /san/ (the sun) | /swb/ (smoke) | /bar/ (wind) | /lir/ (to write) | /dwn/ (to keep) | /thar/ (sure)
Female | F1 | 285.5 | 282.5 | 298.5 | 304.7 | 352.6 | 276.1
Female | F2 | 1800.6 | 1966.4 | 2657.89 | 2354.87 | 1471.0 | 2491.2
Female | F3 | 3286.8 | 3135.6 | 3024.78 | 3254.67 | 3163.2 | 3155.5
Male | F1 | 838.3 | 727.5 | 892.2 | 745.3 | 300.7 | 415.6
Male | F2 | 1494.4 | 1421.3 | 1356.9 | 1293.2 | 1238.3 | 1629.4
Male | F3 | 3546.54 | 3265.67 | 3198.00 | 3354.52 | 3648.01 | 3674.98


Formant frequency estimation of the 6 Bodo vowels for female utterances (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude in dB versus frequency in Hz).

Formant frequency estimation of the 6 Bodo vowels for male utterances (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude in dB versus frequency in Hz).

F1-F2 plot showing the vowel triangle (/u/, /i/, /a/) for male (red) and female (blue) speakers of the Bodo language.

Formant frequency curves (gain in dB versus frequency in Hz) showing the distinction of formant variation for the V, VC, CV and CVC word structures.

F1-F2 plot showing that the formant frequencies of the CV, VC and CVC word structures of the Bodo language mostly lie within the range of the formant frequencies of the vowels (MV: male vowel; FV: female vowel).

Table 4: Formant frequency estimation of Rabha words (frequencies in Hz)

Structure V (isolated vowels)
Speaker | Formant | /o/ | /a/ | /i/ | /e/ | /u/ | /w/
Female | F1 | 640.3 | 283.5 | 280.4 | 1040.8 | 480.9 | 340.5
Female | F2 | 2560.4 | 1480.3 | 2560.8 | 1384.4 | 2360.1 | 1080.2
Female | F3 | 3220.8 | 3600.2 | 2200.6 | 3151.8 | 3211.4 | 2720.4
Male | F1 | 620.2 | 243.8 | 301.3 | 987.4 | 504.5 | 253.9
Male | F2 | 2154.7 | 1654.4 | 2251.8 | 2657.9 | 2857.9 | 2415.7
Male | F3 | 2876.1 | 2865.8 | 3985.8 | 3758.4 | 3415.8 | 2965.8

Structure VC
Speaker | Formant | /ora / (you are) | / / (I am) | /intcek/ (this much) | /ek/ (to jump) | /ut/ (camel) | /r / (length)
Female | F1 | 543.7 | 375.3 | 275.7 | 765.2 | 653.9 | 392.8
Female | F2 | 1748.9 | 1682.9 | 2769.4 | 1765.6 | 2015.9 | 2438.3
Female | F3 | 3823.5 | 30165.5 | 3321.9 | 3546.9 | 2976.9 | 2657.9
Male | F1 | 643.3 | 987.5 | 276.9 | 321.9 | 431.9 | 400.3
Male | F2 | 2396.9 | 2401.6 | 3001.8 | 2394.9 | 2656.7 | 1834.9
Male | F3 | 3242.6 | 3099.0 | 3548.2 | 2987.3 | 3241.9 | 2865.9

Structure CV
Speaker | Formant | /to/ (hen) | /tsa/ (to eat) | /mi/ (vegetable) | /the/ (fruit) | /tcu/ (thorn) | /a / (shout)
Female | F1 | 465.9 | 3428 | 354.9 | 698.8 | 387.03 | 565.7
Female | F2 | 1874.5 | 2463.9 | 1987.7 | 1976.4 | 1687.5 | 176.5
Female | F3 | 2976.4 | 2981.6 | 3768.9 | 2885.6 | 3415.6 | 2986.3
Male | F1 | 498.3 | 690.3 | 541.7 | 367.2 | 298.5 | 391.2
Male | F2 | 2183.5 | 2574.8 | 3286.1 | 2653.0 | 2261.3 | 2695.3
Male | F3 | 3216.3 | 3582.7 | 3321.8 | 2976.2 | 3139.7 | 3271.6

Structure CVC
Speaker | Formant | /tcok/ (compound) | /na (you are) | /rin/ (loan) | /ben/ (where) | /tbau/ (owl) | /tsara/ (disease)
Female | F1 | 276.4 | 265.8 | 301.6 | 312.4 | 299.7 | 261.6
Female | F2 | 1867.5 | 20001.8 | 2782.5 | 2323.3 | 1976.4 | 2434.1
Female | F3 | 3341.7 | 3875.4 | 3054.6 | 3198.6 | 2988.3 | 3145.1
Male | F1 | 845.2 | 698.4 | 875.2 | 684.9 | 301.5 | 500.8
Male | F2 | 1476.7 | 1501.0 | 1401.9 | 1354.8 | 1222.6 | 1687.8
Male | F3 | 3498.6 | 3315.8 | 3176.0 | 3299.7 | 3571.8 | 3679.1

Formant frequency estimation of the 6 Rabha vowels for female utterances.

Formant frequency estimation of the 6 Rabha vowels for male utterances.

F1-F2 plot showing the vowel triangle for male and female speakers of the Rabha language.

F1-F2 plot showing that the formant frequencies of the CV, VC and CVC word structures of the Rabha language mostly lie within the range of the formant frequencies of the vowels.

3. RESULTS AND DISCUSSION

Based on the analysis of the cepstral features and formant frequencies of Bodo and Rabha phonemes and words, the following observations were made.

Significant variation of the cepstral coefficients is observed among the Bodo vowels, as shown in Table-1. For male speakers, the cepstral variation is maximum for the vowel /o/ and minimum for /u/. Similarly, for female Bodo speakers, the variation is maximum for /o/ and minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients (Table-2) is maximum for /u/ and minimum for /o/ in male speakers, and maximum for /o/ and minimum for /e/ in female speakers.

Significantly, the cepstral coefficients of Bodo vowels for frame nos. 2, 3, 6 & 8 show distinctive characteristics (Figure-4) for male and female speakers. The variation of the cepstral coefficients for male speakers is very irregular, in contrast to the stable variation of the female cepstral coefficients. The same phenomenon is observed for the Rabha vowels, but at different frame numbers, i.e. frame nos. 9, 14, 16 and 17 (Figure-7). This observation may be helpful in determining the sex of both Bodo and Rabha speakers.

The range of variation of the cepstral coefficients for Bodo and Rabha male speakers lies within 3.8177 > $C_{\mathrm{Bodo}}$ > 1.1523 and 8.1329 > $C_{\mathrm{Rabha}}$ > 2.0579 respectively. For female speakers it lies within 1.9578 > $C_{\mathrm{Bodo}}$ > 0.9276 and 7.6546 > $C_{\mathrm{Rabha}}$ > 2.4127. That is, the variation of cepstral features is smaller for Bodo vowels (male: 2.6654; female: 1.0302) than for Rabha vowels (male: 6.0750; female: 5.2419), so the former are more stable than the latter.

Figure 10 and Figure 15 represent the extremes of the formant locations in the F1-F2 plane for Bodo and Rabha vowels. The formant locations for /u/ (low F1, low F2), /i/ (low F1, high F2) and /a/ (high F1, low F2) form the vertices of the vowel triangle, with the other vowels placed relative to these vertices. Figure 12 and Figure 16 show that the formant frequencies of the selected word sets for both Bodo and Rabha lie within the range of the formant frequencies of the isolated vowels. The investigation has shown (Table-3 & 4) that the range of formant frequency is maximum for isolated vowels, and that the formant frequency decreases when the vowels are placed in the nucleus of a structure such as CV, VC or CVC.

ACKNOWLEDGEMENT

We gratefully acknowledge the Ministry of Communication & Information Technology (MIT), New Delhi, Govt. of India, for providing us with relevant information while preparing the manuscript of this paper.

REFERENCES

1. Rabiner, L.R. and Juang, B.H. Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.

2. Noll, A.M. Cepstrum Pitch Determination, Journal of the Acoustical Society of America, Vol. 41, pp. 293-309, Feb. 1967.

3. Porat, B. A Course in Digital Signal Processing, John Wiley & Sons, 1997.

4. Proakis, J.G. and Manolakis, D.G. Digital Signal Processing: Principles, Algorithms and Applications, Pearson Education, Third Indian reprint, 2004.

5. Kewley-Port, D. Measurement of formant transitions in naturally produced stop consonant-vowel syllables, Journal of the Acoustical Society of America, 72, pp. 379-389, 1982.

6. Kewley-Port, D. Time-varying features as correlates of place of articulation of stop consonants, Journal of the Acoustical Society of America, 73, pp. 322-335, 1983.

7. Welling, L. and Ney, H. Formant Estimation for Speech Recognition, IEEE Transactions on Speech and Audio Processing, Vol. 6, pp. 36-48, 1998.

8. Talukdar, P.H. Speech Production, Analysis and Coding, Lambert Publication, Germany, 2010.