Mathew 1
7:46 PM me: Hello Sir !
52 minutes
8:38 PM mathew.magimaidoss: hello Shweta
are you online?
me: Hello
yes
8:39 PM mathew.magimaidoss: sorry, I didn't see your last message
me: That's ok
mathew.magimaidoss: is it ok to start the call
me: ya
12 minutes
8:52 PM mathew.magimaidoss: Speech -> DTW -> Recognition score
Video -> Matching (correlation) -> Recognition score
8:53 PM combine speech score and video/visual score
8:54 PM speech score denote it as A
visual score denote it as B
A + B
w1 . A + w2 . B
w1 + w2 = 1
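The weighted combination above can be sketched in a few lines. This is only an illustration: the function name `fuse_scores` and the example values are hypothetical, and it assumes both scores have already been normalized to the same range and direction (a raw DTW distance is lower-is-better, while a correlation is higher-is-better, so one of them would need converting first).

```python
def fuse_scores(speech_score: float, visual_score: float, w1: float = 0.5) -> float:
    """Return w1 * A + w2 * B, where w2 = 1 - w1 so that w1 + w2 = 1.

    Assumes both scores are on a comparable scale (hypothetical setup).
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    w2 = 1.0 - w1
    return w1 * speech_score + w2 * visual_score


# Example: trust the speech score (A) three times as much as the visual score (B).
fused = fuse_scores(0.8, 0.4, w1=0.75)
```

Setting `w1 = 1` falls back to speech only and `w1 = 0` to video only, so the weight can be tuned on held-out data.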
8:57 PM http://publications.idiap.ch/downloads/reports/2000/rr00-35.pdf
9:01 PM cepstral coefficients,
mel frequency cepstral coefficients
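For reference, the "mel" in mel frequency cepstral coefficients refers to a perceptual frequency scale. A commonly used conversion is sketched below; the function names are illustrative, and this covers only the frequency warping, not the full MFCC computation (filterbank, log, and DCT are omitted).

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (common 2595 * log10 form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# 1000 Hz maps to roughly 1000 mel under this formula.
mel_1k = hz_to_mel(1000.0)
```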
9:03 PM 20-30 ms - frame size
frame-shift - 10 ms
10 ms shift
25 ms frame
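The 25 ms frame with a 10 ms shift translates into sample counts once a sampling rate is fixed. The sketch below assumes a 16 kHz rate purely for illustration (the chat does not fix one at this point), and `frame_signal` is a hypothetical helper name.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split a list of samples into overlapping frames.

    Only complete frames are returned; a trailing partial frame is dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


# One second of silence at an assumed 16 kHz: 400-sample frames, 160-sample shift.
frames = frame_signal([0.0] * 16000, 16000)
```

Because the shift is shorter than the frame, consecutive frames overlap by 15 ms.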
9:04 PM HTK
5 minutes
9:10 PM mathew.magimaidoss: Dynamic Time Warping
9:13 PM Sakoe, H. and Chiba, S., "Dynamic programming algorithm optimization for spoken word recognition", IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), pp. 43-49, 1978
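A minimal sketch of Dynamic Time Warping between two sequences of feature vectors follows. It uses the basic symmetric step pattern with Euclidean local distance, and leaves out the slope constraints and adjustment window discussed in the Sakoe and Chiba paper, so it should be read as a simplified illustration rather than that paper's exact algorithm. The name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each sequence is a list of equal-dimension vectors (lists of floats),
    for example one cepstral vector per frame.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# A template and a time-stretched copy of it align with zero cost.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
score = dtw_distance(template, stretched)
```

For isolated word recognition, a test utterance would be compared against one or more templates per word, and the word with the lowest accumulated cost chosen.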
9:14 PM http://publications.idiap.ch/downloads/papers/2007/aradilla-mlmi-2007.pdf
9:15 PM http://publications.idiap.ch/downloads/papers/2011/Soldo_ICASSP_2011.pdf
9:17 PM http://www.cstr.ed.ac.uk/research/projects/featureMLPs/
9:20 PM speech -> extract cepstral coefficients
9:21 PM close talking microphone
9:22 PM six
9:23 PM end point detection
9:25 PM for each frame compute the energy
9:26 PM N frames
N energies
take frames 1 to 10 at the beginning
and take last 10 frames
9:28 PM 00000010000011111111111111110001000000
median smoothing
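The end point detection steps above (per-frame energy, a reference level from the first and last 10 frames, a 0/1 decision per frame, then median smoothing) can be sketched as follows. Several details are assumptions not stated in the chat: that the edge frames contain only background noise, that the threshold is a fixed multiple (`factor`) of their mean energy, and that the smoothing window is 5 frames wide. For a binary sequence the median over an odd window equals a majority vote, which is what the code uses.

```python
from statistics import mean


def frame_energies(frames):
    """Energy of each frame: sum of squared samples (N frames -> N energies)."""
    return [sum(s * s for s in frame) for frame in frames]


def speech_mask(energies, n_edge=10, factor=3.0):
    """Mark frames as speech (1) or non-speech (0).

    The noise level is estimated from the first and last n_edge frames;
    factor is an assumed, tunable multiplier.
    """
    noise = energies[:n_edge] + energies[-n_edge:]
    threshold = factor * mean(noise)
    return [1 if e > threshold else 0 for e in energies]


def median_smooth(mask, width=5):
    """Median-smooth a 0/1 sequence (majority vote; windows shrink at the edges)."""
    half = width // 2
    smoothed = []
    for i in range(len(mask)):
        window = mask[max(0, i - half): i + half + 1]
        smoothed.append(1 if 2 * sum(window) > len(window) else 0)
    return smoothed


def endpoints(mask):
    """Return (first, last) indices of frames marked 1, or None if there are none."""
    ones = [i for i, v in enumerate(mask) if v == 1]
    return (ones[0], ones[-1]) if ones else None


# The 0/1 sequence from the chat: smoothing removes the two isolated 1s.
raw = [int(c) for c in "00000010000011111111111111110001000000"]
smoothed = median_smooth(raw)
start, end = endpoints(smoothed)
```

Without the smoothing step, the isolated 1s on either side of the main run would be taken as the start and end of the word.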
9:30 PM two microphones
one close talking and one table top
9:35 PM speech -> cepstral features -> DTW -> score
speech -> cepstral features -> posterior features -> DTW -> score
9:36 PM MFCC
why not implement PLP cepstral coefficients
9:38 PM HTK
code
5 minutes
9:44 PM mathew.magimaidoss: for each word
10 different participants
5 male 5 female
20 different participants
10 male 10 female
9:45 PM 10 words
10 x 20
200 utterances
9:46 PM 10 x 20 x 5
9:47 PM word_m01_t01
word_m01_t02
word_m02_t01
word_f01_t01
9:48 PM 20 x 20 x 5
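The naming pattern `word_m01_t01` and the words x speakers x trials counts can be generated mechanically. In the sketch below, `utterance_names` and the placeholder word list are hypothetical; only the pattern itself comes from the chat.

```python
def utterance_names(words, n_male, n_female, n_trials):
    """Build names of the form <word>_<m|f><speaker>_t<trial>, e.g. six_m01_t01."""
    names = []
    for word in words:
        for gender, count in (("m", n_male), ("f", n_female)):
            for speaker in range(1, count + 1):
                for trial in range(1, n_trials + 1):
                    names.append(f"{word}_{gender}{speaker:02d}_t{trial:02d}")
    return names


# Placeholder vocabulary of 10 words; 10 male + 10 female speakers; 5 trials each.
words = [f"word{i:02d}" for i in range(1, 11)]
names = utterance_names(words, n_male=10, n_female=10, n_trials=5)
total = len(names)  # 10 x 20 x 5 utterances
```

With a 20-word vocabulary the same call would give 20 x 20 x 5 names.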
9:49 PM age
native language
which region
9:51 PM English words
9:54 PM database
number of words
number of speakers
number of trials
9:55 PM type of microphone
gender balance
equal number of male and female
9:56 PM metadata: age, native language, languages they can speak, which region they are from
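One way to keep this database description and speaker metadata together is a pair of small record types. The class and field names below are hypothetical; the fields themselves mirror the items listed in the chat (words, speakers, trials, microphone, gender balance, age, native language, other languages, region).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeakerMetadata:
    speaker_id: str          # e.g. "m01" or "f01", matching the file names
    gender: str              # "m" or "f"
    age: int
    native_language: str
    region: str
    other_languages: List[str] = field(default_factory=list)


@dataclass
class DatabaseDescription:
    n_words: int
    n_speakers: int
    n_trials: int
    microphone: str
    speakers: List[SpeakerMetadata] = field(default_factory=list)

    def is_gender_balanced(self) -> bool:
        """True when the number of male and female speakers is equal."""
        males = sum(1 for s in self.speakers if s.gender == "m")
        females = sum(1 for s in self.speakers if s.gender == "f")
        return males == females
```

Recording these fields at collection time is much easier than reconstructing them afterwards.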
me: ya ok
mathew.magimaidoss: get a good microphone
for data collection
9:57 PM like a Sennheiser microphone
me: Ok . The call got disconnected.
9:58 PM I'll get a good microphone
9:59 PM mathew.magimaidoss: sampling frequency is based on the bandwidth
20 Hz - 20 kHz
44.1 kHz
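Assuming the point here is the Nyquist sampling criterion (the sampling frequency must be at least twice the highest frequency in the signal), the relation between the 20 kHz audio bandwidth and the 44.1 kHz rate can be checked directly. The function name is illustrative.

```python
def min_sampling_rate(bandwidth_hz: float) -> float:
    """Lowest sampling rate that satisfies the Nyquist criterion for a given bandwidth."""
    return 2.0 * bandwidth_hz


# A 20 kHz bandwidth needs at least 40 kHz, so 44.1 kHz is sufficient.
required = min_sampling_rate(20_000.0)
is_enough = 44_100.0 >= required
```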