1.1 Vocal Tract Data Capture and Analysis Module

1.1.1 Introduction

In this section, we provide updated manual instructions for three modules developed for the capture, visualization and analysis of rare singing. One module (i-tongueFeatures) was developed at UPMC, whereas the other two (i-THRec and i-THAn) were developed at CNRS. A detailed description of the design and framework of these modules can be found in D3.3 “Final report on ICH Capture and analysis”, section 3.4.

1.1.2 Rare singing data acquisition module

Overview

To meet the requirements of the i-Treasures project for the rare-singing use case, a data acquisition system has been developed (see D3.3, section 3.4.2). The acquisition platform is built on the Real-Time, Multisensor, Advanced Prototyping Software (RTMaps®) commercialized by Intempora (2011).

Software and driver requirements

The following instructions apply to a PC running the 32-bit Windows 7 platform. The following software and drivers must be installed and configured on your PC before running the application.

- Download and install RTMaps¹ 4.1.8. A 30-day trial period is offered for free, but you need to purchase a license afterwards.

- Install the Terason t3000 module driver² and SDK toolkit.

- Download and install The Imaging Source³ driver for the camera.

- Download and install the USB sound card driver for the PreSonus AudioBox⁴ 44VSL.

For more information about the Terason module, please see Appendix 2: vocal tract acquisition module.

Hardware requirements

Sensor requirements and specifications are documented in D2.2 “First report on system specification” (section 2.3) and D2.4 “Final report on system specification” (section 4.1.5).

1 "www.intempora.com/downloads/rtmaps-html"

2 "www.terason.com/products/t3000.asp"

3 "www.theimagingsource.com/en_US/products/oem-cameras/usb-

cmosmono/dmm22buc03ml"

4 "www.presonus.com/products/AudioBox-44VSL"


1.1.2.3.1 Sensor list and specifications

- The Imaging Source monochrome camera (DMK 22BUC03) (Figure 1)

- Electroglottograph EG2-PCX2 (Figure 2)

- Terason t3000 ultrasound module (Figure 3)

- Custom-designed 8MC4X probe (Figure 4)

- Universal breathing belt (Figure 5)

- Piezoelectric accelerometer (Figure 6)

- Professional standard microphone for singing (Figure 7)

- Four-input sound card AudioBox-44VSL (Figure 8)

- Audio Buddy power gain for the piezoelectric accelerometer (Figure 9)

Figure 1: DMK 22BUC03 image Sensor

Figure 2: EG2-PCX2 and electrodes

Figure 3: Terason Module


Figure 4: 8MC4X Probe

Figure 5: universal breathing belt

Figure 6: Piezoelectric accelerometer

Figure 7: Professional standard microphone for singing


Figure 8: Four input sound card AudioBox-44VSL

Figure 9: Audio buddy power gain for the Piezoelectric Accelerometer

1.1.2.3.2 PC requirements

- CPU: Core i7 or i5
- Memory: 3 GB and above
- Hard disk: 250 GB SSD
- Ports: powered FireWire and 2 USB
- Recommended PC: Apple MacBook Pro with FireWire port

Installations and Connections

1.1.2.4.1 Sensor setup instructions

The setup consists of the following steps, which are listed below and detailed in Figure 10:

- Install the 8MC4X probe on the helmet as shown in the figure below.

- Install the USB camera (DMM 22BUC03-ML) with its infrared LED so that it faces the mouth, in order to detect lip and tongue-tip movements.

- Attach the microphone to the helmet as shown in Figure 10.


Figure 10: Multi-sensor Helmet 1) Adj. Headband, 2) Probe Height Adj. Strut, 3) Swiveling Probe Platform, 4) Lip Camera Proximity Adj., 5) Microphone.

- Place the piezoelectric accelerometer on the singer's nose as shown in Figure 11.

- Place the EGG sensor around the singer's neck.

- Place the belt sensor around the singer's stomach.

Figure 12 shows a Byzantine singer with all sensors installed; please refer to it for reference.

Figure 11: Piezo, EGG and Respiration Belt.


Figure 12: Byzantine singer

1.1.2.4.2 Sensor Setup to PC

After installing all the sensors on the helmet and the singer's body, the next step is to connect all the sensor cables to their modules and to the PC.

- Connect the 8MC4X probe cable and the FireWire cable to the Terason ultrasound module as shown in Figure 13.

Figure 13: 1) 8MC4X Probe cable, 2) fire-wire cable, 3) Terason module

- Connect the USB cable to the slot at the back of the camera (DMK 22BUC03).



Figure 14: 1) USB slot for camera cable connection

- Connect the EGG sensor cable to the EGG module as shown in Figure 15.

Figure 15: 1) EGG cable sensor 2) EGG Output connection cable to sound card

- Connect all four sensor cables (microphone, belt, EGG and piezo) to the front panel of the AudioBox-44VSL sound card, as shown in Figure 16.

Figure 16: 1) Microphone input slot, 2) Belt input slot, 3) EGG input slot, 4) Piezo input slot



- Connect the USB cable and the power cable on the back of the AudioBox sound card (Figure 17).

Figure 17: 1) Power cable, 2) USB cable connection to PC

- Connect the Terason FireWire cable, the sound card USB cable and the camera USB cable to the PC ports, as shown in Figure 18.

Figure 18: 1) Fire-wire port connection, 2) Sound card USB connection 3) USB camera connection

- Connect the piezoelectric accelerometer through the Audio Buddy power gain; this step is necessary before running the data acquisition module.



1.1.3 i-THRec

Since configuring separate sensors and recording their outputs can be complicated if they are managed individually, a common module, named i-THRec (i-Treasures Helmet Recording software), was specifically designed by CNRS. It contains multiple Graphical User Interface (GUI) forms, each aimed at one of the following objectives:

• Creating directories to organize and store the newly acquired data into corresponding sub-folders;

• Writing new .xml lyric files that contain the sub-session paragraphs to be pronounced;

• Calibrating the sensors of the hyper-helmet and supervising their performances;

• Operating the recording session;

• Replaying an old recording session;

• Running a multi-singer recording mode over the network.

i-THRec is implemented in C++ and its graphical interface is developed in Visual Studio®. These programming tools do not themselves handle the interaction between the sensors and the computer: the acquisition of the data communicated by the six sensors of the hyper-helmet is ensured by the Real-Time Multisensor Advanced Prototyping software (RTMaps®, Intempora Inc.) [12]. Details on the RTMaps-based data acquisition platform can be found in D3.1. Note that, although it is possible, users are not required to manipulate the RTMaps diagram. If modifying the RTMaps diagram is absolutely necessary, please make a copy of the original diagram first. A manual of the RTMaps diagrams was published in D3.2 in March 2014.

First steps

In order to run i-THRec, you first need to correctly install the ultrasound probe and check it in the Terason software; see Appendix 2: vocal tract acquisition module for the detailed tasks. You also need the RTMaps software and a license. After installation, launching the .exe starts the software. The welcome window allows the user to create a singer's recording session or to replay a recorded session. The following sections describe the pre-recording, calibration and recording tools and procedures.

i-THRec creates a hierarchy of folders for each performance; it therefore needs some extra information (song type, singer name, recording place, main folder). A recording session starts by displaying a window of fields filled with default values (Figure 19).

The user can modify these fields. The path of the created folder is as follows: MainFolder\SongType\SingerName\RecordingPlace\RecordingDate
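As an illustration only (this is not part of i-THRec, which is a C++ application; the example values are hypothetical), such a path can be assembled as follows:

import os

def session_folder(main_folder, song_type, singer, place, date):
    # Build the session path from the start-window fields (illustrative sketch).
    return os.path.join(main_folder, song_type, singer, place, date)

print(session_folder(r"C:\iTreasures", "Byzantine", "Singer01", "Paris", "20150612"))
# C:\iTreasures\Byzantine\Singer01\Paris\20150612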


Figure 19: Session start window displayed when creating a new session

A recording session should be prepared in advance. i-THRec provides a tool to help create a lyrics corpus beforehand, in order to facilitate the lyrics display during the recording session and the phonetic transcription afterwards if needed. This tool (Figure 20) is accessible from the Menu tab of the session start window. Its output is an .xml file that carries the multiple songs to be performed in the recording session. Users can add as many songs as they need by clicking on the (+) tab button, filling in the song title and text fields, and finally naming and saving the .xml file.

Figure 20: Tool to create lyrics files. An .xml file will be generated when clicking the save button.
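The exact schema of the generated lyric file is defined by i-THRec and is not reproduced here; purely as an illustration, assuming one element per song with a title and a text field, such a file could be generated as follows:

import xml.etree.ElementTree as ET

# Illustrative sketch only: the element names are assumptions, not the real i-THRec schema.
corpus = ET.Element("lyrics")
for title, text in [("Kyrie", "Kyrie eleison..."), ("Agios", "Agios o Theos...")]:
    song = ET.SubElement(corpus, "song")
    ET.SubElement(song, "title").text = title
    ET.SubElement(song, "text").text = text
ET.ElementTree(corpus).write("corpus_example.xml", encoding="utf-8", xml_declaration=True)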


Sensor calibration

The next step is the sensor calibration. i-THRec opens a special window (Figure 21) – backed by RTMaps functionalities – to modify the sensor acquisition settings. It all starts by establishing contact with the hidden RTMaps diagram by clicking the “Initialize RTMaps” check button in the upper right corner of the screen. When the connection is established (the check mark appears), users can access the instrument parameters in the corresponding sensor tab on the right side of the screen. As a reminder, this concerns the settings of the six hyper-helmet sensors: the lip camera (choice of device, picture rotation, modifying and monitoring the acquisition rate), the ultrasonic imaging probe (flipping the image, modifying and monitoring the acquisition rate, saving an image snapshot), and the four signals acquired by the audio box: the acoustic microphone, the nasal accelerometer (piezo instrument), the EGG glottal device and the respiratory belt. Please note that the latter four are recorded as two stereo signals, so their settings are common to the four channels.

The green “Start Acquisition” button opens two picture boxes that display the images acquired by the camera and the ultrasound probe. Three oscilloscopes also appear, displaying the signals acquired from the audio box. It is the user's responsibility to check the image rates, although the system helps by generating a beep if the actual rate differs from the defined one. The user should also check whether any of the audio-box signals is saturated (wave amplitude higher than 1 unit) or too small (the user should keep enough divisions per unit value on the oscilloscope screens). A diagnostic area is also displayed to check whether any data has been rejected; this can be dropped sound buffers or rejected images caused by a lack of speed in the computer, or by an unavailable sensor or connection.
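A minimal sketch of this level check (not part of i-THRec; only the 1-unit saturation limit comes from the text, the low-level threshold is an assumed example value):

import numpy as np

def check_level(buffer, too_small=0.05):
    # Flag saturation (|amplitude| >= 1 unit) or a very low level in a normalized buffer.
    # The too_small threshold is an assumption; the manual only asks to keep enough
    # oscilloscope divisions per unit.
    peak = np.max(np.abs(buffer))
    if peak >= 1.0:
        return "saturated"
    if peak < too_small:
        return "too small"
    return "ok"

print(check_level(0.8 * np.sin(2 * np.pi * 220 * np.linspace(0, 1, 44100))))  # ok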

Users are then ready to start the recording. To do so, they need to create the recording session from the menu button. The chosen settings and parameters are saved in an .xml file and will later be read by the recording session.

Figure 21: Calibration session window

Recording standalone mode

The recording session window shown in Figure 22 looks similar to the calibration session, with some differences. The acquisition parameters are now fixed: the user can no longer modify them and must go back to the calibration session to do so (Menu -> back to calibration session).

The previously prepared lyrics mentioned in 1.1.3.2 can be displayed in the upper area of the window. To load the proper corpus, go to the Lyrics menu and choose the desired .xml file. When done, the fields are filled with the previously prepared information. Choose the song to be performed and the lyrics will show in the white textbox. Some songs or texts can be produced in several ways, e.g. seria and ballu in Cantu-a-Tenore, or spoken, whispered and sung for custom texts. At the end of the recording, i-THRec generates a .TextGrid file holding these lyrics, ready to be used in the Praat software for polyphonic transcription.

Just like in the calibration session, the “Initialize RTMaps” check button establishes the connection with the RTMaps diagram and the green “Start Acquisition” button acquires and displays the data. The actual recording and data saving do not start before clicking the “Rec” button. At that point, i-THRec creates the folder MainFolder\SongType\SingerName\RecordingPlace\RecordingDate\SongName\PerformanceType\.

If more than one repetition of a performance is needed, i-THRec creates a new folder for each repetition with a _i suffix, where i is the repetition number. The recorded data of the second repetition of the performance will then be found in the folder:

MainFolder\SongType\SingerName\RecordingPlace\RecordingDate\SongName\PerformanceType_2\.
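A hedged sketch of this repetition-suffix rule (illustrative only; whether the first repetition also carries a suffix is not specified here, so it is assumed to have none):

import os

def performance_folder(base, repetition):
    # Append an _i suffix from the second repetition onwards (assumption for repetition == 1).
    return base if repetition == 1 else f"{base}_{repetition}"

base = r"C:\iTreasures\Byzantine\Singer01\Paris\20150612\Kyrie\Sung"
for rep in (1, 2, 3):
    os.makedirs(performance_folder(base, rep), exist_ok=True)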

More information about the recorded data can be found in Appendix 2.2: Exploring the recorded folder. In any case, we recommend using i-THAn for exploring and analyzing the data, since it forms a full package together with i-THRec.

Figure 22: Recording session window

Recording network mode

The i-Treasures singing use cases include polyphonic singing styles such as Cantu-in-Paghjella (at least 3 singers) and Canto-a-Tenore (at least 4 singers). Recording such songs requires a system that records multiple singers' performances synchronously. In addition, in order to analyze each singer's audio signal separately, singers should be acoustically isolated. At the same time, even when isolated, singers need to hear each other and even see each other's faces (feedback) for better coordination. i-THRec therefore proposes a network solution to solve these three issues: the computer systems are connected via a network hub, with the singers separated in different rooms (Figure 23). One i-THRec user configures his session as a master (this is done when creating the recording session: menu -> new recording session -> as master in the calibration session); he then needs to set the IP addresses of the other connected users. The remaining user sessions will be his clients (menu -> new recording session -> as client). Starting the acquisition and the recording from the master system is then instantaneously applied to the entire system. The audio-visual feedback is ensured by the third-party software TrueConf [13]. Unfortunately, the latter appears to add a small latency that can be annoying while singing.

Figure 23: Polyphonic singing performance recording for the Canto a Tenore use case (4 singers: Boghe, Bassu, Mezzo, and Contra) using the i-THRec network version and 3 hyper-helmets. Singers are in different rooms but can hear and see each other.

1.1.4 i-THAn

i-THAn (i-Treasures Hyper-Helmet Analysing tool) is a multimedia (waveform, camera, ultrasound, etc.) display and analysis tool that efficiently displays and analyzes the data streams captured by the multi-sensor hyper-helmet. i-THAn provides a comprehensive set of capabilities to display, check, measure and analyze synchronized data streams, to create measurement reports and to extract specific data.

The acquisition system of the multi-sensor hyper-helmet is developed using the Real-Time, Multisensor, Advanced Prototyping Software (RTMaps) developed by Intempora, and an upper-level MSDN C++ tool developed by LPP called i-THRec (i-Treasures Hyper-Helmet Recording software). The system is able to record and display data in real time. However, the recorded data are only readable using the RTMaps software, since the data format is specific to it. In order to overcome the limitation of viewing the data only on a computer where RTMaps is installed, a MatLab graphical user interface has been developed that allows viewing, checking and analyzing the signals.

Run i-THAn and open data sets


To run i-THAn, the MatLab software is needed. To start, type "i_THAn" in the MatLab command window (Figure 24).

Figure 24: Running i-THAn from MatLab

Our data sets include recordings of 4 analog signals (microphone, electroglottograph, breathing belt, and accelerometer) and 2 video streams (lips video and ultrasound video of the tongue).

i-THAn is optimized to read data written by RTMaps: .wav files for the analog signals and .raw files for the video data (ultrasound and lips video). Other formats can also be managed, such as the jsq format for video data. Once i-THAn is loaded, click on the [OPEN *.Rec File] button and select a .rec file (i.e. a recording file generated by RTMaps) (Figure 25).

Figure 25: Opening a dataset in i-THAn


Figure 26: Global view after data loading

Once all the data are loaded, i-THAn reads and plots them, computes statistics on the frame rate of each channel, and checks for potential frame rate problems (cf. 1.1.4.2 for more details).

Open annotation file

In order to identify the displayed signals, one option allows including a TextGrid file. A TextGrid file is an annotation file produced with the Praat software to identify phonemes in a .wav file.

Figure 27: Open Textgrid file

Click on the [Open TextGrid] button (marked (1) on Figure 27) and select the .TextGrid file corresponding to the plotted data. By default the search path is the same as that of the .rec file opened before. Phonemes (marked (2)) and boundaries appear on the audio window (N.B. you should zoom in to see the phonemes). The pop-up menu (marked (3) on Figure 27) allows switching between the tiers of the TextGrid. If you need to modify the phonemes or the boundaries, i-THAn makes it less time consuming: by clicking on “Modify in Praat” (marked (4)), MatLab executes praat.exe (included in the folder for Windows users; Mac users should already have installed Praat and the sendpraat executable – see [sendpraat readme]), loads the TextGrid file and the audio file, and opens them in Praat's interface. Once finished, you can save the TextGrid and load it again in i-THAn. If you need to remove the phonemes from the audio window, click the ‘remove transcription’ button (marked (5)). If the lyrics disappear from the window or should be repositioned, re-selecting the tier in the pop-up menu (3) may be helpful.

[sendpraat readme]: Copy the sendpraat file to the desktop, then run the following lines in the terminal:

chmod +x ~/Desktop/sendpraat

sudo mv ~/Desktop/sendpraat /usr/local/bin

Depending on the Mac OS version, the second line may be:

sudo mv ~/Desktop/sendpraat /usr/bin

Checking recording rates

In i-THAn, an automatic process checks the recording rate of each data stream (ultrasound, 2 stereo analog signals, video); warnings or errors may be reported in the “Frame Rate checking” panel.


Figure 28: Information and control panels corresponding to the right side of the graphical user interface

1.1.4.2.1 Warnings and errors

The panel marked (1) on Figure 28 shows the mean frame rate of each stream and its standard deviation. The warnings are listed in the panel marked (2) and the errors in (3). A warning occurs when a frame rate (normally 60 fps for video) drops to 45 fps, and an error when it drops to 30 fps.

Warnings or errors do not necessarily mean that data have been lost, but only that the computer takes a long time to read them. For example, in the case presented in Figure 29, the ultrasound stream produces an error because two frames are too close together (marked by a circle); in the future it would be possible to shift one of the two timestamps.
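A minimal sketch of such a check (not i-THAn's MatLab code; the 60/45/30 fps figures come from the text, while the threshold semantics and everything else are assumptions):

import numpy as np

def check_frame_rate(timestamps_s, warn=45.0, error=30.0):
    # Instantaneous rate = inverse of the timestamp difference, as plotted by i-THAn.
    rate = 1.0 / np.diff(np.asarray(timestamps_s, dtype=float))
    warnings = np.where((rate < warn) & (rate >= error))[0]
    errors = np.where(rate < error)[0]
    return rate.mean(), rate.std(), warnings, errors

ts = np.delete(np.arange(0, 1, 1 / 60.0), 30)  # 60 fps stream with one dropped frame
mean_rate, std_rate, warn_idx, err_idx = check_frame_rate(ts)
print(mean_rate, warn_idx, err_idx)  # the dropped frame shows up as a 30 fps warning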


The triangles plotted on the axes just below the images correspond to the timestamps (i.e. the acquisition times) of the ultrasound stream and the lips video stream, in red and black respectively (cf. Figure 29).

Figure 29: Example of potential error

1.1.4.2.2 Plot frame rate

You can view the frame rate of each input by clicking on the [View Frame Rates Graphs] button, marked (3) on Figure 28. After clicking, a figure appears summarizing the stream rates. This plot corresponds to the inverse of the timestamp difference. The warning and error lines are only informative and allow the user to see where the errors or warnings mentioned in panels (2)-(3) are located.

Figure 30: Frame rate figure


Play the audio, piezo or EGG sounds

Using the buttons included in the “Play Audio” panel, it is possible to play the audio signal plotted on the screen. You can also stop or pause the playback (marked (1) on Figure 31). Panel (2) makes it possible to listen to the piezo, EGG or breath signals as well.

You can zoom in and out using the buttons at the top of the program. By clicking on the axes, you can view the corresponding images at the top of the interface. You can also step forward and backward by 10 ms (marked (4) in Figure 31) by clicking on the arrows of the panel (marked (3) on Figure 31).

Figure 31: Playing and exploring panel at the right side of the GUI

Rotate images

Some recorded images may be flipped upside down or rotated. The user can re-orient these images. First, the image stream to be modified should be selected by clicking on the corresponding check box (1). Button (2) allows the user to flip the image horizontally and button (3) to rotate it by 180°. The modification will not be displayed until the play button (4) is clicked.

Figure 32: Flipping and rotating the images

Zoom and Extract displayed data

Two icons at the upper left of the screen (1) allow the user to zoom in or out by drawing a rectangle in the audio fields or simply by clicking in these fields. The acoustic, EGG, piezo and breath signals, along with the lips and ultrasound images, follow the zoom. The phonemes, if displayed, also adjust to the boundary changes.

The displayed data can then be extracted into separate files if needed. To specify which data to extract, check boxes (2) are available in the corresponding sensor fields. The TextGrid can also be extracted by ticking its check box (3). The “Extract selected” button (4) creates a folder in the main destination folder (5) and extracts the chosen data. Lips and ultrasound data can be extracted as images or as .avi files; both options are available in the list box (6).


The user can also choose to display the extracted TextGrid and audio in Praat by checking box (7). When the extraction is done, a notification appears in the info field on the right side of the window.

Figure 33: Zoom and extract data

Analysis

Some analyses can be performed on the signals plotted in the interface by clicking on the [Current view] button (marked (5) on Figure 28). This button launches the computation of signals derived from the acquired ones. The signals available in this version are:

'Audio', 'EGG', 'DEGG', 'Piezo', 'Breath', 'Spectrogram', 'F0 Boersam', 'F0 for OQ', 'Notes 1/2 ton', 'H_1-H_2', 'Open Quotient', 'Voicing'

You can use the window length slider to change the length of data used when calculating the mean of certain features, such as the voicing.

When the calculations are finished, the console saves all the signals in a .mat file and opens a graphical user interface containing 4 empty axes. Each axis has a pop-up list used to assign a signal from the list above to be plotted in it.

If you need to extract high-level features (classification) from certain analyzed signals, choose them from the list on the right of the screen (holding the ‘Ctrl’ key for multiple selection) and click on the ‘Extract features from:’ button. The output is an .xml file containing details of the song and the extracted features.

Two types of XML can be generated:

• “feature by line” generates an XML in which each element (tag) contains one feature. When opened in Excel, it shows one feature per row. This is the type requested by the i-Treasures partners.

• “timestamp by line” generates an XML in which each element (tag) contains one time label. When opened in Excel, it shows all features of the same timestamp on the same row. This is the type suited for analyzing and comparing parameters. Make sure that all features have the same sampling frequency, since one timestamp must be common to all the features.
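Purely as an illustration (the element names used by i-THAn are not reproduced here, so those below are hypothetical), the two layouts could be generated like this:

import xml.etree.ElementTree as ET

# Hypothetical feature table: two timestamps, two features.
features = {"voicing": [1, 0], "F0_class": ["MAverageFreq", "MHighFreq"]}
timestamps = [0.00, 0.01]

by_feature = ET.Element("song")          # "feature by line": one element per feature value
for name, values in features.items():
    for t, v in zip(timestamps, values):
        ET.SubElement(by_feature, "feature", name=name, time=str(t)).text = str(v)

by_time = ET.Element("song")             # "timestamp by line": one element per timestamp
for i, t in enumerate(timestamps):
    frame = ET.SubElement(by_time, "frame", time=str(t))
    for name, values in features.items():
        frame.set(name, str(values[i]))

ET.ElementTree(by_feature).write("feature_by_line.xml")
ET.ElementTree(by_time).write("timestamp_by_line.xml")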


Figure 34: Analyses figure showing, from top to bottom: the audio signal, the spectrogram, the fundamental frequency of the speech directly on the EGG signal, and the binary voicing ratio.

1.1.4.6.1 Voicing classification

The voicing feature is a binary value. Its computation is done before opening the ‘Analyse’ window. The algorithm checks whether a fundamental frequency is detected at a given timestamp and, if so, assigns a true Boolean value. The voicing feature is then the average of these Boolean values over a window. If the average (a decimal value between 0 and 1) is higher than 0.5, the sound is considered voiced.
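A minimal sketch of this rule (illustrative only, not i-THAn's MatLab implementation; the window length of 5 frames is an arbitrary example):

import numpy as np

def voicing(f0_detected, window=5):
    # Moving average of the per-frame "F0 detected" booleans; a frame is voiced
    # when the average exceeds 0.5.
    flags = np.asarray(f0_detected, dtype=float)
    ratio = np.convolve(flags, np.ones(window) / window, mode="same")
    return ratio > 0.5

print(voicing([0, 0, 1, 1, 1, 1, 0, 0, 0, 0]))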

1.1.4.6.2 Fundamental frequency classification

The F0 feature takes one of 5 interval values: VeryLowFreq, LowFreq, AverageFreq, HighFreq and VeryHighFreq. Since male and female fundamental frequencies lie in different ranges, two sets of F0 feature values can be extracted. The default interval boundaries for the male case are elements of a geometric sequence, as follows:

MVeryLowFreq: F0 < 98.00 Hz (G2 / Sol)

MLowFreq: 98.00 Hz (G2 / Sol) < F0 < 138.59 Hz (C#3 / Do#)

MAverageFreq: 138.59 Hz (C#3 / Do#) < F0 < 196.00 Hz (G3 / Sol)

MHighFreq: 196.00 Hz (G3 / Sol) < F0 < 277.19 Hz (C#4 / Do#)

MVeryHighFreq: 277.19 Hz (C#4 / Do#) < F0.

In any case, the user can always change these boundaries in the parameters (settings) window. In the female case, the boundaries are twice the male values, and the features are FVeryLowFreq, FLowFreq, FAverageFreq, FHighFreq and FVeryHighFreq.
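A hedged sketch of this classification (illustrative only; i-THAn itself is implemented in MatLab):

MALE_BOUNDS = [98.00, 138.59, 196.00, 277.19]  # Hz, the geometric sequence given above
LABELS = ["VeryLowFreq", "LowFreq", "AverageFreq", "HighFreq", "VeryHighFreq"]

def classify_f0(f0_hz, sex="M"):
    # Female boundaries are twice the male ones, as stated above.
    bounds = MALE_BOUNDS if sex == "M" else [2 * b for b in MALE_BOUNDS]
    return sex + LABELS[sum(f0_hz > b for b in bounds)]

print(classify_f0(150.0))       # MAverageFreq
print(classify_f0(150.0, "F"))  # FVeryLowFreq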


1.1.4.6.3 Variation of fundamental frequency classification

The variation of the fundamental frequency indicates whether the singer is increasing or decreasing his F0. The values are calculated by taking the derivative of the fundamental frequency. The 5 intervals are HighlyDecreasing, Decreasing, Stable, Increasing and HighlyIncreasing.
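As an illustration (the actual interval boundaries live in the settings window and are not reproduced here, so the thresholds below are purely hypothetical):

import numpy as np

def classify_f0_variation(f0_hz, dt=0.01, slow=20.0, fast=100.0):
    # Derivative of F0 in Hz/s mapped onto the five variation classes;
    # the slow/fast thresholds are hypothetical example values.
    labels = []
    for v in np.diff(np.asarray(f0_hz, dtype=float)) / dt:
        if v < -fast:
            labels.append("HighlyDecreasing")
        elif v < -slow:
            labels.append("Decreasing")
        elif v <= slow:
            labels.append("Stable")
        elif v <= fast:
            labels.append("Increasing")
        else:
            labels.append("HighlyIncreasing")
    return labels

print(classify_f0_variation([196.0, 196.5, 199.0, 185.0]))
# ['Increasing', 'HighlyIncreasing', 'HighlyDecreasing']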

The user can change the classification intervals by clicking the parameters button, and can restore the default ones by clicking the ‘Reset default parameters’ button.


2. Appendix 2: vocal tract acquisition module

2.1 Installing the Vermont probe and the Terason acquisition module

The ultrasonic imaging probe used in our system has been specially manufactured by Vermont according to our needs. The reason we are using Vermont's probe is that it has no handset, so it is less cumbersome for the user when mounted on the hyper-helmet. However, Vermont probes were initially developed for clinical use, which makes their acquired signal inaccessible on regular computer systems. To bypass this problem, we contacted another ultrasonic imaging probe manufacturer, Terason. Terason does not produce handset-free probes but can develop PC interface modules (hardware) for ultrasonic probes. Installing the Terason software, forcing it to recognize the Vermont probe and adapting RTMaps for ultrasound image acquisition is a multi-step procedure that we detail in the following.

1 – In its current version, Terason's software works on 32-bit Windows machines. The Terason interface module transfers data and requires power via a 1394a-1995 FireWire plug. MacBook Pro devices can be used after installing a Windows system. Some other PCs (e.g. Dell Precision) provide FireWire 1394 plugs for connectivity only; you may need an external FireWire hub in order to ensure power supply. We use "LINDY 6 ports" FireWire hubs.

2 – Install and run the Terason software. The supported version is 4.6.4 (Release build 2156). For licensing, the software asks for a permission code. You need to send an email to "[email protected]", providing the license status and the module's serial number (S/N). It normally takes a couple of hours to get the code. Keep the Terason window open while waiting for the answer; otherwise the S/N will change and you will need to send another mail.

3 – Make sure that Windows User Account Control (UAC) is turned off and the antivirus is disabled.

4 – Set your FireWire driver to legacy mode: you can change the 1394 driver in the Device Manager by clicking “Update driver”, then browsing your computer for a list of FireWire device drivers. One of the drivers should say “legacy”.

At this point, when the ultrasound module is plugged into the PC, you should hear one beep each time the Terason software is launched or Windows starts.

5 – Since the probe we are using was not manufactured by Terason, a software module for programming its EPROM – "Probe_id_experimental" – must be provided. Run "Probe_ID_Experimental.exe", find the probe you are using in the list and click on Program (the probe we used is 8MC4_Vet_Exp). At this point, you should hear two beeps when connecting the probe, and acquired ultrasound images should now be displayed in the main Terason software.

6 – It is now time to save the default settings for your ultrasound probe. Create and save an exam using the Terason software.

7 – In order to acquire the images, visualize them and save them via RTMaps, you need the RTMaps Terason package. If it is not provided, please request it from Intempora (the RTMaps vendor).


2.2 Exploring the recorded folder

In this appendix, we explore the recorded data in: MainFolder\SongType\SingerName\RecordingPlace\RecordingDate\SongName\PerformanceType\.

“SongType_SingerName_RecordingPlace_RecordingDate_SongName_PerformanceType.TextGrid”: the .TextGrid file that holds the corpus lyrics.

“YYYYMMDD_hhmmss_RecFile_1”: the directory created by RTMaps (where “YYYY” is the year, “MM” the month, “DD” the day, “hh” the hour, “mm” the minutes and “ss” the seconds of the recording time). It includes all the signals and images, along with the acquisition information files, that RTMaps generates for a recording. The following lists and details these files.

“RecFile_1_YYYYMMDD_hhmmss.idx” is an info file specific to RTMaps.

“RecFile_1_YYYYMMDD_hhmmss.idy” is an info file; it includes information about the acquisition format of each sensor as well as the timestamp of the first connection with these sensors.

“RecFile_1_YYYYMMDD_hhmmss.rec” is the main file needed for the recording. In addition to the information found in the “.idy” file, the “.rec” file includes all the timestamps for each acquired audio buffer or image.

“RecFile_1_YYYYMMDD_hhmmss_Belt_Piezo_Sound_Capture_monoOutput1.wav” is the mono signal recorded by the breathing belt, in 16 bits at 44100 Hz.

“RecFile_1_YYYYMMDD_hhmmss_Belt_Piezo_Sound_Capture_monoOutput1.inf” holds info on the corresponding recorded .wav.

“RecFile_1_YYYYMMDD_hhmmss_Belt_Piezo_Sound_Capture_monoOutput2.wav” is the mono signal recorded by the nasal accelerometer, in 16 bits at 44100 Hz.

“RecFile_1_YYYYMMDD_hhmmss_Belt_Piezo_Sound_Capture_monoOutput2.inf” holds info on the corresponding recorded .wav.

“RecFile_1_YYYYMMDD_hhmmss_Micro_EGG_Sound_Capture_monoOutput1.wav” is the mono signal recorded by the acoustic microphone, in 16 bits at 44100 Hz.

“RecFile_1_YYYYMMDD_hhmmss_Micro_EGG_Sound_Capture_monoOutput1.inf” holds info on the corresponding recorded .wav.

“RecFile_1_YYYYMMDD_hhmmss_Micro_EGG_Sound_Capture_monoOutput2.wav” is the mono signal recorded by the electroglottograph, in 16 bits at 44100 Hz.

“RecFile_1_YYYYMMDD_hhmmss_Micro_EGG_Sound_Capture_monoOutput2.inf” holds info on the corresponding recorded .wav.

“RecFile_1_YYYYMMDD_hhmmss_Reverse_Webcam_1_output.raw” is a single raw file that contains all the recorded camera images. If i-THAn is not available, a MatLab script or an RTMaps diagram can be used to convert these images to .jpeg or any other image format (an illustrative conversion sketch is given at the end of this list).

“RecFile_1_YYYYMMDD_hhmmss_Reverse_Webcam_1_output.inf” is an info file that includes acquisition information for the camera images.

“RecFile_1_YYYYMMDD_hhmmss_t3000_ultrasound_1_image.raw” is a single raw file that contains all the recorded ultrasound images. If i-THAn is not available, a MatLab script or an RTMaps diagram can be used to convert these images to .jpeg or any other image format.

“RecFile_1_YYYYMMDD_hhmmss_t3000_ultrasound_1_image.inf” is an info file that includes acquisition information for the ultrasound images.


“RecFile_1_YYYYMMDD_hhmmss_Overflows_record_nb_fifo_overflows.csv” holds info about any overflows that may have happened during the recording.
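As mentioned for the two .raw files above, a conversion script can replace i-THAn for simple export. The sketch below is illustrative only: the frame geometry and the 8-bit grayscale pixel format are assumptions, the real values must be taken from the corresponding .inf file, and the file name is just an example.

import numpy as np
from PIL import Image

WIDTH, HEIGHT = 640, 480      # assumed frame geometry; read the actual values from the .inf file
FRAME_BYTES = WIDTH * HEIGHT  # assumes 1 byte per pixel (8-bit grayscale)

with open("RecFile_1_example_Reverse_Webcam_1_output.raw", "rb") as f:
    data = f.read()

for i in range(len(data) // FRAME_BYTES):
    frame = np.frombuffer(data, dtype=np.uint8, count=FRAME_BYTES, offset=i * FRAME_BYTES)
    Image.fromarray(frame.reshape(HEIGHT, WIDTH)).save(f"frame_{i:05d}.jpg")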


3. References

[1] http://qt.nokia.com/

[2] http://opencv.willowgarage.com/wiki/

[3] http://www.opengl.org/

[4] http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-open-source-computer-vision-library-opencv-faq/#3

[5] http://software.intel.com/en-us/intel-mkl/

[6] https://www.csie.ntu.edu.tw/~cjlin/liblinear/

[7] http://www.microsoft.com/en-us/kinectforwindows

[8] www.openni.org/

[9] http://www.openni.org/openni-sdk/openni-sdk-history-2/

[10] http://xface.fbk.eu

[11] J. Ostermann, "Chapter 2: Face Animation in MPEG-4", in MPEG-4 Facial Animation: The Standard, Implementation and Applications, I. Pandzic and R. Forchheimer Eds., John Wiley & Sons, Inc., New York, 2003, pp. 17-55.

[12] http://www.intempora.com/rtmaps4/rtmaps-software/overview.html

[13] http://trueconf.com/

[14] Müller, M. (2007). Information Retrieval for Music and Motion. Berlin: Springer-Verlag Berlin Heidelberg.

[15] Hussein, M. E., Marwan, T., Gowayyed, M. A., & El-Saban, M. (2013). Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations. IJCAI, (pp. 2466-2472).

[16] Lv, F., & Nevatia, R. (2006). Recognition and segmentation of 3-d human action using hmm and multi-class adaboost. Computer Vision–ECCV 2006 (pp. 359-372). Springer Berlin Heidelberg.

[17] L. BREIMAN. Random forests. Machine learning, 45(1):5–32, 2001.