
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

technology from seed

L2F - Spoken Language Systems Laboratory

Children Voices Against Bullying in Schools

Luís Caldas de Oliveira



Luís Caldas de Oliveira - Spoken Language Systems Laboratory

Outline

• Spoken Language Systems Laboratory of INESC-ID
• Bullying and FearNot!
• Acoustics of children's voices
• Voice building and TTS system
• Experimental evaluation
• Results
• Conclusion and Future Work




About L2F

• History: Work on speech processing for Portuguese since the 90s. L2F was created in 2001.

• Goal: Bring together several groups in the area of spoken language processing for European Portuguese, united by the problem we want to solve, not by the technology we share.

• Mission: Creating technology to bridge the gap between natural spoken language and the underlying semantic information.

• Interdisciplinary background: Signal processing, natural language processing, linguistics, etc.




Priority Lines of Activity

• Semantic processing of multimedia contents
– Follow up of ALERT project: continued research on segmentation, recognition, topic indexation, summarization
– Automatic closed captioning

• Spoken dialogue systems and intelligent multimodal interfaces
– Domotics: "intelligent" rooms controllable by voice
– Telephone-based information systems
– Voices for synthetic characters




Companies working with L2F

• Vodafone Portugal
• RTP (public broadcasting)
• Promosoft (banking solutions)
• Priberam (law databases)
• Edisoft (security and defense industry)
• Tecmic (fleet management solutions)
• Ano (local government solutions)
• CPC HS (health systems)
• Microsoft Language Development Center

Bullying

• Repeated oppression, psychological or physical, of a less powerful person by a more powerful person – David Farrington (1993)

FearNot!

Interaction

F0 vs Age

• Lee, S., Potamianos, A. and Narayanan, S., Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am., 105:1455–1468, Mar. 1999.

[Figure: F0 vs age. Labels: girl, boy, woman, Target. Data: 436 subjects (ages 5 to 18) and 56 adults.]

Formant Scaling

• Lee, S., Potamianos, A. and Narayanan, S., Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am., 105:1455–1468, Mar. 1999.

[Figure: formant scaling. Labels: girl, boy, woman, Target.]
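As a rough illustration of what formant scaling means, the sketch below multiplies a set of formant frequencies by one uniform factor and expresses a frequency ratio in semitones. The function names, the formant values, and the factor 1.2 are placeholders chosen for illustration; they are not taken from Lee et al. or from the voices built in this work, and uniform scaling is only a first approximation.

```python
import math


def scale_formants(formants_hz, alpha):
    """Multiply every formant frequency by the same factor alpha.

    alpha > 1 raises the formants (shorter vocal tract, more child-like);
    alpha < 1 lowers them. Uniform scaling is an assumption of this sketch.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [f * alpha for f in formants_hz]


def ratio_to_semitones(ratio):
    """Express a frequency ratio (for example target F0 / source F0) in semitones."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 12.0 * math.log2(ratio)


# Hypothetical adult formants (Hz) and a hypothetical scaling factor.
adult_formants = [500.0, 1500.0, 2500.0]
child_like = scale_formants(adult_formants, 1.2)
octave_shift = ratio_to_semitones(2.0)  # doubling a frequency is one octave
```

In practice the factor would be read off data such as the curves cited above, separately for the chosen target age.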




Voice Building

• English and German voices.
• 12 voices from 4 speakers.
• Characters speak with TTS-Voice.
• Matching voice characteristics.




Recordings

• Language models generate around 5000 different utterances for each language.

• A greedy algorithm was used to automatically select a representative sub-set (550 utterances).

• 2 native German speakers and 2 native British speakers recorded the sentences.

• The recordings were modified to generate multiple child-like voices.
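The greedy selection step can be sketched as follows. The slides do not say which coverage criterion was used, so this sketch assumes that each utterance contributes a set of units and that the goal is to cover as many distinct units as possible with a fixed number of utterances. The names `greedy_select` and `word_pairs` are hypothetical, and adjacent word pairs stand in for whatever units (for example diphones) the real tool counted.

```python
def word_pairs(utterance):
    """Stand-in unit extractor: the set of adjacent word pairs in an utterance."""
    words = utterance.lower().split()
    return set(zip(words, words[1:]))


def greedy_select(utterances, units_of, max_size):
    """Greedily pick up to max_size utterances.

    At each step, take the utterance that adds the most units not yet
    covered; stop early once no remaining utterance adds anything new.
    """
    covered = set()
    selected = []
    remaining = list(utterances)
    while remaining and len(selected) < max_size:
        best = max(remaining, key=lambda u: len(units_of(u) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # nothing new left to cover
        selected.append(best)
        covered |= gain
        remaining.remove(best)
    return selected, covered


# Toy corpus; the real input was around 5000 generated utterances per language.
corpus = ["a b c", "a b", "b c d", "c d"]
chosen, covered_units = greedy_select(corpus, word_pairs, max_size=550)
```

Greedy selection does not guarantee the smallest possible subset, but it is simple and each step is cheap, which matters when the candidate pool is in the thousands.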




Voice Building Architecture

[Diagram. Labels: Utterances, Audio, Normalisation, Segmentation, Labels, Audio.]




Segmentation

• Segmented by our own segmentation tool adapted to British English

• Gender dependent models were trained using the British English WSJ corpus

• Accuracy of 85% / 84% for female and male speakers, respectively

• A speaker adaptation procedure was performed twice, using the canonical word pronunciations for segmentation

• 3rd iteration: a pronunciation graph was provided for canonical pronunciations with alternative pronunciations using post-lexical rules




XML Representation




Speech-Engine Architecture




Speech-Engine

• No explicit Prosody-Model for Duration and F0
• Duration and F0 are predicted during runtime
• Use as a feature for the context matching segment pre-selection
• Segment Selection:
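The selection criterion shown on this slide did not survive in the transcript, so the sketch below is not the engine's actual method. It shows the generic form that segment selection usually takes in unit-selection synthesis: each candidate has a target cost (how well it matches what is wanted) and a join cost (how well it concatenates with its predecessor), and dynamic programming finds the cheapest sequence. The function name, the cost functions, and the toy data are all hypothetical.

```python
def select_segments(targets, candidates, target_cost, join_cost):
    """Return the cheapest candidate sequence and its total cost.

    candidates[i] is the non-empty list of database segments for targets[i].
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, c)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_best] + target_cost(targets[i], c), k_best))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy data: a target is (phone, wanted F0 in Hz); a candidate is
# (phone, F0 in Hz, position in the recorded database).
def f0_mismatch(target, cand):
    return abs(target[1] - cand[1]) / 100.0


def discontinuity(prev, cand):
    # Segments that were adjacent in the recording join for free.
    return 0.0 if cand[2] == prev[2] + 1 else 0.5


targets = [("a", 300), ("b", 300)]
candidates = [[("a", 300, 10), ("a", 280, 4)],
              [("b", 250, 20), ("b", 290, 5)]]
path, total = select_segments(targets, candidates, f0_mismatch, discontinuity)
```

In the toy data the search prefers the two segments that were adjacent in the recording, even though another first segment matches the wanted F0 exactly, because the free join outweighs the small F0 mismatch.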




Character with original voice




Character with modified voice




Experimental Evaluation

• Subjects divided into 2 categories:
– Audio only
– Audio and video

• Rating of 6 items (Likert scale):
– Overall sound quality
– Naturalness
– Sounds like boy/girl?
– Sounds like bully/victim?

• 8 different versions of each stimulus:
– 2 original voices
– 2 modified voices
– 2 synthesised voices
– 2 modified synthesised voices
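The count of 8 versions is consistent with crossing three two-way factors, which the short sketch below enumerates. The factor names, and the reading of each "2" as two voices per condition, are assumptions made for illustration; the slide does not spell out the design.

```python
from itertools import product

# Assumed factors: where the speech comes from, whether it was modified
# to sound child-like, and which of two voices is used (placeholder labels).
sources = ["recorded", "synthesised"]
processing = ["unmodified", "modified"]
voices = ["voice 1", "voice 2"]

conditions = list(product(sources, processing, voices))
for source, proc, voice in conditions:
    print(f"{source:12s} {proc:11s} {voice}")
```

Each stimulus then appears once per condition, and each listener group (audio only, or audio and video) rates the same set.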

[1] Johnson et al., Limited domain synthesis of expressive military speech for animated characters, IEEE Workshop on Speech Synthesis, 2002.




Results

• The presence of video resulted in a better rating of the overall perceived quality: 3.42 (p<0.005) vs 3.70 (p<0.00001).

• The presence of the animated character made the voices more believable especially for the victim (3.68, p<0.00001).

• The modified voices had the same rating in overall quality as the unmodified voices in the audio-only test (3.42, p<0.04) but were rated better when played in video clips (3.82, p<0.00001 vs 3.59, p<0.009).

• The results for the overall quality of both the modified and unmodified recordings were above 4 (4.45, p<0.00001).




Conclusions and Future Work

• Limited domain synthesis allowed us to produce voices for 3D animated characters with almost natural speech quality

• Although there was no story context in our evaluation, the video of the animated characters positively influenced the perceived overall quality and intonation

• Additional voices need to be generated
• Some segmentation and concatenation problems need to be corrected




Thank you / Obrigado

L2F - Spoken Language Systems Laboratory
