Children Voices Against Bullying in Schools

Luís Caldas de Oliveira [email protected]
L2F - Spoken Language Systems Laboratory
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed

Outline

• Spoken Language Systems Laboratory of INESC-ID
• Bullying and FearNot!
• Acoustics of children's voices
• Voice building and TTS system
• Experimental evaluation
• Results
• Conclusion and future work

About L2F

• History: Work on speech processing for Portuguese since the 90s. L2F was created in 2001.

• Goal: Bring together several groups in the area of spoken language processing for European Portuguese, united by the problem we want to solve, not by the technology we share.

• Mission: Creating technology to bridge the gap between natural spoken language and the underlying semantic information.

• Interdisciplinary background: Signal processing, natural language processing, linguistics, etc.

Priority Lines of Activity

• Semantic processing of multimedia contents
  – Follow up of ALERT project: continued research on segmentation, recognition, topic indexation, summarization
  – Automatic closed captioning

• Spoken dialogue systems and intelligent multimodal interfaces
  – Domotics: "intelligent" rooms controllable by voice
  – Telephone-based information systems
  – Voices for synthetic characters

Companies working with L2F

• Vodafone Portugal
• RTP (public broadcasting)
• Promosoft (banking solutions)
• Priberam (law databases)
• Edisoft (security and defense industry)
• Tecmic (fleet management solutions)
• Ano (local government solutions)
• CPC HS (health systems)
• Microsoft Language Development Center

Bullying

• Repeated oppression, psychological or physical, of a less powerful person by a more powerful person – David Farrington (1993)

FearNot!

Interaction

F0 vs Age

• Lee, S., Potamianos, A. and Narayanan, S., Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am., 105:1455–1468, Mar. 1999.

[Figure: F0 vs age plot; labels: girl, boy, woman, Target. Data: 436 subjects (ages 5 to 18) and 56 adults.]

Formant Scaling

• Lee, S., Potamianos, A. and Narayanan, S., Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am., 105:1455–1468, Mar. 1999.

[Figure: formant scaling plot; labels: girl, boy, woman, Target.]

Voice Building

• English and German voices.
• 12 voices from 4 speakers.
• Characters speak with TTS-Voice.
• Matching voice characteristics.

Recordings

• Language models generate around 5000 different utterances for each language.

• A greedy algorithm was used to automatically select a representative sub-set (550 utterances).

• 2 native German speakers and 2 native British speakers recorded the sentences.

• The recordings were modified to generate multiple child-like voices.
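The greedy selection step can be sketched as follows. This is a minimal illustration, not the tool used for the recordings: the coverage units (single words and adjacent word pairs) and the names `units` and `greedy_select` are assumptions made for the example. At each step it picks the utterance that adds the most not-yet-covered units and stops when the budget is reached or nothing new can be covered.

```python
def units(utterance):
    """Hypothetical coverage units: the words and adjacent word pairs of an utterance."""
    words = utterance.lower().split()
    return {(w,) for w in words} | set(zip(words, words[1:]))


def greedy_select(utterances, budget):
    """Greedily pick up to `budget` utterances that maximise unit coverage."""
    remaining = list(dict.fromkeys(utterances))  # drop duplicates, keep order
    covered = set()
    selected = []
    while remaining and len(selected) < budget:
        # Utterance contributing the largest number of uncovered units.
        best = max(remaining, key=lambda u: len(units(u) - covered))
        gain = units(best) - covered
        if not gain:
            break  # nothing new left to cover
        selected.append(best)
        covered |= gain
        remaining.remove(best)
    return selected, covered


if __name__ == "__main__":
    corpus = ["go away", "go away now", "leave me alone", "go away"]
    chosen, covered = greedy_select(corpus, budget=3)
    print(chosen, len(covered))
```

With a budget of 550 and a corpus of around 5000 generated utterances, the same loop would return the reduced recording script; a real system would more likely count phonetic units such as diphones than words.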

Voice Building Architecture

[Diagram: voice building pipeline; labels: Normalisation, Audio, Segmentation, Utterances, Labels, Audio]

Segmentation

• Segmented by our own segmentation tool adapted to British English

• Gender-dependent models were trained using the British English WSJ corpus

• Accuracy of 85% / 84% for female and male speakers, respectively

• A speaker adaptation procedure was performed twice, using the canonical word pronunciations for segmentation

• Third iteration: a pronunciation graph was provided, combining the canonical pronunciations with alternative pronunciations generated by post-lexical rules
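A pronunciation graph of this kind can be sketched as below. The rules in `RULES`, the phone symbols, and the function names are all hypothetical and context-independent for brevity; they are not the post-lexical rules used by the segmentation tool. Each canonical phone becomes a slot holding its allowed realisations, and every path through the slots is one candidate pronunciation that an aligner could match against the audio.

```python
from itertools import product

# Hypothetical post-lexical rules: canonical phone -> allowed realisations.
# The empty string marks deletion. Illustrative only.
RULES = {
    "t": ["t", "?"],  # /t/ may be glottalised
    "@": ["@", ""],   # schwa may be deleted
}


def pronunciation_graph(canonical):
    """Build a linear graph: one list of alternative arcs per canonical phone."""
    return [RULES.get(phone, [phone]) for phone in canonical]


def variants(graph):
    """Enumerate every path through the graph, dropping deleted phones."""
    return [tuple(p for p in path if p) for path in product(*graph)]


if __name__ == "__main__":
    canonical = ["b", "V", "t", "@"]
    for v in variants(pronunciation_graph(canonical)):
        print(" ".join(v))
```

The canonical pronunciation is always the first path, so the graph strictly extends what the first two adaptation passes used.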

XML Representation

Speech-Engine Architecture

Speech-Engine

• No explicit Prosody-Model for Duration and F0
• Duration and F0 are predicted during runtime
• Use as a feature for the context matching segment pre-selection
• Segment Selection:
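The selection criterion itself is not reproduced in this transcript. As a generic illustration only, unit-selection synthesisers commonly choose the segment sequence that minimises the sum of a target cost (mismatch with the predicted features, such as F0) and a join cost (mismatch between neighbouring segments), using dynamic programming. The sketch below assumes that formulation; `select_segments`, the cost functions, and the toy F0 values are all hypothetical and do not describe this engine.

```python
def select_segments(candidates, target_cost, join_cost):
    """Viterbi search over candidates[i], the candidate segments for position i.

    Returns the cheapest sequence and its total cost.
    """
    # best[i][j] = (cumulative cost, index of best predecessor) for candidate j.
    best = [[(target_cost(0, c), None) for c in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for c in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, c), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, c), back))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total


if __name__ == "__main__":
    predicted_f0 = [200, 210]                 # toy runtime predictions (Hz)
    candidates = [[190, 205], [260, 215]]     # toy candidate segment F0s (Hz)
    path, total = select_segments(
        candidates,
        target_cost=lambda i, c: abs(c - predicted_f0[i]),
        join_cost=lambda a, b: 0.5 * abs(a - b),
    )
    print(path, total)
```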

Character with original voice

Character with modified voice

Character with modified voice

Experimental Evaluation

• Subjects divided into 2 categories:
  – Audio only
  – Audio and video

• Rating of 6 items (Likert scale):
  – Overall sound quality
  – Naturalness
  – Sounds like boy/girl?
  – Sounds like bully/victim?

• 8 different versions of each stimulus:
  – 2 original voices
  – 2 modified voices
  – 2 synthesised voices
  – 2 modified synthesised voices
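The eight versions can be read as a 2 × 2 × 2 design: recorded or synthesised speech, unmodified or modified, with two voices in each cell. The sketch below enumerates that grid; the labels `voice_1` and `voice_2` are placeholders, since the transcript does not say which two voices were used in each condition.

```python
from itertools import product

VOICES = ("voice_1", "voice_2")            # assumption: two voices per condition
SOURCES = ("recorded", "synthesised")
PROCESSING = ("unmodified", "modified")


def stimulus_versions():
    """List every combination of source, processing and voice."""
    return [
        {"source": s, "processing": p, "voice": v}
        for s, p, v in product(SOURCES, PROCESSING, VOICES)
    ]


if __name__ == "__main__":
    for version in stimulus_versions():
        print(version)
```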

[1] Johnson et al., Limited domain synthesis of expressive military speech for animated characters, IEEE Workshop on Speech Synthesis, 2002.

Results

• The presence of video resulted in a better rating of the overall perceived quality: 3.42 (p<0.005) for audio only vs 3.70 (p<0.00001) with video.

• The presence of the animated character made the voices more believable, especially for the victim (3.68, p<0.00001).

• The modified voices had the same overall quality rating as the unmodified voices in the audio-only test (3.42, p<0.04) but were rated better when played in video clips (3.82, p<0.00001 vs 3.59, p<0.009).

• The overall quality ratings of both the modified and unmodified recordings were above 4 (4.45, p<0.00001).

Conclusions and Future Work

• Limited domain synthesis allowed us to produce voices for 3D animated characters with almost natural speech quality

• Although there was no story context in our evaluation, the video of the animated characters positively influenced the perceived overall quality and intonation

• Additional voices need to be generated

• Some segmentation and concatenation problems need to be corrected

Thank you / Obrigado

L2F - Spoken Language Systems Laboratory