Geneva, Switzerland, 24 October 2013
Speech signal processing for media accessibility
Takayuki Ito, Dr. Eng.
Executive Research Engineer, NHK Engineering System, Inc.
ITU Workshop on “Making Media Accessible to all: The options and the economics”
(Geneva, Switzerland, 24 (p.m.) – 25 October 2013)
Ageing : A Global Issue
The population of elderly persons is increasing globally because fertility rates are declining.
Elderly persons need to be given the opportunity to continue contributing to society.
(UN 2002 Madrid International Plan of Action on Ageing)
From “supported” to “supporting”
Japan, aged 65 and over: 23% in 2010, 36% in 2040
Ageing : degradation of hearing
Hearing loss, especially at higher frequencies
Hearing aids are available.
Background sound interferes with understanding speech.
Better mixing balance for TV programs is needed.
Degradation of cognitive speed
A slower speech rate is preferable.
Compensating for these degradations makes social participation easier for the elderly.
Speech rate conversion technology
TV and radio sets with a “Slow” button
Speech rate conversion for elderly people
The elderly sometimes say, “Recent speech on TV programs is too fast for me to understand.”
There is a need to slow down the speech rate without degrading sound quality.
[Figure: speech segments on a time axis.
Original: ①②③④⑤⑥⑦⑧⑨⑩
Slower: ①②③③④⑤⑥⑦⑧⑧⑨⑩ (segments ③ and ⑧ are repeated)
Faster: ①②④⑤⑥⑦⑨⑩ (segments ③ and ⑧ are removed)
Analog elongation of ①②③④⑤⑥⑦⑧⑨⑩ is marked ×.]
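The slower and faster rows of the diagram can be reproduced with a minimal sketch that repeats or drops numbered segments. The function names and the choice of positions 3 and 8 are illustrative assumptions read off the diagram; an actual converter would operate on waveform segments and join them smoothly, which this sketch does not attempt.

```python
def slower(segments, repeat_at):
    """Lengthen the sequence by emitting the segments at the given
    1-based positions twice (hypothetical helper)."""
    out = []
    for position, segment in enumerate(segments, start=1):
        out.append(segment)
        if position in repeat_at:
            out.append(segment)  # the repeated copy adds duration
    return out


def faster(segments, drop_at):
    """Shorten the sequence by skipping the segments at the given
    1-based positions (hypothetical helper)."""
    return [s for p, s in enumerate(segments, start=1) if p not in drop_at]


original = list(range(1, 11))    # stands in for segments 1..10
slow = slower(original, {3, 8})  # segments 3 and 8 are played twice
fast = faster(original, {3, 8})  # segments 3 and 8 are skipped
print(slow)
print(fast)
```

Because whole segments are repeated or removed, the remaining segments are played back unmodified, in contrast to the analog elongation row, which stretches every segment.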
Speech rate conversion without changing length
Starts coincide at the blue line positions.
Starts do not coincide, but…
They coincide again.
Original
Converted
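One way the converted stream can fall behind the original and then coincide with it again is to stretch the speech and shorten the pauses that follow. The sketch below tracks that delay. The pause-shortening rule, the function name, and the example durations are assumptions for illustration and are not stated on the slide.

```python
def convert_keep_length(segments, stretch):
    """segments: list of (kind, duration_s), kind is 'speech' or 'pause'.
    Returns a list of (kind, new_duration_s, delay_after_s)."""
    delay = 0.0  # how far the converted stream lags behind the original
    out = []
    for kind, duration in segments:
        if kind == "speech":
            new_duration = duration * stretch    # slowed-down speech
            delay += new_duration - duration     # lag grows
        else:
            cut = min(delay, duration)           # absorb lag in the pause
            new_duration = duration - cut
            delay -= cut
        out.append((kind, new_duration, delay))
    return out


example = [("speech", 2.0), ("pause", 1.0), ("speech", 1.0), ("pause", 1.0)]
result = convert_keep_length(example, stretch=1.25)
for kind, new_duration, delay in result:
    print(kind, new_duration, delay)
```

In this example the lag returns to zero at the end of each pause, so the next utterance starts at the same time in both streams.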
streaming data
Visually impaired people use fast replay to find the main idea in audio books or web pages (audio skimming).
Original (n times speed): BGM, speech, and silent parts
Important part (speech)
Intelligible high-speed speech for visually impaired people
Make this part easier to understand
Converted (same length): the important speech parts are made slower
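Keeping the converted audio the same length while slowing the important speech implies that the remaining parts must be played faster. A minimal sketch of that arithmetic follows; the function name, the parameters, and the example durations are hypothetical and are not taken from the slide.

```python
def other_rate(speech_s, other_s, overall_rate, speech_rate):
    """Return the playback rate needed for the non-speech parts so that
    the whole recording still finishes in total / overall_rate seconds
    while speech is played at the gentler speech_rate."""
    target_s = (speech_s + other_s) / overall_rate  # required total length
    speech_out_s = speech_s / speech_rate           # time spent on speech
    other_out_s = target_s - speech_out_s           # time left for the rest
    if other_out_s <= 0:
        raise ValueError("speech_rate is too slow to fit the target length")
    return other_s / other_out_s


# 60 s of speech and 40 s of BGM/silence, replayed at 2x overall,
# with speech limited to 1.5x.
rate = other_rate(60.0, 40.0, overall_rate=2.0, speech_rate=1.5)
print(rate)
```

With these numbers the speech takes 40 s of the 50 s target, so the other 40 s of material must fit into 10 s.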
recorded data
Applications of speech rate conversion
Slower / faster
Learning a foreign language
For people with learning disabilities
Quick news internet service
Audio skimming for visually impaired people
For elderly people
Clean audio
A TV receiver with a clean audio dial
There are various ways to realize this. For detailed information, please see FG AVA TR Part 12.
Receiver-side re-mixing for the elderly (Clean Audio)
Speech is separated from background sound by stereo correlation.
The estimated speech component is enhanced for clearer speech.
Speech and BG sound are re-mixed at the listener's preferred ratio.
No change is necessary in production or transmission.
[Block diagram, from broadcast sound to sound output. Components: stereo signal; adaptive filter; voice detector (speech / non-speech flag); estimated speech; estimated BG sound; spectrum emphasizer; gains ×α, ×β, ×γ, ×η; re-mixing of speech and BG with the specified ratio.]
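The idea of separating by stereo correlation and re-mixing at a chosen ratio can be illustrated with a much simpler stand-in than the adaptive filter in the diagram: a mid/side split, which treats the part common to both channels as the speech estimate and the difference as the background estimate. This substitution, the function name, and the gain parameters are assumptions for illustration only; the voice detector and spectrum emphasizer are not modeled.

```python
def remix(left, right, speech_gain, bg_gain):
    """Re-mix a stereo signal given as two equal-length lists of samples.
    Mid/side is used here as a crude stand-in for the speech and
    background estimates (an assumption, not the slide's method)."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0    # correlated part, treated as speech
        side = (l - r) / 2.0   # remainder, treated as background
        out_left.append(speech_gain * mid + bg_gain * side)
        out_right.append(speech_gain * mid - bg_gain * side)
    return out_left, out_right


left = [1.0, 0.5]
right = [1.0, -0.5]
# Raise the speech estimate and lower the background estimate.
new_left, new_right = remix(left, right, speech_gain=2.0, bg_gain=0.5)
print(new_left, new_right)
```

With both gains set to 1.0 the function returns the input unchanged, which is a convenient sanity check for any re-mixing stage of this kind.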
Demonstration of the receiver-side clear audio
Conclusions and Recommendations
Compensating for the degraded functions of the elderly helps their social participation.
Speech rate conversion and re-mixing of F/B sounds are promising technologies for this purpose.
Broadcasters and TV manufacturers are encouraged to provide services and devices with these functions.
Refer to FG AVA Tech. Report Part 12 for more information.
Clear audio in studio : Mixing balance meter
The mixing balance meter indicates the loudness-based mixing balance.
“Elderly emulation mode” indicates better mixing for the elderly.
Young mixing engineers can produce better balanced audio for the elderly.
[Diagram: in the studio, speech (narration etc.) and background sounds are combined into the mixed sound. The mixing balance meter calculates loudness and estimates the favorability of the mix level.]
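A rough sketch of a level-based mixing balance is the difference in decibels between the speech and background levels. The code below uses plain RMS level as a stand-in for a true loudness measure, which is an assumption; the function names are hypothetical, and the favorability estimate and the elderly emulation mode are not modeled.

```python
import math


def level_db(samples):
    """RMS level in dB relative to full scale (1.0). RMS is a simplified
    stand-in for a perceptual loudness measure."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)


def mixing_balance_db(speech, background):
    """Positive values mean speech is louder than the background."""
    return level_db(speech) - level_db(background)


speech = [0.5, -0.5, 0.5, -0.5]
background = [0.05, -0.05, 0.05, -0.05]
balance = mixing_balance_db(speech, background)
print(round(balance, 1))
```

A meter built on this idea would compare the balance against a target suited to the intended audience; any such target value would have to come from listening tests rather than from this sketch.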