Non-intrusive Speech Quality Assessment
Algorithm Based on Spectro-temporal Analysis
School of Computer Engineering
LI Qiaohong
(Supervisor: Prof Weisi Lin)
(Co-Supervisor: Prof Daniel THALMANN)
July 22, 2014
Outline
• Motivation
• Review
• Method
• Experimental results
• Conclusion and future work
Motivation
Speech signals
1. Acquisition (noise)
2. Synthesis (e.g. text-to-speech)
3. Reproduction (e.g. imperfect reconstruction)
4. Security (e.g. watermarking)
5. Postprocessing (e.g. enhancement)
6. Transmission (e.g. VoIP, IPTV)
• Applications of speech quality assessment methods:
• speech acquisition, enhancement, watermarking, compression,
transmission, reconstruction, authentication, speech synthesis …
• Two broad approaches:
• subjective vs. objective methods
• Subjective assessment suffers from several drawbacks:
• Time-consuming, laborious and expensive; requires many human subjects
and repeated listening sessions
• Not feasible for on-line signal manipulations (such as encoding,
transmission, relaying, etc.)
• Depends on listeners' physical conditions, emotional states, personal
experience, etc.
Motivation
Objective Speech Quality Assessment
• Intrusive SQA methods (a.k.a. double-ended, full-reference methods)
Parametric-based methods
Using the parameters of the compression and transmission protocols to estimate the final quality score.
Signal-based methods
Calculating the perceptually weighted distance between the reference and
degraded speech signals.
E.g. SNR, LLR, BSD, PSQM, PESQ, POLQA.
• Non-intrusive SQA methods (a.k.a. single-ended, no-reference methods)
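As a minimal illustration of the simplest signal-based intrusive measure named above (SNR), the sketch below compares a reference signal with a degraded copy. The toy sine signal and the alternating-sign distortion are assumptions made only for this example; perceptual measures such as PESQ or POLQA add perceptual weighting that is not modelled here.

```python
import math


def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(degraded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    noise_energy = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    if noise_energy == 0.0:
        return math.inf  # identical signals: no measurable distortion
    if signal_energy == 0.0:
        return -math.inf  # silent reference: the ratio is zero
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical toy data: five cycles of a sine wave plus a small
# alternating-sign distortion of amplitude 0.1.
reference = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
degraded = [r + 0.1 * (-1) ** n for n, r in enumerate(reference)]
snr_value = snr_db(reference, degraded)
```

Because the function needs the clean reference, it cannot be used in the non-intrusive setting, which is the case this work targets.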
Engineering Approach Framework
Stage I: Feature extraction (exploits signal processing techniques)
Inputs: distorted signal; reference signal (for intrusive methods)
Stage II: Feature pooling (cognitive mapping; based on machine learning)
Output: quality score
Gabor Feature Extraction for Speech
Quality Assessment
Gabor feature extraction pipeline
Gabor Feature Extraction for Speech
Quality Assessment
The effectiveness of extracted Gabor features
SVR for feature mapping
We adopt support vector regression (SVR) to learn the mapping from the extracted Gabor features to the
objective speech quality score.
We use the radial basis function (RBF) kernel,
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\rho \|\mathbf{x}_i - \mathbf{x}_j\|^2)$, in this work. The
parameters $\{C, \rho, \epsilon\}$ are selected through cross-validation.
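The RBF kernel can be evaluated directly from its definition. The sketch below is illustrative only: the feature vectors and the value of ρ are hypothetical, not the Gabor features or the cross-validated parameters used in this work.

```python
import math


def rbf_kernel(x_i, x_j, rho):
    """K(x_i, x_j) = exp(-rho * ||x_i - x_j||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-rho * squared_distance)


def gram_matrix(features, rho):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(a, b, rho) for b in features] for a in features]


# Hypothetical 2-D feature vectors, chosen only to make the arithmetic easy.
features = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
K = gram_matrix(features, rho=0.5)
```

In scikit-learn's `sklearn.svm.SVR`, the kernel width ρ corresponds to the `gamma` argument, alongside `C` and `epsilon`.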
80% Training data vs. 20% Test data
Overall Test
Split dataset according to different contents
Test 1
Split dataset according to different noise levels
Test 2
Split dataset according to different noise types
Test 3
Split dataset according to different speech enhancement algorithms
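The two kinds of partition listed above can be sketched as follows. The slides do not say how each split was implemented, so this assumes that an attribute-based test holds out every sample sharing a chosen attribute value (for example, one noise type). The sample records, field names, and noise-type labels are hypothetical.

```python
import itertools
import random


def random_split(samples, test_fraction=0.2, seed=0):
    """Shuffle and split into (train, test), e.g. 80% / 20%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def group_split(samples, key, test_groups):
    """Hold out every sample whose `key` value is in `test_groups`."""
    train = [s for s in samples if s[key] not in test_groups]
    test = [s for s in samples if s[key] in test_groups]
    return train, test


# Hypothetical metadata: 4 noise types x 5 noise levels = 20 samples.
noise_types = ("babble", "car", "street", "train")
noise_levels_db = (0, 5, 10, 15, 20)
samples = [
    {"id": i, "noise_type": noise, "noise_level_db": level}
    for i, (noise, level) in enumerate(itertools.product(noise_types, noise_levels_db))
]

overall_train, overall_test = random_split(samples, test_fraction=0.2)
type_train, type_test = group_split(samples, "noise_type", {"train"})
```

Holding out a whole group checks whether the learned mapping generalises to conditions absent from training, which a purely random split cannot show.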
Scatter plot of the prediction results of the proposed metric
versus the subjective scores on the NOIZEUS database
Experimental results
Comparison with the state of the art
[7] T. H. Falk and W.-Y. Chan, “Single-ended speech quality measurement using machine learning methods,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 1935–1947, 2006.
[10] M. Narwaria, W. Lin, I. V. McLoughlin, S. Emmanuel, and L.-T. Chia, “Nonintrusive quality assessment of noise suppressed speech with mel-filtered energies and support vector regression,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1217–1232, 2012.
Experimental results
Future Work
No-reference visual quality assessment
Joint audiovisual quality assessment:
humans perceive ‘overall’ multimedia quality rather than assessing audio and video separately
Possible approaches include one-stage and two-stage fusion (OSF/TSF)
OSF: both audio and video features are pooled in one stage
TSF: first pool the audio features and the video features separately, then fuse the two scores into an overall score
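The difference between the two fusion strategies can be shown in a short sketch. A plain weighted sum stands in for the pooling stage purely as an assumption (any learned regressor could take its place), and all feature values and weights are hypothetical.

```python
def linear_pool(features, weights, bias=0.0):
    """Stand-in pooling function: weighted sum of features plus a bias."""
    return bias + sum(w * f for w, f in zip(weights, features))


def one_stage_fusion(audio_features, video_features, weights):
    """OSF: concatenate both feature sets and pool them in a single stage."""
    joint = list(audio_features) + list(video_features)
    return linear_pool(joint, weights)


def two_stage_fusion(audio_features, video_features,
                     audio_weights, video_weights, fusion_weights):
    """TSF: pool each modality into its own score, then fuse the two scores."""
    audio_score = linear_pool(audio_features, audio_weights)
    video_score = linear_pool(video_features, video_weights)
    return linear_pool([audio_score, video_score], fusion_weights)


# Hypothetical feature vectors and weights.
audio = [0.5, 1.0]
video = [2.0, 0.0, 1.0]
osf_score = one_stage_fusion(audio, video, weights=[1.0, 1.0, 0.5, 0.5, 0.5])
tsf_score = two_stage_fusion(audio, video,
                             audio_weights=[1.0, 1.0],
                             video_weights=[0.5, 0.5, 0.5],
                             fusion_weights=[0.5, 0.5])
```

OSF lets a single model exploit interactions between audio and video features, while TSF keeps per-modality scores that can be inspected or reused on their own.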
Thank you!
Questions?