Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Adil Raja

Transcript of Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Page 1: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Real-time Non-intrusive Speech Quality Estimation for VoIP

Adil Raja

Page 2: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Wireless Access Research Group, University of Limerick

Outline

• Research Milestones
• Theoretical Aspects
• Evaluation Platforms
• Conclusion

Page 3: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Research Milestones

• Problem Statement.
• Objectives.
• Related Work.
• Current Status of the Project.
• Future Work.

Page 4: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Problem Statement

• Lack of real-time, non-intrusive speech quality estimation at mid-network points.

Page 5: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Objectives

• To develop a real-time, non-intrusive speech quality estimation model for VoIP networks.
• Particular emphasis is on the effectiveness of the model at "mid-network" points.
• The model should assess the overall speech quality by evaluating:
  - Transport layer metrics.
  - Speech layer metrics.
• Effective implementation of a perceptual model is crucial.

Page 6: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Related Work

• Standards – E-Model, P.563 {ITU-T}.
• Industry.
  - PsyVoIP for gateways {Psytechnics}.
  - VQmon/EP {Telchemy}.
  - 3SQM {OPTICOM}.
  - PSM {Psytechnics}.
• Theoretical Research.
  - Transport layer assessments.
  - Perceptual Models.
  - Cognitive Models.

Page 7: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Current Status of the Project

• Transport layer metrics can be captured using RTP packets and RTCP reports.
• The metrics include:
  - Packet loss – from RTP packets.
  - Jitter – from RTP packets.
  - Round-trip delay – from RTCP-SR/RR reports.
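As a rough illustration of how packet loss and jitter might be derived from RTP packets, the sketch below uses the interarrival jitter estimator defined in RFC 3550 and a simple sequence-number count for loss. The function names and the input layout are assumptions made for this example, not the project's MicroEngine code, and sequence-number wraparound is deliberately ignored.

```python
def interarrival_jitter(packets):
    """Estimate jitter as in RFC 3550: J += (|D| - J) / 16.

    packets: iterable of (rtp_timestamp, arrival_time) pairs, both
    expressed in the same clock units (an assumption of this sketch).
    """
    jitter = 0.0
    prev_transit = None
    for rtp_timestamp, arrival_time in packets:
        transit = arrival_time - rtp_timestamp
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


def loss_fraction(seq_numbers):
    """Fraction of packets missing between the lowest and highest
    sequence number seen. Wraparound is not handled in this sketch."""
    received = len(set(seq_numbers))
    expected = max(seq_numbers) - min(seq_numbers) + 1
    return (expected - received) / expected


if __name__ == "__main__":
    print(interarrival_jitter([(0, 10), (160, 170), (320, 346)]))
    print(loss_fraction([1, 2, 4, 5]))
```

The running average with gain 1/16 keeps the estimator cheap enough to update per packet, which matters on a network processor.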

Page 8: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Current Status of the Project

• A perceptual model based on Perceptual Linear Prediction (and MFCC) has been ported to the IXP2400 XScale processor.
• SOM_PAK has been ported to the IXP2400 XScale processor.
• MicroEngine code for buffering packets in SRAM has been written.
• The overall model design is based on a single VoIP call.

Page 9: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Future Work

• Integration of the speech layer model with the transport layer model.
• Testing under various packet delay and loss scenarios.
• Evaluation of the model for low bit-rate codecs.
• Scalability testing for multiple VoIP calls.

Page 10: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Theoretical Aspects

• Packet Loss and Jitter Evaluation.
• Effect of Packet Loss Distribution.
• Unordered and Missing Packets.
• Computational Lag.
• Methodology.
• Perceptual Evaluation of Low Bit-Rate Vocoders.
• Self Organizing Maps.
• Hidden Markov Models.

Page 11: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Packet Loss and Jitter Evaluation

• Performance on mid-network points.

[Figure: an IXP2400 NPU between ENDPOINT-A and ENDPOINT-B. RTP packets are used to compute the values of jitter and packet loss; RTCP-SR and RTCP-RR packets are used to compute round-trip delay.]

Page 12: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Packet Loss and Jitter Evaluation

[Figure: network diagram with two computers and five routers.]

Page 13: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Packet Loss and Jitter Evaluation

• Reasons
  - Routing table updates.
  - Traffic engineering.

Page 14: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Packet Loss and Jitter Evaluation

• To capture packet loss and jitter from RTCP-SR/RR packets.
• Other advantages.
  - RTCP-SR/RR packets report the fraction of packets lost over a certain interval of time.
  - This provides the mean loss rate for a call in the current time frame, as opposed to the overall loss rate.
  - Some computation is offloaded from the IXP2400.
  - End-to-end transport layer metrics, as opposed to end-to-mid-network-point metrics.
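The fields involved can be sketched as follows. In RFC 3550 a reception report block carries an 8-bit fixed-point "fraction lost" and the LSR/DLSR fields, and the SR sender computes round-trip delay as arrival time minus LSR minus DLSR, all in units of 1/65536 second. The function names below are assumptions for illustration, and how a mid-network monitor obtains the arrival time is left open here.

```python
def fraction_lost(fraction_byte):
    """Convert the 8-bit fixed-point 'fraction lost' field of an RTCP
    reception report block into a fraction between 0 and 1."""
    return fraction_byte / 256.0


def round_trip_seconds(arrival_ntp_mid32, lsr, dlsr):
    """Round-trip delay as in RFC 3550: A - LSR - DLSR.

    All three inputs are 32-bit values in units of 1/65536 second
    (the middle 32 bits of an NTP timestamp). The mask keeps the
    subtraction correct across 32-bit wraparound.
    """
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


if __name__ == "__main__":
    # Values from the worked example in RFC 3550.
    print(round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000))
    print(fraction_lost(64))
```

Because the receiver already reports a per-interval loss fraction, the monitor only needs to parse these fields rather than track every RTP sequence number itself.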

Page 15: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Effect of Packet Loss Distribution

• Most models assess the impact of packet loss on speech quality in terms of mean loss rate.
• Packet loss is bursty in nature.
• Packet loss location has a variable effect on the quality of speech {H. Schulzrinne}.
• The impact of packet loss distribution should be used as a QoS metric.
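To make the distinction between mean loss rate and loss distribution concrete, the following sketch computes both the mean loss rate and the burst lengths of a loss trace. The trace format (one flag per packet) and the function name are assumptions for this example only.

```python
def loss_burst_stats(lost):
    """Return (mean loss rate, mean burst length, list of burst lengths).

    lost: sequence of flags, truthy where a packet was lost.
    """
    bursts = []
    run = 0
    for flag in lost:
        if flag:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    total_lost = sum(bursts)
    mean_rate = total_lost / len(lost) if lost else 0.0
    mean_burst = total_lost / len(bursts) if bursts else 0.0
    return mean_rate, mean_burst, bursts


if __name__ == "__main__":
    scattered = [0, 1, 0, 1, 0, 1, 0, 0, 0, 0]
    bursty = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    print(loss_burst_stats(scattered))
    print(loss_burst_stats(bursty))
```

Both traces have the same mean loss rate, but the second loses three consecutive packets, which a metric based only on the mean cannot distinguish.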

Page 16: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Unordered and Missing Packets

• Packets arrive out of order.
• Some packets are lost and some take alternative paths.
• These factors can have adverse effects when the acoustic back-end is an HMM (for instance).
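One possible way to detect these conditions from RTP sequence numbers is sketched below. It flags packets that arrive after a higher sequence number has already been seen and lists sequence numbers that never arrive. The function name is hypothetical, and wraparound and duplicates are not treated specially.

```python
def classify_arrivals(seqs):
    """Return (reordered, missing) for sequence numbers in arrival order.

    reordered: numbers that arrived after a higher number was seen.
    missing:   numbers in the observed range that never arrived.
    """
    highest = None
    seen = set()
    reordered = []
    for s in seqs:
        if highest is not None and s < highest:
            reordered.append(s)
        if highest is None or s > highest:
            highest = s
        seen.add(s)
    if not seen:
        return [], []
    missing = [s for s in range(min(seen), highest + 1) if s not in seen]
    return reordered, missing


if __name__ == "__main__":
    print(classify_arrivals([1, 2, 4, 3, 6]))
```

A back-end that depends on the order of feature vectors, such as an HMM, would need the reordered frames put back in sequence and the missing ones marked before scoring.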

Page 17: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Computational Lag

[Figure: timeline from T0 to TN showing Speech Layer Processing and Transport Layer Processing.]

• The perceptual model reports the results of past samples.
• The computational lag between the speech layer model and the perceptual model increases as time progresses.
• Some samples have to be skipped to overcome this lag.
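The growth of the lag, and the effect of skipping, can be illustrated with a toy simulation. It assumes frames arrive at a fixed period and that processing one frame takes longer than that period; the parameter names and the skipping rule (drop a frame once it has waited longer than `max_lag`) are assumptions of this sketch, not the project's policy.

```python
def simulate_lag(n_frames, frame_period, proc_time, max_lag=None):
    """Return (lag after the last processed frame, frames skipped).

    A frame is skipped when it would have to wait longer than max_lag
    before processing starts. With max_lag=None nothing is skipped.
    """
    free_at = 0.0  # time at which the processor becomes free
    skipped = 0
    lag = 0.0
    for i in range(n_frames):
        arrival = i * frame_period
        start = max(free_at, arrival)
        if max_lag is not None and start - arrival > max_lag:
            skipped += 1
            continue
        free_at = start + proc_time
        lag = free_at - arrival
    return lag, skipped


if __name__ == "__main__":
    # 20 ms frames, 25 ms of processing per frame (illustrative values).
    print(simulate_lag(100, 20.0, 25.0))
    print(simulate_lag(100, 20.0, 25.0, max_lag=40.0))
```

Without skipping, the lag grows by the difference between processing time and frame period on every frame; with skipping it stays bounded by `max_lag + proc_time`.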

Page 18: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Methodology

• Transport Layer Model
  - Jitter, Loss, Delay.
• Speech Layer Model
  - Perceptual Model.
    - Perceptual Linear Prediction.
    - Mel Frequency Cepstral Coefficients.
    - Bark Spectral Distortion.
  - Code-book of Clean Speech Feature Vectors
    - Self-organizing Maps – Vector Quantization.
    - Hidden Markov Models – Probabilistic.

Page 19: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Methodology

[Figure: IXP2400 block diagram. Components: Intel XScale Core; Packet Receive/Transmit MEs; Packet Processing MEs; SHaC (Scratchpad Memory, Hash Unit, CAP); SRAM Controllers 0 and 1 with SRAM (QDR); DRAM Controller 0 with DRAM (DDR); PCI Controller (PCI, 64 bit, 33/66 MHz) with optional host CPU and PCI bus devices; Media Switch Fabric Interface (SPI4, CSIX) with external media device(s).]

Annotations in the figure:

• Packet Receive/Transmit MEs: these MEs receive the packets from the MSF interface and forward them to the DRAM controller on reception, and do the opposite for transmission.
• Packet Processing MEs: parse various header fields of VoIP packets, calculate packet-based QoS metrics, and place the results in SRAM. They buffer the speech frames in SRAM at addresses known to the perceptual model.
• Intel XScale Core: the perceptual model calculates distortions due to encoding and bit errors and places the result in SRAM.
• Objective score: this module calculates the objective score from all the values accumulated in SRAM.

ObjectiveScore = S(s,c,e)

Page 20: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Methodology

• At a given time, a number of (contiguous) packets are buffered to be input to the perceptual model.
• Statistical Analysis.

Page 21: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Methodology

• Optimum number of packets to be buffered?
• Optimum buffering interval?
• The overall speech quality is a function of both auditory distance and transport layer distortions.

Page 22: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Methodology

• Assessment of the model for a one-VoIP-call scenario.
• G.711 is the preferred codec.
• Simulate packet loss rate, packet loss distribution, delay and jitter (Fine Tuning).
• Analysis of low bit-rate codecs.
• Scale the model for multiple VoIP calls.
• IXP2400 NPU is the target hardware platform.

Page 23: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Perceptual Evaluation of Low Bit-Rate Vocoders

• Real-time speech quality estimation for low bit-rate codecs (G.729, G.723.1) without decoding the frames.

• {Carmen Peláez-Moreno, Ascensión Gallardo-Antolín, and Fernando} perform speech recognition by only extracting the LP coefficients.

Page 24: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

SOM

Page 25: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

SOM Training

Page 26: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

SOM

• What is the average quantization error (QE)?
• Auditory Distance = Distortion + QE.
• How to deal with QE?
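The relation "Auditory Distance = Distortion + QE" can be illustrated with a plain vector-quantization sketch. Here the codebook stands in for trained SOM weight vectors, the distance is Euclidean, and subtracting a clean-speech QE baseline is only one conceivable way to handle QE; all of these are assumptions of the example, since the slide leaves the question open.

```python
import math


def nearest_distance(vector, codebook):
    """Distance from a feature vector to its best-matching codebook entry."""
    return min(math.dist(vector, entry) for entry in codebook)


def average_quantization_error(vectors, codebook):
    """Mean best-match distance over a set of feature vectors."""
    return sum(nearest_distance(v, codebook) for v in vectors) / len(vectors)


if __name__ == "__main__":
    # Toy two-dimensional data, invented for illustration.
    codebook = [(0.0, 0.0), (10.0, 0.0)]
    clean = [(1.0, 0.0), (9.0, 0.0)]
    distorted = [(3.0, 0.0), (7.0, 0.0)]

    qe = average_quantization_error(clean, codebook)
    auditory_distance = average_quantization_error(distorted, codebook)
    print(qe, auditory_distance, auditory_distance - qe)
```

In this toy case the clean vectors already sit one unit from the codebook, so only the excess over that baseline can be attributed to distortion.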

Page 27: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

SOM – Quantization Error

• SOM discretizes data.

Page 28: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

SOM – Data Distribution {Timo Kostiainen}

Page 29: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Growing Hierarchical SOM

[Figure: hierarchy of maps across Layer 0, Layer 1, and Layer 2.]

Page 30: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

GHSOM - Advantages

• A desired level of granularity in discriminating input data is achievable.
• Horizontal Expansion.
• Vertical Expansion.
• As the SOM is hierarchical, the searching time is reduced.
• What if a distorted signal of class A has lower AD with class B?

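[Equations: two expressions relating the unit error mqe_i and the map error MQE_m to mqe_0; the operators and coefficients did not survive extraction.]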

Page 31: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Hidden Markov Models

• Auditory scores based on a logically connected sequence of feature vectors.
• λ = (A, B, π)
• A – transition probability matrix from one phonemic class to the next.
• B – emission probability of a phonemic vector.
• π – initial state probability.
• Parameters are learnt during training.
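Given λ = (A, B, π), the likelihood of an observation sequence can be computed with the standard forward algorithm, sketched below. For simplicity this assumes discrete observation symbols; the toy parameters are invented for illustration and are not trained values from the project.

```python
def forward_likelihood(A, B, pi, observations):
    """P(observations | lambda) for a discrete HMM.

    A[i][j]: probability of moving from state i to state j.
    B[i][k]: probability of emitting symbol k in state i.
    pi[i]:   probability of starting in state i.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


if __name__ == "__main__":
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    pi = [0.6, 0.4]
    print(forward_likelihood(A, B, pi, [0, 1]))
```

A sequence of feature vectors that the trained model finds unlikely would receive a low likelihood, which is one way such a model could yield an auditory score.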

Page 32: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Hidden Markov Models

• A suitably trained HMM can be used to find auditory distance.
• Continuous HMM.
• Reliable Results.

Page 33: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Evaluation Platforms

• Cell Broadband Engine Processor Architecture.
• Programming the Cell.
• Some Concerns

Page 34: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Cell Broadband Processor

Page 35: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Cell BE Processor ...

Sr No   Feature                                      Qty
1       Power Processing Element (PPE)               1
2       Synergistic Processing Elements (SPEs)       8
3       Element Interconnect Bus (EIB)               1
4       Direct Memory Access Controller (DMAC)       1
5       Rambus XDR memory controllers                2
6       Rambus FlexIO interface
7       PCI Express x 4
8       256 GFLOPS (single precision at 4 GHz)
9       25 GFLOPS (double precision at 4 GHz)
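The single-precision figure in the table is consistent with a simple peak-rate calculation, sketched below. It assumes each SPE issues one 4-wide single-precision fused multiply-add per cycle (two floating-point operations per lane) and that only the SPEs are counted; these assumptions are not stated on the slide.

```python
def peak_gflops(units, clock_ghz, simd_lanes, flops_per_lane_per_cycle):
    """Theoretical peak in GFLOPS: units x clock x lanes x flops per lane."""
    return units * clock_ghz * simd_lanes * flops_per_lane_per_cycle


if __name__ == "__main__":
    # 8 SPEs at 4 GHz, 4 lanes, multiply + add per lane per cycle.
    print(peak_gflops(8, 4.0, 4, 2))
```

This is a peak figure; sustained throughput depends on how well the code can be vectorised and kept fed from local store.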

Page 36: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Mercury Computer Systems

Cell Technology Evaluation System (CTES)

Page 37: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Programming the Cell

• The primary language is C; C++ is also supported to some extent.
• Programming Models
  - Job Queue – PPE schedules the jobs for SPEs.
  - Self-multitasking of SPEs – kernel and scheduling are distributed across SPEs.
  - Stream Processing – the SPEs use shared memory for all tasks.
• Development Platforms
  - Cell BE Engine SDK (alpha version) {IBM} – full system simulator.
  - Yellow Dog Linux {Mercury Computer Systems}.

Page 38: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Some Concerns

• Software design is key to effective performance of Cell.
• Multithreaded execution – key to effective execution.
• The code has to be vectorisable and parallelisable.
• To port code to the SPEs, it has to be partitioned from the rest of the code so that it is fully self-contained.
• Hardware abstraction.
• Learning curve effect.
  - Support from IBM.
  - Credibility of SDK/APIs.
  - Comments from Peter Seebach.
• Cost

Page 39: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Alternatives

• XScale.
• Offloading of compute-intensive tasks to another processor using a gigabit port.
• PCI with Pentium 4.
• PCI with a suitable graphics card.

Page 40: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Gigabit Alternative ...

[Figure: IP network, IXP2400 NPU, and a workstation.]

Page 41: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Conclusions

• Preliminary work on the model is complete.
• Packet loss distribution.
• Evaluation of low bit-rate codecs.
• Evaluation platform.
• Overall Research Goals
  - Real-time non-intrusive VoIP quality assessment model.
  - Perceptual Distortion Measures.
  - Model Training.

Page 42: Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Thank you for Your Time