Universal Speech and Audio Codec Linear Prediction Domain processing Philippe Gournay, Bruno...

  • Slide 1

Universal Speech and Audio Codec
Linear Prediction Domain processing
Philippe Gournay, Bruno Bessette, Roch Lefebvre
Université de Sherbrooke, Département de Génie Electrique et Informatique
Sherbrooke, Québec, Canada

  • Slide 2

Outline
- The 3GPP AMR-WB+ Standard
    - Source of inspiration for LPD processing in USAC
- Changes brought to LPD processing
    - Forward Aliasing Cancellation
    - Frequency-Domain Noise Shaping
    - Other changes
- Conclusion
    - More efficient LPD processing
    - Better unification of LPD and non-LPD FD coders

  • Slide 3

Context
- The 3GPP AMR-WB+ Standard
    - Hybrid codec: Time (ACELP) and Frequency (TCX) Domain
    - Very efficient on speech and speech-over-music contents

  • Slide 4

The AMR-WB+ Encoder
[Figure: encoder block diagram. Labels: Audio, Mode Selection, ACELP (1 frame), TCX (1, 2 or 4 frames), Mode Index, ISF, Packetization, Bitstream]

  • Slide 5

AMR-WB+ Frame Structure
- Three out of the 26 possible ACELP/TCX coding configurations
[Figure: three example super-frames (a), (b) and (c), built from ACELP, Short TCX, Medium TCX and Long TCX frames. One super-frame = 1024 samples]

  • Slide 6

Transitions from ACELP to TCX
- Zero-input response (ZIR) of LPC weighting filter provides pseudo-windowing
[Figure labels: ACELP, Frame, 1/8 overlap, Decoded TCX window]

  • Slide 7

Transitions from TCX to ACELP
- Redundant windowed TCX samples are discarded
[Figure labels: ACELP, Frame, 1/8 overlap, Decoded TCX window]

  • Slide 8

Limitations of the AMR-WB+ model
- Non-critically sampled transforms
    - FFT vs. MDCT
- Inefficiencies at transitions between modes
    - Sub-optimal windowing (from ACELP to TCX)
    - Discarded samples (from TCX to ACELP)
- Transform windows not aligned with ACELP grid
    - LPC analysis window also shifted to the right
- Even worse when switching with AAC
    - Time-Domain Aliasing Cancellation (TDAC)
    - Transitions between LPD and non-LPD processing

  • Slide 9

Changes brought to the LPD processing
- Replaced FFTs by MDCTs
- Introduced Frequency Domain Noise Shaping
- Introduced Forward Aliasing Cancellation
- Other changes

  • Slide 10

Frequency Domain Noise Shaping
- To unify processing of AAC and TCX frames, the MDCT transform in TCX is applied in the original signal domain
- Noise shaping for TCX frames is performed in the MDCT domain based on LPC filters mapped to the MDCT domain
- FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying a 1st-order filtering to the MDCT coefficients (similar in principle to TNS)

  • Slide 11

Effect of FDNS on the spectral shape and the time-domain envelope of the noise
[Figure: time axis (n) with positions A, B and C; frequency axis (k or m). Noise gains g1[m] calculated at time position A; noise gains g2[m] calculated at time position B; interpolated gains seen in the time domain, for each of the M bands]

  • Slide 12

Frequency-Domain Noise Shaping
- FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying a 1st-order filtering to the MDCT coefficients (similar in principle to TNS)

  • Slide 13

Forward Aliasing Cancellation
[Figure labels: TCX frame output, ACELP synthesis, Next ACELP frame, Windowing effect and Time Domain Aliasing, +, -]
- Introduced to compensate windowing and time-domain aliasing in MDCT-coded frames when switching to and from ACELP frames

  • Slide 14

Forward Aliasing Cancellation
- FAC is applied in the original signal domain
- FAC is quantized in the LPC weighted domain so that quantization noises of FAC and decoded MDCT are of the same nature
- For transition from ACELP to TCX, the ACELP synthesis can be taken into account; this reduces the bitrate needed to encode FAC

  • Slide 15

Computation of FAC targets for transitions from and to ACELP (encoder)

  • Slide 16

Quantization of FAC targets
[Figure: two block diagrams, captioned "Transition from ACELP to TCX" and "Transition from TCX to ACELP".
 Labels, first diagram: LPC1 FAC target, W1(z), DCT-IV, Q, DCT-IV^-1, 1/W1(z), LPC1 FAC synthesis, 1/W1(z) ZIR, Filter memory (ACELP error), Zero memory, Transmit to decoder.
 Labels, second diagram: LPC2 FAC target, W2(z), DCT-IV, Q, DCT-IV^-1, 1/W2(z), LPC2 FAC synthesis, Filter memory (TCX frame error), Zero memory, Transmit to decoder]

  • Slide 17

Other changes brought to the LPD processing
- Critical sampling
    - MDCT vs. FFT
    - FAC+FDNS
- Scalar quantizer + adaptive arithmetic coder for TCX (AMR-WB+ uses AVQ)
- Variable bit rate LPC quantizer
- Bit reservoir adaptation

  • Slide 18

Conclusion
- USAC makes use of LPD and non-LPD processing
    - LPD mode inspired by AMR-WB+
    - Non-LPD mode derived from AAC
- Substantial changes were brought to the LPD processing, and new tools were introduced to make it more efficient
    - Frequency Domain Noise Shaping (FDNS)
    - Forward Aliasing Cancellation (FAC)
- USAC is a real unification of two coding models

  • Slide 19
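
Note on the AMR-WB+ frame structure slide: the count of 26 ACELP/TCX configurations can be reproduced with a short enumeration. The sketch below assumes a 1024-sample super-frame split into four 256-sample frames, with a Medium TCX always covering one aligned half of the super-frame and a Long TCX covering all of it. These alignment rules and the function name `tilings` are illustrative assumptions, not taken from the slides.

```python
from itertools import product

# Assumed durations in samples (1024-sample super-frame, four equal frames).
SAMPLES = {"ACELP": 256, "Short TCX": 256, "Medium TCX": 512, "Long TCX": 1024}

def tilings(n_frames):
    """List every way to cover n_frames (1, 2 or 4) with ACELP/TCX modes.

    A block is either coded as one TCX frame spanning the whole block,
    or split into two halves that are each covered independently.
    """
    if n_frames == 1:
        return [("ACELP",), ("Short TCX",)]
    half = tilings(n_frames // 2)
    split = [left + right for left, right in product(half, half)]
    whole = {2: "Medium TCX", 4: "Long TCX"}[n_frames]
    return split + [(whole,)]

configs = tilings(4)

# Every configuration must cover exactly one super-frame.
assert all(sum(SAMPLES[mode] for mode in c) == 1024 for c in configs)

print(len(configs))  # 26
for c in configs[:3]:
    print(" | ".join(c))
```

Under these assumptions a half super-frame has 2 x 2 + 1 = 5 configurations and a full super-frame has 5 x 5 + 1 = 26, which matches the figure quoted on the slide.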
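
Note on the TDAC and Forward Aliasing Cancellation slides: the sketch below illustrates why an MDCT-coded frame needs help at a boundary with ACELP. It decodes two overlapping MDCT frames and shows that overlap-add cancels the time-domain aliasing, whereas the same region decoded from one frame alone keeps both the windowing effect and the aliasing. It does not implement FAC itself; it only shows the residual that FAC is meant to compensate. The sine window, the frame size `M = 8` and the test signal are illustrative assumptions.

```python
import math

M = 8  # hop size; each MDCT frame spans 2*M samples (small illustrative value)
# Sine window, which satisfies w[n]^2 + w[n+M]^2 = 1 (assumed for this sketch).
WINDOW = [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def basis(n, k):
    """MDCT cosine basis for time index n and frequency index k."""
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))

def mdct(frame):
    """2M input samples -> M coefficients (critically sampled)."""
    return [sum(WINDOW[n] * frame[n] * basis(n, k) for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs):
    """M coefficients -> 2M windowed samples that still contain aliasing."""
    return [WINDOW[n] * (2.0 / M) * sum(coeffs[k] * basis(n, k) for k in range(M))
            for n in range(2 * M)]

signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(3 * M)]
first = imdct(mdct(signal[0:2 * M]))    # covers samples 0 .. 2M-1
second = imdct(mdct(signal[M:3 * M]))   # covers samples M .. 3M-1

# Region covered by both frames: overlap-add cancels the aliasing (TDAC).
overlap_add = [first[M + j] + second[j] for j in range(M)]
tdac_error = max(abs(overlap_add[j] - signal[M + j]) for j in range(M))

# Same region from the first frame only, as if the neighbour were not MDCT-coded.
alone_error = max(abs(first[M + j] - signal[M + j]) for j in range(M))

print(f"with overlap-add: {tdac_error:.2e}, single frame only: {alone_error:.2e}")
```

The first error is at the level of floating-point rounding, while the second is of the same order as the signal itself, which is the gap a correction signal has to fill when the neighbouring frame is ACELP.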