Tutorial LPC SpeechCompression


  • 7/30/2019 Tutorial LPC SpeechCompression


    Speech Compression


    Contents

    I. Introduction

    II. LPC Modeling

    III. LPC Analysis

    IV. 2.4 kbps LPC Vocoder

    V. 4.8 kbps CELP Coder

    VI. 8.0 kbps CS-ACELP Coder

    VII. Demonstration

    VIII. References

I. Introduction

The compression of speech signals has many practical applications. One example is digital cellular technology, where many users share the same frequency bandwidth; compression allows more users to share the system than would otherwise be possible. Another example is digital voice storage (e.g., answering machines): for a given memory size, compression allows longer messages to be stored.

http://www.data-compression.com/speech.shtml (1 of 13)8/9/2009 12:49:05 AM

Historically, digital speech signals have been sampled at a rate of 8000 samples/sec, and each sample is typically represented by 8 bits (using mu-law). This corresponds to an uncompressed rate of 64 kbps (kbits/sec). With current compression techniques (all of which are lossy), it is possible to reduce the rate to 8 kbps with almost no perceptible loss in quality. Further compression is possible at the cost of lower quality. All of the current low-rate speech coders are based on the principle of linear predictive coding (LPC), which is presented in the following sections.
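To make the 64 kbps figure concrete, here is a minimal Python sketch of 8-bit mu-law companding. It assumes the continuous mu-law curve with μ = 255 applied to samples scaled to [-1, 1]; the G.711 standard actually uses a segmented, piecewise-linear approximation of this curve, so these codes will not match G.711 bit-for-bit. The function names are hypothetical.

```python
import math

MU = 255.0  # mu-law constant (assumed continuous curve, not the G.711 segment table)


def mulaw_encode(x: float) -> int:
    """Map a sample x in [-1, 1] to an 8-bit code 0..255 via the mu-law curve."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # y in [-1, 1]
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int) -> float:
    """Invert mulaw_encode, up to the quantization error."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(math.log1p(MU) * abs(y)) / MU, y)


SAMPLE_RATE = 8000   # samples/sec
BITS_PER_SAMPLE = 8  # one mu-law code per sample
print(SAMPLE_RATE * BITS_PER_SAMPLE)  # 64000 bits/sec = 64 kbps
```

The logarithmic curve spends more of the 256 codes on small amplitudes, which is where most speech samples fall.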

    II. LPC Modeling

    A. Physical Model:

    When you speak:


• Air is pushed from your lungs through your vocal tract, and speech comes out of your mouth.

• For certain voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of your voice. Women and young children tend to have a high pitch (fast vibration), while adult males tend to have a low pitch (slow vibration).

• For certain fricative and plosive (or unvoiced) sounds, your vocal cords do not vibrate but remain constantly open.

• The shape of your vocal tract determines the sound that you make.

• As you speak, your vocal tract changes its shape, producing different sounds.

• The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).

• The amount of air coming from your lungs determines the loudness of your voice.

    B. Mathematical Model:

• The above model is often called the LPC model.

• The model says that the digital speech signal is the output of a digital filter (called the LPC filter) whose input is either a train of impulses or a white noise sequence.

• The relationship between the physical and the mathematical models:

  ◦ Vocal tract → LPC filter
  ◦ Air → innovations
  ◦ Vocal cord vibration → voiced
  ◦ Vocal cord vibration period → pitch period
  ◦ Fricatives and plosives → unvoiced
  ◦ Air volume → gain

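• The LPC filter is given by:

$$H(z) = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_{10} z^{-10}}$$

which is equivalent to saying that the input-output relationship of the filter is given by the linear difference equation:

$$s(n) = \sum_{i=1}^{10} a_i\, s(n-i) + e(n)$$

where $s(n)$ is the speech signal and $e(n)$ is the innovation (the filter input).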
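The difference equation translates directly into a recursive (all-pole) filter. The sketch below assumes the sign convention $s(n) = \sum a_i s(n-i) + e(n)$ and zero initial conditions ($s(n) = 0$ for $n < 0$); the function name `lpc_synthesis_filter` is hypothetical.

```python
from typing import List, Sequence


def lpc_synthesis_filter(a: Sequence[float], e: Sequence[float]) -> List[float]:
    """Run the all-pole LPC filter s(n) = sum_{i=1}^{p} a_i s(n-i) + e(n).

    a[0] holds a_1, ..., a[p-1] holds a_p. The filter starts from zero state.
    """
    s: List[float] = []
    for n, en in enumerate(e):
        acc = en
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:          # past outputs before the start are zero
                acc += ai * s[n - i]
        s.append(acc)
    return s


# Impulse response of the one-pole filter H(z) = 1 / (1 - 0.5 z^-1).
print(lpc_synthesis_filter([0.5], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

A practical coder would normally carry the last ten outputs over from one frame to the next as filter memory instead of restarting from zero.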

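• The LPC model can be represented in vector form as:

$$\mathbf{A} = (a_1, a_2, \ldots, a_{10}, G, V/UV, T)$$

where $G$ is the gain, $V/UV$ is the voiced/unvoiced decision, and $T$ is the pitch period.

• $\mathbf{A}$ changes every 20 msec or so. At a sampling rate of 8000 samples/sec, 20 msec is equivalent to 160 samples.

• The digital speech signal is divided into frames of size 20 msec. There are 50 frames/second.

• The model says that

$$\mathbf{S} = (s(0), s(1), \ldots, s(159))$$

is equivalent to

$$\mathbf{A} = (a_1, a_2, \ldots, a_{10}, G, V/UV, T)$$

Thus the 160 values of $\mathbf{S}$ are compactly represented by the 13 values of $\mathbf{A}$.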
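The framing arithmetic and the 13-value parameter vector can be sketched as follows. The class and field names are hypothetical, and dropping a final partial frame is an assumption made here for brevity (a real coder would pad or buffer it).

```python
from dataclasses import dataclass
from typing import List, Sequence

SAMPLE_RATE = 8000                              # samples/sec
FRAME_MS = 20                                   # frame length in msec
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000      # 160 samples per frame


@dataclass
class LpcFrameParams:
    """The 13 numbers that stand in for one 160-sample frame."""
    a: List[float]   # a_1..a_10, the LPC filter coefficients
    gain: float      # G
    voiced: bool     # V/UV decision
    pitch: int       # T, pitch period in samples (unused when unvoiced)

    def as_vector(self) -> List[float]:
        return [*self.a, self.gain, float(self.voiced), float(self.pitch)]


def split_into_frames(s: Sequence[float]) -> List[List[float]]:
    """Cut the signal into consecutive, non-overlapping 160-sample frames."""
    return [list(s[i:i + FRAME_LEN])
            for i in range(0, len(s) - FRAME_LEN + 1, FRAME_LEN)]


print(FRAME_LEN, 1000 // FRAME_MS)  # 160 50
```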

• There is almost no perceptual difference in $\mathbf{S}$ if:

  ◦ For voiced sounds (V): the impulse train is shifted (the ear is insensitive to this phase change).
  ◦ For unvoiced sounds (UV): a different white noise sequence is used.

• LPC synthesis: given $\mathbf{A}$, generate $\mathbf{S}$ (this is done using standard filtering techniques).

• LPC analysis: given $\mathbf{S}$, find the best $\mathbf{A}$ (this is described in the next section).
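LPC synthesis for one frame can be sketched by generating the excitation (a unit impulse train with period $T$ for voiced frames, unit-variance white Gaussian noise for unvoiced frames), scaling it by the gain $G$, and running it through the all-pole LPC filter. Scaling the excitation directly by $G$, starting the impulse train at $n = 0$, and resetting the filter state each frame are simplifying assumptions; the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def synthesize_frame(a: Sequence[float], gain: float, voiced: bool, pitch: int,
                     frame_len: int = 160,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Generate one frame of speech from A = (a_1..a_10, G, V/UV, T)."""
    rng = rng or random.Random()
    if voiced:
        # Unit impulse every `pitch` samples, starting at n = 0.
        u = [1.0 if n % pitch == 0 else 0.0 for n in range(frame_len)]
    else:
        # Unit-variance white Gaussian noise.
        u = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    s: List[float] = []
    for n in range(frame_len):
        acc = gain * u[n]                       # innovation e(n) = G * u(n)
        for i, ai in enumerate(a, start=1):     # plus sum_i a_i s(n-i)
            if n >= i:
                acc += ai * s[n - i]
        s.append(acc)
    return s


voiced_frame = synthesize_frame([0.9] + [0.0] * 9, gain=1.0, voiced=True, pitch=80)
noise_frame = synthesize_frame([0.9] + [0.0] * 9, gain=0.1, voiced=False, pitch=0,
                               rng=random.Random(0))
```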

    III. LPC Analysis

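• Consider one frame of speech signal:

$$\mathbf{S} = (s(0), s(1), \ldots, s(159))$$

• The signal $s(n)$ is related to the innovation $e(n)$ through the linear difference equation:

$$s(n) = \sum_{i=1}^{10} a_i\, s(n-i) + e(n)$$

• The ten LPC parameters $a_1, \ldots, a_{10}$ are chosen to minimize the energy of the innovation:

$$E = \sum_{n} e^2(n) = \sum_{n} \Big( s(n) - \sum_{i=1}^{10} a_i\, s(n-i) \Big)^2$$

where the samples outside the frame are treated as zero.

• Using standard calculus, we take the derivative of $E$ with respect to $a_k$ and set it to zero:

$$\frac{\partial E}{\partial a_k} = -2 \sum_{n} \Big( s(n) - \sum_{i=1}^{10} a_i\, s(n-i) \Big) s(n-k) = 0, \qquad k = 1, 2, \ldots, 10$$

• Since $\sum_n s(n-i)\, s(n-k) = R(|i-k|)$, we now have 10 linear equations with 10 unknowns:

$$\begin{bmatrix} R(0) & R(1) & \cdots & R(9) \\ R(1) & R(0) & \cdots & R(8) \\ \vdots & \vdots & \ddots & \vdots \\ R(9) & R(8) & \cdots & R(0) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_{10} \end{bmatrix} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(10) \end{bmatrix}$$

where

$$R(k) = \sum_{n=k}^{159} s(n)\, s(n-k)$$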
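The autocorrelation values and the 10 × 10 system above can be formed directly from one frame. This is a sketch of the autocorrelation method (samples outside the frame treated as zero); the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(s: Sequence[float], max_lag: int) -> List[float]:
    """R(k) = sum_{n=k}^{N-1} s(n) s(n-k) for k = 0..max_lag."""
    N = len(s)
    return [float(sum(s[n] * s[n - k] for n in range(k, N)))
            for k in range(max_lag + 1)]


def normal_equations(R: Sequence[float], p: int = 10
                     ) -> Tuple[List[List[float]], List[float]]:
    """Build the Toeplitz matrix [R(|i-k|)] and right-hand side [R(k)], k = 1..p."""
    matrix = [[R[abs(i - k)] for i in range(1, p + 1)] for k in range(1, p + 1)]
    rhs = [R[k] for k in range(1, p + 1)]
    return matrix, rhs


print(autocorrelation([1.0, 2.0, 3.0], 2))  # [14.0, 8.0, 3.0]
```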

• The above matrix equation could be solved using:

  ◦ The Gaussian elimination method.
  ◦ Any matrix inversion method (e.g., in MATLAB).
  ◦ The Levinson-Durbin recursion (described below).
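For comparison with the Levinson-Durbin recursion, here is a plain Gaussian elimination with partial pivoting. It is a generic dense solver that costs on the order of $p^3$ operations and does not exploit the Toeplitz structure of the matrix; the function name is hypothetical.

```python
from typing import List, Sequence


def solve_gaussian(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):           # eliminate entries below the pivot
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):            # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


print(solve_gaussian([[14.0, 8.0], [8.0, 14.0]], [8.0, 3.0]))  # approximately [0.6667, -0.1667]
```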


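• Levinson-Durbin recursion:

$$E^{(0)} = R(0)$$

$$k_i = \frac{R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j)}{E^{(i-1)}}$$

$$\alpha_i^{(i)} = k_i$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$$

Solve the above for $i = 1, 2, \ldots, 10$, and then set

$$a_i = \alpha_i^{(10)}, \qquad i = 1, 2, \ldots, 10$$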
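The Levinson-Durbin recursion can be sketched in a few lines; it solves the same Toeplitz system in order $p^2$ operations and also yields the reflection coefficients $k_i$. This sketch assumes the predictor sign convention $s(n) \approx \sum a_i s(n-i)$ used in this section; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def levinson_durbin(R: Sequence[float], p: int = 10
                    ) -> Tuple[List[float], List[float], float]:
    """Return (a_1..a_p, reflection coefficients k_1..k_p, final error E^(p))."""
    alpha: List[float] = []   # alpha_1^(i-1) .. alpha_{i-1}^(i-1)
    ks: List[float] = []
    E = R[0]                  # E^(0) = R(0)
    for i in range(1, p + 1):
        acc = R[i] - sum(alpha[j - 1] * R[i - j] for j in range(1, i))
        k = acc / E
        # alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1), for j = 1..i-1
        new_alpha = [alpha[j - 1] - k * alpha[i - j - 1] for j in range(1, i)]
        new_alpha.append(k)   # alpha_i^(i) = k_i
        alpha = new_alpha
        ks.append(k)
        E *= 1.0 - k * k      # E^(i) = (1 - k_i^2) E^(i-1)
    return alpha, ks, E


a, ks, E = levinson_durbin([14.0, 8.0, 3.0], p=2)  # a is approximately [0.6667, -0.1667]
```

With autocorrelation values computed as above from a nonzero frame, every $|k_i| < 1$, which guarantees that the resulting LPC filter is stable.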

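• To get the other three parameters ($V/UV$, $G$, $T$), we solve for the innovation:

$$e(n) = s(n) - \sum_{i=1}^{10} a_i\, s(n-i)$$

• Then calculate the autocorrelation of $e(n)$:

$$R_e(k) = \sum_{n=k}^{159} e(n)\, e(n-k)$$

• Then make the voiced/unvoiced decision, and determine $T$ and $G$, based on this autocorrelation.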
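The exact decision rule is not given here, so the following is only one common approach, stated as an assumption: compute the innovation, find the largest normalized autocorrelation $R_e(k)/R_e(0)$ over the pitch range $20 \le k \le 146$ (the range quoted for the 2.4 kbps vocoder), and declare the frame voiced with $T$ equal to that lag if the peak exceeds a threshold. The threshold of 0.3 is an arbitrary illustrative value, and the gain rule (matching the energy of the innovation; an impulse train of period $T$ has about $N/T$ pulses per frame) is also an assumption. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def innovation(s: Sequence[float], a: Sequence[float]) -> List[float]:
    """e(n) = s(n) - sum_i a_i s(n-i), treating samples before the frame as zero."""
    return [s[n] - sum(ai * s[n - i] for i, ai in enumerate(a, start=1) if n >= i)
            for n in range(len(s))]


def voicing_pitch_gain(e: Sequence[float], t_min: int = 20, t_max: int = 146,
                       threshold: float = 0.3) -> Tuple[bool, int, float]:
    """Return (voiced, T, G) from the normalized autocorrelation of e(n)."""
    N = len(e)
    energy = sum(x * x for x in e)            # R_e(0)
    if energy == 0.0:
        return False, 0, 0.0
    best_t, best_r = 0, -math.inf
    for k in range(t_min, min(t_max, N - 1) + 1):
        r = sum(e[n] * e[n - k] for n in range(k, N)) / energy
        if r > best_r:
            best_t, best_r = k, r
    if best_r > threshold:
        return True, best_t, math.sqrt(best_t * energy / N)   # voiced
    return False, 0, math.sqrt(energy / N)                    # unvoiced: RMS gain
```

Because $R_e(k)$ sums fewer products at long lags, this plain normalization is biased toward shorter pitch periods; practical pitch detectors usually compensate for that.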


IV. 2.4 kbps LPC Vocoder

• The following is a block diagram of a 2.4 kbps LPC vocoder:


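• The LPC coefficients are represented as line spectrum pair (LSP) parameters.

• LSP are mathematically equivalent (one-to-one) to LPC.

• LSP are more amenable to quantization.

• LSP are calculated as follows:

$$P(z) = A(z) + z^{-11} A(z^{-1}), \qquad Q(z) = A(z) - z^{-11} A(z^{-1})$$

where $A(z) = 1 - \sum_{i=1}^{10} a_i z^{-i}$ is the denominator of the LPC filter $H(z)$.

• Factoring the above equations, we get:

$$P(z) = (1 + z^{-1}) \prod_{i=1,3,5,7,9} \left(1 - 2\cos\omega_i\, z^{-1} + z^{-2}\right)$$

$$Q(z) = (1 - z^{-1}) \prod_{i=2,4,6,8,10} \left(1 - 2\cos\omega_i\, z^{-1} + z^{-2}\right)$$

The frequencies $\omega_1, \ldots, \omega_{10}$ are called the LSP parameters.

• LSP are ordered and bounded:

$$0 < \omega_1 < \omega_2 < \cdots < \omega_{10} < \pi$$

• LSP are more correlated from one frame to the next than LPC.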
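A simple way to compute the LSP frequencies is to search the unit circle numerically. This sketch assumes $A(z) = 1 - \sum a_i z^{-i}$, $P(z) = A(z) + z^{-11}A(z^{-1})$ and $Q(z) = A(z) - z^{-11}A(z^{-1})$. On $z = e^{j\omega}$, $P$ and $Q$ reduce to real trigonometric sums (up to a linear-phase factor), so their roots in $(0, \pi)$ can be found by scanning for sign changes and refining each one by bisection. Grid scanning is an illustrative choice (production coders often use Chebyshev-polynomial root searches instead), and the function name is hypothetical.

```python
import math
from typing import Callable, List, Sequence


def lpc_to_lsp(a: Sequence[float], grid: int = 1024, iters: int = 60) -> List[float]:
    """LSP frequencies (radians, ascending) for A(z) = 1 - sum_i a_i z^-i.

    Roots closer together than pi/grid could be missed by the scan.
    """
    p = len(a)
    c = [1.0] + [-x for x in a] + [0.0]                # A(z) padded to degree p+1
    pc = [c[k] + c[p + 1 - k] for k in range(p + 2)]    # P(z) = A(z) + z^-(p+1) A(1/z)
    qc = [c[k] - c[p + 1 - k] for k in range(p + 2)]    # Q(z) = A(z) - z^-(p+1) A(1/z)
    half = (p + 1) / 2.0

    # On z = e^{jw}: P = e^{-j*half*w} * p_real(w), Q = j * e^{-j*half*w} * q_real(w).
    def p_real(w: float) -> float:
        return sum(pc[k] * math.cos((half - k) * w) for k in range(p + 2))

    def q_real(w: float) -> float:
        return sum(qc[k] * math.sin((half - k) * w) for k in range(p + 2))

    def roots(f: Callable[[float], float]) -> List[float]:
        xs = [math.pi * i / grid for i in range(1, grid)]  # interior of (0, pi)
        found = []
        for lo, hi in zip(xs, xs[1:]):
            if f(lo) * f(hi) < 0.0:                     # sign change brackets a root
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0.0:
                        hi = mid
                    else:
                        lo = mid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(roots(p_real) + roots(q_real))


# With all a_i = 0, the LSPs are equally spaced at k*pi/11, k = 1..10.
flat = lpc_to_lsp([0.0] * 10)
```

To express an LSP in Hz at 8000 samples/sec, multiply $\omega_i$ by $8000 / (2\pi)$.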

• The frame size is 20 msec, so there are 50 frames/sec, and 2400 bps is equivalent to 48 bits/frame. These bits are allocated as follows:

• The 34 bits for the LSP are allocated as follows:

• The gain, $G$, is encoded using a 7-bit non-uniform scalar quantizer (a 1-dimensional vector quantizer).
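The actual quantizer levels are not listed here, so the sketch below assumes a hypothetical 7-bit codebook of 128 levels spaced evenly in $\log G$ between an assumed minimum and maximum gain; logarithmic spacing is one common way to get finer steps at low gains. Encoding picks the nearest level, which is the nearest-neighbor rule of a 1-dimensional vector quantizer. Function names and the gain range are assumptions.

```python
import bisect
import math
from typing import List


def make_log_codebook(g_min: float, g_max: float, bits: int = 7) -> List[float]:
    """Hypothetical non-uniform codebook: 2**bits levels evenly spaced in log(g)."""
    n = 2 ** bits
    step = (math.log(g_max) - math.log(g_min)) / (n - 1)
    return [math.exp(math.log(g_min) + i * step) for i in range(n)]


def quantize(g: float, codebook: List[float]) -> int:
    """Index of the level nearest to g (codebook sorted in ascending order)."""
    i = bisect.bisect_left(codebook, g)
    if i == 0:
        return 0
    if i == len(codebook):
        return len(codebook) - 1
    return i if codebook[i] - g < g - codebook[i - 1] else i - 1


codebook = make_log_codebook(1.0, 1000.0)   # assumed gain range
index = quantize(42.0, codebook)            # 7-bit index sent to the decoder
reconstructed = codebook[index]
```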


• For voiced speech, values of $T$ range from 20 to 146. $V/UV$ and $T$ are jointly encoded as follows:
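Whatever the exact mapping, the arithmetic shows why a joint code fits in 7 bits: pitch values 20 through 146 give 127 possibilities, and one extra code for "unvoiced" brings the total to exactly 128 = 2^7. The mapping below (code 0 for unvoiced, code T - 19 for voiced) is a hypothetical choice. The last line checks the 48-bit frame budget, assuming the bits left after the 34 LSP bits and the 7 gain bits all go to this joint code.

```python
from typing import Tuple

T_MIN, T_MAX = 20, 146   # pitch period range for voiced speech, in samples


def encode_vuv_pitch(voiced: bool, pitch: int) -> int:
    """Hypothetical 7-bit joint code: 0 = unvoiced, 1..127 = voiced, T = code + 19."""
    if not voiced:
        return 0
    if not T_MIN <= pitch <= T_MAX:
        raise ValueError("pitch period out of range")
    return pitch - T_MIN + 1


def decode_vuv_pitch(code: int) -> Tuple[bool, int]:
    return (False, 0) if code == 0 else (True, code + T_MIN - 1)


n_codes = 1 + (T_MAX - T_MIN + 1)              # 1 unvoiced code + 127 pitch values
print(n_codes, (n_codes - 1).bit_length())     # 128 7
print(34 + 7 + 7, 2400 * 20 // 1000)           # 48 48
```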

    V. 4.8 kbps CELP Coder

• CELP = Code-Excited Linear Prediction.

• The principle is similar to the LPC vocoder except:

  ◦ The frame size is 30 msec (240 samples).
  ◦ The innovation $e(n)$ is coded directly.
  ◦ More bits are needed.
  ◦ It is computationally more complex.
  ◦ A pitch prediction filter is included.
  ◦ The vector quantization concept is used.

• A block diagram of the CELP encoder is shown below:


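• The pitch prediction filter is given by:

$$\frac{1}{1 - \beta z^{-T}}$$

where $\beta$ is the pitch gain and the pitch lag $T$ can be an integer or a fractional number of samples.

• The perceptual weighting filter is given by:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}$$

where suitable values of the constants $\gamma_1$ and $\gamma_2$ have been determined empirically.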

• Each frame is divided into 4 subframes. In each subframe, the codebook contains 512 codevectors.

• The gain is quantized using 5 bits per subframe.

• The LSP parameters are quantized using 34 bits, as in the LPC vocoder.

• At 30 msec per frame, 4.8 kbps is equivalent to 144 bits/frame. These 144 bits are allocated as follows:
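The core of a CELP encoder is an analysis-by-synthesis search: each codevector is passed through the synthesis filter, scaled by its best gain, and compared with the target signal, and the index of the best match is transmitted. A 512-entry codebook needs log2(512) = 9 bits per subframe index. The sketch below is deliberately simplified, and its details are assumptions: the codebook is random Gaussian (seeded), the error is plain squared error without the perceptual weighting or pitch prediction filters, the filter starts from zero state, and a subframe is 240 / 4 = 60 samples. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def all_pole(a: Sequence[float], x: Sequence[float]) -> List[float]:
    """y(n) = x(n) + sum_i a_i y(n-i), zero initial state."""
    y: List[float] = []
    for n, xn in enumerate(x):
        y.append(xn + sum(ai * y[n - i] for i, ai in enumerate(a, start=1) if n >= i))
    return y


def search_codebook(target: Sequence[float], a: Sequence[float],
                    codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, gain) minimizing ||target - gain * synth(c)||^2 over the codebook."""
    best_idx, best_gain, best_err = -1, 0.0, math.inf
    t_energy = sum(t * t for t in target)
    for idx, c in enumerate(codebook):
        y = all_pole(a, c)                      # synthesized codevector
        yy = sum(v * v for v in y)
        if yy == 0.0:
            continue
        ty = sum(t * v for t, v in zip(target, y))
        gain = ty / yy                          # optimal gain for this codevector
        err = t_energy - ty * ty / yy           # squared error at that gain
        if err < best_err:
            best_idx, best_gain, best_err = idx, gain, err
    return best_idx, best_gain


SUBFRAME = 240 // 4                             # 60 samples per subframe
rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(SUBFRAME)] for _ in range(512)]
print(math.log2(len(codebook)))                 # 9.0 bits per codebook index
```

The closed-form gain keeps the search to one filtering pass per codevector; in a real coder the selected gain would then be quantized with the 5 bits per subframe mentioned above.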


VI. 8.0 kbps CS-ACELP Coder

• CS-ACELP = Conjugate-Structured Algebraic CELP.

• The principle is similar to the 4.8 kbps CELP coder except:

  ◦ The frame size is 10 msec (80 samples).
  ◦ There are only two subframes, each of which is 5 msec (40 samples).
  ◦ The LSP parameters are encoded using two-stage vector quantization.
  ◦ The gains are also encoded using vector quantization.

• At 10 msec per frame, 8 kbps is equivalent to 80 bits/frame. These 80 bits are allocated as follows:
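Two-stage vector quantization can be sketched as follows: the first codebook gives a coarse approximation of the LSP vector, and the second codebook quantizes what is left over. Searching the stages one after the other costs |cb1| + |cb2| distance computations instead of |cb1| × |cb2| for a joint search. The codebook sizes (128 and 32, i.e. 7 + 5 bits) and their random contents are hypothetical; real codebooks are trained on speech data, and this is not the CS-ACELP standard's actual LSP quantizer.

```python
import random
from typing import List, Sequence, Tuple


def nearest(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codevector with the smallest squared Euclidean distance to x."""
    return min(range(len(codebook)),
               key=lambda i: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[i])))


def two_stage_vq(x: Sequence[float], cb1: Sequence[Sequence[float]],
                 cb2: Sequence[Sequence[float]]) -> Tuple[int, int, List[float]]:
    """Quantize x with cb1, then quantize the remaining error with cb2."""
    i1 = nearest(x, cb1)
    residual = [xi - ci for xi, ci in zip(x, cb1[i1])]
    i2 = nearest(residual, cb2)
    x_hat = [c1 + c2 for c1, c2 in zip(cb1[i1], cb2[i2])]
    return i1, i2, x_hat


# Hypothetical random codebooks for 10-dimensional LSP vectors (radians).
rng = random.Random(0)
cb1 = [sorted(rng.uniform(0.0, 3.1) for _ in range(10)) for _ in range(128)]
cb2 = [[rng.gauss(0.0, 0.05) for _ in range(10)] for _ in range(32)]
lsp = sorted(rng.uniform(0.0, 3.1) for _ in range(10))
i1, i2, lsp_hat = two_stage_vq(lsp, cb1, cb2)   # transmit i1 (7 bits) and i2 (5 bits)
```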

VII. Demonstration

This is a demonstration of five different speech compression algorithms (ADPCM, LD-CELP, CS-ACELP, CELP, and LPC10).


To use this demo, you need a Sun Audio (.au) player. To distinguish subtle differences in the speech files, high-quality speakers and/or headphones are recommended. It is also recommended that you run this demo in a quiet room (with a low level of background noise).

    "A lathe is a big tool. Grab every dish of sugar."

• Original (64000 bps): This is the original speech signal, sampled at 8000 samples/second and mu-law quantized at 8 bits/sample. Approximately 4 seconds of speech.

• ADPCM (32000 bps): This is speech compressed using the Adaptive Differential Pulse Code Modulation (ADPCM) scheme. The bit rate is 4 bits/sample (compression ratio of 2:1).

• LD-CELP (16000 bps): This is speech compressed using the Low-Delay Code-Excited Linear Prediction (LD-CELP) scheme. The bit rate is 2 bits/sample (compression ratio of 4:1).

• CS-ACELP (8000 bps): This is speech compressed using the Conjugate-Structured Algebraic Code-Excited Linear Prediction (CS-ACELP) scheme. The bit rate is 1 bit/sample (compression ratio of 8:1).

• CELP (4800 bps): This is speech compressed using the Code-Excited Linear Prediction (CELP) scheme. The bit rate is 0.6 bits/sample (compression ratio of 13.3:1).

• LPC10 (2400 bps): This is speech compressed using the Linear Predictive Coding (LPC10) scheme. The bit rate is 0.3 bits/sample (compression ratio of 26.7:1).
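The bits/sample and compression-ratio figures follow directly from each bit rate, the 8000 samples/sec sampling rate, and the 64 kbps original; the short check below reproduces them (64000 / 2400 is 26.67, which rounds to 26.7). The names are illustrative.

```python
SAMPLE_RATE = 8000      # samples/sec
ORIGINAL_BPS = 64000    # 8 bits/sample mu-law at 8000 samples/sec

DEMO_RATES = {"ADPCM": 32000, "LD-CELP": 16000, "CS-ACELP": 8000,
              "CELP": 4800, "LPC10": 2400}


def demo_stats(bps: int) -> tuple:
    """Return (bits per sample, compression ratio relative to 64 kbps)."""
    return bps / SAMPLE_RATE, ORIGINAL_BPS / bps


for name, bps in DEMO_RATES.items():
    bits, ratio = demo_stats(bps)
    print(f"{name}: {bits:g} bits/sample, {ratio:.1f}:1")
```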

    VIII. References


    1. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals.

2. N. Morgan and B. Gold, Speech and Audio Signal Processing: Processing and Perception of Speech and Music.

3. J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals.

4. S. Furui, Digital Speech Processing, Synthesis and Recognition.

5. D. O'Shaughnessy, Speech Communications: Human and Machine.

6. A. J. Rubio Ayuso and J. M. Lopez Soler, Speech Recognition and Coding: New Advances and Trends.

7. M. R. Schroeder, Computer Speech: Recognition, Compression, Synthesis.

    8. B. S. Atal, V. Cuperman, and A. Gersho, Speech and Audio Coding for Wireless and Network Applications.

    9. B. S. Atal, V. Cuperman, and A. Gersho, Advances in Speech Coding.

    10. D. G. Childers, Speech Processing and Synthesis Toolboxes.

    11. R. Goldberg and L. Rick, A Practical Handbook of Speech Coders.


    Copyright 2000-2007. All rights reserved.
