Lec04, Speech II, v1.06.ppt - ce.sharif.edu/courses/91-92/2/ce342-1/resources...

Multimedia Systems

Speech II

Course Presentation

Mahdi Amiri

February 2013

Sharif University of Technology

Road Map

Speech Compression

Based on Time Domain analysis

Differential Pulse-Code Modulation (DPCM)

Adaptive DPCM (ADPCM)

Based on Frequency Domain analysis

Linear Predictive Coding (LPC)

Code Excited Linear Prediction (CELP)

Differential PCM (DPCM): Idea

Take advantage of data redundancy

PCM samples: [… 110 112 111 112 112 114 115 115 114 114 …]

Differences: [… +2 -1 +1 0 +2 +1 0 -1 0 …]

Figure: histogram of PCM samples in a chunk of digitized audio.
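The redundancy idea can be checked directly on the sample sequence from the slide. The sketch below is only an illustration: the helper names (`differences`, `reconstruct`, `unsigned_bits`, `signed_bits`) are hypothetical, and the bit counts assume a simple fixed-length code (unsigned for samples, sign plus magnitude for differences), which the slide does not specify.

```python
# PCM samples copied from the slide's example sequence.
samples = [110, 112, 111, 112, 112, 114, 115, 115, 114, 114]


def differences(x):
    """Successive differences x[n] - x[n-1] for n >= 1."""
    return [b - a for a, b in zip(x, x[1:])]


def reconstruct(first, diffs):
    """Rebuild the samples from the first value plus the differences."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out


def unsigned_bits(values):
    """Bits for a fixed-length unsigned code covering the largest value."""
    return max(values).bit_length()


def signed_bits(values):
    """Bits for a fixed-length sign-plus-magnitude code (assumed coding)."""
    return 1 + max(abs(v) for v in values).bit_length()


diffs = differences(samples)
print(diffs)  # [2, -1, 1, 0, 2, 1, 0, -1, 0]
print(unsigned_bits(samples), signed_bits(diffs))
```

The differences occupy a much narrower range than the samples themselves, which is what makes them cheaper to code; the first sample still has to be sent so the decoder can rebuild the sequence.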

Differential PCM (DPCM): Basic Scheme

General Predictive Coding

Delta Modulation (DM): $\sum_i a_i \, x[n-i] \Rightarrow z^{-1}$ (the predictor reduces to a one-sample delay)

Problem?

Differential PCM (DPCM): Error Propagation

General Predictive Coding

The output of the dequantizer in the decoder is not equal to the input of the quantizer in the encoder → the input of the predictor in the decoder is not the same as the input of the predictor in the encoder → this is the source of error propagation.

Differential PCM (DPCM): Better Structure
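The slide's diagram is not available here, so the following is a sketch under an assumption: that the better structure is the usual closed-loop arrangement, in which the encoder predicts from the same reconstructed samples the decoder will have. The previous-sample predictor, the uniform quantizer, and the ramp test signal are all illustrative choices, not taken from the slides.

```python
def quantize(e, step):
    """Uniform quantizer: round the prediction error to a multiple of step."""
    return step * round(e / step)


def open_loop_dpcm(x, step):
    """Encoder predicts from ORIGINAL samples; the decoder only has its own output."""
    errors = [quantize(x[n] - (x[n - 1] if n > 0 else 0.0), step)
              for n in range(len(x))]
    y, prev = [], 0.0
    for e in errors:            # decoder: previous reconstruction + received error
        prev += e
        y.append(prev)
    return y


def closed_loop_dpcm(x, step):
    """Encoder predicts from the RECONSTRUCTED samples, exactly as the decoder does."""
    y, prev = [], 0.0
    for s in x:
        prev += quantize(s - prev, step)   # encoder tracks the decoder state
        y.append(prev)
    return y


def max_abs_error(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


# Illustrative test signal: a ramp whose slope is not a multiple of the step.
x = [1.4 * n for n in range(100)]
open_err = max_abs_error(x, open_loop_dpcm(x, step=1.0))
closed_err = max_abs_error(x, closed_loop_dpcm(x, step=1.0))
print(open_err, closed_err)
```

In the open-loop version every quantization error is added to all later outputs, so the error keeps growing along the ramp; in the closed-loop version the error stays within half a quantization step.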

Adaptive DPCM (ADPCM): Idea

Problem?

Adaptive DPCM (ADPCM): Size of Quantization Step

Delta Modulation (DM)

1-bit quantizer: 0 means $+\Delta$ and 1 means $-\Delta$

Adaptive Delta Modulation (ADM)

ADM: $\Delta[n] = M \, \Delta[n-1]$

$M = P > 1$ if $c[n] = c[n-1]$

$M = Q < 1$ if $c[n] \neq c[n-1]$
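A minimal sketch of the step-size rule, with several assumptions that are not in the slides: the values P = 1.5 and Q = 1/1.5, the step limits `dmin` and `dmax`, the initial step, and the helper names are all hypothetical. Setting P = Q = 1 turns the same code into plain delta modulation, which gives a baseline for comparison.

```python
def next_step(delta, bit, prev_bit, P, Q, dmin, dmax):
    """Grow the step when two consecutive bits agree, shrink it when they differ."""
    if prev_bit is None:
        return delta
    M = P if bit == prev_bit else Q
    return min(dmax, max(dmin, M * delta))


def adm_encode(x, delta0=1.0, P=1.5, Q=1 / 1.5, dmin=0.05, dmax=1000.0):
    """Return the 1-bit codes and the encoder's running reconstruction."""
    bits, recon = [], []
    est, delta, prev_bit = 0.0, delta0, None
    for s in x:
        bit = 0 if s >= est else 1          # 0 means +delta, 1 means -delta
        delta = next_step(delta, bit, prev_bit, P, Q, dmin, dmax)
        est += delta if bit == 0 else -delta
        bits.append(bit)
        recon.append(est)
        prev_bit = bit
    return bits, recon


def adm_decode(bits, delta0=1.0, P=1.5, Q=1 / 1.5, dmin=0.05, dmax=1000.0):
    """Decoder repeats the same step adaptation using only the received bits."""
    recon = []
    est, delta, prev_bit = 0.0, delta0, None
    for bit in bits:
        delta = next_step(delta, bit, prev_bit, P, Q, dmin, dmax)
        est += delta if bit == 0 else -delta
        recon.append(est)
        prev_bit = bit
    return recon


# Steep ramp: fixed-step DM cannot keep up (slope overload), ADM can.
ramp = [5.0 * n for n in range(50)]
bits, adm_recon = adm_encode(ramp)
_, dm_recon = adm_encode(ramp, P=1.0, Q=1.0)   # P = Q = 1 is plain DM
adm_err = max(abs(a - b) for a, b in zip(ramp, adm_recon))
dm_err = max(abs(a - b) for a, b in zip(ramp, dm_recon))
print(adm_err, dm_err)
```

Because the step size depends only on the bit history, the decoder needs no side information beyond the bit stream itself.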

Speech Compression Concepts: FFT, No Time Localization

Speech Signal

(is only localized in frequency)

Joseph Fourier, 1768-1830

Speech Compression Concepts: FFT, No Time Localization

See Power Spectral Density (PSD) examples in MATLAB
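The MATLAB examples are not reproduced in this transcript. As a stand-in, here is a small Python sketch of the "no time localization" point: two hypothetical signals contain the same two tones in opposite order, yet their magnitude spectra coincide. The direct DFT below is written out with the standard library for self-containment; it is not an FFT and not the author's MATLAB code.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of a direct DFT for bins 0 .. N/2 - 1 (slow, but dependency-free)."""
    n_samples = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples)))
        for k in range(n_samples // 2)
    ]


N = 128


def tone(cycles, n):
    """Sine with the given number of cycles per N samples."""
    return math.sin(2 * math.pi * cycles * n / N)


# Same two tones, opposite order in time.
low_then_high = [tone(8, n) if n < N // 2 else tone(24, n) for n in range(N)]
high_then_low = [tone(24, n) if n < N // 2 else tone(8, n) for n in range(N)]

mag_a = dft_magnitude(low_then_high)
mag_b = dft_magnitude(high_then_low)

# Two strongest bins of each spectrum, and the largest difference between spectra.
peaks_a = sorted(sorted(range(len(mag_a)), key=lambda k: mag_a[k])[-2:])
peaks_b = sorted(sorted(range(len(mag_b)), key=lambda k: mag_b[k])[-2:])
largest_gap = max(abs(a - b) for a, b in zip(mag_a, mag_b))
print(peaks_a, peaks_b, largest_gap)
```

The spectrum says which frequencies are present but not when they occur; that information sits in the phase, which a magnitude or PSD plot discards.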

Speech Compression Concepts: STFT

Speech Signal

(fixed time and frequency localization)

Dennis Gabor, 1900-1979
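A sketch of a short-time Fourier transform, assuming a Hann window, a window of 32 samples, and a hop of 16 samples (all illustrative choices, none from the slides). The hypothetical test signal switches from a low tone to a high tone halfway through, and the dominant bin per frame shows where the change happens.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def stft_magnitudes(x, window_size, hop):
    """Windowed DFT magnitudes, one list of bins (0 .. window_size/2) per frame."""
    w = hann(window_size)
    frames = []
    for start in range(0, len(x) - window_size + 1, hop):
        seg = [x[start + n] * w[n] for n in range(window_size)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                    for n in range(window_size)))
            for k in range(window_size // 2 + 1)
        ])
    return frames


# Tone at bin 2 of a 32-point window for the first half, bin 6 for the second half.
x = [math.sin(2 * math.pi * (2 / 32) * n) if n < 128
     else math.sin(2 * math.pi * (6 / 32) * n)
     for n in range(256)]

frames = stft_magnitudes(x, window_size=32, hop=16)
dominant = [max(range(len(f)), key=lambda k: f[k]) for f in frames]
print(dominant)
```

Each frame reports the tone that is active inside its window, so the change is located in time to within roughly one window length; the window length is fixed for the whole analysis, which is the "fixed time and frequency localization" noted above.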

Speech Compression Concepts: Spectrogram

3D surface spectrogram of a part from a music piece.

Speech Compression Concepts: Spectrogram

Spectrogram of a male voice saying ‘nineteenth century’.

Speech Compression Concepts: Spectrogram Display in Audacity

Waveform

Spectrogram

Speech Compression Concepts: Spectrogram Display in Audacity

Audacity | Edit | Preferences | Spectrograms | FFT Window | Window size

FFT Window size: 128

FFT Window size: 1024
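The effect of the two window sizes can be estimated with simple arithmetic: the spacing between frequency bins is the sample rate divided by the window size, and the time span of one window is the window size divided by the sample rate. The sample rate of 44100 Hz below is an assumption; the slides do not state the rate of the recording shown.

```python
def stft_resolution(window_size, sample_rate_hz):
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    bin_spacing_hz = sample_rate_hz / window_size
    window_duration_s = window_size / sample_rate_hz
    return bin_spacing_hz, window_duration_s


SAMPLE_RATE_HZ = 44100  # assumed, not given in the slides

for size in (128, 1024):
    df, dt = stft_resolution(size, SAMPLE_RATE_HZ)
    print(f"window {size}: {df:.1f} Hz per bin, {dt * 1000:.1f} ms per window")
```

Under this assumption a 128-sample window gives coarse frequency bins (about 345 Hz) but short time slices (about 3 ms), while a 1024-sample window gives fine bins (about 43 Hz) at the cost of smearing events over about 23 ms.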

Speech Compression Concepts: Spectrogram, Demonstration

Bat Echolocation Call

Flute by Jean Pierre Rampal

Singing Voice

Face!

Speech Compression Concepts: Formant

The time- and frequency-domain representation of the vowels /a/, /i/, and /u/

Speech Compression Concepts: Sample Application

A computing system to answer

questions posed in natural language

Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson

www-943.ibm.com/innovation/us/watson/

Dr. David Ferrucci, Watson Principal Investigator

Linear Predictive Coding (LPC): Modeling

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)

Buzzer → Filter

Speech = Formants + Residue

Chunks: 30 to 50 frames/sec.

Predictor for each frame: $\tilde{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i]$
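One common way to obtain the predictor coefficients for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method is intended, so treat this as a sketch under that assumption; the function names and the synthetic second-order test signal are hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0 .. max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations; returns (a[1..order], final prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                       # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc(frame, order):
    return levinson_durbin(autocorrelation(frame, order), order)


def residue(frame, a):
    """e[n] = x[n] - sum_i a_i * x[n-i], the part the predictor cannot explain."""
    out = []
    for n in range(len(frame)):
        pred = sum(a[i - 1] * frame[n - i]
                   for i in range(1, len(a) + 1) if n - i >= 0)
        out.append(frame[n] - pred)
    return out


# Synthetic test signal with known coefficients (not real speech).
rng = random.Random(0)
true_a = [1.3, -0.6]
signal = []
for n in range(2000):
    past = sum(true_a[i - 1] * signal[n - i] for i in (1, 2) if n - i >= 0)
    signal.append(past + rng.gauss(0.0, 1.0))

coeffs, _ = lpc(signal, order=2)
res = residue(signal, coeffs)
signal_energy = sum(v * v for v in signal)
residue_energy = sum(v * v for v in res)
print(coeffs, residue_energy / signal_energy)
```

The estimated coefficients describe the filter (the formant structure), and the residue carries much less energy than the signal, which is why coding the two separately pays off.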

Linear Predictive Coding (LPC): Modeling (Hiss or Buzz)

The human vocal tract as an infinite impulse response (IIR) system

Vowel /a/

LPC Block Diagram
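The block diagram itself is not reproduced here. The sketch below shows the decoder side of the hiss-or-buzz model as it is usually described: an excitation (a pulse train for voiced frames, white noise for unvoiced frames) drives an all-pole IIR filter built from the predictor coefficients. The frame length of 180 samples, the pitch period of 75, and the uniform noise follow the frame examples later in the deck; the coefficient values and function names are hypothetical.

```python
import random


def excitation(n_samples, voiced, pitch_period=75, seed=0):
    """Buzz: impulse train at the pitch period. Hiss: uniform white noise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def synthesize(exc, a, gain=1.0):
    """All-pole IIR filter: y[n] = gain * exc[n] + sum_i a[i-1] * y[n-i]."""
    y = []
    for n, e in enumerate(exc):
        acc = gain * e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


a = [1.3, -0.6]                     # illustrative stable coefficients
voiced_frame = synthesize(excitation(180, voiced=True), a)
unvoiced_frame = synthesize(excitation(180, voiced=False), a)
print(voiced_frame[:3])
print(max(abs(v) for v in unvoiced_frame))
```

Only the coefficients, the gain, the voiced/unvoiced flag, and the pitch period need to be transmitted per frame, which is where the very low bit rate of LPC vocoders comes from.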

Linear Predictive Coding (LPC): Original Paper, Atal-Hanauer 1971

Comparison of wide-band sound spectrograms for synthetic and original speech signals for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker

Original

Synthetic

Linear Predictive Coding (LPC): Voiced Frame Example

Original

Synthetic

Time Domain Frequency Domain

180 samples, Pitch period: 75

Linear Predictive Coding (LPC): Unvoiced Frame Example

Original

Synthetic: White noise with uniform distribution

Time Domain Frequency Domain

180 samples

Code Excited Linear Prediction (CELP)

Problem of LPC: where there is both Hiss and Buzz

Solution: encode the residue

Method: Vector Quantization (Codebook)

Encoder

Decoder
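A heavily simplified sketch of the codebook idea: for each candidate excitation vector, the encoder synthesizes a frame through the LPC filter, picks the best gain, and keeps the index with the smallest error (analysis by synthesis). This is an assumption-laden illustration, not the slides' encoder: it ignores filter memory between frames, perceptual weighting, and pitch (adaptive codebook) prediction, and the random codebook, coefficients, and function names are hypothetical.

```python
import random


def synthesize(exc, a):
    """All-pole LPC synthesis filter, starting from zero state."""
    y = []
    for n, e in enumerate(exc):
        acc = e
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc += ai * y[n - i]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Return (index, gain, squared error) of the best codebook entry."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy   # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best


rng = random.Random(1)
a = [1.3, -0.6]                                   # illustrative LPC coefficients
codebook = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(16)]

# Build a target frame from a known entry so the search result can be checked.
target = [0.7 * v for v in synthesize(codebook[5], a)]
index, gain, err = search_codebook(target, codebook, a)
print(index, round(gain, 3))
```

The encoder then transmits only the codebook index and the gain (plus the LPC parameters); the decoder holds the same codebook and runs the chosen entry through the same filter.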

Vector Quantization: Block Diagram

Vector Quantization: Example

Sample scalar quantizer

We have 3 possible colors for each square, so we can quantize each square with 2 bits → 28 * 2 = 56 bits for all 28 (7*4) squares.

Sample vector quantizer

We have 8 forms in the codebook, so we can quantize each form with 3 bits → 7 * 3 = 21 bits for all 28 (7*4) squares.

Codebook
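The bit counts in this example can be reproduced with a few lines. The figure with the actual forms is not available, so the codebook and the image blocks below are hypothetical: 8 forms of 4 squares each, with colors coded as 0, 1, 2.

```python
def bits_per_symbol(n_levels):
    """Smallest number of bits that can index n_levels distinct symbols."""
    return max(1, (n_levels - 1).bit_length())


scalar_bits = 28 * bits_per_symbol(3)    # 28 squares, 3 colors each
vector_bits = 7 * bits_per_symbol(8)     # 7 forms, 8 codebook entries
print(scalar_bits, vector_bits)          # 56 21


def nearest(block, codebook):
    """Index of the codebook entry with the smallest squared distance to block."""
    return min(range(len(codebook)),
               key=lambda i: sum((b - c) ** 2 for b, c in zip(block, codebook[i])))


# Hypothetical codebook: 8 forms, each covering 4 squares.
codebook = [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 2), (0, 1, 0, 1),
            (1, 0, 1, 0), (0, 0, 2, 2), (2, 2, 0, 0), (1, 2, 1, 2)]

# Hypothetical image: 7 blocks of 4 squares, all of which appear in the codebook.
blocks = [codebook[i] for i in (0, 3, 3, 7, 1, 5, 2)]

indices = [nearest(b, codebook) for b in blocks]
decoded = [codebook[i] for i in indices]
print(indices)
```

The saving comes from indexing whole patterns instead of single squares; a block that is not in the codebook would be replaced by its nearest entry, which is where vector quantization becomes lossy.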

Vector Quantization: Codebook Design
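The slide's design procedure is not recoverable from this transcript. One widely used approach, assumed here purely for illustration, is Lloyd (k-means style) iteration: assign each training vector to its nearest codeword, then move each codeword to the mean of its cell. The training data, the initialization from evenly spaced training vectors, and the function names are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_index(vector, codebook):
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def design_codebook(training, size, iterations=20):
    """Lloyd iteration: nearest-codeword assignment, then centroid update."""
    step = len(training) // size
    codebook = [tuple(training[i * step]) for i in range(size)]   # simple init
    for _ in range(iterations):
        cells = [[] for _ in range(size)]
        for v in training:
            cells[nearest_index(v, codebook)].append(v)
        codebook = [
            tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else codebook[i]
            for i, cell in enumerate(cells)    # empty cell keeps its old codeword
        ]
    return codebook


# Hypothetical 2-D training set: two well-separated clusters.
rng = random.Random(0)
training = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
training += [(rng.gauss(10.0, 1.0), rng.gauss(10.0, 1.0)) for _ in range(100)]

codebook = sorted(design_codebook(training, size=2))
print(codebook)
```

With a representative training set, the codewords settle near the dense regions of the data, so frequent patterns are reproduced with small error.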

Comparison of Speech Coders: Sample Speech

A lathe is a big tool. Grab every dish of sugar.

Comparison of Speech Coders: Demonstration

Original ADPCM

LPC CELP

Speech Coding: ITU-T Standards

μ-law, A-law

64, 80 and 96 kbps

Check out a complete list at http://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs

A comparison of Internet audio compression formats

http://www.sericyb.com.au/audio.html

48, 56 and 64 kbps

A form of CELP

16 kbps

Vocoders

Speech Coding: Free and Open Source Code

HawkVoice

http://hawksoft.com/hawkvoice/

Check out voice samples of HawkVoice™ codecs at

http://hawksoft.com/hawkvoice/codecs.shtml

Thank You

Next Session: Entropy Coding

FIND OUT MORE AT...

1. http://ce.sharif.edu/~m_amiri/

2. http://www.dml.ir/
