LING 439/539: Statistical Methods in Speech and Language Processing


Transcript of LING 439/539: Statistical Methods in Speech and Language Processing

Page 1: LING 439/539: Statistical Methods in Speech and Language Processing

LING 439/539: Statistical Methods in Speech and Language Processing

Ying Lin
Department of Linguistics
University of Arizona

Page 2: LING 439/539: Statistical Methods in Speech and Language Processing

Welcome!

Get the syllabus
Fill out and return the information sheet
Email: [email protected]
Office: Douglass 224
OH: MW 2:00--3:00 or by appointment (also teaching another undergrad class)
Course webpage: see syllabus
Listserv coming soon

Page 3: LING 439/539: Statistical Methods in Speech and Language Processing

438/538 and 439/539

LING 438/538 (Computational Linguistics):
Symbolic representations (mostly syntax), e.g. FSA, CFG
Focus on logic
Simple probabilistic models, e.g. N-grams

Page 4: LING 439/539: Statistical Methods in Speech and Language Processing

438/538 and 439/539

This class complements 438/538:
Numerical representations (speech signals): need digital signal processing
Focus on statistics/learning
More sophisticated probabilistic models, e.g. HMM, PCFG

Page 5: LING 439/539: Statistical Methods in Speech and Language Processing

Main reference texts (!)

Huang, Acero and Hon (2001). Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice-Hall.

Manning and Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Rabiner and Juang (1993). Fundamentals of Speech Recognition. Prentice-Hall.

Duda, Hart and Stork (2001). Pattern Classification (2nd ed.). John Wiley & Sons.

Rabiner and Schafer (1978). Digital Processing of Speech Signals. Prentice-Hall.

Hastie, Tibshirani and Friedman (2001). The Elements of Statistical Learning. Springer.

Page 6: LING 439/539: Statistical Methods in Speech and Language Processing

Guideline for course reading

There is no single book that covers all of our material.
Most books are written for an EE or CS audience only.
A few chapters are selected from each book (see the reading list).
Lecture notes will summarize the reading.
Expect a rough ride this first time around -- feedback is greatly appreciated!

Page 7: LING 439/539: Statistical Methods in Speech and Language Processing

Three skills for this class

1. Linguistics: understanding the source of particular patterns
2. Math/Statistics: underlying principles of the model
3. Programming: implementation

This class emphasizes 2, because:
Models are based on simple structures
Programming skills require much practice

Page 8: LING 439/539: Statistical Methods in Speech and Language Processing

What is the "statistical approach"?

Narrow: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference
Compared to logic: deductive inference
Broad: any work that uses a quantitative measure of success
Relevant to both language engineering and linguistic science

Page 9: LING 439/539: Statistical Methods in Speech and Language Processing

What is the "statistical approach"?

Narrow: uses statistical principles, i.e. based on the probability calculus or other theories of inductive inference
Compared to logic: deductive inference
Broad: any work that uses a quantitative measure of success
Relevant to both language engineering and linguistic science

[Callout on this slide: "This course"]

Page 10: LING 439/539: Statistical Methods in Speech and Language Processing

Language engineering: speech recognition

Tasks of increasing level of difficulty
[Chart: word error rate for tasks of increasing difficulty]
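The chart itself is not reproduced here, but the metric on its axis, word error rate, has a standard definition: WER = (substitutions + deletions + insertions) / number of reference words, computed from an edit-distance alignment of the hypothesis against the reference. A minimal Python sketch (my addition, not from the slide):

def word_error_rate(reference, hypothesis):
    # WER = (substitutions + deletions + insertions) / reference length,
    # computed with standard Levenshtein (edit-distance) dynamic programming.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution / match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("i recognize speech", "i wreck a nice beach"))  # 1.33: WER can exceed 100%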

Page 11: LING 439/539: Statistical Methods in Speech and Language Processing

A brief history of speech recognition

1950s: U.S. government started funding research on automatic recognition of speech
1960s-70s: Isolated words, digit strings; debate: rules vs. statistics; dynamic time warping
1980s-now: Continuous speech, speech understanding, spoken dialog; hidden Markov models dominate

Page 12: LING 439/539: Statistical Methods in Speech and Language Processing

Why didn't the rules work?

Completely bottom-up approach: rules are hand-coded by experts
Problem: variability in speech
Sophisticated, symbolic rules are not flexible enough to handle continuous speech

[Diagram: "How are you?" processed through phonetic rules and phonological rules]

Page 13: LING 439/539: Statistical Methods in Speech and Language Processing

The rise of statistical methods in speech

Initial solution: hire many linguists to continually improve the rule system
This turned out to be costly and slow, failing the high expectations
Advantages of statistical models:
Allow training on different data: flexible, scalable
Computing power is much cheaper than experts
Drive the move to less and less constrained tasks
Bitterness: "Every time I fire a linguist, the word error rate goes down" -- F. Jelinek (IBM)

Page 14: LING 439/539: Statistical Methods in Speech and Language Processing

The rise of statistics in NLP

Very similar scenarios also happened in NLP, e.g. tagging, parsing, machine translation
"Old" NLP: deductive systems, hand-coded
"New" NLP: broad-coverage, corpus-based, emphasizes training and evaluation
Speech is now merging with NLP
Many tools originated in speech, then got copied to NLP
New tasks keep emerging: the web as an (unstructured) data source

Page 15: LING 439/539: Statistical Methods in Speech and Language Processing

Basic architecture of today's ASR system

[Diagram: audio speech -> feature extraction -> acoustic features X;
acoustic modeling produces likelihoods p(X|M1), p(X|M2), ...;
the language model supplies priors p(M1), p(M2), ...;
scoring ranks the candidates and outputs the ANSWER.
Model parameters are trained offline, e.g.
M1 = "I recognize speech", M2 = "I wreck a nice beach", ...]
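A toy sketch of the final scoring step in this diagram (my addition; the numbers are made up, whereas in a real system they come from the acoustic and language models):

# Hypothetical log scores for the two candidate sentences on the slide.
log_likelihood = {                      # log p(X | M), from acoustic modeling
    "I recognize speech": -210.0,
    "I wreck a nice beach": -205.0,
}
log_prior = {                           # log p(M), from the language model
    "I recognize speech": -8.0,
    "I wreck a nice beach": -17.0,
}

# Bayes decision rule, in log space: M* = argmax_M [log p(X|M) + log p(M)]
best = max(log_likelihood, key=lambda m: log_likelihood[m] + log_prior[m])
print(best)   # "I recognize speech": the language model outranks the
              # slightly better acoustic score of the near-homophone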

Page 16: LING 439/539: Statistical Methods in Speech and Language Processing

Component 1: Signal processing / feature extraction

First 1/3 of the course (also useful for understanding synthesis)
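As a concrete (and heavily simplified) illustration of what feature extraction does, here is a numpy sketch of pre-emphasis, framing, and per-frame log energy; real front ends go on to compute spectral or cepstral features per frame. This is my own sketch, not material from the slides:

import numpy as np

def log_energy_features(waveform, sample_rate=16000,
                        frame_ms=25, hop_ms=10, preemph=0.97):
    # Toy front end: pre-emphasis + framing + per-frame log energy.
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - 0.97 x[n-1]
    x = np.append(waveform[0], waveform[1:] - preemph * waveform[:-1])
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    feats = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * np.hamming(frame_len)
        feats[i] = np.log(np.sum(frame ** 2) + 1e-10)   # log energy
    return feats

# Example: 1 second of a synthetic 200 Hz tone
t = np.arange(16000) / 16000
print(log_energy_features(np.sin(2 * np.pi * 200 * t)).shape)  # (98,)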

Page 17: LING 439/539: Statistical Methods in Speech and Language Processing


Examples of some common features

Page 18: LING 439/539: Statistical Methods in Speech and Language Processing

Component 2: Acoustic models

Mixture of Gaussians: p(o_t | q_i) = Σ_k c_ik N(o_t; μ_ik, Σ_ik)
Dimension reduction: principal component analysis, linear discriminant analysis, parameter tying
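A minimal numpy sketch (my own, not from the slides) of evaluating that mixture-of-Gaussians density for a single HMM state, assuming diagonal covariances:

import numpy as np

def gmm_log_density(o, weights, means, variances):
    # log p(o | q) for one HMM state with a diagonal-covariance mixture:
    # p(o | q) = sum_k c_k N(o; mu_k, Sigma_k)
    o = np.asarray(o, dtype=float)
    # log density of each Gaussian component (diagonal covariance)
    comp = (-0.5 * np.sum((o - means) ** 2 / variances, axis=1)
            - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1))
    a = np.log(weights) + comp
    return float(a.max() + np.log(np.sum(np.exp(a - a.max()))))  # log-sum-exp

# Toy 2-dimensional feature vector and 2 mixture components (made-up numbers)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
print(gmm_log_density([0.5, -0.2], weights, means, variances))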

Page 19: LING 439/539: Statistical Methods in Speech and Language Processing

Component 3: Pronunciation modeling

Model for the different pronunciations of "you" in continuous speech
Other types of units: triphones, syllables

[Diagram: pronunciation network from start to end, with arcs labeled j, ou, a]

Each unit is an HMM
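To make "each unit is an HMM" concrete, here is a small sketch (my addition, with made-up probabilities) of the forward algorithm, which is how an HMM unit assigns a likelihood to a sequence of acoustic feature frames:

import numpy as np

def logsumexp0(a):
    # numerically stable log(sum(exp(a))) over the first axis
    m = np.max(a, axis=0)
    return m + np.log(np.sum(np.exp(a - m), axis=0))

def forward_log_likelihood(log_init, log_trans, log_emit):
    # Forward algorithm: log p(o_1..o_T | unit), summing over all state paths.
    #   log_init[j]     = log p(q_1 = j)
    #   log_trans[i, j] = log p(q_{t+1} = j | q_t = i)
    #   log_emit[t, j]  = log p(o_t | q_t = j)  (e.g. from Gaussian mixtures)
    T = log_emit.shape[0]
    alpha = log_init + log_emit[0]
    for t in range(1, T):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t), in log space
        alpha = logsumexp0(alpha[:, None] + log_trans) + log_emit[t]
    return logsumexp0(alpha)

# Toy 2-state left-to-right unit, 3 observation frames (made-up probabilities)
log_init = np.log([0.9, 0.1])
log_trans = np.log([[0.7, 0.3],
                    [1e-12, 1.0]])
log_emit = np.log([[0.8, 0.1],
                   [0.5, 0.4],
                   [0.2, 0.9]])
print(forward_log_likelihood(log_init, log_trans, log_emit))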

Page 20: LING 439/539: Statistical Methods in Speech and Language Processing

Component 4: Language model

Provides the probability p(M) of a word-sequence model, to combine with the acoustic model p(X|M)
Common: N-grams with smoothing and backoff -- a very hard and specialized business
Just starting to integrate parsing
Fundamental equation: M* = argmax_M p(M|X) = argmax_M p(X|M) p(M)
Search: Viterbi, beam, A*, N-best
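A toy illustration of the p(M) side (my own sketch, not from the slides): a bigram model with add-alpha smoothing, far simpler than the smoothing and backoff schemes used in practice:

from collections import Counter

def bigram_model(corpus, alpha=1.0):
    # Bigram language model with add-alpha smoothing.
    tokens = corpus.split()
    vocab = set(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    def prob(w_prev, w):
        # p(w | w_prev) = (count(w_prev, w) + alpha) / (count(w_prev) + alpha * |V|)
        return (bigrams[(w_prev, w)] + alpha) / (unigrams[w_prev] + alpha * len(vocab))
    return prob

p = bigram_model("i recognize speech i recognize words i wreck a nice beach")
print(p("i", "recognize"))   # higher: "recognize" seen twice after "i"
print(p("i", "beach"))       # lower: unseen bigram, but smoothed above zero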

Page 21: LING 439/539: Statistical Methods in Speech and Language Processing

ASR: example of a generative model

Components 2+3+4 provide an instance of a generative model:
The language M generates word sequences
The word sequence generates pronunciations
The pronunciation generates acoustic features

Unsupervised learning/training
Maximum likelihood estimation
Expectation-Maximization algorithm (in different incarnations)
Main focus of this class
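A minimal sketch of the EM idea on the simplest case, a 1-D mixture of two Gaussians (my addition, not from the slides); Baum-Welch training of HMMs follows the same alternate-and-re-estimate pattern:

import numpy as np

def em_two_gaussians(x, n_iter=50):
    # EM for a 1-D mixture of two Gaussians: alternate the E-step
    # (posterior responsibilities) and the M-step (ML parameter updates).
    w = np.array([0.5, 0.5])              # crude initialization
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the weighted data
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy data: two made-up clusters of points, recovered by EM
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 700)])
print(em_two_gaussians(x))   # weights roughly [0.3, 0.7], means roughly [0, 5]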

Page 22: LING 439/539: Statistical Methods in Speech and Language Processing

Other models to look at

Descriptive/maximum entropy models: started in vision, then copied to speech, then NLP
Discriminative models: directly use data to construct classifiers, with weak assumptions about the probability distribution
Supervised learning, with a focus on classification

[Diagram: input string -> count -> feature vector -> classifier -> output label]

"Machine learning approach to NLP"
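A toy version of that pipeline (my own sketch, with a made-up task and data): bag-of-words count features plus a plain perceptron classifier:

from collections import Counter

def features(text):
    # map an input string to a count feature vector (bag of words)
    return Counter(text.lower().split())

def predict(weights, feats):
    score = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if score > 0 else -1

def train_perceptron(examples, epochs=10):
    # plain perceptron: nudge the weights whenever an example is misclassified
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            if predict(weights, feats) != label:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

# Made-up toy task: is a sentence about speech (+1) or about syntax (-1)?
data = [("the acoustic signal was noisy", 1),
        ("formants and pitch in the recording", 1),
        ("the parse tree for this sentence", -1),
        ("context free grammar rules for syntax", -1)]
w = train_perceptron(data)
print(predict(w, features("a noisy recording of speech")))   # 1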

Page 23: LING 439/539: Statistical Methods in Speech and Language Processing

Problem solved?

No, improvements are mostly due to larger training sets and speed-ups
Driven by Moore's law?

Page 24: LING 439/539: Statistical Methods in Speech and Language Processing

Challenges

Environment distortion (microphone, noise, cocktail party) breaks feature extraction
Acoustic condition mismatch
Between- and within-speaker variability breaks the pronunciation modeling and acoustic modeling
Conversational speech breaks the language model
Understanding these problems is crucial for improving the performance of ASR

Page 25: LING 439/539: Statistical Methods in Speech and Language Processing

Dreaming

"2001: A Space Odyssey" (1968)
Dave: "Open the pod bay doors, HAL."
HAL 9000: "I'm sorry, Dave. I'm afraid I can't do that."

Page 26: LING 439/539: Statistical Methods in Speech and Language Processing

The reality, before the problem is solved

Speech is used as a user interface only when people can't use their hands:
Driving a car (use speech to drive?)
Device too small (cellphone)
Customer service (who will tolerate touch tone?)
Dictation (how many people actually use it?)

Page 27: LING 439/539: Statistical Methods in Speech and Language Processing

For next time

We will start with signal processing
It uses engineering math: power series (including convergence), trigonometric functions, integration, and the representation of complex numbers
If you have forgotten or do not know this material, please look for references and study it before class
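One identity worth having fresh, since it ties complex numbers and trigonometric functions together throughout signal processing (my addition, not on the slide): Euler's formula, exp(jθ) = cos θ + j sin θ. A quick numerical check:

import numpy as np

# Euler's formula: exp(j*theta) = cos(theta) + j*sin(theta)
theta = 0.7
lhs = np.exp(1j * theta)
rhs = np.cos(theta) + 1j * np.sin(theta)
print(np.allclose(lhs, rhs))   # True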