Page 1: 26. Applied IR and Natural Language Processing [1] (1)courses.ischool.berkeley.edu/i202/f06/LectureNotes/202-20061128.pdf · 26. Applied IR and Natural Language Processing [1] (1)

26. Applied IR and Natural Language Processing [1] (1) file:///C:/Documents%20and%20Settings/glushko/My%20Documents/L...

1 of 37 11/25/2006 3:30 PM

26. Applied IR and Natural Language Processing [1]

IS 202 - 28 November 2006

Bob Glushko

Plan for Today's Class

Plan for the "Home Stretch"

Overview and Examples of Applied IR and Natural Language Processing

Machine Translation

Speech Processing

Plan for the Home Stretch

We are grading your last two assignments

This Week: 2 Lectures on applied information retrieval and natural language processing

Next Tuesday: 202 Alumni Day -- IO & IR from the perspective of recent I-School grads

Thursday 12/7 - last day of class -- course review

Tuesday 12/12 -- final exam (9-12)

Natural Language Processing

What Dave says

Hal's reply

Natural Language Processing

NLP has the goal of creating computers and machines that can use natural language -- i.e., the language used by people -- as their inputs and outputs

The field is broad, and involves computer science, linguistics, cognitive psychology, and statistics

We're including this "taste" of NLP in this course because it illustrates many IR techniques and in some cases illustrates the tradeoffs between IO and IR

Real World Applications [1]

Machine Translation

Spelling Suggestions/Corrections

Grammar Checking

Speech Processing

Text Categorization and Clustering

Real World Applications [2]

Summarization

Synonym Generation

Text Data Mining

Information Extraction

Automated Customer Service

Question Answering

Automated Metadata Assignment

Two Radically Different Approaches to NLP

Linguistic Approach:

Linguistic models of grammar, morphology, and phonology are essential prerequisites for NLP

We also need to develop models of the "human language processor" and combine them with the linguistic models

Statistical Approach:

Statistical analysis of language samples reveals its structure and patterns

This extracted knowledge -- represented as the occurrence or co-occurrence probabilities of specific things -- can answer many of the questions that supposedly require "understanding" or more abstract "rules"

An Important Motivation for Data-Driven Language Study

Early Data-driven Perspectives on Language [1]

Just as cognitive science has emerged over the last 10-15 years at the intersection of cognitive psychology, linguistics, and computer science, the study of language around 1940, when computers were first being invented, reflected the prevailing psychological theories of behaviorism and structuralism

These theories held that language was learned through empirical learning mechanisms of conditioning, association, and practice in exercising skills

These stimulus -> response notions do not postulate any internal knowledge representation. This perspective on language suggests a statistical approach to NLP

Early Data-driven Perspectives on Language [2]

GK Zipf first expressed "Zipf's Law" in a 1935 book titled Psycho-Biology of Language

All of the work in WWII on codebreaking and cryptography emphasizes the empirical study of word and language patterns (Turing)

Claude Shannon on information theory (1948) says that the information in a message is not defined by its content but by its probability of being chosen among several alternatives
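
Shannon's point can be sketched numerically: the information carried by a choice is its "surprisal," the negative base-2 logarithm of its probability. This is a minimal illustration; the function name and the example probabilities are assumptions, not part of the lecture.

```python
import math

def surprisal_bits(probability: float) -> float:
    """Self-information, in bits, of an outcome with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)

# A message chosen from 2 equally likely alternatives carries 1 bit;
# one chosen from 8 equally likely alternatives carries 3 bits.
print(surprisal_bits(1 / 2))
print(surprisal_bits(1 / 8))
```

The less probable the chosen message, the more information it carries, regardless of what the message "means."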

Chomsky's Argument for "Deep" Language Analysis [1]

In the late 1950s the statistical approaches fell out of favor, mostly because of the work of Noam Chomsky (Syntactic Structures, 1957)

Colorless green ideas sleep furiously

Furiously sleep ideas green colorless

Neither of these sentences is natural language, and neither will occur in language samples, but the first is grammatical and the second isn't

Chomsky's Argument for "Deep" Language Analysis [2]

So language knowledge can't be based on learned behavior -- because the relevant data is sparse (many reasonable sentences never appear); instead, it is generative and based on rules and representations

This view of language as a formal, mathematically-describable system became the dominant view in linguistics and computer science

Probabilistic Models: The Data Strikes Back

These spoken phrases can be acoustically identical:

Your lie cured mother

You like your mother

Starting in the late 1980s, speech recognition systems used huge sets of speech data to build probabilistic models
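
One way such a probabilistic model can choose between acoustically identical phrases is to score each candidate word sequence with a language model. The sketch below uses an add-one smoothed bigram model; the four-sentence corpus and the function names are hypothetical stand-ins for the huge speech data sets mentioned above, not a description of any particular recognizer.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a large body of transcribed speech.
CORPUS = [
    "you like your mother",
    "you like your father",
    "i like your dog",
    "your mother called",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigrams(CORPUS)
candidates = ["your lie cured mother", "you like your mother"]
best = max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))
print(best)
```

With this toy corpus the model prefers "you like your mother" because its word-to-word transitions have been seen before, while "lie cured" never has.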

NLP by Computers != NLP by People

The "rules of language" are a theory of the knowledge that fluent speakers possess (competence), not a theory of how they generate and understand language (performance)

We have no conscious awareness of how we process language, and while we can sometimes explicitly apply the "rules" it certainly doesn't seem that we use that kind of abstract information in "normal" NLP

But likewise, it doesn't seem that we explicitly use information about the likelihood of various language structures occurring or co-occurring, which is what statistical NLP does

"Learning the statistics" != "Statistically-driven learning"

What people seem to do as they learn language is "statistically-driven learning," not "learning the statistics"

That is, they use statistical evidence to build knowledge about the language into internal representations and language processing mechanisms that are more general than the specific data they were built with

Human language processing has been successfully modeled using neural nets and other representations in which the "statistics" are encoded in the pattern of activations in a distributed way

Combined Approaches

Today both the linguistic and data-driven approaches are seen as integral and complementary parts of an NLP application

Systems employ sophisticated techniques for dictionaries and grammars to identify parts of speech and do morphological analysis

The web is such a huge corpus that empirical approaches can be surprisingly informative

Text Corpora

Computational linguists, computer scientists, experimental psychologists and others rely on text corpora for their research

Prominent pre-web examples include the Brown corpus (Kucera and Francis, 1967) that includes a million words of contemporary American English...

... and the British National Corpus (http://www.natcorp.ox.ac.uk/) that contains 100 million words of contemporary British English

But as large as the BNC is, because of Zipf's Law most words occur fewer than 50 times in 100,000,000 words -- not frequent enough to draw statistical conclusions
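
The long tail behind this sparseness can be seen by ranking words by frequency. The sketch below is illustrative only: the function names and the tiny sample text are assumptions, and the 1/rank formula is the idealized form of Zipf's Law rather than a fit to the BNC.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

def zipf_expected(top_count, rank):
    """Idealized Zipf prediction: frequency is proportional to 1 / rank."""
    return top_count / rank

def rare_type_fraction(text, threshold):
    """Fraction of distinct words that occur fewer than `threshold` times."""
    counts = Counter(text.lower().split())
    rare = sum(1 for count in counts.values() if count < threshold)
    return rare / len(counts)

sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(sample):
    print(rank, word, count, zipf_expected(3, rank))
print(rare_type_fraction(sample, 2))
```

Even in this tiny sample most distinct words are rare; in a real corpus the same pattern means that most word types have too few occurrences to support statistical conclusions.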

The Web as Corpus

The web is not representative of anything other than itself, but then neither are other text corpora

But the web dwarfs any other (possible?) corpus -- Google probably indexes a few trillion words, making it orders of magnitude larger than any other text collection

And most of it is freely available

Phrases in BNC and Google

Simplistic Computational Linguistics Using Web Corpus

Spelling correction: weird or wierd? (compare the body and head titles here)

Word selection in translation: French phrase groupe de travail

groupe translates to cluster, group, grouping, concern, collective

travail translates to work, labor, labour
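
The word-selection idea can be sketched as: generate every combination of candidate translations and keep the one that is most frequent in the corpus. The candidate lists come from the slide; the hit counts below are invented placeholders for real search-engine counts, and the function name is hypothetical.

```python
from itertools import product

# Candidate translations listed on the slide.
GROUPE = ["cluster", "group", "grouping", "concern", "collective"]
TRAVAIL = ["work", "labor", "labour"]

# Hypothetical phrase counts standing in for search-engine hit counts;
# the numbers are invented for illustration only.
HYPOTHETICAL_HITS = {
    "work group": 1_000_000,
    "labor group": 50_000,
    "work cluster": 2_000,
    "labour collective": 8_000,
}

def best_phrase(modifiers, heads, hits):
    """Pick the 'modifier head' phrase with the highest corpus count."""
    candidates = [f"{m} {h}" for m, h in product(modifiers, heads)]
    return max(candidates, key=lambda phrase: hits.get(phrase, 0))

print(best_phrase(TRAVAIL, GROUPE, HYPOTHETICAL_HITS))
```

The same comparison of counts applies to the spelling case: whichever of "weird" and "wierd" is far more frequent is probably the correct form.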

Machine Translation: An Apocryphal but Important Example

A story often told about the early days of machine translation research (1950s) is that the English sentence:

The spirit is willing, but the flesh is weak

when translated into Russian, and then back to English became:

The vodka is strong but the meat is rotten

Machine Translation: A Brief History [1]

Great optimism in the 1950s was followed by extreme pessimism

In 1966 a report by the Automatic Language Processing Advisory Committee (ALPAC) concluded that "there is no immediate or predictable prospect of useful machine translation" and instead recommended the development of computer aids for human translators

Fortunately, ALPAC also recommended the continued support of basic research in computational linguistics

In the 1970s and 1980s MT systems continued to develop, strongly driven by the emergence of microcomputers and text processing. The dominant technical strategy relied on hand-crafted syntactic parsers, morphological analyzers, and dictionaries - intensely semantic and rule-based approaches.

Machine Translation: A Brief History [2]

The beginning of the 1990s was a major turning point. An IBM research group developed a system called Candide that relied purely on statistical analysis and "example-based" methods for phrase matching and translation

Candide used a very large corpus of English and French documents that had extremely reliable translations (in both directions)

This approach has really taken off with the emergence of the Web for obvious reasons...

How Good is Machine Translation? [1]

Google Translate

Original Text: Microsoft's release of its Xbox 360 video-games console begins a new phase in the battle to remove Sony's PlayStation from the top spot. If it succeeds, the software giant may be tempted to make more incursions into the competitive market for home-entertainment hardware.

Roundtrip through French (Nov 2005): The release of Microsoft of its console the videoone of Xbox 360 begins a new phase in the battle to remove PlayStation of Sony of the higher spot. If it succeeds, the giant of software can be tried to transform more incursions into the competitive market for the material of house-entertainment.

Roundtrip through French (Nov 2006): The release of Microsoft of its console the videoone of Xbox 360 begins a new phase in the battle to remove PlayStation of Sony of the higher spot. If it succeeds, the giant of software can be tried to transform more incursions into the competitive market for the material of house-entertainment

How Good is Machine Translation? [2]

Original Text: Microsoft's release of its Xbox 360 video-games console begins a new phase in the battle to remove Sony's PlayStation from the top spot. If it succeeds, the software giant may be tempted to make more incursions into the competitive market for home-entertainment hardware.

Roundtrip through German (Nov 2005): Release Microsofts of its video game console Xbox 360 begins a new phase in the battle for removing from PlayStation Sonys from the upper point. If it follows, the software giant can be provoked, in order to form more ideas into the free market for house maintenance small articles.

Roundtrip through German (Nov 2006): Release of Microsoft of its Xbox 360 video game console begins a new phase in the battle to remove to of PlayStation Sony from the upper point to. If it follows, the software giant can be provoked, in order to form more ideas into the free market for house maintenance of small articles

How Good is Machine Translation? [3]

Original Text: Microsoft's release of its Xbox 360 video-games console begins a new phase in the battle to remove Sony's PlayStation from the top spot. If it succeeds, the software giant may be tempted to make more incursions into the competitive market for home-entertainment hardware.

Roundtrip through Chinese (Nov 2005): Its Xbox 360 video-games control bench Microsoft. The srelease starts one new stage removes Sony in this battle; s PlayStation from this top spot. If it succeeds, perhaps the software giant does invades into the competitive market for the family entertainment hardware

Roundtrip through Chinese (Nov 2006): Microsoft releases xbox360 video game consoles beginning of a new stage in the battle to remove Sony's games from the top. If successful, The software giant may be more competitive intrusion into family entertainment hardware market

Word Sense Disambiguation using Lexical Co-Occurrences

The co-occurrences of words in a text collection can tell us what the documents are about and distinguish different senses of polysemous words

Co-occurrences of frequent words are uninteresting:

"doctor" co-occurs with "with," "a," and "is"

Co-occurrences between two less frequent words help us understand what a text is about:

"doctor" co-occurs with "honorary," "dentist," "nurse," "examine," "treat," etc.

"drug" co-occurs with "price," "prescription" and "patient" in medical contexts and with "abuse," "paraphernalia," and "illicit" in non-medical contexts

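The co-occurrence idea above can be sketched in two steps: count the content words that appear alongside a target word, then use known cue words to guess which sense is in play. The stopword list, function names, and example sentences are assumptions for illustration; the cue words for "drug" are the ones listed on the slide.

```python
from collections import Counter

# Small assumed stopword list; real systems use longer lists or frequency cutoffs.
STOPWORDS = {"a", "an", "the", "is", "with", "and", "of", "in", "for", "was"}

def tokenize(sentence):
    """Lowercase the words and strip surrounding punctuation."""
    return [w.strip(".,;:!?\"'").lower() for w in sentence.split()]

def cooccurrences(sentences, target):
    """Count content words that share a sentence with `target`."""
    counts = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        if target in words:
            counts.update(w for w in words
                          if w and w != target and w not in STOPWORDS)
    return counts

def guess_sense(sentence, sense_cues):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(tokenize(sentence))
    return max(sense_cues, key=lambda sense: len(words & sense_cues[sense]))

# Cue words taken from the slide's "drug" example.
DRUG_SENSES = {
    "medical": {"price", "prescription", "patient"},
    "non-medical": {"abuse", "paraphernalia", "illicit"},
}

print(cooccurrences(["The doctor is with a nurse."], "doctor"))
print(guess_sense("The patient could not afford the drug price.", DRUG_SENSES))
```

Filtering out the frequent function words is what leaves the informative neighbors such as "nurse" behind.
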
Pre-Nominal Adjective Ordering

In translation and text generation we need to arrange nouns and adjectives:

My big fat Greek wedding is grammatical

My fat Greek big wedding is not

Some NLP approaches attempt this using rules

But it can also be done using data-intensive approaches; compare frequency of {a, b} with {b, a}
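
A minimal sketch of the data-intensive approach: score each possible ordering by how often its adjective pairs occur in that order, and keep the best one. The pair counts are invented for illustration (a real system would gather them from a corpus), and the function names are hypothetical.

```python
from itertools import permutations

# Hypothetical counts of ordered adjective pairs, as might be gathered from a
# large corpus; the numbers are invented for illustration.
PAIR_COUNTS = {
    ("big", "fat"): 900, ("fat", "big"): 40,
    ("big", "greek"): 700, ("greek", "big"): 10,
    ("fat", "greek"): 300, ("greek", "fat"): 5,
}

def order_score(sequence, pair_counts):
    """Sum the counts of every ordered pair (a before b) in the sequence."""
    return sum(pair_counts.get((a, b), 0)
               for i, a in enumerate(sequence)
               for b in sequence[i + 1:])

def best_order(adjectives, pair_counts):
    """Return the permutation whose pairwise orderings are most frequent."""
    return max(permutations(adjectives),
               key=lambda seq: order_score(seq, pair_counts))

print(best_order(["fat", "greek", "big"], PAIR_COUNTS))
```

No ordering rule is written down anywhere; the preferred sequence falls out of the comparison of {a, b} against {b, a} counts.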

Speech Recognition

As we saw in the 2001: A Space Odyssey example I started with, speech recognition has great potential to enhance how people interact with machines

Like machine translation, speech recognition has a long history in NLP, but it raises many additional technical challenges in acoustics and signal processing

And like machine translation, speech processing had a rule-based phase but it is now much more statistical in character

Speech Recognition - Simplified Process Depiction

Speech Recognition -- Uses Transition Probabilities

Speech Recognition - Hidden Markov Models

Speech Recognition for Consumers

Continuous speech recognition systems for dictation can be remarkably accurate if the system has been trained to recognize the speaker's voice

Nuance Dragon Naturally Speaking (www.nuance.com/naturallyspeaking)

IBM Via Voice (www-306.ibm.com/software/voice/viavoice)

Speech Recognition in IVR Applications

Interactive Voice Response (IVR) systems are commonly used in automated customer support and self-service applications

They need to be speaker-independent, and achieving this is easier to the extent that the domain and vocabulary are constrained

This technology is NOT for consumers; it is often sold by telecom OEMs and VARs (EXAMPLE)

Speech Synthesis

Text-to-speech synthesis (thanks to ATT labs demo)

Being able to synthesize speech from computer-readable text is useful in applications that require spontaneous interaction ...

... or where reading isn't practical, like when you're driving and need directions.

Speech synthesis enables accessibility for people who are sight- or voice-impaired

When TTS is aimed at the general public, "naturalness" is essential to acceptance

TTS seems like the reverse of speech recognition, but there is limited technology overlap and neither involves much language understanding

Readings for IO & IR Lecture #27

"A Plan for Spam" Paul Graham

"Web question answering: Is more always better? (SIGIR '02)" Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng

"Mining and Summarizing Customer Reviews" Minqing Hu and Bing Liu. Proceedings of the tenth ACM SIGKDD (2004)