INFORMATION EXTRACTION

CS8691-AI, Unit V: Information Extraction (63 slides)

Transcript of the INFORMATION EXTRACTION slide deck.

Page 1:

INFORMATION EXTRACTION

Page 2:

Text Classification by Example

Page 7:

How could you build a text classifier?

• Take some ideas from machine learning
  – Supervised learning setting
  – Examples of each class (a few or thousands)

• Take some ideas from machine translation
  – Generative models
  – Language models

• Simplify each and stir thoroughly

Page 8:

Basic Approach of Generative Modeling

1. Pick representation for data

2. Write down probabilistic generative model

3. Estimate model parameters with training data

4. Turn model around to calculate unknown values for new data

Page 9:

Naïve Bayes: Bag of Words Representation

Corn prices rose today while corn futures dropped in surprising trading activity. Corn ...

Occurrence counts over all words in the dictionary:

activity 1, cable 0, corn 3, damp 0, drawer 0, dropped 1, elbow 0, earning 0, ...
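The bag-of-words mapping above can be sketched in a few lines of Python; the short dictionary here is a stand-in for the full "all words in dictionary" list.

```python
import re
from collections import Counter

def bag_of_words(text, dictionary):
    """Map a document to occurrence counts over a fixed dictionary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {word: counts[word] for word in dictionary}

doc = ("Corn prices rose today while corn futures dropped "
       "in surprising trading activity. Corn ...")
# Tiny stand-in for the full dictionary
dictionary = ["activity", "cable", "corn", "damp",
              "drawer", "dropped", "elbow", "earning"]
vector = bag_of_words(doc, dictionary)
```

Words absent from the document simply get count 0, matching the slide's table.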

Page 10:

Naïve Bayes: Mixture of Multinomials Model

1. Pick the class: P(class)
2. For every word, pick from the class urn: P(word|class)

COMPUTERS urn (example draws): the, in, web, windows, java, again, modem
SPORTS urn (example draws): while, polo, soccer, activity, dropped, the, ball, in

Word independence assumption!
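The two-step generative story can be sketched as follows; the class priors and urn probabilities below are invented purely for illustration.

```python
import random

random.seed(0)

class_priors = {"COMPUTERS": 0.5, "SPORTS": 0.5}   # P(class), assumed values
word_urns = {                                      # P(word|class), assumed values
    "COMPUTERS": {"the": 0.3, "web": 0.2, "windows": 0.2, "java": 0.2, "modem": 0.1},
    "SPORTS":    {"the": 0.3, "soccer": 0.25, "polo": 0.15, "ball": 0.15, "activity": 0.15},
}

def generate_document(length):
    # 1. Pick the class: P(class)
    cls = random.choices(list(class_priors), weights=class_priors.values())[0]
    urn = word_urns[cls]
    # 2. For every word, pick independently from the class urn: P(word|class)
    words = random.choices(list(urn), weights=urn.values(), k=length)
    return cls, " ".join(words)

cls, doc = generate_document(6)
```

Each word is drawn independently, which is exactly the word independence assumption the slide flags.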

Page 11:

Naïve Bayes: Estimating Parameters

• Just like estimating biased coin flip probabilities

• Estimate MAP word probabilities (add-one smoothing):

  P(word|class) = (1 + Σ_{doc ∈ class} N(word, doc)) / (|Vocab| + Σ_{doc ∈ class} N(doc))

• Estimate MAP class priors:

  P(class) = (1 + N(class)) / (|classes| + N(docs))

where N(word, doc) is the count of word in doc, N(doc) is the length of doc, N(class) is the number of training documents in class, and N(docs) is the total number of training documents.
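A sketch of these add-one (Laplace) estimates, assuming training documents arrive as (class, word-list) pairs; the tiny corpus is made up for the example.

```python
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (class_label, list_of_words). Returns smoothed MAP estimates."""
    vocab = {w for _, words in docs for w in words}
    class_docs = Counter(label for label, _ in docs)       # N(class)
    word_counts = defaultdict(Counter)                     # N(word, class)
    for label, words in docs:
        word_counts[label].update(words)

    # P(class) = (1 + N(class)) / (|classes| + N(docs))
    priors = {c: (1 + n) / (len(class_docs) + len(docs))
              for c, n in class_docs.items()}
    # P(word|class) = (1 + N(word, class)) / (|Vocab| + total words in class)
    cond = {c: {w: (1 + word_counts[c][w]) / (len(vocab) + sum(word_counts[c].values()))
                for w in vocab}
            for c in class_docs}
    return priors, cond, vocab

docs = [
    ("SPORTS", "the soccer ball dropped".split()),
    ("SPORTS", "polo activity while soccer".split()),
    ("COMPUTERS", "the windows java modem".split()),
]
priors, cond, vocab = train_nb(docs)
```

With the smoothing, every word in the vocabulary gets nonzero probability in every class, and each class's word distribution still sums to 1.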

Page 12:

Naïve Bayes: Performing Classification

• Word independence assumption:

  P(doc|class) = Π_{word ∈ doc} P(word|class)

• Bayes rule:

  P(class|doc) = P(doc|class) P(class) / P(doc)

• Take the class with the highest probability

Page 13:

Classification Tricks of the Trade

• Stemming
  – run, runs, running, ran → run
  – table, tables, tabled → table
  – computer, compute, computing → compute

• Stopwords
  – Very frequent function words, generally uninformative
  – if, in, the, like, …

• Information gain feature selection
  – Keep just the most indicative words in the vocabulary
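A toy illustration of stemming plus stopword removal; the suffix rules here are crude stand-ins for a real stemmer such as Porter's, so the stems they produce are only approximate.

```python
STOPWORDS = {"if", "in", "the", "like", "a", "of", "and"}

def crude_stem(word):
    """Strip a few common suffixes (toy stand-in for the Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tokens):
    """Drop stopwords, then stem what remains."""
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

tokens = "the tables in running order".split()
stems = preprocess(tokens)
```

Note how crude this is: "tables" becomes "tabl" rather than "table". The point of the trick is only that inflected variants collapse to a shared form.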

Page 14:

Naïve Bayes Rules of Thumb

• Need hundreds of labeled examples per class for good performance (~85% accuracy)

• Stemming and stopwords may or may not help

• Feature selection may or may not help

• Predicted probabilities will be very extreme

• Use sum of logs instead of multiplying probabilities for underflow prevention

• Coding this up is trivial, with or without MapReduce
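The sum-of-logs trick from the list above can be sketched like this; the parameter tables are illustrative values, not learned ones.

```python
import math

def classify(words, priors, cond):
    """argmax over classes of log P(class) + sum of log P(word|class)."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)          # sum of logs avoids underflow
        for w in words:
            if w in cond[c]:             # ignore out-of-vocabulary words
                score += math.log(cond[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

priors = {"SPORTS": 0.5, "COMPUTERS": 0.5}          # illustrative parameters
cond = {
    "SPORTS":    {"soccer": 0.2,  "the": 0.1, "windows": 0.01},
    "COMPUTERS": {"soccer": 0.01, "the": 0.1, "windows": 0.2},
}
label = classify("the soccer soccer".split(), priors, cond)
```

Multiplying hundreds of small probabilities directly would underflow to 0.0; adding their logarithms keeps the comparison numerically safe.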

Page 15:

Information Extraction with Generative Models

Page 16:

Example: A Problem

Genomics job

Mt. Baker, the school district

Baker Hostetler, the company

Baker, a job opening

Page 17:

Example: A Solution

Page 18:

Job Openings: Category = Food Services, Keyword = Baker, Location = Continental U.S.

Page 19:

Extracting Job Openings from the Web

Title: Ice Cream Guru

Description: If you dream of cold creamy…

Contact: [email protected]

Category: Travel/Hospitality

Function: Food Services

Page 20:

Potential Enabler of Faceted Search

Page 21:

Lots of Structured Information in Text

Page 22:

IE from Research Papers

Page 27:

What is Information Extraction?

• Recovering structured data from formatted text
  – Identifying fields (e.g. named entity recognition)
  – Understanding relations between fields (e.g. record association)
  – Normalization and deduplication

• Today, focus on field identification

Page 28:

IE History

Pre-Web
• Mostly news articles
  – De Jong's FRUMP [1982]: hand-built system to fill Schank-style "scripts" from news wire
  – Message Understanding Conference (MUC) DARPA ['87-'95], TIPSTER ['92-'96]
• Most early work dominated by hand-built models
  – E.g. SRI's FASTUS, hand-built FSMs
  – But by the 1990s, some machine learning: Lehnert, Cardie, Grishman, and then HMMs: Elkan [Leek '97], BBN [Bikel et al '98]

Web
• AAAI '94 Spring Symposium on "Software Agents"
  – Much discussion of ML applied to the Web: Maes, Mitchell, Etzioni
• Tom Mitchell's WebKB, '96
  – Build KBs from the Web
• Wrapper Induction
  – Initially hand-built, then ML: [Soderland '96], [Kushmerick '97], …

Page 29:

IE Posed as a Machine Learning Task

• Training data: documents marked up with ground truth

• In contrast to text classification, local features are crucial. Features of:
  – Contents
  – Text just before the item (prefix)
  – Text just after the item (suffix)
  – Begin/end boundaries

Example: "00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun", segmented into prefix, contents, and suffix.
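A minimal sketch of pulling out those three context regions around a candidate span; the token indices and the context width k are assumptions chosen for this example.

```python
def window_features(tokens, start, end, k=3):
    """Features of a candidate span tokens[start:end] plus k tokens of context."""
    return {
        "prefix":   tokens[max(0, start - k):start],   # text just before the item
        "contents": tokens[start:end],                 # the candidate itself
        "suffix":   tokens[end:end + k],               # text just after the item
        "first_token_capitalized": tokens[start][0].isupper(),
        "length":   end - start,
    }

tokens = "00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun".split()
feats = window_features(tokens, 5, 9)   # candidate: "Wean Hall Rm 5409"
```

A downstream classifier would turn these regions into indicator features (word identities, capitalization, and so on).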

Page 30:

Good Features for Information Extraction

Example word features:

– identity of word

– is in all caps

– ends in “-ski”

– is part of a noun phrase

– is in a list of city names

– is under node X in WordNet or Cyc

– is in bold font

– is in hyperlink anchor

– features of past & future

– last person name was female

– next two words are “and Associates”

begins-with-number

begins-with-ordinal

begins-with-punctuation

begins-with-question-word

begins-with-subject

blank

contains-alphanum

contains-bracketed-number

contains-http

contains-non-space

contains-number

contains-pipe

contains-question-mark

contains-question-word

ends-with-question-mark

first-alpha-is-capitalized

indented

indented-1-to-4

indented-5-to-10

more-than-one-third-space

only-punctuation

prev-is-blank

prev-begins-with-ordinal

shorter-than-30

Creativity and Domain Knowledge Required!

Page 31:

Is Capitalized

Is Mixed Caps

Is All Caps

Initial Cap

Contains Digit

All lowercase

Is Initial

Punctuation

Period

Comma

Apostrophe

Dash

Preceded by HTML tag

Character n-gram classifier says string is a person name (80% accurate)

In stopword list (the, of, their, etc.)

In honorific list (Mr, Mrs, Dr, Sen, etc.)

In person suffix list (Jr, Sr, PhD, etc.)

In name particle list (de, la, van, der, etc.)

In Census lastname list; segmented by P(name)

In Census firstname list; segmented by P(name)

In locations lists (states, cities, countries)

In company name list ("J. C. Penny")

In list of company suffixes (Inc, & Associates, Foundation)

Word Features
– lists of job titles
– lists of prefixes
– lists of suffixes
– 350 informative phrases

HTML/Formatting Features
– {begin, end, in} x {<b>, <i>, <a>, <hN>} x {lengths 1, 2, 3, 4, or longer}
– {begin, end} of line

Good Features for Information Extraction: Creativity and Domain Knowledge Required!

Page 32:

Landscape of ML Techniques for IE:

Any of these models can be used to capture words, formatting or both.

Classify Candidates: "Abraham Lincoln was born in Kentucky." A classifier asks: which class is the candidate phrase?

Sliding Window: a classifier asks "which class?" for each window, trying alternate window sizes.

Boundary Models: classifiers place BEGIN and END markers in "Abraham Lincoln was born in Kentucky."

Finite State Machines: find the most likely state sequence for "Abraham Lincoln was born in Kentucky."

Wrapper Induction: "<b><i>Abraham Lincoln</i></b> was born in Kentucky." Learn and apply a pattern for a website, e.g. <b><i>PersonName.

Page 33:

Sliding Windows & Boundary Detection

Page 34:

Information Extraction by Sliding Windows

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell, School of Computer Science, Carnegie Mellon University

3:30 pm, 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

CMU UseNet Seminar Announcement

E.g. looking for the seminar location

Page 39:

Information Extraction with Sliding Windows [Freitag 97, 98; Soderland 97; Califf 98]

00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun

... w_{t-m} ... w_{t-1} [ w_t ... w_{t+n} ] w_{t+n+1} ... w_{t+n+m} ...
      (prefix)              (contents)            (suffix)

• Standard supervised learning setting
  – Positive instances: windows with a real label
  – Negative instances: all other windows
  – Features based on candidate, prefix and suffix

• Special-purpose rule learning systems work well, e.g.:

  courseNumber(X) :-
      tokenLength(X, =, 2),
      every(X, inTitle, false),
      some(X, A, <previousToken>, inTitle, true),
      some(X, B, <>, tripleton, true)
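The windowing setup above (positive instances are the windows matching a labeled span, negatives are all the others) can be sketched as:

```python
def sliding_windows(tokens, max_len=4):
    """Enumerate all candidate spans of up to max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end

def make_instances(tokens, labeled_span, max_len=4):
    """Label each window: positive iff it equals the ground-truth span."""
    return [((s, e), (s, e) == labeled_span)
            for s, e in sliding_windows(tokens, max_len)]

tokens = "Speaker : Sebastian Thrun".split()
instances = make_instances(tokens, labeled_span=(2, 4))   # "Sebastian Thrun"
positives = [span for span, positive in instances if positive]
```

Note the heavy class imbalance this produces: one positive window against every other span, which is typical for sliding-window extractors.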

Page 40:

IE by Boundary Detection

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell, School of Computer Science, Carnegie Mellon University

3:30 pm, 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

CMU UseNet Seminar Announcement

E.g. looking for the seminar location

Page 45:

BWI: Learning to detect boundaries

• Another formulation: learn three probabilistic classifiers:
  – START(i) = Prob(position i starts a field)
  – END(j) = Prob(position j ends a field)
  – LEN(k) = Prob(an extracted field has length k)

• Then score a possible extraction (i, j) by START(i) * END(j) * LEN(j - i)

• LEN(k) is estimated from a histogram

• START(i) and END(j) learned by boosting over simple boundary patterns and features

[Freitag & Kushmerick, AAAI 2000]
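A sketch of the START(i) * END(j) * LEN(j - i) scoring rule; the probability tables below are toy values for illustration, not the boosted boundary classifiers or length histogram of the BWI paper.

```python
def best_extraction(start_p, end_p, len_p, n):
    """Score every candidate span (i, j) as START(i) * END(j) * LEN(j - i)."""
    best, best_score = None, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            score = (start_p.get(i, 0.0)      # Prob position i starts a field
                     * end_p.get(j, 0.0)      # Prob position j ends a field
                     * len_p.get(j - i, 0.0)) # Prob field has this length
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Toy probability tables (assumed values)
start_p = {2: 0.8, 5: 0.3}
end_p = {4: 0.7, 6: 0.4}
len_p = {1: 0.2, 2: 0.6, 3: 0.2}
span, score = best_extraction(start_p, end_p, len_p, n=8)
```

Because the length model vetoes implausible lengths, a strong start paired with a far-away end still scores zero.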

Page 46:

Problems with Sliding Windows and Boundary Finders

• Decisions in neighboring parts of the input are made independently from each other.

– Sliding Window may predict a “seminar end time” before the “seminar start time”.

– It is possible for two overlapping windows to both be above threshold.

– In a Boundary-Finding system, left boundaries are laid down independently from right boundaries, and their pairing happens as a separate step.

Page 47:

Hidden Markov Models

Page 48:

Citation Parsing

• Fahlman, Scott & Lebiere, Christian (1989). The cascade-correlation learning architecture. Advances in Neural Information Processing Systems, pp. 524-532.

• Fahlman, S.E. and Lebiere, C., "The Cascade Correlation Learning Architecture," Neural Information Processing Systems, pp. 524-532, 1990.

• Fahlman, S. E. (1991) The recurrent cascade-correlation learning architecture. NIPS 3, 190-205.

Page 49:

Can we do this with probabilistic generative models?

• Could have classes for {author, title, journal, year, pages}

• Could classify every word, or every sequence?
  – Which sequences?

• Something interesting in the sequence of fields that we'd like to capture
  – Authors come first
  – Title comes before journal
  – Page numbers come near the end

Page 50:

Hidden Markov Models: The Representation

• A document is a sequence of words

• Each word is tagged by its class

• fahlman s e and lebiere c the cascade correlation learning architecture neural information processing systems pp 524 532 1990

Page 51:

HMM: Generative Model (1)

Author Title Journal

Year Pages

Page 52:

HMM: Generative Model (2)

Author Title

Year Pages

Page 53:

HMM: Generative Model (3)

• States: x_i

• State transitions: P(x_i|x_j) = a[x_i|x_j]

• Output probabilities: P(o_i|x_j) = b[o_i|x_j]

• Markov independence assumption

Page 54:

HMMs: Estimating Parameters

• With fully-labeled data, just like naïve Bayes

• Estimate MAP output probabilities (add-one smoothing):

  b[o_i|x_j] = (1 + N(o_i, x_j)) / (|Vocab| + Σ_o N(o, x_j))

• Estimate MAP state transitions:

  a[x_i|x_j] = (1 + N(x_j → x_i)) / (|States| + Σ_x N(x_j → x))

where N(o_i, x_j) counts how often output o_i is emitted from state x_j in the training data, and N(x_j → x_i) counts transitions from state x_j to state x_i.
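With fully-labeled sequences, these counts-plus-smoothing estimates can be sketched as below; the tiny tagged citation is invented for the example.

```python
from collections import Counter, defaultdict

def train_hmm(tagged_seqs):
    """tagged_seqs: list of [(word, state), ...]. Smoothed MAP estimates of a, b."""
    states = {s for seq in tagged_seqs for _, s in seq}
    vocab = {w for seq in tagged_seqs for w, _ in seq}
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for seq in tagged_seqs:
        for (_, s), (_, s_next) in zip(seq, seq[1:]):   # N(x_j -> x_i)
            trans[s][s_next] += 1
        for w, s in seq:                                # N(o_i, x_j)
            emit[s][w] += 1
    a = {sj: {si: (1 + trans[sj][si]) / (len(states) + sum(trans[sj].values()))
              for si in states} for sj in states}
    b = {sj: {w: (1 + emit[sj][w]) / (len(vocab) + sum(emit[sj].values()))
              for w in vocab} for sj in states}
    return a, b

seq = [("fahlman", "author"), ("s", "author"), ("e", "author"),
       ("1991", "year"), ("nips", "journal")]
a, b = train_hmm([seq])
```

Each row of a and b is a proper distribution: the add-one terms contribute exactly |States| (or |Vocab|) to the denominator.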

Page 55:

HMMs: Performing Extraction

• Given output words:
  – fahlman s e 1991 the recurrent cascade correlation learning architecture nips 3 190 205

• Find the state sequence that maximizes:

  Π_i a[x_i|x_{i-1}] * b[o_i|x_i]

• Lots of possible state sequences to test (5^14)

Hmm…
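Rather than enumerating every possible state sequence, the standard Viterbi dynamic program walks the trellis one observation at a time; the toy HMM parameters here are assumptions chosen for the example.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence via the trellis; log-probs avoid underflow."""
    # V[t][s] = (best log-prob of any path ending in state s at time t, back-pointer)
    V = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
          for s in states}]
    for t in range(1, len(obs)):
        layer = {}
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r][0] + math.log(trans_p[r][s]))
            layer[s] = (V[t - 1][prev][0] + math.log(trans_p[prev][s])
                        + math.log(emit_p[s][obs[t]]), prev)
        V.append(layer)
    # Backtrack from the best final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ["author", "year"]
start_p = {"author": 0.9, "year": 0.1}
trans_p = {"author": {"author": 0.7, "year": 0.3},
           "year":   {"author": 0.1, "year": 0.9}}
emit_p = {"author": {"fahlman": 0.5,  "s": 0.4,  "1991": 0.1},
          "year":   {"fahlman": 0.05, "s": 0.05, "1991": 0.9}}
path = viterbi("fahlman s 1991".split(), states, start_p, trans_p, emit_p)
```

The trellis keeps only one best path per state per position, so the cost is O(T * |States|^2) instead of |States|^T.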

Page 56:

Representation for Paths: Trellis

Page 61:

HMM Example: Nymble [Bikel, et al 97]

Task: Named Entity Extraction. Train on 450k words of news wire text.

States: Person, Org, (five other name classes), Other; plus start-of-sentence and end-of-sentence.

• Bigram within classes
• Backoff to unigram
• Special capitalization and number features…

Results (F1):
  Mixed case, English: 93%
  Upper case, English: 91%
  Mixed case, Spanish: 90%

Page 62:

Nymble word features

Page 63:

HMMs: A Plethora of Applications

• Information extraction
• Part of speech tagging
• Word segmentation
• Gene finding
• Protein structure prediction
• Speech recognition
• Economics, Climatology, Robotics, …