Information Extraction - Indian Institute of Technology...

100
Information Extraction Mausam (Based on slides of Heng Ji, Dan Jurafsky, Chris Manning, Ray Mooney, Alan Ritter, Alex Yates)

Transcript of Information Extraction - Indian Institute of Technology...

Page 1: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Information Extraction

Mausam(Based on slides of Heng Ji, Dan Jurafsky, Chris Manning, Ray Mooney, Alan Ritter, Alex Yates)

Page 2: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Extracting information from text

• Company report: “International Business Machines Corporation (IBM or the company) was incorporated in the State of New York on June 16, 1911, as the Computing-Tabulating-Recording Co. (C-T-R)…”

• Extracted Complex Relation:Company-Founding

Company IBMLocation New YorkDate June 16, 1911Original-Name Computing-Tabulating-Recording Co.

• But we will focus on the simpler task of extracting relation triplesFounding-year(IBM,1911)

Founding-location(IBM,New York)

Page 3: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Extracting Relation Triples from TextThe Leland Stanford Junior University, commonly referred to as Stanford University or Stanford, is an American private research university located in Stanford, California … near Palo Alto, California… Leland Stanford…founded the university in 1891

Stanford EQ Leland Stanford Junior University

Stanford LOC-IN California

Stanford IS-A research university

Stanford LOC-NEAR Palo Alto

Stanford FOUNDED-IN 1891

Stanford FOUNDER Leland Stanford

Page 4: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Why Information Extraction?

• Create new structured knowledge bases, useful for any app

• Augment current knowledge bases• Adding words to WordNet thesaurus, facts to FreeBase or DBPedia

• Support question answering• The granddaughter of which actor starred in the movie “E.T.”?

(acted-in ?x “E.T.”)(is-a ?y actor)(granddaughter-of ?x ?y)

• But which relations should we extract?

4

Page 5: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

5

Early Information Extraction

• FRUMP (Dejong, 1979) was an early information extraction system that processed news stories and identified various types of events (e.g. earthquakes, terrorist attacks, floods).

• Used “sketchy scripts” of various events to identify specific pieces of information about such events.

• Able to summarize articles in multiple languages.

• Relied on “brittle” hand-built symbolic knowledge structures that were hard to build and not very robust.

Page 6: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

6

MUC• DARPA funded significant efforts in IE in the early to mid 1990’s.

• Message Understanding Conference (MUC) was an annual event/competition where results were presented.

• Focused on extracting information from news articles:• Terrorist events

• Industrial joint ventures

• Company management changes

• Information extraction of particular interest to the intelligence community (CIA, NSA).

• Established standard evaluation methodology using development (training) and test data and metrics: precision, recall, F-measure.

Page 7: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Automated Content Extraction (ACE)

ARTIFACT

GENERALAFFILIATION

ORGAFFILIATION

PART-WHOLE

PERSON-SOCIAL

PHYSICAL

Located

Near

Business

Family Lasting Personal

Citizen-Resident-Ethnicity-Religion

Org-Location-Origin

Founder

EmploymentMembership

OwnershipStudent-Alum

Investor

User-Owner-Inventor-Manufacturer

Geographical

Subsidiary

Sports-Affiliation

17 relations from 2008 “Relation Extraction Task”

Page 8: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Automated Content Extraction (ACE)

• Physical-Located PER-GPE

He was in Tennessee

• Part-Whole-Subsidiary ORG-ORGXYZ, the parent company of ABC

• Person-Social-Family PER-PER

John’s wife Yoko

• Org-AFF-Founder PER-ORG

Steve Jobs, co-founder of Apple…

•8

Page 9: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

UMLS: Unified Medical Language System

• 134 entity types, 54 relations

Injury disrupts Physiological FunctionBodily Location location-of Biologic FunctionAnatomical Structure part-of OrganismPharmacologic Substance causes Pathological FunctionPharmacologic Substance treats Pathologic Function

Page 10: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Extracting UMLS relations from a sentence

Doppler echocardiography can be used to

diagnose left anterior descending artery

stenosis in patients with type 2 diabetes

Echocardiography, Doppler DIAGNOSES Acquired stenosis

10

Page 11: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Databases of Wikipedia Relations

11

Relations extracted from InfoboxStanford state CaliforniaStanford motto “Die Luft der Freiheit weht”…

Wikipedia Infobox

Page 12: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Relation databases that draw from Wikipedia

• Resource Description Framework (RDF) triplessubject predicate object

Golden Gate Park location San Francisco

dbpedia:Golden_Gate_Park dbpedia-owl:location dbpedia:San_Francisco

• DBPedia: 1 billion RDF triples, 385 from English Wikipedia

• Frequent Freebase relations:people/person/nationality, location/location/containspeople/person/profession, people/person/place-of-birthbiology/organism_higher_classification film/film/genre

12

Page 13: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Ontological relations

• IS-A (hypernym): subsumption between classes

• Giraffe IS-A ruminant IS-A ungulate IS-Amammal IS-A vertebrate IS-A animal…

• Instance-of: relation between individual and class

• San Francisco instance-of city

Examples from the WordNet Thesaurus

Page 14: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

How to build relation extractors

1. Hand-written patterns

2. Supervised machine learning

3. Semi-supervised and unsupervised • Bootstrapping (using seeds)

• Distant supervision

• Unsupervised learning from the web

Page 15: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Hand Written Patterns

15

Page 16: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Rules for extracting IS-A relation

Early intuition from Hearst (1992)

• “Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use”

• What does Gelidium mean?

• How do you know?`

Page 17: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Rules for extracting IS-A relation

Early intuition from Hearst (1992)

• “Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use”

• What does Gelidium mean?

• How do you know?`

Page 18: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Hearst’s Patterns for extracting IS-A relations

(Hearst, 1992): Automatic Acquisition of Hyponyms

“Y such as X ((, X)* (, and|or) X)”

“such Y as X”

“X or other Y”

“X and other Y”

“Y including X”

“Y, especially X”

Page 19: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Hearst’s Patterns for extracting IS-A relations

Hearst pattern Example occurrences

X and other Y ...temples, treasuries, and other important civic buildings.

X or other Y Bruises, wounds, broken bones or other injuries...

Y such as X The bow lute, such as the Bambara ndang...

Such Y as X ...such authors as Herrick, Goldsmith, and Shakespeare.

Y including X ...common-law countries, including Canada and England...

Y , especially X European countries, especially France, England, and Spain...

Page 20: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Extracting Richer Relations Using Rules

• Intuition: relations often hold between specific entities

• located-in (ORGANIZATION, LOCATION)

• founded (PERSON, ORGANIZATION)

• cures (DRUG, DISEASE)

• Start with Named Entity tags to help extract relation!

Page 21: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Named Entities aren’t quite enough.Which relations hold between 2 entities?

Drug Disease

Cure?

Prevent?

Cause?

Page 22: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

What relations hold between 2 entities?

PERSON ORGANIZATION

Founder?

Investor?

Member?

Employee?

President?

Page 23: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Extracting Richer Relations Using Rules andNamed Entities

Who holds what office in what organization?

PERSON, POSITION of ORG

• George Marshall, Secretary of State of the United States

PERSON(named|appointed|chose|etc.) PERSON Prep? POSITION

• Truman appointed Marshall Secretary of State

PERSON [be]? (named|appointed|etc.) Prep? ORG POSITION

• George Marshall was named US Secretary of State

Page 24: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Hand-built patterns for relations

• Plus:

• Human patterns tend to be high-precision

• Can be tailored to specific domains

• Minus

• Human patterns are often low-recall

• A lot of work to think of all possible patterns!

• Don’t want to have to do this for every relation!

• We’d like better accuracy

Page 25: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Supervised Algorithms

25

Page 26: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Supervised machine learning for relations

• Choose a set of relations we’d like to extract

• Choose a set of relevant named entities

• Find and label data• Choose a representative corpus

• Label the named entities in the corpus

• Hand-label the relations between these entities

• Break into training, development, and test

• Train a classifier on the training set

26

Page 27: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

How to do classification in supervised relation extraction

1. Find all pairs of named entities (usually in same sentence)

2. Decide if 2 entities are related

3. If yes, classify the relation

• Why the extra step?

• Faster classification training by eliminating most pairs

• Can use distinct feature-sets appropriate for each task.

27

Page 28: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Automated Content Extraction (ACE)

ARTIFACT

GENERALAFFILIATION

ORGAFFILIATION

PART-WHOLE

PERSON-SOCIAL

PHYSICAL

Located

Near

Business

Family Lasting Personal

Citizen-Resident-Ethnicity-Religion

Org-Location-Origin

Founder

EmploymentMembership

OwnershipStudent-Alum

Investor

User-Owner-Inventor-Manufacturer

Geographical

Subsidiary

Sports-Affiliation

17 sub-relations of 6 relations from 2008 “Relation Extraction Task”

Page 29: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Relation Extraction

Classify the relation between two entities in a sentence

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.

SUBSIDIARY

FAMILY

EMPLOYMENT

NIL

FOUNDER

CITIZEN

INVENTOR…

Page 30: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Word Features for Relation Extraction

• Headwords of M1 and M2, and combinationAirlines Wagner Airlines-Wagner

• Bag of words and bigrams in M1 and M2{American, Airlines, Tim, Wagner, American Airlines, Tim Wagner}

• Words or bigrams in particular positions left and right of M1/M2M2: -1 spokesman

M2: +1 said

• Bag of words or bigrams between the two entities{a, AMR, of, immediately, matched, move, spokesman, the, unit}

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner saidMention 1 Mention 2

Page 31: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Named Entity Type and Mention LevelFeatures for Relation Extraction

• Named-entity types• M1: ORG

• M2: PERSON

• Concatenation of the two named-entity types• ORG-PERSON

• Entity Level of M1 and M2 (NAME, NOMINAL, PRONOUN)• M1: NAME [it or he would be PRONOUN]

• M2: NAME [the company would be NOMINAL]

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner saidMention 1 Mention 2

Page 32: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Parse Features for Relation Extraction

• Base syntactic chunk sequence from one to the otherNP NP PP VP NP NP

• Constituent path through the tree from one to the otherNP NP S S NP

• Dependency path

Airlines matched Wagner said

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner saidMention 1 Mention 2

Page 33: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Gazeteer and trigger word features for relation extraction

• Trigger list for family: kinship terms• parent, wife, husband, grandparent, etc. [from WordNet]

• Gazeteer:• Lists of useful geo or geopolitical words

• Country name list

• Other sub-entities

Page 34: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.

Page 35: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Classifiers for supervised methods

• Now you can use any classifier you like

• MaxEnt

• Naïve Bayes

• SVM

• ...

• Train it on the training set, tune on the dev set, test on the test set

• Can also cast as sequence labeling – use CRFs

Page 36: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

37

Using Background Knowledge (Chan and Roth, 2010)

• Features employed are usually restricted to being defined on the various representations of the target sentences

• Humans rely on background knowledge to recognize relations

• Overall aim of this work• Propose methods of using knowledge or resources that exists

beyond the sentence

• Wikipedia, word clusters, hierarchy of relations, entity type constraints, coreference

• As additional features, or under the Constraint Conditional Model (CCM) framework with Integer Linear Programming (ILP)37

Page 37: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

3838

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

Page 38: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

3939

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

Page 39: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4040

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

Page 40: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4141

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

Page 41: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4242

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background KnowledgeDavid Brian Cone (born January 2, 1963) is a

former Major League Baseball pitcher. He compiled

an 8–3 postseason record over 21 postseason

starts and was a part of five World Series

championship teams (1992 with the Toronto Blue

Jays and 1996, 1998, 1999 & 2000 with the New

York Yankees). He had a career postseason ERA of

3.80. He is the subject of the book A Pitcher's

Story: Innings With David Cone by Roger Angell.

Fans of David are known as "Cone-Heads."

Cone lives in Stamford, Connecticut, and is formerly

a color commentator for the Yankees on the YES

Network.[1]

Contents

[hide]

1 Early years

2 Kansas City Royals

3 New York Mets

Partly because of the resulting lack of leadership,

after the 1994 season the Royals decided to

reduce payroll by trading pitcher David Cone and

outfielder Brian McRae, then continued their

salary dump in the 1995 season. In fact, the

team payroll, which was always among the

league's highest, was sliced in half from $40.5

million in 1994 (fourth-highest in the major

leagues) to $18.5 million in 1996 (second-lowest

in the major leagues)

Page 42: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4343

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

fine-grained

Employment:Staff 0.20

Employment:Executive 0.15

Personal:Family 0.10

Personal:Business 0.10

Affiliation:Citizen 0.20

Affiliation:Based-in 0.25

Page 43: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4444

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

fine-grained coarse-grained

Employment:Staff 0.200.35 Employment

Employment:Executive 0.15

Personal:Family 0.100.40 Personal

Personal:Business 0.10

Affiliation:Citizen 0.200.25 Affiliation

Affiliation:Based-in 0.25

Page 44: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4545

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

fine-grained coarse-grained

Employment:Staff 0.200.35 Employment

Employment:Executive 0.15

Personal:Family 0.100.40 Personal

Personal:Business 0.10

Affiliation:Citizen 0.200.25 Affiliation

Affiliation:Based-in 0.25

Page 45: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

4646

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Using Background Knowledge

fine-grained coarse-grained

Employment:Staff 0.200.35 Employment

Employment:Executive 0.15

Personal:Family 0.100.40 Personal

Personal:Business 0.10

Affiliation:Citizen 0.200.25 Affiliation

Affiliation:Based-in 0.25

0.55

Page 46: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

47

Knowledge1: Word Class Information(as additional feature)

• Supervised systems face an issue of data sparseness (of lexical features)

• Use class information of words to support generalization better: instantiated as word clusters in our work• Automatically generated from unlabeled texts

using algorithm of (Brown et al., 1992)

apple pear Apple IBM

0 1 0 1

0 1

bought run of in

0 1 0 1

0 1

0 1

47

Page 47: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

48

Knowledge1: Word Class Information

• Supervised systems face an issue of data sparseness (of lexical features)

• Use class information of words to support generalization better: instantiated as word clusters in our work• Automatically generated from unlabeled texts

using algorithm of (Brown et al., 1992)

apple pear Apple

0 1 0 1

0 1

bought run of in

0 1 0 1

0 1

0 1

48

IBM

Page 48: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

49

Knowledge1: Word Class Information

• Supervised systems face an issue of data sparseness (of lexical features)

• Use class information of words to support generalization better: instantiated as word clusters in our work• Automatically generated from unlabeled texts

using algorithm of (Brown et al., 1992)

apple pear Apple

0 1 0 1

0 1

bought run of in

0 1 0 1

0 1

0 1

49

IBM011

Page 49: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

50

Knowledge1: Word Class Information

• All lexical features consisting of single words will be duplicated with its corresponding bit-string representation

apple pear Apple IBM

0 1 0 1

0 1

bought run of in

0 1 0 1

0 1

0 1

50

00 01 10 11

Page 50: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

51

Knowledge2: Wikipedia(as additional feature)

• We use a Wikifier system (Ratinov et al., 2010) which performs context-sensitive mapping of mentions to Wikipedia pages

• Introduce a new feature based on:

• introduce a new feature by combining the above with the coarse-grained entity types of mi,mj

otherwise ,0

)(or )( if ,1),(1

imjm

ji

mAmAmmw ji

51

mi mjr ?

Page 51: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

5353

weight vector for

“local” modelscollection of

classifiers

Constraint Conditional Models (CCMs)(Roth and Yih, 2007; Chang et al., 2008)

Page 52: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

54

Constraint Conditional Models (CCMs)(Roth and Yih, 2007; Chang et al., 2008)

54

weight vector for

“local” modelscollection of

classifiers

penalty for violating

the constraint

how far y is from a

“legal” assignment

Page 53: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

55

Constraint Conditional Models (CCMs)(Roth and Yih, 2007; Chang et al., 2008)

55

•Wikipedia

•word clusters

•hierarchy of relations

•entity type constraints

•coreference

Page 54: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

5656

David

Cone

,

a

Kansas

City

native

,

was

originally

signed

by

the

Royals

and

broke

into

the

majors

with

the

team

Constraint Conditional Models (CCMs)

fine-grained coarse-grained

Employment:Staff 0.200.35 Employment

Employment:Executive 0.15

Personal:Family 0.100.40 Personal

Personal:Business 0.10

Affiliation:Citizen 0.200.25 Affiliation

Affiliation:Based-in 0.25

Page 55: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

57

• Key steps• Write down a linear objective function

• Write down constraints as linear inequalities

• Solve using integer linear programming (ILP) packages

57

Constraint Conditional Models (CCMs)(Roth and Yih, 2007; Chang et al., 2008)

Page 56: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

58

Knowledge3: Relations between our target relations

......

personal

......

employment

family biz executivestaff

58

Page 57: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

59

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

59

coarse-grained

classifier

fine-grained

classifier

Page 58: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

60

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

60

mi mj

coarse-grained?

fine-grained?

Page 59: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

61

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

61

Page 60: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

62

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

62

Page 61: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

63

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

63

Page 62: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

64

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

64

Page 63: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

65

Knowledge3: Hierarchy of Relations

......

personal

......

employment

family biz executivestaff

65

Page 64: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

66

Knowledge3: Hierarchy of Relations Write down a linear objective function

R LR L R rf

rfRR

R rcrcRR

RfRc

yrfpxrcp,,

)()(max

66

coarse-grained

prediction

probabilities

fine-grained

prediction

probabilities

Page 65: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

67

Knowledge3: Hierarchy of Relations Write down a linear objective function

R LR L R rf

rfRR

R rcrcRR

RfRc

yrfpxrcp,,

)()(max

67

coarse-grained

prediction

probabilities

fine-grained

prediction

probabilities

coarse-grained

indicator

variable

fine-grained

indicator

variable

indicator variable == relation assignment

Page 66: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

68

Knowledge3: Hierarchy of Relations Write down constraints

• If a relation R is assigned a coarse-grained label rc, then we must also assign to R a fine-grained relation rf which is a child of rc.

• (Capturing the inverse relationship) If we assign rfto R, then we must also assign to R the parent of rf, which is a corresponding coarse-grained label

68

nrfRrfRrfRrcRyyyx

,,,, 21

)(,, rfparentRrfRxy

Page 67: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

69

Knowledge4: Entity Type Constraints(Roth and Yih, 2004, 2007)

• Entity types are useful for constraining the possible labels that a relation R can assume

69

mi mj

Employment:Staff

Employment:Executive

Personal:Family

Personal:Business

Affiliation:Citizen

Affiliation:Based-in

Page 68: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

70

• Entity types are useful for constraining the possible labels that a relation R can assume

70

Employment:Staff

Employment:Executive

Personal:Family

Personal:Business

Affiliation:Citizen

Affiliation:Based-in

per org

per org

per

per per

per

per

org

gpe

gpe

per per

mi mj

Knowledge4: Entity Type Constraints(Roth and Yih, 2004, 2007)

Page 69: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

71

• We gather information on entity type constraints from ACE-2004 documentation and impose them on the coarse-grained relations• By improving the coarse-grained predictions and combining with the

hierarchical constraints defined earlier, the improvements would propagate to the fine-grained predications

71

Employment:Staff

Employment:Executive

Personal:Family

Personal:Business

Affiliation:Citizen

Affiliation:Based-in

per org

per org

per

per per

per

per

org

gpe

gpe

per per

mi mj

Knowledge4: Entity Type Constraints(Roth and Yih, 2004, 2007)

Page 70: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

72

Knowledge5: Coreference

72

mi mj

Employment:Staff

Employment:Executive

Personal:Family

Personal:Business

Affiliation:Citizen

Affiliation:Based-in

Page 71: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

73

Knowledge5: Coreference

• In this work, we assume that we are given the coreference information, which is available from the ACE annotation.

73

mi mj

Employment:Staff

Employment:Executive

Personal:Family

Personal:Business

Affiliation:Citizen

Affiliation:Based-in

null

Page 72: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

74

Experiment Results

74

F1% improvement from using each knowledge source

All nwire 10% of nwire

BasicRE 50.5% 31.0%

Page 73: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Summary: Supervised Relation Extraction

+ Can get high accuracies with enough hand-labeled

training data, if test similar enough to training

- Labeling a large training set is expensive

- Supervised models are brittle, don’t generalize well

to different genres

Page 74: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Bootstrapping Approaches

76

Page 75: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Rule Learning

• Thinking up some patterns for hyponyms might not be too hard, but what about some new relationship? • E.g., enzymes and the molecular pathway(s) they’re involved in?

• Cities and their mayors? Films and their directors?

• Can we automate the process of identifying patterns?

• Rule learning automates this process, if it is given some examples of the relationship of interest.• For instance, some example enzyme names and the names of the

pathways they’re involved in.

Page 76: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Seed-based or bootstrapping approaches to relation extraction

• No training set? Maybe you have:

• A few seed tuples or

• A few high-precision patterns

• Can you use those seeds to do something useful?

• Bootstrapping: use the seeds to directly learn to populate a relation

Page 77: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Relation Bootstrapping (Hearst 1992)

• Gather a set of seed pairs that have relation R

• Iterate:

1. Find sentences with these pairs

2. Look at the context between or around the pair and generalize the context to create patterns

3. Use the patterns for grep for more pairs

Page 78: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Initial Seed Tuples Occurrences of Seed Tuples

Generate Extraction Patterns

Generate New Seed Tuples

Augment Table

Bootstrapping for Relation Extraction

Page 79: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Bootstrapping

• <Mark Twain, Elmira> Seed tuple• Grep (google) for the environments of the seed tuple

“Mark Twain is buried in Elmira, NY.”

X is buried in Y

“The grave of Mark Twain is in Elmira”

The grave of X is in Y

“Elmira is Mark Twain’s final resting place”

Y is X’s final resting place.

• Use those patterns to grep for new tuples

• Iterate

Page 80: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

BootstrappingSeed Examples

Philadelphia – Michael Nutter

New York – Michael Bloomberg

Rule Learning Extraction Rules

X is mayor of Y

X, mayor of Y

X runs City Hall in Y

High-confidenceExtractions

Page 81: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

BootstrappingSeed Examples

Philadelphia – Michael Nutter

New York – Michael Bloomberg

San Diego – Jerry Sanders

Belgrade -- Dragan Đilas

Rule Learning Extraction Rules

X is mayor of Y

X, mayor of Y

X runs City Hall in Y

Social Democrat X is new mayorof Y

High-confidenceExtractions

Page 82: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Occurrences of

seed tuples:

Computer servers at Microsoft’s

headquarters in Redmond…

In mid-afternoon trading, share of

Redmond-based Microsoft fell…

The Armonk-based IBM introduced

a new line…

The combined company will operate

from Boeing’s headquarters in Seattle.

Intel, Santa Clara, cut prices of its

Pentium processor.

ORGANIZATION LOCATION

MICROSOFT REDMOND

IBM ARMONK

BOEING SEATTLE

INTEL SANTA CLARA

Initial Seed Tuples Occurrences of Seed Tuples

Generate Extraction Patterns

Generate New Seed Tuples

Augment Table

Bootstrapping for Relation Extraction

Page 83: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

• <STRING1>’s headquarters in <STRING2>

•<STRING2> -based <STRING1>

•<STRING1> , <STRING2>

Initial Seed Tuples Occurrences of Seed Tuples

Generate Extraction Patterns

Generate New Seed Tuples

Augment Table

LearnedPatterns:

Bootstrapping for Relation Extraction

Page 84: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Initial Seed Tuples Occurrences of Seed Tuples

Generate Extraction Patterns

Generate New Seed Tuples

Augment Table

Generatenew seedtuples; start newiteration

ORGANIZATION LOCATION

AG EDWARDS ST LUIS

157TH STREET MANHATTAN

7TH LEVEL RICHARDSON

3COM CORP SANTA CLARA

3DO REDWOOD CITY

JELLIES APPLE

MACWEEK SAN FRANCISCO

Bootstrapping for Relation Extraction

Page 85: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Main Issue: Semantic Drift

• US Presidents “presidents such as…” Company presidents

• States “states such as…” liquid, solid, gas, Canada, …• “Microfinance institutions (MFIs) in the United States such as Accion”

• …

87

Page 86: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Doubly Anchored Patterns[Kozareva, ACL 08]

• “X such as Y and Z” • Fix two look for third

• “presidents such as Clinton and *”• Will likely not drift to company presidents

88

Page 87: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

89

Pattern-Match Rule Learning• Writing accurate patterns for each slot for each application

requires laborious software engineering.

• Alternative is to use rule induction methods.

• RAPIER system (Califf & Mooney, 1999) learns three regex-style patterns for each slot:• Pre-filler pattern

• Filler pattern

• Post-filler pattern

• RAPIER allows use of POS and WordNet categories in patterns to generalize over lexical items.

Page 88: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

90

RAPIER Pattern Induction Example• If goal is to extract the name of the city in which a

posted job is located, the least-general-generalization constructed by RAPIER is:

“…located in Atlanta, Georgia…” “…offices in Kansas City, Missouri…”

Rapier

Pattern Induction

Prefiller: “in” as Prep

Filler: 1 to 2 PropNouns

Postfiller: PropNoun which is a State

Page 89: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Distant Supervision

94

Page 90: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Distant Supervision

• Combine bootstrapping with supervised learning

• Instead of 5 seeds,

• Use a large database to get huge # of seed examples

• Create lots of features from all these examples

• Combine in a supervised classifier

Snow, Jurafsky, Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. NIPS 17Fei Wu and Daniel S. Weld. 2007. Autonomously Semantifying Wikipeida. CIKM 2007Mintz, Bills, Snow, Jurafsky. 2009. Distant supervision for relation extraction without labeled data. ACL09

Page 91: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Distant supervision paradigm

• Like supervised classification:

• Uses a classifier with lots of features

• Supervised by detailed hand-created knowledge

• Doesn’t require iteratively expanding patterns

• Like unsupervised classification:

• Uses very large amounts of unlabeled data

• Not sensitive to genre issues in training corpus

Page 92: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Distantly supervised learning of relation extraction patterns

For each relation

For each tuple in big database

Find sentences in large corpus with both entities

Extract frequent features (parse, words, etc)

Train supervised classifier using thousands of patterns

4

1

2

3

5

PER was born in LOC

PER, born (XXXX), LOC

PER’s birthplace in LOC

<Edwin Hubble, Marshfield><Albert Einstein, Ulm>

Born-In

Hubble was born in Marshfield

Einstein, born (1879), Ulm

Hubble’s birthplace in Marshfield

P(born-in | f1,f2,f3,…,f70000)

Page 93: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Heuristics for Labeling Training Data

Person Birth Location

Barack Obama

Honolulu

Mitt Romney Detroit

Albert Einstein

Ulm

Nikola Tesla Smiljan

… …

“Barack Obama was born

on August 4, 1961 at … in

the city of Honolulu ...”

“… site between Downtown

Honolulu and Waikiki is

proposed site for Barack

Obama's presidential library.

“Born in Honolulu, Barack

Obama went on to become…”

(Barack Obama, Honolulu)

(Mitt Romney, Detroit)

(Albert Einstein, Ulm)

98

e.g. [Mintz et. al. 2009]

Problem 1: not all sentences are positive training data!

Page 94: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

𝑠1 𝑠2 𝑠3… 𝑠𝑛

𝑧1 𝑧2 𝑧3… 𝑧𝑛

𝑑1 𝑑2 𝑑𝑘…

Local Extractors

Deterministic OR

(Barack Obama, Honolulu)

99

Sentences

Aggregate Relations

(Born-In, Lived-In, children, etc…)

𝑃 𝑧𝑖 = 𝑟 𝑠𝑖 ∝ exp(𝜃 ⋅ 𝑓 𝑠𝑖 , 𝑟 ) Relation mentions

𝑧

𝑃(𝑧, 𝑑|𝑠; 𝜃)MaximizeConditionalLikelihood

Multi-instance Learning (MultiR)

Page 95: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Heuristics for Labeling Training Data

Person Birth Location

Barack Obama

Honolulu

Mitt Romney Detroit

Albert Einstein

Ulm

Nikola Tesla Smiljan

… …

“Barack Obama was born

on August 4, 1961 at … in

the city of Honolulu ...”

“… site between Downtown

Honolulu and Waikiki is

proposed site for Barack

Obama's presidential library.

“Born in Honolulu, Barack

Obama went on to become…”

(Barack Obama, Honolulu)

(Mitt Romney, Detroit)

(Albert Einstein, Ulm)

100

e.g. [Mintz et. al. 2009]

Problem 2: not all databases are complete (neither is text)

Page 96: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Missing Data Problems…

• 2 Assumptions Drive learning:• Not in DB -> not mentioned in text

• In DB -> must be mentioned at least once

• Leads to errors in training data:• False positives

• False negatives

101

Page 97: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Changes

𝑠1 𝑠2 𝑠3 … 𝑠𝑛

𝑧1 𝑧2 𝑧3 … 𝑧𝑛

𝑑1 𝑑2 𝑑𝑘…

102

Page 98: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Modeling Missing Data

𝑠1 𝑠2 𝑠3 … 𝑠𝑛

𝑧1 𝑧2 𝑧3 … 𝑧𝑛

𝑡1 𝑡2 𝑡𝑘…

Mentioned in DB 𝑑1 𝑑2 𝑑𝑘…

Encourage Agreement

Mentioned in Text

103

[Ritter et. al. TACL 2013]

Page 99: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Side Information• Entity coverage

in database• Popular

entities

• Good coverage in Freebase Wikipedia

• Unlikely to extract new facts

104

𝑠1 𝑠2 𝑠3 … 𝑠𝑛

𝑧1 𝑧2 𝑧3 … 𝑧𝑛

𝑡1 𝑡2 𝑡𝑘…

𝑑1 𝑑2 𝑑𝑘…

Page 100: Information Extraction - Indian Institute of Technology Delhimausam/courses/csl772/autumn2014/lectures/12-ie.pdf · • Information extraction of particular interest to the intelligence

Experiments

• Red: MultiR

• Black: Soft Constraints

• Green: Missing Data Model

105

[Hoffmann et. al. 2011]