
I256

Applied Natural Language Processing

Fall 2009

Lecture 14

Information Extraction (2)

Barbara Rosario


Today

• Midterm evaluations

• Discuss schedule for next class

• Finish lecture 13 slides

• Information Extraction (2)

Text Mining/Information Extraction

• Text:
  – Stress is associated with migraines
  – Stress can lead to loss of magnesium
  – Calcium channel blockers prevent some migraines
  – Magnesium is a natural calcium channel blocker

1: Extract semantic entities from text

Text Mining

[Diagram: the example text above, with the entities highlighted: Stress, Migraine, Magnesium, Calcium channel blockers]

1: Extract semantic entities from text

Text Mining (cont.)

[Diagram: the entities linked by labeled relations: Stress → Migraine (associated with); Stress → Magnesium (lead to loss); Calcium channel blockers → Migraine (prevent); Magnesium → Calcium channel blockers (subtype-of, is a)]

2: Classify relations between entities

Text Mining (cont.)

[Diagram: the same entity-relation graph as above]

3: Do reasoning: find new correlations

Text Mining (cont.)

[Diagram: the same graph, with an inferred causal link: deficiency of magnesium → migraine (loss of magnesium means no prevention)]

4: Do reasoning: infer causality

Apoptosis Network

[Figure: apoptosis signaling network. Inputs: death receptor signaling, survival factor signaling, Ca++ signaling, p53 pathway, ER stress, genotoxic stress, loss of attachment, cell cycle stress, etc. Intermediates: initiator caspases (8, 10), caspase 9, caspase 12, Apaf-1, IAPs, NF-kB, mitochondria/cytochrome c, Bax/Bak, Bcl-2-like and BH3-only proteins, Smac, AIF. Output: effector caspases (3, 6, 7) → apoptosis]

Project at UC Berkeley

• The network nodes are deduced from reading and processing of experimental knowledge by experts. Every month >1000 apoptosis papers are published.
• We need to keep track of ALL the information in order to understand the system better.
• Ultimate Goal: Produce models capable of predicting system behavior, identifying critical control points, and proposing critical experiments to extend current knowledge.
• 2005 Project: Develop an automatic literature analysis tool
  – Zhang and Arkin, UC Berkeley

To convert free text into structured format (manually!)

Problem: Which relations hold between 2 entities?

[Diagram: Treatment ↔ Disease, with candidate relations: Cure? Prevent? Side Effect?]

Hepatitis Examples

• Cure
  – These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135.
• Prevent
  – A two-dose combined hepatitis A and B vaccine would facilitate immunization programs
• Vague
  – Effect of interferon on hepatitis B

Two tasks

• Relationship Extraction:
  – Identify the several semantic relations that can occur between the entities disease and treatment in bioscience text
• Often the different relationships are determined by the entities involved
  – Location of: LOCATION and ORGANIZATION
• Here: different relations between the same entities
• Entity extraction:
  – Related problem: identify such entities

The Approach

• Data: MEDLINE abstracts and titles
• Graphical models
  – Combine in one framework both relation and entity extraction
  – Both static and dynamic models
• Simple discriminative approach:
  – Neural network
• Lexical, syntactic and semantic features

Several DIFFERENT Relations between the Same Types of Entities

• Thus differs from the problem statement of other work on relations
• Many find one relation which holds between two entities (many based on ACE)
  – Agichtein and Gravano (2000): lexical patterns for "location of"
  – Zelenko et al. (2002): SVM for person-affiliation and organization-location
  – Hasegawa et al. (ACL 2004): Person-Organization → "President" relation
  – Craven (1999, 2001): HMM for subcellular-location and disorder-association
• Doesn't identify the actual relation

Related work: Bioscience

• Many hand-built rules
  – Feldman et al. (2002)
  – Friedman et al. (2001)
  – Pustejovsky et al. (2002)
  – Saric et al. (this conference)

Data and Relations

• MEDLINE, abstracts and titles
• 3662 sentences labeled
  – Relevant: 1724
  – Irrelevant: 1771
    • e.g., "Patients were followed up for 6 months"
• 2 types of entities, many instances
  – treatment and disease
• 7 relationships between these entities

The labeled data is available at http://biotext.berkeley.edu

Labeling

Inter-annotator agreement

• The F-measure between the two annotations was 81% (an upper limit for the system performance)

Annotators' disagreement

Semantic Relationships

• 810: Cure
  – Intravenous immune globulin for recurrent spontaneous abortion
• 616: Only Disease
  – Social ties and susceptibility to the common cold
• 166: Only Treatment
  – Fluticasone propionate is safe in recommended doses
• 63: Prevent
  – Statins for prevention of stroke

Semantic Relationships

• 36: Vague
  – Phenylbutazone and leukemia
• 29: Side Effect
  – Malignant mesodermal mixed tumor of the uterus following irradiation
• 4: Does NOT cure
  – Evidence for double resistance to permethrin and malathion in head lice

Preprocessing

• Sentence splitter
• Tokenizer (Penn Treebank)
• Brill's POS tagger
• Collins parser
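These steps can be approximated in Python with NLTK; here is a minimal sketch, with NLTK's default models standing in for Brill's tagger and the Collins parser used in the actual system.

```python
# Minimal preprocessing sketch with NLTK; its default models stand in
# for Brill's tagger and the Collins parser used in the original work.
# Requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import nltk

text = ("Stress is associated with migraines. "
        "Stress can lead to loss of magnesium.")

for sent in nltk.sent_tokenize(text):   # sentence splitter
    tokens = nltk.word_tokenize(sent)   # Penn Treebank-style tokenizer
    print(nltk.pos_tag(tokens))         # POS tags for each token
```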


Preprocessing

• Chunking

• Semantic tagging with MeSH: map the words into MeSH terms.
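A minimal sketch of what this mapping looks like, assuming a toy lookup table (the `MESH` dictionary below is a hypothetical excerpt; a real system would use the full MeSH vocabulary from the NLM):

```python
# Toy MeSH tagger: map each word to a MeSH tree code via dictionary
# lookup. The table below is illustrative, not the real MeSH codes.
MESH = {
    "migraine":  "C10",  # Diseases -> Nervous System Diseases
    "magnesium": "D01",  # Chemicals and Drugs -> Inorganic Chemicals
    "stress":    "F01",  # Psychiatry and Psychology
}

def mesh_tag(tokens):
    """Return (token, MeSH code) pairs; None when the word is unknown."""
    return [(tok, MESH.get(tok.lower())) for tok in tokens]

print(mesh_tag(["Stress", "can", "lead", "to", "loss", "of", "magnesium"]))
```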

MeSH

• MeSH Tree Structures
  1. Anatomy [A]
  2. Organisms [B]
  3. Diseases [C]
  4. Chemicals and Drugs [D]
  5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
  6. Psychiatry and Psychology [F]
  7. Biological Sciences [G]
  8. Physical Sciences [H]
  9. Anthropology, Education, Sociology and Social Phenomena [I]
  10. Technology and Food and Beverages [J]
  11. Humanities [K]
  12. Information Science [L]
  13. Persons [M]
  14. Health Care [N]
  15. Geographic Locations [Z]

MeSH

• 1. Anatomy [A]
  – Body Regions [A01] +
  – Musculoskeletal System [A02] +
  – Digestive System [A03] +
  – Respiratory System [A04] +
  – Urogenital System [A05] +
  – Endocrine System [A06] +
  – Cardiovascular System [A07] +
  – Nervous System [A08] +
  – Sense Organs [A09] +
  – Tissues [A10] +
  – Cells [A11] +
  – Fluids and Secretions [A12] +
  – Animal Structures [A13] +
  – Stomatognathic System [A14]
  – (…)
• Body Regions [A01]
  – Abdomen [A01.047]
    • Groin [A01.047.365]
    • Inguinal Canal [A01.047.412]
    • Peritoneum [A01.047.596] +
    • Umbilicus [A01.047.849]
  – Axilla [A01.133]
  – Back [A01.176] +
  – Breast [A01.236] +
  – Buttocks [A01.258]
  – Extremities [A01.378] +
  – Head [A01.456] +
  – Neck [A01.598]
  – (…)


Features

• Word
• Part of speech
• Phrase constituent (the phrase type from the shallow parse: NP, PP, …)
• Belongs to the same chunk as previous word?
• Orthographic features
  – Is number?
  – Only part is number
  – Is negation
  – First letter is capital
  – All capital letters
  – All non-word characters
  – Contains non-word characters
• MeSH (semantic features)
• Extract these features with Python (as sketched below)
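A sketch of how the orthographic and positional features might be extracted in Python (the negation word list and function names are illustrative, not the paper's actual code):

```python
import re

NEGATIONS = {"no", "not", "without", "never"}  # assumed negation list

def orthographic_features(word, prev_chunk=None, chunk=None):
    """Compute the orthographic/positional features listed above."""
    return {
        "is_number":          word.isdigit(),
        "part_is_number":     any(c.isdigit() for c in word) and not word.isdigit(),
        "is_negation":        word.lower() in NEGATIONS,
        "first_capital":      word[:1].isupper(),
        "all_capitals":       word.isupper() and word.isalpha(),
        "all_non_word":       re.fullmatch(r"\W+", word) is not None,
        "has_non_word":       re.search(r"\W", word) is not None,
        "same_chunk_as_prev": prev_chunk is not None and prev_chunk == chunk,
    }

print(orthographic_features("TJ-135", prev_chunk="NP", chunk="NP"))
```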

Models

• 2 static generative models
• 3 dynamic generative models
  – Smoothing: absolute discounting (sketched below)
• 1 discriminative model (neural networks)
• These models are implemented in Matlab
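For the smoothing step, here is one standard form of absolute discounting as a sketch (the paper's exact variant may differ): a constant d is subtracted from every seen count, and the freed mass is spread uniformly over unseen outcomes.

```python
from collections import Counter

def absolute_discounting(counts, vocab_size, d=0.5):
    """One common form of absolute discounting: subtract d from each
    seen count and give the freed mass uniformly to unseen outcomes."""
    total = sum(counts.values())
    unseen = vocab_size - len(counts)
    freed = d * len(counts) / total          # total discounted mass

    def prob(x):
        if x in counts:
            return (counts[x] - d) / total
        return freed / unseen if unseen else 0.0
    return prob

p = absolute_discounting(Counter(["cure", "cure", "prevent"]), vocab_size=7)
print(p("cure"), p("vague"))                 # a seen vs. an unseen outcome
```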

Architecture

[Pipeline: Get text (Python) → Annotate → Preprocessing: extract features and transform them into numbers (Python) → numerical features → Algorithms (Matlab) → prediction → Process output (Python)]

Static Graphical Models

• S1: observations dependent on Role but independent from Relation given roles
• S2: observations dependent on both Relation and Role

[Figure: graphical model diagrams for S1 and S2]

Dynamic Graphical Models

• D1, D2: as in S1, S2
• D3: only one observation per state is dependent on both the relation and the role

[Figure: graphical model diagrams for D1, D2 and D3]

Graphical Models

• Relation node:
  – Semantic relation (cure, prevent, none, …) expressed in the sentence

Graphical Models

• Role nodes:
  – 3 choices: treatment, disease, or none

Graphical Models

• Feature nodes (observed):
  – word, POS, MeSH, …

Graphical Models

• For Dynamic Model D1:
  – Joint probability distribution over relation, role and feature nodes
  – Parameters estimated with maximum likelihood and absolute discounting smoothing

P(Rela, Role_0, …, Role_T, f_10, …, f_nT)
  = P(Rela) · P(Role_0 | Rela) · ∏_{t=1..T} P(Role_t | Role_{t-1}, Rela) · ∏_{t=0..T} ∏_{j=1..n} P(f_jt | Role_t)
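Read as code, the factorization might look like the following sketch (the dictionary CPTs and names are illustrative; the P(Role_0 | Rela) term is folded in by conditioning the first role on a None predecessor):

```python
def joint_d1(rela, roles, feats, p_rela, p_role, p_feat):
    """Sketch of the D1 joint probability above. CPTs are plain dicts:
    p_rela[rela]               = P(Rela)
    p_role[(role, prev, rela)] = P(Role_t | Role_{t-1}, Rela);
                                 prev=None encodes P(Role_0 | Rela)
    p_feat[(f, role)]          = P(f_jt | Role_t)
    """
    p = p_rela[rela]
    prev = None
    for role, fs in zip(roles, feats):   # one state per word position t
        p *= p_role[(role, prev, rela)]
        for f in fs:                     # the n features at position t
            p *= p_feat[(f, role)]
        prev = role
    return p
```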

Neural Networks

• Feed-forward network (MATLAB)
  – Training with conjugate gradient descent
  – One hidden layer (hyperbolic tangent function)
  – Logistic sigmoid function for the output layer representing the relationships
• Same features
• Discriminative approach
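A minimal NumPy sketch of this architecture (the layer sizes here are made up, and the conjugate gradient training loop is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_relations = 50, 20, 7   # illustrative sizes

W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
W2 = rng.normal(scale=0.1, size=(n_relations, n_hidden))

def forward(x):
    """One tanh hidden layer, logistic sigmoid outputs (one per relation)."""
    h = np.tanh(W1 @ x)
    return 1.0 / (1.0 + np.exp(-(W2 @ h)))

x = rng.normal(size=n_features)                 # a feature vector
print(forward(x).argmax())                      # index of predicted relation
```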

Relation extraction

• Results in terms of classification accuracy (with irrelevant sentences)
• 2 cases:
  – Roles hidden
  – Roles given
• Graphical models
• NN: simple classification problem

Rela^ = argmax_k P(Rela_k, Role_0, …, Role_T, f_10, …, f_nT)
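As a sketch, this decision rule is just an argmax over the candidate relations, with any joint-probability function plugged in (names here are illustrative):

```python
def classify_relation(relations, roles, feats, joint):
    """Pick the relation maximizing P(Rela_k, Role_0..T, f_10..nT).
    `joint` is any function returning that joint probability,
    e.g. the joint_d1 sketch shown earlier."""
    return max(relations, key=lambda rela: joint(rela, roles, feats))

# Toy usage with a stand-in scoring function:
scores = {"cure": 0.6, "prevent": 0.3, "vague": 0.1}
print(classify_relation(["cure", "prevent", "vague"], [], [],
                        joint=lambda r, roles, feats: scores[r]))
```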

Relation classification: Results

Accuracies on the relation classification task:

                      Base   Static         Dynamic                NN
  Input                      S1     S2      D1     D2     D3
  only features       46.7   51.1   50.2    68.9   74.9   74.6    79.6
  features + roles    50.6   56.1   54.8    91.6   82.0   82.2    96.6

Relation classification: Confusion Matrix

[Confusion matrix computed for model D2, "only features" input]

Role extraction

• Results in terms of F-measure
• Graphical models
  – Junction tree algorithm (BNT)
  – Relation hidden and marginalized over
• NN
  – Couldn't run it (feature vectors too large)
• (Graphical models can do role extraction and relationship classification simultaneously)

Evaluation

• POS: Possible. The total number of truth values: POS = COR + INC + MIS
• ACT: Actual. The total number of predictions: ACT = COR + INC + SPU
• REC: Recall. A measure of how many of the truth values were produced: REC = COR / POS
• PRE: Precision. A measure of how many of the predictions are actually in the truth: PRE = COR / ACT
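These measures, and the F-measure derived from them on the next slide, are straightforward to compute; a sketch with made-up counts:

```python
def evaluate(cor, inc, mis, spu):
    """POS/ACT/REC/PRE as defined above, plus the derived F-measure."""
    pos = cor + inc + mis            # total truth values
    act = cor + inc + spu            # total predictions
    rec = cor / pos
    pre = cor / act
    f = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"POS": pos, "ACT": act, "REC": rec, "PRE": pre, "F": f}

print(evaluate(cor=80, inc=10, mis=10, spu=5))   # made-up counts
```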

Evaluation

• Alignment of predictions against true values:

[Table: prediction vs. true value alignment]

From these we derive F-measures.

Role Extraction: Results

F-measures:

  Static          Dynamic
  S1     S2       D1     D2     D3
  0.60   0.62     0.67   0.71   0.69

Features impact: Role Extraction

• Most important features: 1) Word, 2) MeSH

  Model           D1              D2
  All features    0.67            0.71
  No Word         0.58 (-13.4%)   0.61 (-14.1%)
  No MeSH         0.63 (-5.9%)    0.65 (-8.4%)

Features impact: Relation classification

• Most important features: Roles

  Accuracy                    D1             D2             NN
  All feat. + roles           91.6           82.0           96.9
  All feat. – roles           68.9 (-24.7%)  74.9 (-8.7%)   79.6 (-17.8%)
  All feat. + roles – Word    91.6 (0%)      79.8 (-2.8%)   96.4 (-0.5%)
  All feat. + roles – MeSH    91.6 (0%)      84.6 (+3.1%)   97.3 (+0.4%)

Features impact: Relation classification

• Most realistic case: Roles not known
• Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)

  Accuracy                    D1            D2             NN
  All feat. – roles           68.9          74.9           79.6
  All feat. – roles – Word    66.7 (-3.3%)  66.1 (-11.8%)  76.2 (-4.3%)
  All feat. – roles – MeSH    62.7 (-9.1%)  72.5 (-3.2%)   74.1 (-6.9%)

Conclusions

• Classification of subtle semantic relations in bioscience text
  – Discriminative model (neural network) achieves high classification accuracy
  – Graphical models allow the simultaneous extraction of entities and relationships
  – Importance of the lexical hierarchy (MeSH)