Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley ...
Classifying Semantic Relations in Bioscience Texts
Barbara Rosario, Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu
Supported by NSF DBI-0317510 and a gift from Genentech
Problem: Which relations hold between 2 entities?
Treatment Disease
Cure?
Prevent?
Side Effect?
Hepatitis Examples
Cure: "These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135."
Prevent: "A two-dose combined hepatitis A and B vaccine would facilitate immunization programs."
Vague: "Effect of interferon on hepatitis B."
Two tasks
Relation extraction: identify the semantic relations that can occur between the entities disease and treatment in bioscience text
Entity extraction (related problem): identify such entities
The Approach
Data: MEDLINE abstracts and titles
Graphical models: combine relation and entity extraction in one framework; both static and dynamic models
Simple discriminative approach: neural network
Lexical, syntactic and semantic features
Outline
Related work
Data and semantic relations
Features
Models and results
Conclusions
Several DIFFERENT Relations between the Same Types of Entities
Thus differs from the problem statement of other work on relations.
Many find one relation which holds between two entities (many based on ACE):
- Agichtein and Gravano (2000): lexical patterns for location-of
- Zelenko et al. (2002): SVM for person-affiliation and organization-location
- Hasegawa et al. (ACL 2004): Person-Organization -> "President" relation
- Craven (1999, 2001): HMM for subcellular-location and disorder-association; doesn't identify the actual relation
Related work: Bioscience
Many hand-built rules: Feldman et al. (2002), Friedman et al. (2001), Pustejovsky et al. (2002), Saric et al. (this conference)
Data and Relations
MEDLINE abstracts and titles; 3662 sentences labeled
Relevant: 1724; Irrelevant: 1771 (e.g., "Patients were followed up for 6 months")
2 types of entities, many instances: treatment and disease
7 relationships between these entities
The labeled data is available at http://biotext.berkeley.edu
Semantic Relationships
810: Cure ("Intravenous immune globulin for recurrent spontaneous abortion")
616: Only Disease ("Social ties and susceptibility to the common cold")
166: Only Treatment ("Fluticasone propionate is safe in recommended doses")
63: Prevent ("Statins for prevention of stroke")
Semantic Relationships (cont.)
36: Vague ("Phenylbutazone and leukemia")
29: Side Effect ("Malignant mesodermal mixed tumor of the uterus following irradiation")
4: Does NOT cure ("Evidence for double resistance to permethrin and malathion in head lice")
Features
Word
Part of speech
Phrase constituent
Orthographic features: 'is number', 'all letters are capitalized', 'first letter is capitalized', ...
MeSH (semantic features): replace words, or sequences of words, with generalizations via MeSH categories (Peritoneum -> Abdomen)
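The orthographic features listed above can be sketched as simple token predicates. The function and feature names below are illustrative, not from the paper:

```python
# Sketch of token-level orthographic feature extraction, as described on the
# Features slide. Feature names ('is_number', etc.) are illustrative.
def orthographic_features(token: str) -> dict:
    return {
        "is_number": token.replace(".", "", 1).isdigit(),  # 'is number'
        "all_caps": token.isalpha() and token.isupper(),   # 'all letters are capitalized'
        "init_cap": token[:1].isupper(),                   # 'first letter is capitalized'
    }

print(orthographic_features("TJ-135"))
```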
Features (cont.): MeSH
MeSH Tree Structures:
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
Features (cont.): MeSH
1. Anatomy [A]
   Body Regions [A01] +
   Musculoskeletal System [A02]
   Digestive System [A03] +
   Respiratory System [A04] +
   Urogenital System [A05] +
   Endocrine System [A06] +
   Cardiovascular System [A07] +
   Nervous System [A08] +
   Sense Organs [A09] +
   Tissues [A10] +
   Cells [A11] +
   Fluids and Secretions [A12] +
   Animal Structures [A13] +
   Stomatognathic System [A14]
   (...)
Body Regions [A01]
   Abdomen [A01.047]
      Groin [A01.047.365]
      Inguinal Canal [A01.047.412]
      Peritoneum [A01.047.596] +
      Umbilicus [A01.047.849]
   Axilla [A01.133]
   Back [A01.176] +
   Breast [A01.236] +
   Buttocks [A01.258]
   Extremities [A01.378] +
   Head [A01.456] +
   Neck [A01.598]
   (...)
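The generalization step (Peritoneum -> Abdomen) amounts to truncating a term's MeSH tree code to an ancestor category. A toy sketch, using the codes from the slide; the lookup table and truncation depth are illustrative:

```python
# Toy sketch of MeSH generalization: replace a word with an ancestor category
# by truncating its tree code. Codes are from the slide; the tables are toy.
MESH_CODES = {
    "Peritoneum": "A01.047.596",
    "Groin": "A01.047.365",
    "Axilla": "A01.133",
}
CATEGORY_NAMES = {"A01.047": "Abdomen", "A01": "Body Regions"}

def generalize(word: str, depth: int = 2) -> str:
    code = MESH_CODES.get(word)
    if code is None:
        return word  # no MeSH entry: keep the surface word
    truncated = ".".join(code.split(".")[:depth])
    return CATEGORY_NAMES.get(truncated, word)

print(generalize("Peritoneum"))  # Abdomen
```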
Models
2 static generative models
3 dynamic generative models
1 discriminative model (neural network)
Static Graphical Models
S1: observations dependent on Role but independent of Relation given the roles
S2: observations dependent on both Relation and Role
(diagrams of models S1 and S2)
Dynamic Graphical Models
D1, D2: as in S1, S2
D3: only one observation per state is dependent on both the relation and the role
(diagrams of models D1, D2, D3)
Graphical Models
Relation node: semantic relation (cure, prevent, none, ...) expressed in the sentence
Graphical Models
Role nodes: 3 choices: treatment, disease, or none
Graphical Models
Feature nodes (observed): word, POS, MeSH…
Graphical Models
For Dynamic Model D1: Joint probability distribution over relation,
roles and features nodes
Parameters estimated with maximum likelihood and absolute discounting smoothing
P(\mathrm{Rela}, \mathrm{Role}_{0:T}, f) = P(\mathrm{Rela})\, P(\mathrm{Role}_0 \mid \mathrm{Rela}) \prod_{t=1}^{T} P(\mathrm{Role}_t \mid \mathrm{Role}_{t-1}, \mathrm{Rela}) \prod_{t=0}^{T} \prod_{j=1}^{n} P(f_{jt} \mid \mathrm{Role}_t)
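A toy numeric sketch of the D1 factorization: relation prior, initial-role term, role transitions conditioned on the relation, and per-role feature emissions. All probability tables and role/feature names below are invented for illustration, not from the paper:

```python
import math

# Toy sketch of the D1 joint probability:
# P(Rela) * P(Role_0 | Rela) * prod_t P(Role_t | Role_{t-1}, Rela)
#         * prod_{t,j} P(f_jt | Role_t)
def log_joint(rela, roles, feats, p_rela, p_role0, p_trans, p_feat):
    lp = math.log(p_rela[rela]) + math.log(p_role0[rela][roles[0]])
    for t in range(1, len(roles)):
        lp += math.log(p_trans[rela][roles[t - 1]][roles[t]])
    for role, fs in zip(roles, feats):
        for f in fs:
            lp += math.log(p_feat[role][f])
    return lp

# Invented toy tables over 2 roles and 2 feature values.
p_rela = {"cure": 0.6, "none": 0.4}
p_role0 = {"cure": {"TREAT": 0.7, "DIS": 0.3}, "none": {"TREAT": 0.5, "DIS": 0.5}}
p_trans = {
    "cure": {"TREAT": {"TREAT": 0.2, "DIS": 0.8}, "DIS": {"TREAT": 0.8, "DIS": 0.2}},
    "none": {"TREAT": {"TREAT": 0.5, "DIS": 0.5}, "DIS": {"TREAT": 0.5, "DIS": 0.5}},
}
p_feat = {"TREAT": {"vaccine": 0.8, "hepatitis": 0.2},
          "DIS": {"vaccine": 0.1, "hepatitis": 0.9}}

lp = log_joint("cure", ["TREAT", "DIS"], [["vaccine"], ["hepatitis"]],
               p_rela, p_role0, p_trans, p_feat)
```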
Our D1
Thompson et al. (2003): frame classification and role labeling for FrameNet sentences; target word must be observed
Our model: more relations and roles
Neural Networks
Feed-forward network (MATLAB)
Training with conjugate gradient descent
One hidden layer (hyperbolic tangent function)
Logistic sigmoid function for the output layer representing the relationships
Same features
Discriminative approach
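The architecture described here can be sketched as a forward pass in numpy: one tanh hidden layer and a logistic-sigmoid output unit per relationship. The weights below are random and untrained, and the layer sizes are illustrative; the paper trained a comparable network in MATLAB with conjugate gradient descent:

```python
import numpy as np

# Untrained sketch of the slide's network: tanh hidden layer,
# sigmoid output layer with one unit per relationship.
rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)                       # hidden layer (tanh)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))    # sigmoid output layer

n_features, n_hidden, n_relations = 20, 10, 7      # 7 relationships; sizes illustrative
W1 = rng.standard_normal((n_hidden, n_features))
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_relations, n_hidden))
b2 = np.zeros(n_relations)

x = rng.standard_normal(n_features)                # one sentence's feature vector
scores = forward(x, W1, b1, W2, b2)                # one score per relationship
```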
Relation extraction
Results in terms of classification accuracy (with and without irrelevant sentences)
2 cases: roles hidden, roles given
Graphical models
NN: simple classification problem
\widehat{\mathrm{Rela}} = \arg\max_{k} P(\mathrm{Rela}_k, \mathrm{Role}_0, \ldots, \mathrm{Role}_T, f_0, \ldots, f_n)
Relation classification: Results
Neural Net always best
Relation classification: Results
With no smoothing, D1 best Graphical Model
Relation classification: Results
With Smoothing and No Roles, D2 best GM
Relation classification: Results
With Smoothing and Roles, D1 best GM
Relation classification: Results
Dynamic models always outperform Static
Relation classification: Confusion Matrix
Computed for the model D2, “rel + irrel.”, “only features”
Role extraction
Results in terms of F-measure
Graphical models: junction tree algorithm (BNT); relation hidden and marginalized over
NN: couldn't run it (feature vectors too large)
(Graphical models can do role extraction and relationship classification simultaneously)
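Role extraction is scored with the F-measure, the harmonic mean of precision and recall over predicted role labels. A minimal sketch; the set representation and toy predictions are illustrative:

```python
# F-measure over predicted (token, role) pairs vs. a gold standard.
# The toy gold/predicted sets below are invented for illustration.
def f_measure(predicted: set, gold: set) -> float:
    tp = len(predicted & gold)          # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("hepatitis", "DIS"), ("TJ-135", "TREAT")}
pred = {("hepatitis", "DIS"), ("pretreatment", "TREAT")}
print(f_measure(pred, gold))  # 0.5
```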
Role Extraction: Results
F-measures: D1 best when no smoothing
F-measures: D2 best with smoothing, but smoothing doesn't boost scores as much as in relation classification
Features impact: Role Extraction
Most important features: 1)Word, 2)MeSH
Features        D1             D2
All features    0.67           0.71
No word         0.58 (-13.4%)  0.61 (-14.1%)
No MeSH         0.63 (-5.9%)   0.65 (-8.4%)
(rel. + irrel.)
Most important features: Roles
Accuracy                   D1             D2            NN
All feat. + roles          91.6           82.0          96.9
All feat. - roles          68.9 (-24.7%)  74.9 (-8.7%)  79.6 (-17.8%)
All feat. + roles - Word   91.6 (0%)      79.8 (-2.8%)  96.4 (-0.5%)
All feat. + roles - MeSH   91.6 (0%)      84.6 (+3.1%)  97.3 (+0.4%)
Features impact: Relation classification
(rel. + irrel.)
Features impact: Relation classification
Most realistic case: roles not known
Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)
Accuracy                   D1            D2             NN
All feat. - roles          68.9          74.9           79.6
All feat. - roles - Word   66.7 (-3.3%)  66.1 (-11.8%)  76.2 (-4.3%)
All feat. - roles - MeSH   62.7 (-9.1%)  72.5 (-3.2%)   74.1 (-6.9%)
(rel. + irrel.)
Conclusions
Classification of subtle semantic relations in bioscience text
Discriminative model (neural network) achieves high classification accuracy
Graphical models for the simultaneous extraction of entities and relationships
Importance of lexical hierarchy
Future work: a new collection of disease/treatment data; different entities/relations; unsupervised learning to discover relation types
Thank you!
Barbara Rosario, Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu
Additional slides
Smoothing: absolute discounting
Lower the probability of seen events by subtracting a constant \varepsilon from their maximum-likelihood estimate; the remaining probability mass is evenly divided among the unseen events.

P_{\mathrm{ML}}(e) = \frac{c(e)}{\sum_{e'} c(e')}

P_{\mathrm{ad}}(e) =
\begin{cases}
P_{\mathrm{ML}}(e) - \varepsilon & \text{if } P_{\mathrm{ML}}(e) \neq 0 \\
\varepsilon \cdot \dfrac{\#\,\text{seen events}}{\#\,\text{unseen events}} & \text{if } P_{\mathrm{ML}}(e) = 0
\end{cases}
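The discounting scheme above can be sketched directly: subtract a constant from each seen event's ML probability and share the freed mass evenly across unseen events. The vocabulary, counts, and epsilon below are illustrative:

```python
# Sketch of absolute discounting: subtract eps from each seen event's
# ML probability; split the freed mass evenly over unseen events.
# `counts`, `vocab`, and `eps` below are toy values.
def absolute_discount(counts: dict, vocab: list, eps: float) -> dict:
    total = sum(counts.values())
    seen = [e for e in vocab if counts.get(e, 0) > 0]
    unseen = [e for e in vocab if counts.get(e, 0) == 0]
    probs = {}
    for e in seen:
        probs[e] = counts[e] / total - eps          # discounted ML estimate
    share = eps * len(seen) / len(unseen) if unseen else 0.0
    for e in unseen:
        probs[e] = share                            # evenly divided freed mass
    return probs

p = absolute_discount({"a": 3, "b": 1}, ["a", "b", "c", "d"], eps=0.05)
```

Note that the result is still a proper distribution: the total mass subtracted from seen events equals the mass given to unseen ones.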
F-measures for role extraction as a function of the smoothing factor
Relation accuracies as a function of the smoothing factor
Role Extraction: Results
Static models better than Dynamic
Note: No Neural Networks
Features impact: Role Extraction
Most important features: 1)Word, 2)MeSH
Features        D1             D2             Average
All features    0.67           0.71
No word         0.58 (-13.4%)  0.61 (-14.1%)  (-13.7%)
No MeSH         0.63 (-5.9%)   0.65 (-8.4%)   (-7.2%)
(rel. + irrel.)
Features impact: Role extraction
Most important features: 1) Word, 2) MeSH
F-measures      D1            D2            Average
All features    0.72          0.73
No word         0.65 (-9.7%)  0.66 (-9.6%)  (-9.6%)
No MeSH         0.69 (-4.2%)  0.69 (-5.5%)  (-4.8%)
(only rel.)
Most important features: Roles
Accuracy                   D1             D2            NN             Avg.
All feat. + roles          91.6           82.0          96.9
All feat. - roles          68.9 (-24.7%)  74.9 (-8.7%)  79.6 (-17.8%)  (-17.1%)
All feat. + roles - Word   91.6 (0%)      79.8 (-2.8%)  96.4 (-0.5%)   (-1.1%)
All feat. + roles - MeSH   91.6 (0%)      84.6 (+3.1%)  97.3 (+0.4%)   (+1.1%)
Features impact: Relation classification
(rel. + irrel.)
Features impact: Relation classification
Most realistic case: roles not known
Most important features: 1) MeSH, 2) Word for D1 and NN (but vice versa for D2)
Accuracy                   D1            D2             NN            Avg.
All feat. - roles          68.9          74.9           79.6
All feat. - roles - Word   66.7 (-3.3%)  66.1 (-11.8%)  76.2 (-4.3%)  (-6.4%)
All feat. - roles - MeSH   62.7 (-9.1%)  72.5 (-3.2%)   74.1 (-6.9%)  (-6.4%)
(rel. + irrel.)