Syntactic Contributions in the Entailment Task

Lucy Vanderwende, Arul Menezes, Rion Snow (Stanford)

RTE-1 analysis

• Recap of MSR’s manual analysis of the RTE-1 test data (800 pairs): in principle, 74% accuracy is achievable using syntax and a thesaurus

Category | Without thesaurus | Using thesaurus
True | 69 (9%) | 147 (18%)
False | 197 (25%) | 243 (30%)
Not syntax | 534 (67%) | 410 (51%)
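The slide does not show how the 74% figure is derived. One reading that reproduces it, stated here as an assumption rather than the authors' method, is that every pair syntax can decide is answered correctly and the remaining "Not syntax" pairs are answered at chance:

```python
# Counts from the RTE-1 manual analysis table (800 test pairs).
# Assumption (ours, not stated on the slide): pairs that syntax decides are
# answered correctly and the "Not syntax" pairs are answered at chance (50%).
TOTAL_PAIRS = 800

COUNTS = {
    "without_thesaurus": {"true": 69, "false": 197, "not_syntax": 534},
    "with_thesaurus": {"true": 147, "false": 243, "not_syntax": 410},
}


def upper_bound(counts: dict, total: int = TOTAL_PAIRS) -> float:
    """Accuracy if syntax-decided pairs are right and the rest are coin flips."""
    assert sum(counts.values()) == total, "categories should cover every pair"
    expected_correct = counts["true"] + counts["false"] + 0.5 * counts["not_syntax"]
    return expected_correct / total


if __name__ == "__main__":
    for setting, counts in COUNTS.items():
        print(f"{setting}: {upper_bound(counts):.1%}")
```

Under that assumption the with-thesaurus column gives 595/800 ≈ 74.4%, and the without-thesaurus column gives about 66.6%.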

MENT algorithm

Predicting negative entailment using syntactic features:

• Obtain syntactic dependency graphs for the T and H sentences
• Attempt to align each H node to a node in T
• Check syntactic heuristics on the aligned nodes
  – If a heuristic matches, predict false
  – If no heuristic matches, use a lexical similarity model (with a threshold)
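A minimal Python sketch of the control flow above. Everything concrete here is a placeholder assumption: `Node` is a toy dependency-tree type, `align` simply matches identical lemmas, heuristics are arbitrary callables, and the 0.75 threshold is illustrative; none of these are MENT's actual components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    """Toy dependency-tree node: a lemma plus (relation, child) edges."""
    lemma: str
    children: List[Tuple[str, "Node"]] = field(default_factory=list)


def all_nodes(root: Node) -> List[Node]:
    """Every node in the tree, root first (depth-first)."""
    out = [root]
    for _relation, child in root.children:
        out.extend(all_nodes(child))
    return out


Alignment = Dict[Node, Optional[Node]]               # H node -> aligned T node (or None)
Heuristic = Callable[[Node, Node, Alignment], bool]  # True means "fires", i.e. predict no


def align(h_root: Node, t_root: Node) -> Alignment:
    """Placeholder aligner: pair each H node with the first T node with the same lemma."""
    first_by_lemma: Dict[str, Node] = {}
    for t in all_nodes(t_root):
        first_by_lemma.setdefault(t.lemma, t)
    return {h: first_by_lemma.get(h.lemma) for h in all_nodes(h_root)}


def predict(t_root: Node, h_root: Node, heuristics: List[Heuristic],
            threshold: float = 0.75) -> bool:
    """Return True for "entailed", False for "not entailed"."""
    alignment = align(h_root, t_root)
    # Syntactic heuristics first: any match predicts false.
    if any(rule(t_root, h_root, alignment) for rule in heuristics):
        return False
    # Otherwise fall back on lexical similarity, here the share of aligned H nodes.
    score = sum(t is not None for t in alignment.values()) / len(alignment)
    return score >= threshold


if __name__ == "__main__":
    t = Node("discover", [("subj", Node("Blondlot")), ("obj", Node("x-ray"))])
    h = Node("discover", [("subj", Node("Blondlot")), ("obj", Node("x-ray"))])
    print(predict(t, h, []))                      # True: fully aligned, nothing fires
    print(predict(t, h, [lambda t, h, a: True]))  # False: a heuristic fired
```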

MENT: heuristic alignment

MENT: superlative heuristic

Superlative heuristic (100% accurate, 5 test items): if the superlatives align and their heads are aligned, say yes only if every additional modifier of the head in T is aligned to some modifier in H; otherwise say no.

(RTE2-test #477)

• Crater Lake is the deepest lake in the United States, the second deepest in the Western Hemisphere, and the seventh deepest in the world, dropping downward to 1,932 feet just southeast of Merriam Cone.

• Crater Lake is the deepest lake in the world.

• With the elided head restored, the third superlative in T reads “the seventh deepest lake in the world”: “deepest” and its head “lake” align with H, but the head’s extra modifier “seventh” has no counterpart in H, so the heuristic says no.
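One possible encoding of the superlative heuristic, as a sketch: each superlative is reduced to a hypothetical `SuperlativePhrase` (superlative, head, other modifiers flattened to strings), and "aligned" is approximated by string equality. This is our reading of the rule, not the MENT implementation.

```python
from typing import FrozenSet, Iterable, NamedTuple


class SuperlativePhrase(NamedTuple):
    superlative: str           # e.g. "deepest"
    head: str                  # the noun it modifies, e.g. "lake"
    modifiers: FrozenSet[str]  # the head's other modifiers, flattened to strings


def superlative_says_yes(t_phrases: Iterable[SuperlativePhrase],
                         h_phrase: SuperlativePhrase) -> bool:
    """Say yes only if some T phrase has the same superlative and head as H and
    every extra modifier of that T head is matched by a modifier in H."""
    for t in t_phrases:
        if t.superlative == h_phrase.superlative and t.head == h_phrase.head:
            if t.modifiers <= h_phrase.modifiers:  # subset test = "all aligned"
                return True
    return False


# RTE2-test #477, with the elided head "lake" restored in each T conjunct.
T_PHRASES = [
    SuperlativePhrase("deepest", "lake", frozenset({"in United States"})),
    SuperlativePhrase("deepest", "lake", frozenset({"second", "in Western Hemisphere"})),
    SuperlativePhrase("deepest", "lake", frozenset({"seventh", "in world"})),
]
H_PHRASE = SuperlativePhrase("deepest", "lake", frozenset({"in world"}))

if __name__ == "__main__":
    print(superlative_says_yes(T_PHRASES, H_PHRASE))  # False, i.e. "no"
```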

MENT: Counterfactual heuristic

Counterfactual heuristic (80% accurate, 15 test items): if there are two pairs of aligned nodes and the dependency path between them in T contains a conditional or counterfactual, say no.

(RTE2-test #473)

• Blondlot was trying to polarize X-rays when he claimed to have discovered this new form of radiation.

• Blondlot discovered x-rays.
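A sketch of the counterfactual check on a hand-built dependency tree for the #473 Text. The tree, the lemma-keyed representation (which assumes each lemma occurs once), and the cue list `COUNTERFACTUAL_CUES` are all illustrative assumptions; the slide does not list MENT's actual cues.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cue lexicon (lemmas); MENT's real list is not given on the slide.
COUNTERFACTUAL_CUES = {"claim", "allege", "if", "would", "suppose"}

# Toy dependency tree for the T sentence of RTE2-test #473, as child -> head lemma.
T_HEAD: Dict[str, Optional[str]] = {
    "try": None,            # root: "was trying"
    "Blondlot": "try",
    "polarize": "try",
    "X-rays": "polarize",
    "claim": "try",         # "when he claimed ..."
    "he": "claim",
    "discover": "claim",    # "to have discovered"
    "radiation": "discover",
}


def chain_to_root(node: str, head: Dict[str, Optional[str]]) -> List[str]:
    """The node, its head, its head's head, ... up to the root."""
    chain = [node]
    while head.get(chain[-1]) is not None:
        chain.append(head[chain[-1]])
    return chain


def tree_path(a: str, b: str, head: Dict[str, Optional[str]]) -> List[str]:
    """Nodes on the tree path from a to b, both endpoints included."""
    up_a, up_b = chain_to_root(a, head), chain_to_root(b, head)
    lca = next(n for n in up_a if n in up_b)  # lowest common ancestor
    return up_a[: up_a.index(lca) + 1] + up_b[: up_b.index(lca)][::-1]


def counterfactual_says_no(pair1: Tuple[str, str], pair2: Tuple[str, str],
                           t_head: Dict[str, Optional[str]]) -> bool:
    """Each pair is (H node, aligned T node). Fire if the T path joining the two
    aligned T nodes passes through a conditional/counterfactual cue."""
    path = tree_path(pair1[1], pair2[1], t_head)
    return any(node in COUNTERFACTUAL_CUES for node in path)


if __name__ == "__main__":
    # H "Blondlot discovered x-rays" aligns Blondlot->Blondlot and discover->discover.
    print(tree_path("Blondlot", "discover", T_HEAD))  # ['Blondlot', 'try', 'claim', 'discover']
    print(counterfactual_says_no(("Blondlot", "Blondlot"), ("discover", "discover"), T_HEAD))
```

The path from “Blondlot” to “discover” runs through “claim”, so the check says no.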

MENT: training feature weights

• “run2”: each syntactic heuristic match is treated as a hard “no” vote; the alignment threshold is set on the training data
• “run1”: weights are learned (using MaxEnt) for each syntactic and alignment heuristic, and for sub-components of these heuristics
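For binary labels, a MaxEnt classifier is equivalent to logistic regression. The sketch below trains one with plain batch gradient ascent so that each heuristic becomes a weighted feature, in the spirit of run1. Feature names such as `superlative_fired` and the toy training examples are hypothetical, and MENT's actual trainer and features may differ.

```python
import math
from typing import Dict, List, Tuple

# (feature name -> value, label) with label 1 = entailed ("yes"), 0 = not ("no").
Example = Tuple[Dict[str, float], int]


def prob_yes(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    """Binary MaxEnt = logistic regression: P(yes) = sigmoid(bias + w . f)."""
    z = weights.get("bias", 0.0) + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(data: List[Example], epochs: int = 300, lr: float = 0.5,
                 l2: float = 0.01) -> Dict[str, float]:
    """Batch gradient ascent on the L2-regularised conditional log-likelihood."""
    weights = {"bias": 0.0}
    for feats, _ in data:
        for f in feats:
            weights.setdefault(f, 0.0)
    for _ in range(epochs):
        # Regularise every weight except the bias.
        grad = {f: (0.0 if f == "bias" else -l2 * w) for f, w in weights.items()}
        for feats, label in data:
            err = label - prob_yes(weights, feats)
            grad["bias"] += err
            for f, v in feats.items():
                grad[f] += err * v
        for f in weights:
            weights[f] += lr * grad[f] / len(data)
    return weights


# Illustrative toy examples only, not RTE data.
TOY_DATA: List[Example] = [
    ({"superlative_fired": 1.0, "align_score": 0.9}, 0),
    ({"counterfactual_fired": 1.0, "align_score": 0.8}, 0),
    ({"align_score": 0.95}, 1),
    ({"align_score": 0.9}, 1),
    ({"align_score": 0.4}, 0),
]

if __name__ == "__main__":
    for name, w in sorted(train_maxent(TOY_DATA).items()):
        print(f"{name}: {w:+.3f}")
```

On the toy data both heuristic features receive negative weights: they push toward “no” by a learned amount instead of casting the hard vote used in run2.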

MENT: results

Accuracy (%) | Run1 (with feature weights) | Run2
Training (1717 sents) | 67.79 | 65.40
Dev (450 sents) | 66.22 | 63.77
RTE2 test (800 sents) | 60.25 | 58.50

RUN1 on RTE2 test (rows: truth; columns: Run1 output)
TRUTH | Yes | No
Yes | 268 | 132
No | 186 | 214

MENT Run1 says no 43.25% of the time
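The Run1 figures can be checked against its confusion matrix. In the sketch below rows are gold labels and columns are system output, the orientation under which the matrix reproduces both the 60.25% accuracy and the 43.25% “no” rate:

```python
from typing import Dict, Tuple

# matrix[truth][system] = count; rows are gold labels, columns are system output.
Confusion = Dict[str, Dict[str, int]]

RUN1: Confusion = {
    "yes": {"yes": 268, "no": 132},
    "no": {"yes": 186, "no": 214},
}


def summarize(m: Confusion) -> Tuple[float, float]:
    """Return (accuracy, fraction of pairs the system labels "no")."""
    total = sum(sum(row.values()) for row in m.values())
    correct = m["yes"]["yes"] + m["no"]["no"]
    says_no = m["yes"]["no"] + m["no"]["no"]
    return correct / total, says_no / total


if __name__ == "__main__":
    accuracy, no_rate = summarize(RUN1)
    print(f"accuracy {accuracy:.2%}, says no {no_rate:.2%}")  # accuracy 60.25%, says no 43.25%
```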

MENT variations – no thresholds

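• If heuristics apply, say no; else say yes
  – 56% accurate
  – System says no 35% of the time
• Say no, unless everything is aligned and no heuristics apply
  – 59.25% accurate
  – System says no 74.5% of the time

SYSTEM: if heuristics apply, say no; else say yes (rows: truth; columns: system output)
TRUTH | Yes | No
Yes | 284 | 116
No | 236 | 164

SYSTEM: say no unless everything is aligned and no heuristics apply (rows: truth; columns: system output)
TRUTH | Yes | No
Yes | 139 | 261
No | 65 | 335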

** Note: Run2 = if no heuristics apply and the alignment score is above a threshold trained on the training set, say yes; else say no. Accuracy: 58.50%
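The no-threshold variations and Run2 differ only in their decision rule. A sketch of the three rules, assuming a per-pair flag `fired` (did any heuristic apply?) and an alignment score in [0, 1]; reading “everything is aligned” as a score of 1.0 is our assumption, and the 0.8 threshold in the usage lines is arbitrary.

```python
def heuristics_only(fired: bool, align_score: float) -> str:
    """No-threshold variant 1: say no iff any heuristic applies.
    align_score is unused; it is kept so all rules share one signature."""
    return "no" if fired else "yes"


def fully_aligned_only(fired: bool, align_score: float) -> str:
    """No-threshold variant 2: say no unless every H node is aligned
    (assumed here to mean align_score == 1.0) and no heuristic applies."""
    return "yes" if (not fired and align_score >= 1.0) else "no"


def run2(fired: bool, align_score: float, threshold: float) -> str:
    """Run2: say yes only if no heuristic applies and the alignment score is
    above a threshold tuned on the training set; else say no."""
    return "yes" if (not fired and align_score > threshold) else "no"


if __name__ == "__main__":
    # One hypothetical pair: no heuristic fired, 90% of H nodes aligned.
    for rule in (heuristics_only, fully_aligned_only):
        print(rule.__name__, rule(False, 0.9))  # yes, then no
    print("run2", run2(False, 0.9, threshold=0.8))  # yes
```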

MENT variations – with threshold

• Run1: with learned alignment and syntactic heuristic weights, and the alignment threshold from training, say no if the weighted model favors no; else say yes
  – 60.25% accurate
  – System says no 43% of the time
• Say no, unless the alignment score is above an oracle threshold and no heuristics apply
  – 61.25% accurate
  – System says no 70% of the time

SYSTEM: oracle threshold (rows: truth; columns: system output)
TRUTH | Yes | No
Yes | 168 | 232
No | 75 | 325

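An oracle threshold is one chosen with the test labels themselves, so it bounds what this decision rule could reach rather than being a usable setting. A sketch of that search over made-up toy items (not RTE-2 data), using the rule "say yes only if no heuristic applies and the alignment score is above the threshold":

```python
from typing import List, Tuple

# (alignment score, did any heuristic fire?, gold label "yes"/"no")
Item = Tuple[float, bool, str]


def accuracy(items: List[Item], threshold: float) -> float:
    """Accuracy of: yes iff no heuristic fired and score > threshold."""
    correct = 0
    for score, fired, gold in items:
        predicted = "yes" if (not fired and score > threshold) else "no"
        correct += predicted == gold
    return correct / len(items)


def best_threshold(items: List[Item]) -> float:
    """Try -inf and every observed score as the cut-off; keep the most accurate.
    These candidates cover every distinct split because the rule uses '>'."""
    candidates = [float("-inf")] + sorted({score for score, _, _ in items})
    return max(candidates, key=lambda t: accuracy(items, t))


TOY_ITEMS: List[Item] = [
    (0.95, False, "yes"),
    (0.90, False, "yes"),
    (0.85, False, "no"),
    (0.70, False, "no"),
    (0.99, True, "no"),
]

if __name__ == "__main__":
    t = best_threshold(TOY_ITEMS)
    print(t, accuracy(TOY_ITEMS, t))  # 0.85 1.0
```

Running the same search on training data and then scoring the chosen threshold on test data is the non-oracle use; comparing the two accuracies is one way to see how much a threshold shifts between data sets.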

Lessons?

• Use syntactic heuristics and sub-components as features and apply discriminative training

• Thresholding for lexical similarity isn’t stable across data sets

• Error analysis …
  – bad parses (e.g., RTE2-test #550)

How far do you take syntactic heuristics?

Location: for a pair of aligned verb nodes, if an argument of the H verb is aligned to a node in T, say no unless that T node is the same argument of the aligned T verb (applied 7 times; 5 incorrect)

• Brandenburg Gate is one of Berlin's best known landmarks and is now regarded as one of the greatest symbols of German unity.

• Brandenburg Gate is in Berlin.
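A sketch of the location heuristic over simplified argument frames (`ArgFrame`, role to lemma), with alignment approximated by lemma equality; the frames for the Brandenburg Gate pair are hand-built assumptions. It fires on this pair even though T arguably does entail H, which is the kind of misfire behind the 5 incorrect applications reported above.

```python
from typing import Dict, Set

# role -> lemma, for one pair of aligned verbs (a simplified argument frame).
ArgFrame = Dict[str, str]


def location_says_no(h_args: ArgFrame, t_args: ArgFrame, t_lemmas: Set[str]) -> bool:
    """Fire if some argument of the H verb is aligned (here: same lemma) to a node
    in T that is not the same argument of the aligned T verb."""
    for role, lemma in h_args.items():
        if lemma in t_lemmas and t_args.get(role) != lemma:
            return True
    return False


# Brandenburg Gate pair: in T, "Berlin" is a possessor of "landmarks",
# not a location argument of the copula, so the heuristic fires.
T_LEMMAS = {"Brandenburg Gate", "be", "one", "Berlin", "landmark", "symbol", "unity"}
T_ARGS: ArgFrame = {"subj": "Brandenburg Gate", "pred": "one"}
H_ARGS: ArgFrame = {"subj": "Brandenburg Gate", "loc": "Berlin"}

if __name__ == "__main__":
    print(location_says_no(H_ARGS, T_ARGS, T_LEMMAS))  # True, i.e. "no"
```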

A great heuristic … but

Unaligned verb: if a subject and an object are both aligned but their verb is not, say no

• This heuristic was not used because of its poor performance, for example:

– Rodriguez told detectives he never touched the burning backpack, which was loaded with plastic pipes packed with gunpowder and BBs.

– The burning backpack contained plastic pipes packed with gunpowder and BBs.

• Need to learn paraphrase similarity for verbs – see the forthcoming NAACL-HLT paper.
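A sketch of the unaligned-verb heuristic over (subject, verb, object) triples. The triples are hand-simplified from the backpack example, and `verb_paraphrases` is a hypothetical stand-in for the learned verb paraphrase similarity the slide calls for:

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object) lemmas


def unaligned_verb_says_no(h: Triple, t_triples: Iterable[Triple],
                           verb_paraphrases: Optional[Dict[str, Set[str]]] = None) -> bool:
    """Fire if H's subject and object both align (same lemma) to a T subject/object
    pair whose verb does not align to H's verb."""
    paraphrases = verb_paraphrases or {}
    h_subj, h_verb, h_obj = h
    t_verbs = [v for s, v, o in t_triples if s == h_subj and o == h_obj]
    if not t_verbs:
        return False  # heuristic does not apply
    acceptable = {h_verb} | paraphrases.get(h_verb, set())
    return not any(v in acceptable for v in t_verbs)


# Backpack pair, simplifying "was loaded with plastic pipes" to (backpack, load, pipe).
T_TRIPLES = [("Rodriguez", "tell", "detective"), ("backpack", "load", "pipe")]
H_TRIPLE: Triple = ("backpack", "contain", "pipe")

if __name__ == "__main__":
    print(unaligned_verb_says_no(H_TRIPLE, T_TRIPLES))                         # True: fires
    print(unaligned_verb_says_no(H_TRIPLE, T_TRIPLES, {"contain": {"load"}}))  # False
```

With a paraphrase entry linking “contain” and “load”, the heuristic no longer fires on this pair.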

Directions and Plans

• MSR submission available at http://research.microsoft.com/~lucyv/
• Might it be possible to have access to all sites’ submissions?

• Need to learn paraphrase similarity for verbs

• More feature engineering

• Different graph-matching strategies to avoid brittleness of syntactic heuristics

• Find more data for training to build more stable systems

A plug for Pyramids

• Conservatives oppose any form of devolution.
  – The conservatives are opposed to devolution.
  – The UK’s Tory Prime Minister adamantly resisted calls for devolution of British rule.

• Scotts want self-rule
  – … as buoyed as most Scotts by North Ireland’s prospective self-rule
  – Wales is following Scotland, and moving towards a call for an elected assembly with devolved powers …

• A self-governing Wales would be part of the EU
  – … an independent Wales within the European community
  – … Wales could participate directly in forthcoming EC meetings …
  – … a fully self-governing Wales within the European Community.

SCU name, given by annotator

Candidate hypothesis? Candidate Text?
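The slide's questions suggest reusing Pyramid annotations as entailment data: the annotator's SCU name as a candidate hypothesis and each contributing span as a candidate Text. A minimal sketch of that pairing; the `SCU` record is a hypothetical structure, not the format of any particular Pyramid release.

```python
from typing import List, NamedTuple, Tuple


class SCU(NamedTuple):
    label: str               # SCU name, written by the annotator
    contributors: List[str]  # text spans that express the SCU


def candidate_pairs(scu: SCU) -> List[Tuple[str, str]]:
    """One candidate (Text, Hypothesis) pair per contributor, with the SCU name as H."""
    return [(text, scu.label) for text in scu.contributors]


DEVOLUTION = SCU(
    label="Conservatives oppose any form of devolution.",
    contributors=[
        "The conservatives are opposed to devolution.",
        "The UK’s Tory Prime Minister adamantly resisted calls for devolution of British rule.",
    ],
)

if __name__ == "__main__":
    for text, hypothesis in candidate_pairs(DEVOLUTION):
        print(f"T: {text}\nH: {hypothesis}\n")
```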
