
Bachelor Informatica

Contradiction detection between news articles

Kasper van Veen

June 8, 2016

Supervisor(s): Christof Monz (UvA)

Signed:

Informatica — Universiteit van Amsterdam


Abstract

This thesis tries to detect contradictions between two news articles in four phases. During these phases, dependency graphs are obtained, aligned with each other, and non co-referent sentences are filtered. The last phase applies logistic regression using five features. The experiment focuses on contradictions involving antonyms and negations. The experiments at the end show the results on contradictions found in the RTE datasets and between two news articles. From the results we can conclude that the chosen features work well, but are not enough to cover the whole RTE dataset.


Contents

1 Introduction
  1.1 Structure

2 Background
  2.1 Methodology
  2.2 What are contradictions

3 How do we detect contradictions?
  3.1 Find semantics and syntax of sentences
    3.1.1 Using spaCy for syntax analysis
    3.1.2 Dependency graphs
  3.2 Alignment between dependency graphs
  3.3 Filter non co-referent sentences
  3.4 Logistic Regression
  3.5 Features
    3.5.1 Antonyms feature
    3.5.2 Switching of object and subject feature
    3.5.3 Alignment feature
    3.5.4 Negation feature

4 Experiments

5 Conclusion and discussion


CHAPTER 1

Introduction

When the MH17 plane disaster occurred two years ago, the Western media immediately pointed their fingers at the Russians. The Russians, however, blamed the Ukrainian government. Instead of simply accepting and believing what the Western news said, I decided to start my own research to find out what both parties were saying. I found an article published by the BBC on the 14th of October 2015, which stated:

‘Mr Joustra said pro-Russian rebels were in charge of the area from where themissile that hit MH17 had been fired.’

while Pravda (one of the biggest Russian news sources) published an article on the 15th of October 2015 which stated:

‘Group representatives confirmed that the plane was shot down from the territorycontrolled by the official Kiev.’

It can easily be seen that two of the biggest news sources in the world were making totally different statements about the same subject. This is an interesting observation because it can be seen as propaganda by both parties. Although it is not possible to find out the truth, it is possible to show the statements in which the parties contradict each other. This is where the idea for this thesis came from: to build a program that is able to detect contradictions and thereby show the differences in opinion.

Stanford University did research on contradiction detection [3]. It uses four steps to determine whether sentence pairs from the RTE3 dataset are a contradiction or not. This thesis uses a similar approach to that of Stanford University, but with different tools and features, which will be discussed later in this thesis.

The research question of this thesis is: what kind of contradictions can I detect in two related news articles?

1.1 Structure

This thesis starts with a short introduction to the methodology, recent research and the definition of a contradiction. Chapter three goes further into the subject and describes the four steps by which computers are able to detect contradictions. Chapter four shows the results of the experiments using RTE datasets and a few contradictions found in news articles. Chapter five presents the conclusion and a discussion of future work.


CHAPTER 2

Background

2.1 Methodology

Instead of using the Stanford parser as a dependency parser, as Stanford University did, this experiment uses spaCy for faster and more accurate results. The WordNet database is used to acquire a large amount of antonyms and negations. This thesis will not only focus on the contradictions in the RTE (Recognizing Textual Entailment) datasets, it will also try to detect the contradictions found between different news articles about the MH17 disaster. The features that are used for the logistic regression are based on contradictions found in the news articles. These contradictions are mainly antonyms and negations.

We used the RTE1 dataset to verify whether the features and the logistic regression classifier are sufficient to detect contradictions.

2.2 What are contradictions

Before the experiment could start, the definition of a contradiction should be clear. “Contradictions occur when sentence A and sentence B are unlikely to be true at the same time.” [3] In terms of logical expressions this states: A ∧ ¬B or ¬A ∧ B. An important requirement for contradictions is that both sentences are about the same event (co-referent sentences). Contradictions occur in many different forms and levels of difficulty. Antonyms and negations are the easiest to recognize, followed by numerical differences. Next come factive and modal words, and the hardest are sentences which require world knowledge to be understood.

Examples are a good way to show what the differences are and how they are recognized. Antonyms are words that are opposites of each other, like big/small, rich/poor and young/old:

‘The people of Kenya are rich’ vs ‘The people of Kenya are poor’

Negations are words that are negations of each other, like did/didn’t, have/haven’t and could/couldn’t:

‘Frank committed the crime’ vs ‘Frank didn’t commit the crime’

Numerical differences occur when there is a difference between numbers:

‘Apple’s annual revenue was 50 million in 2016’ vs ‘In 2016, Apple’s annual revenue was 40 million’

A difference in the date on which an event occurred can also be seen as a contradiction:

‘Willem-Alexander became king of the Netherlands in 2013’ vs ‘Willem-Alexander became king of the Netherlands in 2010’

To detect other numerical differences the computer should be able to recognize words like ‘no’, ‘some’, ‘many’, ‘most’ and ‘all’ [6]. These words add extra meaning to the number next to them. An example:


‘More than 500 people attended the ceremony’ vs ‘700 people attended the ceremony’

The program would detect this as a contradiction, since 500 and 700 are different numbers. However, ‘more than 500’ is technically consistent with ‘700’ [1]. This does not hold in all cases: it depends on the range between the two numbers. ‘At least 200’ and ‘5000’ are too far apart from each other to be considered consistent, even though technically they do not conflict. It is up to the end user to determine this boundary. A minimal sketch of such a boundary check follows below.
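As an illustration of such a user-defined boundary, a hypothetical check could look like the sketch below; the function name and the default range are illustrative, not part of this thesis:

def numbers_consistent(lower_bound, reported, max_ratio=2.0):
    """Treat 'more than <lower_bound>' as consistent with a reported number
    when the report exceeds the bound but stays within a user-chosen range
    (here: at most max_ratio times the bound)."""
    return lower_bound < reported <= lower_bound * max_ratio

print(numbers_consistent(500, 700))   # True: 'more than 500' vs '700'
print(numbers_consistent(200, 5000))  # False: the numbers are too far apart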

Factive words add necessity or possibility to a verb:

‘The burglar managed to open the door’ vs ‘The burglar opened the door’

Modal words add modality to a verb:

‘He will go to work’ vs ‘He would go to work’

The last and hardest type of contradictions needs world knowledge to be understood. For the human eye it might be easy to recognize them as a contradiction, but for a computer it is difficult. An example:

‘Albert Einstein was in Austria’ vs ‘Albert Einstein was in Germany’

This is not a contradiction because both sentences can be true. Albert Einstein could have been in both places, just not at the same time.

‘Albert Einstein died in Austria’ vs ‘Albert Einstein died in Germany’

This is obviously a contradiction, because both sentences cannot both be true. Someone can only pass away in one place. ‘Died in’ should be seen as a function of a person’s unique place of death [7]. For a program that is able to detect contradictions it is hard to know all of these functions, so an idea for further research is to make a dataset of these functions. Research showed that only a few contradictions can be detected using syntactic matching; the rest depends on having world knowledge and understanding the semantic structure of the sentences [2]. A relatively easy part of world knowledge is location relations. It is common sense that Amsterdam is the capital of the Netherlands. A computer, however, often does not have that knowledge.

‘Today, the mayor returned to the capital of the Netherlands, Amsterdam.’ vs ‘Today, the mayor returned to Rotterdam, the capital of the Netherlands.’

The program would detect a contradiction here, because the mayor returned to two different cities on the same day. Still, this is not a contradiction, because the second sentence is false: Rotterdam is not the capital of the Netherlands. Holonyms could be used to construct a dataset containing world knowledge [1]. Holonymy is a semantic relationship between terms: ‘house’ is a holonym of ‘door’ and ‘window’. In the case above, ‘capital of the Netherlands’ stands in such a relationship to ‘Amsterdam’. If implemented correctly in a dataset, the program should see the second sentence as false and thereby not detect a contradiction.

There are many different sorts of contradictions, which makes them hard to detect. Often the sentence pairs are not as similar as the sentences above and thus require knowledge of their syntactic structure. This thesis will mainly focus on antonyms and negations to test whether the contradiction detection program works on a specific level.


CHAPTER 3

How do we detect contradictions?

Two sentences are needed to determine whether they contradict each other, so we name them sentence A and sentence B. For contradiction detection there are various steps that need to be taken.

The first step is to find the syntactic structure of each sentence. The syntactic structure of a sentence shows the words that form the sentence and their properties, such as verbs and adjuncts. spaCy will be used as a dependency parser to achieve this first step. Next, the two graphs obtained from spaCy are aligned with each other to acquire a score which determines whether sentence A and B have the possibility of being a contradiction. This score is based on the occurrence of antonyms, negations and other words that might lead to a contradiction. The third step is to filter the non co-referent sentences, which are sentences that are not about the same event. The final step is to apply logistic regression, which determines whether the sentences are true contradictions of each other. For this experiment, a couple of already existing datasets are used. The RTE datasets from Stanford University contain 800 pairs of sentences with the possibility of being a contradiction. WordNet is a dataset that contains many of the English antonyms and negations. These datasets are used as tools to complete this experiment.

3.1 Find semantics and syntax of sentences

Computers and humans differ in many ways. When a human hears a sentence, prior knowledge is used to understand it. The person does not only use knowledge of grammar, but also understands the words and their meanings. To detect contradictions, a computer should also be able to understand the context of the sentence and the meaning of the words. To get a better understanding of the content of a sentence, its semantics should be studied.

Semantics is the study of the meaning of linguistic expressions. It shows how words and phrases are related to their denotation. The meaning of a word often depends on the whole sentence. A good example to show this:

‘Spears have a very sharp point’ vs ‘You should not point at people’

Words like ‘point’ are called homonyms: they are written the same but can have different meanings. In the first sentence, ‘point’ is a noun, while in the second sentence ‘point’ is a verb. The syntax is the grammar, rules and principles that give the structure of a language. The grammar and the meaning of a sentence are closely related. Sometimes the grammar can be right but the sentence is meaningless, and vice versa.

‘The helpless apple saw anger’ vs ‘The young man took in the shop some cigarettes’

The first sentence doesn’t make any sense, although its syntactic structure is correct. The second sentence is syntactically incorrect, but the reader can still understand it.

3.1.1 Using spaCy for syntax analysis

spaCy was used to get the syntactic structure of a sentence. spaCy is a dependency parser: it reads every word and links the words together based on their syntactic structure.


It uses tokenization to split the sentence into separate words and numbers, with whitespace characters separating the tokens. Although spaCy uses 1.5 GB of RAM, it was quickly chosen over other dependency parsers such as the Stanford parser: spaCy is a fast and very accurate parser written in Python, which made it easier to combine with the other components of the contradiction detection program. It can also recognize homonyms based on the rest of the sentence, as shown in figures 3.1 and 3.2, where the sentence pair ‘Spears have a very sharp point’ and ‘You should not point at people’ is parsed. The figures are images taken from the CSS version of spaCy’s dependency parser [8].

Figure 3.1: point as a noun

Figure 3.2: point as a verb

It is clear that ‘point’ in the first sentence is seen as a noun while ‘point’ in the second sentence is a verb. Names of persons, companies, countries and other named entities are also recognized, which made spaCy a suitable tool to start the experiment. The meaning of the VERB and NOUN tags is obvious, but the other tags are important as well, so the table below gives more information about the most common tags [10].


VERB    verbs (all tenses and modes)
NOUN    nouns (common and proper)
PRON    pronouns
PROPN   proper nouns
AUX     auxiliary verbs
ADJ     adjectives
ADV     adverbs
ADP     adpositions (prepositions and postpositions)
CONJ    conjunctions
DET     determiners
INTJ    interjections
NUM     cardinal numbers
PRT     particles or other function words
PUNCT   punctuation
SCONJ   subordinating conjunctions
SYM     symbols
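A minimal sketch of obtaining these tags through spaCy’s API is shown below. The exact model and code used in this thesis are not listed here, so the model name and the printed attributes are assumptions:

import spacy

nlp = spacy.load("en_core_web_sm")  # an English model is assumed

doc = nlp("Spears have a very sharp point")
for token in doc:
    # coarse-grained POS, fine-grained tag, dependency label and head word
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text)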

For this experiment it is not enough to only use the CSS version of spaCy, so the API is used to obtain the graphs. The next pressing issue that arose during this experiment is syntactic ambiguity. It occurs when the specific word order is not enough to fully understand the sentence. A famous example comes from a quote by Groucho Marx:

‘One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.’

This sentence can be interpreted in two different ways: either I shot the elephant while I was wearing my pyjamas, or I shot the elephant, who was wearing my pyjamas. When this sentence is given to spaCy’s dependency parser we get the result seen in figure 3.3.

Figure 3.3: Syntactic ambiguity

This means that spaCy interprets this sentence as: I shot the elephant while I was wearing my pyjamas. ‘Shot’ attaches to ‘in’, which attaches to ‘pyjamas’. In the other interpretation, ‘elephant’ would have attached to ‘pyjamas’. Although syntactic ambiguities don’t occur that often, they are something to take into consideration for this experiment. So far this experiment is only able to detect one interpretation and might therefore miss some contradictions. When we transform the sentence above into a more realistic version we get:

‘The burglar shot someone in his pyjamas’

which translates into:

‘The burglar shot someone, while the burglar was wearing his pyjamas’

Now a sentence that might look like a contradiction:

‘Someone was not shot wearing his pyjamas’

This might look like a contradiction, but it is not about the same event: nobody was shot while wearing pyjamas; instead, somebody shot someone while wearing pyjamas.

So after both sentences are parsed, they should get their corresponding dependency graphs.


3.1.2 Dependency graphs

A dependency graph is a graph which represents the dependencies of various objects (in this case, words) on each other. Each graph contains as much information as possible about the semantic structure of the sentence. Each sentence is split up into words (nodes) and each edge shows the grammatical relationship between the words. In the program, three different annotations are shown to the user: POS, tag and NER. From the official spaCy documentation: POS represents the word class of a token; it is a coarse-grained, less detailed tag. Tag is different: it is fine-grained and more detailed than POS, and represents not only the word class but also some standard morphological information about the token. The fine-grained tags are used by the syntactic parser because they are language and treebank dependent. The tagger predicts these fine-grained tags, and a mapping table is used to reduce them to the coarse-grained .pos tags [9]. Finally, NER stands for named-entity recognition, which covers names of persons, places, companies and other known entities.

The program allocates each word to a single node. However, some words are auxiliary verbs, which are verbs used to form the tenses of other verbs. Examples are ‘were lost’ and ‘must go’. An auxiliary verb is attached to a main verb, which carries the semantic content of the sentence. Another example: in ‘I did not complete my homework’, the main verb is ‘complete’ while ‘did not’ is used to support it. The program treats these words as one unit and allocates them to a single node. The output of this graph is:

I did not complete my homework

['PRP', 'VB', 'PRP$', 'NN']

['PRON', 'VERB', 'ADJ', 'NOUN']

['', '', '', '', '']
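The merging code itself is not shown in this thesis, so the sketch below is an assumption: it groups tokens that spaCy labels as auxiliaries or negations ('aux', 'auxpass', 'neg') with their head verb, so that 'did not complete' ends up in a single node.

import spacy

nlp = spacy.load("en_core_web_sm")  # an English model is assumed

def sentence_nodes(sentence):
    """Group auxiliary verbs and negations with their head verb."""
    doc = nlp(sentence)
    groups = {}  # index of the node's main token -> words in that node
    for token in doc:
        if token.dep_ in ("aux", "auxpass", "neg") and token.head.pos_ == "VERB":
            groups.setdefault(token.head.i, []).append(token.text)
        else:
            groups.setdefault(token.i, []).append(token.text)
    return [" ".join(groups[i]) for i in sorted(groups)]

print(sentence_nodes("I did not complete my homework"))
# roughly: ['I', 'did not complete', 'my', 'homework']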

The next step is to align the two graphs with each other to find the similarities and differences.

3.2 Alignment between dependency graphs

After each sentence is transformed into a dependency graph, the graphs are aligned with each other. Alignment between graphs is the concept of mapping two graphs onto each other, to make them as similar as possible. For contradiction detection it is used to map words (nodes) from sentence A to similar words in sentence B. If a word does not have any similar words, it is ignored.

The idea is to obtain a specific score based on the alignment. Synonyms and antonyms will get the highest score, while words that have no similarity (irrelevant words) will get the lowest score.

The similarity score is based on the cosine metric. This is a similarity measurement between two vectors that evaluates the cosine of the angle between them. The spaCy toolkit uses the word2vec model vectors produced by Levy et al. [5], and those vectors are used for the cosine metric.
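For reference, the cosine similarity of two word vectors is their dot product divided by the product of their lengths. A small NumPy sketch (not the code of this thesis):

import numpy as np

def cosine_similarity(u, v):
    # cos(angle) = (u . v) / (||u|| * ||v||)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))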

If two nodes are identical, they get a score of 1.0000 (100%). Nodes also get a score of 1.0000 if one node is a substring of the other. To avoid matching only part of a word, a space is prepended and appended to each node before the substring check. This makes sure that nodes are not matched partially; for example, the words ‘automobile’ and ‘mobile’ should not match.

Experiments showed the importance of merging named entities. In the sentences

‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose 46 percent’

‘Mitsubishi Motors Corp.’ and ‘Mitsubishi’ should be seen as the same corporation. Originally, the program parsed ‘Mitsubishi Motors Corp.’ into three different tokens and compared those to ‘Mitsubishi’. Part of the output of finding this similarity can be seen below.

similarity of Mitsubishi (mitsubishi) and Mitsubishi (mitsubishi) = 1.0000

similarity of Mitsubishi (mitsubishi) and Motors (motors) = 0.5534

similarity of Mitsubishi (mitsubishi) and Corp. (corp.) = 0.0000

similarity of Mitsubishi (mitsubishi) and sales (sale) = 0.2112

similarity of Mitsubishi (mitsubishi) and percent (percent) = 0.2821

similarity of Mitsubishi (mitsubishi) is the highest (1.0000)


Merging named entities only applies to PROPN (proper nouns), thus names of persons, companies and places. Merging is limited to PROPN to prevent NUM-PERCENT pairs from being merged. Consider the example below:

‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose more than 40 percent’

If merging were not limited to PROPN, the NUM-PERCENT pair would be merged. This would result in a low entailment score, because ‘40 percent’ and ‘46 percent’ would be seen as totally different words with no similarity:

similarity of 40 percent (40 percent) and 46 percent (46 percent) = 0.0000

However, when merging named entities is turned off for NUM (cardinal numbers), the total alignment score will be much higher:

similarity of 40 (40) and 46 (46) = 0.7749

similarity of 46 (46) is the highest (0.7749)

--- percent ---

similarity of percent (percent) and Mitsubishi Motors Corp. (Mitsubishi Motors Corp.) = 0.0000

similarity of percent (percent) and sales (sale) = 0.3118

similarity of percent (percent) and percent (percent) = 1.0000

similarity of percent (percent) is the highest (1.0000)

Mathematically speaking: f(‘40 percent’, ‘46 percent’) < f(‘40’, ‘46’) + f(‘percent’, ‘percent’)

where f is the similarity function. Computing this inequality results in 0 < 0.7749 + 1.0000. The words which have the best similarity, and thus the highest score, are used to determine the total alignment score.
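A minimal sketch of such an alignment score is shown below. It assumes a spaCy model with word vectors, uses spaCy's built-in cosine similarity between tokens, and only matches content words; the names, the set of content tags and the handling of merged entities are simplifications, not the actual code of this thesis:

import spacy

nlp = spacy.load("en_core_web_md")  # a model with word vectors is assumed

CONTENT_POS = {"PROPN", "NOUN", "VERB", "NUM"}

def node_similarity(a, b):
    """Identical or substring nodes score 1.0 (with padded spaces, so that
    'mobile' does not match inside 'automobile'); otherwise the cosine
    similarity of the word vectors is used."""
    pa, pb = " " + a.text.lower() + " ", " " + b.text.lower() + " "
    if pa in pb or pb in pa:
        return 1.0
    return a.similarity(b)

def alignment_score(sentence_a, sentence_b):
    """Sum, over the content words of sentence A, of the best match in B."""
    doc_a, doc_b = nlp(sentence_a), nlp(sentence_b)
    score = 0.0
    for tok_a in doc_a:
        if tok_a.pos_ not in CONTENT_POS:
            continue
        best = max((node_similarity(tok_a, tok_b)
                    for tok_b in doc_b if tok_b.pos_ in CONTENT_POS),
                   default=0.0)
        score += best
    return score

print(alignment_score("Mitsubishi Motors Corp. sales fell 46 percent",
                      "Mitsubishi sales rose 46 percent"))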

3.3 Filter non co-referent sentences

Some sentence pairs might obtain a high alignment score but not be a contradiction at all. An important requirement of contradictions is that both sentences are about the same event, called co-reference. A good example to illustrate this is:

‘The palace of the Dutch royal family is in Amsterdam’ vs ‘The palace of the British royal family is in London’

This pair will get a very high alignment score because most words are the same. Amsterdam and London are different places, and therefore the program would see this pair as a contradiction, because the palace of a royal family can only be in one country.

similarity             alignment score
palace → palace        1.0000
family → family        1.0000
is → is                1.0000
Amsterdam → London     0.6464

However, this is about two different royal families, so it is not a contradiction at all. Here one can see the importance of filtering non co-referent sentences. This thesis will only try to detect contradictions from antonyms and negations in a given text. Contradictions based on world knowledge are very hard to detect without a reliable database containing world knowledge. Therefore this part is not used further.

3.4 Logistic Regression

When the alignment between two graphs has resulted in a high score, logistic regression is the final step in this experiment. It determines whether two sentences are entailed with each other. Entailment occurs when the truth of sentence A guarantees the truth of sentence B: if A is true, then B is true (A |= B). If there is no entailment at this stage, sentence A and B have a high probability of being a contradiction. An example of entailment:

‘The child broke the glass’ vs ‘The glass is broken’


In this example it can be seen that if the first sentence is true, the second sentence is true as well: if the child broke the glass, the glass is broken.

Logistic regression is a mathematical model which determines the probability that a specific event will occur. It uses previously given data to predict the outcome of new data. The goal is to let a computer make decisions without being programmed for the task explicitly. The model learns from a training set, which consists of a matrix X and a vector y. X contains all the features, while the vector y contains the decisions. In matrix X, each column represents a single feature and each row contains the feature values of one training example.

A mathematical approach for logistic regression starts with the following formula:

log(p / (1 − p)) = β0 + β1x1 + β2x2 + … + βnxn

where x1, x2, …, xn are the elements of the feature vector. The bias term β0 is used to translate the decision boundary so that it does not have to intersect the origin. Since there is no feature x0, the bias term is multiplied by a constant 1 (1 · β0 = β0); every feature vector therefore effectively starts with a constant 1, called the bias term. The remaining β1, β2, …, βn are the weights that correspond to the feature vector elements.

The probability p denotes the chance that the event happens, which is always between 0 and 1. The log-odds of the dependent variable are log(p / (1 − p)). So the odds for Y are:

P(Y = 1) / P(Y = 0) = P(Y = 1) / (1 − P(Y = 1))

In the case of this experiment, the data that is used is based on features which, used together, determine whether two sentences entail each other. Stanford University tried a similar experiment; it used 28 features to recognize entailment using specific patterns [6]. The features used in this thesis mainly focus on antonyms, negations, switching of subjects and objects, alignment, and possibly more in the future. If a pair of sentences contains antonyms or many negations, the probability of a contradiction is high and there is no entailment.

This experiment uses a three-class classifier to determine entailment. The three classes are: ‘yes’, ‘no’ and ‘unknown’.

A famous three-class classification problem is the iris dataset. The iris dataset contains the measurements of four attributes of 150 iris flowers from three different types of irises: setosa, virginica and versicolor [4]. The four attributes are sepal length, sepal width, petal length and petal width, all in cm.

Some data points in figure 3.5 are incorrectly classified because the decision boundary lies in the center of a cluster. The two features are not distinctive enough to separate the two classes. The conclusion is that it is very important to have distinctive features in order to predict the right class.
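A minimal sketch of this classification step with scikit-learn is shown below. The thesis does not state which implementation was used; the training vectors and labels here are hypothetical and only follow the layout of the five features used in the experiments (alignment above the threshold, antonyms, subject/object switch, and the negation counts of both sentences), in an assumed order:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per sentence pair, five features.
X_train = np.array([
    [0, 0, 0, 0, 0],   # no contradiction signals
    [1, 1, 0, 0, 0],   # high alignment and an antonym
    [1, 0, 1, 1, 1],   # subject/object switch and negations
    [0, 0, 0, 1, 0],   # a single negation
])
y_train = np.array([1, 0, 0, 0])  # 0 = no entailment, 1 = yes, 2 = unknown

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict entailment for a new (hypothetical) feature vector.
print(clf.predict([[1, 1, 0, 0, 0]]))  # expected: 0 (no entailment)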

3.5 Features

For this experiment we chose a few features to detect contradictions:

• Amount of antonyms

• Switching of object and subject

• Alignment

• Amount of negations

3.5.1 Antonyms feature

This feature searches for antonyms in a sentence pair. When antonyms occur in the two sentences, there is often no entailment and thus a high chance of contradiction. An example that contains an antonym is:

‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose 46 percent’


Figure 3.4: An example of a 2D plot for contradiction features. The red dots mean a low probability of contradiction, while the blue squares show a high probability of contradiction. Feature one could be the number of negations and feature two the number of antonyms. A line separates non-entailment from entailment. All sentence pairs above the line have no entailment, and thus a high chance of being a contradiction; everything below the line has high entailment and therefore a low chance of being a contradiction. This line, also called the decision boundary, is obtained after fitting the model on training data.

‘Fell’ and ‘rose’ are not direct antonyms like ‘good’ and ‘bad’. However, synonyms of ‘fell’ and ‘rose’ are antonyms. The program first checks if word A and word B are antonyms; if not, it checks if word A is an antonym of a synonym of word B, or vice versa. To improve the detection of antonyms, the lemma of a word is used. A lemma is the canonical form of a word: in this case the lemma of ‘fell’ is ‘fall’ and the lemma of ‘rose’ is ‘rise’.

The word ‘fall’ can have different meanings. It can be a synonym of ‘autumn’ or a synonym of ‘descend’. In this example the program compares ‘rise’ (meaning: to go up) with a synonym of ‘fall’: ‘descend’ (meaning: to go down). The output of the program shows which synonyms are used:

### are_antonyms(fell, rose):

lemma1: Lemma('fall.v.01.fall')

lemma2: Lemma('rise.v.01.rise')

===========

synonyms: {Lemma('decrease.v.01.decrease'),

Lemma('descend.v.01.come_down'), Lemma('precipitate.v.03.precipitate'),

Lemma('fall.v.32.settle'), Lemma('decrease.v.01.lessen'),

Lemma('hang.v.05.flow'), Lemma('fall.v.21.return'),

Lemma('fall.v.20.light'), Lemma('fall.v.04.come'),

Lemma('decrease.v.01.diminish'), Lemma('fall.v.23.fall_down'),

Lemma('fall.v.01.fall'), Lemma('hang.v.05.hang'),

Lemma('fall.v.08.shine'), Lemma('fall.v.21.pass'),

Lemma('descend.v.01.go_down'), Lemma('fall.v.21.devolve'),

Lemma('accrue.v.02.accrue'), Lemma('fall.v.08.strike'),

Lemma('descend.v.01.descend')}


Figure 3.5: Two features of the iris dataset. This dataset is used to classify each type of iris based on the given measurements. There is a cluster in the blue area and a second cluster in the brown/red area. The blue one contains Iris setosa, while the other two types of flowers are grouped together in the second cluster.

antonyms: [Lemma('descend.v.01.fall')]

fell rose => True

The string of each lemma consists of four parts, separated by dots. The first part is a synonym, followed by a letter which indicates whether it is a verb (v), noun (n), adjective (a), adjective satellite (s) or adverb (r). Next is a two-digit number which identifies the sense (synset) of the word. The last part is the lemma name itself. The function are_antonyms returns either ‘true’ or ‘false’ to indicate whether the two words are antonyms of each other.
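A sketch of this check using NLTK's WordNet interface is shown below. The thesis uses the WordNet database but does not name the exact library, so the NLTK calls and the function layout are assumptions:

from nltk.corpus import wordnet as wn

def lemma_names(word, pos):
    """All lemma names in the synsets of a word (its synonyms)."""
    return {l.name() for s in wn.synsets(word, pos=pos) for l in s.lemmas()}

def antonym_names(word, pos):
    """All antonym lemma names reachable from the synsets of a word."""
    return {a.name() for s in wn.synsets(word, pos=pos)
            for l in s.lemmas() for a in l.antonyms()}

def are_antonyms(word_a, word_b, pos=wn.VERB):
    """True if the words are antonyms directly or via one of their synonyms."""
    return bool(antonym_names(word_a, pos) & lemma_names(word_b, pos)) or \
           bool(antonym_names(word_b, pos) & lemma_names(word_a, pos))

print(are_antonyms("fall", "rise"))  # True: 'fall' (descend) is an antonym of 'rise'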

3.5.2 Switching of object and subject feature

In some cases the subject and object are switched between the two sentences. This feature detects when an object becomes a subject and vice versa. If it detects a switch, there is no entailment and thus a higher chance of a contradiction. Since the filtering of non co-referent sentences is not implemented, it cannot be said with certainty that the sentences contradict each other, because they might be about different events.

‘CD Technologies announced that it has closed the acquisition of Datel, Inc.’ vs‘Datel acquired CD Technologies’

In this example, ‘CD Technologies’ was the subject in the first sentence, but the object in the second sentence. So in this case the sentences contradict each other.
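A sketch of how such a switch could be detected from spaCy's dependency labels is shown below; the chosen labels and the lemma-based comparison are assumptions rather than the actual code of this thesis:

import spacy

nlp = spacy.load("en_core_web_sm")  # an English model is assumed

def subject_object_switched(doc_a, doc_b):
    """True if a word that is a subject in one sentence appears as an
    object in the other sentence (or vice versa)."""
    def roles(doc, labels):
        return {t.lemma_.lower() for t in doc if t.dep_ in labels}

    subjects = ("nsubj", "nsubjpass")
    objects = ("dobj", "pobj", "iobj")
    return bool(roles(doc_a, subjects) & roles(doc_b, objects)) or \
           bool(roles(doc_a, objects) & roles(doc_b, subjects))

print(subject_object_switched(
    nlp("CD Technologies announced that it has closed the acquisition of Datel, Inc."),
    nlp("Datel acquired CD Technologies")))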

3.5.3 Alignment feature

This feature uses the alignment score to determine whether a pair of sentences contradict each other. When the alignment score is high, it is possible to predict whether the sentences entail each other. If the alignment score is low, there is a high chance that entailment is unknown. Sentence pairs need an alignment score higher than 4; this threshold was chosen empirically.

3.5.4 Negation feature

Negations are words like ‘not’ (since ‘didn’t’ and ‘haven’t’ parse as two nodes). The first negation feature counts the number of negations in the first graph, while the second counts the negations in the second graph. In this way the classifier can recognize patterns regarding negations.
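A small sketch of counting these negations with spaCy's 'neg' dependency label (an assumption about the implementation):

def negation_count(doc):
    """Count tokens attached with the 'neg' dependency (e.g. 'not', "n't")."""
    return sum(1 for token in doc if token.dep_ == "neg")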


CHAPTER 4

Experiments

In this experiment, various RTE datasets are used to detect entailment. They contain the following numbers of pairs [3]:

Dataset      # of contradictions   # of total pairs
RTE1 dev1    48                    287
RTE1 dev2    55                    280
RTE1 test    149                   800
RTE2 dev     111                   800
RTE3 dev     80                    800
RTE3 test    72                    800

These datasets contain many sentence pairs divided into three classes based on entailment: ‘yes’, ‘no’ and ‘unknown’. Unknown entailment is often the result of non co-referent sentences. An example of a non co-referent sentence pair found in the RTE1 dev2 3ways dataset is:

‘The Irish Sales, Services and Marketing Operation of Microsoft was established in 1991’ vs ‘Microsoft was established in 1991’

The first sentence is about a specific department of Microsoft, the second sentence is about Microsoft itself. Although they are similar, they are not about the same event and therefore entailment is unknown. To detect entailment, 10 pairs containing antonyms and negations were chosen from the RTE1 dev1 3ways and RTE1 dev2 3ways datasets, and 16 more were chosen from RTE1 test 3ways.

Dataset     # of pairs   # cont. antonyms   # cont. negations   % accurate
RTE1 dev    10           5                  2                   90%
RTE1 test   16           11                 5                   62.5%

The table above shows the total results. For the RTE1 dev datasets, 10 pairs were chosen, of which 5 contained antonyms, 2 contained negations and 2 contained none of them. Together with the features, the program achieved an accuracy of 90%. For the RTE1 test dataset, 16 pairs were chosen: 11 pairs containing antonyms and 5 pairs containing negations. Here we achieved an accuracy of 62.5%. Some other results showing the output of individual sentence pairs are shown below:

=== id=13 entails=1 length=None task=IR ===

T: iTunes software has seen strong sales in Europe.

H: Strong sales for iTunes in Europe.

alignment: 3.0

features: [ 0. 0. 0. 0. 0.]

This output shows that the alignment score is 3.0. There are three words (‘sales’, ‘iTunes’ and ‘Europe’) that are exactly the same. Exact matches are 100% the same and therefore get a score of 1.0000 each; the summation of these values gives the final alignment score. The word ‘strong’ is not a PROPN, NOUN or VERB and is therefore not considered for matching.

The next number is the antonym feature; since there are no antonyms in the two sentences, its value is 0. The last number is the object and subject switch feature.


Since there is no verb in the second sentence, there are no objects and subjects to switch. Therefore the object and subject switch feature has a value of 0.

=== id=148 entails=0 length=None task=RC ===

T: The Philippine Stock Exchange Composite Index rose 0.1 percent to 1573.65.

H: The Philippine Stock Exchange Composite Index dropped.

alignment: 5.50424838962

features: [ 1. 1. 0. 0. 0.]

In this example the alignment score is high, although it should be lower: ‘The Philippine Stock Exchange Composite Index’ should be seen as one organization and therefore only contribute an alignment score of 1.0000. The words ‘rose’ and ‘dropped’ are antonyms and therefore also get a high alignment score. Since there is an antonym, the second value is now 1. This means that there is no entailment and thus a high probability of a contradiction.

=== id=177 entails=0 length=None task=RC ===

T: Increased storage isn’t all Microsoft will be offering its Hotmail users

--they can also look forward to free anti-virus protection.

H: Microsoft won’t offer increased storage to its users.

alignment: 4.99999963253

n’t

n’t

features: [ 1. 0. 1. 1. 1.]

This case shows a high alignment score and a detected change of subject and object. In the first sentence ‘increased storage’ is the object and ‘Microsoft’ is the subject. This is a false positive, because ‘Microsoft’ is not switched to an object; only its position in the sentence changes. In both sentences, negations are detected. Although there is no antonym, there is no entailment, so the probability of a contradiction is high.

=== id=969 entails=0 length=None task=PP ===

T: Doug Lawrence bought the impressionist oil landscape by J. Ottis

Adams in the mid-1970s at a Fort Wayne antiques dealer.

H: Doug Lawrence sold the impressionist oil landscape by J. Ottis Adams

alignment: 4.78047287886

features: [ 1. 1. 0. 0. 0.]

This is a clear example of an antonym.

=== id=971 entails=0 length=None task=PP ===

T: Mitsubishi Motors Corp.’s new vehicle sales in the US fell 46 percent in June

H: Mitsubishi sales rose 46 percent

alignment: 4.48539948256

features: [ 1. 1. 0. 0. 0.]

This sentence pair, used earlier in this thesis, is also a clear example of an antonym.

=== DEV DATASET =============================

number of pairs in dev: 10

logreg score on dev: 0.9

pair ids: [13, 46, 52, 148, 177, 227, 956, 969, 971, 1950]

answers: [ 1. 2. 1. 0. 0. 0. 0. 0. 0. 0.]

predicted: [ 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]

This output shows that 10 sentence pairs were used for training purposes. When predicting the same sentence pairs using the trained model, a 90% mean accuracy is achieved. This is done to verify that the examples are learned correctly. The list of numbers after ‘pair ids’ contains the IDs in the RTE1 dataset. The list of numbers after ‘answers’ gives the entailment value in the dataset: 0 is no, 1 is yes and 2 is unknown.

In this case, the second predicted value does not match the answer because pairs 13 and 46 share the same feature vector [0, 0, 0, 0, 0], which causes a mismatch between the predicted value and the answer: pair 13 has value 1 and pair 46 has value 2.

=== id=1984 entails=0 length=None task=PP ===

T: Those accounts were not officially confirmed by the Uzbek or American governments.

H: The Uzbek or American governments confirmed those accounts.

alignment: 4.0


not

features: [ 0. 0. 1. 1. 0.]

=== id=1981 entails=0 length=None task=PP ===

T: The bombers had not managed to enter the embassy compounds.

H: The bombers entered the embassy compounds.

alignment: 4.00000032203

not

features: [ 1. 0. 0. 1. 0.]

These two sentence pairs both contain negations. The first sentence pair does not get a value of 1 for the alignment feature because its score is not larger than 4.

=== TEST DATASET ============================

number of pairs in test: 16

logreg score on test: 0.625

pair ids: [1370, 2167, 2019, 934, 1847, 1990, 1984, 1421, 1445, 1981, 1960, 2088,

1044, 986, 1078, 1077]

answers: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]

predicted: [ 0. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 1. 1. 0. 0. 0.]

It can be seen that not all pairs are predicted correctly; we get an accuracy of 62.5%. All of the incorrect pairs have the feature vector [0, 0, 0, 0, 0] and thus automatically get entailment 1. More features need to be implemented to get a higher accuracy.

When running the program on the sentence pair found in the BBC and Pravda articles, we get:

Mr joulstra said Pro-Russian rebels were in charge of the area from where

the missile that hit MH17 had been fired

Group representatives confirmed that the plane was shot down from the territory

controlled by the official Kiev

alignment: 2.31372259356

features: [ 0. 0. 0. 0. 0.]

predicted: [ 1.]

Unfortunately the program detects entailment in this sentence pair, while there should be no entailment. The problem behind the low alignment score is that ‘MH17’ and ‘plane’ are not seen as synonyms. This is considered world knowledge and therefore it should be put into a database manually. The synonyms that were found were ‘fired’ - ‘shot down’ and ‘area’ - ‘territory’.


CHAPTER 5

Conclusion and discussion

Contradiction detection is hard due to all the different kinds of contradictions and the difficulty of language in general. This thesis focused on recognizing antonyms and negations using four phases. The first phase uses spaCy as a dependency parser. The second phase aligns the two dependency graphs obtained from spaCy. The third phase, filtering non co-referent sentences, was not implemented because this experiment does not focus on contradictions requiring world knowledge. In the last phase, logistic regression is applied using five features. These features are based on antonyms, negations, the alignment score and the switching of objects and subjects.

In the experiments, we achieved an accuracy of 90% on the dev set and 62.5% on the test set. Antonyms and negations are detected, but when all of the features have value 0, the predictions are wrong. When using the program to detect entailment in the sentence pair found regarding MH17, the result is unfortunately not accurate. A database containing world knowledge should be used; for example, ‘MH17’ and ‘plane’ should be seen as synonyms.

In the future, this program could be extended with more features to detect a wider range of contradictions. Detecting numerical differences could be achieved by adding words like ‘no’, ‘some’, ‘many’, ‘most’ and ‘all’ to a database containing numbers. If implemented correctly, it should identify ‘more than 500’ as consistent with ‘700’.

To detect contradictions involving world knowledge, another database should be incorporated. This database should answer queries related to specific events. For example, ‘born in’ has to be related to one specific place, since a person can only be born in one place.

Sentence pairs containing geographical places should make use of holonyms. These are words that have a semantic relationship with other words; for example, ‘house’ is a holonym of ‘door’ and ‘window’, and ‘capital of the Netherlands’ relates in this way to ‘Amsterdam’. This database should therefore be expanded with places and their relationships to other places.

With these kinds of databases, containing information about numbers, specific functions and holonyms, one should be able to detect a larger number of contradictions.


Bibliography

[1] Daniel Cer. Aligning semantic graphs for textual inference and machine reading.

[2] Ido Dagan, Bill Dolan, Bernardo Magnini, and Dan Roth. Recognizing textual entailment: Rational, evaluation and approaches – erratum. Natural Language Engineering, 16(01):105–105, 2010.

[3] Marie-Catherine De Marneffe, Anna N. Rafferty, and Christopher D. Manning. Finding contradictions in text. In ACL, volume 8, pages 1039–1047, 2008.

[4] Ravindra Koggalage and Saman Halgamuge. Reducing the number of training samples for fast support vector machine classification. Neural Information Processing – Letters and Reviews, 2(3):57–65, 2004.

[5] Omer Levy and Yoav Goldberg. Dependency-based word embeddings. In ACL (2), pages 302–308, 2014.

[6] Bill MacCartney, Trond Grenager, Marie-Catherine de Marneffe, Daniel Cer, and Christopher D. Manning. Learning to recognize features of valid textual entailments. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 41–48. Association for Computational Linguistics, 2006.

[7] Alan Ritter, Doug Downey, Stephen Soderland, and Oren Etzioni. It’s a contradiction—no, it’s not: a case study using functional relations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 11–20. Association for Computational Linguistics, 2008.

[8] spaCy. spaCy CSS demo, 2015. [Online; accessed 15-May-2016].

[9] spaCy. spaCy documentation, 2015. [Online; accessed 15-May-2016].

[10] Universal Dependencies. Universal POS tags, 2014. [Online; accessed 26-May-2016].
