What the brain can tell us about language: evidence from machine learning, WordNet and latent semantic analysis

Colleen Crangle (2), Marcos Perreau-Guimaraes (1), Patrick Suppes (1)

(1) Center for the Study of Language and Information, Stanford University, California, USA
(2) US-UK Fulbright Fellow 2013, Department of Computing and Communications, Lancaster University, UK; Converspeech LLC, Palo Alto, California, USA

Earlier version presented at the 18th Annual Cognitive Neuroscience Meeting, April 2-5, 2011, San Francisco, California

© Colleen E Crangle 2013


Electroencephalography (EEG) measures the electric potential generated by the synchronous activity of thousands or millions of neurons that have similar spatial orientation, allowing brain wave activity to be detected. EEG has very high temporal resolution, on the order of milliseconds, with a sampling rate of 500 to 1000 Hz common in neuro-cognitive studies, that is, 500 to 1000 readings a second.

[Figure: fifteen seconds of EEG data. From http://sccn.ucsd.edu]

15-22-128 channels of information

Simple Experiment

Stimulus words: one, two, three, four, five, six, seven, eight, nine, ten, left, right, yes, no

Each word is repeated visually or auditorily many times while EEG recordings are made.

Data

Extract segments of brain waveforms corresponding to the time at which each individual word was heard (or read).

Question

Can we tell which brain data sample is associated with which word? That is, can we train a classifier to correctly classify the brain data samples?
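The segment-extraction step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the procedure used in the study: the function name `extract_epochs`, the one-second window, and the toy recording are all hypothetical, and a real pipeline would also filter and baseline-correct the signal.

```python
def extract_epochs(recording, onsets, labels, sampling_rate_hz, window_s):
    """Cut one fixed-length segment per word onset out of a continuous recording.

    recording: list of channels, each a list of voltage samples.
    onsets: word onset times in seconds; labels: the word presented at each onset.
    """
    length = int(round(window_s * sampling_rate_hz))
    epochs = []
    for onset, label in zip(onsets, labels):
        start = int(round(onset * sampling_rate_hz))
        stop = start + length
        if stop > len(recording[0]):
            continue  # skip onsets whose window runs past the end of the recording
        segment = [channel[start:stop] for channel in recording]
        epochs.append((label, segment))
    return epochs


# Toy data (hypothetical values): 2 channels, 4 seconds sampled at 500 Hz.
rate = 500
recording = [[float(i) for i in range(4 * rate)], [0.0] * (4 * rate)]
epochs = extract_epochs(recording, onsets=[0.5, 2.0], labels=["one", "left"],
                        sampling_rate_hz=rate, window_s=1.0)
```

Each entry of `epochs` pairs a word with its multi-channel segment, which is the form of labeled sample a classifier would be trained on.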

Can we train a classifier of EEG data so that we can predict which word, sentence, or word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words

1998: Brain-wave recognition of sentences


Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language


Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences

2005: Recognition of Words from the EEG Laplacian

2006: Multichannel classification of single EEG trials with independent component analysis

2007: Single-trial classification of MEG recordings


Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names


Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true and half false; half were positive and half negative.

The capital of Italy is Paris. (F)

Paris is not east of Berlin. (T)

Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}

Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}

W ∈ {the capital, the largest city}

Z ∈ {north, south, east, west}

Sentence forms:

W of Y is [not] X

X is [not] W of Y

X is [not] Z of X

Y is [not] Z of Y
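The sentence forms above can be expanded mechanically. The sketch below enumerates every sentence the four forms allow, assuming that the two cities (or two countries) in a directional sentence are distinct; that restriction and the function name `generate_sentences` are assumptions. The experiment used 48 sentences, and how those were selected is not stated here.

```python
from itertools import permutations, product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]                 # X
COUNTRIES = ["France", "Germany", "Italy", "Poland",
             "Russia", "Austria", "Greece", "Spain"]              # Y
ROLES = ["the capital", "the largest city"]                       # W
DIRECTIONS = ["north", "south", "east", "west"]                   # Z
NEGATIONS = ["", "not "]                                          # the optional [not]


def generate_sentences():
    """Enumerate all sentences licensed by the four sentence forms."""
    sentences = []
    for w, y, neg, x in product(ROLES, COUNTRIES, NEGATIONS, CITIES):
        sentences.append(f"{w} of {y} is {neg}{x}")   # W of Y is [not] X
        sentences.append(f"{x} is {neg}{w} of {y}")   # X is [not] W of Y
    for places in (CITIES, COUNTRIES):
        # Assumption: the two places in a directional sentence differ.
        for (a, b), neg, z in product(permutations(places, 2), NEGATIONS, DIRECTIONS):
            sentences.append(f"{a} is {neg}{z} of {b}")
    # Capitalize the first letter so "the capital ..." starts a sentence properly.
    return [s[0].upper() + s[1:] for s in sentences]


sentences = generate_sentences()
```

The three example sentences from the experimental setup ("The capital of Italy is Paris", "Paris is not east of Berlin", "Spain is west of Russia") all appear in the generated list.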

[Image credit: http://www2.le.ac.uk]

LANGUAGE

Paris is north of Berlin

The capital of Germany is Berlin

Spain is east of France

London is not the capital of France

Vienna is east of Moscow

The largest city of Poland is Athens

BRAIN

[Figure: fifteen seconds of EEG data. From http://sccn.ucsd.edu]

What our work does…

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Words: Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany


Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate and make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). This is significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
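One way to check that a classification rate is above the 10% chance level is a one-sided binomial test. The slides do not say which test was used, so this is a minimal sketch under that assumption. The counts are hypothetical and chosen only for illustration, and the test treats trials as independent, which repeated resampling does not strictly guarantee.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """One-sided P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts, not the study's: 157 of 640 samples classified correctly,
# with 10 equally likely classes (chance = 0.10).
rate = 157 / 640
p_value = binomial_tail(157, 640, 0.10)
```

A very small `p_value` says that a rate this far above 10% would be extremely unlikely if the classifier were guessing.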

Beyond machine learning…


THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples $s_1, s_2, \ldots, s_{640}$ into 10 classes $\omega_1, \omega_2, \ldots, \omega_{10}$ of the finite set $A$.

$M = (m_{iq})$ is the confusion matrix for a given classification, where $m_{iq}$ is the number of test samples from class $\omega_i$ classified as belonging to class $\omega_q$.

Confusion matrix. Rows: true class $\omega_i$. Columns: class $\omega_q$ assigned by the classifier.

         London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London        8      14     11      7      6     3     3       10       8      10
Moscow        8      24     14      6      2     6     4        4       7      15
Paris         6      18     12      4      3     5     8        6      11       7
north         4       2      5     11      9     7    10        1       8       3
south         1       4      3     14     14     9    11        4       7       3
east          4       3      5      9     12    12     7        1       4       3
west          4       3      2     12     13    11    10        2       8       5
Germany       2       2      3      2      2     1     0        9      11       8
Poland        2       3      4      0      4     1     4        9       9       4
Russia        7       7      5      1      5     1     1        8       4      11

The relative frequencies of the counts $m_{iq}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix $P = (p_{iq})$, that a randomly chosen test sample from class $\omega_i$ will be classified as belonging to class $\omega_q$:

$$p_{iq} = \frac{m_{iq}}{\sum_{q} m_{iq}}$$

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Rows: true class $\omega_i$. Columns: $\omega_q$, the predicted class (classified as).

          London   Moscow    Paris    north    south     east     west  Germany   Poland   Russia
London      0.25    0.163    0.138    0.075    0.038     0.05    0.063    0.088    0.075    0.063
Moscow     0.144    0.333    0.133    0.033    0.011    0.056    0.056    0.089    0.067    0.078
Paris      0.175    0.188    0.125    0.038    0.038    0.038     0.05      0.1    0.138    0.113
north      0.067    0.017    0.017    0.283    0.167     0.15      0.1      0.1    0.067    0.033
south      0.014    0.014    0.071    0.114    0.271    0.171    0.157    0.029    0.086    0.071
east        0.05        0        0    0.133     0.15    0.383     0.15     0.05        0    0.083
west       0.057    0.043        0    0.171    0.114    0.143    0.229    0.086      0.1    0.057
Germany    0.075    0.175      0.1     0.05    0.125    0.025     0.05    0.275    0.025      0.1
Poland      0.15      0.1    0.125     0.05     0.05        0    0.025     0.15     0.25      0.1
Russia      0.16     0.04     0.08     0.04      0.1     0.02     0.12      0.1     0.08     0.26
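The step from classifier output to a confusion matrix and then to conditional probability estimates can be sketched as follows. The labels below are toy values chosen for illustration, not data from the study, and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] counts test samples from class i that were classified as class q."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m


def row_normalize(m):
    """Divide each row by its total, so each row estimates P(classified as q | class i)."""
    p = []
    for row in m:
        total = sum(row)
        p.append([count / total if total else 0.0 for count in row])
    return p


# Toy labels (hypothetical), three classes only.
classes = ["London", "Moscow", "Paris"]
true_labels = ["London", "London", "Moscow", "Paris", "Paris", "Paris", "Moscow", "London"]
predicted = ["London", "Moscow", "Moscow", "Paris", "London", "Paris", "Moscow", "London"]

M = confusion_matrix(true_labels, predicted, classes)
P = row_normalize(M)
```

Each row of `P` sums to 1, as do the rows of the conditional probability table above, up to rounding.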

[Figure: hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaves, in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

WordNet: a lexical database of English organized around sets of cognitive synonyms, here restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles and textbooks.


WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item $L_i$ is said to be a hyponym of the concept represented by a lexical item $L_k$ if native speakers of English accept sentences of the form "An $L_i$ is a kind of $L_k$". Conversely, $L_k$ is the hypernym of $L_i$. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets"

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France; and international center of culture and commerce)

(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)

(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts)

(n) city#2 (an incorporated administrative district established by state charter)

(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)

(n) East#2, Orient#1 (the countries of Asia)

(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east#4 (the direction corresponding to the eastward cardinal compass point)

(n) east#5 (a location in the eastern part of a country, region or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. 1990, Landauer and Dumais 1997, Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles…). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

          London   Moscow    Paris    north    south     east     west  Germany   Poland   Russia
London         1     0.17     0.37     0.13     0.12     0.12     0.14     0.17     0.16     0.16
Moscow      0.17        1     0.18     0.13     0.08     0.22     0.14     0.37     0.65     0.69
Paris       0.37     0.18        1      0.1     0.04     0.08     0.09      0.5     0.31     0.36
north       0.13     0.13      0.1        1     0.89      0.6     0.61     0.07     0.09     0.14
south       0.12     0.08     0.04     0.89        1      0.5     0.55     0.02     0.05     0.05
east        0.12     0.22     0.08      0.6      0.5        1     0.85     0.22     0.29      0.3
west        0.14     0.14     0.09     0.61     0.55     0.85        1     0.24     0.26     0.23
Germany     0.17     0.37      0.5     0.07     0.02     0.22     0.24        1     0.85     0.81
Poland      0.16     0.65     0.31     0.09     0.05     0.29     0.26     0.85        1     0.87
Russia      0.16     0.69     0.36     0.14     0.05      0.3     0.23     0.81     0.87        1

[Figure: hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles…). Leaves, in order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia. Axis ticks: 0.1 to 0.6.]
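A similarity tree of this kind can be built by agglomerative clustering. The slides do not state the distance or linkage used, so the sketch below makes two assumptions: distance is taken as 1 minus the LSA similarity, and clusters are joined by average linkage. It uses six of the ten words, with the LSA scores from the matrix above. The function names are hypothetical.

```python
# LSA similarity scores for six of the ten words (taken from the LSA matrix).
PAIRS = {
    ("north", "south"): 0.89, ("north", "east"): 0.60, ("north", "west"): 0.61,
    ("north", "Poland"): 0.09, ("north", "Russia"): 0.14,
    ("south", "east"): 0.50, ("south", "west"): 0.55,
    ("south", "Poland"): 0.05, ("south", "Russia"): 0.05,
    ("east", "west"): 0.85, ("east", "Poland"): 0.29, ("east", "Russia"): 0.30,
    ("west", "Poland"): 0.26, ("west", "Russia"): 0.23,
    ("Poland", "Russia"): 0.87,
}


def similarity(a, b):
    """Symmetric lookup of the pairwise score."""
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def average_linkage(names, sim):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                # Assumption: distance = 1 - similarity, averaged over all cross pairs.
                d = sum(1.0 - sim(a, b) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], round(d, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = average_linkage(["north", "south", "east", "west", "Poland", "Russia"], similarity)
```

Under these assumptions the compass terms join each other before they join the country names, which matches the grouping visible in the LSA tree.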

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus. The corpus is the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled in various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures)

          London   Moscow    Paris    north    south     east     west  Germany   Poland   Russia
London         1    0.396    0.466    0.106    0.103    0.076    0.078    0.322    0.299    0.303
Moscow     0.396        1    0.393    0.095    0.094    0.062     0.07    0.286    0.281    0.288
Paris      0.466    0.393        1    0.106    0.104    0.074    0.077    0.327    0.308    0.307
north      0.106    0.095    0.106        1    0.228    0.179     0.21    0.123    0.132    0.111
south      0.103    0.094    0.104    0.228        1    0.172    0.212    0.115    0.107    0.109
east       0.076    0.062    0.074    0.179    0.172        1    0.216    0.093     0.08    0.077
west       0.078     0.07    0.077     0.21    0.212    0.216        1    0.087    0.082    0.083
Germany    0.322    0.286    0.327    0.123    0.115    0.093    0.087        1    0.589    0.409
Poland     0.299    0.281    0.308    0.132    0.107     0.08    0.082    0.589        1    0.403
Russia     0.303    0.288    0.307    0.111    0.109    0.077    0.083    0.409    0.403        1

[Figure: hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaves, in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Axis ticks: 0.2 to 0.6.]

BACK TO THE BRAIN…

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

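Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees side by side, labeled LANGUAGE DATA and BRAIN DATA, each with axis ticks 0.2 to 0.6. One tree has leaves in the order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east (the order of the brain data tree shown earlier). The other has leaves in the order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east (the order of the WordNet tree shown earlier).]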

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word $\omega$ we compute from the conditional probability density estimates a ternary relation $R$ such that $R(\omega, \omega_1, \omega_2)$ if and only if, with respect to word $\omega$, the conditional probability for word $\omega_1$ is smaller than the conditional probability for word $\omega_2$; that is, if and only if $\omega_1$'s similarity difference with $\omega$ is smaller than $\omega_2$'s similarity difference with $\omega$.

$R$ is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric and transitive.

LANGUAGE DATA

For each word $\omega$ we compute from the semantic similarity matrix a ternary relation $R$ such that $R(\omega, \omega_1, \omega_2)$ if and only if the similarity difference of $\omega_1$ with $\omega$ is smaller than the similarity difference of $\omega_2$ with $\omega$; that is, $\omega_1$ is more similar to $\omega$ than is $\omega_2$.

$R$ is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric and transitive.

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5
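The ordering in the London listing can be derived mechanically from a row of scores. The sketch below groups words into tiers, assuming that a larger score means "more similar", as in the listing, which runs from highest to lowest. Words with equal scores share a tier, which is why the result is a partial rather than a total order. The function names are hypothetical; the scores are the brain-data values listed above for London.

```python
def similarity_tiers(scores):
    """Group words into tiers, highest score first; equal scores share a tier."""
    tiers = {}
    for word, score in scores.items():
        tiers.setdefault(score, []).append(word)
    return [sorted(tiers[s]) for s in sorted(tiers, reverse=True)]


def precedes(scores, w1, w2):
    """Strict comparison: True when w1 ranks above w2 (assumes larger = more similar)."""
    return scores[w1] > scores[w2]


brain_london = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}

tiers = similarity_tiers(brain_london)
```

Because `precedes` uses a strict comparison, it is irreflexive, asymmetric and transitive, and tied words such as south and west are left unordered relative to each other.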

We follow the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes, P. (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure $(A, R)$ constructed from $R$ and the finite set $A$ of classes $\omega_1, \omega_2, \ldots, \omega_N$, together with the $N$ partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure $(A, R)$ constructed from $R$ and the finite set $A$ of classes $\omega_1, \omega_2, \ldots, \omega_N$, together with the $N$ partial orders constructed from the N-by-N similarity matrix.

For each word $\omega$ we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
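Spearman's rank correlation is the Pearson correlation of the ranks, with tied values given the average of the ranks they span. The sketch below implements that definition in plain Python and applies it to the London language and brain values listed earlier; the function names are hypothetical, and the significance test is not included.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# London rows, in the word order London, Moscow, Paris, north, south, east,
# west, Germany, Poland, Russia (values from the London listing).
language = [1.000, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303]
brain = [0.275, 0.108, 0.133, 0.042, 0.000, 0.008, 0.000, 0.075, 0.025, 0.033]
rho = spearman(language, brain)
```

A value of `rho` near 1 means the two data sources order the other nine words in nearly the same way relative to London.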

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over the ten words. Caption: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over the ten words. Caption: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]

Brain data

[Figure: another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaves, in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree…

[Figure: a second similarity tree for the brain data. Leaves, in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland. Axis ticks: 0.1 to 0.5.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
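The resampling step can be sketched as drawing sample indices with replacement, once per classification run. This assumes "random resampling with replacement" means bootstrap-style draws over the 640 samples; the slides do not spell this out, and the function name and seed are hypothetical.

```python
import random


def bootstrap_runs(n_samples, n_runs, seed=0):
    """Return n_runs lists of sample indices, each drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the runs can be reproduced
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_runs)]


runs = bootstrap_runs(n_samples=640, n_runs=30)

# Drawing with replacement repeats some samples and omits others in each run.
distinct_per_run = [len(set(run)) for run in runs]
```

Each index list would then select the samples for one classification, whose confusion matrix yields its own set of partial orders to compare with the language data.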

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S and Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence pp 805-810 August 9-15 2003 Acapulco Mexico Banerjee S and Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics pp 136-145 February 17-23 2002 Mexico City Baroni M B Murphy E Barbu M Poesio (2010) Strudel A Corpus-Based Semantic Model Based on Properties and Types Cognitive Science - COGSCI vol 34 no 2 pp 222-254 2010 DOI 101111j1551-6709200901068x Blei D Ng A amp Jordan M (2003) Latent Dirichlet allocation Journal of Machine Learning Research 3 993ndash1022 Brants T and Franz A (2006) wwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC2006T13 Linguistic Data Consortium Philadelphia Budanitsky A Hirst G (2001) Semantic distance in WordNet an experimental application-oriented evaluation of five measures In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources Pittsburgh June 2001 httpwwwseassmuedu~radamwnwpapersWNW-NAACL-101pdfgz Collins AM Loftus EF (1975) A spreading-activation theory of semantic processing Psychological Review 1975 Nov Vol 82(6) 407-428 Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval Artificial Intelligence Review 11 453ndash482 Deerwester S Dumais ST Furnas GW Landauer TK Harshman R (1990) Indexing By Latent Semantic Analysis Journal of the American Society for Information Science 41 391-407 Devereux B C Kelly A Korhonen (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 70mdash78 Fellbaum C (Ed) (1998) WordNet An Electronic Lexical 
Database Cambridge MA MIT Press Hauk O Davis MH Ford M Pulvermuumlller F Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data Neuroimage 2006 May 130(4)1383-400 Epub 2006 Feb 7

42 copy Colleen E Crangle 2013

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284

43 copy Colleen E Crangle 2013

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum C (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. London: Longman.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. San Francisco, CA: Morgan Kaufmann, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1; 10:205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 619-627, Singapore, 6-7 August 2009. ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr; 117(1): 12-22. doi: 10.1016/j.bandl.2010.09.013


Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of the Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.


Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and Understanding. New York: Academic Press, 1975, 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


Simple Experiment

Words: one, two, three, four, five, six, seven, eight, nine, ten, left, right, yes, no

Repeated visually or auditorily many times while EEG recordings are made…

Data

Extract segments of brain waveforms corresponding to the time at which each individual word was heard (or read).

Question

Can we tell which brain data sample is associated with which word? That is, can we train a classifier to correctly classify the brain data samples?

[Figure: Fifteen seconds of EEG data. From http://sccn.ucsd.edu]

Can we train a classifier of EEG data so that we can predict which word, sentence, or word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words
1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language

Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences
2005: Recognition of Words from the EEG Laplacian
2006: Multichannel classification of single EEG trials with independent component analysis
2007: Single-trial classification of MEG recordings

Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names

Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, or its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)
Paris is not east of Berlin. (T)
Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}
Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}
W ∈ {the capital, the largest city}
Z ∈ {north, south, east, west}

Sentence forms:

W of Y is [not] X
X is [not] W of Y
X is [not] Z of X
Y is [not] Z of Y
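The four sentence forms can be expanded mechanically. The following is a minimal sketch, not the authors' stimulus-generation code: the function name `generate_sentences` is hypothetical, and it assumes that the two cities (or two countries) in a directional sentence are distinct. The experiment itself used 48 sentences of these forms.

```python
from itertools import permutations, product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]
COUNTRIES = ["France", "Germany", "Italy", "Poland",
             "Russia", "Austria", "Greece", "Spain"]
ROLES = ["the capital", "the largest city"]
DIRECTIONS = ["north", "south", "east", "west"]


def generate_sentences():
    """Enumerate every sentence the four forms allow (hypothetical helper)."""
    sentences = []
    for neg in ("", "not "):
        # W of Y is [not] X   and   X is [not] W of Y
        for w, y, x in product(ROLES, COUNTRIES, CITIES):
            sentences.append(f"{w.capitalize()} of {y} is {neg}{x}")
            sentences.append(f"{x} is {neg}{w} of {y}")
        # X is [not] Z of X   and   Y is [not] Z of Y
        # (assumption: the two places in one sentence are distinct)
        for z in DIRECTIONS:
            for group in (CITIES, COUNTRIES):
                for a, b in permutations(group, 2):
                    sentences.append(f"{a} is {neg}{z} of {b}")
    return sentences


all_sentences = generate_sentences()
```

The three example sentences quoted in the experimental setup all appear in `all_sentences`; which 48 were actually selected is not stated in the slides.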

LANGUAGE

Paris is north of Berlin
The capital of Germany is Berlin
Spain is east of France
London is not the capital of France
Vienna is east of Moscow
The largest city of Poland is Athens

BRAIN

[Figure: Fifteen seconds of EEG data. From http://sccn.ucsd.edu]

[Figure: image from http://www2.le.ac.uk]

What our work does…

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?
HOW DO WE REPRESENT THE LANGUAGE DATA?
HOW DO WE COMPARE THE TWO?

Words: Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

[Figure labels: the sentences "Paris is north of Berlin", "Spain is east of France" and "The capital of Germany is Berlin"]

[Figure labels: the words "Paris", "east of" and "Germany"]

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia we have 640 EEG data samples for each participant. We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.
Test the remaining 60 samples.
Do this many times, each time using a different set of training samples.
Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E-10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
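The train, test and repeat procedure above can be sketched as follows. This is only an illustration of the resampling loop: the nearest-mean classifier is a deliberately simple stand-in, not the linear discriminant model with principal component analysis used in the study, and the function names, the number of repeats and the synthetic one-dimensional data are all assumptions.

```python
import random


def train_nearest_mean(xs, ys):
    """Stand-in classifier (NOT the linear discriminant model of the study):
    assign a scalar feature to the class with the closest training mean."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def mean_classification_rate(xs, ys, n_test=60, n_repeats=20, seed=0):
    """Repeatedly hold out n_test samples, train on the rest, average the rate."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rates = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        predict = train_nearest_mean([xs[i] for i in train],
                                     [ys[i] for i in train])
        hits = sum(predict(xs[i]) == ys[i] for i in test)
        rates.append(hits / n_test)
    return sum(rates) / len(rates)


# Synthetic, well-separated data: 10 classes x 64 samples = 640 samples.
rng = random.Random(1)
labels = [c for c in range(10) for _ in range(64)]
features = [10.0 * c + rng.uniform(-1.0, 1.0) for c in labels]
rate = mean_classification_rate(features, labels)
```

Real EEG samples are multichannel time series and far noisier than this toy data, which is why the reported rates sit well below 100% while still exceeding the 10% chance level.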

Beyond machine learning…

Having classified the 640 EEG samples for each participant as described above, we THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

Rows: actual class ωi. Columns: class ωq assigned by the classifier.

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         8      14      11       7       6       3       3      10       8      10
Moscow         8      24      14       6       2       6       4       4       7      15
Paris          6      18      12       4       3       5       8       6      11       7
north          4       2       5      11       9       7      10       1       8       3
south          1       4       3      14      14       9      11       4       7       3
east           4       3       5       9      12      12       7       1       4       3
west           4       3       2      12      13      11      10       2       8       5
Germany        2       2       3       2       2       1       0       9      11       8
Poland         2       3       4       0       4       1       4       9       9       4
Russia         7       7       5       1       5       1       1       8       4      11
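The definition of M translates directly into a counting loop. A minimal sketch follows; the function name `confusion_matrix` and the toy labels are illustrative assumptions, not data from the study.

```python
def confusion_matrix(true_classes, predicted_classes, classes):
    """m[i][q] = number of test samples from classes[i] classified as classes[q]."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_classes, predicted_classes):
        m[index[t]][index[p]] += 1
    return m


# Toy example with three of the ten words (made-up predictions).
words = ["London", "Moscow", "Paris"]
true = ["London", "London", "Moscow", "Paris", "Paris", "Paris"]
pred = ["London", "Moscow", "Moscow", "London", "Paris", "Paris"]
m = confusion_matrix(true, pred, words)
```

The diagonal holds the correct classifications; everything off the diagonal is a misclassification, which is the information the rest of the analysis builds on.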

The relative frequencies $m_{iq} / \sum_{k=1}^{N} m_{ik}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

$p_{iq} = \frac{m_{iq}}{\sum_{k=1}^{N} m_{ik}}$

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Rows: ωi. Columns: ωq (predicted, classified as).

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London      0.25   0.163   0.138   0.075   0.038    0.05   0.063   0.088   0.075   0.063
Moscow     0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris      0.175   0.188   0.125   0.038   0.038   0.038    0.05     0.1   0.138   0.113
north      0.067   0.017   0.017   0.283   0.167    0.15     0.1     0.1   0.067   0.033
south      0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east        0.05       0       0   0.133    0.15   0.383    0.15    0.05       0   0.083
west       0.057   0.043       0   0.171   0.114   0.143   0.229   0.086     0.1   0.057
Germany    0.075   0.175     0.1    0.05   0.125   0.025    0.05   0.275   0.025     0.1
Poland      0.15     0.1   0.125    0.05    0.05       0   0.025    0.15    0.25     0.1
Russia      0.16    0.04    0.08    0.04     0.1    0.02    0.12     0.1    0.08    0.26
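Turning counts into conditional probability estimates is a row normalization. The sketch below assumes a hypothetical helper name, `conditional_probabilities`, and applies it to the London row of the confusion matrix shown on the previous slide; the result need not match the probability table above, whose underlying counts are not given.

```python
def conditional_probabilities(m):
    """Row-normalize a confusion matrix: p[i][q] = m[i][q] / (sum of row i)."""
    p = []
    for row in m:
        total = sum(row)
        p.append([count / total if total else 0.0 for count in row])
    return p


# London row of the confusion matrix from the previous slide (80 test samples).
london_counts = [8, 14, 11, 7, 6, 3, 3, 10, 8, 10]
london_probs = conditional_probabilities([london_counts])[0]
```

Each row of the resulting matrix sums to 1, so a row can be read as the estimated distribution of classifier outputs for one presented word.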

[Figure: hierarchical cluster tree with leaves in the order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; axis ticks from 0.2 to 0.6]

Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, textbooks.

BRAIN

The conditional probability density estimates from the classification of the brain wave data (the table shown earlier).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets".

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [direct hyponym / full hyponym, part meronym, has instance]
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
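As a rough sketch of the standard formulation in Deerwester et al. (1990), and not necessarily the exact computation performed by the web application: LSA factors a word-by-document count matrix with a truncated singular value decomposition and compares two words by the cosine of their reduced vectors. The symbols X, U_k, Σ_k, V_k and t_i are notation introduced here for illustration; k ≤ 300 corresponds to the stated maximum number of factors.

```latex
X \approx U_k \Sigma_k V_k^{\top}, \qquad k \le 300

t_i = \left( U_k \Sigma_k \right)_{i,:}

\mathrm{sim}(w_i, w_j) = \frac{t_i \cdot t_j}{\lVert t_i \rVert \, \lVert t_j \rVert}
```

Here row i of U_k Σ_k is the reduced vector for word w_i, so the similarity of a word with itself is 1, as on the diagonal of an LSA similarity matrix.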

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         1    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow      0.17       1    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris       0.37    0.18       1     0.1    0.04    0.08    0.09     0.5    0.31    0.36
north       0.13    0.13     0.1       1    0.89     0.6    0.61    0.07    0.09    0.14
south       0.12    0.08    0.04    0.89       1     0.5    0.55    0.02    0.05    0.05
east        0.12    0.22    0.08     0.6     0.5       1    0.85    0.22    0.29     0.3
west        0.14    0.14    0.09    0.61    0.55    0.85       1    0.24    0.26    0.23
Germany     0.17    0.37     0.5    0.07    0.02    0.22    0.24       1    0.85    0.81
Poland      0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85       1    0.87
Russia      0.16    0.69    0.36    0.14    0.05     0.3    0.23    0.81    0.87       1

[Figure: hierarchical cluster tree with leaves in the order east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia; axis ticks from 0.1 to 0.6]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …).
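A cluster tree of this kind can be built by agglomerative clustering. The slides do not say which linkage rule or distance was used, so the sketch below makes two assumptions: distance is taken as 1 minus similarity, and clusters are merged by average linkage. The function name `average_linkage` is hypothetical; the matrix is the LSA similarity matrix from the previous slide.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]


def average_linkage(labels, sim):
    """Agglomerative clustering on distance = 1 - similarity (average linkage).

    Returns the merges in order as (left members, right members, distance).
    """
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                dist = sum(1.0 - sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append(([labels[i] for i in clusters[a]],
                       [labels[i] for i in clusters[b]], round(dist, 3)))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # b > a, so index a is unaffected
    return merges


merges = average_linkage(WORDS, LSA)
```

Under these assumptions the first merge joins north and south, the most similar pair in the matrix. A different linkage rule could change the later merges, so this should not be read as a reproduction of the figure.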

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

Path length between synsets.
Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus (the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense). Scaled various ways.
Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
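For orientation, the path-length and information-content measures that the talk refers to as wup, lin and jcn are usually stated as follows (Wu and Palmer 1994; Lin 1998; Jiang and Conrath 1997). This is the commonly cited textbook form, offered as a sketch; the exact scaling used in the authors' computations is not given in the slides. Here lcs(c1, c2) denotes the most specific concept subsuming both c1 and c2, depth is the depth of a concept in the hypernym hierarchy, and P(c) is the corpus probability of encountering concept c.

```latex
\mathrm{sim}_{wup}(c_1, c_2) = \frac{2\,\mathrm{depth}(\mathrm{lcs}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}

\mathrm{IC}(c) = -\log P(c)

\mathrm{sim}_{lin}(c_1, c_2) = \frac{2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}

\mathrm{sim}_{jcn}(c_1, c_2) = \frac{1}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))}
```

The first measure depends only on positions in the hierarchy, while the other two also depend on corpus frequencies, which is one reason for averaging several measures rather than relying on one.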

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         1   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow     0.396       1   0.393   0.095   0.094   0.062    0.07   0.286   0.281   0.288
Paris      0.466   0.393       1   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north      0.106   0.095   0.106       1   0.228   0.179    0.21   0.123   0.132   0.111
south      0.103   0.094   0.104   0.228       1   0.172   0.212   0.115   0.107   0.109
east       0.076   0.062   0.074   0.179   0.172       1   0.216   0.093    0.08   0.077
west       0.078    0.07   0.077    0.21   0.212   0.216       1   0.087   0.082   0.083
Germany    0.322   0.286   0.327   0.123   0.115   0.093   0.087       1   0.589   0.409
Poland     0.299   0.281   0.308   0.132   0.107    0.08   0.082   0.589       1   0.403
Russia     0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403       1

[Figure: hierarchical cluster tree with leaves in the order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east; axis ticks from 0.2 to 0.6]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

BACK TO THE BRAIN…

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees, each with axis ticks from 0.2 to 0.6. BRAIN DATA: leaves in the order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA: leaves in the order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R)]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
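The relation R for a single word can be written out explicitly. The sketch below uses the London row of the conditional probability table shown earlier; the function name `ordinal_relation` is hypothetical, and tied values (here north and Poland) are simply left unordered, which is what makes the result a partial rather than a total order.

```python
# Conditional probability estimates for London (row of the table shown earlier).
LONDON_ROW = {
    "London": 0.25, "Moscow": 0.163, "Paris": 0.138, "north": 0.075,
    "south": 0.038, "east": 0.05, "west": 0.063, "Germany": 0.088,
    "Poland": 0.075, "Russia": 0.063,
}


def ordinal_relation(row):
    """Pairs (w1, w2) such that R(w, w1, w2): the value for w1 is smaller than for w2."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] < row[w2]}


R = ordinal_relation(LONDON_ROW)

# Check the three properties named on the slide.
irreflexive = all((w, w) not in R for w in LONDON_ROW)
asymmetric = all((b, a) not in R for (a, b) in R)
transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
```

All three checks hold because R is induced by the strict ordering of real numbers.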

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.

London

Language data          Brain data
London   1.000         London   0.275
Paris    0.466         Paris    0.133
Moscow   0.396         Moscow   0.108
Germany  0.322         Germany  0.075
Russia   0.303         north    0.042
Poland   0.299         Russia   0.033
north    0.106         Poland   0.025
south    0.103         east     0.008
west     0.078         south    0.000
east     0.076         west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: partial-order diagrams for London; the visible lower levels read Poland, north, south, west, east for the language data and north, Poland, east, south, west for the brain data]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference, or the theory of differences in psychological intensity:

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each word ω we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
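The comparison step can be sketched in a few lines. The code below computes Spearman's coefficient as the Pearson correlation of average ranks, which handles ties; it does not compute the significance level, for which a statistics library would normally be used. The helper names are hypothetical, and the data are the London values from the partial-orders slide.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# London values from the partial-orders slide, in the order London, Moscow,
# Paris, north, south, east, west, Germany, Poland, Russia.
language = [1.000, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303]
brain = [0.275, 0.108, 0.133, 0.042, 0.000, 0.008, 0.000, 0.075, 0.025, 0.033]
rho = spearman(language, brain)
```

For these two lists the coefficient comes out at roughly 0.92, a strong positive rank correlation.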

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
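One way to make the idea of an invariant partial order concrete is sketched below. It rests on an assumption that the slides do not spell out: "invariant" is read as "ordered the same way in both data sets", so the invariant order is taken to be the intersection of the two strict orders. The helper name `strict_order` is hypothetical, and the values are the London lists from the partial-orders slide.

```python
def strict_order(values):
    """Pairs (a, b) with values[a] < values[b]: a strict partial order."""
    return {(a, b) for a in values for b in values if values[a] < values[b]}


# London values from the partial-orders slide.
language = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
brain = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}

# Assumption: keep only the pairs that both data sets order the same way.
invariant = strict_order(language) & strict_order(brain)
```

The intersection of two strict partial orders is again irreflexive, asymmetric and transitive. Pairs on which the two data sets disagree, such as north and Russia here, are left unordered.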

Brain data

[Figure: hierarchical cluster tree with leaves in the order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; axis ticks from 0.2 to 0.6]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree…

[Figure: hierarchical cluster tree with leaves in the order north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland; axis ticks from 0.1 to 0.5]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.


Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi: 10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi: 10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

43 copy Colleen E Crangle 2013

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp 265-283.
Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998) An information-theoretic definition of similarity. In Proc 15th International Conf on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp 296-304.
Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24: 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.
Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J Acoust Soc Am 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM 38(11): 39-41.
Mitchell TM, SV Shinkareva, A Carlson, KM Chang, VL Malave, RA Mason, MA Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320: 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M & Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B & Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp 38-41, Boston, May 2004.
Pereira F, M Botvinick, G Detre (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54: 436–443.
Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J Mathematical Psychology 15(2): 209–212. MR0437404.
Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23: 113–128. MR0115919.
Suppes P (1972) Axiomatic Set Theory. New York: Dover.
Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al (Eds) Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.
Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94: 14965–14969.
Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95: 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96: 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96: 14658-14663.
Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97: 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21: 3228–3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56: 95-117.
Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61: 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70: 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39: 1051–1063.
Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds, Representation and understanding. New York: Academic Press, 1975: 35-82.
Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133-138.


Simple Experiment

one, two, three, four, five, six, seven, eight, nine, ten, left, right, yes, no

Repeated visually or auditorily many times while EEG recordings are made…

Data

Extract segments of brain waveforms corresponding to the time at which each individual word was heard (or read).

Question

Can we tell which brain data sample is associated with which word? That is, can we train a classifier to correctly classify the brain data samples?

Figure: Fifteen seconds of EEG data. From http://sccn.ucsd.edu

Can we train a classifier of EEG data so that we can predict which word/sentence/word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words
1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language

Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences
2005: Recognition of Words from the EEG Laplacian
2006: Multichannel classification of single EEG trials with independent component analysis
2007: Single-trial classification of MEG recordings

Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names

Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, or its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)
Paris is not east of Berlin. (T)
Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}
Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}
W ∈ {the capital, the largest city}
Z ∈ {north, south, east, west}

Sentence forms:
W of Y is [not] X
X is [not] W of Y
X is [not] Z of X
Y is [not] Z of Y
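The four sentence forms can be enumerated mechanically. The sketch below is illustrative only: it generates every sentence the templates allow, whereas the experiment used a selection of 48. It assumes that the two places in a direction sentence are always different, which the slide does not state, and the function name is made up for the example.

```python
from itertools import permutations

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]            # X
COUNTRIES = ["France", "Germany", "Italy", "Poland", "Russia",
             "Austria", "Greece", "Spain"]                   # Y
ROLES = ["the capital", "the largest city"]                  # W
DIRECTIONS = ["north", "south", "east", "west"]              # Z


def generate_sentences():
    """Yield every sentence the four templates can produce."""
    for neg in ("", "not "):
        for w in ROLES:
            for y in COUNTRIES:
                for x in CITIES:
                    yield f"{w.capitalize()} of {y} is {neg}{x}"   # W of Y is [not] X
                    yield f"{x} is {neg}{w} of {y}"                # X is [not] W of Y
        for z in DIRECTIONS:
            # Assumption: the two places in a direction sentence differ.
            for places in (CITIES, COUNTRIES):
                for a, b in permutations(places, 2):
                    yield f"{a} is {neg}{z} of {b}"                # X is [not] Z of X, Y is [not] Z of Y


all_sentences = list(generate_sentences())
print(len(all_sentences))
```

The generator ignores truth values; deciding which sentences are true would need a separate table of geographic facts.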

LANGUAGE

Paris is north of Berlin
The capital of Germany is Berlin
Spain is east of France
London is not the capital of France
Vienna is east of Moscow
The largest city of Poland is Athens

Image from http://www2.le.ac.uk

BRAIN

Figure: Fifteen seconds of EEG data. From http://sccn.ucsd.edu

What our work does…

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?
HOW DO WE REPRESENT THE LANGUAGE DATA?
HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin
Spain is east of France
The capital of Germany is Berlin

Paris
east of
Germany

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia we have 640 EEG data samples for each participant. We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:
Use 580 brain samples and their associated words to train the classifier.
Test the remaining 60 samples.
Do this many times, each time using a different set of training samples.
Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E-10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
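The train/test loop described above can be sketched as follows. This is not the authors' pipeline: a nearest-centroid classifier stands in for the linear discriminant model with PCA, and the 640 samples are synthetic 10-dimensional vectors invented for the example (64 per word, unlike the real class sizes). Only the evaluation scheme, 580 training samples and 60 test samples redrawn many times, follows the slide.

```python
import random

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]


def fit_centroids(X, y):
    """Mean feature vector per class (stand-in for the LDA model)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return min(centroids, key=sqdist)


def repeated_holdout(X, y, n_test=60, n_repeats=20, seed=0):
    """Average classification rate over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rates = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        rates.append(hits / n_test)
    return sum(rates) / len(rates)


# Synthetic data (assumption): class k has mean 1 on feature k, unit noise.
data_rng = random.Random(1)
X, y = [], []
for k, word in enumerate(WORDS):
    for _ in range(64):
        X.append([data_rng.gauss(1.0 if j == k else 0.0, 1.0) for j in range(10)])
        y.append(word)

mean_rate = repeated_holdout(X, y)
print(f"mean classification rate: {mean_rate:.3f} (chance = 0.100)")
```

A significance test against the 10% chance level is omitted here; the slide reports it but does not say how it was computed.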

Beyond machine learning …

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

          London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London         8      14     11      7      6     3     3       10       8      10
Moscow         8      24     14      6      2     6     4        4       7      15
Paris          6      18     12      4      3     5     8        6      11       7
north          4       2      5     11      9     7    10        1       8       3
south          1       4      3     14     14     9    11        4       7       3
east           4       3      5      9     12    12     7        1       4       3
west           4       3      2     12     13    11    10        2       8       5
Germany        2       2      3      2      2     1     0        9      11       8
Poland         2       3      4      0      4     1     4        9       9       4
Russia         7       7      5      1      5     1     1        8       4      11
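A confusion matrix of this kind is just a tally of (actual, predicted) pairs. The following minimal sketch shows the bookkeeping on a three-word toy example; the labels and predictions are invented for illustration and are not data from the study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] = number of samples from class i classified as class q."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(true_labels, predicted_labels):
        m[index[actual]][index[predicted]] += 1
    return m


# Toy example (hypothetical labels).
classes = ["London", "Paris", "north"]
actual = ["London", "London", "Paris", "Paris", "north", "north"]
predicted = ["London", "Paris", "Paris", "London", "north", "north"]

M = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, M):
    print(f"{name:8s}", row)
```

Rows index the actual class and columns the predicted class, matching the layout of the table above.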

The relative frequencies $m_{iq} / \sum_{q} m_{iq}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq.
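In code, the estimate amounts to dividing each row of the confusion matrix by its row total. The sketch below applies this to the London row of the confusion matrix shown earlier; the function name is illustrative, and the guard against empty rows is an added assumption.

```python
def row_normalize(counts):
    """Turn confusion counts m_iq into relative frequencies m_iq / sum_q m_iq."""
    probs = []
    for row in counts:
        total = sum(row)
        # Assumption: a class with no test samples gets an all-zero row.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


# London row of the confusion matrix (80 test samples in total).
london_counts = [8, 14, 11, 7, 6, 3, 3, 10, 8, 10]
london_probs = row_normalize([london_counts])[0]
print([round(p, 3) for p in london_probs])
```

Each classification run yields its own confusion matrix, so the estimates obtained from one run need not match those tabulated for another.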

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi. Columns: ωq (predicted, classified as).

          London  Moscow  Paris  north  south  east   west   Germany  Poland  Russia
London    0.250   0.163   0.138  0.075  0.038  0.050  0.063  0.088    0.075   0.063
Moscow    0.144   0.333   0.133  0.033  0.011  0.056  0.056  0.089    0.067   0.078
Paris     0.175   0.188   0.125  0.038  0.038  0.038  0.050  0.100    0.138   0.113
north     0.067   0.017   0.017  0.283  0.167  0.150  0.100  0.100    0.067   0.033
south     0.014   0.014   0.071  0.114  0.271  0.171  0.157  0.029    0.086   0.071
east      0.050   0.000   0.000  0.133  0.150  0.383  0.150  0.050    0.000   0.083
west      0.057   0.043   0.000  0.171  0.114  0.143  0.229  0.086    0.100   0.057
Germany   0.075   0.175   0.100  0.050  0.125  0.025  0.050  0.275    0.025   0.100
Poland    0.150   0.100   0.125  0.050  0.050  0.000  0.025  0.150    0.250   0.100
Russia    0.160   0.040   0.080  0.040  0.100  0.020  0.120  0.100    0.080   0.260

Figure: Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis ticks from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, textbooks.

BRAIN

The conditional probability density estimates for the ten words, as tabulated above.

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France; and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

Organized into sets of cognitive synonyms called "synsets"

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts)
    relations listed: direct hyponym / full hyponym, part meronym, has instance
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting from large collections of documents a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al 1990, Landauer and Dumais 1997, Landauer et al 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

          London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London    1       0.17    0.37   0.13   0.12   0.12  0.14  0.17     0.16    0.16
Moscow    0.17    1       0.18   0.13   0.08   0.22  0.14  0.37     0.65    0.69
Paris     0.37    0.18    1      0.10   0.04   0.08  0.09  0.50     0.31    0.36
north     0.13    0.13    0.10   1      0.89   0.60  0.61  0.07     0.09    0.14
south     0.12    0.08    0.04   0.89   1      0.50  0.55  0.02     0.05    0.05
east      0.12    0.22    0.08   0.60   0.50   1     0.85  0.22     0.29    0.30
west      0.14    0.14    0.09   0.61   0.55   0.85  1     0.24     0.26    0.23
Germany   0.17    0.37    0.50   0.07   0.02   0.22  0.24  1        0.85    0.81
Poland    0.16    0.65    0.31   0.09   0.05   0.29  0.26  0.85     1       0.87
Russia    0.16    0.69    0.36   0.14   0.05   0.30  0.23  0.81     0.87    1

Figure: Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …). Axis ticks from 0.1 to 0.6; leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.
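A cluster tree like this can be grown bottom-up from the LSA similarity matrix. The sketch below uses average linkage on the distance 1 - similarity; both choices are assumptions, since the slides do not say which linkage or distance was used, so the merge heights need not match the figure.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity matrix from the talk (rows and columns in WORDS order).
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]


def average_linkage(sim, labels):
    """Agglomerative clustering; returns the merges in the order they occur."""
    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance (1 - similarity) over all cross-cluster pairs.
                d = sum(1 - sim[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(labels[i] for i in clusters[a]),
                       sorted(labels[i] for i in clusters[b]),
                       round(d, 3)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


merges = average_linkage(LSA, WORDS)
for left, right, height in merges:
    print(height, left, "+", right)
```

The same routine could be applied to the brain data after making the conditional probability matrix symmetric, for instance by averaging it with its transpose; that step is also an assumption rather than something the slides specify.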

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus. The corpus is the human-annotated, sense-tagged corpus SemCor (Miller et al 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
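To make the path-length idea concrete, here is a sketch of the Wu and Palmer (wup) score, 2·depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest concept that subsumes both words and the root has depth 1. The tiny taxonomy is hypothetical and much shallower than WordNet's real hypernym hierarchy, so the scores are only illustrative and will not reproduce the values in the talk.

```python
# Hypothetical mini-taxonomy (child -> parent); NOT WordNet's actual structure.
PARENT = {
    "location": "entity",
    "region": "location",
    "direction": "location",
    "city": "region",
    "country": "region",
    "London": "city", "Paris": "city", "Moscow": "city",
    "Germany": "country", "Poland": "country", "Russia": "country",
    "north": "direction", "south": "direction",
    "east": "direction", "west": "direction",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Walking up from a, the first shared node is the deepest common subsumer.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


for pair in [("London", "Paris"), ("London", "Germany"), ("London", "north")]:
    print(pair, round(wup(*pair), 3))
```

Even in this toy hierarchy the ordering is the familiar one: two cities are closer than a city and a country, which in turn are closer than a city and a compass direction.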

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

          London  Moscow  Paris  north  south  east   west   Germany  Poland  Russia
London    1       0.396   0.466  0.106  0.103  0.076  0.078  0.322    0.299   0.303
Moscow    0.396   1       0.393  0.095  0.094  0.062  0.070  0.286    0.281   0.288
Paris     0.466   0.393   1      0.106  0.104  0.074  0.077  0.327    0.308   0.307
north     0.106   0.095   0.106  1      0.228  0.179  0.210  0.123    0.132   0.111
south     0.103   0.094   0.104  0.228  1      0.172  0.212  0.115    0.107   0.109
east      0.076   0.062   0.074  0.179  0.172  1      0.216  0.093    0.080   0.077
west      0.078   0.070   0.077  0.210  0.212  0.216  1      0.087    0.082   0.083
Germany   0.322   0.286   0.327  0.123  0.115  0.093  0.087  1        0.589   0.409
Poland    0.299   0.281   0.308  0.132  0.107  0.080  0.082  0.589    1       0.403
Russia    0.303   0.288   0.307  0.111  0.109  0.077  0.083  0.409    0.403   1

Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Axis ticks from 0.2 to 0.6; leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

Figure: two similarity trees shown side by side, labelled LANGUAGE DATA and BRAIN DATA, each with axis ticks from 0.2 to 0.6. Leaf orders: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; and Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.

Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
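For a fixed reference word ω, the relation reduces to a set of ordered pairs (ω1, ω2), and its three properties can be checked directly. The sketch below implements the definition literally (a pair is included when the value for ω1 is smaller than the value for ω2) and uses the brain-data values for London that the talk lists for its London example; the function names are illustrative.

```python
# Conditional probability estimates with respect to London (brain data).
LONDON_ROW = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}


def ordinal_relation(row):
    """Pairs (w1, w2) such that R(w, w1, w2) holds for the fixed word w."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] < row[w2]}


def is_strict_partial_order(rel):
    """Check irreflexivity, asymmetry and transitivity of a set of pairs."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel
                     for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive


R_london = ordinal_relation(LONDON_ROW)
print(len(R_london), is_strict_partial_order(R_london))
print(("south", "west") in R_london, ("west", "south") in R_london)
```

Because south and west tie at 0.000, neither is ordered before the other, which is why the result is a partial order rather than a total one.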

LANGUAGE DATA

For each word ω we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

Figure: diagrams of the two partial orders. Lower elements shown: Poland, north, south, west, east (language data); north, Poland, east, south, west (brain data).

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al (Eds) Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each word ω we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not the correlation is statistically significant.
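The comparison step can be sketched with a small Spearman routine: rank both sequences, averaging ranks over ties, then take the Pearson correlation of the ranks. The values are the language and brain data the talk lists for London; the significance test is omitted, and the helper names are illustrative.

```python
WORDS = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
LANGUAGE = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
BRAIN = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


rho = spearman(LANGUAGE, BRAIN)
print(f"Spearman rank correlation for London: {rho:.3f}")
```

A p-value would normally accompany the coefficient; the slides do not state how theirs was computed, so none is derived here.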

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)

Brain data

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

Figure: Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis ticks from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.

Figure: a further similarity tree for the brain data. Axis ticks from 0.1 to 0.5; leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
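Random resampling with replacement, as used in step 1, can be sketched as drawing 640 sample indices with repetition for each of the M = 30 classifications. How the authors actually drew their resamples is not described; the function name and seed below are illustrative.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """One list of n_samples indices, drawn with replacement, per round."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_rounds)]


rounds = bootstrap_indices(n_samples=640, n_rounds=30)
first = rounds[0]
print(len(rounds), len(first), len(set(first)))
```

Each round would then be classified afresh, giving its own confusion matrix, its own conditional probability estimates and its own set of partial orders to compare with the language data.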

Figure: Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language. Bar chart of the number of significant invariant partial orders (vertical axis, 0 to 300) for each test participant (horizontal axis: S10, S24, S25, S27, S16, S26, S12, S13, S18), with separate bars for WordNet and LSA.

A higher number of significant structural similarities was found between the brain data and the WordNet data than the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S and Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S and Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp 136-145, February 17-23, 2002, Mexico City.
Baroni M, B Murphy, E Barbu, M Poesio (2010) Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science - COGSCI, vol 34, no 2, pp 222-254, 2010. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A & Jordan M (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022.
Brants T and Franz A (2006) www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychological Review 1975 Nov; Vol 82(6): 407-428.
Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11: 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41: 391-407.
Devereux B, C Kelly, A Korhonen (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.


Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors) Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.
Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc Sympos IBM Thomas J Watson Res Center, Yorktown Heights, NY (Ed RE Miller and JW Thatcher). New York: Plenum, pp 85-103.
Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp 621-647.
Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.


Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp 265-283.
Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998) An information-theoretic definition of similarity. In Proc 15th International Conf on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp 296-304.
Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.
Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J Acoust Soc Am 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.
Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI 10.1126/science.1152876
Murphy B, Baroni M & Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B & Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013


Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.
Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J Mathematical Psychology 15(2), 209–212. MR0437404
Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919
Suppes P (1972) Axiomatic Set Theory. New York: Dover.
Suppes P (1974) The axiomatic method in the empirical sciences. L Henkin et al (Eds) Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.
Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965–14969.
Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.


Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.
Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds. Representation and understanding. New York: Academic Press, 1975, 35-82.
Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133-138.


Can we train a classifier of EEG data so that we can predict which word, sentence, or word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words

1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language

Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences

2005: Recognition of Words from the EEG Laplacian

2006: Multichannel classification of single EEG trials with independent component analysis

2007: Single-trial classification of MEG recordings

Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names

Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)

Paris is not east of Berlin. (T)

Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}

Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}

W ∈ {the capital, the largest city}

Z ∈ {north, south, east, west}

Sentence forms:

W of Y is [not] X

X is [not] W of Y

X is [not] Z of X

Y is [not] Z of Y
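
The sentence forms above can be expanded mechanically. The sketch below is illustrative only: it assumes that "X is [not] Z of X" and "Y is [not] Z of Y" relate two different cities or two different countries, and it enumerates every sentence the templates allow. The slides do not say how the 48 experimental sentences were chosen from this pool, so no selection step is attempted.

```python
from itertools import permutations, product

X = ["Berlin", "London", "Moscow", "Paris", "Rome", "Warsaw", "Madrid", "Vienna", "Athens"]
Y = ["France", "Germany", "Italy", "Poland", "Russia", "Austria", "Greece", "Spain"]
W = ["the capital", "the largest city"]
Z = ["north", "south", "east", "west"]
NEG = ["", "not "]  # positive and negative variants


def capitalize_first(sentence):
    """Upper-case only the first letter, leaving place names untouched."""
    return sentence[0].upper() + sentence[1:]


def sentences():
    """Yield every sentence licensed by the four templates."""
    for w, y, neg, x in product(W, Y, NEG, X):
        yield capitalize_first(f"{w} of {y} is {neg}{x}")  # W of Y is [not] X
        yield f"{x} is {neg}{w} of {y}"                    # X is [not] W of Y
    for places in (X, Y):
        # Assumption: the two places in a direction sentence are distinct.
        for (a, b), neg, z in product(permutations(places, 2), NEG, Z):
            yield f"{a} is {neg}{z} of {b}"


all_sentences = list(sentences())
print(len(all_sentences), "candidate sentences")
print("Paris is not east of Berlin" in all_sentences)
```

Truth values are not part of the templates; they would have to come from a separate table of geographic facts.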

LANGUAGE

Paris is north of Berlin

The capital of Germany is Berlin

Spain is east of France

London is not the capital of France

Vienna is east of Moscow

The largest city of Poland is Athens

BRAIN

[Figure: fifteen seconds of EEG data. From http://sccn.ucsd.edu. Second image from http://www2.le.ac.uk.]

What our work does…

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
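
The train-on-580, test-on-60, repeat-and-average procedure can be sketched as follows. This is not the classifier used in the study: a one-dimensional nearest-class-mean rule stands in for the linear discriminant model with PCA, and the 640 samples are synthetic numbers invented for the example. Only the resampling loop mirrors the slide.

```python
import random


def fit_nearest_mean(features, labels):
    """Stand-in classifier (NOT the LDA+PCA model): predict the class whose mean is closest."""
    totals = {}
    for x, y in zip(features, labels):
        s, n = totals.get(y, (0.0, 0))
        totals[y] = (s + x, n + 1)
    means = {y: s / n for y, (s, n) in totals.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def mean_classification_rate(features, labels, n_test=60, n_repeats=20, seed=0):
    """Repeatedly hold out n_test samples, train on the rest, and average the hit rate."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    rates = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        predict = fit_nearest_mean([features[i] for i in train],
                                   [labels[i] for i in train])
        correct = sum(predict(features[i]) == labels[i] for i in test)
        rates.append(correct / n_test)
    return sum(rates) / len(rates)


# Synthetic stand-in data: 640 samples, 10 classes, one noisy feature per sample.
data_rng = random.Random(1)
labels = [k % 10 for k in range(640)]
features = [label + data_rng.gauss(0.0, 1.0) for label in labels]

rate = mean_classification_rate(features, labels)
print(f"mean classification rate: {rate:.3f} (chance = 0.100)")
```

With real EEG segments, `fit_nearest_mean` would be replaced by whatever model is under study; the significance test against chance is a separate step not shown here.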

Beyond machine learning …

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London          8      14      11       7       6       3       3      10       8      10
Moscow          8      24      14       6       2       6       4       4       7      15
Paris           6      18      12       4       3       5       8       6      11       7
north           4       2       5      11       9       7      10       1       8       3
south           1       4       3      14      14       9      11       4       7       3
east            4       3       5       9      12      12       7       1       4       3
west            4       3       2      12      13      11      10       2       8       5
Germany         2       2       3       2       2       1       0       9      11       8
Poland          2       3       4       0       4       1       4       9       9       4
Russia          7       7       5       1       5       1       1       8       4      11
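
As a minimal illustration of the definition of M, the sketch below tallies a confusion matrix from paired true and predicted labels. The three-class label lists are made up for the example and are not data from the study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] counts test samples from class i that were classified as class q."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m


# Hypothetical toy labels, for illustration only.
classes = ["London", "Moscow", "Paris"]
true_labels = ["London", "London", "Moscow", "Paris", "Paris"]
predicted_labels = ["London", "Moscow", "Moscow", "London", "Paris"]

M = confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, M):
    print(name, row)
```

Rows index the true class and columns the predicted class, matching the layout of the table on this slide.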

The relative frequencies $m_{iq} / \sum_{j} m_{ij}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq.

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi (true class). Columns: ωq (predicted, classified as).

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London       0.25   0.163   0.138   0.075   0.038    0.05   0.063   0.088   0.075   0.063
Moscow      0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris       0.175   0.188   0.125   0.038   0.038   0.038    0.05     0.1   0.138   0.113
north       0.067   0.017   0.017   0.283   0.167    0.15     0.1     0.1   0.067   0.033
south       0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east         0.05       0       0   0.133    0.15   0.383    0.15    0.05       0   0.083
west        0.057   0.043       0   0.171   0.114   0.143   0.229   0.086     0.1   0.057
Germany     0.075   0.175     0.1    0.05   0.125   0.025    0.05   0.275   0.025     0.1
Poland       0.15     0.1   0.125    0.05    0.05       0   0.025    0.15    0.25     0.1
Russia       0.16    0.04    0.08    0.04     0.1    0.02    0.12     0.1    0.08    0.26
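
Turning a confusion matrix into conditional probability estimates is a row normalization: each count is divided by its row total. The sketch below applies that step to the count matrix shown on the confusion-matrix slide. The probability table above appears to come from a different classification run, so the output here is not expected to reproduce it.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# Counts copied from the confusion-matrix slide (rows: true class, columns: predicted).
M = [
    [8, 14, 11, 7, 6, 3, 3, 10, 8, 10],
    [8, 24, 14, 6, 2, 6, 4, 4, 7, 15],
    [6, 18, 12, 4, 3, 5, 8, 6, 11, 7],
    [4, 2, 5, 11, 9, 7, 10, 1, 8, 3],
    [1, 4, 3, 14, 14, 9, 11, 4, 7, 3],
    [4, 3, 5, 9, 12, 12, 7, 1, 4, 3],
    [4, 3, 2, 12, 13, 11, 10, 2, 8, 5],
    [2, 2, 3, 2, 2, 1, 0, 9, 11, 8],
    [2, 3, 4, 0, 4, 1, 4, 9, 9, 4],
    [7, 7, 5, 1, 5, 1, 1, 8, 4, 11],
]


def row_normalize(counts):
    """p[i][q] = m[i][q] / sum_j m[i][j]; an all-zero row is left as zeros."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


P = row_normalize(M)
for word, row in zip(WORDS, P):
    print(f"{word:8s}", " ".join(f"{p:.3f}" for p in row))
```

Each row of P sums to 1, so a row can be read as the estimated distribution of predicted labels given the true word.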

[Figure: Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis ticks 0.2 to 0.6. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.

BRAIN


WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)

(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris#4 (a town in northeastern Texas)

Organized into sets of cognitive synonyms called "synsets"

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [direct hyponym / full hyponym / part meronym / has instance]

(n) city#2 (an incorporated administrative district established by state charter)

(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)

(n) East#2, Orient#1 (the countries of Asia)

(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east#4 (the direction corresponding to the eastward cardinal compass point)

(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al 1990, Landauer and Dumais 1997, Landauer et al 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London          1    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow       0.17       1    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris        0.37    0.18       1     0.1    0.04    0.08    0.09     0.5    0.31    0.36
north        0.13    0.13     0.1       1    0.89     0.6    0.61    0.07    0.09    0.14
south        0.12    0.08    0.04    0.89       1     0.5    0.55    0.02    0.05    0.05
east         0.12    0.22    0.08     0.6     0.5       1    0.85    0.22    0.29     0.3
west         0.14    0.14    0.09    0.61    0.55    0.85       1    0.24    0.26    0.23
Germany      0.17    0.37     0.5    0.07    0.02    0.22    0.24       1    0.85    0.81
Poland       0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85       1    0.87
Russia       0.16    0.69    0.36    0.14    0.05     0.3    0.23    0.81    0.87       1

[Figure: Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38000 college-level texts (novels, newspaper articles, …). Axis ticks 0.1 to 0.6. Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.]
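
A similarity tree of this kind can be grown by agglomerative clustering. The slides do not state which distance or linkage rule was used, so the sketch below makes two assumptions: distance is taken as 1 minus the LSA similarity, and clusters are merged by average linkage. It records the merge sequence rather than drawing the tree, and its output need not match the figure.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity scores copied from the matrix slide.
LSA = [
    [1, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1, 0.1, 0.04, 0.08, 0.09, 0.5, 0.31, 0.36],
    [0.13, 0.13, 0.1, 1, 0.89, 0.6, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1, 0.5, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.6, 0.5, 1, 0.85, 0.22, 0.29, 0.3],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.5, 0.07, 0.02, 0.22, 0.24, 1, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.3, 0.23, 0.81, 0.87, 1],
]


def average_linkage(words, sim):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(words))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
                # Assumed distance: 1 - similarity, averaged over all cross pairs.
                d = sum(1.0 - sim[a][b] for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(words[i] for i in clusters[x]),
                       sorted(words[i] for i in clusters[y]),
                       round(d, 3)))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return merges


merges = average_linkage(WORDS, LSA)
for left, right, height in merges:
    print(height, left, "+", right)
```

The same function can be run on the WordNet-based matrix or on a symmetrized version of the brain-data probability estimates; how the asymmetric brain matrix was turned into distances is not specified on the slides.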

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised. E.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus (the human-annotated, sense-tagged corpus SemCor (Miller et al 1993), which links every word in the Brown Corpus to its appropriate WordNet sense). Scaled various ways.

Vector-space models: work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score using each of the relevant senses.
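
To make the path-based family concrete, here is a sketch of the Wu and Palmer (1994) measure, which scores two concepts by the depth of their lowest common ancestor: 2·depth(lcs) / (depth(a) + depth(b)). The tiny is-a hierarchy is invented for the example and is NOT the real WordNet hypernym tree, so the numbers it yields are illustrative only.

```python
# Hypothetical toy taxonomy (child -> parent); not actual WordNet structure.
PARENT = {
    "location": "entity",
    "region": "location",
    "direction": "location",
    "city": "region",
    "country": "region",
    "Paris": "city",
    "London": "city",
    "France": "country",
    "north": "direction",
    "east": "direction",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted from the root, with the root at depth 1."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lowest common ancestor) / (depth(a) + depth(b))."""
    ancestors_of_b = set(path_to_root(b))
    lcs = next(n for n in path_to_root(a) if n in ancestors_of_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print("Paris vs London:", wup("Paris", "London"))
print("Paris vs France:", wup("Paris", "France"))
print("Paris vs north: ", wup("Paris", "north"))
```

In practice the score would be computed over the real WordNet graph for each relevant sense pair and then averaged with the other measures, as the slide describes.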

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London          1   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow      0.396       1   0.393   0.095   0.094   0.062    0.07   0.286   0.281   0.288
Paris       0.466   0.393       1   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north       0.106   0.095   0.106       1   0.228   0.179    0.21   0.123   0.132   0.111
south       0.103   0.094   0.104   0.228       1   0.172   0.212   0.115   0.107   0.109
east        0.076   0.062   0.074   0.179   0.172       1   0.216   0.093    0.08   0.077
west        0.078    0.07   0.077    0.21   0.212   0.216       1   0.087   0.082   0.083
Germany     0.322   0.286   0.327   0.123   0.115   0.093   0.087       1   0.589   0.409
Poland      0.299   0.281   0.308   0.132   0.107    0.08   0.082   0.589       1   0.403
Russia      0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403       1

[Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Axis ticks 0.2 to 0.6. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees, labelled LANGUAGE DATA and BRAIN DATA, each with axis ticks 0.2 to 0.6. Leaf orders as extracted: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; and Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
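
For a fixed reference word, the ternary relation reduces to a set of ordered pairs, which makes the stated properties easy to check. The sketch below builds R for ω = London using the London row of the conditional probability table in these slides, following the inequality exactly as stated (ω1 precedes ω2 when its probability is smaller). The function names are illustrative, not the authors'.

```python
# London row of the conditional probability estimates (values from the slides).
london_row = {
    "London": 0.25, "Moscow": 0.163, "Paris": 0.138, "north": 0.075,
    "south": 0.038, "east": 0.05, "west": 0.063, "Germany": 0.088,
    "Poland": 0.075, "Russia": 0.063,
}


def ternary_relation(row):
    """Pairs (w1, w2) with R(reference, w1, w2): the value for w1 is smaller than for w2."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] < row[w2]}


def is_strict_partial_order(pairs):
    """Check irreflexivity, asymmetry, and transitivity of a set of ordered pairs."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all((a, d) in pairs
                     for a, b in pairs for c, d in pairs if b == c)
    return irreflexive and asymmetric and transitive


R_london = ternary_relation(london_row)
print(len(R_london), "ordered pairs")
print("strict partial order:", is_strict_partial_order(R_london))
print("north vs Poland comparable:",
      ("north", "Poland") in R_london or ("Poland", "north") in R_london)
```

Words with equal estimates (here north and Poland, or west and Russia) are left unordered, which is why the result is a partial rather than a total order.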

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: the two partial orders drawn as diagrams. Node labels as extracted: Poland, north, south, west, east; and north, Poland, east, south, west.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974) The axiomatic method in the empirical sciences. L Henkin et al (Eds) Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine if we have a statistically significant correlation or not.
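
Spearman's coefficient is the Pearson correlation of the two rank sequences, with tied values sharing their average rank. The sketch below applies it to the London language and brain values listed on the partial-orders slide. It is a plain textbook implementation written for this example; the significance test the authors used is not reproduced.

```python
WORDS = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]

# Values for London from the slide comparing language and brain data.
language = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
brain = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}


def average_ranks(values):
    """1-based ascending ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation computed on the rank-transformed sequences."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


rho = spearman([language[w] for w in WORDS], [brain[w] for w in WORDS])
print(f"Spearman rho for London: {rho:.3f}")
```

Repeating this for each of the ten reference words gives one correlation per word, which is the quantity the slide describes testing for significance.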

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: two invariant partial orders over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Panel captions: "Significant Invariance - Paris - Spearman 0.88 (p=6.6795e-004)" and "Significant Invariance - Paris - Spearman 0.90 (p=3.8716e-004)".]
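
One natural reading of "the partial order that is invariant" is the set of ordered pairs on which the brain ordering and the language ordering agree, that is, the intersection of the two relations. The slides do not spell out the construction, so the sketch below should be taken as an assumption; it uses the London values from the partial-orders slide.

```python
language = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
brain = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}


def order_pairs(scores):
    """Pairs (w1, w2) such that the score of w1 is strictly smaller than that of w2."""
    return {(w1, w2) for w1 in scores for w2 in scores if scores[w1] < scores[w2]}


# Assumed construction: keep only the comparisons both data sets agree on.
invariant = order_pairs(language) & order_pairs(brain)

disagreements = order_pairs(language) - invariant
print(len(invariant), "pairs shared by brain and language orderings")
print("pairs ordered by language but not confirmed by brain:",
      sorted(disagreements))
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the individual orders.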

Brain data

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: two similarity trees. First tree, axis ticks 0.2 to 0.6, leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Second tree, axis ticks 0.1 to 0.5, leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee, S. and Pedersen, T. (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee, S. and Pedersen, T. (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni, M., B. Murphy, E. Barbu, M. Poesio (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Brants, T. and Franz, A. (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky, A., Hirst, G. (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins, A.M., Loftus, E.F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.
Crestani, F. (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review, 11, 453-482.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R. (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391-407.
Devereux, B., C. Kelly, A. Korhonen (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.
Fellbaum, C. (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk, O., Davis, M.H., Ford, M., Pulvermüller, F., Marslen-Wilson, W.D. (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst, G. (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small, S.L., Cottrell, G.W., Tanenhaus, M.K. (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.
Jelodar, A.B., M. Alizadeh, S. Khadivi (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.
Jiang, J. and Conrath, D. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just, M.A., Cherkassky, V.L., Aryal, S., Mitchell, T.M. (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp, R.M. (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, N.Y. (Ed. R.E. Miller and J.W. Thatcher). New York: Plenum, pp. 85-103.
Kelly, C., B. Devereux, A. Korhonen (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.
Kriegeskorte, N., Mur, M., Bandettini, P.A. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas, M., Hillyard, S.A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161-163.
Kutas, M., Federmeier, K.D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 2011, 62, pp. 621-647.
Landauer, T.K., Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
Landauer, T.K., Foltz, P.W., Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.

Leacock, C., Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech, G., Rayson, P., Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin, D. (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce, R.D. (1956). Semiorders and a theory of utility discrimination. Econometrica, 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga, T., Yonemori, C., Tomita, E., Muramatsu, M. (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009 Jul 1;10:205.
Martin, Ph. (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller, G.A., Nicely, P. (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am., 27(2).
Miller, G.A., Leacock, C., Tengi, R., Bunker, R.T. (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller, G.A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.
Mitchell, T.M., S.V. Shinkareva, A. Carlson, K.M. Chang, V.L. Malave, R.A. Mason, and M.A. Just (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science, 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy, B., Baroni, M., & Poesio, M. (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy, B., & Poesio, M. (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.
Murphy, B., Poesio, M., Bovolo, F., Bruzzone, L., Dalponte, M., Lakany, H. (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang., 2011 Apr;117(1):12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen, T., Patwardhan, S., Michelizzi, J. (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira, F., M. Botvinick, G. Detre (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes, M., Wong, D.K., Uy, E.T., Grosenick, L., Suppes, P. (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering, 54, 436-443.
Rabinovitch, I. (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology, 15 (2), 209-212. MR0437404.
Resnik, P. (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach, B.J., Mathalon, D.H. (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr. Bull., 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott, D., Suppes, P. (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic, 23, 113-128. MR0115919.
Suppes, P. (1972). Axiomatic Set Theory. New York: Dover.
Suppes, P. (1974). The axiomatic method in the empirical sciences. L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes, P., Lu, Z.-L., Han, B. (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences, 95, 14965-14969.
Suppes, P., Han, B., Lu, Z.-L. (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences, 95, 15861-15866.
Suppes, P., Han, B., Epelboim, J., Lu, Z.-L. (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences, 96, 12953-12958.
Suppes, P., Han, B., Epelboim, J., Lu, Z.-L. (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences, 96, 14658-14663.
Suppes, P., Han, B. (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences, 97, 8738-8743.
Suppes, P., Perreau-Guimaraes, M., Wong, D.K. (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation, 21, 3228-3269.

Suppes, P., de Barros, J.A., Oas, G. (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology, 56, 95-117.
Turney, P. (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva, E., Pinto, G., de Barros, J.A., Suppes, P. (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco, G., Warren, J., Siri, S., Arciuli, J., Scott, S., Wise, R. (2006). The role of semantics and grammatical class in the neural representation of words. Cereb. Cortex, 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong, D.K., Perreau-Guimaraes, M., Uy, E.T., Suppes, P. (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing, 61, 479-484.
Wong, D.K., Uy, E.T., Perreau-Guimaraes, M., Yang, W., Suppes, P. (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing, 70, 373-383.
Wong, D.K., Grosenick, L., Uy, E.T., Perreau-Guimaraes, M., Carvalhaes, C.G., Desain, P., Suppes, P. (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage, 39, 1051-1063.
Woods, W. (1975). What's in a link: Foundations for semantic networks. In Bobrow, D., Collins, A., eds., Representation and understanding. New York: Academic Press, 1975, 35-82.
Wu, Z., Palmer, M. (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

Can we train a classifier of EEG data so that we can predict which word/sentence/word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words

1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language


Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences

2005: Recognition of Words from the EEG Laplacian

2006: Multichannel classification of single EEG trials with independent component analysis

2007: Single-trial classification of MEG recordings


Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names


Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)

Paris is not east of Berlin. (T)

Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}

Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}

W ∈ {the capital, the largest city}

Z ∈ {north, south, east, west}

W of Y is [not] X
X is [not] W of Y

X is [not] Z of X
Y is [not] Z of Y
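The four sentence forms and the word sets above amount to a small template grammar. The Python sketch below enumerates every sentence the templates allow. How the 48 sentences used in the experiment were chosen from this space is not stated here, so the enumeration is illustrative only, and the constant and function names are hypothetical.

```python
from itertools import product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]               # X
COUNTRIES = ["France", "Germany", "Italy", "Poland", "Russia",
             "Austria", "Greece", "Spain"]                      # Y
ROLES = ["the capital", "the largest city"]                     # W
DIRECTIONS = ["north", "south", "east", "west"]                 # Z

def sentences(negated=False):
    """Generate every sentence allowed by the four templates."""
    neg = "not " if negated else ""
    for w, y, x in product(ROLES, COUNTRIES, CITIES):
        yield f"{w.capitalize()} of {y} is {neg}{x}"   # W of Y is [not] X
        yield f"{x} is {neg}{w} of {y}"                # X is [not] W of Y
    for places in (CITIES, COUNTRIES):
        for a, z, b in product(places, DIRECTIONS, places):
            if a != b:                                 # compare two different places
                yield f"{a} is {neg}{z} of {b}"        # X is [not] Z of X, Y is [not] Z of Y

all_sentences = list(sentences()) + list(sentences(negated=True))
print(len(all_sentences), "candidate sentences")
```

The generator makes no claim about truth values; deciding whether a generated sentence is true or false would need a separate table of geographic facts.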


LANGUAGE

Paris is north of Berlin
The capital of Germany is Berlin
Spain is east of France
London is not the capital of France
Vienna is east of Moscow
The largest city of Poland is Athens

BRAIN

[Figure: fifteen seconds of EEG data, from http://sccn.ucsd.edu. Accompanying image from http://www2.le.ac.uk.]

What our work does ...

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.
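The train/test procedure above can be sketched as a repeated random hold-out loop. The Python sketch below covers only the splitting and averaging. The classifier is passed in as a hypothetical callback `classify(train_idx, test_idx)`, the number of repeats is an arbitrary choice, and the labels are synthetic, so this is an illustration under stated assumptions and not the study's pipeline.

```python
import random

def holdout_splits(n_samples, n_test, n_repeats, seed=0):
    """Yield (train, test) index lists; each repeat draws a fresh random test set."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = rng.sample(indices, n_samples)
        yield shuffled[n_test:], shuffled[:n_test]

def mean_classification_rate(labels, classify, n_test=60, n_repeats=20, seed=0):
    """Average fraction of held-out samples labelled correctly.

    `classify(train_idx, test_idx)` is a hypothetical callback that trains on the
    training indices and returns one predicted label per test index.
    """
    rates = []
    for train, test in holdout_splits(len(labels), n_test, n_repeats, seed):
        predicted = classify(train, test)
        correct = sum(p == labels[i] for p, i in zip(predicted, test))
        rates.append(correct / len(test))
    return sum(rates) / len(rates)

# Synthetic labels: 640 samples spread evenly over 10 classes (not study data).
labels = [i % 10 for i in range(640)]

def predict_first_class(train_idx, test_idx):
    """Baseline that ignores the data and always answers class 0."""
    return [0] * len(test_idx)

rate = mean_classification_rate(labels, predict_first_class)
print(f"baseline rate: {rate:.3f}")
```

With ten equally frequent classes, a baseline like this should land near the 10% chance level, which is the reference point for judging a real classifier's rate.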

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E-10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.

Beyond machine learning ...


THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, ..., s640 into 10 classes ω1, ω2, ..., ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

London Moscow Paris north south east west Germany Poland Russia

London 8 14 11 7 6 3 3 10 8 10

Moscow 8 24 14 6 2 6 4 4 7 15

Paris 6 18 12 4 3 5 8 6 11 7

north 4 2 5 11 9 7 10 1 8 3

south 1 4 3 14 14 9 11 4 7 3

east 4 3 5 9 12 12 7 1 4 3

west 4 3 2 12 13 11 10 2 8 5

Germany 2 2 3 2 2 1 0 9 11 8

Poland 2 3 4 0 4 1 4 9 9 4

Russia 7 7 5 1 5 1 1 8 4 11


The relative frequencies of the counts m_iq are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

\[ p_{iq} = \frac{m_{iq}}{\sum_{j=1}^{N} m_{ij}} \]
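Turning a confusion matrix into conditional probability estimates is a row-wise normalisation: each count is divided by the total number of test samples from that row's true class. A minimal Python sketch with a toy three-class matrix follows. The counts are hypothetical and the function name is illustrative.

```python
def conditional_probabilities(confusion):
    """Divide each row of a confusion matrix by its row total."""
    estimates = []
    for row in confusion:
        total = sum(row)
        # An all-zero row (no test samples from that class) is left as zeros.
        estimates.append([count / total if total else 0.0 for count in row])
    return estimates

# Toy confusion matrix: rows are true classes, columns are predicted classes.
m = [[6, 3, 1],
     [2, 7, 1],
     [1, 1, 8]]

p = conditional_probabilities(m)
for row in p:
    print(" ".join(f"{value:.3f}" for value in row))
```

Each row of the result sums to 1, so a row can be read as the distribution of predicted classes given the true class.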

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi. Columns: ωq (predicted, "classified as").

ωi \ ωq London Moscow Paris north south east west Germany Poland Russia

London 0.25 0.163 0.138 0.075 0.038 0.05 0.063 0.088 0.075 0.063

Moscow 0.144 0.333 0.133 0.033 0.011 0.056 0.056 0.089 0.067 0.078

Paris 0.175 0.188 0.125 0.038 0.038 0.038 0.05 0.1 0.138 0.113

north 0.067 0.017 0.017 0.283 0.167 0.15 0.1 0.1 0.067 0.033

south 0.014 0.014 0.071 0.114 0.271 0.171 0.157 0.029 0.086 0.071

east 0.05 0 0 0.133 0.15 0.383 0.15 0.05 0 0.083

west 0.057 0.043 0 0.171 0.114 0.143 0.229 0.086 0.1 0.057

Germany 0.075 0.175 0.1 0.05 0.125 0.025 0.05 0.275 0.025 0.1

Poland 0.15 0.1 0.125 0.05 0.05 0 0.025 0.15 0.25 0.1

Russia 0.16 0.04 0.08 0.04 0.1 0.02 0.12 0.1 0.08 0.26

[Figure: hierarchical cluster tree; distance axis 0.2 to 0.6; leaves: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, textbooks.


WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)

(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris#4 (a town in northeastern Texas)

Organized into sets of cognitive synonyms called "synsets"

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [direct hyponym / full hyponym / part meronym / has instance]
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. 1990, Landauer and Dumais 1997, Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, ...). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

(word) London Moscow Paris north south east west Germany Poland Russia

London 1 0.17 0.37 0.13 0.12 0.12 0.14 0.17 0.16 0.16

Moscow 0.17 1 0.18 0.13 0.08 0.22 0.14 0.37 0.65 0.69

Paris 0.37 0.18 1 0.1 0.04 0.08 0.09 0.5 0.31 0.36

north 0.13 0.13 0.1 1 0.89 0.6 0.61 0.07 0.09 0.14

south 0.12 0.08 0.04 0.89 1 0.5 0.55 0.02 0.05 0.05

east 0.12 0.22 0.08 0.6 0.5 1 0.85 0.22 0.29 0.3

west 0.14 0.14 0.09 0.61 0.55 0.85 1 0.24 0.26 0.23

Germany 0.17 0.37 0.5 0.07 0.02 0.22 0.24 1 0.85 0.81

Poland 0.16 0.65 0.31 0.09 0.05 0.29 0.26 0.85 1 0.87

Russia 0.16 0.69 0.36 0.14 0.05 0.3 0.23 0.81 0.87 1

[Figure: hierarchical cluster tree; distance axis 0.1 to 0.6; leaves: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, ...).

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised. E.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus. The corpus is the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

(word) London Moscow Paris north south east west Germany Poland Russia

London 1 0.396 0.466 0.106 0.103 0.076 0.078 0.322 0.299 0.303

Moscow 0.396 1 0.393 0.095 0.094 0.062 0.07 0.286 0.281 0.288

Paris 0.466 0.393 1 0.106 0.104 0.074 0.077 0.327 0.308 0.307

north 0.106 0.095 0.106 1 0.228 0.179 0.21 0.123 0.132 0.111

south 0.103 0.094 0.104 0.228 1 0.172 0.212 0.115 0.107 0.109

east 0.076 0.062 0.074 0.179 0.172 1 0.216 0.093 0.08 0.077

west 0.078 0.07 0.077 0.21 0.212 0.216 1 0.087 0.082 0.083

Germany 0.322 0.286 0.327 0.123 0.115 0.093 0.087 1 0.589 0.409

Poland 0.299 0.281 0.308 0.132 0.107 0.08 0.082 0.589 1 0.403

Russia 0.303 0.288 0.307 0.111 0.109 0.077 0.083 0.409 0.403 1

[Figure: hierarchical cluster tree; distance axis 0.2 to 0.6; leaves: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

BACK TO THE BRAIN ...

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data ...

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees, labelled LANGUAGE DATA and BRAIN DATA, each with distance axis 0.2 to 0.6. One tree has leaves London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; the other has leaves Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
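The ternary relation can be sketched directly from a similarity matrix. The Python sketch below uses the "ω1 is more similar to ω than is ω2" reading and a three-word excerpt of the WordNet-derived matrix. The function name and the nested-dictionary layout are assumptions made for illustration.

```python
def similarity_relation(similarity):
    """Return all triples (w, w1, w2) where w1 is more similar to w than w2 is."""
    words = list(similarity)
    return {
        (w, w1, w2)
        for w in words
        for w1 in words
        for w2 in words
        if similarity[w][w1] > similarity[w][w2]
    }

# Three-word excerpt of the WordNet-derived similarity matrix.
sim = {
    "London": {"London": 1.0, "Paris": 0.466, "north": 0.106},
    "Paris": {"London": 0.466, "Paris": 1.0, "north": 0.106},
    "north": {"London": 0.106, "Paris": 0.106, "north": 1.0},
}

R = similarity_relation(sim)
print(("London", "Paris", "north") in R)   # Paris is closer to London than north is
print(("north", "London", "Paris") in R)   # tied scores: neither direction holds
```

For a fixed ω, the strict comparison makes the order irreflexive, asymmetric and transitive. Tied scores, such as London and Paris relative to north in this excerpt, are simply left unordered, which is why the result is a partial order and not a total one.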

London

Language data | Brain data
London 1.000 | London 0.275
Paris 0.466 | Paris 0.133
Moscow 0.396 | Moscow 0.108
Germany 0.322 | Germany 0.075
Russia 0.303 | north 0.042
Poland 0.299 | Russia 0.033
north 0.106 | Poland 0.025
south 0.103 | east 0.008
west 0.078 | south 0.000
east 0.076 | west 0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.
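To compare two such orderings numerically, Spearman's rank correlation can be computed as the Pearson correlation of the two rank vectors, with tied values sharing their mean rank. The Python sketch below applies this to the ten London values listed above. The helper names are illustrative, and the significance test is not included.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Word order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(f"Spearman rho for London: {rho:.3f}")
```

The value printed applies to these ten numbers only. Deciding whether it is statistically significant would require a separate test, for example a permutation test or a library routine.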

[Figure: the two partial orders for London drawn as diagrams. Labels recovered from the diagrams: Poland, north, south, west, east; and north, Poland, east, south, west.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference, or the theory of differences in psychological intensity:

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al. (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
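The comparison can be sketched as follows, using the language and brain values listed for London in the slides. This is a plain-Python illustration of Spearman's coefficient (the Pearson correlation of the ranks, with tied values given their average rank); the helper names are assumptions, the reference word itself is kept in both lists as on the slide, and no significance test is included.

```python
# London values from the slides, in the same word order for both lists.
WORDS = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
LANGUAGE = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
BRAIN = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]


def average_ranks(values):
    """Rank the values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation between the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


rho = spearman(LANGUAGE, BRAIN)
print(f"Spearman rho for London: {rho:.3f}")
```

In practice a statistics library would normally supply both the coefficient and its p-value; the hand-rolled version is shown only to make the ranking step explicit.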


For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant invariance, Paris, Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant invariance, Paris, Spearman 0.90 (p = 3.8716e-004)]

[Figure: brain data similarity tree with leaf order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: brain data similarity tree with leaf order north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
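The resampling step might look like the sketch below. It only draws the resampled sample indices for each of the 60 runs and labels each run with the language model it would be compared against; the classification itself is not shown. How the original runs were assigned to WordNet or LSA is not stated, so the first-half/second-half assignment here is an assumption, as are the names `resample_with_replacement` and `runs`.

```python
import random

N_SAMPLES = 640  # EEG samples per participant (from the slides)
N_RUNS = 60      # classifications performed (from the slides)


def resample_with_replacement(n_samples, rng):
    """Draw n_samples indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
runs = []
for run in range(N_RUNS):
    indices = resample_with_replacement(N_SAMPLES, rng)
    # Assumption: the first 30 runs are compared with WordNet, the last 30 with LSA.
    language_model = "WordNet" if run < N_RUNS // 2 else "LSA"
    runs.append((language_model, indices))

print(sum(1 for model, _ in runs if model == "WordNet"), "WordNet runs")
print(sum(1 for model, _ in runs if model == "LSA"), "LSA runs")
```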


[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: significant invariant partial orders (0 to 300). Horizontal axis: test participants S10, S24, S25, S27, S16, S26, S12, S13, S18. Series: WordNet and LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), pp 222-254. DOI 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006) www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol 82(6), 407-428.

Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. RE Miller and JW Thatcher). New York: Plenum, pp 85-103.

Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp 621-647.

Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp 265-283.

Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998) An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp 296-304.

Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1; 10: 205.

Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM, Vol 38, No 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr; 117(1): 12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.

Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.

Suppes P (1972) Axiomatic Set Theory. New York: Dover.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al. (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.

Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol 22, No 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (eds), Representation and understanding. New York: Academic Press, 1975: 35-82.

Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133-138.


Can we train a classifier of EEG data so that we can predict which word, sentence, or word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words

1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language

Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences

2005: Recognition of Words from the EEG Laplacian

2006: Multichannel classification of single EEG trials with independent component analysis

2007: Single-trial classification of MEG recordings


Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names


Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, or its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block, so there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true and half false; half were positive and half negative.

The capital of Italy is Paris. (F)

Paris is not east of Berlin. (T)

Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}

Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}

W ∈ {the capital, the largest city}

Z ∈ {north, south, east, west}

W of Y is [not] X

X is [not] W of Y

X is [not] Z of X

Y is [not] Z of Y
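The four sentence forms can be expanded mechanically, as in the sketch below. It enumerates every sentence the templates allow, whereas the experiment used a selection of 48, so this is only an illustration of the template space. The function names are assumptions, as is the choice to use two different places in the direction sentences.

```python
from itertools import permutations, product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]
COUNTRIES = ["France", "Germany", "Italy", "Poland",
             "Russia", "Austria", "Greece", "Spain"]
ROLES = ["the capital", "the largest city"]
DIRECTIONS = ["north", "south", "east", "west"]


def role_sentences(negated):
    """Sentences of the forms 'W of Y is [not] X' and 'X is [not] W of Y'."""
    neg = "not " if negated else ""
    for w, y, x in product(ROLES, COUNTRIES, CITIES):
        yield f"{w.capitalize()} of {y} is {neg}{x}"
        yield f"{x} is {neg}{w} of {y}"


def direction_sentences(negated):
    """Sentences of the forms 'X is [not] Z of X' and 'Y is [not] Z of Y'."""
    neg = "not " if negated else ""
    for places in (CITIES, COUNTRIES):
        # Assumption: the two places in a direction sentence are different.
        for (a, b), z in product(permutations(places, 2), DIRECTIONS):
            yield f"{a} is {neg}{z} of {b}"


positive = set(role_sentences(False)) | set(direction_sentences(False))
negative = set(role_sentences(True)) | set(direction_sentences(True))
print(len(positive), "positive and", len(negative), "negative candidate sentences")
```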


[Image from http://www2.le.ac.uk]

LANGUAGE

Paris is north of Berlin

The capital of Germany is Berlin

Spain is east of France

London is not the capital of France

Vienna is east of Moscow

The largest city of Poland is Athens

BRAIN

[Figure: fifteen seconds of EEG data. From http://sccn.ucsd.edu]

What our work does …

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model with principal component analysis for blind source separation to classify the segments of EEG data obtained from the individual trials.
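The train/test procedure described above can be sketched as a loop over repeated splits. The sketch does not implement the linear discriminant model or the principal component analysis; `evaluate` is a hypothetical callback standing in for "train on these samples, test on those, return the fraction correct". The stand-in labels and the random-guess classifier are assumptions used only to make the loop runnable, and drawing each split at random is likewise an assumption.

```python
import random

N_SAMPLES, N_TRAIN, N_CLASSES = 640, 580, 10  # sizes taken from the slides


def random_split(n_samples, n_train, rng):
    """Shuffle the sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def average_rate(evaluate, n_repeats, rng):
    """Average the test-set classification rate over repeated random splits."""
    rates = []
    for _ in range(n_repeats):
        train, test = random_split(N_SAMPLES, N_TRAIN, rng)
        rates.append(evaluate(train, test))
    return sum(rates) / len(rates)


# Hypothetical stand-in data: labels simply cycle through the 10 classes.
labels = [i % N_CLASSES for i in range(N_SAMPLES)]
rng = random.Random(0)


def guess_at_random(train, test):
    """Stand-in classifier that ignores the training set and guesses a class."""
    hits = sum(1 for i in test if rng.randrange(N_CLASSES) == labels[i])
    return hits / len(test)


rate = average_rate(guess_at_random, 200, rng)
print(f"mean rate {rate:.3f}, chance level {1 / N_CLASSES:.3f}")
```

A guessing classifier should hover around the 10% chance level, which is the baseline the reported rates are compared against.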


Beyond machine learning …


THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.
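A small sketch of how such a matrix is tallied from classifier output follows. The three-class toy data and the function name `confusion_matrix` are made up for illustration and are not data from the study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] = number of test samples from class i classified as class q."""
    index = {label: position for position, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(true_labels, predicted_labels):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical toy example with three of the words.
classes = ["London", "Paris", "north"]
actual = ["London", "London", "Paris", "north", "north", "Paris"]
predicted = ["London", "Paris", "Paris", "north", "London", "London"]

m = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, m):
    print(label, row)
```

The diagonal holds the correct classifications; the off-diagonal cells are the misclassifications that the rest of the analysis builds on.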

Rows: actual class ωi. Columns: class ωq that the sample was classified as.

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         8      14      11       7       6       3       3      10       8      10
Moscow         8      24      14       6       2       6       4       4       7      15
Paris          6      18      12       4       3       5       8       6      11       7
north          4       2       5      11       9       7      10       1       8       3
south          1       4       3      14      14       9      11       4       7       3
east           4       3       5       9      12      12       7       1       4       3
west           4       3       2      12      13      11      10       2       8       5
Germany        2       2       3       2       2       1       0       9      11       8
Poland         2       3       4       0       4       1       4       9       9       4
Russia         7       7       5       1       5       1       1       8       4      11

The relative frequencies of the counts m_iq are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

$p_{iq} = \dfrac{m_{iq}}{\sum_{q'} m_{iq'}}$
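Read as "divide each count by its row total", the estimate can be sketched as below, using the London row of the confusion matrix shown in the slides as input. The function name `row_normalize` is an assumption, and no claim is made that the output reproduces the table of estimates in the slides, which may come from a different classification run.

```python
def row_normalize(matrix):
    """Turn counts m[i][q] into relative frequencies m[i][q] / sum over q of m[i][q]."""
    result = []
    for row in matrix:
        total = sum(row)
        # An all-zero row has no test samples; leave it as zeros rather than divide by zero.
        result.append([count / total if total else 0.0 for count in row])
    return result


# London row of the confusion matrix from the slides (80 test samples in total).
LONDON_COUNTS = [8, 14, 11, 7, 6, 3, 3, 10, 8, 10]

p_london = row_normalize([LONDON_COUNTS])[0]
print([round(p, 3) for p in p_london])
```

Each row of the resulting matrix sums to 1, so a row can be read as a probability distribution over the classes a sample of that word might be assigned to.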

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Rows: actual class ωi. Columns: ωq, the class predicted (classified as).

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London      0.25   0.163   0.138   0.075   0.038    0.05   0.063   0.088   0.075   0.063
Moscow     0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris      0.175   0.188   0.125   0.038   0.038   0.038    0.05     0.1   0.138   0.113
north      0.067   0.017   0.017   0.283   0.167    0.15     0.1     0.1   0.067   0.033
south      0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east        0.05       0       0   0.133    0.15   0.383    0.15    0.05       0   0.083
west       0.057   0.043       0   0.171   0.114   0.143   0.229   0.086     0.1   0.057
Germany    0.075   0.175     0.1    0.05   0.125   0.025    0.05   0.275   0.025     0.1
Poland      0.15     0.1   0.125    0.05    0.05       0   0.025    0.15    0.25     0.1
Russia      0.16    0.04    0.08    0.04     0.1    0.02    0.12     0.1    0.08    0.26

[Figure: similarity tree with leaf order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, textbooks.

BRAIN

The conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the table given earlier).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets".

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)

(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts). Relations listed: direct hyponym, full hyponym, part meronym, has instance.

(n) city#2 (an incorporated administrative district established by state charter)

(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)

(n) East#2, Orient#1 (the countries of Asia)

(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east#4 (the direction corresponding to the eastward cardinal compass point)

(n) east#5 (a location in the eastern part of a country, region or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. 1990, Landauer and Dumais 1997, Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         1    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow      0.17       1    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris       0.37    0.18       1     0.1    0.04    0.08    0.09     0.5    0.31    0.36
north       0.13    0.13     0.1       1    0.89     0.6    0.61    0.07    0.09    0.14
south       0.12    0.08    0.04    0.89       1     0.5    0.55    0.02    0.05    0.05
east        0.12    0.22    0.08     0.6     0.5       1    0.85    0.22    0.29     0.3
west        0.14    0.14    0.09    0.61    0.55    0.85       1    0.24    0.26    0.23
Germany     0.17    0.37     0.5    0.07    0.02    0.22    0.24       1    0.85    0.81
Poland      0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85       1    0.87
Russia      0.16    0.69    0.36    0.14    0.05     0.3    0.23    0.81    0.87       1

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

[Figure: similarity tree with leaf order east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …).
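One way such a tree could be built from the LSA similarity matrix is sketched below. The slides do not say which linkage rule or distance was used, so both choices here are assumptions: distance is taken as 1 minus similarity, and clusters are merged by average linkage. The function name `average_linkage` is also illustrative.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity matrix from the slides, rows and columns in the order of WORDS.
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]


def average_linkage(words, similarity):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    index = {word: i for i, word in enumerate(words)}
    clusters = [(word,) for word in words]

    def distance(a, b):
        # Assumed distance: 1 - similarity, averaged over all cross-cluster word pairs.
        total = sum(1.0 - similarity[index[x]][index[y]] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        candidates = [(distance(a, b), a, b)
                      for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        d, a, b = min(candidates)  # closest pair of clusters
        merges.append((a, b, d))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(WORDS, LSA)
for a, b, d in merges:
    print(f"{d:.3f}  {' '.join(a)}  +  {' '.join(b)}")
```

The printed merge heights play the role of the horizontal axis in the tree figures; a different linkage rule would generally change both the heights and, possibly, the tree shape.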


LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus (the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense). Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         1   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow     0.396       1   0.393   0.095   0.094   0.062    0.07   0.286   0.281   0.288
Paris      0.466   0.393       1   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north      0.106   0.095   0.106       1   0.228   0.179    0.21   0.123   0.132   0.111
south      0.103   0.094   0.104   0.228       1   0.172   0.212   0.115   0.107   0.109
east       0.076   0.062   0.074   0.179   0.172       1   0.216   0.093    0.08   0.077
west       0.078    0.07   0.077    0.21   0.212   0.216       1   0.087   0.082   0.083
Germany    0.322   0.286   0.327   0.123   0.115   0.093   0.087       1   0.589   0.409
Poland     0.299   0.281   0.308   0.132   0.107    0.08   0.082   0.589       1   0.403
Russia     0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403       1

[Figure: similarity tree with leaf order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: BRAIN DATA similarity tree with leaf order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

[Figure: LANGUAGE DATA similarity tree with leaf order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

London

Language data         Brain data
London   1.000        London   0.275
Paris    0.466        Paris    0.133
Moscow   0.396        Moscow   0.108
Germany  0.322        Germany  0.075
Russia   0.303        north    0.042
Poland   0.299        Russia   0.033
north    0.106        Poland   0.025
south    0.103        east     0.008
west     0.078        south    0.000
east     0.076        west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.
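
As a minimal sketch, the relation R for a fixed reference word can be represented as a set of ordered pairs and checked against the three stated properties. The values are the brain-data estimates for London listed in this talk, and the comparison follows the brain-data definition as stated (a pair is in R when the first value is smaller than the second). The function names are illustrative, not the authors' own.

```python
# Conditional probability estimates for London (brain data), as listed in the talk.
BRAIN_LONDON = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}


def relation_for(values):
    """Pairs (w1, w2) with R(ref, w1, w2): the value for w1 is smaller than for w2."""
    return {(w1, w2) for w1 in values for w2 in values if values[w1] < values[w2]}


def is_strict_partial_order(pairs):
    """Check irreflexivity, asymmetry and transitivity of a set of ordered pairs."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all((a, d) in pairs
                     for a, b in pairs for c, d in pairs if b == c)
    return irreflexive and asymmetric and transitive


R_london = relation_for(BRAIN_LONDON)
print(len(R_london), "ordered pairs")
print(is_strict_partial_order(R_london))
# south and west are tied, so neither ordering is in R (the order is only partial):
print(("south", "west") in R_london, ("west", "south") in R_london)
```

Ties are what make the order partial rather than total: tied words are simply left incomparable.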

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
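
The comparison step can be sketched as follows, using the language and brain values for London listed in this talk. This is a plain standard-library implementation of Spearman's coefficient (Pearson correlation of the ranks, with tied values sharing their average rank); it computes the coefficient only and leaves out the significance test, and it is not necessarily the exact procedure used for the reported figures.

```python
def average_ranks(values):
    """Rank the values (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Word order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(f"Spearman rank correlation for London: {rho:.3f}")
```

Because only the ranks enter the calculation, the two data sets can be compared even though one holds similarity scores and the other holds conditional probability estimates.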

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Diagram: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Diagram: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]

Brain data

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: two brain-data similarity trees. Leaf orders shown: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east (axis ticks 0.2 to 0.6); and north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland (axis ticks 0.1 to 0.5).]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
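
Random resampling with replacement can be sketched as drawing, for each of the M classifications, as many sample indices as there are samples, allowing repeats. How each resample is then fed to the classifier is not described here, so this sketch stops at the resampling step; the function name and the fixed seed are illustrative assumptions.

```python
import random

N_SAMPLES = 640    # EEG data samples for the 10 words
N_RESAMPLES = 30   # M single-trial classifications


def resample_with_replacement(n_samples, n_resamples, seed=0):
    """Return n_resamples index lists, each drawn with replacement from range(n_samples)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_resamples)]


resamples = resample_with_replacement(N_SAMPLES, N_RESAMPLES)
for number, indices in enumerate(resamples[:3], start=1):
    distinct = len(set(indices))
    print(f"resample {number}: {len(indices)} draws, {distinct} distinct samples")
```

Because draws are made with replacement, each resample repeats some samples and omits others, so each classification yields its own confusion matrix and hence its own partial orders.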

[Bar chart: Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language. Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet and LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453–482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage. 2006 May 1; 30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73–107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.

Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics. 2009 Jul 1; 10:205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27:2.

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr; 117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209–212. MR0437404.

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep; 34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965–14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec; 16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975: 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


Can we train a classifier of EEG data so that we can predict which word, sentence, or word within a sentence the participant is seeing or hearing?

1997: Brain-wave recognition of words
1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?

1999: Invariance between subjects of brain wave representations of language

Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?

2004: Classification of individual trials based on the best independent component of EEG-recorded sentences
2005: Recognition of Words from the EEG Laplacian
2006: Multichannel classification of single EEG trials with independent component analysis
2007: Single-trial classification of MEG recordings

Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?

1999: Invariance of brain-wave representations of simple visual images and their names

Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word, or its orthography, or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block, so there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)
Paris is not east of Berlin. (T)
Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}
Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}
W ∈ {the capital, the largest city}
Z ∈ {north, south, east, west}

Sentence forms:
W of Y is [not] X
X is [not] W of Y
X is [not] Z of X
Y is [not] Z of Y

LANGUAGE

Paris is north of Berlin.
The capital of Germany is Berlin.
Spain is east of France.
London is not the capital of France.
Vienna is east of Moscow.
The largest city of Poland is Athens.

BRAIN

[Image from http://www2.le.ac.uk]

[Figure: fifteen seconds of EEG data. From http://sccn.ucsd.edu]

What our work does …

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?
HOW DO WE REPRESENT THE LANGUAGE DATA?
HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin.

Spain is east of France.

The capital of Germany is Berlin.

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E-10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
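
One simple way to check that a classification rate is above chance is an exact binomial tail probability, sketched below. This is offered as an illustration only: the slides do not say which significance test was used, and the count of 15 correct out of 60 test samples is a hypothetical figure chosen to give a 25% rate.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more hits by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


n_test = 60      # test samples held out in one split
n_correct = 15   # hypothetical number classified correctly (a 25% rate)
chance = 0.10    # ten equally likely classes

p_value = binomial_tail(n_correct, n_test, chance)
print(f"{n_correct}/{n_test} correct, chance level {chance:.0%}, p = {p_value:.2e}")
```

Repeating the split many times, as described above, and pooling the test outcomes gives a much larger number of trials, which is how very small p-values become attainable at rates of this size.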

Beyond machine learning …

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   8       14      11      7       6       3       3       10      8       10
Moscow   8       24      14      6       2       6       4       4       7       15
Paris    6       18      12      4       3       5       8       6       11      7
north    4       2       5       11      9       7       10      1       8       3
south    1       4       3       14      14      9       11      4       7       3
east     4       3       5       9       12      12      7       1       4       3
west     4       3       2       12      13      11      10      2       8       5
Germany  2       2       3       2       2       1       0       9       11      8
Poland   2       3       4       0       4       1       4       9       9       4
Russia   7       7       5       1       5       1       1       8       4       11

The relative frequencies of the counts m_iq within each row are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

p_iq = m_iq / Σ_j m_ij

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi. Columns: ωq (predicted, "classified as").

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
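
The step from a confusion matrix to conditional probability estimates is a row normalisation, sketched below. The 3-class matrix is a hypothetical example chosen for readability, not data from the experiment, and the function name is illustrative.

```python
def conditional_probabilities(confusion):
    """Row-normalise a confusion matrix: p[i][q] = m[i][q] / sum over j of m[i][j]."""
    probabilities = []
    for row in confusion:
        total = sum(row)
        probabilities.append([count / total if total else 0.0 for count in row])
    return probabilities


# Hypothetical confusion matrix: rows are true classes, columns are predicted classes.
M = [
    [6, 3, 1],
    [2, 7, 1],
    [1, 1, 8],
]

P = conditional_probabilities(M)
for row in P:
    print([round(p, 3) for p in row], "row sum =", round(sum(row), 3))
```

Each row of P sums to 1, as do the rows of the estimates for the ten geography words.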

[Figure: hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaves in displayed order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks from 0.2 to 0.6.]

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted here to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.


WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets":

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France; and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts)
    direct hyponym / full hyponym / part meronym / has instance
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. 1990, Landauer and Dumais 1997, Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

[Figure: hierarchical cluster tree computed from the pairwise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …). Leaves in displayed order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia. Axis ticks from 0.1 to 0.6.]

LSA provides a straightforward measure of similarity between words.

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   1        0.396    0.466    0.106    0.103    0.076    0.078    0.322    0.299    0.303
Moscow   0.396    1        0.393    0.095    0.094    0.062    0.07     0.286    0.281    0.288
Paris    0.466    0.393    1        0.106    0.104    0.074    0.077    0.327    0.308    0.307
north    0.106    0.095    0.106    1        0.228    0.179    0.21     0.123    0.132    0.111
south    0.103    0.094    0.104    0.228    1        0.172    0.212    0.115    0.107    0.109
east     0.076    0.062    0.074    0.179    0.172    1        0.216    0.093    0.08     0.077
west     0.078    0.07     0.077    0.21     0.212    0.216    1        0.087    0.082    0.083
Germany  0.322    0.286    0.327    0.123    0.115    0.093    0.087    1        0.589    0.409
Poland   0.299    0.281    0.308    0.132    0.107    0.08     0.082    0.589    1        0.403
Russia   0.303    0.288    0.307    0.111    0.109    0.077    0.083    0.409    0.403    1

[Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Axis ticks: 0.2 to 0.6.]
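A cluster tree of this kind can be built by repeatedly merging the two most similar groups of words. The sketch below does this for the WordNet-based scores using average linkage. The linkage rule is an assumption, since the slides do not say which one was used, and the function names are hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# WordNet-based similarity scores as given on the slide (symmetric, diagonal = 1).
SIM = [
    [1, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303],
    [0.396, 1, 0.393, 0.095, 0.094, 0.062, 0.07, 0.286, 0.281, 0.288],
    [0.466, 0.393, 1, 0.106, 0.104, 0.074, 0.077, 0.327, 0.308, 0.307],
    [0.106, 0.095, 0.106, 1, 0.228, 0.179, 0.21, 0.123, 0.132, 0.111],
    [0.103, 0.094, 0.104, 0.228, 1, 0.172, 0.212, 0.115, 0.107, 0.109],
    [0.076, 0.062, 0.074, 0.179, 0.172, 1, 0.216, 0.093, 0.08, 0.077],
    [0.078, 0.07, 0.077, 0.21, 0.212, 0.216, 1, 0.087, 0.082, 0.083],
    [0.322, 0.286, 0.327, 0.123, 0.115, 0.093, 0.087, 1, 0.589, 0.409],
    [0.299, 0.281, 0.308, 0.132, 0.107, 0.08, 0.082, 0.589, 1, 0.403],
    [0.303, 0.288, 0.307, 0.111, 0.109, 0.077, 0.083, 0.409, 0.403, 1],
]

def average_similarity(c1, c2):
    """Mean pairwise similarity between two clusters of word indices."""
    return sum(SIM[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

def agglomerate(n_clusters):
    """Merge the most similar pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(WORDS))]
    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda ab: average_similarity(clusters[ab[0]],
                                                            clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(WORDS[i] for i in c) for c in clusters]

print(agglomerate(6))
print(agglomerate(2))
```

With these scores, stopping at six clusters groups the three cities and the three countries while the compass directions are still on their own; stopping at two clusters separates the place names from the compass directions.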

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees shown together, axis ticks 0.2 to 0.6. BRAIN DATA, leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA, leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London, relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
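The relation for the language data can be sketched directly from one row of a similarity matrix. The block below fixes ω = London, uses the WordNet-based scores for London given earlier in the slides, and represents R(London, ω1, ω2) as the set of pairs (ω1, ω2) in which ω1 is strictly more similar to London than ω2. The function names and the set-of-pairs representation are assumptions made for illustration.

```python
# WordNet-based similarity of each word to "London" (one row of the matrix).
LONDON_ROW = {
    "London": 1.0, "Moscow": 0.396, "Paris": 0.466, "north": 0.106,
    "south": 0.103, "east": 0.076, "west": 0.078,
    "Germany": 0.322, "Poland": 0.299, "Russia": 0.303,
}

def relation_for(row):
    """Pairs (w1, w2) such that w1 is strictly more similar to the
    reference word than w2 is."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] > row[w2]}

def is_strict_partial_order(rel):
    """Check irreflexivity, asymmetry and transitivity of a set of pairs."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel
                     for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive

R = relation_for(LONDON_ROW)
print(("Paris", "Moscow") in R)      # Paris is closer to London than Moscow is
print(is_strict_partial_order(R))
```

Because the comparison is strict, two words with equal scores are simply left unordered, which is why the result is a partial rather than a total order in general.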

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Diagram labels: Poland, north, south, west, east (language data); north, Poland, east, south, west (brain data).]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each class ω we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
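The comparison step can be illustrated with the two columns of values given for London on the earlier slide. The sketch below ranks each column (tied values share their mean rank) and takes the Pearson correlation of the ranks, which is the usual definition of Spearman's coefficient. It is a plain-Python illustration with hypothetical function names, not the authors' code, and it leaves out the significance test.

```python
from math import sqrt

LANGUAGE = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
BRAIN = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}

def average_ranks(values):
    """Rank from largest (rank 1) downward; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

words = list(LANGUAGE)
rho = spearman([LANGUAGE[w] for w in words], [BRAIN[w] for w in words])
print(round(rho, 3))
```

For these ten values the coefficient comes out at roughly 0.92; the two orderings agree on the top four words and differ mainly in where north and east fall.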

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
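One plausible reading of "the partial order that is invariant with respect to the brain and language data" is the set of ordered pairs on which the two partial orders agree, that is, their intersection. The sketch below applies that reading to the London values shown earlier. Treating invariance as set intersection is an assumption of this sketch, and the function name is hypothetical.

```python
LANGUAGE = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
BRAIN = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}

def relation_for(row):
    """Pairs (w1, w2) such that w1 scores strictly higher than w2."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] > row[w2]}

# Pairs ordered the same way by both data sets (assumed meaning of "invariant").
invariant = relation_for(LANGUAGE) & relation_for(BRAIN)

print(("Paris", "north") in invariant)   # both orders place Paris above north
print(("Russia", "north") in invariant)  # the two orders disagree on this pair
print(len(invariant))
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the originals. Pairs on which the data sets disagree, or which are tied in one of them, drop out.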

Brain data

[Figure: another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: a second similarity tree. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland. Axis ticks: 0.1 to 0.5.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
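The resampling in step 1 can be sketched as drawing sample indices uniformly at random with replacement, once per classification run. The block below shows only that step; the classifier itself is left out, and the function names, the seed and the choice to draw exactly 640 indices per run are assumptions.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def resampled_runs(n_samples=640, n_runs=30, seed=0):
    """One list of resampled indices per classification run."""
    rng = random.Random(seed)  # fixed seed so the runs can be reproduced
    return [bootstrap_indices(n_samples, rng) for _ in range(n_runs)]

runs = resampled_runs()
print(len(runs), len(runs[0]))
print(len(set(runs[0])))  # number of distinct samples drawn in the first run
```

Because sampling is with replacement, each run sees some samples more than once and misses others, which is what makes every run yield its own confusion matrix and hence its own similarity tree.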

[Bar chart: Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language. Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that during language comprehension, for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum C (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


Can we train a classifier of EEG data so that we can predict which word/sentence/word within a sentence the participant is seeing or hearing?
1997: Brain-wave recognition of words
1998: Brain-wave recognition of sentences

Can we train the classifier on some participants and use it to make predictions for other participants?
1999: Invariance between subjects of brain wave representations of language

Can we train the classifier using visually presented words and use that classifier to make predictions for words presented auditorily (and vice versa)?
2004: Classification of individual trials based on the best independent component of EEG-recorded sentences
2005: Recognition of Words from the EEG Laplacian
2006: Multichannel classification of single EEG trials with independent component analysis
2007: Single-trial classification of MEG recordings

Can we train the classifier using words and use that classifier to make predictions for pictures depicting what the word refers to (and vice versa)?
1999: Invariance of brain-wave representations of simple visual images and their names

Answering yes to these questions establishes that we are recognizing the meaning of the word, or the idea or concept behind the word, and not, for instance, the sound of the word or its orthography or its idiosyncratic use by one person.

EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)
Paris is not east of Berlin. (T)
Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}
Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}
W ∈ {the capital, the largest city}
Z ∈ {north, south, east, west}

Sentence forms:
W of Y is [not] X
X is [not] W of Y
X is [not] Z of X
Y is [not] Z of Y
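The sentence forms above can be expanded mechanically. The sketch below enumerates every sentence the templates allow; the 48 sentences actually used in the experiment are a selection from this space, and the sketch does not try to reproduce that selection. The function names are hypothetical, and requiring the two places in a direction sentence to be different is an assumption.

```python
from itertools import product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]          # X
COUNTRIES = ["France", "Germany", "Italy", "Poland",
             "Russia", "Austria", "Greece", "Spain"]       # Y
ROLES = ["the capital", "the largest city"]                # W
DIRECTIONS = ["north", "south", "east", "west"]            # Z
NEGATIONS = ["", "not "]                                   # the optional [not]

def role_sentences():
    """All sentences of the forms 'W of Y is [not] X' and 'X is [not] W of Y'."""
    out = []
    for w, y, neg, x in product(ROLES, COUNTRIES, NEGATIONS, CITIES):
        out.append(f"{w.capitalize()} of {y} is {neg}{x}")
        out.append(f"{x} is {neg}{w} of {y}")
    return out

def direction_sentences(places):
    """All sentences 'A is [not] Z of B' for two different places of one kind.
    Requiring A != B is an assumption of this sketch."""
    return [f"{a} is {neg}{z} of {b}"
            for a, neg, z, b in product(places, NEGATIONS, DIRECTIONS, places)
            if a != b]

print(len(role_sentences()),
      len(direction_sentences(CITIES)),
      len(direction_sentences(COUNTRIES)))
```

Each generated sentence still needs a truth value from real geography before it can be used as a stimulus; that lookup is outside the scope of this sketch.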

LANGUAGE

[Image: From http://www2.le.ac.uk]

Paris is north of Berlin.
The capital of Germany is Berlin.
Spain is east of France.
London is not the capital of France.
Vienna is east of Moscow.
The largest city of Poland is Athens.

BRAIN

[Image: Fifteen seconds of EEG data. From http://sccn.ucsd.edu]

What our work does …

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin.
Spain is east of France.
The capital of Germany is Berlin.

Paris / east of / Germany

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:
Use 580 brain samples and their associated words to train the classifier.
Test the remaining 60 samples.
Do this many times, each time using a different set of training samples.
Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E-10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
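To make the comparison with chance concrete, the sketch below computes an exact binomial upper-tail probability: how likely it is to get at least k of n test samples right when each guess succeeds with probability 0.1 (ten equally likely classes). The slides do not state which significance test was used, so the choice of a binomial test, the function name and the example counts are all assumptions.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Hypothetical example: 15 of 60 test samples correct (25%) against 10% chance.
p_value = binomial_upper_tail(n=60, k=15, p=0.1)
print(p_value)
```

A single test set of 60 samples already puts a 25% hit rate well below the 0.01 level under this test. Pooling correct counts over many repeated train/test splits would push the probability far lower, though repeated splits of the same data are not independent, so such a pooled figure should be read with care.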

Beyond machine learning …

Classify the 640 EEG samples for each participant as before, THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   8        14       11       7        6        3        3        10       8        10
Moscow   8        24       14       6        2        6        4        4        7        15
Paris    6        18       12       4        3        5        8        6        11       7
north    4        2        5        11       9        7        10       1        8        3
south    1        4        3        14       14       9        11       4        7        3
east     4        3        5        9        12       12       7        1        4        3
west     4        3        2        12       13       11       10       2        8        5
Germany  2        2        3        2        2        1        0        9        11       8
Poland   2        3        4        0        4        1        4        9        9        4
Russia   7        7        5        1        5        1        1        8        4        11

The relative frequencies $m_{iq} / \sum_{j} m_{ij}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq.

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi (true class). Columns: ωq (predicted, classified as).

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   0.25     0.163    0.138    0.075    0.038    0.05     0.063    0.088    0.075    0.063
Moscow   0.144    0.333    0.133    0.033    0.011    0.056    0.056    0.089    0.067    0.078
Paris    0.175    0.188    0.125    0.038    0.038    0.038    0.05     0.1      0.138    0.113
north    0.067    0.017    0.017    0.283    0.167    0.15     0.1      0.1      0.067    0.033
south    0.014    0.014    0.071    0.114    0.271    0.171    0.157    0.029    0.086    0.071
east     0.05     0        0        0.133    0.15     0.383    0.15     0.05     0        0.083
west     0.057    0.043    0        0.171    0.114    0.143    0.229    0.086    0.1      0.057
Germany  0.075    0.175    0.1      0.05     0.125    0.025    0.05     0.275    0.025    0.1
Poland   0.15     0.1      0.125    0.05     0.05     0        0.025    0.15     0.25     0.1
Russia   0.16     0.04     0.08     0.04     0.1      0.02     0.12     0.1      0.08     0.26
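Turning a confusion matrix into conditional probability estimates is a row normalization: each count is divided by the total number of test samples from that true class. The sketch below applies this to the confusion matrix of counts shown on the earlier slide. The function name is hypothetical, and the output is not expected to reproduce the probability table above, which appears to come from a different classification run.

```python
# Confusion matrix of counts from the earlier slide
# (rows: true class, columns: predicted class).
CONFUSION = [
    [8, 14, 11, 7, 6, 3, 3, 10, 8, 10],   # London
    [8, 24, 14, 6, 2, 6, 4, 4, 7, 15],    # Moscow
    [6, 18, 12, 4, 3, 5, 8, 6, 11, 7],    # Paris
    [4, 2, 5, 11, 9, 7, 10, 1, 8, 3],     # north
    [1, 4, 3, 14, 14, 9, 11, 4, 7, 3],    # south
    [4, 3, 5, 9, 12, 12, 7, 1, 4, 3],     # east
    [4, 3, 2, 12, 13, 11, 10, 2, 8, 5],   # west
    [2, 2, 3, 2, 2, 1, 0, 9, 11, 8],      # Germany
    [2, 3, 4, 0, 4, 1, 4, 9, 9, 4],       # Poland
    [7, 7, 5, 1, 5, 1, 1, 8, 4, 11],      # Russia
]

def row_normalize(confusion):
    """Estimate p_iq as m_iq divided by the sum of row i."""
    probs = []
    for row in confusion:
        total = sum(row)
        probs.append([count / total if total else 0.0 for count in row])
    return probs

P = row_normalize(CONFUSION)
print([round(p, 3) for p in P[0]])  # estimates for true class London
```

Each row of the result sums to 1, so row i can be read as an estimated distribution over the predicted classes for test samples whose true class is ωi.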

[Figure: Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, textbooks.

BRAIN

The conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called “synsets”.

(n) Paris1, City of Light1, French capital1, capital of France1 (the capital and largest city of France and international center of culture and commerce)
(n) Paris2, genus Paris1 (sometimes placed in subfamily Trilliaceae)
(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris4 (a town in northeastern Texas)

(n) city1, metropolis1, urban center1 (a large and densely populated urban area; may include several independent administrative districts) [direct hyponym / full hyponym / part meronym / has instance]
(n) city2 (an incorporated administrative district established by state charter)
(n) city3, metropolis2 (people living in a large densely populated municipality)

(n) east1, due east1, eastward1, E3 (the cardinal compass point that is at 90 degrees)
(n) East2, Orient1 (the countries of Asia)
(n) East3, eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east4 (the direction corresponding to the eastward cardinal compass point)
(n) east5 (a location in the eastern part of a country, region or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. 1990; Landauer and Dumais 1997; Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   1        0.17     0.37     0.13     0.12     0.12     0.14     0.17     0.16     0.16
Moscow   0.17     1        0.18     0.13     0.08     0.22     0.14     0.37     0.65     0.69
Paris    0.37     0.18     1        0.1      0.04     0.08     0.09     0.5      0.31     0.36
north    0.13     0.13     0.1      1        0.89     0.6      0.61     0.07     0.09     0.14
south    0.12     0.08     0.04     0.89     1        0.5      0.55     0.02     0.05     0.05
east     0.12     0.22     0.08     0.6      0.5      1        0.85     0.22     0.29     0.3
west     0.14     0.14     0.09     0.61     0.55     0.85     1        0.24     0.26     0.23
Germany  0.17     0.37     0.5      0.07     0.02     0.22     0.24     1        0.85     0.81
Poland   0.16     0.65     0.31     0.09     0.05     0.29     0.26     0.85     1        0.87
Russia   0.16     0.69     0.36     0.14     0.05     0.3      0.23     0.81     0.87     1

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

24 copy Colleen E Crangle 2013

01 02 03 04 05 06

east

west

north

south

London

Paris

Moscow

Germany

Poland

Russia

Hierarchical cluster tree computed from the pair-wise Latent Semantic

Analysis (LSA) scores of similarity for London Moscow Paris north

south east west Germany Poland Russia based on ~ 38000 college-

level texts (novels newspaper articleshellip)

25 copy Colleen E Crangle 2013

LSA provides straightforward measure of similarity between words

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1       0.393   0.095   0.094   0.062   0.07    0.286   0.281   0.288
Paris    0.466   0.393   1       0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1       0.228   0.179   0.21    0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1       0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1       0.216   0.093   0.08    0.077
west     0.078   0.07    0.077   0.21    0.212   0.216   1       0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1       0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.08    0.082   0.589   1       0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1

Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east (horizontal axis 0.2 to 0.6).
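A cluster tree of this kind can be sketched with a few lines of agglomerative clustering. The example below uses the WordNet-derived similarity matrix from the slides and average linkage; the linkage rule is an assumption made for illustration, since the slides do not say which one was used, and the function name `average_linkage` is hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# WordNet-based similarity matrix copied from the slide (rows/columns follow WORDS).
SIM = [
    [1, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303],
    [0.396, 1, 0.393, 0.095, 0.094, 0.062, 0.07, 0.286, 0.281, 0.288],
    [0.466, 0.393, 1, 0.106, 0.104, 0.074, 0.077, 0.327, 0.308, 0.307],
    [0.106, 0.095, 0.106, 1, 0.228, 0.179, 0.21, 0.123, 0.132, 0.111],
    [0.103, 0.094, 0.104, 0.228, 1, 0.172, 0.212, 0.115, 0.107, 0.109],
    [0.076, 0.062, 0.074, 0.179, 0.172, 1, 0.216, 0.093, 0.08, 0.077],
    [0.078, 0.07, 0.077, 0.21, 0.212, 0.216, 1, 0.087, 0.082, 0.083],
    [0.322, 0.286, 0.327, 0.123, 0.115, 0.093, 0.087, 1, 0.589, 0.409],
    [0.299, 0.281, 0.308, 0.132, 0.107, 0.08, 0.082, 0.589, 1, 0.403],
    [0.303, 0.288, 0.307, 0.111, 0.109, 0.077, 0.083, 0.409, 0.403, 1],
]


def average_linkage(names, sim):
    """Repeatedly merge the two clusters with the highest average similarity.

    Returns the merges in order as (cluster_a, cluster_b, average_similarity).
    """
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                avg = sum(sim[index[a]][index[b]] for a, b in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = average_linkage(WORDS, SIM)
for left, right, score in merges:
    print(f"{score:.3f}  {left} + {right}")
```

Under these assumptions the first merge joins Germany and Poland, and the last merge joins the four compass directions with the six place names.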

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

Figure: two similarity trees shown side by side (horizontal axes 0.2 to 0.6). BRAIN DATA leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.

WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).

The Spearman rank correlation for the two sequences in the figure is 0.99, with a one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
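For a fixed reference word, the relation can be written down directly as a set of ordered pairs. The sketch below does this for London using its row of the WordNet-derived similarity matrix, reading "ω1 is more similar to ω than is ω2" as a strictly larger similarity score. The function names are hypothetical, and the property check is included only to show what "irreflexive, asymmetric and transitive" means in practice.

```python
# London's row of the WordNet-derived similarity matrix (London itself omitted).
LONDON_ROW = {
    "Moscow": 0.396, "Paris": 0.466, "north": 0.106, "south": 0.103,
    "east": 0.076, "west": 0.078, "Germany": 0.322, "Poland": 0.299,
    "Russia": 0.303,
}


def similarity_relation(scores):
    """Pairs (w1, w2) where w1 is strictly more similar to the reference word than w2.

    Words with equal scores are left unrelated, which is why the result is in
    general a partial order rather than a total one.
    """
    return {(w1, w2) for w1 in scores for w2 in scores if scores[w1] > scores[w2]}


def is_strict_partial_order(relation):
    """Check irreflexivity, asymmetry and transitivity of a set of pairs."""
    irreflexive = all(a != b for a, b in relation)
    asymmetric = all((b, a) not in relation for a, b in relation)
    transitive = all((a, d) in relation
                     for a, b in relation for c, d in relation if b == c)
    return irreflexive and asymmetric and transitive


R_london = similarity_relation(LONDON_ROW)
print(("Paris", "Moscow") in R_london)       # Paris is closer to London than Moscow
print(is_strict_partial_order(R_london))
```

The same function applies unchanged to a row of conditional probability estimates from the brain data.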

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

Diagram: node labels recovered from the two partial-order diagrams are Poland, north, south, west, east (one diagram) and north, Poland, east, south, west (the other).

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity:

Suppes P (1974). The axiomatic method in the empirical sciences. In Henkin L et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
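The comparison step can be sketched in plain Python. The example computes Spearman's rank correlation (Pearson correlation of the ranks, with tied values sharing their mean rank) for the London language and brain values listed on the earlier slide. The helper names are hypothetical, and the result need not reproduce the coefficient reported for the figure, which may have been computed from different data.

```python
WORDS = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
# Values for London copied from the "Language data / Brain data" slide, in WORDS order.
LANGUAGE = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
BRAIN = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]


def average_ranks(values):
    """Rank values in ascending order; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


rho = spearman(LANGUAGE, BRAIN)
print(f"Spearman rho for London: {rho:.3f}")
```

For these ten values the coefficient comes out at about 0.92; a significance test would then decide whether the pair of partial orders counts as significantly correlated.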

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

Diagram 1 (nodes London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia): Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

Diagram 2 (same nodes): Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)
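One natural way to read "the partial order invariant with respect to both" is as the set of orderings on which the two partial orders agree, that is, the intersection of the two relations. The sketch below takes that reading as an assumption and applies it to the London values from the earlier slide; the function names are hypothetical.

```python
# London values from the "Language data / Brain data" slide (London itself omitted).
LANGUAGE = {"Paris": 0.466, "Moscow": 0.396, "Germany": 0.322, "Russia": 0.303,
            "Poland": 0.299, "north": 0.106, "south": 0.103, "west": 0.078,
            "east": 0.076}
BRAIN = {"Paris": 0.133, "Moscow": 0.108, "Germany": 0.075, "north": 0.042,
         "Russia": 0.033, "Poland": 0.025, "east": 0.008, "south": 0.000,
         "west": 0.000}


def strict_order(scores):
    """Pairs (a, b) where a scores strictly higher than b; ties stay unrelated."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


def invariant_order(order_a, order_b):
    """Orderings present in both relations (assumed reading of 'invariant')."""
    return order_a & order_b


invariant = invariant_order(strict_order(LANGUAGE), strict_order(BRAIN))
print(("Paris", "Moscow") in invariant)   # both data sets rank Paris above Moscow
print(("Russia", "north") in invariant)   # the two data sets disagree on this pair
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the originals.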

Brain data

Figure: Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east (horizontal axis 0.2 to 0.6).

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

Figure: a second similarity tree. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland (horizontal axis 0.1 to 0.5).

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words), using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
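Random resampling with replacement can be sketched as drawing sample indices at random, allowing repeats. The snippet below draws M = 30 resampled index sets over 640 samples. How the resampled sets feed into training and testing is not stated on the slide, so only the drawing step is shown; `bootstrap_indices` and the fixed seed are illustrative choices.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """Draw n_rounds index lists of length n_samples, sampling with replacement."""
    rng = random.Random(seed)  # fixed seed only to make the illustration repeatable
    return [rng.choices(range(n_samples), k=n_samples) for _ in range(n_rounds)]


rounds = bootstrap_indices(n_samples=640, n_rounds=30)
first = rounds[0]
print(len(rounds), len(first), len(set(first)))  # the last number counts distinct samples
```

Because sampling is with replacement, each round repeats some samples and leaves others out, so every round gives a slightly different classification.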

Figure: Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language. Chart of the number of significant invariant partial orders (vertical axis, 0 to 300) for each test participant (horizontal axis: S10, S24, S25, S27, S16, S26, S12, S13, S18), with separate series for WordNet and LSA.

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), 222-254. DOI 10.1111/j.1551-6709.2009.01068.x.

Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. NeuroImage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622.

Karp RM (1972). Reducibility among combinatorial problems. In Miller RE, Thatcher JW (Eds.), Complexity of Computer Computations (Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008.

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum C (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI 10.1126/science.1152876.

Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013.

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In Henkin L et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In De Raedt L, Flach P (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


EXPERIMENTAL SETUP for recognizing words within sentences

A computer presented 48 spoken sentences to each of 9 participants in 10 randomized blocks, with all 48 sentences in each block. So there were 480 trials for each participant.

The sentences were about the geography of Europe. Half were true, half false; half positive, half negative.

The capital of Italy is Paris. (F)

Paris is not east of Berlin. (T)

Spain is west of Russia. (T)

Participants were asked to determine the truth or falsity of each statement while EEG recordings were made.

X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}

Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}

W ∈ {the capital, the largest city}

Z ∈ {north, south, east, west}

W of Y is [not] X

X is [not] W of Y

X is [not] Z of X

Y is [not] Z of Y
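The four sentence forms can be expanded mechanically. The sketch below enumerates every sentence the templates allow, assuming that a city or country is never compared with itself; the 48 sentences actually used in the experiment were a selection from this space, and the selection rule is not given on the slide. The function name `all_sentences` is hypothetical.

```python
from itertools import product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]                      # X
COUNTRIES = ["France", "Germany", "Italy", "Poland", "Russia",
             "Austria", "Greece", "Spain"]                             # Y
ROLES = ["the capital", "the largest city"]                            # W
DIRECTIONS = ["north", "south", "east", "west"]                        # Z


def all_sentences():
    """Yield every sentence generated by the four templates, positive and negative."""
    for neg in ("", "not "):
        for role, country, city in product(ROLES, COUNTRIES, CITIES):
            yield f"{role[0].upper()}{role[1:]} of {country} is {neg}{city}"
            yield f"{city} is {neg}{role} of {country}"
        for direction in DIRECTIONS:
            for places in (CITIES, COUNTRIES):
                for a, b in product(places, places):
                    if a != b:  # assumption: a place is not compared with itself
                        yield f"{a} is {neg}{direction} of {b}"


sentences = set(all_sentences())
print(len(sentences))
print("Paris is not east of Berlin" in sentences)
```

Deciding which generated sentences are true would need a table of actual capitals, largest cities and relative positions, which is outside this sketch.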

LANGUAGE

Paris is north of Berlin

The capital of Germany is Berlin

Spain is east of France

London is not the capital of France

Vienna is east of Moscow

The largest city of Poland is Athens

From http://www2.le.ac.uk

BRAIN

Fifteen seconds of EEG data. From http://sccn.ucsd.edu

What our work does …

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, we have 640 EEG data samples for each participant. We want to classify these 640 samples into the correct 10 classes.

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
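One simple way to check that a classification rate beats chance is an exact one-sided binomial test against the 10% chance level. The slides do not say which significance test was used, so the sketch below is an illustration only; it also assumes, for the sake of the example, that all 640 samples are scored and that about 24.5% of them (157) are classified correctly.

```python
from math import exp, lgamma, log


def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p), summed in log space for stability."""
    total = 0.0
    for k in range(successes, trials + 1):
        log_pmf = (lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
                   + k * log(p) + (trials - k) * log(1 - p))
        total += exp(log_pmf)
    return total


# Hypothetical figures: 157 of 640 samples correct, chance level 1 in 10 classes.
p_value = binomial_tail(successes=157, trials=640, p=0.1)
print(f"one-sided p-value: {p_value:.3g}")
```

With ten equally likely classes the expected number of correct guesses is 64, so a count near 157 lies far out in the tail of the chance distribution.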

Beyond machine learning …

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   8       14      11      7       6       3       3       10      8       10
Moscow   8       24      14      6       2       6       4       4       7       15
Paris    6       18      12      4       3       5       8       6       11      7
north    4       2       5       11      9       7       10      1       8       3
south    1       4       3       14      14      9       11      4       7       3
east     4       3       5       9       12      12      7       1       4       3
west     4       3       2       12      13      11      10      2       8       5
Germany  2       2       3       2       2       1       0       9       11      8
Poland   2       3       4       0       4       1       4       9       9       4
Russia   7       7       5       1       5       1       1       8       4       11

The relative frequencies (each count m_iq divided by its row total Σ_j m_ij) are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq.

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: true class ωi. Columns: ωq, the predicted class ("classified as").

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
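Turning a confusion matrix into conditional probability estimates is a row-wise normalization. The sketch below applies it to the confusion matrix shown on the earlier slide; the resulting values need not match the probability table above, which may come from a different classification run. The function name is hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# Confusion matrix copied from the slide: rows are true classes, columns predicted.
CONFUSION = [
    [8, 14, 11, 7, 6, 3, 3, 10, 8, 10],
    [8, 24, 14, 6, 2, 6, 4, 4, 7, 15],
    [6, 18, 12, 4, 3, 5, 8, 6, 11, 7],
    [4, 2, 5, 11, 9, 7, 10, 1, 8, 3],
    [1, 4, 3, 14, 14, 9, 11, 4, 7, 3],
    [4, 3, 5, 9, 12, 12, 7, 1, 4, 3],
    [4, 3, 2, 12, 13, 11, 10, 2, 8, 5],
    [2, 2, 3, 2, 2, 1, 0, 9, 11, 8],
    [2, 3, 4, 0, 4, 1, 4, 9, 9, 4],
    [7, 7, 5, 1, 5, 1, 1, 8, 4, 11],
]


def conditional_probabilities(confusion):
    """Divide each count by its row total, giving estimates p_iq of P(predicted q | true i)."""
    probs = []
    for row in confusion:
        total = sum(row)
        probs.append([count / total if total else 0.0 for count in row])
    return probs


P = conditional_probabilities(CONFUSION)
for word, row in zip(WORDS, P):
    print(word.ljust(8), " ".join(f"{p:.3f}" for p in row))
```

Each row of the result sums to 1, so row i can be read as the distribution of predicted labels for samples whose true word is ωi.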

Figure: Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east (horizontal axis 0.2 to 0.6).

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles and textbooks.

BRAIN

The conditional probability density estimates for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the same matrix as shown above).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses:

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France, and international center of culture and commerce)

(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)

(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris#4 (a town in northeastern Texas)

Organized into sets of cognitive synonyms called "synsets":

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [links: direct hyponym, full hyponym, part meronym, has instance]

(n) city#2 (an incorporated administrative district established by state charter)

(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)

(n) East#2, Orient#1 (the countries of Asia)

(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east#4 (the direction corresponding to the eastward cardinal compass point)

(n) east#5 (a location in the eastern part of a country, region or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. 1990, Landauer and Dumais 1997, Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2,000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
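In LSA, each word ends up as a vector in a reduced factor space, and the similarity of two words is commonly taken as the cosine of the angle between their vectors. The sketch below shows only that last step; the three-dimensional vectors are made-up placeholders, not output of the LSA application, which worked with up to 300 factors.

```python
from math import sqrt

# Hypothetical 3-factor word vectors, for illustration only (not real LSA output).
VECTORS = {
    "east": [0.9, 0.1, 0.2],
    "west": [0.8, 0.2, 0.1],
    "Paris": [0.1, 0.9, 0.3],
}


def cosine(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(f"east-west:  {cosine(VECTORS['east'], VECTORS['west']):.2f}")
print(f"east-Paris: {cosine(VECTORS['east'], VECTORS['Paris']):.2f}")
```

Computing this for every pair of words fills in a symmetric similarity matrix with 1 on the diagonal, the same shape as the LSA matrix shown for the ten geography words.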

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

01 02 03 04 05 06

east

west

north

south

London

Paris

Moscow

Germany

Poland

Russia

Hierarchical cluster tree computed from the pair-wise Latent Semantic

Analysis (LSA) scores of similarity for London Moscow Paris north

south east west Germany Poland Russia based on ~ 38000 college-

level texts (novels newspaper articleshellip)

25 copy Colleen E Crangle 2013

LSA provides straightforward measure of similarity between words

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of

words London Moscow Paris north south east west Germany

Poland Russia using senses relevant to the geography of Europe

and five measures of similarity wup (path length) lin and jcn

(information content) and gv and pgv (vector space measures)

London Moscow Paris north south east west Germany Poland Russia

London 1 0396 0466 0106 0103 0076 0078 0322 0299 0303

Moscow 0396 1 0393 0095 0094 0062 007 0286 0281 0288

Paris 0466 0393 1 0106 0104 0074 0077 0327 0308 0307

north 0106 0095 0106 1 0228 0179 021 0123 0132 0111

south 0103 0094 0104 0228 1 0172 0212 0115 0107 0109

east 0076 0062 0074 0179 0172 1 0216 0093 008 0077

west 0078 007 0077 021 0212 0216 1 0087 0082 0083

Germany 0322 0286 0327 0123 0115 0093 0087 1 0589 0409

Poland 0299 0281 0308 0132 0107 008 0082 0589 1 0403

Russia 0303 0288 0307 0111 0109 0077 0083 0409 0403 1

27 copy Colleen E Crangle 2013

02 03 04 05 06

Germany

Poland

Russia

London

Paris

Moscow

north

south

west

east

Hierarchical cluster tree computed from pairwise WordNet-based semantic

similarity scores for London Moscow Paris north south east west

Germany Poland Russia restricted to senses related to the geography of

Europe

28 copy Colleen E Crangle 2013

BACK TO THE BRAIN hellip

London Moscow Paris north south

east west Germany Poland Russia

29 copy Colleen E Crangle 2013

Now letrsquos see how to compare the EEG data

and the language datahellip

30 copy Colleen E Crangle 2013

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees, each with an axis from 0.2 to 0.6. BRAIN DATA, leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA, leaves in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R)]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

BRAIN DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA
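For the language data, the relation R for a fixed word can be read directly off one row of the similarity matrix. The following is a minimal sketch for the word London, using the WordNet-based values from the slides; the function names `ternary_relation` and `is_strict_partial_order` are hypothetical and chosen only for illustration.

```python
# R(w, w1, w2) holds when w1 is more similar to w than w2 is, i.e. when the
# similarity difference of w1 with w is smaller than that of w2 with w.

# WordNet-based similarities of each word to London (row "London" of the matrix).
SIM_TO_LONDON = {
    "London": 1.0, "Moscow": 0.396, "Paris": 0.466, "north": 0.106,
    "south": 0.103, "east": 0.076, "west": 0.078, "Germany": 0.322,
    "Poland": 0.299, "Russia": 0.303,
}


def ternary_relation(row):
    """Return all pairs (w1, w2) such that R(w, w1, w2) for the row's word w."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] > row[w2]}


def is_strict_partial_order(pairs):
    """Check that a set of ordered pairs is irreflexive, asymmetric, transitive."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all((a, d) in pairs
                     for a, b in pairs for c, d in pairs if b == c)
    return irreflexive and asymmetric and transitive


R_LONDON = ternary_relation(SIM_TO_LONDON)
print(("Paris", "Moscow") in R_LONDON)   # Paris is closer to London than Moscow is
print(is_strict_partial_order(R_LONDON))
```

Words with equal scores are left unordered, which is why the result is in general a partial rather than a total order.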

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Diagram labels: Poland, north, south, west, east (language data); north, Poland, east, south, west (brain data)]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not the correlation is statistically significant.
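The comparison step can be sketched as a Spearman rank correlation with average ranks for ties. The data below are the ten language and brain values listed for London on the earlier slide; the helper names are hypothetical, and the significance test is left out. The coefficient this sketch produces for these ten pairs need not coincide with the value reported for the figure, which may come from a different classification run.

```python
from math import sqrt

# Order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east
LANGUAGE = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
BRAIN = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


RHO = spearman(LANGUAGE, BRAIN)
print(round(RHO, 3))
```

Computing rho as a correlation of ranks, rather than with the shortcut formula based on squared rank differences, keeps the result correct when ties occur, as they do here for south and west in the brain data.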

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]

[Figure: similarity tree, axis from 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: similarity tree (brain data), axis from 0.1 to 0.5; leaves in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]

1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.
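Step 3 can be illustrated with a deliberately simple reading of invariance: keep only those ordered pairs on which the brain and language orders agree. This is an assumption made for the sketch, not necessarily the construction used in the published analysis, and the function names are hypothetical. The scores are the values listed for London on the earlier slide.

```python
# Scores relative to London, as listed on the "Partial orders for London" slide.
LANGUAGE = {
    "London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
    "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
    "west": 0.078, "east": 0.076,
}
BRAIN = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}


def strict_order(scores):
    """All pairs (a, b) where a scores strictly higher than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


def invariant_order(scores_a, scores_b):
    """Assumed notion of invariance: pairs ordered the same way in both data sets."""
    return strict_order(scores_a) & strict_order(scores_b)


INVARIANT = invariant_order(LANGUAGE, BRAIN)
print(len(INVARIANT), "ordered pairs are shared by the two orders")
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the two inputs. Pairs on which the data sets disagree, such as Russia and north, simply drop out.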

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage. 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 2011, 62, pp. 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics. 2009 Jul 1; 10: 205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. ACL and AFNLP.

Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr; 117(1): 12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 95, 14965-14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


X ∈ {Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens}

Y ∈ {France, Germany, Italy, Poland, Russia, Austria, Greece, Spain}

W ∈ {the capital, the largest city}

Z ∈ {north, south, east, west}

Sentence forms:

W of Y is [not] X
X is [not] W of Y
X is [not] Z of X
Y is [not] Z of Y
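The four sentence forms can be expanded mechanically. The sketch below enumerates every combination, including false statements, and assumes that the two places in a direction sentence are distinct; the slides do not say which combinations were actually presented to participants, and the function names are hypothetical.

```python
from itertools import permutations, product

CITIES = ["Berlin", "London", "Moscow", "Paris", "Rome",
          "Warsaw", "Madrid", "Vienna", "Athens"]               # X
COUNTRIES = ["France", "Germany", "Italy", "Poland",
             "Russia", "Austria", "Greece", "Spain"]            # Y
ROLES = ["the capital", "the largest city"]                     # W
DIRECTIONS = ["north", "south", "east", "west"]                 # Z
NEGATIONS = ["", "not "]                                        # optional [not]


def capitalize_first(text):
    """Upper-case the first letter only, leaving proper nouns untouched."""
    return text[0].upper() + text[1:]


def generate_sentences():
    sentences = []
    # "W of Y is [not] X" and "X is [not] W of Y"
    for w, y, x, neg in product(ROLES, COUNTRIES, CITIES, NEGATIONS):
        sentences.append(capitalize_first(f"{w} of {y} is {neg}{x}"))
        sentences.append(f"{x} is {neg}{w} of {y}")
    # "X is [not] Z of X" and "Y is [not] Z of Y", with two distinct places
    for places in (CITIES, COUNTRIES):
        for (a, b), z, neg in product(permutations(places, 2),
                                      DIRECTIONS, NEGATIONS):
            sentences.append(f"{a} is {neg}{z} of {b}")
    return sentences


SENTENCES = generate_sentences()
print(len(SENTENCES), "candidate sentences")
```

The output contains true and false sentences alike, such as "Paris is north of Berlin" and "The largest city of Poland is Athens", which is what a truth-assessment task needs.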

LANGUAGE

From http://www2.le.ac.uk

Paris is north of Berlin
The capital of Germany is Berlin
Spain is east of France
London is not the capital of France
Vienna is east of Moscow
The largest city of Poland is Athens

BRAIN

Fifteen seconds of EEG data. From http://sccn.ucsd.edu

What our work does …

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.
Test the remaining 60 samples.
Do this many times, each time using a different set of training samples.
Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
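The train-and-test loop described above can be sketched independently of the classifier. In the example below, `fit_predict` is a hypothetical callable standing in for the real model; the linear discriminant model and principal component analysis named on the slide are not implemented here, and the toy data are placeholders rather than EEG samples.

```python
import random


def repeated_holdout(samples, labels, fit_predict, n_test=60, n_repeats=20, seed=0):
    """Average classification rate over repeated random train/test splits.

    fit_predict(train_x, train_y, test_x) is a hypothetical callable that
    trains a classifier and returns one predicted label per test sample.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rates = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predicted = fit_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        rates.append(correct / n_test)
    return sum(rates) / len(rates)


def majority_class(train_x, train_y, test_x):
    """Baseline stand-in for a classifier: always predict the commonest label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_x)


# Placeholder data: 640 sample ids, 64 per class (not real EEG recordings).
TOY_SAMPLES = list(range(640))
TOY_LABELS = [i % 10 for i in TOY_SAMPLES]

BASELINE_RATE = repeated_holdout(TOY_SAMPLES, TOY_LABELS, majority_class)
print("baseline classification rate:", round(BASELINE_RATE, 3))
```

With ten equally frequent classes, a baseline like this stays at or below the 10% chance level, which is the reference point against which the reported classification rates are judged.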

Beyond machine learning …

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.
Test the remaining 60 samples.
Do this many times, each time using a different set of training samples.
Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). Significantly higher than chance (10%).

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

         London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London   8       14      11     7      6      3     3     10       8       10
Moscow   8       24      14     6      2      6     4     4        7       15
Paris    6       18      12     4      3      5     8     6        11      7
north    4       2       5      11     9      7     10    1        8       3
south    1       4       3      14     14     9     11    4        7       3
east     4       3       5      9      12     12    7     1        4       3
west     4       3       2      12     13     11    10    2        8       5
Germany  2       2       3      2      2      1     0     9        11      8
Poland   2       3       4      0      4      1     4     9        9       4
Russia   7       7       5      1      5      1     1     8        4       11

The relative frequencies of the m_iq within each row are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

$p_{iq} = m_{iq} / \sum_{q'} m_{iq'}$

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi. Columns: ωq, predicted (classified as).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
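Turning a confusion matrix into conditional probability estimates amounts to dividing each row by its total. The sketch below applies this to the confusion matrix shown on the earlier slide; the function name is hypothetical. The result need not reproduce the table of estimates above, since the two tables on the slides may come from different classification runs.

```python
# Confusion matrix from the earlier slide: rows are true classes, columns are
# predicted classes, both in the order London, Moscow, Paris, north, south,
# east, west, Germany, Poland, Russia.
CONFUSION = [
    [8, 14, 11, 7, 6, 3, 3, 10, 8, 10],
    [8, 24, 14, 6, 2, 6, 4, 4, 7, 15],
    [6, 18, 12, 4, 3, 5, 8, 6, 11, 7],
    [4, 2, 5, 11, 9, 7, 10, 1, 8, 3],
    [1, 4, 3, 14, 14, 9, 11, 4, 7, 3],
    [4, 3, 5, 9, 12, 12, 7, 1, 4, 3],
    [4, 3, 2, 12, 13, 11, 10, 2, 8, 5],
    [2, 2, 3, 2, 2, 1, 0, 9, 11, 8],
    [2, 3, 4, 0, 4, 1, 4, 9, 9, 4],
    [7, 7, 5, 1, 5, 1, 1, 8, 4, 11],
]


def conditional_probabilities(confusion):
    """Row-normalise counts m_iq into estimates p_iq = m_iq / sum over q of m_iq."""
    probs = []
    for row in confusion:
        total = sum(row)
        # Guard against an empty row so the sketch never divides by zero.
        probs.append([count / total if total else 0.0 for count in row])
    return probs


P = conditional_probabilities(CONFUSION)
print([round(p, 3) for p in P[0]])  # estimates for samples whose true class is London
```

Each row of P sums to 1, so a row can be read as the estimated distribution over predicted words given the true word.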

[Figure: similarity tree, axis from 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, and textbooks.

BRAIN

[Table repeated on the slide: the conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, with the same values as in the preceding table]

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets".

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France; and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [links: direct hyponym, full hyponym, part meronym, has instance]
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. 1990; Landauer and Dumais 1997; Landauer et al. 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on about 38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

[Figure: similarity tree, axis from 0.1 to 0.6; leaves in order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia]

Hierarchical cluster tree computed from the pairwise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on about 38,000 college-level texts (novels, newspaper articles, …).

LSA provides a straightforward measure of similarity between words.

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of

words London Moscow Paris north south east west Germany

Poland Russia using senses relevant to the geography of Europe

and five measures of similarity wup (path length) lin and jcn

(information content) and gv and pgv (vector space measures)

London Moscow Paris north south east west Germany Poland Russia

London 1 0396 0466 0106 0103 0076 0078 0322 0299 0303

Moscow 0396 1 0393 0095 0094 0062 007 0286 0281 0288

Paris 0466 0393 1 0106 0104 0074 0077 0327 0308 0307

north 0106 0095 0106 1 0228 0179 021 0123 0132 0111

south 0103 0094 0104 0228 1 0172 0212 0115 0107 0109

east 0076 0062 0074 0179 0172 1 0216 0093 008 0077

west 0078 007 0077 021 0212 0216 1 0087 0082 0083

Germany 0322 0286 0327 0123 0115 0093 0087 1 0589 0409

Poland 0299 0281 0308 0132 0107 008 0082 0589 1 0403

Russia 0303 0288 0307 0111 0109 0077 0083 0409 0403 1

27 copy Colleen E Crangle 2013

02 03 04 05 06

Germany

Poland

Russia

London

Paris

Moscow

north

south

west

east

Hierarchical cluster tree computed from pairwise WordNet-based semantic

similarity scores for London Moscow Paris north south east west

Germany Poland Russia restricted to senses related to the geography of

Europe

28 copy Colleen E Crangle 2013

BACK TO THE BRAIN hellip

London Moscow Paris north south

east west Germany Poland Russia

29 copy Colleen E Crangle 2013

Now letrsquos see how to compare the EEG data

and the language datahellip

30 copy Colleen E Crangle 2013

Some of the similarity trees show remarkable congruence

between the brain and semantic data

Where exactly does that congruence lie

Can we devise a quantitative measure of the nature and

strength of that congruence

02 03 04 05 06

London Paris Moscow Germany Poland Russia north south west east

02 03 04 05 06

Germany Poland Russia London Paris Moscow north south west east

LANGUAGE DATA

BRAIN DATA

31 copy Colleen E Crangle 2013

WordNet-based semantic similarities and EEG conditional probability

estimates for London relative to London (L) Moscow (M) Paris (P) north

(n) south (s) east (e) west (w) Germany (G) Poland (Po) and Russia reg

The Spearman rank correlation for the two sequences in the figure is 099 with one-sided significance of 184E-10

32 copy Colleen E Crangle 2013

For each word ω we compute from the conditional probability

density estimates a ternary relation R such that R( ω ω1 ω2 ) if

and only if with respect to word ω the conditional probability for word

ω1 is smaller than the conditional probability for word ω2 that is if and

only if ω1s similarity difference with ω is smaller than ω2s similarity

difference with ω

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

BRAIN DATA

33 copy Colleen E Crangle 2013

For each word ω we compute from the semantic similarity matrix

a ternary relation R such that R ( ω ω1 ω2 ) if and only the

similarity difference of ω1 with ω is smaller than the similarity

difference of ω2 with ω that is ω1 is more similar to ω than is ω2

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

LANGUAGE DATA

34 copy Colleen E Crangle 2013

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: diagrams of the two partial orders. Surviving node labels: Poland, north, south, west, east (first diagram); north, Poland, east, south, west (second diagram).]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974) The axiomatic method in the empirical sciences. L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether the correlation is statistically significant.
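The comparison step can be sketched as follows. This is a minimal pure-Python version of Spearman's rank correlation (Pearson correlation of average ranks), applied to the London values listed for the language data and the brain data; the helper names are invented for this example, and the significance test is left out.

```python
from math import sqrt

# Values for London, in the same word order for both lists.
words = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]


def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


rho = spearman(language, brain)
print(round(rho, 3))
```

The value obtained from these particular numbers need not equal a correlation reported for a figure elsewhere in the talk, since each classification run yields its own probability estimates.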

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: Significant Invariance - Paris - Spearman 0.88 (p=6.6795e-004). Node labels as laid out: London, Moscow; Paris; north; south, east; west; Germany, Poland; Russia.]

[Figure: Significant Invariance - Paris - Spearman 0.90 (p=3.8716e-004). Node labels as laid out: London, Moscow, Paris; north; south, east; west; Germany, Poland, Russia.]
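One way to read "invariant" is as the set of ordered pairs on which both orderings agree, that is, the intersection of the two relations. That reading is an assumption made for this sketch, not a definition taken from the talk. The data are the London values for the language and brain data; `strict_order` is a hypothetical helper.

```python
language_london = {
    "London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
    "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
    "west": 0.078, "east": 0.076,
}
brain_london = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}


def strict_order(row):
    """Pairs (a, b) where a has a strictly larger value than b."""
    return {(a, b) for a in row for b in row if row[a] > row[b]}


lang_order = strict_order(language_london)
brain_order = strict_order(brain_london)

# Assumed notion of invariance: keep only the pairs both orders agree on.
invariant = lang_order & brain_order

# Pairs ordered one way by the language data and the other way by the brain data.
conflicts = {(a, b) for (a, b) in lang_order if (b, a) in brain_order}

print(len(invariant), sorted(conflicts))
```

The intersection of two strict partial orders is again irreflexive, asymmetric, and transitive, so the result is itself a partial order.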

Brain data

[Figure: another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: a second similarity tree for the brain data. Axis from 0.1 to 0.5; leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: significant invariant partial orders (0 to 300). Horizontal axis: test participants S10, S24, S25, S27, S16, S26, S12, S13, S18. Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022.
Brants T, Franz A (2006) www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6): 407-428.
Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11: 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41: 391-407.
Devereux B, Kelly C, Korhonen A (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed.) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.
Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J, Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307: 161-163.
Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp. 621-647.
Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104: 211-240.
Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25: 259-284.

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998) An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24: 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.
Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.
Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11: 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320: 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr;117(1):12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54: 436–443.
Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2): 209–212. MR0437404.
Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23: 113–128. MR0115919.
Suppes P (1972) Axiomatic Set Theory. New York: Dover.
Suppes P (1974) The axiomatic method in the empirical sciences. L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 95: 14965–14969.
Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95: 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96: 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96: 14658-14663.
Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97: 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21: 3228–3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56: 95-117.
Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61: 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70: 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39: 1051–1063.
Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds., Representation and understanding. New York: Academic Press, 1975: 35-82.
Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


LANGUAGE

Paris is north of Berlin.
The capital of Germany is Berlin.
Spain is east of France.
London is not the capital of France.
Vienna is east of Moscow.
The largest city of Poland is Athens.

[Image from http://www2.le.ac.uk]

BRAIN

[Figure: fifteen seconds of EEG data. From http://sccn.ucsd.edu]

What our work does…

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Words: Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west.

Paris is north of Berlin.
Spain is east of France.
The capital of Germany is Berlin.

Highlighted words: Paris, east of, Germany.

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5%, p < 10E−10. Significantly higher than chance (10%).

We used a 5-fold linear discriminant model with principal component analysis for blind source separation to classify the segments of EEG data obtained from the individual trials.
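To give a feel for how far a rate near 24.5% lies from the 10% chance level, the sketch below computes a one-sided binomial tail probability. The binomial test, and the simplification that all 640 samples are each tested once, are assumptions made for illustration; the talk does not say which significance test was used.

```python
from math import exp, lgamma, log


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1 - p))
        total += exp(log_pmf)
    return total


n_samples = 640        # EEG samples per participant
chance = 0.10          # 10 equally likely classes
observed_rate = 0.245  # mean classification rate quoted above

# Assumed simplification: treat the rate as correct / 640 test decisions.
n_correct = round(observed_rate * n_samples)
p_value = binomial_tail(n_samples, n_correct, chance)
print(n_correct, p_value < 1e-10)
```

Working in log space keeps the large binomial coefficients from overflowing, at the cost of tiny terms rounding to zero, which does not affect the comparison against the threshold.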

Beyond machine learning …


THEN look at the MISCLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   8       14      11      7       6       3       3       10      8       10
Moscow   8       24      14      6       2       6       4       4       7       15
Paris    6       18      12      4       3       5       8       6       11      7
north    4       2       5       11      9       7       10      1       8       3
south    1       4       3       14      14      9       11      4       7       3
east     4       3       5       9       12      12      7       1       4       3
west     4       3       2       12      13      11      10      2       8       5
Germany  2       2       3       2       2       1       0       9       11      8
Poland   2       3       4       0       4       1       4       9       9       4
Russia   7       7       5       1       5       1       1       8       4       11

The relative frequencies $m_{iq} / \sum_{q} m_{iq}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix $P = (p_{iq})$, that a randomly chosen test sample from class ωi will be classified as belonging to class ωq.

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi (true class). Columns: ωq (predicted, "classified as").

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
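The step from counts to relative frequencies can be sketched as a row normalization. The counts below are the confusion matrix shown for a given classification; `row_normalize` is a name chosen for this example. The output need not reproduce the probability table above, which does not appear to come from the same classification run (for instance, 8 of 80 London samples gives 0.1, not 0.25).

```python
labels = ["London", "Moscow", "Paris", "north", "south",
          "east", "west", "Germany", "Poland", "Russia"]

# confusion[i][q]: number of test samples from class i classified as class q.
confusion = [
    [8, 14, 11, 7, 6, 3, 3, 10, 8, 10],
    [8, 24, 14, 6, 2, 6, 4, 4, 7, 15],
    [6, 18, 12, 4, 3, 5, 8, 6, 11, 7],
    [4, 2, 5, 11, 9, 7, 10, 1, 8, 3],
    [1, 4, 3, 14, 14, 9, 11, 4, 7, 3],
    [4, 3, 5, 9, 12, 12, 7, 1, 4, 3],
    [4, 3, 2, 12, 13, 11, 10, 2, 8, 5],
    [2, 2, 3, 2, 2, 1, 0, 9, 11, 8],
    [2, 3, 4, 0, 4, 1, 4, 9, 9, 4],
    [7, 7, 5, 1, 5, 1, 1, 8, 4, 11],
]


def row_normalize(matrix):
    """Divide each row by its sum, giving p_iq = m_iq / sum over q of m_iq."""
    probs = []
    for row in matrix:
        total = sum(row)
        probs.append([count / total if total else 0.0 for count in row])
    return probs


P = row_normalize(confusion)
for label, row in zip(labels, P):
    print(label, [round(p, 3) for p in row])
```

Each row of P then sums to 1 and can be read as an estimated distribution over predicted classes for one true class.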

[Figure: hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, and textbooks.

BRAIN

Conditional probability density estimates from the classification of the brain wave data.


WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets".

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [relations shown: direct hyponym, full hyponym, part meronym, has instance]
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2,000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
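LSA similarities between terms are commonly reported as the cosine between their vectors in the reduced factor space; assuming that convention applies here, the final comparison step looks like the sketch below. The three-dimensional vectors are invented purely for illustration (a real analysis would use up to 300 factors derived from the corpus), and the decomposition that produces such vectors is not shown.

```python
from math import sqrt


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical reduced-space term vectors; NOT values from the LSA application.
term_vectors = {
    "north": [0.9, 0.1, 0.2],
    "south": [0.8, 0.2, 0.1],
    "Paris": [0.1, 0.9, 0.3],
}

sim_north_south = cosine(term_vectors["north"], term_vectors["south"])
sim_north_paris = cosine(term_vectors["north"], term_vectors["Paris"])
print(sim_north_south > sim_north_paris)
```

Computing this for every pair of words yields a symmetric similarity matrix with ones on the diagonal.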

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

[Figure: hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …). Axis from 0.1 to 0.6; leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.]

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus, the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
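As an illustration of a path-based measure, the sketch below computes the Wu and Palmer (1994) score, 2 × depth(LCS) / (depth(a) + depth(b)), where LCS is the most specific concept that subsumes both. The tiny taxonomy is hypothetical and much simpler than WordNet's actual hierarchy, so the numbers only show how the formula behaves.

```python
# Hypothetical toy taxonomy (child -> parent); not WordNet's real structure.
parent = {
    "location": "entity",
    "city": "location",
    "country": "location",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
}


def path_to_root(node):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("no common ancestor")


def wup(a, b):
    """Wu-Palmer similarity on the toy taxonomy."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wup("London", "Paris"), wup("London", "Germany"))
```

Two cities share a deeper common ancestor than a city and a country, so the first score is higher; averaging several such measures per word pair would then give a single similarity entry.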

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1       0.393   0.095   0.094   0.062   0.07    0.286   0.281   0.288
Paris    0.466   0.393   1       0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1       0.228   0.179   0.21    0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1       0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1       0.216   0.093   0.08    0.077
west     0.078   0.07    0.077   0.21    0.212   0.216   1       0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1       0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.08    0.082   0.589   1       0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1

[Figure: hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Axis from 0.2 to 0.6; leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

02 03 04 05 06

London Paris Moscow Germany Poland Russia north south west east

02 03 04 05 06

Germany Poland Russia London Paris Moscow north south west east

LANGUAGE DATA

BRAIN DATA

31 copy Colleen E Crangle 2013

WordNet-based semantic similarities and EEG conditional probability

estimates for London relative to London (L) Moscow (M) Paris (P) north

(n) south (s) east (e) west (w) Germany (G) Poland (Po) and Russia reg

The Spearman rank correlation for the two sequences in the figure is 099 with one-sided significance of 184E-10

32 copy Colleen E Crangle 2013

For each word ω we compute from the conditional probability

density estimates a ternary relation R such that R( ω ω1 ω2 ) if

and only if with respect to word ω the conditional probability for word

ω1 is smaller than the conditional probability for word ω2 that is if and

only if ω1s similarity difference with ω is smaller than ω2s similarity

difference with ω

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

BRAIN DATA

33 copy Colleen E Crangle 2013

For each word ω we compute from the semantic similarity matrix

a ternary relation R such that R ( ω ω1 ω2 ) if and only the

similarity difference of ω1 with ω is smaller than the similarity

difference of ω2 with ω that is ω1 is more similar to ω than is ω2

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

LANGUAGE DATA

34 copy Colleen E Crangle 2013

London

Language data Brain data London 1000 London 0275 Paris 0466 Paris 0133 Moscow 0396 Moscow 0108 Germany 0322 Germany 0075 Russia 0303 north 0042 Poland 0299 Russia 0033 north 0106 Poland 0025 south 0103 east 0008 west 0078 south 0000

east 0076 west 0000

Partial orders for London derived from the WordNet

semantic similarities of Table 2 and the conditional

probability estimates for the brain data of Table 5

Poland

north

south

west

east

north

Poland

east

south

west

35 copy Colleen E Crangle 2013

Following the approach described in Suppes (1974) for the axiomatization of

the theory of differences in utility preference or the theory of differences in

psychological intensity Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et

al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in

Pure Mathematics 25 Providence RI American Mathematical Society pp 465-

479

The relational structure (A R)

constructed from R and the finite

set A of classes ω1 ω2 hellip ωN

together with the N partial orders

constructed from the N-by-N

estimate for the conditional

probability densities

The relational structure (A R) constructed from R and the finite set A of classes ω1 ω2 hellip ωN together with the N partial orders constructed from the N-by-N similarity matrix

Brain data

Language data

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearmanrsquos rank correlation coefficient which we interpret in the usual way to determine if we have a statistically significant correlation or not

36 copy Colleen E Crangle 2013

London Moscow

Paris

north

south east

west

Germany Poland

Russia

Significant Invariance - Paris - Spearman 088 (p=66795e-004)

London Moscow Paris

north

south east

west

Germany Poland Russia

Significant Invariance - Paris - Spearman 090 (p=38716e-004)

For those instances in which the brain

and language partial orders are

significantly correlated we find the

partial order that is invariant with

respect to the brain and language data

Here are two more examples

37 copy Colleen E Crangle 2013

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

[Figure: similarity tree, axis ticks 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: similarity tree for brain data, axis ticks 0.1 to 0.5; leaves in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
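Random resampling with replacement can be sketched as drawing sample indices at random, so that some samples appear more than once in a round and others not at all. The sketch assumes each round draws as many indices as there are samples (640), which the slides do not state; the function name and seed are illustrative.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """For each round, draw n_samples indices uniformly with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_rounds)]


# 30 resampling rounds over 640 EEG samples, one round per classification.
rounds = bootstrap_indices(640, 30)
print(len(rounds), len(rounds[0]))
```

Each list of indices would select the samples used for one classification, which in turn yields one set of conditional probability estimates.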

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants S10, S24, S25, S27, S16, S26, S12, S13, S18. Series: WordNet, LSA]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S and Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S and Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A & Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

Brants T and Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453–482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.

Jiang J and Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp. 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296–304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M & Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B & Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15 (2), 209–212. MR0437404.

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965–14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds., Representation and understanding. New York: Academic Press, 1975: 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133–138.


What our work does…

Compares the brain data and the language data and finds structural similarities between them.

HOW DO WE REPRESENT THE EEG DATA?

HOW DO WE REPRESENT THE LANGUAGE DATA?

HOW DO WE COMPARE THE TWO?

Berlin, London, Moscow, Paris, Rome, Warsaw, Madrid, Vienna, Athens, France, Germany, Italy, Poland, Russia, Austria, Greece, Spain, north, south, east, west

Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany

Machine learning approach to the study of brain and language

For the 10 geography words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5%, p < 10E−10. Significantly higher than chance (10%).

We used a 5-fold linear discriminant model with principal component analysis for blind source separation to classify the segments of EEG data obtained from the individual trials.
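The evaluation loop above can be sketched in two parts: a random 580/60 train/test split, and a check of whether a hit count on the test samples beats the 10% chance level. The binomial tail probability used here is an assumed stand-in for the significance test, which the slides do not name; the hit count of 15 out of 60 is hypothetical, and the classifier itself is not shown.

```python
import random
from math import comb


def split_indices(n_total, n_test, rng):
    """Shuffle sample indices and split them into (train, test)."""
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more hits by luck."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


rng = random.Random(0)
train, test = split_indices(640, 60, rng)

# Hypothetical outcome: 15 of the 60 test samples classified correctly (25%).
p_value = binomial_tail(15, 60, 0.1)
print(len(train), len(test), p_value)
```

Repeating the split many times and averaging the classification rate, as the slide describes, gives a far smaller p-value than any single split can.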

Beyond machine learning …

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

Rows: true class ωi. Columns: assigned class ωq.

          London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London         8      14     11      7      6     3     3       10       8      10
Moscow         8      24     14      6      2     6     4        4       7      15
Paris          6      18     12      4      3     5     8        6      11       7
north          4       2      5     11      9     7    10        1       8       3
south          1       4      3     14     14     9    11        4       7       3
east           4       3      5      9     12    12     7        1       4       3
west           4       3      2     12     13    11    10        2       8       5
Germany        2       2      3      2      2     1     0        9      11       8
Poland         2       3      4      0      4     1     4        9       9       4
Russia         7       7      5      1      5     1     1        8       4      11
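A confusion matrix of this kind can be tallied directly from the true and assigned labels of the test samples. The sketch below is a minimal illustration; the four labelled samples are made up, and the function name is not from the original work.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] counts test samples of class i that were classified as class q."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m


classes = ["London", "Moscow", "Paris", "north", "south"]
# Made-up test outcomes: two London samples, one Paris, one north.
true_labels = ["London", "London", "Paris", "north"]
predicted_labels = ["London", "Moscow", "Paris", "south"]

m = confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, m):
    print(name, row)
```

Off-diagonal entries are the misclassifications: here one London sample was taken for Moscow and one north sample for south.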

The relative frequencies of the counts m_iq are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

$p_{iq} = m_{iq} \big/ \sum_{q'=1}^{N} m_{iq'}$

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Rows: true class ωi. Columns: ωq, the predicted class (classified as).

          London  Moscow  Paris  north  south  east   west   Germany  Poland  Russia
London    0.25    0.163   0.138  0.075  0.038  0.05   0.063  0.088    0.075   0.063
Moscow    0.144   0.333   0.133  0.033  0.011  0.056  0.056  0.089    0.067   0.078
Paris     0.175   0.188   0.125  0.038  0.038  0.038  0.05   0.1      0.138   0.113
north     0.067   0.017   0.017  0.283  0.167  0.15   0.1    0.1      0.067   0.033
south     0.014   0.014   0.071  0.114  0.271  0.171  0.157  0.029    0.086   0.071
east      0.05    0       0      0.133  0.15   0.383  0.15   0.05     0       0.083
west      0.057   0.043   0      0.171  0.114  0.143  0.229  0.086    0.1     0.057
Germany   0.075   0.175   0.1    0.05   0.125  0.025  0.05   0.275    0.025   0.1
Poland    0.15    0.1     0.125  0.05   0.05   0      0.025  0.15     0.25    0.1
Russia    0.16    0.04    0.08   0.04   0.1    0.02   0.12   0.1      0.08    0.26
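The step from counts to conditional probability estimates is a row normalization: each count is divided by the total for its true class. The sketch below applies it to the London row of counts from the confusion matrix printed earlier in these slides. The printed table of estimates seems to come from a different classification run than the printed confusion matrix, so the output is not expected to match it.

```python
def row_normalize(matrix):
    """Divide each count by its row total, giving estimates of p(q | i)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([count / total if total else 0.0 for count in row])
    return result


# London row of the confusion matrix (80 test samples in total).
london_counts = [8, 14, 11, 7, 6, 3, 3, 10, 8, 10]
london_estimates = row_normalize([london_counts])[0]
print([round(p, 3) for p in london_estimates])
```

Each normalized row sums to 1, so a row can be read as the distribution of assigned classes for samples of one true class.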

[Figure: similarity tree, axis ticks 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted here to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, and textbooks.

BRAIN

The conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the same values as in the preceding table).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets".

(n) Paris1, City of Light1, French capital1, capital of France1 (the capital and largest city of France; and international center of culture and commerce)

(n) Paris2, genus Paris1 (sometimes placed in subfamily Trilliaceae)

(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris4 (a town in northeastern Texas)

(n) city1, metropolis1, urban center1 (a large and densely populated urban area; may include several independent administrative districts) [direct hyponym, full hyponym, part meronym, has instance]

(n) city2 (an incorporated administrative district established by state charter)

(n) city3, metropolis2 (people living in a large densely populated municipality)

(n) east1, due east1, eastward1, E3 (the cardinal compass point that is at 90 degrees)

(n) East2, Orient1 (the countries of Asia)

(n) East3, eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east4 (the direction corresponding to the eastward cardinal compass point)

(n) east5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

          London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London    1       0.17    0.37   0.13   0.12   0.12  0.14  0.17     0.16    0.16
Moscow    0.17    1       0.18   0.13   0.08   0.22  0.14  0.37     0.65    0.69
Paris     0.37    0.18    1      0.1    0.04   0.08  0.09  0.5      0.31    0.36
north     0.13    0.13    0.1    1      0.89   0.6   0.61  0.07     0.09    0.14
south     0.12    0.08    0.04   0.89   1      0.5   0.55  0.02     0.05    0.05
east      0.12    0.22    0.08   0.6    0.5    1     0.85  0.22     0.29    0.3
west      0.14    0.14    0.09   0.61   0.55   0.85  1     0.24     0.26    0.23
Germany   0.17    0.37    0.5    0.07   0.02   0.22  0.24  1        0.85    0.81
Poland    0.16    0.65    0.31   0.09   0.05   0.29  0.26  0.85     1       0.87
Russia    0.16    0.69    0.36   0.14   0.05   0.3   0.23  0.81     0.87    1

[Figure: cluster tree, axis ticks 0.1 to 0.6; leaves in order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …).
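To show how a cluster tree can be grown from pairwise similarity scores, here is a small agglomerative clustering sketch. The slides do not say which linkage or distance was used for the trees, so the sketch assumes average linkage and distance = 1 − similarity. The six scores are the LSA similarities among north, south, east, and west from the matrix in these slides; the function name is illustrative.

```python
def average_linkage(names, sim):
    """Merge the two closest clusters repeatedly; return the list of merges."""
    clusters = [(n,) for n in names]

    def dist(a, b):
        # Mean of (1 - similarity) over all cross-cluster word pairs.
        pairs = [(x, y) for x in a for y in b]
        return sum(1.0 - sim[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        d, i, j = min((dist(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], round(d, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


sim = {
    frozenset(("north", "south")): 0.89,
    frozenset(("north", "east")): 0.60,
    frozenset(("north", "west")): 0.61,
    frozenset(("south", "east")): 0.50,
    frozenset(("south", "west")): 0.55,
    frozenset(("east", "west")): 0.85,
}
merges = average_linkage(["north", "south", "east", "west"], sim)
for left, right, distance in merges:
    print(left, right, distance)
```

Under these assumptions north and south merge first, then east and west, and the two pairs join last, which matches the grouping of the compass words in the LSA tree.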

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus. The corpus here is the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score using each of the relevant senses.

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

          London  Moscow  Paris  north  south  east   west   Germany  Poland  Russia
London    1       0.396   0.466  0.106  0.103  0.076  0.078  0.322    0.299   0.303
Moscow    0.396   1       0.393  0.095  0.094  0.062  0.07   0.286    0.281   0.288
Paris     0.466   0.393   1      0.106  0.104  0.074  0.077  0.327    0.308   0.307
north     0.106   0.095   0.106  1      0.228  0.179  0.21   0.123    0.132   0.111
south     0.103   0.094   0.104  0.228  1      0.172  0.212  0.115    0.107   0.109
east      0.076   0.062   0.074  0.179  0.172  1      0.216  0.093    0.08    0.077
west      0.078   0.07    0.077  0.21   0.212  0.216  1      0.087    0.082   0.083
Germany   0.322   0.286   0.327  0.123  0.115  0.093  0.087  1        0.589   0.409
Poland    0.299   0.281   0.308  0.132  0.107  0.08   0.082  0.589    1       0.403
Russia    0.303   0.288   0.307  0.111  0.109  0.077  0.083  0.409    0.403   1

[Figure: cluster tree, axis ticks 0.2 to 0.6; leaves in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: BRAIN DATA similarity tree, axis ticks 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

[Figure: LANGUAGE DATA similarity tree, axis ticks 0.2 to 0.6; leaves in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
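The language-data definition can be made concrete with a short sketch: for a reference word, it lists every triple (ω, ω1, ω2) in which ω1 is more similar to ω than ω2 is, then checks that the resulting order is irreflexive, asymmetric, and transitive. The three scores are the WordNet similarities of Paris, Moscow, and north to London from the matrix in these slides. Leaving the reference word out of its own ordering is an assumption of the sketch, and the function names are illustrative.

```python
def ternary_relation(word, similarity_row):
    """Triples (word, w1, w2) where w1 is more similar to `word` than w2 is."""
    others = [w for w in similarity_row if w != word]
    return {(word, w1, w2) for w1 in others for w2 in others
            if similarity_row[w1] > similarity_row[w2]}


def is_strict_partial_order(pairs):
    """Check irreflexivity, asymmetry, and transitivity of a set of pairs."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all((a, d) in pairs
                     for a, b in pairs for c, d in pairs if b == c)
    return irreflexive and asymmetric and transitive


# WordNet-based similarity to London for three of the ten words.
london_row = {"Paris": 0.466, "Moscow": 0.396, "north": 0.106}
relation = ternary_relation("London", london_row)
pairs = {(w1, w2) for _, w1, w2 in relation}
print(sorted(relation))
print(is_strict_partial_order(pairs))
```

With distinct scores the order is total; tied scores would leave the tied words unordered relative to each other, which is what makes the relation a partial order in general.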

London

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

[Figure: lower parts of the two partial orders. Language data: Poland, north, south, west, east. Brain data: north, Poland, east, south, west]


Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284

43 copy Colleen E Crangle 2013

Leacock C Chodorow M (1998) Combining local context and WordNet similarity for word sense identification In Fellbaum (Ed) WordNet An Electronic Lexical Database Cambridge MA MIT Press pp 265-283 Leech G Rayson P Wilson A (2001) Word Frequencies in Written and Spoken English based on the British National Corpus Longman London Lin D (1998) An information-theoretic definition of similarity In Proc 15th International Conf on Machine Learning Morgan Kaufmann San Francisco CA p 296--304 Luce RD (1956) Semiorders and a theory of utility discrimination Econometrica 24 178ndash191 MR0078632 httpwwwjstororgstable1905751 Matsunaga T Yonemori C Tomita E Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database BMC Bioinformatics 2009 Jul 110205 Martin Ph (2003) Correction and Extension of WordNet 17 ICCS 2003 11th International Conference on Conceptual Structures (copy Springer Verlag LNAI 2746 pp 160-173) Dresden Germany July 21-25 2003 Miller GA Nicely P (1955) An analysis of perceptual confusions among some English consonants JAcoustSocAm 272 Miller GA Leacock C Tengi R Bunker RT (1993) A Semantic Concordance In Proceedings of the 3 DARPA Workshop on Human Language Technology Miller GA (1995) WordNet A Lexical Database for English Communications of the ACM Vol 38 No 11 39-41 Mitchell TM SV Shinkareva A Carlson KM Chang VL Malave R A Mason and M A Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns Science 320 1191 May 30 2008 DOI 101126science1152876 Murphy B Baroni M amp Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing pages 619ndash627 Singapore 6-7 August 2009 c 2009 ACL and AFNLP Murphy B amp Poesio M (2010) Detecting semantic category in simultaneous EEGMEG recordings In First workshop on computational neurolinguistics NAACL HLT 2010 (pp 36ndash44) Los Angeles Association for Computational 
Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013


Paris is north of Berlin

Spain is east of France

The capital of Germany is Berlin

Paris

east of

Germany


Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate and make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E−10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
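The train/test procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic stand-ins for EEG trials (64 trials per word is assumed so that the total is 640), and a simple nearest-centroid classifier is swapped in for the linear discriminant model with PCA that the slide describes. All function names are hypothetical.

```python
import random

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

def make_toy_samples(n_per_word=64, n_features=8, seed=0):
    """Synthetic stand-in for EEG trials: one noisy feature vector per trial."""
    rng = random.Random(seed)
    centers = {w: [rng.gauss(0, 1) for _ in range(n_features)] for w in WORDS}
    samples = []
    for w in WORDS:
        for _ in range(n_per_word):
            samples.append(([c + rng.gauss(0, 2.0) for c in centers[w]], w))
    return samples

def nearest_centroid_fit(train):
    """Average the training vectors of each word (not the slide's LDA model)."""
    sums, counts = {}, {}
    for x, w in train:
        acc = sums.setdefault(w, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[w] = counts.get(w, 0) + 1
    return {w: [v / counts[w] for v in acc] for w, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Return the word whose centroid is closest to x (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda w: dist2(centroids[w]))

def mean_classification_rate(samples, n_train=580, n_repeats=20, seed=1):
    """Repeat a random 580/60 split and average the test accuracy."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        centroids = nearest_centroid_fit(train)
        hits = sum(nearest_centroid_predict(centroids, x) == w for x, w in test)
        rates.append(hits / len(test))
    return sum(rates) / len(rates)

samples = make_toy_samples()
rate = mean_classification_rate(samples)
print(len(samples), round(rate, 3))
```

With 10 equally frequent classes, chance is 10%, so any averaged rate should be read against that baseline; the significance test mentioned on the slide is not reproduced here.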


Beyond machine learning …

Classify the 640 EEG samples for the 10 geography words as on the previous slide.

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London          8      14      11       7       6       3       3      10       8      10
Moscow          8      24      14       6       2       6       4       4       7      15
Paris           6      18      12       4       3       5       8       6      11       7
north           4       2       5      11       9       7      10       1       8       3
south           1       4       3      14      14       9      11       4       7       3
east            4       3       5       9      12      12       7       1       4       3
west            4       3       2      12      13      11      10       2       8       5
Germany         2       2       3       2       2       1       0       9      11       8
Poland          2       3       4       0       4       1       4       9       9       4
Russia          7       7       5       1       5       1       1       8       4      11
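As a minimal sketch of how such a matrix is tallied, the following Python counts (true class, predicted class) pairs. The three-word label lists are made-up illustration data, not values from the study, and `confusion_matrix` is a hypothetical helper name.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] = number of samples from class i that were classified as class q."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m

# Illustrative labels only (not data from the slides).
classes = ["London", "Moscow", "Paris"]
true_labels = ["London", "London", "Moscow", "Paris", "Paris", "Moscow"]
predicted = ["London", "Moscow", "Moscow", "London", "Paris", "Moscow"]

M = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, M):
    print(name, row)
```

Rows index the class the sample actually came from and columns the class it was assigned to, matching the definition of m_iq above.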


The relative frequencies $m_{iq} / \sum_{q'} m_{iq'}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix $P = (p_{iq})$, that a randomly chosen test sample from class $\omega_i$ will be classified as belonging to class $\omega_q$:

$$p_{iq} \approx \frac{m_{iq}}{\sum_{q'=1}^{N} m_{iq'}}$$
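A short sketch of this row normalization, assuming each row of the confusion matrix is simply divided by its row total. The example row is the London row of counts from the confusion-matrix slide; the table of estimates that follows need not come from the same classification run, so the numbers are not expected to match it.

```python
def conditional_probabilities(m):
    """Divide every row of a confusion matrix by its row total."""
    p = []
    for row in m:
        total = sum(row)
        p.append([count / total if total else 0.0 for count in row])
    return p

# London row of the confusion matrix shown earlier (80 test samples in total).
london_counts = [8, 14, 11, 7, 6, 3, 3, 10, 8, 10]
P = conditional_probabilities([london_counts])
print([round(v, 3) for v in P[0]])
```

Each resulting row sums to 1, which is what lets it be read as an estimated distribution over the predicted classes for one true class.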

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Rows: ωi. Columns: ωq – predicted (classified as).

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London       0.25   0.163   0.138   0.075   0.038    0.05   0.063   0.088   0.075   0.063
Moscow      0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris       0.175   0.188   0.125   0.038   0.038   0.038    0.05     0.1   0.138   0.113
north       0.067   0.017   0.017   0.283   0.167    0.15     0.1     0.1   0.067   0.033
south       0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east         0.05       0       0   0.133    0.15   0.383    0.15    0.05       0   0.083
west        0.057   0.043       0   0.171   0.114   0.143   0.229   0.086     0.1   0.057
Germany     0.075   0.175     0.1    0.05   0.125   0.025    0.05   0.275   0.025     0.1
Poland       0.15     0.1   0.125    0.05    0.05       0   0.025    0.15    0.25     0.1
Russia       0.16    0.04    0.08    0.04     0.1    0.02    0.12     0.1    0.08    0.26

[Figure: hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted here to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.

BRAIN

The conditional probability density estimates for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the table shown above).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is–a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk." Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses

(n) Paris1, City of Light1, French capital1, capital of France1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris2, genus Paris1 (sometimes placed in subfamily Trilliaceae)

(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris4 (a town in northeastern Texas)

Organized into sets of cognitive synonyms called "synsets"

(n) city1, metropolis1, urban center1 (a large and densely populated urban area; may include several independent administrative districts)
    Relations shown: direct hyponym, full hyponym, part meronym, has instance

(n) city2 (an incorporated administrative district established by state charter)

(n) city3, metropolis2 (people living in a large densely populated municipality)

(n) east1, due east1, eastward1, E3 (the cardinal compass point that is at 90 degrees)

(n) East2, Orient1 (the countries of Asia)

(n) East3, eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east4 (the direction corresponding to the eastward cardinal compass point)

(n) east5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
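To make the co-occurrence idea concrete, here is a deliberately simplified Python sketch. It compares words by the cosine of their raw word-by-document count vectors; the dimensionality-reduction step that gives LSA its name (a truncated singular value decomposition, here capped at 300 factors in the study) is omitted, and the counts are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical counts of each word in four toy documents (not real corpus data).
counts = {
    "north": [3, 2, 0, 0],
    "south": [2, 3, 0, 1],
    "Paris": [0, 0, 4, 2],
}

sim_north_south = cosine(counts["north"], counts["south"])
sim_north_paris = cosine(counts["north"], counts["Paris"])
print(round(sim_north_south, 3), round(sim_north_paris, 3))
```

Words that tend to occur in the same documents get a cosine near 1, and words that never share a document get 0; the real method applies the same comparison after projecting onto the retained factors.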


Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London          1    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow       0.17       1    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris        0.37    0.18       1     0.1    0.04    0.08    0.09     0.5    0.31    0.36
north        0.13    0.13     0.1       1    0.89     0.6    0.61    0.07    0.09    0.14
south        0.12    0.08    0.04    0.89       1     0.5    0.55    0.02    0.05    0.05
east         0.12    0.22    0.08     0.6     0.5       1    0.85    0.22    0.29     0.3
west         0.14    0.14    0.09    0.61    0.55    0.85       1    0.24    0.26    0.23
Germany      0.17    0.37     0.5    0.07    0.02    0.22    0.24       1    0.85    0.81
Poland       0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85       1    0.87
Russia       0.16    0.69    0.36    0.14    0.05     0.3    0.23    0.81    0.87       1

[Figure: hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …). Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia. Axis ticks: 0.1 to 0.6.]
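A cluster tree of this kind can be sketched with plain agglomerative clustering. The slides do not say which linkage rule or distance was used, so the following assumes average linkage with distance defined as 1 minus the LSA similarity; the matrix is the LSA similarity matrix from the previous slide, and `average_linkage` is a hypothetical helper.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity matrix from the slides (rows and columns in WORDS order).
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]

def average_linkage(words, sim):
    """Repeatedly merge the two clusters with the smallest mean distance (1 - similarity)."""
    clusters = [[i] for i in range(len(words))]
    merges = []

    def dist(a, b):
        pairs = [(i, j) for i in a for j in b]
        return sum(1.0 - sim[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append(([words[i] for i in clusters[x]],
                       [words[i] for i in clusters[y]],
                       round(d, 3)))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

merges = average_linkage(WORDS, LSA)
for left, right, height in merges:
    print(left, "+", right, "at", height)
```

The merge list is the tree read from the leaves upward: the first merge joins the most similar pair, and the merge heights play the role of the horizontal axis in the figure.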


LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus, here the human-annotated sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
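As an illustration of a path-based measure, the sketch below computes the Wu and Palmer (1994) score, 2·depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest concept that subsumes both a and b. The small is-a hierarchy is a hypothetical toy, not WordNet's actual structure, so the numbers are not comparable to those in the slides.

```python
# Hypothetical toy is-a hierarchy (child -> parent); not WordNet's real taxonomy.
PARENT = {
    "entity": None,
    "location": "entity",
    "region": "location",
    "city": "region",
    "country": "region",
    "direction": "entity",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
    "north": "direction",
}

def path_to_root(node):
    """Return the node followed by all of its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Number of nodes on the path from the node up to the root (root has depth 1)."""
    return len(path_to_root(node))

def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first shared node on a's upward path is the deepest common subsumer.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

print(wup("London", "Paris"), wup("London", "Germany"), wup("London", "north"))
```

In this toy hierarchy two cities share a deep common ancestor and score higher than a city and a country, which in turn score higher than a city and a compass direction. Averaging several such measures over the relevant senses, as the slide describes, would be a further step not shown here.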


Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London          1   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow      0.396       1   0.393   0.095   0.094   0.062    0.07   0.286   0.281   0.288
Paris       0.466   0.393       1   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north       0.106   0.095   0.106       1   0.228   0.179    0.21   0.123   0.132   0.111
south       0.103   0.094   0.104   0.228       1   0.172   0.212   0.115   0.107   0.109
east        0.076   0.062   0.074   0.179   0.172       1   0.216   0.093    0.08   0.077
west        0.078    0.07   0.077    0.21   0.212   0.216       1   0.087   0.082   0.083
Germany     0.322   0.286   0.327   0.123   0.115   0.093   0.087       1   0.589   0.409
Poland      0.299   0.281   0.308   0.132   0.107    0.08   0.082   0.589       1   0.403
Russia      0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403       1

[Figure: hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Axis ticks: 0.2 to 0.6.]

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees shown together, labeled LANGUAGE DATA and BRAIN DATA, each with axis ticks 0.2 to 0.6. Leaf order of one tree: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Leaf order of the other: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is larger than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
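The language-data relation can be sketched directly from one row of a similarity matrix. The example below fixes ω = London and uses the WordNet-based similarities for London from the slides; a pair (ω1, ω2) is in the relation when ω1 is strictly more similar to London than ω2 is. The helper names are hypothetical, and the check at the end simply verifies the three properties the slide lists.

```python
# WordNet-based similarities of each word to London, taken from the slides.
LONDON_SIM = {
    "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
    "Russia": 0.303, "Poland": 0.299, "north": 0.106,
    "south": 0.103, "west": 0.078, "east": 0.076,
}

def order_relation(scores):
    """Pairs (w1, w2) such that w1 is strictly more similar to the reference word than w2."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}

def is_strict_partial_order(rel):
    """Check irreflexivity, asymmetry, and transitivity of a set of ordered pairs."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel
                     for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive

R_london = order_relation(LONDON_SIM)
print(len(R_london), is_strict_partial_order(R_london))
```

Ties, where two words have exactly the same score, leave a pair unordered in either direction, which is why the result is in general a partial rather than a total order.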

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: the two partial orders drawn as diagrams. Node labels recovered from the lower part of each diagram: Poland, north, south, west, east; and north, Poland, east, south, west.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L Henkin et al (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
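The comparison step can be illustrated with a small Spearman computation. The sketch below uses the London language and brain values listed on the partial-orders slide, assigns average ranks to tied values, and takes the Pearson correlation of the ranks. It does not compute a p-value, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Word order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(round(rho, 3))
```

Because only ranks enter the calculation, the two sequences can be on entirely different scales (similarity scores versus conditional probabilities) and still be compared.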


[Figure: partial-order diagram over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.]
Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

[Figure: second partial-order diagram over the same ten words.]
Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.
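The slides do not spell out how the invariant partial order is constructed. One natural reading, offered here only as an assumption, is to keep exactly those ordered pairs on which the brain order and the language order agree, that is, the intersection of the two relations. The sketch uses the London values from the partial-orders slide.

```python
def order_pairs(scores):
    """Pairs (w1, w2) such that w1 has a strictly higher score than w2."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}

# Values for London from the partial-orders slide.
language = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
brain = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}

# Assumption: "invariant" means the ordered pairs shared by both relations.
invariant = order_pairs(language) & order_pairs(brain)
print(len(invariant))
```

Under this reading, a pair such as north before east survives because both data sets rank it that way, while south and east drop out because the two data sets order them in opposite directions.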


[Figure: another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: a further similarity tree for the brain data. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland. Axis ticks: 0.1 to 0.5.]

1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
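Step 1 above relies on random resampling with replacement. A minimal sketch of drawing M = 30 such resamples of the 640 sample indices follows; how each resample is then split, classified, and compared is not shown, and `bootstrap_indices` is a hypothetical helper name.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
resamples = [bootstrap_indices(640, rng) for _ in range(30)]

# Each resample repeats some trials and leaves others out.
distinct = len(set(resamples[0]))
print(len(resamples), len(resamples[0]), distinct)
```

Because indices are drawn with replacement, each resample contains repeated trials and omits others, so every classification run sees a slightly different data set and yields its own confusion matrix.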


[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S and Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence pp 805-810 August 9-15 2003 Acapulco Mexico Banerjee S and Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics pp 136-145 February 17-23 2002 Mexico City Baroni M B Murphy E Barbu M Poesio (2010) Strudel A Corpus-Based Semantic Model Based on Properties and Types Cognitive Science - COGSCI vol 34 no 2 pp 222-254 2010 DOI 101111j1551-6709200901068x Blei D Ng A amp Jordan M (2003) Latent Dirichlet allocation Journal of Machine Learning Research 3 993ndash1022 Brants T and Franz A (2006) wwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC2006T13 Linguistic Data Consortium Philadelphia Budanitsky A Hirst G (2001) Semantic distance in WordNet an experimental application-oriented evaluation of five measures In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources Pittsburgh June 2001 httpwwwseassmuedu~radamwnwpapersWNW-NAACL-101pdfgz Collins AM Loftus EF (1975) A spreading-activation theory of semantic processing Psychological Review 1975 Nov Vol 82(6) 407-428 Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval Artificial Intelligence Review 11 453ndash482 Deerwester S Dumais ST Furnas GW Landauer TK Harshman R (1990) Indexing By Latent Semantic Analysis Journal of the American Society for Information Science 41 391-407 Devereux B C Kelly A Korhonen (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 70mdash78 Fellbaum C (Ed) (1998) WordNet An Electronic Lexical 
Database Cambridge MA MIT Press Hauk O Davis MH Ford M Pulvermuumlller F Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data Neuroimage 2006 May 130(4)1383-400 Epub 2006 Feb 7


Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284


Leacock C Chodorow M (1998) Combining local context and WordNet similarity for word sense identification In Fellbaum (Ed) WordNet An Electronic Lexical Database Cambridge MA MIT Press pp 265-283 Leech G Rayson P Wilson A (2001) Word Frequencies in Written and Spoken English based on the British National Corpus Longman London Lin D (1998) An information-theoretic definition of similarity In Proc 15th International Conf on Machine Learning Morgan Kaufmann San Francisco CA p 296--304 Luce RD (1956) Semiorders and a theory of utility discrimination Econometrica 24 178ndash191 MR0078632 httpwwwjstororgstable1905751 Matsunaga T Yonemori C Tomita E Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database BMC Bioinformatics 2009 Jul 110205 Martin Ph (2003) Correction and Extension of WordNet 17 ICCS 2003 11th International Conference on Conceptual Structures (copy Springer Verlag LNAI 2746 pp 160-173) Dresden Germany July 21-25 2003 Miller GA Nicely P (1955) An analysis of perceptual confusions among some English consonants JAcoustSocAm 272 Miller GA Leacock C Tengi R Bunker RT (1993) A Semantic Concordance In Proceedings of the 3 DARPA Workshop on Human Language Technology Miller GA (1995) WordNet A Lexical Database for English Communications of the ACM Vol 38 No 11 39-41 Mitchell TM SV Shinkareva A Carlson KM Chang VL Malave R A Mason and M A Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns Science 320 1191 May 30 2008 DOI 101126science1152876 Murphy B Baroni M amp Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing pages 619ndash627 Singapore 6-7 August 2009 c 2009 ACL and AFNLP Murphy B amp Poesio M (2010) Detecting semantic category in simultaneous EEGMEG recordings In First workshop on computational neurolinguistics NAACL HLT 2010 (pp 36ndash44) Los Angeles Association for Computational 
Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013

Page 14: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Paris east of Germany

Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

640 EEG samples for each participant:

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate and make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5% (p < 10E-10). Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
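The train/test procedure described on this slide can be sketched in a few lines. This is only an illustration under loud assumptions: the data are synthetic stand-ins (not EEG), and a simple nearest-centroid classifier is swapped in for the linear discriminant model with principal component analysis that the slides report. The function names and the 64-samples-per-word layout are hypothetical.

```python
import random

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

def nearest_centroid_fit(samples, labels):
    """Return {label: mean feature vector} computed from the training samples."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def mean_classification_rate(samples, labels, n_test=60, n_repeats=20, seed=0):
    """Repeat: hold out n_test random samples, train on the rest, score the held-out set."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rates = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        centroids = nearest_centroid_fit([samples[i] for i in train_idx],
                                         [labels[i] for i in train_idx])
        hits = sum(nearest_centroid_predict(centroids, samples[i]) == labels[i]
                   for i in test_idx)
        rates.append(hits / n_test)
    return sum(rates) / len(rates)

# Synthetic stand-in data (an assumption): 64 noisy 10-dimensional samples per word.
data_rng = random.Random(1)
samples, labels = [], []
for c, word in enumerate(WORDS):
    for _ in range(64):
        samples.append([(1.0 if k == c else 0.0) + data_rng.gauss(0, 0.5)
                        for k in range(len(WORDS))])
        labels.append(word)

rate = mean_classification_rate(samples, labels)  # 580 train / 60 test per repeat
print(f"mean classification rate: {rate:.3f} (chance = 0.100)")
```

The rate printed here reflects only how separable the synthetic classes are; it says nothing about the EEG results quoted above.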

Beyond machine learning …

After classifying the 640 EEG samples for each participant as described above, THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

M = (m_iq) is the confusion matrix for a given classification, where m_iq is the number of test samples from class ωi classified as belonging to class ωq.

           London  Moscow  Paris  north  south  east  west  Germany  Poland  Russia
London          8      14     11      7      6     3     3       10       8      10
Moscow          8      24     14      6      2     6     4        4       7      15
Paris           6      18     12      4      3     5     8        6      11       7
north           4       2      5     11      9     7    10        1       8       3
south           1       4      3     14     14     9    11        4       7       3
east            4       3      5      9     12    12     7        1       4       3
west            4       3      2     12     13    11    10        2       8       5
Germany         2       2      3      2      2     1     0        9      11       8
Poland          2       3      4      0      4     1     4        9       9       4
Russia          7       7      5      1      5     1     1        8       4      11
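As a minimal sketch of the definition above, the following builds M = (m_iq) by counting, for each true class, how often each predicted class occurs. The three-word label lists are hypothetical toy data, not results from the study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """m[i][q] = number of test samples from class i classified as class q."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[index[t]][index[p]] += 1
    return m

# Hypothetical toy labels, for illustration only.
classes = ["London", "Paris", "north"]
true_labels = ["London", "London", "Paris", "north", "north", "Paris"]
predicted = ["London", "Paris", "Paris", "north", "London", "London"]

M = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, M):
    print(name, row)
```

Rows are the presented word and columns are the classifier's answer, so the off-diagonal entries are the misclassifications the slides go on to analyse.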

The relative frequencies of the counts m_iq are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

$$p_{iq} = \frac{m_{iq}}{\sum_{j=1}^{N} m_{ij}}$$

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Rows: ωi (class of the test sample). Columns: ωq (predicted, classified as).

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London      0.25   0.163   0.138   0.075   0.038    0.05   0.063   0.088   0.075   0.063
Moscow     0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris      0.175   0.188   0.125   0.038   0.038   0.038    0.05     0.1   0.138   0.113
north      0.067   0.017   0.017   0.283   0.167    0.15     0.1     0.1   0.067   0.033
south      0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east        0.05       0       0   0.133    0.15   0.383    0.15    0.05       0   0.083
west       0.057   0.043       0   0.171   0.114   0.143   0.229   0.086     0.1   0.057
Germany    0.075   0.175     0.1    0.05   0.125   0.025    0.05   0.275   0.025     0.1
Poland      0.15     0.1   0.125    0.05    0.05       0   0.025    0.15    0.25     0.1
Russia      0.16    0.04    0.08    0.04     0.1    0.02    0.12     0.1    0.08    0.26
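The step from counts to conditional probability estimates is a row normalisation: each row of the confusion matrix is divided by its row total. A minimal sketch, using a hypothetical 3-by-3 count matrix rather than the study's data:

```python
def conditional_probabilities(m):
    """Divide each row of the confusion matrix by its row sum (relative frequencies)."""
    p = []
    for row in m:
        total = sum(row)
        p.append([count / total if total else 0.0 for count in row])
    return p

# Hypothetical counts, for illustration only.
M = [[6, 3, 1],
     [2, 7, 1],
     [1, 1, 8]]

P = conditional_probabilities(M)
for row in P:
    print([round(v, 3) for v in row])
```

Each row of P sums to 1, which matches reading row i as the estimated distribution of the classifier's answers when word ωi was presented.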

[Figure: Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis from 0.2 to 0.6.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.

BRAIN

The conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the table shown above).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

(n) Paris1, City of Light1, French capital1, capital of France1 (the capital and largest city of France, and international center of culture and commerce)

(n) Paris2, genus Paris1 (sometimes placed in subfamily Trilliaceae)

(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris4 (a town in northeastern Texas)

Multiple word senses

Organized into sets of cognitive synonyms called "synsets"

(n) city1, metropolis1, urban center1 (a large and densely populated urban area; may include several independent administrative districts) [direct hyponym, full hyponym, part meronym, has instance]
(n) city2 (an incorporated administrative district established by state charter)
(n) city3, metropolis2 (people living in a large densely populated municipality)

(n) east1, due east1, eastward1, E3 (the cardinal compass point that is at 90 degrees)
(n) East2, Orient1 (the countries of Asia)
(n) East3, eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east4 (the direction corresponding to the eastward cardinal compass point)
(n) east5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al 1990, Landauer and Dumais 1997, Landauer et al 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on approximately 38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
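The comparison step behind scores of this kind can be illustrated with cosine similarity between word vectors. The sketch below is deliberately incomplete as an account of LSA: it compares raw co-occurrence count vectors and omits the dimensionality reduction (the "factors" mentioned above) that LSA applies first. The counts are hypothetical, and nothing here reproduces the computation done by the application the slides used.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical counts of each word in four documents (illustration only).
term_vectors = {
    "north": [3, 2, 0, 0],
    "south": [2, 3, 0, 1],
    "Paris": [0, 0, 4, 1],
}

sim_north_south = cosine(term_vectors["north"], term_vectors["south"])
sim_north_paris = cosine(term_vectors["north"], term_vectors["Paris"])
print(round(sim_north_south, 3), round(sim_north_paris, 3))
```

Words that tend to appear in the same documents get a cosine near 1, and words that never share a document get 0, which is the intuition behind the co-occurrence patterns described above.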

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         1    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow      0.17       1    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris       0.37    0.18       1     0.1    0.04    0.08    0.09     0.5    0.31    0.36
north       0.13    0.13     0.1       1    0.89     0.6    0.61    0.07    0.09    0.14
south       0.12    0.08    0.04    0.89       1     0.5    0.55    0.02    0.05    0.05
east        0.12    0.22    0.08     0.6     0.5       1    0.85    0.22    0.29     0.3
west        0.14    0.14    0.09    0.61    0.55    0.85       1    0.24    0.26    0.23
Germany     0.17    0.37     0.5    0.07    0.02    0.22    0.24       1    0.85    0.81
Poland      0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85       1    0.87
Russia      0.16    0.69    0.36    0.14    0.05     0.3    0.23    0.81    0.87       1

[Figure: Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on approximately 38,000 college-level texts (novels, newspaper articles, …). Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia. Axis from 0.1 to 0.6.]
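A cluster tree of this kind can be built by repeatedly merging the two closest clusters. The sketch below assumes average linkage and a distance of 1 minus similarity; the slides do not say which linkage or distance was used, so both are assumptions. The similarity values are the LSA scores for the four compass words taken from the matrix above.

```python
def average_linkage(names, sim):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(n,) for n in names]

    def dist(a, b):
        # Average of (1 - similarity) over all cross-cluster word pairs.
        pairs = [(x, y) for x in a for y in b]
        return sum(1.0 - sim[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = min(((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
                   key=lambda pair: dist(*pair))
        merges.append((a, b, round(dist(a, b), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# LSA similarity scores for the compass words, from the matrix on the earlier slide.
sim = {
    frozenset(("north", "south")): 0.89,
    frozenset(("north", "east")): 0.60,
    frozenset(("north", "west")): 0.61,
    frozenset(("south", "east")): 0.50,
    frozenset(("south", "west")): 0.55,
    frozenset(("east", "west")): 0.85,
}

merges = average_linkage(["north", "south", "east", "west"], sim)
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

On these four words, north and south merge first, then east and west, and the two pairs join last, which is consistent with the leaf order listed for the LSA tree.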

LSA provides a straightforward measure of similarity between words.

For WORDNET several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus. The corpus is the human-annotated, sense-tagged corpus SemCor (Miller et al 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score using each of the relevant senses.
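To make the path-based idea concrete, here is a sketch of the Wu and Palmer (1994) measure, which scores two concepts by the depth of their deepest shared ancestor relative to their own depths: 2 * depth(lcs) / (depth(a) + depth(b)). The tiny is-a hierarchy below is a hypothetical stand-in and does not reproduce WordNet's actual hypernym chains, so the numbers are illustrative only.

```python
# Hypothetical toy is-a hierarchy (child -> parent); NOT the real WordNet structure.
PARENT = {
    "entity": None,
    "location": "entity",
    "direction": "entity",
    "city": "location",
    "country": "location",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
    "north": "direction",
}

def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def wup(a, b):
    """Wu-Palmer similarity using the deepest common ancestor of a and b."""
    ancestors_a = set(path_to_root(a))
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup("London", "Paris"))    # share "city"
print(wup("London", "Germany"))  # share "location"
print(wup("London", "north"))    # share only the root
```

In this toy hierarchy two cities score higher than a city and a country, which in turn score higher than a city and a compass direction.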

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

          London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London         1   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow     0.396       1   0.393   0.095   0.094   0.062    0.07   0.286   0.281   0.288
Paris      0.466   0.393       1   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north      0.106   0.095   0.106       1   0.228   0.179    0.21   0.123   0.132   0.111
south      0.103   0.094   0.104   0.228       1   0.172   0.212   0.115   0.107   0.109
east       0.076   0.062   0.074   0.179   0.172       1   0.216   0.093    0.08   0.077
west       0.078    0.07   0.077    0.21   0.212   0.216       1   0.087   0.082   0.083
Germany    0.322   0.286   0.327   0.123   0.115   0.093   0.087       1   0.589   0.409
Poland     0.299   0.281   0.308   0.132   0.107    0.08   0.082   0.589       1   0.403
Russia     0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403       1

[Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Axis from 0.2 to 0.6.]

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees, labelled LANGUAGE DATA and BRAIN DATA, each with an axis from 0.2 to 0.6. One has leaf order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east (the order of the brain-data tree shown earlier); the other has leaf order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east (the order of the WordNet tree shown earlier).]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
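For the language data, the relation R for a fixed word ω can be represented as the set of pairs (ω1, ω2) such that ω1 is more similar to ω than ω2 is. The sketch below builds that set for London from the WordNet-based similarity row shown earlier and checks the three stated properties. Representing R as a Python set of pairs, and leaving London itself out of the comparison, are choices made for this illustration.

```python
def similarity_order(similarities):
    """Pairs (w1, w2) with w1 strictly more similar to the target word than w2."""
    return {(a, b) for a in similarities for b in similarities
            if similarities[a] > similarities[b]}

def is_strict_partial_order(rel):
    """Check that a set of pairs is irreflexive, asymmetric, and transitive."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive

# WordNet-based similarities to London (London row of the matrix on the earlier slide).
london_wordnet = {
    "Moscow": 0.396, "Paris": 0.466, "north": 0.106, "south": 0.103,
    "east": 0.076, "west": 0.078, "Germany": 0.322, "Poland": 0.299,
    "Russia": 0.303,
}

R_london = similarity_order(london_wordnet)
print(len(R_london), is_strict_partial_order(R_london))
```

Because the strict comparison leaves tied words unordered, the result is in general a partial rather than a total order; with nine distinct scores, as here, every pair happens to be ordered.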

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: the two partial orders drawn as diagrams. Surviving node labels: Poland, north, south, west, east (language data) and north, Poland, east, south, west (brain data).]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
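The comparison step can be sketched with a small Spearman implementation: rank both sequences (tied values share the average of their ranks) and take the Pearson correlation of the ranks. The values are the London language and brain columns from the earlier "Partial orders for London" slide. The significance test is not reproduced here, and this is an illustration, not the authors' code.

```python
import math

def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Word order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(round(rho, 3))
```

Handling ties through average ranks matters here because south and west share the same brain-data value.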

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: two invariant partial orders over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Captions: "Significant Invariance - Paris - Spearman 0.88 (p=6.6795e-004)" and "Significant Invariance - Paris - Spearman 0.90 (p=3.8716e-004)".]
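One way to read "invariant with respect to the brain and language data" is as the set of word pairs that both partial orders rank the same way, that is, the intersection of the two relations. That reading is an assumption made for this sketch. The scores are the London language and brain values from the earlier "Partial orders for London" slide, with London itself left out.

```python
def strict_order(scores):
    """Pairs (a, b) where a scores strictly higher than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}

def invariant_order(scores_a, scores_b):
    """Pairs ordered the same way by both score sets (intersection of the two orders)."""
    return strict_order(scores_a) & strict_order(scores_b)

london_language = {
    "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322, "Russia": 0.303,
    "Poland": 0.299, "north": 0.106, "south": 0.103, "west": 0.078, "east": 0.076,
}
london_brain = {
    "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075, "north": 0.042,
    "Russia": 0.033, "Poland": 0.025, "east": 0.008, "south": 0.000, "west": 0.000,
}

invariant = invariant_order(london_language, london_brain)
print(len(invariant), "of", len(strict_order(london_language)), "language pairs kept")
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the originals. Pairs on which the two data sets disagree or tie, such as Russia and north, simply drop out.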

Brain data

[Figure: Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis from 0.2 to 0.6.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: a second brain-data similarity tree. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland. Axis from 0.1 to 0.5.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
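The resampling step can be sketched as drawing, for each of the M = 30 classifications, 640 sample indices at random with replacement. How the resampled sets were then divided into training and test data is not stated in the slides, so only the draw itself is shown; the function name and seed are hypothetical.

```python
import random

def resample_with_replacement(n_samples, rng):
    """Draw n_samples indices uniformly at random, allowing repeats."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
N_SAMPLES = 640                 # EEG samples per participant, from the slides
M_CLASSIFICATIONS = 30          # M on the slide

resamples = [resample_with_replacement(N_SAMPLES, rng)
             for _ in range(M_CLASSIFICATIONS)]

distinct = len(set(resamples[0]))
print(f"first resample uses {distinct} distinct samples out of {N_SAMPLES}")
```

Because sampling is with replacement, each resample repeats some samples and omits others, which is why every classification yields slightly different conditional probability estimates and hence its own similarity tree.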

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol 34, no 2, pp 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006) www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol 82(6), 407-428.

Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc Sympos IBM Thomas J Watson Res Center, Yorktown Heights, NY (Ed RE Miller and JW Thatcher). New York: Plenum, pp 85-103.

Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp 621-647.

Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum C (Ed), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp 265-283.

Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998) An information-theoretic definition of similarity. In Proc 15th International Conf on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp 296-304.

Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009 Jul 1;10:205.

Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J Acoust Soc Am 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.

Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM, Vol 38, No 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang, 2011 Apr;117(1):12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J Mathematical Psychology 15(2), 209-212. MR0437404.

Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull, 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.

Suppes P (1972) Axiomatic Set Theory. New York: Dover.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.

Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol 22, No 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex, 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds), Representation and understanding. New York: Academic Press, 1975: 35-82.

Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133-138.

Page 15: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Machine learning approach to the study of brain and language

For the 10 geography words (London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia) we have 640 EEG data samples for each participant.

We want to classify these 640 samples into the correct 10 classes.

Use 580 brain samples and their associated words to train the classifier.

Test the remaining 60 samples.

Do this many times, each time using a different set of training samples.

Find the average classification rate; make sure it is statistically significant.

Obtained classification rates in the range 25% to 29%, with a mean classification rate of around 24.5%, p < 10E−10. Significantly higher than chance (10%).

We used a 5-fold linear discriminant model, with principal component analysis for blind source separation, to classify the segments of EEG data obtained from the individual trials.
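The significance check in the steps above can be illustrated with a short sketch. It assumes a one-sided exact binomial test against the 10% chance level; the slides do not say which test was actually used, and the function name `binomial_tail` and the counts in the example are hypothetical.

```python
from math import comb

def binomial_tail(k: int, n: int, chance: float) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, chance)."""
    return sum(comb(n, j) * chance**j * (1 - chance)**(n - j)
               for j in range(k, n + 1))

# Hypothetical example: 15 of 60 held-out samples correct (25%), 10 classes.
p_value = binomial_tail(15, 60, 0.10)
print(f"p = {p_value:.2e}")
```

A single 60-sample test set gives only a modest p-value; much smaller values would require pooling test results over many resampling rounds.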

Beyond machine learning …

THEN look at the MIS-CLASSIFICATIONS and build a CONFUSION MATRIX.

Classify 640 brain data samples s1, s2, …, s640 into 10 classes ω1, ω2, …, ω10 of the finite set A.

$M = (m_{iq})$ is the confusion matrix for a given classification, where $m_{iq}$ is the number of test samples from class ωi classified as belonging to class ωq.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   8       14      11      7       6       3       3       10      8       10
Moscow   8       24      14      6       2       6       4       4       7       15
Paris    6       18      12      4       3       5       8       6       11      7
north    4       2       5       11      9       7       10      1       8       3
south    1       4       3       14      14      9       11      4       7       3
east     4       3       5       9       12      12      7       1       4       3
west     4       3       2       12      13      11      10      2       8       5
Germany  2       2       3       2       2       1       0       9       11      8
Poland   2       3       4       0       4       1       4       9       9       4
Russia   7       7       5       1       5       1       1       8       4       11
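The definition of the confusion matrix can be sketched in a few lines. This is a minimal illustration, assuming the classifier's output is available as (true word, predicted word) pairs; the function name `confusion_matrix` and the toy pairs are hypothetical and are not taken from the slides.

```python
def confusion_matrix(pairs, classes):
    """m[i][q] = number of test samples from class i classified as class q."""
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for true_word, predicted_word in pairs:
        m[index[true_word]][index[predicted_word]] += 1
    return m

classes = ["London", "Paris", "north"]             # toy subset of the 10 words
pairs = [("London", "London"), ("London", "Paris"),
         ("Paris", "London"), ("north", "north")]  # hypothetical test results
M = confusion_matrix(pairs, classes)
for word, row in zip(classes, M):
    print(word, row)
```

Rows index the true class and columns the predicted class, matching the layout of the table on the slide.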

The relative frequencies of the $m_{iq}$ are an N-by-N estimate for the conditional probability densities, designated by the matrix $P = (p_{iq})$, that a randomly chosen test sample from class ωi will be classified as belonging to class ωq:

$$p_{iq} = \frac{m_{iq}}{\sum_{q'} m_{iq'}}$$

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Rows: ωi. Columns: ωq – predicted (classified as).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
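Turning counts into conditional probability estimates amounts to dividing each row of the confusion matrix by its row total. The sketch below assumes exactly that normalization; the function name is hypothetical. It uses the London and Moscow rows of the count matrix shown earlier. The probability table on the slide appears to come from a different classification run, so the values printed here are not expected to match it.

```python
def conditional_probabilities(m):
    """Row-normalize a confusion matrix: p[i][q] = m[i][q] / (sum of row i)."""
    p = []
    for row in m:
        total = sum(row)
        # A class with no test samples has no estimate; keep zeros in that case.
        p.append([count / total if total else 0.0 for count in row])
    return p

# London and Moscow rows of the confusion matrix of counts.
M = [[8, 14, 11, 7, 6, 3, 3, 10, 8, 10],
     [8, 24, 14, 6, 2, 6, 4, 4, 7, 15]]
P = conditional_probabilities(M)
print([round(x, 3) for x in P[0]])
```

Each row of the result sums to 1, as in the table of estimates on the slide.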

[Figure: Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents such as novels, newspaper articles, textbooks.

BRAIN

The table of conditional probability density estimates from the classification of the brain wave data, as given above.

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is–a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called “synsets”.

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France, and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [relations: direct hyponym, full hyponym, part meronym, has instance]
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
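LSA compares two words by the cosine between their vectors in a reduced space. The sketch below shows only that final comparison step, applied to a tiny made-up term-by-document count matrix. The singular value decomposition and truncation to a limited number of factors that LSA performs first are omitted, and all counts are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical counts of each word in four documents.
term_vectors = {
    "north": [3, 0, 2, 0],
    "south": [2, 0, 3, 0],
    "Paris": [0, 4, 0, 1],
}
north_south = cosine(term_vectors["north"], term_vectors["south"])
north_paris = cosine(term_vectors["north"], term_vectors["Paris"])
print(round(north_south, 3), round(north_paris, 3))
```

Words that occur in the same documents get a cosine near 1, and words that never share a document get 0.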

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

[Figure: Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …). Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia. Axis ticks: 0.1 to 0.6.]
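A similarity tree of this kind can be built by agglomerative clustering. The slides do not state the linkage method or how similarities were converted to distances, so the sketch below makes two labeled assumptions: distance is taken as 1 minus similarity, and clusters are merged by average linkage. It uses four words and their LSA scores from the similarity matrix; the function name is hypothetical.

```python
from itertools import combinations

def average_linkage(names, sim):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    def dist(a, b):
        return 1.0 - sim[frozenset((a, b))]      # assumed distance: 1 - similarity

    def linkage(pair):
        a, b = pair                              # mean distance over all cross pairs
        return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, round(linkage((a, b)), 4)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# LSA similarity scores for four of the ten words.
sim = {frozenset(k): v for k, v in {
    ("London", "Paris"): 0.37, ("London", "north"): 0.13, ("London", "south"): 0.12,
    ("Paris", "north"): 0.10, ("Paris", "south"): 0.04, ("north", "south"): 0.89,
}.items()}
merges = average_linkage(["London", "Paris", "north", "south"], sim)
for merge in merges:
    print(merge)
```

On this subset the compass terms merge first and the two cities second, which is in line with the grouping visible in the leaf order of the figure.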

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus. The corpus is the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
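As an illustration of a path-based measure, the sketch below computes the Wu and Palmer (1994) score, 2·depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest concept that subsumes both a and b. The tiny taxonomy is hypothetical and is not WordNet's actual hierarchy; the root is given depth 1, which is a common convention and an assumption here.

```python
# Hypothetical is-a taxonomy (child -> parent); not WordNet's real structure.
PARENT = {"location": "entity", "city": "location", "country": "location",
          "London": "city", "Paris": "city", "Germany": "country"}

def path_to_root(node):
    """The node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    return len(path_to_root(node))               # root has depth 1

def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)  # deepest shared ancestor
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup("London", "Paris"), wup("London", "Germany"))
```

Two cities share the deeper ancestor "city" and so score higher than a city paired with a country, the same pattern seen in the WordNet-derived similarity matrix.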

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1       0.393   0.095   0.094   0.062   0.07    0.286   0.281   0.288
Paris    0.466   0.393   1       0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1       0.228   0.179   0.21    0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1       0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1       0.216   0.093   0.08    0.077
west     0.078   0.07    0.077   0.21    0.212   0.216   1       0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1       0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.08    0.082   0.589   1       0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1

[Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Axis ticks: 0.2 to 0.6.]

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees, panels labelled LANGUAGE DATA and BRAIN DATA, each with axis ticks 0.2 to 0.6. Leaf orders as extracted: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; and Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
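For a fixed word ω, such a relation can be read off one row of a matrix. The sketch below follows the brain-data definition literally, placing (ω1, ω2) in R when the value for ω1 is smaller than the value for ω2, and then checks the three stated properties. The function name is hypothetical, and the row is part of the London row of the conditional probability table.

```python
def similarity_order(row):
    """All pairs (w1, w2) with row[w1] < row[w2], i.e. R(w, w1, w2) for a fixed w."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] < row[w2]}

# Part of the London row of the conditional probability estimates.
london = {"Moscow": 0.163, "Paris": 0.138, "north": 0.075, "south": 0.038}
R = similarity_order(london)

irreflexive = all((w, w) not in R for w in london)
asymmetric = all((b, a) not in R for (a, b) in R)
transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
print(sorted(R))
print(irreflexive, asymmetric, transitive)
```

The same function can be applied to a row of a semantic similarity matrix. Whether the comparison should then run in the same or the opposite direction depends on how "similarity difference" is read, which the slides leave implicit.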

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: partial-order diagrams for the two data sets. Node labels as extracted: Poland, north, south, west, east; and north, Poland, east, south, west.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference, or the theory of differences in psychological intensity:

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine if we have a statistically significant correlation or not.

Suppes P (1974). The axiomatic method in the empirical sciences. In L Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.
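Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain implementation of that definition; the function names are hypothetical and the significance test is not included. The example uses the language and brain values for London from the partial-order table.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1    # mean of positions i+1 .. j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

words = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]
rho = spearman(language, brain)
print(round(rho, 3))
```

Both lists are aligned to the same word order before ranking, which is what makes the comparison between the two partial orders meaningful.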

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Figure: two invariant partial-order diagrams over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.
Panel 1: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)
Panel 2: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
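One way to read "invariant" is as the set of ordered pairs on which both partial orders agree, that is, the intersection of the two relations. That reading is an assumption made for this sketch, and the function names are hypothetical. The values are the lower five entries for London from the language and brain columns of the partial-order table.

```python
def order_pairs(row):
    """Strict order induced by the values: (a, b) whenever row[a] < row[b]."""
    return {(a, b) for a in row for b in row if row[a] < row[b]}

def invariant_order(brain_row, language_row):
    """Pairs ordered the same way in both data sets (assumed meaning of 'invariant')."""
    return order_pairs(brain_row) & order_pairs(language_row)

language = {"Poland": 0.299, "north": 0.106, "south": 0.103, "west": 0.078, "east": 0.076}
brain = {"north": 0.042, "Poland": 0.025, "east": 0.008, "south": 0.000, "west": 0.000}
invariant = invariant_order(brain, language)
print(sorted(invariant))
```

Pairs on which the two data sets disagree, such as north versus Poland here, drop out, and what remains is still irreflexive, asymmetric, and transitive.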

[Figure: Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: Brain data. Similarity tree with leaf order north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland. Axis ticks: 0.1 to 0.5.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications – that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
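Random resampling with replacement can be sketched as drawing, for each of the M classifications, 640 sample indices uniformly with repetition allowed. The slides do not say how the resampled data were split into training and test sets, so only the drawing step is shown; the function name and the fixed seed are assumptions made for reproducibility of the example.

```python
import random

def bootstrap_indices(n_samples, n_rounds, seed=0):
    """For each round, draw n_samples indices uniformly with replacement."""
    rng = random.Random(seed)
    return [rng.choices(range(n_samples), k=n_samples) for _ in range(n_rounds)]

rounds = bootstrap_indices(640, 30)      # 640 samples, M = 30 classifications
distinct = len(set(rounds[0]))
print(f"round 1 uses {distinct} distinct samples out of 640")
```

Because indices repeat, each round sees a different subset of the samples, which is why every classification yields its own confusion matrix and its own similarity tree.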

[Figure: Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language. Bar chart with two series, WordNet and LSA. Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18).]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review, 11, 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.
Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. RE Miller and JW Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 2011, 62, pp. 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.
Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica, 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009 Jul 1; 10: 205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am., 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science, 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang., 2011 Apr; 117(1): 12-22. doi: 10.1016/j.bandl.2010.09.013
Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering, 54, 436–443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology, 15(2), 209–212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull., 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic, 23, 113–128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences, 95, 14965–14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences, 95, 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences, 96, 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences, 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences, 97, 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation, 21, 3228–3269.
Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology, 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex, 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing, 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing, 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage, 39, 1051–1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (eds.), Representation and understanding. New York: Academic Press, 1975: 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

Page 16: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Beyond machine learning hellip

For the10 geography words London Moscow Paris north south east west Germany Poland Russia we have 640 EEG data samples for each participant

We want to classify these 640 samples into the correct 10 classes

640 EEG samples for each participant

Use 580 brain samples and their associated words to train the classifier

Test the remaining 60 samples

Do this many times each time using a different set of training samples

Find the average classification rate make sure it is statistically significant

Obtained classification rates in the range 25 to 29 with a mean classification rate of around 245 p lt 10Eminus10 Significantly higher than chance (10)

THEN

look at the MIS-CLASSIFICATIONS and build a

CONFUSION MATRIX

16 copy Colleen E Crangle 2013

Classify 640 brain data samples s1 s2 hellip s640 into 10 classes ω1 ω2

hellip ω10 of the finite set A

M = (miq) is the confusion matrix for a given classification where miq

is the number of test samples from class ωi classified as belonging to

class ωq

London Moscow Paris north south east west Germany Poland Russia

London 8 14 11 7 6 3 3 10 8 10

Moscow 8 24 14 6 2 6 4 4 7 15

Paris 6 18 12 4 3 5 8 6 11 7

north 4 2 5 11 9 7 10 1 8 3

south 1 4 3 14 14 9 11 4 7 3

east 4 3 5 9 12 12 7 1 4 3

west 4 3 2 12 13 11 10 2 8 5

Germany 2 2 3 2 2 1 0 9 11 8

Poland 2 3 4 0 4 1 4 9 9 4

Russia 7 7 5 1 5 1 1 8 4 11

17 copy Colleen E Crangle 2013

The relative frequencies miq are an N-by-N estimate for the

conditional probability densities minus designated by the matrix P = (piq) minus

that a randomly chosen test sample from class ωi will be classified as

belonging to class ωq

119846119842119850119850

Conditional probability density estimates from the classification of brain wave data for London Moscow Paris north south east west Germany Poland Russia

ωi

ωq ndash predicted classified as

London Moscow Paris north south east west Germany Poland Russia

London 025 0163 0138 0075 0038 005 0063 0088 0075 0063

Moscow 0144 0333 0133 0033 0011 0056 0056 0089 0067 0078

Paris 0175 0188 0125 0038 0038 0038 005 01 0138 0113

north 0067 0017 0017 0283 0167 015 01 01 0067 0033

south 0014 0014 0071 0114 0271 0171 0157 0029 0086 0071

east 005 0 0 0133 015 0383 015 005 0 0083

west 0057 0043 0 0171 0114 0143 0229 0086 01 0057

Germany 0075 0175 01 005 0125 0025 005 0275 0025 01

Poland 015 01 0125 005 005 0 0025 015 025 01

Russia 016 004 008 004 01 002 012 01 008 026

18 copy Colleen E Crangle 2013

[Figure: hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis ticks from 0.2 to 0.6. Leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, textbooks.

BRAIN

The conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the same table as shown above).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets".

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)

(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [links: direct hyponym / full hyponym, part meronym, has instance]

(n) city#2 (an incorporated administrative district established by state charter)

(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)

(n) East#2, Orient#1 (the countries of Asia)

(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east#4 (the direction corresponding to the eastward cardinal compass point)

(n) east#5 (a location in the eastern part of a country, region or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
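As a rough illustration of how a term-space similarity score is obtained, the sketch below computes the cosine between word vectors. It is deliberately incomplete as an LSA implementation: the dimensionality-reduction step (a truncated singular value decomposition of the term-by-document matrix) is omitted, and the small count vectors are invented for the example rather than taken from any corpus.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


# Hypothetical counts of each word in four documents (invented data).
term_vectors = {
    "north": [3, 2, 0, 0],
    "south": [2, 3, 0, 1],
    "Paris": [0, 0, 4, 2],
}

sim_north_south = cosine(term_vectors["north"], term_vectors["south"])
sim_south_paris = cosine(term_vectors["south"], term_vectors["Paris"])
sim_north_paris = cosine(term_vectors["north"], term_vectors["Paris"])
```

Words that occur in the same documents get a cosine near 1, and words that never share a document get 0. In full LSA the same comparison is made on the reduced vectors, which lets words be similar even when they never co-occur directly.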

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

           London   Moscow    Paris    north    south     east     west  Germany   Poland   Russia
London          1     0.17     0.37     0.13     0.12     0.12     0.14     0.17     0.16     0.16
Moscow       0.17        1     0.18     0.13     0.08     0.22     0.14     0.37     0.65     0.69
Paris        0.37     0.18        1      0.1     0.04     0.08     0.09      0.5     0.31     0.36
north        0.13     0.13      0.1        1     0.89      0.6     0.61     0.07     0.09     0.14
south        0.12     0.08     0.04     0.89        1      0.5     0.55     0.02     0.05     0.05
east         0.12     0.22     0.08      0.6      0.5        1     0.85     0.22     0.29      0.3
west         0.14     0.14     0.09     0.61     0.55     0.85        1     0.24     0.26     0.23
Germany      0.17     0.37      0.5     0.07     0.02     0.22     0.24        1     0.85     0.81
Poland       0.16     0.65     0.31     0.09     0.05     0.29     0.26     0.85        1     0.87
Russia       0.16     0.69     0.36     0.14     0.05      0.3     0.23     0.81     0.87        1

[Figure: hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …). Axis ticks from 0.1 to 0.6. Leaves in order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.]
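A similarity tree of this kind can be sketched with agglomerative clustering on the LSA similarity matrix given earlier. The slides do not say which linkage rule or distance was used, so average linkage on the distance 1 minus similarity is an assumption made here for illustration, and the function name is hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity matrix from the slide, rows and columns in WORDS order.
LSA = [
    [1, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1, 0.1, 0.04, 0.08, 0.09, 0.5, 0.31, 0.36],
    [0.13, 0.13, 0.1, 1, 0.89, 0.6, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1, 0.5, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.6, 0.5, 1, 0.85, 0.22, 0.29, 0.3],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.5, 0.07, 0.02, 0.22, 0.24, 1, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.3, 0.23, 0.81, 0.87, 1],
]


def average_linkage(words, sim):
    """Merge the two closest clusters repeatedly; return the merge history.

    Distance between items is 1 - similarity; distance between clusters is
    the mean distance over all cross-cluster pairs (average linkage).
    """
    clusters = [[i] for i in range(len(words))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [1.0 - sim[i][j]
                         for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(words[i] for i in clusters[a]),
                       sorted(words[i] for i in clusters[b]),
                       round(d, 3)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


merges = average_linkage(WORDS, LSA)
```

The merge history lists which groups were joined and at what distance, which is the information a dendrogram draws. A library routine such as SciPy's hierarchical clustering would normally replace the hand-written loop.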

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus, here the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled in various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
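To make the path-based family concrete, here is a small sketch of the Wu and Palmer (1994) measure on a toy is-a hierarchy. The hierarchy below is hypothetical and far shallower than WordNet's, so the scores are illustrative only; the formula used is 2 × depth(LCS) / (depth(a) + depth(b)), where LCS is the deepest concept that subsumes both a and b and the root has depth 1.

```python
# Hypothetical is-a hierarchy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "region": "entity",
    "city": "region",
    "country": "region",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
    "direction": "entity",
    "north": "direction",
}


def path_to_root(node):
    """Return the chain node, parent, ..., root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Number of nodes on the path to the root, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so first hit is deepest
        if node in ancestors_a:
            return node
    raise ValueError("nodes share no ancestor")


def wup(a, b):
    """Wu-Palmer similarity on the toy hierarchy."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


city_city = wup("London", "Paris")
city_country = wup("London", "Germany")
city_direction = wup("London", "north")
```

On this toy tree two cities score higher than a city and a country, which in turn score higher than a city and a compass direction, the same qualitative pattern as in the WordNet-derived matrix.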

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

           London   Moscow    Paris    north    south     east     west  Germany   Poland   Russia
London          1    0.396    0.466    0.106    0.103    0.076    0.078    0.322    0.299    0.303
Moscow      0.396        1    0.393    0.095    0.094    0.062     0.07    0.286    0.281    0.288
Paris       0.466    0.393        1    0.106    0.104    0.074    0.077    0.327    0.308    0.307
north       0.106    0.095    0.106        1    0.228    0.179     0.21    0.123    0.132    0.111
south       0.103    0.094    0.104    0.228        1    0.172    0.212    0.115    0.107    0.109
east        0.076    0.062    0.074    0.179    0.172        1    0.216    0.093     0.08    0.077
west        0.078     0.07    0.077     0.21    0.212    0.216        1    0.087    0.082    0.083
Germany     0.322    0.286    0.327    0.123    0.115    0.093    0.087        1    0.589    0.409
Poland      0.299    0.281    0.308    0.132    0.107     0.08    0.082    0.589        1    0.403
Russia      0.303    0.288    0.307    0.111    0.109    0.077    0.083    0.409    0.403        1

[Figure: hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Axis ticks from 0.2 to 0.6. Leaves in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees, each with axis ticks from 0.2 to 0.6. BRAIN DATA, leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA, leaves in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric and transitive.

LANGUAGE DATA

For each word ω we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric and transitive.
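The ternary relation can be written out directly as a set of triples. The sketch below follows the wording given for the brain data literally (ω1 precedes ω2 when its conditional probability is smaller) and uses the London row of the conditional probability table as input. The function name is hypothetical, and leaving tied values unrelated is an assumption, made because a strict order cannot rank equal values.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# London row of the conditional probability estimates (brain data).
LONDON_ROW = [0.25, 0.163, 0.138, 0.075, 0.038,
              0.05, 0.063, 0.088, 0.075, 0.063]


def ternary_relation(reference, words, probabilities):
    """Return all triples (reference, w1, w2) with p(w1) < p(w2).

    Tied values produce no triple in either direction, so the result is
    irreflexive, asymmetric and transitive.
    """
    relation = set()
    for w1, p1 in zip(words, probabilities):
        for w2, p2 in zip(words, probabilities):
            if p1 < p2:
                relation.add((reference, w1, w2))
    return relation


R = ternary_relation("London", WORDS, LONDON_ROW)
```

For the language data the same construction applies to a row of the similarity matrix, with the comparison oriented as that definition states.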

London

Language data          Brain data
London   1.000         London   0.275
Paris    0.466         Paris    0.133
Moscow   0.396         Moscow   0.108
Germany  0.322         Germany  0.075
Russia   0.303         north    0.042
Poland   0.299         Russia   0.033
north    0.106         Poland   0.025
south    0.103         east     0.008
west     0.078         south    0.000
east     0.076         west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: diagrams of the two partial orders. Recoverable node labels: Poland, north, south, west, east (language data) and north, Poland, east, south, west (brain data).]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
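The rank comparison can be sketched without any libraries. The code below computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank, and applies it to the language and brain values listed for London on the earlier slide. The significance test is not implemented, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rankings."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Values for London, in the order: London, Paris, Moscow, Germany, Russia,
# Poland, north, south, west, east.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
```

In practice a statistics library would supply both the coefficient and its p-value; the point here is only that the comparison depends on the two orderings, not on the raw magnitudes.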

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order diagram over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order diagram over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
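The slides do not spell out how the invariant partial order is constructed. One simple reading, assumed here purely for illustration, is the set of ordered pairs on which the brain and language orders agree, that is, the intersection of the two strict orders. The sketch uses the London values from the earlier slide; the function name is hypothetical.

```python
def strict_order(scores):
    """All pairs (a, b) such that a scores strictly higher than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


# London values from the slide: WordNet similarities and EEG estimates.
language = {
    "London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
    "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
    "west": 0.078, "east": 0.076,
}
brain = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}

# Pairs ranked the same way by both data sources (assumed construction).
invariant = strict_order(language) & strict_order(brain)
```

The intersection of two strict partial orders is again a strict partial order, so the result keeps the irreflexive, asymmetric and transitive properties. Pairs the two sources rank differently, such as Russia and north here, simply drop out.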

Brain data

[Figure: another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Axis ticks from 0.2 to 0.6. Leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: a second similarity tree for the brain data. Axis ticks from 0.1 to 0.5. Leaves in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
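Step 1 above, random resampling with replacement, can be sketched as follows. Only the resampling itself is shown; the classifier that would be run on each resample is not part of this sketch, and the integer sample identifiers and the fixed seed are stand-ins chosen for the example.

```python
import random


def bootstrap_resample(samples, rng):
    """Draw len(samples) items from samples, with replacement."""
    return rng.choices(samples, k=len(samples))


rng = random.Random(0)              # fixed seed so the sketch is repeatable
sample_ids = list(range(1, 641))    # stand-ins for samples s1 ... s640

# One resample per classification run (M = 30 in the procedure above).
resamples = [bootstrap_resample(sample_ids, rng) for _ in range(30)]
```

Because the draw is with replacement, each resample has the same size as the original set but repeats some samples and omits others, so each run yields a slightly different confusion matrix.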

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S and Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S and Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, B Murphy, E Barbu, M Poesio (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science - COGSCI, vol. 34, no. 2, pp. 222-254, 2010. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A & Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

Brants T and Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453–482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, C Kelly, A Korhonen (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small S L, Cottrell G W, Tanenhaus M K (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.

Jelodar A B, M Alizadeh, S Khadivi (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J and Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, B Devereux, A Korhonen (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp. 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell TM, SV Shinkareva, A Carlson, KM Chang, VL Malave, R A Mason and M A Just (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M & Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B & Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, M Botvinick, G Detre (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15 (2), 209–212. MR0437404.

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 95, 14965–14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds., Representation and understanding. New York: Academic Press, 1975: 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

Page 17: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Classify 640 brain data samples s1 s2 hellip s640 into 10 classes ω1 ω2

hellip ω10 of the finite set A

M = (miq) is the confusion matrix for a given classification where miq

is the number of test samples from class ωi classified as belonging to

class ωq

London Moscow Paris north south east west Germany Poland Russia

London 8 14 11 7 6 3 3 10 8 10

Moscow 8 24 14 6 2 6 4 4 7 15

Paris 6 18 12 4 3 5 8 6 11 7

north 4 2 5 11 9 7 10 1 8 3

south 1 4 3 14 14 9 11 4 7 3

east 4 3 5 9 12 12 7 1 4 3

west 4 3 2 12 13 11 10 2 8 5

Germany 2 2 3 2 2 1 0 9 11 8

Poland 2 3 4 0 4 1 4 9 9 4

Russia 7 7 5 1 5 1 1 8 4 11

17 copy Colleen E Crangle 2013

The relative frequencies miq are an N-by-N estimate for the

conditional probability densities minus designated by the matrix P = (piq) minus

that a randomly chosen test sample from class ωi will be classified as

belonging to class ωq

119846119842119850119850

Conditional probability density estimates from the classification of brain wave data for London Moscow Paris north south east west Germany Poland Russia

ωi

ωq ndash predicted classified as

London Moscow Paris north south east west Germany Poland Russia

London 025 0163 0138 0075 0038 005 0063 0088 0075 0063

Moscow 0144 0333 0133 0033 0011 0056 0056 0089 0067 0078

Paris 0175 0188 0125 0038 0038 0038 005 01 0138 0113

north 0067 0017 0017 0283 0167 015 01 01 0067 0033

south 0014 0014 0071 0114 0271 0171 0157 0029 0086 0071

east 005 0 0 0133 015 0383 015 005 0 0083

west 0057 0043 0 0171 0114 0143 0229 0086 01 0057

Germany 0075 0175 01 005 0125 0025 005 0275 0025 01

Poland 015 01 0125 005 005 0 0025 015 025 01

Russia 016 004 008 004 01 002 012 01 008 026

18 copy Colleen E Crangle 2013

02 03 04 05 06

London

Paris

Moscow

Germany

Poland

Russia

north

south

west

east

Hierarchical cluster tree (similarity tree) computed from the conditional probability

density estimates for the classification of 640 brain wave samples for London Moscow Paris north south east west Germany Poland Russia

19 copy Colleen E Crangle 2013

WordNet a lexical database of English organized around sets of cognitive synonyms restricted to the senses that are relevant to the geography sentences Latent Semantic Analysis (LSA) a statistical method to extract measures of word similarity from selected sets of documents such as novels newspaper articles textbooks

LANGUAGE BRAIN

London Moscow Paris north south east west Germany Poland Russia

London 025 0163 0138 0075 0038 005 0063 0088 0075 0063

Moscow 0144 0333 0133 0033 0011 0056 0056 0089 0067 0078

Paris 0175 0188 0125 0038 0038 0038 005 01 0138 0113

north 0067 0017 0017 0283 0167 015 01 01 0067 0033

south 0014 0014 0071 0114 0271 0171 0157 0029 0086 0071

east 005 0 0 0133 015 0383 015 005 0 0083

west 0057 0043 0 0171 0114 0143 0229 0086 01 0057

Germany 0075 0175 01 005 0125 0025 005 0275 0025 01

Poland 015 01 0125 005 005 0 0025 015 025 01

Russia 016 004 008 004 01 002 012 01 008 026

20 copy Colleen E Crangle 2013

WordNet is a human-annotated lexical database of English in which nouns

verbs adjectives and adverbs are grouped into sets of cognitive synonyms

(called synsets) each synset expressing a distinct concept

See Miller (1995) Fellbaum (1997) and httpwordnetprincetonedu

The synsets are related to each other primarily through the hypernymy and

hyponymy relations for nouns Other relations in WordNet are part-whole

(holonym) member of (meronym) has instance and so on

Hyponymy often referred to as the isndasha relation in computational discussions --

is defined as follows a concept represented by a lexical item Li is said to be a

hyponym of the concept represented by a lexical item Lk if native speakers of

English accept sentences of the form An Li is a kind of Lk Conversely Lk is the

hypernym of Li A hypernym is therefore a more general concept and a hyponym

a more specific concept

WORDNET

21 copy Colleen E Crangle 2013

WORDNET

(n) Paris1 City of Light1 French capital1 capital of France1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris2 genus Paris1 (sometimes placed in subfamily Trilliaceae)

(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris4 (a town in northeastern Texas)

Multiple word senses

Organized into sets of cognitive synonyms called ldquosynsetsrdquo

(n) city1 metropolis1 urban center1 (a large and densely populated urban area may include several independent administrative districts) direct hyponym full hyponym part meronym has instance (n) city2 (an incorporated administrative district established by state charter) (n) city3 metropolis2 (people living in a large densely populated municipality)

(n) east1 due east1 eastward1 E3 (the cardinal compass point that is at 90 degrees) (n) East2 Orient1 (the countries of Asia) (n) East3 eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River) (n) east4 (the direction corresponding to the eastward cardinal compass point) (n) east5 (a location in the eastern part of a country region or city)

22 copy Colleen E Crangle 2013

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting from large collections of documents a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents See Deerwester et al 1990 Landauer and Dumais 1997 Landauer et al 1998

The underlying idea is that if for each word you take into account all the contexts in which it does and does not appear

you get for all the words a set of mutual constraints that represent how similar any two words are to each other

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity After training on about 2000 pages of English text it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language After training on a psychology textbook it achieved a passing score on a multiple-choice exam We used the application at httplsacoloradoedu to compute similarity matrices in term space for our set of words The computation was based on ~ 38000 college-level texts (novels newspaper articleshellip) A maximum of 300 factors was permitted in the analysis

23 copy Colleen E Crangle 2013

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

[Figure: hierarchical cluster tree; distance axis from 0.1 to 0.6; leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia]

Hierarchical cluster tree computed from the pairwise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on about 38,000 college-level texts (novels, newspaper articles, ...)

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, for example:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus, here the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. The measure is scaled in various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
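As an illustration of a path-based measure, the following sketch computes a Wu-Palmer-style score, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor of the two concepts. The tiny taxonomy is hypothetical and is not WordNet's actual hierarchy; the function names are likewise invented for this sketch.

```python
# Hypothetical mini-taxonomy (child -> parent); not WordNet's real hierarchy.
PARENT = {
    "entity": None,
    "location": "entity",
    "city": "location",
    "country": "location",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
}

def path_to_root(node):
    """Return the node followed by its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def wup(a, b):
    """Wu-Palmer-style similarity based on the deepest common ancestor."""
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

Under this toy hierarchy, two cities score higher with each other than a city does with a country, which is the qualitative pattern a path-based measure is meant to capture.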


Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector-space measures)

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1       0.393   0.095   0.094   0.062   0.07    0.286   0.281   0.288
Paris    0.466   0.393   1       0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1       0.228   0.179   0.21    0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1       0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1       0.216   0.093   0.08    0.077
west     0.078   0.07    0.077   0.21    0.212   0.216   1       0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1       0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.08    0.082   0.589   1       0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1

[Figure: hierarchical cluster tree; distance axis from 0.2 to 0.6; leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe

BACK TO THE BRAIN ...

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data ...

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees side by side, each with a distance axis from 0.2 to 0.6. BRAIN DATA, leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA, leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R)]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
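The construction of R for one reference word can be sketched as follows. The row of values is the London row of the conditional probability table for the brain data; the function names are invented for this sketch, and the comparison follows the wording of the slide literally (strictly smaller value), which is an assumption about the intended direction.

```python
from itertools import permutations

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# London row of the conditional probability table (brain data).
LONDON_ROW = [0.25, 0.163, 0.138, 0.075, 0.038, 0.05, 0.063, 0.088, 0.075, 0.063]

def ternary_relation(row, words):
    """Return the pairs (w1, w2) for which R(w, w1, w2) holds for the row's word w.

    A pair is included when the value for w1 is strictly smaller than the
    value for w2. Tied words are left unordered, so the order is only partial.
    """
    value = dict(zip(words, row))
    return {(w1, w2) for w1, w2 in permutations(words, 2) if value[w1] < value[w2]}

def is_strict_partial_order(rel):
    """Check that a set of pairs is irreflexive, asymmetric, and transitive."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive

R_london = ternary_relation(LONDON_ROW, WORDS)
```

Because north and Poland have the same value in this row (0.075), neither precedes the other, which shows why the relation is a partial rather than a total order.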


LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5

[Figure: diagrams of the two partial orders for London]

We follow the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R), constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R), constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each word ω in A, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not the correlation is statistically significant.
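The comparison can be sketched with a plain implementation of Spearman's coefficient, computed as the Pearson correlation of the two rank sequences with tied ranks averaged. The values are the language and brain values listed for London on the slide showing the two partial orders; the function names are invented here, and the significance test is left out of the sketch.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Word order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
```

A library routine such as SciPy's `spearmanr` would also report a p-value; the hand-written version is used here only to keep the sketch free of dependencies.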


For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Panel title: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Panel title: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
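The slides do not spell out how the invariant partial order is constructed. One natural reading, offered here purely as an assumption, is to keep exactly those ordered pairs on which the brain order and the language order agree, that is, the intersection of the two relations. The scores below are hypothetical.

```python
from itertools import permutations

def strict_order(scores):
    """Pairs (a, b) such that a scores strictly lower than b; ties stay unordered."""
    return {(a, b) for a, b in permutations(scores, 2) if scores[a] < scores[b]}

def invariant_order(brain_scores, language_scores):
    """Pairs ordered the same way in both data sets (set intersection)."""
    return strict_order(brain_scores) & strict_order(language_scores)

# Hypothetical scores for three words relative to some reference word.
brain = {"Paris": 0.13, "north": 0.04, "east": 0.01}
language = {"Paris": 0.47, "north": 0.08, "east": 0.11}

common = invariant_order(brain, language)
```

The intersection of two strict partial orders is itself a strict partial order, so the result can be drawn as a diagram in the same way as the originals. Here north and east are ordered differently in the two data sets, so they are left unrelated in the invariant order.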


Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree ...

[Figure: hierarchical cluster tree (brain data); distance axis from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

[Figure: another hierarchical cluster tree (similarity tree), computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia; distance axis from 0.1 to 0.5; leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words), using random resampling with replacement.

2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data 60 times using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
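The resampling in step 1 can be sketched as below. Only the drawing of samples with replacement is shown; how each resampled set is split and fed to the classifier is not described on the slide and is left out. The function name and the seed are assumptions of this sketch.

```python
import random

def bootstrap_indices(n_samples, n_rounds, seed=0):
    """Yield one list of sample indices per round, drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        yield [rng.randrange(n_samples) for _ in range(n_samples)]

# 640 samples and M = 30 rounds, as on the slide; the seed is arbitrary.
rounds = list(bootstrap_indices(n_samples=640, n_rounds=30))
```

Each round has the same size as the original data set, but because indices are drawn with replacement, some samples appear more than once and others not at all, so every round yields a slightly different classification.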


[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, Vol. 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.
Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73-107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi: 10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Eds. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi: 10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.
Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009 Jul 1; 10: 205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang., 2011 Apr; 117(1): 12-22. doi: 10.1016/j.bandl.2010.09.013
Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr. Bull., 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 95, 14965-14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.
Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb. Cortex, 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


The relative frequencies m_iq are an N-by-N estimate for the conditional probability densities, designated by the matrix P = (p_iq), that a randomly chosen test sample from class ωi will be classified as belonging to class ωq.

Conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Rows: ωi. Columns: ωq (predicted; classified as).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
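As a sketch of how such a matrix of relative frequencies arises, each row of a table of classification counts is divided by its row total. The counts below are an assumption: they are not given on the slide, but were chosen so that their rounded frequencies reproduce the London row above (for example, 20/80 = 0.25). The function name is likewise invented for this sketch.

```python
def relative_frequencies(counts):
    """Divide each row of a matrix of classification counts by its row total."""
    return [[c / sum(row) for c in row] for row in counts]

# Assumed counts for test samples whose true class is London (80 in total),
# in the column order London, Moscow, Paris, north, south, east, west,
# Germany, Poland, Russia. Not stated on the slide.
london_counts = [20, 13, 11, 6, 3, 4, 5, 7, 6, 5]

london_row = relative_frequencies([london_counts])[0]
```

Each row of the resulting matrix sums to 1, so row i can be read as an estimated distribution over the predicted classes for test samples of class ωi.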


[Figure: hierarchical cluster tree; distance axis from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.

BRAIN

The conditional probability density estimates from the classification of brain wave data for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia (the same matrix as in the preceding table).

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses

(n) Paris1, City of Light1, French capital1, capital of France1 (the capital and largest city of France; an international center of culture and commerce)

(n) Paris2, genus Paris1 (sometimes placed in subfamily Trilliaceae)

(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris4 (a town in northeastern Texas)

Organized into sets of cognitive synonyms called "synsets"

(n) city1, metropolis1, urban center1 (a large and densely populated urban area; may include several independent administrative districts)
direct hyponym, full hyponym, part meronym, has instance

(n) city2 (an incorporated administrative district established by state charter)

(n) city3, metropolis2 (people living in a large densely populated municipality)

(n) east1, due east1, eastward1, E3 (the cardinal compass point that is at 90 degrees)

(n) East2, Orient1 (the countries of Asia)

(n) East3, eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)

(n) east4 (the direction corresponding to the eastward cardinal compass point)

(n) east5 (a location in the eastern part of a country, region, or city)

22 copy Colleen E Crangle 2013

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting from large collections of documents a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents See Deerwester et al 1990 Landauer and Dumais 1997 Landauer et al 1998

The underlying idea is that if for each word you take into account all the contexts in which it does and does not appear

you get for all the words a set of mutual constraints that represent how similar any two words are to each other

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity After training on about 2000 pages of English text it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language After training on a psychology textbook it achieved a passing score on a multiple-choice exam We used the application at httplsacoloradoedu to compute similarity matrices in term space for our set of words The computation was based on ~ 38000 college-level texts (novels newspaper articleshellip) A maximum of 300 factors was permitted in the analysis

23 copy Colleen E Crangle 2013

London Moscow Paris north south east west Germany Poland Russia

London 1 017 037 013 012 012 014 017 016 016

Moscow 017 1 018 013 008 022 014 037 065 069

Paris 037 018 1 01 004 008 009 05 031 036

north 013 013 01 1 089 06 061 007 009 014

south 012 008 004 089 1 05 055 002 005 005

east 012 022 008 06 05 1 085 022 029 03

west 014 014 009 061 055 085 1 024 026 023

Germany 017 037 05 007 002 022 024 1 085 081

Poland 016 065 031 009 005 029 026 085 1 087

Russia 016 069 036 014 005 03 023 081 087 1

Semantic similarity matrix derived from LSA for the set of words

London Moscow Paris north south east west Germany

Poland Russia

24 copy Colleen E Crangle 2013

01 02 03 04 05 06

east

west

north

south

London

Paris

Moscow

Germany

Poland

Russia

Hierarchical cluster tree computed from the pair-wise Latent Semantic

Analysis (LSA) scores of similarity for London Moscow Paris north

south east west Germany Poland Russia based on ~ 38000 college-

level texts (novels newspaper articleshellip)

25 copy Colleen E Crangle 2013

LSA provides straightforward measure of similarity between words

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of

words London Moscow Paris north south east west Germany

Poland Russia using senses relevant to the geography of Europe

and five measures of similarity wup (path length) lin and jcn

(information content) and gv and pgv (vector space measures)

London Moscow Paris north south east west Germany Poland Russia

London 1 0396 0466 0106 0103 0076 0078 0322 0299 0303

Moscow 0396 1 0393 0095 0094 0062 007 0286 0281 0288

Paris 0466 0393 1 0106 0104 0074 0077 0327 0308 0307

north 0106 0095 0106 1 0228 0179 021 0123 0132 0111

south 0103 0094 0104 0228 1 0172 0212 0115 0107 0109

east 0076 0062 0074 0179 0172 1 0216 0093 008 0077

west 0078 007 0077 021 0212 0216 1 0087 0082 0083

Germany 0322 0286 0327 0123 0115 0093 0087 1 0589 0409

Poland 0299 0281 0308 0132 0107 008 0082 0589 1 0403

Russia 0303 0288 0307 0111 0109 0077 0083 0409 0403 1

27 copy Colleen E Crangle 2013

02 03 04 05 06

Germany

Poland

Russia

London

Paris

Moscow

north

south

west

east

Hierarchical cluster tree computed from pairwise WordNet-based semantic

similarity scores for London Moscow Paris north south east west

Germany Poland Russia restricted to senses related to the geography of

Europe

28 copy Colleen E Crangle 2013

BACK TO THE BRAIN hellip

London Moscow Paris north south

east west Germany Poland Russia

29 copy Colleen E Crangle 2013

Now letrsquos see how to compare the EEG data

and the language datahellip

30 copy Colleen E Crangle 2013

Some of the similarity trees show remarkable congruence

between the brain and semantic data

Where exactly does that congruence lie

Can we devise a quantitative measure of the nature and

strength of that congruence

02 03 04 05 06

London Paris Moscow Germany Poland Russia north south west east

02 03 04 05 06

Germany Poland Russia London Paris Moscow north south west east

LANGUAGE DATA

BRAIN DATA

31 copy Colleen E Crangle 2013

WordNet-based semantic similarities and EEG conditional probability

estimates for London relative to London (L) Moscow (M) Paris (P) north

(n) south (s) east (e) west (w) Germany (G) Poland (Po) and Russia reg

The Spearman rank correlation for the two sequences in the figure is 099 with one-sided significance of 184E-10

32 copy Colleen E Crangle 2013

For each word ω we compute from the conditional probability

density estimates a ternary relation R such that R( ω ω1 ω2 ) if

and only if with respect to word ω the conditional probability for word

ω1 is smaller than the conditional probability for word ω2 that is if and

only if ω1s similarity difference with ω is smaller than ω2s similarity

difference with ω

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

BRAIN DATA

33 copy Colleen E Crangle 2013

For each word ω we compute from the semantic similarity matrix

a ternary relation R such that R ( ω ω1 ω2 ) if and only the

similarity difference of ω1 with ω is smaller than the similarity

difference of ω2 with ω that is ω1 is more similar to ω than is ω2

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

LANGUAGE DATA

34 copy Colleen E Crangle 2013

London

Language data Brain data London 1000 London 0275 Paris 0466 Paris 0133 Moscow 0396 Moscow 0108 Germany 0322 Germany 0075 Russia 0303 north 0042 Poland 0299 Russia 0033 north 0106 Poland 0025 south 0103 east 0008 west 0078 south 0000

east 0076 west 0000

Partial orders for London derived from the WordNet

semantic similarities of Table 2 and the conditional

probability estimates for the brain data of Table 5

Poland

north

south

west

east

north

Poland

east

south

west

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity, we construct a relational structure for each kind of data.

Suppes, P. (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each word ω, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether the correlation is statistically significant.
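
As a rough illustration of the comparison step, the sketch below computes Spearman's rank correlation for the London language and brain values tabulated in this transcript. It is a plain-Python version written for this note (the function names are hypothetical), it handles ties by average ranks, and it does not compute the significance level. The value it prints applies to these ten pairs only and is not one of the coefficients reported on the slides.

```python
def average_ranks(values):
    """Rank from largest (rank 1) downward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank sequences."""
    return pearson(average_ranks(x), average_ranks(y))

# Word order: London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(round(rho, 3))
```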

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Title: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Title: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
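
The slides do not spell out how the invariant partial order is constructed. One plausible reading, offered here only as an assumption, is to keep exactly those ordered pairs on which the brain order and the language order agree. The sketch below does that for the London values tabulated in this transcript, ordering each column from largest to smallest value as the table does; `strict_pairs` and the variable names are hypothetical.

```python
# Values relative to London, as tabulated in this transcript (London itself omitted).
language = {
    "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322, "Russia": 0.303,
    "Poland": 0.299, "north": 0.106, "south": 0.103, "west": 0.078,
    "east": 0.076,
}
brain = {
    "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075, "north": 0.042,
    "Russia": 0.033, "Poland": 0.025, "east": 0.008, "south": 0.000,
    "west": 0.000,
}

def strict_pairs(score):
    """Pairs (a, b) with a ranked strictly above b."""
    return {(a, b) for a in score for b in score if score[a] > score[b]}

# Assumed construction: the invariant order is the intersection of the two orders.
invariant = strict_pairs(language) & strict_pairs(brain)

print(len(invariant), "ordered pairs are shared by both orders")
print(("Russia", "north") in invariant, ("north", "Russia") in invariant)
```

The intersection of two strict partial orders is again irreflexive, asymmetric, and transitive, so the result is itself a partial order; pairs on which the two data sets disagree, or which are tied in either, are left unordered.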

BRAIN DATA

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree ...

[Figure: two similarity trees. First tree (axis 0.2 to 0.6), leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Second tree (axis 0.1 to 0.5), leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.
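
The resampling in step 1 can be sketched as follows. The counts (640 samples, M = 30) come from the slide; everything else is an assumption for illustration, including the fixed seed and the function name `bootstrap_indices`. The slides do not say how each resampled set is fed to the classifier, so the sketch stops at drawing the index sets.

```python
import random

N_SAMPLES = 640  # brain-wave samples, from the slide
M = 30           # number of resampled classifications, from the slide

def bootstrap_indices(n, rng):
    """Draw n sample indices from range(n) with replacement."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable (an assumption)
resamples = [bootstrap_indices(N_SAMPLES, rng) for _ in range(M)]

# Each resample repeats some samples and leaves others out.
for indices in resamples[:3]:
    print(len(set(indices)), "distinct samples among", len(indices), "draws")
```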

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications, we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee, S., and Pedersen, T. (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee, S., and Pedersen, T. (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni, M., Murphy, B., Barbu, E., Poesio, M. (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

Brants, T., and Franz, A. (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky, A., Hirst, G. (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins, A.M., Loftus, E.F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.

Crestani, F. (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review, 11, 453-482.

Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R. (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391-407.

Devereux, B., Kelly, C., Korhonen, A. (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum, C. (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk, O., Davis, M.H., Ford, M., Pulvermüller, F., Marslen-Wilson, W.D. (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.

Hirst, G. (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small, S.L., Cottrell, G.W., Tanenhaus, M.K. (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.

Jelodar, A.B., Alizadeh, M., Khadivi, S. (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang, J., and Conrath, D. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just, M.A., Cherkassky, V.L., Aryal, S., Mitchell, T.M. (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp, R.M. (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R.E. Miller and J.W. Thatcher). New York: Plenum, pp. 85-103.

Kelly, C., Devereux, B., Korhonen, A. (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte, N., Mur, M., Bandettini, P.A. (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas, M., Hillyard, S.A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161-163.

Kutas, M., Federmeier, K.D. (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 2011, 62, pp. 621-647.

Landauer, T.K., Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Landauer, T.K., Foltz, P.W., Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.

Leacock, C., Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech, G., Rayson, P., Wilson, A. (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin, D. (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce, R.D. (1956). Semiorders and a theory of utility discrimination. Econometrica, 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga, T., Yonemori, C., Tomita, E., Muramatsu, M. (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009 Jul 1; 10:205.

Martin, Ph. (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller, G.A., Nicely, P. (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am., 27(2).

Miller, G.A., Leacock, C., Tengi, R., Bunker, R.T. (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller, G.A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.M., Malave, V.L., Mason, R.A., and Just, M.A. (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science, 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy, B., Baroni, M., & Poesio, M. (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy, B., & Poesio, M. (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.

Murphy, B., Poesio, M., Bovolo, F., Bruzzone, L., Dalponte, M., Lakany, H. (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang., 2011 Apr; 117(1): 12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen, T., Patwardhan, S., Michelizzi, J. (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira, F., Botvinick, M., Detre, G. (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes, M., Wong, D.K., Uy, E.T., Grosenick, L., Suppes, P. (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering, 54, 436-443.

Rabinovitch, I. (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology, 15(2), 209-212. MR0437404.

Resnik, P. (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach, B.J., Mathalon, D.H. (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr. Bull., 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.

Scott, D., Suppes, P. (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic, 23, 113-128. MR0115919.

Suppes, P. (1972). Axiomatic Set Theory. New York: Dover.

Suppes, P. (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes, P., Lu, Z.-L., Han, B. (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences, 94, 14965-14969.

Suppes, P., Han, B., Lu, Z.-L. (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences, 95, 15861-15866.

Suppes, P., Han, B., Epelboim, J., Lu, Z.-L. (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences, 96, 12953-12958.

Suppes, P., Han, B., Epelboim, J., Lu, Z.-L. (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences, 96, 14658-14663.

Suppes, P., Han, B. (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences, 97, 8738-8743.

Suppes, P., Perreau-Guimaraes, M., Wong, D.K. (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation, 21, 3228-3269.

Suppes, P., de Barros, J.A., Oas, G. (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology, 56, 95-117.

Turney, P. (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva, E., Pinto, G., de Barros, J.A., Suppes, P. (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco, G., Warren, J., Siri, S., Arciuli, J., Scott, S., Wise, R. (2006). The role of semantics and grammatical class in the neural representation of words. Cereb. Cortex, 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.

Wong, D.K., Perreau-Guimaraes, M., Uy, E.T., Suppes, P. (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing, 61, 479-484.

Wong, D.K., Uy, E.T., Perreau-Guimaraes, M., Yang, W., Suppes, P. (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing, 70, 373-383.

Wong, D.K., Grosenick, L., Uy, E.T., Perreau-Guimaraes, M., Carvalhaes, C.G., Desain, P., Suppes, P. (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage, 39, 1051-1063.

Woods, W. (1975). What's in a link: Foundations for semantic networks. In Bobrow, D., Collins, A. (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.

Wu, Z., Palmer, M. (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


Hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

[Figure: cluster tree (axis 0.2 to 0.6). Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.

BRAIN

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   0.25    0.163   0.138   0.075   0.038   0.05    0.063   0.088   0.075   0.063
Moscow   0.144   0.333   0.133   0.033   0.011   0.056   0.056   0.089   0.067   0.078
Paris    0.175   0.188   0.125   0.038   0.038   0.038   0.05    0.1     0.138   0.113
north    0.067   0.017   0.017   0.283   0.167   0.15    0.1     0.1     0.067   0.033
south    0.014   0.014   0.071   0.114   0.271   0.171   0.157   0.029   0.086   0.071
east     0.05    0       0       0.133   0.15    0.383   0.15    0.05    0       0.083
west     0.057   0.043   0       0.171   0.114   0.143   0.229   0.086   0.1     0.057
Germany  0.075   0.175   0.1     0.05    0.125   0.025   0.05    0.275   0.025   0.1
Poland   0.15    0.1     0.125   0.05    0.05    0       0.025   0.15    0.25    0.1
Russia   0.16    0.04    0.08    0.04    0.1     0.02    0.12    0.1     0.08    0.26
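
One way to read the brain matrix, offered here as an assumption rather than something the slide states, is that each row is a distribution over the column words for samples of the row word, since each row sums to roughly 1. The short sketch below checks that for three rows copied from the matrix and lists the columns of the London row from most to least probable; the variable names are hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# Three rows copied from the matrix above.
ROWS = {
    "London": [0.25, 0.163, 0.138, 0.075, 0.038, 0.05, 0.063, 0.088, 0.075, 0.063],
    "east":   [0.05, 0, 0, 0.133, 0.15, 0.383, 0.15, 0.05, 0, 0.083],
    "Russia": [0.16, 0.04, 0.08, 0.04, 0.1, 0.02, 0.12, 0.1, 0.08, 0.26],
}

# Row sums should be close to 1, up to rounding in the printed values.
row_sums = {word: sum(row) for word, row in ROWS.items()}
for word, total in row_sums.items():
    print(word, round(total, 3))

# Columns of the London row, from highest to lowest estimate.
london_ranked = sorted(WORDS, key=lambda w: -ROWS["London"][WORDS.index(w)])
print(london_ranked)
```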

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk." Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets":

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts)
    Relations shown: direct hyponym / full hyponym, part meronym, has instance
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), and Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2,000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, ...). A maximum of 300 factors was permitted in the analysis.
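
To make the co-occurrence idea concrete, here is a deliberately simplified sketch. It builds word-by-document count vectors from four hypothetical toy documents and compares words by cosine similarity. It omits the step that gives LSA its name, the reduction of the word-by-document matrix to a limited number of factors by singular value decomposition, so it shows only the raw term-space comparison that LSA starts from. The documents and function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy documents, one string per document.
DOCS = [
    "london paris capital city",
    "paris capital france",
    "north south east west compass",
    "east west compass direction",
]

def term_vectors(docs):
    """Map each word to its vector of counts across the documents."""
    counts = [Counter(doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocab}

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = term_vectors(DOCS)
print(cosine(vectors["east"], vectors["west"]))     # same documents
print(cosine(vectors["london"], vectors["paris"]))  # one shared document
print(cosine(vectors["east"], vectors["paris"]))    # no shared documents
```

Words that occur in the same documents score high and words that never co-occur score zero; the factor reduction in full LSA is what lets words be judged similar even when they do not share a document directly.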

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1       0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1       0.1     0.04    0.08    0.09    0.5     0.31    0.36
north    0.13    0.13    0.1     1       0.89    0.6     0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1       0.5     0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.6     0.5     1       0.85    0.22    0.29    0.3
west     0.14    0.14    0.09    0.61    0.55    0.85    1       0.24    0.26    0.23
Germany  0.17    0.37    0.5     0.07    0.02    0.22    0.24    1       0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1       0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.3     0.23    0.81    0.87    1

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, ...).

[Figure: cluster tree (axis 0.1 to 0.6). Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.]

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, for example:

Path length: the length of the path between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus. The corpus is the human-annotated, sense-tagged SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. The measure is scaled in various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score using each of the relevant senses.
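
As an example of a path-based measure, the sketch below computes the Wu-Palmer (wup) score, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest concept that subsumes both a and b and the root has depth 1. The taxonomy is a hypothetical toy hierarchy invented for this note, not WordNet's actual structure, and the function names are likewise assumptions.

```python
# Hypothetical toy is-a hierarchy: each concept maps to its parent (None for the root).
PARENT = {
    "entity": None,
    "location": "entity",
    "direction": "entity",
    "city": "location",
    "country": "location",
    "capital": "city",
}

def path_to_root(node):
    """Concepts from the node up to the root, the node itself first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth in the hierarchy, counting the root as depth 1."""
    return len(path_to_root(node))

def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node

def wup(a, b):
    """Wu-Palmer similarity on the toy hierarchy."""
    return 2 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))

print(round(wup("capital", "country"), 3))    # share the concept "location"
print(round(wup("capital", "direction"), 3))  # share only the root
```

Averaging several such scores, as the slide describes, would then amount to taking the mean of the per-measure values for each word pair.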

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector-space measures).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1       0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1       0.393   0.095   0.094   0.062   0.07    0.286   0.281   0.288
Paris    0.466   0.393   1       0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1       0.228   0.179   0.21    0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1       0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1       0.216   0.093   0.08    0.077
west     0.078   0.07    0.077   0.21    0.212   0.216   1       0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1       0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.08    0.082   0.589   1       0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

[Figure: cluster tree (axis 0.2 to 0.6). Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]
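
A cluster tree of this kind can be built by repeatedly merging the two closest groups of words. The sketch below does this for six of the ten words, using WordNet-based similarities from the matrix in this transcript. The slides do not state which linkage rule or distance was used, so two assumptions are made here: distance is taken as 1 minus similarity, and groups are compared by average linkage. The function and variable names are hypothetical.

```python
from itertools import combinations

WORDS = ["London", "Paris", "Germany", "Poland", "north", "south"]

# WordNet-based similarities for these six words, from the matrix in this transcript.
PAIRS = {
    ("London", "Paris"): 0.466, ("London", "Germany"): 0.322,
    ("London", "Poland"): 0.299, ("London", "north"): 0.106,
    ("London", "south"): 0.103, ("Paris", "Germany"): 0.327,
    ("Paris", "Poland"): 0.308, ("Paris", "north"): 0.106,
    ("Paris", "south"): 0.104, ("Germany", "Poland"): 0.589,
    ("Germany", "north"): 0.123, ("Germany", "south"): 0.115,
    ("Poland", "north"): 0.132, ("Poland", "south"): 0.107,
    ("north", "south"): 0.228,
}
SIM = {frozenset(pair): s for pair, s in PAIRS.items()}

def average_linkage(words, sim):
    """Agglomerative clustering; returns the merges in the order they happen."""
    clusters = [(w,) for w in words]
    merges = []

    def dist(c1, c2):
        # Assumed distance: mean of (1 - similarity) over all cross-cluster pairs.
        total = sum(1 - sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merges.append((c1, c2, round(dist(c1, c2), 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = average_linkage(WORDS, SIM)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

On this subset the countries pair up first and the cities next, which is consistent with the grouping of leaves in the WordNet tree described above.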

BACK TO THE BRAIN ...

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data ...

Some of the similarity trees show remarkable congruence between the brain and semantic data. Where exactly does that congruence lie? Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees (axis 0.2 to 0.6 in each). BRAIN DATA, leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA, leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]




Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013

Page 20: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

LANGUAGE

WordNet: a lexical database of English organized around sets of cognitive synonyms, restricted to the senses that are relevant to the geography sentences.

Latent Semantic Analysis (LSA): a statistical method to extract measures of word similarity from selected sets of documents, such as novels, newspaper articles, and textbooks.

BRAIN

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   0.25     0.163    0.138    0.075    0.038    0.05     0.063    0.088    0.075    0.063
Moscow   0.144    0.333    0.133    0.033    0.011    0.056    0.056    0.089    0.067    0.078
Paris    0.175    0.188    0.125    0.038    0.038    0.038    0.05     0.1      0.138    0.113
north    0.067    0.017    0.017    0.283    0.167    0.15     0.1      0.1      0.067    0.033
south    0.014    0.014    0.071    0.114    0.271    0.171    0.157    0.029    0.086    0.071
east     0.05     0        0        0.133    0.15     0.383    0.15     0.05     0        0.083
west     0.057    0.043    0        0.171    0.114    0.143    0.229    0.086    0.1      0.057
Germany  0.075    0.175    0.1      0.05     0.125    0.025    0.05     0.275    0.025    0.1
Poland   0.15     0.1      0.125    0.05     0.05     0        0.025    0.15     0.25     0.1
Russia   0.16     0.04     0.08     0.04     0.1      0.02     0.12     0.1      0.08     0.26

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998), and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France, and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

Multiple word senses

Organized into sets of cognitive synonyms called "synsets"

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts)
    Relations listed: direct hyponym, full hyponym, part meronym, has instance
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), and Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on approximately 38,000 college-level texts (novels, newspaper articles, ...). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia:

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   1        0.17     0.37     0.13     0.12     0.12     0.14     0.17     0.16     0.16
Moscow   0.17     1        0.18     0.13     0.08     0.22     0.14     0.37     0.65     0.69
Paris    0.37     0.18     1        0.1      0.04     0.08     0.09     0.5      0.31     0.36
north    0.13     0.13     0.1      1        0.89     0.6      0.61     0.07     0.09     0.14
south    0.12     0.08     0.04     0.89     1        0.5      0.55     0.02     0.05     0.05
east     0.12     0.22     0.08     0.6      0.5      1        0.85     0.22     0.29     0.3
west     0.14     0.14     0.09     0.61     0.55     0.85     1        0.24     0.26     0.23
Germany  0.17     0.37     0.5      0.07     0.02     0.22     0.24     1        0.85     0.81
Poland   0.16     0.65     0.31     0.09     0.05     0.29     0.26     0.85     1        0.87
Russia   0.16     0.69     0.36     0.14     0.05     0.3      0.23     0.81     0.87     1

[Figure: hierarchical cluster tree on a scale from 0.1 to 0.6. Leaf order as extracted: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia.]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on approximately 38,000 college-level texts (novels, newspaper articles, ...).
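A cluster tree of this kind can be reproduced approximately from the LSA matrix. The sketch below is only an illustration: the slides do not say which linkage rule or distance was used, so average linkage and the distance 1 - similarity are assumptions, and the names `average_linkage`, `WORDS`, and `LSA` are introduced here.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity scores transcribed from the matrix on the slides.
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]


def average_linkage(words, sim):
    """Agglomerative clustering with average linkage (an assumed choice).

    Distance between two words is taken as 1 - similarity. Returns the
    merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(i,) for i in range(len(words))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(1.0 - sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(words[i] for i in clusters[a]),
                       tuple(words[i] for i in clusters[b]),
                       round(d, 3)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


merges = average_linkage(WORDS, LSA)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The earliest merges join the most similar pairs (the compass directions and the countries), which is the grouping visible in the tree; a different linkage rule could change the later merges.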

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus, here the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled in various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
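For orientation, three of the five measures used (labelled wup, lin, and jcn with the WordNet similarity matrix) have standard published forms. The block below is a sketch of those textbook definitions as given in the cited papers (Wu and Palmer 1994; Lin 1998; Jiang and Conrath 1997), not a statement of how the scores on the slides were scaled or averaged. The symbols c_1, c_2, lcs (least common subsumer), depth, and IC are notation introduced here.

```latex
\begin{align*}
\mathrm{IC}(c) &= -\log P(c) \\
\mathrm{sim}_{\mathrm{wup}}(c_1, c_2) &=
  \frac{2\,\mathrm{depth}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
       {\mathrm{depth}(c_1) + \mathrm{depth}(c_2)} \\
\mathrm{sim}_{\mathrm{lin}}(c_1, c_2) &=
  \frac{2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
       {\mathrm{IC}(c_1) + \mathrm{IC}(c_2)} \\
\mathrm{sim}_{\mathrm{jcn}}(c_1, c_2) &=
  \frac{1}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)
           - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
\end{align*}
```

Here P(c) is the probability of encountering concept c in the sense-tagged corpus, so rarer (more specific) concepts carry higher information content.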

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London   Moscow   Paris    north    south    east     west     Germany  Poland   Russia
London   1        0.396    0.466    0.106    0.103    0.076    0.078    0.322    0.299    0.303
Moscow   0.396    1        0.393    0.095    0.094    0.062    0.07     0.286    0.281    0.288
Paris    0.466    0.393    1        0.106    0.104    0.074    0.077    0.327    0.308    0.307
north    0.106    0.095    0.106    1        0.228    0.179    0.21     0.123    0.132    0.111
south    0.103    0.094    0.104    0.228    1        0.172    0.212    0.115    0.107    0.109
east     0.076    0.062    0.074    0.179    0.172    1        0.216    0.093    0.08     0.077
west     0.078    0.07     0.077    0.21     0.212    0.216    1        0.087    0.082    0.083
Germany  0.322    0.286    0.327    0.123    0.115    0.093    0.087    1        0.589    0.409
Poland   0.299    0.281    0.308    0.132    0.107    0.08     0.082    0.589    1        0.403
Russia   0.303    0.288    0.307    0.111    0.109    0.077    0.083    0.409    0.403    1

[Figure: hierarchical cluster tree on a scale from 0.2 to 0.6. Leaf order as extracted: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

BACK TO THE BRAIN ...

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data ...

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees, labelled LANGUAGE DATA and BRAIN DATA, each on a scale from 0.2 to 0.6. Leaf orders as extracted: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; and Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is larger than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
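The relation R for a fixed reference word can be built directly from one row of a similarity matrix. The following is a minimal sketch, assuming the WordNet-based scores for London from the similarity matrix in this talk; the function names `relation_for` and `is_strict_partial_order` are introduced here for illustration.

```python
from itertools import permutations

# WordNet-based similarity of each word to the reference word "London",
# transcribed from the WordNet similarity matrix.
LONDON = {
    "Moscow": 0.396, "Paris": 0.466, "north": 0.106, "south": 0.103,
    "east": 0.076, "west": 0.078, "Germany": 0.322, "Poland": 0.299,
    "Russia": 0.303,
}


def relation_for(scores):
    """Pairs (w1, w2) where w1 is strictly more similar to the reference word than w2."""
    return {(w1, w2) for w1, w2 in permutations(scores, 2)
            if scores[w1] > scores[w2]}


def is_strict_partial_order(rel):
    """Check that a set of pairs is irreflexive, asymmetric, and transitive."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel
                     for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive


R_london = relation_for(LONDON)
print(("Paris", "Moscow") in R_london)   # Paris is closer to London than Moscow is
print(is_strict_partial_order(R_london))
```

Tied scores simply produce no pair in either direction, which is why the result is a partial order rather than a total one.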

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: diagrams of the two partial orders for London; only the node labels Poland, north, south, west, east survive in this transcript.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity, we construct two relational structures.

(Suppes P (1974). The axiomatic method in the empirical sciences. L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.)

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each word ω, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
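As an illustration of this comparison, the sketch below computes Spearman's rank correlation for the London language and brain columns listed earlier. It is a plain-Python version that gives tied values the mean of their ranks; the function names are introduced here, and the significance test itself is not included.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) downward; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Word order: London, Paris, Moscow, Germany, Russia, Poland,
# north, south, west, east.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(round(rho, 3))
```

For these ten values the coefficient comes out at roughly 0.92, since the two orderings agree on the first four words and differ only among the lower-ranked ones.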

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Caption: Significant Invariance - Paris - Spearman 0.88 (p=6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Caption: Significant Invariance - Paris - Spearman 0.90 (p=3.8716e-004)]
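One way to read "invariant with respect to both" is as the set of ordered pairs on which the two partial orders agree, that is, their intersection. That reading is an assumption made for this sketch, not something the slides spell out, and the function names are introduced here. The example uses the London language and brain values listed earlier, since the Paris values behind the figures are not given.

```python
def strict_order(scores):
    """Pairs (a, b) where a has a strictly higher score than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


def invariant_order(language_scores, brain_scores):
    """Pairs ordered the same way in both data sets (assumed definition)."""
    return strict_order(language_scores) & strict_order(brain_scores)


language = {"Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106,
            "south": 0.103, "west": 0.078, "east": 0.076}
brain = {"Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025,
         "east": 0.008, "south": 0.000, "west": 0.000}

invariant = invariant_order(language, brain)
print(("Paris", "Moscow") in invariant)   # both data sets agree on this pair
print(("Russia", "north") in invariant)   # the data sets disagree, so it drops out
```

Because the intersection of two strict partial orders is itself a strict partial order, the result keeps the irreflexive, asymmetric, and transitive properties of R.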

Brain data

[Figure: hierarchical cluster tree on a scale from 0.2 to 0.6. Leaf order as extracted: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree ...

[Figure: hierarchical cluster tree on a scale from 0.1 to 0.5. Leaf order as extracted: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
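The resampling step can be sketched as follows. This covers only the drawing of samples with replacement; the classifier itself and the way resampled sets were split for training and testing are not described on the slides, so nothing is assumed about them. The function name `bootstrap_indices` and the fixed seed are illustrative choices.

```python
import random


def bootstrap_indices(n_samples=640, n_rounds=30, seed=0):
    """Draw n_rounds index lists, each of size n_samples, with replacement."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return [rng.choices(range(n_samples), k=n_samples)
            for _ in range(n_rounds)]


rounds = bootstrap_indices()
print(len(rounds), len(rounds[0]))
print(len(set(rounds[0])), "distinct samples in the first round")
```

Each index list would select the brain-wave samples used for one classification run; because sampling is with replacement, some samples appear more than once in a round and others not at all.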

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209–212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965–14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

46 copy Colleen E Crangle 2013

Page 21: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

WORDNET

WordNet is a human-annotated lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (called synsets), each synset expressing a distinct concept. See Miller (1995), Fellbaum (1998) and http://wordnet.princeton.edu.

The synsets are related to each other primarily through the hypernymy and hyponymy relations for nouns. Other relations in WordNet are part-whole (holonym), member of (meronym), has instance, and so on.

Hyponymy, often referred to as the is-a relation in computational discussions, is defined as follows: a concept represented by a lexical item Li is said to be a hyponym of the concept represented by a lexical item Lk if native speakers of English accept sentences of the form "An Li is a kind of Lk". Conversely, Lk is the hypernym of Li. A hypernym is therefore a more general concept and a hyponym a more specific concept.

WORDNET

Multiple word senses, organized into sets of cognitive synonyms called "synsets"

(n) Paris#1, City of Light#1, French capital#1, capital of France#1 (the capital and largest city of France; and international center of culture and commerce)
(n) Paris#2, genus Paris#1 (sometimes placed in subfamily Trilliaceae)
(n) Paris#3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)
(n) Paris#4 (a town in northeastern Texas)

(n) city#1, metropolis#1, urban center#1 (a large and densely populated urban area; may include several independent administrative districts) [links shown: direct hyponym, full hyponym, part meronym, has instance]
(n) city#2 (an incorporated administrative district established by state charter)
(n) city#3, metropolis#2 (people living in a large densely populated municipality)

(n) east#1, due east#1, eastward#1, E#3 (the cardinal compass point that is at 90 degrees)
(n) East#2, Orient#1 (the countries of Asia)
(n) East#3, eastern United States#1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River)
(n) east#4 (the direction corresponding to the eastward cardinal compass point)
(n) east#5 (a location in the eastern part of a country, region, or city)

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al. (1990), Landauer and Dumais (1997), and Landauer et al. (1998).

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get, for all the words, a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, ...). A maximum of 300 factors was permitted in the analysis.

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

          London  Moscow  Paris  north  south  east   west   Germany  Poland  Russia
London    1.00    0.17    0.37   0.13   0.12   0.12   0.14   0.17     0.16    0.16
Moscow    0.17    1.00    0.18   0.13   0.08   0.22   0.14   0.37     0.65    0.69
Paris     0.37    0.18    1.00   0.10   0.04   0.08   0.09   0.50     0.31    0.36
north     0.13    0.13    0.10   1.00   0.89   0.60   0.61   0.07     0.09    0.14
south     0.12    0.08    0.04   0.89   1.00   0.50   0.55   0.02     0.05    0.05
east      0.12    0.22    0.08   0.60   0.50   1.00   0.85   0.22     0.29    0.30
west      0.14    0.14    0.09   0.61   0.55   0.85   1.00   0.24     0.26    0.23
Germany   0.17    0.37    0.50   0.07   0.02   0.22   0.24   1.00     0.85    0.81
Poland    0.16    0.65    0.31   0.09   0.05   0.29   0.26   0.85     1.00    0.87
Russia    0.16    0.69    0.36   0.14   0.05   0.30   0.23   0.81     0.87    1.00

[Figure: hierarchical cluster tree. Axis ticks 0.1 to 0.6. Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, ...)
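A cluster tree of this kind can be sketched with plain agglomerative clustering. The slides do not say which linkage rule or distance was used, so the sketch below assumes average linkage and distance = 1 - similarity; the function name `average_linkage` is illustrative only. It uses the LSA similarity values from the matrix above.

```python
# Illustrative sketch: agglomerative clustering of the ten words.
# Assumptions (not stated in the slides): average linkage, distance = 1 - similarity.

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity matrix from the slide (symmetric, 1 on the diagonal).
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]


def average_linkage(words, sim):
    """Return the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(words))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                dist = sum(1.0 - sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((tuple(words[i] for i in clusters[a]),
                       tuple(words[i] for i in clusters[b]),
                       round(dist, 3)))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


merges = average_linkage(WORDS, LSA)
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

With these numbers the first two merges join north with south and Poland with Russia, the two most similar pairs in the matrix. A different linkage rule could change the later merges.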

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:

- Path length between synsets.
- Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus, here the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled in various ways.
- Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
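As an illustration of a path-based measure, the sketch below computes the Wu-Palmer score (Wu and Palmer 1994), 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the most specific common ancestor. The tiny taxonomy is a made-up stand-in, not WordNet's actual hypernym hierarchy, and the function names are illustrative.

```python
# Toy is-a taxonomy (child -> parent). These nodes are illustrative
# assumptions, not WordNet's actual hypernym chains.
PARENT = {
    "entity": None,
    "location": "entity",
    "region": "location",
    "direction": "location",
    "city": "region",
    "country": "region",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
    "north": "direction",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, most specific first
        if node in ancestors_of_a:
            return node
    raise ValueError("no common ancestor")


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(wup("London", "Paris"))    # two cities share the parent "city"
print(wup("London", "Germany"))  # city vs country meet at "region"
print(wup("London", "north"))    # city vs direction meet at "location"
```

In this toy hierarchy the scores fall as the common ancestor becomes more general, which is the behaviour a path-based measure is meant to capture.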

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures)

          London  Moscow  Paris   north   south   east    west    Germany  Poland  Russia
London    1.000   0.396   0.466   0.106   0.103   0.076   0.078   0.322    0.299   0.303
Moscow    0.396   1.000   0.393   0.095   0.094   0.062   0.070   0.286    0.281   0.288
Paris     0.466   0.393   1.000   0.106   0.104   0.074   0.077   0.327    0.308   0.307
north     0.106   0.095   0.106   1.000   0.228   0.179   0.210   0.123    0.132   0.111
south     0.103   0.094   0.104   0.228   1.000   0.172   0.212   0.115    0.107   0.109
east      0.076   0.062   0.074   0.179   0.172   1.000   0.216   0.093    0.080   0.077
west      0.078   0.070   0.077   0.210   0.212   0.216   1.000   0.087    0.082   0.083
Germany   0.322   0.286   0.327   0.123   0.115   0.093   0.087   1.000    0.589   0.409
Poland    0.299   0.281   0.308   0.132   0.107   0.080   0.082   0.589    1.000   0.403
Russia    0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409    0.403   1.000

[Figure: hierarchical cluster tree. Axis ticks 0.2 to 0.6. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe

BACK TO THE BRAIN ...

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data ...

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two hierarchical cluster trees, each with axis ticks 0.2 to 0.6.
LANGUAGE DATA, leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east
BRAIN DATA, leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R)]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric, and transitive.
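The language-data definition can be sketched in a few lines of Python. The sketch takes one row of a similarity matrix (here the London row of the WordNet-based matrix) and lists every pair (ω1, ω2) for which ω1 is more similar to the reference word than ω2 is. The function name `ordinal_relation` and the choice to leave tied words unordered are assumptions made for illustration.

```python
def ordinal_relation(reference, scores):
    """Return the set of pairs (w1, w2) for which R(reference, w1, w2) holds,
    read here as: w1 is strictly more similar to `reference` than w2 is.

    `scores` maps each word to its similarity with `reference`.
    Tied words are left unordered, so the result is a strict partial order.
    """
    others = [w for w in scores if w != reference]
    return {(w1, w2) for w1 in others for w2 in others
            if scores[w1] > scores[w2]}


# London row of the WordNet-based similarity matrix.
london_row = {
    "London": 1.000, "Moscow": 0.396, "Paris": 0.466, "north": 0.106,
    "south": 0.103, "east": 0.076, "west": 0.078, "Germany": 0.322,
    "Poland": 0.299, "Russia": 0.303,
}

R = ordinal_relation("London", london_row)

# Checks of the three stated properties on this finite relation.
irreflexive = all(a != b for a, b in R)
asymmetric = all((b, a) not in R for a, b in R)
transitive = all((a, d) in R for a, b in R for c, d in R if b == c)
print(len(R), irreflexive, asymmetric, transitive)
```

Because the nine scores in this row are all distinct, every pair of words is ordered one way or the other; ties in the data would simply leave the tied pair out of R.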

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5

[Figure: the two partial-order diagrams. Node labels recovered from the language-data diagram: Poland, north, south, west, east; from the brain-data diagram: north, Poland, east, south, west]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity:

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
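The comparison step can be illustrated with a small, dependency-free Spearman computation. The sketch below uses the language and brain values listed for London and treats tied values by giving them their average rank; the helper names are illustrative, and the significance test is left out.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Values for London, in the word order:
# London, Paris, Moscow, Germany, Russia, Poland, north, south, west, east
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(round(rho, 3))
```

For these ten pairs the coefficient comes out at roughly 0.92. Other single-trial classifications give other brain values and therefore other coefficients.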

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial-order diagram. Node labels as extracted, row by row: London, Moscow / Paris / north / south, east / west / Germany, Poland / Russia]
Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

[Figure: invariant partial-order diagram. Node labels as extracted, row by row: London, Moscow, Paris / north / south, east / west / Germany, Poland, Russia]
Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)
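One simple way to read "invariant" is as the set of ordered pairs on which the brain order and the language order agree. The slides do not spell out the construction, so the sketch below is an assumption: it builds both strict orders for London from the values listed earlier and intersects them. The names `strict_order` and `is_transitive` are illustrative.

```python
def strict_order(scores, reference):
    """Pairs (a, b) with a strictly more similar to `reference` than b."""
    others = [w for w in scores if w != reference]
    return {(a, b) for a in others for b in others if scores[a] > scores[b]}


def is_transitive(relation):
    """True if (a, b) and (b, d) in the relation always imply (a, d)."""
    return all((a, d) in relation
               for a, b in relation for c, d in relation if b == c)


# Values for London from the language (WordNet) and brain (EEG) columns.
language = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
brain = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}

language_order = strict_order(language, "London")
brain_order = strict_order(brain, "London")

# Assumed reading of "invariant": keep only the pairs both orders agree on.
invariant = language_order & brain_order

print(len(language_order), len(brain_order), len(invariant))
print(is_transitive(invariant))
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the two inputs. Pairs such as Russia and north, which the two data sets order in opposite directions, drop out.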

Brain data

[Figure: hierarchical cluster tree. Axis ticks 0.2 to 0.6. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Another hierarchical cluster tree (similarity tree), computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree ...

[Figure: hierarchical cluster tree. Axis ticks 0.1 to 0.5. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
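The resampling step in the procedure above can be sketched as follows. This shows only how M resamples of the 640 samples might be drawn with replacement; how each resample is then split and classified is not described in the slides and is not modelled here. The integer sample identifiers, the fixed seed, and the function name are stand-ins chosen for illustration.

```python
import random


def resample_with_replacement(samples, rng):
    """Draw len(samples) items from `samples`, with replacement."""
    return [rng.choice(samples) for _ in range(len(samples))]


rng = random.Random(0)          # fixed seed so the illustration is repeatable
sample_ids = list(range(640))   # stand-ins for the 640 brain-wave samples
M = 30                          # number of classifications per comparison

resamples = [resample_with_replacement(sample_ids, rng) for _ in range(M)]

# Each resample has the original size but repeats some samples and omits others.
sizes = {len(r) for r in resamples}
distinct_in_first = len(set(resamples[0]))
print(sizes, distinct_in_first)
```

Each resample would feed one classification, which yields its own conditional probability estimates and so its own set of partial orders to compare with the language data.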

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S and Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S and Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science - COGSCI, vol 34, no 2, pp 222-254, 2010. DOI 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A & Jordan M (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T and Franz A (2006) www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol 82(6), 407-428.

Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc Sympos IBM Thomas J Watson Res Center, Yorktown Heights, NY (Ed RE Miller and JW Thatcher). New York: Plenum, pp 85-103.

Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp 621-647.

Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp 265-283.

Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998) An information-theoretic definition of similarity. In Proc 15th International Conf on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp 296-304.

Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J Acoust Soc Am 27, 2.

Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.

Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM, Vol 38, No 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI 10.1126/science.1152876

Murphy B, Baroni M & Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B & Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J Mathematical Psychology 15 (2), 209-212. MR0437404

Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919

Suppes P (1972) Axiomatic Set Theory. New York: Dover.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al (Eds), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.

Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol 22, No 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds, Representation and understanding. New York: Academic Press, 1975:35-82.

Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133-138.

Page 22: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

WORDNET

(n) Paris1 City of Light1 French capital1 capital of France1 (the capital and largest city of France and international center of culture and commerce)

(n) Paris2 genus Paris1 (sometimes placed in subfamily Trilliaceae)

(n) Paris3 ((Greek mythology) the prince of Troy who abducted Helen from her husband Menelaus and provoked the Trojan War)

(n) Paris4 (a town in northeastern Texas)

Multiple word senses

Organized into sets of cognitive synonyms called ldquosynsetsrdquo

(n) city1 metropolis1 urban center1 (a large and densely populated urban area may include several independent administrative districts) direct hyponym full hyponym part meronym has instance (n) city2 (an incorporated administrative district established by state charter) (n) city3 metropolis2 (people living in a large densely populated municipality)

(n) east1 due east1 eastward1 E3 (the cardinal compass point that is at 90 degrees) (n) East2 Orient1 (the countries of Asia) (n) East3 eastern United States1 (the region of the United States lying to the north of the Ohio River and to the east of the Mississippi River) (n) east4 (the direction corresponding to the eastward cardinal compass point) (n) east5 (a location in the eastern part of a country region or city)

22 copy Colleen E Crangle 2013

LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting from large collections of documents a measure of how similar two words are to each other in terms of patterns of their co-occurrences within those documents See Deerwester et al 1990 Landauer and Dumais 1997 Landauer et al 1998

The underlying idea is that if for each word you take into account all the contexts in which it does and does not appear

you get for all the words a set of mutual constraints that represent how similar any two words are to each other

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity After training on about 2000 pages of English text it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language After training on a psychology textbook it achieved a passing score on a multiple-choice exam We used the application at httplsacoloradoedu to compute similarity matrices in term space for our set of words The computation was based on ~ 38000 college-level texts (novels newspaper articleshellip) A maximum of 300 factors was permitted in the analysis

23 copy Colleen E Crangle 2013

London Moscow Paris north south east west Germany Poland Russia

London 1 017 037 013 012 012 014 017 016 016

Moscow 017 1 018 013 008 022 014 037 065 069

Paris 037 018 1 01 004 008 009 05 031 036

north 013 013 01 1 089 06 061 007 009 014

south 012 008 004 089 1 05 055 002 005 005

east 012 022 008 06 05 1 085 022 029 03

west 014 014 009 061 055 085 1 024 026 023

Germany 017 037 05 007 002 022 024 1 085 081

Poland 016 065 031 009 005 029 026 085 1 087

Russia 016 069 036 014 005 03 023 081 087 1

Semantic similarity matrix derived from LSA for the set of words

London Moscow Paris north south east west Germany

Poland Russia

24 copy Colleen E Crangle 2013

01 02 03 04 05 06

east

west

north

south

London

Paris

Moscow

Germany

Poland

Russia

Hierarchical cluster tree computed from the pair-wise Latent Semantic

Analysis (LSA) scores of similarity for London Moscow Paris north

south east west Germany Poland Russia based on ~ 38000 college-

level texts (novels newspaper articleshellip)

25 copy Colleen E Crangle 2013

LSA provides straightforward measure of similarity between words

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of

words London Moscow Paris north south east west Germany

Poland Russia using senses relevant to the geography of Europe

and five measures of similarity wup (path length) lin and jcn

(information content) and gv and pgv (vector space measures)

London Moscow Paris north south east west Germany Poland Russia

London 1 0396 0466 0106 0103 0076 0078 0322 0299 0303

Moscow 0396 1 0393 0095 0094 0062 007 0286 0281 0288

Paris 0466 0393 1 0106 0104 0074 0077 0327 0308 0307

north 0106 0095 0106 1 0228 0179 021 0123 0132 0111

south 0103 0094 0104 0228 1 0172 0212 0115 0107 0109

east 0076 0062 0074 0179 0172 1 0216 0093 008 0077

west 0078 007 0077 021 0212 0216 1 0087 0082 0083

Germany 0322 0286 0327 0123 0115 0093 0087 1 0589 0409

Poland 0299 0281 0308 0132 0107 008 0082 0589 1 0403

Russia 0303 0288 0307 0111 0109 0077 0083 0409 0403 1

27 copy Colleen E Crangle 2013

02 03 04 05 06

Germany

Poland

Russia

London

Paris

Moscow

north

south

west

east

Hierarchical cluster tree computed from pairwise WordNet-based semantic

similarity scores for London Moscow Paris north south east west

Germany Poland Russia restricted to senses related to the geography of

Europe

28 copy Colleen E Crangle 2013

BACK TO THE BRAIN hellip

London Moscow Paris north south

east west Germany Poland Russia

29 copy Colleen E Crangle 2013

Now letrsquos see how to compare the EEG data

and the language datahellip

30 copy Colleen E Crangle 2013

Some of the similarity trees show remarkable congruence

between the brain and semantic data

Where exactly does that congruence lie

Can we devise a quantitative measure of the nature and

strength of that congruence

02 03 04 05 06

London Paris Moscow Germany Poland Russia north south west east

02 03 04 05 06

Germany Poland Russia London Paris Moscow north south west east

LANGUAGE DATA

BRAIN DATA

31 copy Colleen E Crangle 2013

WordNet-based semantic similarities and EEG conditional probability

estimates for London relative to London (L) Moscow (M) Paris (P) north

(n) south (s) east (e) west (w) Germany (G) Poland (Po) and Russia reg

The Spearman rank correlation for the two sequences in the figure is 099 with one-sided significance of 184E-10

32 copy Colleen E Crangle 2013

For each word ω we compute from the conditional probability

density estimates a ternary relation R such that R( ω ω1 ω2 ) if

and only if with respect to word ω the conditional probability for word

ω1 is smaller than the conditional probability for word ω2 that is if and

only if ω1s similarity difference with ω is smaller than ω2s similarity

difference with ω

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

BRAIN DATA

33 copy Colleen E Crangle 2013

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric and transitive.
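The language-data definition above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the reference word is London, and the scores are the WordNet-based similarities of the other nine words to London as given in the deck. The pair (ω1, ω2) is placed in the relation when ω1 is strictly more similar to the reference word than ω2.

```python
from itertools import permutations

# WordNet-based similarity of each word to the reference word London
# (values taken from the London row of the WordNet similarity matrix).
SIM_TO_LONDON = {
    "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
    "Russia": 0.303, "Poland": 0.299, "north": 0.106,
    "south": 0.103, "west": 0.078, "east": 0.076,
}

def ternary_relation(sim_to_ref):
    """Pairs (w1, w2) with R(ref, w1, w2): w1 is strictly more similar
    to the reference word than w2 is."""
    return {(w1, w2) for w1, w2 in permutations(sim_to_ref, 2)
            if sim_to_ref[w1] > sim_to_ref[w2]}

def is_strict_partial_order(rel):
    """Check that a set of ordered pairs is irreflexive, asymmetric and transitive."""
    irreflexive = all(a != b for a, b in rel)
    asymmetric = all((b, a) not in rel for a, b in rel)
    transitive = all((a, d) in rel
                     for a, b in rel for c, d in rel if b == c)
    return irreflexive and asymmetric and transitive

R_LONDON = ternary_relation(SIM_TO_LONDON)
```

Because all nine scores are distinct, the relation here happens to order every pair; with tied scores the tied words would simply be left unordered, which is why the relation is a partial rather than a total order in general.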

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: two partial-order diagrams for London. Recoverable node labels: Poland, north, south, west, east in one diagram; north, Poland, east, south, west in the other.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
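As a rough illustration of the comparison step, the sketch below computes Spearman's rank correlation for the two London columns listed in the deck. It is not the authors' code: the function names are hypothetical, tied values are assumed to share their average rank, and the significance test is omitted. The coefficient it yields applies to these ten pairs only and need not match values quoted elsewhere in the deck for other classifications.

```python
WORDS = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]

# Scores relative to London, in the order of WORDS.
LANGUAGE = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
BRAIN = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

rho_london = spearman(LANGUAGE, BRAIN)
```

Computing the correlation on ranks rather than raw scores matches the ordinal nature of the relation R: only the order of the similarity differences matters, not their magnitudes.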

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Panel title: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Panel title: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
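One plausible reading of "invariant with respect to both" is the set of ordered pairs on which the two partial orders agree, that is, their intersection. The sketch below follows that reading for the London data listed in the deck. It is an assumption-laden illustration rather than the authors' procedure: the function names are hypothetical, and for the brain data a larger conditional probability estimate is assumed to mean "more similar to London".

```python
from itertools import permutations

# Scores relative to London for the other nine words (values from the deck).
LANGUAGE = {"Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106,
            "south": 0.103, "west": 0.078, "east": 0.076}
BRAIN = {"Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025,
         "east": 0.008, "south": 0.000, "west": 0.000}

def strict_order(scores):
    """Pairs (w1, w2) where w1 scores strictly higher than w2."""
    return {(a, b) for a, b in permutations(scores, 2)
            if scores[a] > scores[b]}

def invariant_order(scores_a, scores_b):
    """Pairs ordered the same way in both data sets (assumed: set intersection)."""
    return strict_order(scores_a) & strict_order(scores_b)

INVARIANT = invariant_order(LANGUAGE, BRAIN)
```

The intersection of two strict partial orders is again irreflexive, asymmetric and transitive, so the result is itself a partial order. Pairs on which the two data sets disagree, such as Russia and north here, are simply left unordered.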

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: two similarity trees for the brain data. Leaf order of one tree (axis from 0.2 to 0.6): London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Leaf order of the other (axis from 0.1 to 0.5): north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
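The resampling in step 1 can be sketched as follows. Only the index-drawing part is shown; the single-trial classifier is not described on this slide and is left out. The function name, the fixed seed, and the choice to draw as many indices as there are samples are assumptions made for illustration.

```python
import random

N_SAMPLES = 640  # brain wave samples for the 10 words (from the slide)
M = 30           # resampled classifications per language model (from the slide)

def bootstrap_indices(n_samples, m, seed=0):
    """Return m lists of n_samples indices, each drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.choices(range(n_samples), k=n_samples) for _ in range(m)]

resamples = bootstrap_indices(N_SAMPLES, M)
# Each entry of `resamples` would select the trials used for one classification;
# some trials appear more than once and others not at all.
```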

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, Vol. 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27:2.

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15 (2), 209-212. MR0437404

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


LATENT SEMANTIC ANALYSIS (LSA)

LSA is a statistical technique for extracting, from large collections of documents, a measure of how similar two words are to each other in terms of the patterns of their co-occurrences within those documents. See Deerwester et al., 1990; Landauer and Dumais, 1997; Landauer et al., 1998.

The underlying idea is that if, for each word, you take into account all the contexts in which it does and does not appear, you get for all the words a set of mutual constraints that represent how similar any two words are to each other.

The similarity judgments produced by latent semantic analysis have been shown to correspond to some extent to human judgments of similarity. After training on about 2000 pages of English text, it scored as well as average test-takers on the synonym portion of the Test of English as a Foreign Language. After training on a psychology textbook, it achieved a passing score on a multiple-choice exam.

We used the application at http://lsa.colorado.edu to compute similarity matrices in term space for our set of words. The computation was based on ~38,000 college-level texts (novels, newspaper articles, …). A maximum of 300 factors was permitted in the analysis.
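To make the idea concrete, here is a toy sketch of LSA in plain Python. It is not the application used in the study: the six "documents" are invented, only 2 factors are kept instead of up to 300, and the leading singular directions are found by simple power iteration purely to keep the example dependency-free (a real analysis would use a library SVD on a large corpus). All names are hypothetical.

```python
import math

# Toy corpus, invented for illustration only.
DOCS = ["london paris", "paris moscow", "london moscow",
        "north south", "east west north", "south east"]
TERMS = sorted({w for d in DOCS for w in d.split()})

# Term-by-document count matrix (rows: terms, columns: documents).
X = [[d.split().count(t) for d in DOCS] for t in TERMS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def top_eigenpairs(sym, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    mat = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in mat]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(vi * sum(a * b for a, b in zip(row, v))
                  for vi, row in zip(v, mat))
        pairs.append((lam, v))
        # Remove the component just found before looking for the next one.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(n)]
               for i in range(n)]
    return pairs

def lsa_term_vectors(x, k):
    """Rows of U_k * S_k from the rank-k SVD of x, via the Gram matrix x x^T."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[t] for lam, vec in pairs]
            for t in range(len(x))]

RAW = dict(zip(TERMS, X))
VEC = dict(zip(TERMS, lsa_term_vectors(X, k=2)))
```

In this toy corpus "south" and "west" never share a document, so their raw count vectors have cosine 0, yet their reduced-space vectors are close because both co-occur with "north" and "east". That is the "mutual constraints" effect described above, on a very small scale.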

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1.00    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow   0.17    1.00    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris    0.37    0.18    1.00    0.10    0.04    0.08    0.09    0.50    0.31    0.36
north    0.13    0.13    0.10    1.00    0.89    0.60    0.61    0.07    0.09    0.14
south    0.12    0.08    0.04    0.89    1.00    0.50    0.55    0.02    0.05    0.05
east     0.12    0.22    0.08    0.60    0.50    1.00    0.85    0.22    0.29    0.30
west     0.14    0.14    0.09    0.61    0.55    0.85    1.00    0.24    0.26    0.23
Germany  0.17    0.37    0.50    0.07    0.02    0.22    0.24    1.00    0.85    0.81
Poland   0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1.00    0.87
Russia   0.16    0.69    0.36    0.14    0.05    0.30    0.23    0.81    0.87    1.00

Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

[Figure: hierarchical cluster tree; leaf order east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia; axis from 0.1 to 0.6]

Hierarchical cluster tree computed from the pair-wise Latent Semantic Analysis (LSA) scores of similarity for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, …).
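A tree of this kind can be built by agglomerative clustering. The sketch below applies average linkage to distances defined as 1 minus the LSA similarity scores given in the deck. Both choices are assumptions: the slides do not say which linkage rule or distance was used, so the merge heights need not reproduce the figure. The function name is hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity scores from the deck (row and column order as in WORDS).
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],
]

def average_linkage(words, sim):
    """Agglomerative clustering on distance = 1 - similarity.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    index = {w: i for i, w in enumerate(words)}
    clusters = [(w,) for w in words]
    merges = []

    def dist(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(1.0 - sim[index[x]][index[y]] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        d, i, j = min((dist(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

MERGES = average_linkage(WORDS, LSA)
```

The earliest merges join the most similar pairs, so reading `MERGES` from first to last corresponds to reading the tree from its leaves toward its root.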

LSA provides a straightforward measure of similarity between words.

For WORDNET, several different measures of similarity have been devised, e.g.:

- Path length between synsets.
- Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in the corpus (the human-annotated, sense-tagged corpus SemCor (Miller et al., 1993), which links every word in the Brown Corpus to its appropriate WordNet sense). Scaled various ways.
- Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1.000   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1.000   0.393   0.095   0.094   0.062   0.070   0.286   0.281   0.288
Paris    0.466   0.393   1.000   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1.000   0.228   0.179   0.210   0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1.000   0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1.000   0.216   0.093   0.080   0.077
west     0.078   0.070   0.077   0.210   0.212   0.216   1.000   0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1.000   0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.080   0.082   0.589   1.000   0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1.000

[Figure: hierarchical cluster tree; leaf order Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east; axis from 0.2 to 0.6]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.

Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013


Semantic similarity matrix derived from LSA for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London       1.00    0.17    0.37    0.13    0.12    0.12    0.14    0.17    0.16    0.16
Moscow       0.17    1.00    0.18    0.13    0.08    0.22    0.14    0.37    0.65    0.69
Paris        0.37    0.18    1.00    0.10    0.04    0.08    0.09    0.50    0.31    0.36
north        0.13    0.13    0.10    1.00    0.89    0.60    0.61    0.07    0.09    0.14
south        0.12    0.08    0.04    0.89    1.00    0.50    0.55    0.02    0.05    0.05
east         0.12    0.22    0.08    0.60    0.50    1.00    0.85    0.22    0.29    0.30
west         0.14    0.14    0.09    0.61    0.55    0.85    1.00    0.24    0.26    0.23
Germany      0.17    0.37    0.50    0.07    0.02    0.22    0.24    1.00    0.85    0.81
Poland       0.16    0.65    0.31    0.09    0.05    0.29    0.26    0.85    1.00    0.87
Russia       0.16    0.69    0.36    0.14    0.05    0.30    0.23    0.81    0.87    1.00

[Figure: Hierarchical cluster tree computed from the pairwise Latent Semantic Analysis (LSA) similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, based on ~38,000 college-level texts (novels, newspaper articles, ...). Leaf order: east, west, north, south, London, Paris, Moscow, Germany, Poland, Russia. Axis ticks: 0.1 to 0.6.]
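A similarity tree of this kind can be sketched in a few lines. The code below is only an illustration: the deck does not say which linkage rule or distance was used, so average linkage on the distance 1 - similarity is an assumption, as are the names `average_linkage`, `WORDS` and `LSA`. The matrix values are the LSA scores from the slide, read with the decimal point restored (017 read as 0.17).

```python
from itertools import combinations

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# LSA similarity scores from the slide; rows and columns follow WORDS.
LSA = [
    [1.00, 0.17, 0.37, 0.13, 0.12, 0.12, 0.14, 0.17, 0.16, 0.16],  # London
    [0.17, 1.00, 0.18, 0.13, 0.08, 0.22, 0.14, 0.37, 0.65, 0.69],  # Moscow
    [0.37, 0.18, 1.00, 0.10, 0.04, 0.08, 0.09, 0.50, 0.31, 0.36],  # Paris
    [0.13, 0.13, 0.10, 1.00, 0.89, 0.60, 0.61, 0.07, 0.09, 0.14],  # north
    [0.12, 0.08, 0.04, 0.89, 1.00, 0.50, 0.55, 0.02, 0.05, 0.05],  # south
    [0.12, 0.22, 0.08, 0.60, 0.50, 1.00, 0.85, 0.22, 0.29, 0.30],  # east
    [0.14, 0.14, 0.09, 0.61, 0.55, 0.85, 1.00, 0.24, 0.26, 0.23],  # west
    [0.17, 0.37, 0.50, 0.07, 0.02, 0.22, 0.24, 1.00, 0.85, 0.81],  # Germany
    [0.16, 0.65, 0.31, 0.09, 0.05, 0.29, 0.26, 0.85, 1.00, 0.87],  # Poland
    [0.16, 0.69, 0.36, 0.14, 0.05, 0.30, 0.23, 0.81, 0.87, 1.00],  # Russia
]


def average_linkage(names, sim):
    """Agglomerative clustering on distance = 1 - similarity.

    Returns the merges in the order they happen, each as
    (left cluster, right cluster, average distance between them).
    """
    def dist(a, b):
        # Mean distance over all cross-cluster word pairs (assumed linkage rule).
        return sum(1.0 - sim[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(len(names))]  # a cluster is a tuple of word indices
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append(([names[i] for i in a], [names[i] for i in b], dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(WORDS, LSA)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

The first merges join the most similar pairs (north with south, then Poland with Russia); the later merges depend on the linkage rule, so a tree built this way need not match the slide's tree in every branch.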

LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:
- Path length between synsets.
- Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus, here the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. Scaled in various ways.
- Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score using each of the relevant senses.
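For orientation, the path-length and information-content measures named later in the deck (wup, lin, jcn) are usually written as follows. These are the standard textbook forms from the cited papers (Wu and Palmer 1994; Lin 1998; Jiang and Conrath 1997), not formulas given on the slides. The symbols are assumptions introduced here: c_1 and c_2 are synsets, lcs is their least common subsumer, p(c) is the corpus probability of concept c, and s_m is the score from the m-th of the five measures. The slides do not say how scores were rescaled before averaging or how senses were combined, so the last line is only a schematic reading of "took the average score".

```latex
\begin{align*}
\mathrm{IC}(c) &= -\log p(c) \\
\mathrm{sim}_{\mathrm{wup}}(c_1, c_2) &=
  \frac{2\,\mathrm{depth}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
       {\mathrm{depth}(c_1) + \mathrm{depth}(c_2)} \\
\mathrm{sim}_{\mathrm{lin}}(c_1, c_2) &=
  \frac{2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
       {\mathrm{IC}(c_1) + \mathrm{IC}(c_2)} \\
\mathrm{sim}_{\mathrm{jcn}}(c_1, c_2) &=
  \frac{1}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)
           - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)} \\
\bar{s}(w_1, w_2) &= \frac{1}{5} \sum_{m=1}^{5} s_m(w_1, w_2)
\end{align*}
```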

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector-space measures)

           London  Moscow   Paris   north   south    east    west Germany  Poland  Russia
London      1.000   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow      0.396   1.000   0.393   0.095   0.094   0.062   0.070   0.286   0.281   0.288
Paris       0.466   0.393   1.000   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north       0.106   0.095   0.106   1.000   0.228   0.179   0.210   0.123   0.132   0.111
south       0.103   0.094   0.104   0.228   1.000   0.172   0.212   0.115   0.107   0.109
east        0.076   0.062   0.074   0.179   0.172   1.000   0.216   0.093   0.080   0.077
west        0.078   0.070   0.077   0.210   0.212   0.216   1.000   0.087   0.082   0.083
Germany     0.322   0.286   0.327   0.123   0.115   0.093   0.087   1.000   0.589   0.409
Poland      0.299   0.281   0.308   0.132   0.107   0.080   0.082   0.589   1.000   0.403
Russia      0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1.000

[Figure: Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe. Leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Axis ticks: 0.2 to 0.6.]

BACK TO THE BRAIN ...

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data ...

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees shown together, each with axis ticks 0.2 to 0.6. BRAIN DATA, leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. LANGUAGE DATA, leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.

LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
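The construction for the language data can be sketched as follows, assuming higher scores mean "more similar". The function names `similarity_order` and `is_strict_partial_order` are hypothetical, and the London row uses the WordNet-based scores from the similarity matrix with the decimal point restored (0396 read as 0.396). Fixing the reference word ω = London turns the ternary relation into a set of ordered pairs (ω1, ω2).

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# WordNet-based similarity of each word to the reference word London.
LONDON_ROW = dict(zip(WORDS, [1.000, 0.396, 0.466, 0.106, 0.103,
                              0.076, 0.078, 0.322, 0.299, 0.303]))


def similarity_order(row):
    """Pairs (w1, w2) such that w1 is strictly more similar to the reference than w2."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] > row[w2]}


def is_strict_partial_order(pairs, items):
    """Check the three properties named on the slide."""
    irreflexive = all((w, w) not in pairs for w in items)
    asymmetric = all((b, a) not in pairs for (a, b) in pairs)
    transitive = all((a, d) in pairs
                     for (a, b) in pairs for (c, d) in pairs if b == c)
    return irreflexive and asymmetric and transitive


R_london = similarity_order(LONDON_ROW)
print(("Paris", "Moscow") in R_london)            # Paris is closer to London than Moscow is
print(is_strict_partial_order(R_london, WORDS))
```

When two words have equal scores, neither pair is added, so tied words are simply left unordered relative to each other.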

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

[Figure: diagrams of the two partial orders. Only node labels survive: Poland, north, south, west, east (language data) and north, Poland, east, south, west (brain data).]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

(Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.)

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi, we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not the correlation is statistically significant.
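As an illustration of the comparison step, the sketch below computes Spearman's coefficient for the London values tabulated earlier in the deck (language and brain columns, read with decimal points restored, e.g. 0466 as 0.466). It is a plain-Python version that gives tied values their average rank. The helper names are hypothetical, and the significance test is left out because the standard library has no t-distribution; a statistics package would normally supply the p-value.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# Values for the reference word London, in WORDS order.
LANGUAGE = [1.000, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303]
BRAIN = [0.275, 0.108, 0.133, 0.042, 0.000, 0.008, 0.000, 0.075, 0.025, 0.033]


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


rho = spearman(LANGUAGE, BRAIN)
print(round(rho, 3))  # 0.924 for these tabulated values
```

This value applies only to the ten tabulated pairs; other classifications of the brain data give other conditional probability estimates and therefore other coefficients.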

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Label: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004).]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Label: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004).]
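One simple way to read "the partial order that is invariant with respect to the brain and language data" is as the set of ordered pairs on which both orders agree. The sketch below takes that reading, which is an assumption rather than the authors' stated construction, and applies it to the London values tabulated earlier in the deck (decimal points restored). The names `similarity_order` and `invariant` are hypothetical.

```python
WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# Scores relative to London, in WORDS order.
LANGUAGE = dict(zip(WORDS, [1.000, 0.396, 0.466, 0.106, 0.103,
                            0.076, 0.078, 0.322, 0.299, 0.303]))
BRAIN = dict(zip(WORDS, [0.275, 0.108, 0.133, 0.042, 0.000,
                         0.008, 0.000, 0.075, 0.025, 0.033]))


def similarity_order(row):
    """Pairs (w1, w2) with w1 scored strictly higher than w2."""
    return {(w1, w2) for w1 in row for w2 in row if row[w1] > row[w2]}


# Assumed reading: keep only the pairs ordered the same way in both data sets.
invariant = similarity_order(LANGUAGE) & similarity_order(BRAIN)

print(len(invariant))                      # pairs on which the two orders agree
print(("Paris", "Moscow") in invariant)    # agreed: Paris is closer to London than Moscow
print(("Russia", "north") in invariant)    # not agreed: the two orders disagree here
```

The intersection of two strict partial orders is again a strict partial order, so the result stays within the same kind of relational structure. For London, the two orders disagree only on where north falls relative to Russia and Poland and on the ordering among south, west and east.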

Brain data

[Figure: Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Axis ticks: 0.2 to 0.6.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree ...

[Figure: a second similarity tree for the same words. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland. Axis ticks: 0.1 to 0.5.]

1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
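The resampling loop in the steps above can be outlined as follows. This is a structural sketch only: the classifier and the significance test are not described on these slides, so they are represented by a hypothetical callback `run_once`, and `placeholder_run` is a stand-in that returns no words. Only the resampling with replacement and the tallying are shown.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples trial indices uniformly at random, with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def count_significant(n_runs, n_samples, run_once, seed=0):
    """Repeat the resample-classify-compare cycle and tally the results.

    run_once(trial_idx) is a hypothetical callback. It would classify the
    resampled trials and return the set of words whose brain and language
    partial orders are significantly correlated in that run.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        trial_idx = bootstrap_indices(n_samples, rng)
        total += len(run_once(trial_idx))
    return total


def placeholder_run(trial_idx):
    # Stand-in only: a real run needs the EEG classifier, which is not shown here.
    return set()


# 30 runs over 640 samples, as in the steps above.
print(count_significant(30, 640, placeholder_run))
```

Running the loop once per semantic source (30 runs against WordNet, 30 against LSA) would give the two per-participant tallies that the deck goes on to plot.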

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810. August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145. February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.
Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.
Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1; 10: 205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27: 2.
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr; 117(1): 12-22. doi: 10.1016/j.bandl.2010.09.013
Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.
Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975: 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

Page 25: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

01 02 03 04 05 06

east

west

north

south

London

Paris

Moscow

Germany

Poland

Russia

Hierarchical cluster tree computed from the pair-wise Latent Semantic

Analysis (LSA) scores of similarity for London Moscow Paris north

south east west Germany Poland Russia based on ~ 38000 college-

level texts (novels newspaper articleshellip)

25 copy Colleen E Crangle 2013

LSA provides straightforward measure of similarity between words

For WORDNET several different measures of similarity have been devised Eg Path length between synsets Information content a corpusndashbased measure of the specificity of a concept measured in terms of the frequency of occurrence of the concept in the corpus the human-annotated sensendashtagged corpus SemCor (Miller et al 1993) which links every word in the Brown Corpus to its appropriate WordNet sense Scaled various ways Vector-space models -- works by forming second-order co-occurrence vectors from the WordNet definitionsof concepts known as glosses We used five measures in our computations of similarity and took the average score using each of the relevant senses

26 copy Colleen E Crangle 2013

Semantic similarity matrix derived from WordNet for the set of

words London Moscow Paris north south east west Germany

Poland Russia using senses relevant to the geography of Europe

and five measures of similarity wup (path length) lin and jcn

(information content) and gv and pgv (vector space measures)

London Moscow Paris north south east west Germany Poland Russia

London 1 0396 0466 0106 0103 0076 0078 0322 0299 0303

Moscow 0396 1 0393 0095 0094 0062 007 0286 0281 0288

Paris 0466 0393 1 0106 0104 0074 0077 0327 0308 0307

north 0106 0095 0106 1 0228 0179 021 0123 0132 0111

south 0103 0094 0104 0228 1 0172 0212 0115 0107 0109

east 0076 0062 0074 0179 0172 1 0216 0093 008 0077

west 0078 007 0077 021 0212 0216 1 0087 0082 0083

Germany 0322 0286 0327 0123 0115 0093 0087 1 0589 0409

Poland 0299 0281 0308 0132 0107 008 0082 0589 1 0403

Russia 0303 0288 0307 0111 0109 0077 0083 0409 0403 1

27 copy Colleen E Crangle 2013

02 03 04 05 06

Germany

Poland

Russia

London

Paris

Moscow

north

south

west

east

Hierarchical cluster tree computed from pairwise WordNet-based semantic

similarity scores for London Moscow Paris north south east west

Germany Poland Russia restricted to senses related to the geography of

Europe

28 copy Colleen E Crangle 2013

BACK TO THE BRAIN hellip

London Moscow Paris north south

east west Germany Poland Russia

29 copy Colleen E Crangle 2013

Now letrsquos see how to compare the EEG data

and the language datahellip

30 copy Colleen E Crangle 2013

Some of the similarity trees show remarkable congruence

between the brain and semantic data

Where exactly does that congruence lie

Can we devise a quantitative measure of the nature and

strength of that congruence

02 03 04 05 06

London Paris Moscow Germany Poland Russia north south west east

02 03 04 05 06

Germany Poland Russia London Paris Moscow north south west east

LANGUAGE DATA

BRAIN DATA

31 copy Colleen E Crangle 2013

WordNet-based semantic similarities and EEG conditional probability

estimates for London relative to London (L) Moscow (M) Paris (P) north

(n) south (s) east (e) west (w) Germany (G) Poland (Po) and Russia reg

The Spearman rank correlation for the two sequences in the figure is 099 with one-sided significance of 184E-10

32 copy Colleen E Crangle 2013

For each word ω we compute from the conditional probability

density estimates a ternary relation R such that R( ω ω1 ω2 ) if

and only if with respect to word ω the conditional probability for word

ω1 is smaller than the conditional probability for word ω2 that is if and

only if ω1s similarity difference with ω is smaller than ω2s similarity

difference with ω

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

BRAIN DATA

33 copy Colleen E Crangle 2013

For each word ω we compute from the semantic similarity matrix

a ternary relation R such that R ( ω ω1 ω2 ) if and only the

similarity difference of ω1 with ω is smaller than the similarity

difference of ω2 with ω that is ω1 is more similar to ω than is ω2

R is an ordinal relation of similarity differences a partial order

that is irreflexive asymmetric and transitive

LANGUAGE DATA

34 copy Colleen E Crangle 2013

London

Language data Brain data London 1000 London 0275 Paris 0466 Paris 0133 Moscow 0396 Moscow 0108 Germany 0322 Germany 0075 Russia 0303 north 0042 Poland 0299 Russia 0033 north 0106 Poland 0025 south 0103 east 0008 west 0078 south 0000

east 0076 west 0000

Partial orders for London derived from the WordNet

semantic similarities of Table 2 and the conditional

probability estimates for the brain data of Table 5

Poland

north

south

west

east

north

Poland

east

south

west

35 copy Colleen E Crangle 2013

Following the approach described in Suppes (1974) for the axiomatization of

the theory of differences in utility preference or the theory of differences in

psychological intensity Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et

al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in

Pure Mathematics 25 Providence RI American Mathematical Society pp 465-

479

The relational structure (A R)

constructed from R and the finite

set A of classes ω1 ω2 hellip ωN

together with the N partial orders

constructed from the N-by-N

estimate for the conditional

probability densities

The relational structure (A R) constructed from R and the finite set A of classes ω1 ω2 hellip ωN together with the N partial orders constructed from the N-by-N similarity matrix

Brain data

Language data

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearmanrsquos rank correlation coefficient which we interpret in the usual way to determine if we have a statistically significant correlation or not

36 copy Colleen E Crangle 2013

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
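One simple way to read "invariant" is as the set of ordered pairs on which both data sets agree. The sketch below takes that reading, which is an assumption rather than the construction used in the talk: it builds a strict order from each set of scores and intersects them. The scores are the London values quoted in this talk; the function names are hypothetical.

```python
from itertools import permutations

# Scores relative to London (language: WordNet similarity; brain: conditional
# probability estimate), as quoted in the London table.
LANGUAGE = {"Paris": 0.466, "Moscow": 0.396, "Germany": 0.322, "Russia": 0.303,
            "Poland": 0.299, "north": 0.106, "south": 0.103, "west": 0.078,
            "east": 0.076}
BRAIN = {"Paris": 0.133, "Moscow": 0.108, "Germany": 0.075, "Russia": 0.033,
         "Poland": 0.025, "north": 0.042, "south": 0.000, "west": 0.000,
         "east": 0.008}


def strict_order(scores):
    """All pairs (a, b) such that a scores strictly higher than b."""
    return {(a, b) for a, b in permutations(scores, 2) if scores[a] > scores[b]}


def invariant_order(scores_1, scores_2):
    """Pairs ordered the same way by both score sets (assumed reading)."""
    return strict_order(scores_1) & strict_order(scores_2)


common = invariant_order(LANGUAGE, BRAIN)
```

Pairs on which the two sources disagree, such as Russia and north, drop out in both directions, and so do pairs that are tied in either source. The intersection of two strict orders is still irreflexive, asymmetric and transitive, so the result is again a partial order.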

[Figure: similarity tree, axis ticks 0.2 to 0.6. Leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: brain data similarity tree, axis ticks 0.1 to 0.5. Leaves in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.
2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.
3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
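The resampling in step 1 can be sketched as follows. This is only an illustration of drawing samples with replacement: the function name, the fixed seed, and the choice to draw as many indices as there are samples are assumptions, and the talk does not say whether the draw was stratified by word.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """For each round, draw n_samples indices uniformly with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_rounds)]


# 30 resampled data sets of 640 brain-wave samples each, matching M = 30.
rounds = bootstrap_indices(n_samples=640, n_rounds=30)
```

Each inner list selects the samples used for one classification run; because indices are drawn with replacement, some samples appear several times in a round and others not at all.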

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), 222-254. DOI 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965-14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds), Representation and understanding. New York: Academic Press, 1975, 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


LSA provides a straightforward measure of similarity between words.

For WordNet, several different measures of similarity have been devised, e.g.:

Path length between synsets.

Information content: a corpus-based measure of the specificity of a concept, measured in terms of the frequency of occurrence of the concept in a corpus. The corpus here is the human-annotated, sense-tagged corpus SemCor (Miller et al. 1993), which links every word in the Brown Corpus to its appropriate WordNet sense. The measure is scaled in various ways.

Vector-space models: these work by forming second-order co-occurrence vectors from the WordNet definitions of concepts, known as glosses.

We used five measures in our computations of similarity and took the average score, using each of the relevant senses.
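To make the path-length idea concrete, here is a minimal sketch of the Wu and Palmer (wup) measure on a toy is-a hierarchy. The hierarchy is hypothetical and far simpler than WordNet, and counting depth in nodes with the root at depth 1 is a convention assumed here; the score is twice the depth of the deepest shared ancestor divided by the sum of the two concept depths.

```python
# Toy is-a hierarchy (hypothetical; not WordNet's actual structure).
PARENT = {
    "entity": None,
    "location": "entity",
    "city": "location",
    "country": "location",
    "London": "city",
    "Paris": "city",
    "Germany": "country",
}


def path_to_root(node):
    """Return the chain of nodes from `node` up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1 (assumed convention)."""
    return len(path_to_root(node))


def wup(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the deepest shared one.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

In this toy tree two cities share the ancestor "city" and score 0.75, while a city and a country meet only at "location" and score 0.5.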

Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

         London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London   1.000   0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow   0.396   1.000   0.393   0.095   0.094   0.062   0.070   0.286   0.281   0.288
Paris    0.466   0.393   1.000   0.106   0.104   0.074   0.077   0.327   0.308   0.307
north    0.106   0.095   0.106   1.000   0.228   0.179   0.210   0.123   0.132   0.111
south    0.103   0.094   0.104   0.228   1.000   0.172   0.212   0.115   0.107   0.109
east     0.076   0.062   0.074   0.179   0.172   1.000   0.216   0.093   0.080   0.077
west     0.078   0.070   0.077   0.210   0.212   0.216   1.000   0.087   0.082   0.083
Germany  0.322   0.286   0.327   0.123   0.115   0.093   0.087   1.000   0.589   0.409
Poland   0.299   0.281   0.308   0.132   0.107   0.080   0.082   0.589   1.000   0.403
Russia   0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1.000

[Figure: hierarchical cluster tree, axis ticks 0.2 to 0.6. Leaves in order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.
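A tree of this kind can be built by agglomerative clustering on the similarity matrix. The sketch below uses average linkage, which is an assumption: the talk does not say which linkage rule produced the tree. The matrix values are the WordNet-based similarities given in the talk; the function name and the merge-list output format are hypothetical.

```python
from itertools import combinations

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]
SIM = [
    [1.000, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303],
    [0.396, 1.000, 0.393, 0.095, 0.094, 0.062, 0.070, 0.286, 0.281, 0.288],
    [0.466, 0.393, 1.000, 0.106, 0.104, 0.074, 0.077, 0.327, 0.308, 0.307],
    [0.106, 0.095, 0.106, 1.000, 0.228, 0.179, 0.210, 0.123, 0.132, 0.111],
    [0.103, 0.094, 0.104, 0.228, 1.000, 0.172, 0.212, 0.115, 0.107, 0.109],
    [0.076, 0.062, 0.074, 0.179, 0.172, 1.000, 0.216, 0.093, 0.080, 0.077],
    [0.078, 0.070, 0.077, 0.210, 0.212, 0.216, 1.000, 0.087, 0.082, 0.083],
    [0.322, 0.286, 0.327, 0.123, 0.115, 0.093, 0.087, 1.000, 0.589, 0.409],
    [0.299, 0.281, 0.308, 0.132, 0.107, 0.080, 0.082, 0.589, 1.000, 0.403],
    [0.303, 0.288, 0.307, 0.111, 0.109, 0.077, 0.083, 0.409, 0.403, 1.000],
]


def average_linkage(words, sim):
    """Repeatedly merge the two clusters with the highest mean pairwise similarity.

    Returns the merges in order as (cluster_1, cluster_2, mean_similarity).
    """
    index = {w: i for i, w in enumerate(words)}
    clusters = [frozenset([w]) for w in words]
    merges = []

    def mean_sim(c1, c2):
        total = sum(sim[index[a]][index[b]] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: mean_sim(*pair))
        merges.append((c1, c2, mean_sim(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


merges = average_linkage(WORDS, SIM)
```

With these values the first merge joins Germany and Poland, the most similar pair in the matrix, and the three cities form one cluster within the first four merges. A different linkage rule could change the later merges.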

BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data …

Some of the similarity trees show remarkable congruence between the brain and semantic data. Where exactly does that congruence lie? Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees, each with axis ticks 0.2 to 0.6. Leaves in order for the first tree: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Leaves in order for the second tree: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east. Panel labels: LANGUAGE DATA, BRAIN DATA.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.

LANGUAGE DATA

For each word ω we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
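The language-data definition can be sketched directly: for each reference word, collect the triples in which the second word is strictly more similar to the reference than the third. The four-word submatrix below is taken from the WordNet-based similarity values in this talk; the function name and the representation of R as a set of triples are assumptions made for illustration.

```python
from itertools import permutations

# Four-word excerpt of the WordNet-based similarity matrix.
SIM = {
    "London":  {"London": 1.000, "Paris": 0.466, "north": 0.106, "Germany": 0.322},
    "Paris":   {"London": 0.466, "Paris": 1.000, "north": 0.106, "Germany": 0.327},
    "north":   {"London": 0.106, "Paris": 0.106, "north": 1.000, "Germany": 0.123},
    "Germany": {"London": 0.322, "Paris": 0.327, "north": 0.123, "Germany": 1.000},
}


def ternary_relation(sim):
    """Triples (w, w1, w2) where w1 is strictly more similar to w than w2 is."""
    return {(w, w1, w2)
            for w in sim
            for w1, w2 in permutations(sim, 2)
            if sim[w][w1] > sim[w][w2]}


R = ternary_relation(SIM)
```

Because the comparison is strict, tied words are left unordered: relative to north, London and Paris both score 0.106, so neither precedes the other. This is one way an order of this kind can fail to be total.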



Semantic similarity matrix derived from WordNet for the set of words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, using senses relevant to the geography of Europe and five measures of similarity: wup (path length), lin and jcn (information content), and gv and pgv (vector space measures).

          London  Moscow  Paris   north   south   east    west    Germany Poland  Russia
London    1       0.396   0.466   0.106   0.103   0.076   0.078   0.322   0.299   0.303
Moscow    0.396   1       0.393   0.095   0.094   0.062   0.07    0.286   0.281   0.288
Paris     0.466   0.393   1       0.106   0.104   0.074   0.077   0.327   0.308   0.307
north     0.106   0.095   0.106   1       0.228   0.179   0.21    0.123   0.132   0.111
south     0.103   0.094   0.104   0.228   1       0.172   0.212   0.115   0.107   0.109
east      0.076   0.062   0.074   0.179   0.172   1       0.216   0.093   0.08    0.077
west      0.078   0.07    0.077   0.21    0.212   0.216   1       0.087   0.082   0.083
Germany   0.322   0.286   0.327   0.123   0.115   0.093   0.087   1       0.589   0.409
Poland    0.299   0.281   0.308   0.132   0.107   0.08    0.082   0.589   1       0.403
Russia    0.303   0.288   0.307   0.111   0.109   0.077   0.083   0.409   0.403   1


[Figure: hierarchical cluster tree; axis from 0.2 to 0.6; leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east]

Hierarchical cluster tree computed from pairwise WordNet-based semantic similarity scores for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia, restricted to senses related to the geography of Europe.
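A tree of this kind can be approximated directly from the WordNet similarity matrix. The following is a minimal sketch, assuming average-linkage agglomerative clustering; the slides do not say which linkage method was used, and the names `SIM`, `WORDS`, `names`, `average_similarity`, and `cluster` are illustrative only.

```python
from itertools import combinations

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]

# WordNet-based similarity matrix from the slides (symmetric, 1 on the diagonal).
SIM = [
    [1, 0.396, 0.466, 0.106, 0.103, 0.076, 0.078, 0.322, 0.299, 0.303],
    [0.396, 1, 0.393, 0.095, 0.094, 0.062, 0.07, 0.286, 0.281, 0.288],
    [0.466, 0.393, 1, 0.106, 0.104, 0.074, 0.077, 0.327, 0.308, 0.307],
    [0.106, 0.095, 0.106, 1, 0.228, 0.179, 0.21, 0.123, 0.132, 0.111],
    [0.103, 0.094, 0.104, 0.228, 1, 0.172, 0.212, 0.115, 0.107, 0.109],
    [0.076, 0.062, 0.074, 0.179, 0.172, 1, 0.216, 0.093, 0.08, 0.077],
    [0.078, 0.07, 0.077, 0.21, 0.212, 0.216, 1, 0.087, 0.082, 0.083],
    [0.322, 0.286, 0.327, 0.123, 0.115, 0.093, 0.087, 1, 0.589, 0.409],
    [0.299, 0.281, 0.308, 0.132, 0.107, 0.08, 0.082, 0.589, 1, 0.403],
    [0.303, 0.288, 0.307, 0.111, 0.109, 0.077, 0.083, 0.409, 0.403, 1],
]


def names(indices):
    """Translate a cluster of word indices into a set of words."""
    return frozenset(WORDS[i] for i in indices)


def average_similarity(a, b):
    """Mean pairwise similarity between two clusters of word indices."""
    return sum(SIM[i][j] for i in a for j in b) / (len(a) * len(b))


def cluster(n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    Returns the remaining clusters and the list of merges performed,
    each merge recorded as (left words, right words, similarity).
    """
    clusters = [(i,) for i in range(len(WORDS))]
    merges = []
    while len(clusters) > n_clusters:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        merges.append((names(a), names(b), average_similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return [names(c) for c in clusters], merges


two_clusters, merges = cluster(2)
for left, right, score in merges:
    print(sorted(left), "+", sorted(right), f"at {score:.3f}")
```

Under this assumed linkage the first merge joins the most similar pair in the matrix, and stopping at two clusters separates the place names from the compass directions.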


BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia


Now let's see how to compare the EEG data and the language data…


Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees, each with an axis from 0.2 to 0.6.
BRAIN DATA leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.
LANGUAGE DATA leaf order: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]


[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po), and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.


For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric, and transitive.

BRAIN DATA


For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences, a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA
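The definition can be made concrete for a single reference word. The following is a minimal sketch, assuming the language-data reading (ω1 is more similar to ω than ω2 when it has the higher similarity score) and treating tied scores as unordered. The function name `relation_for` is illustrative; the scores are the London row of the WordNet similarity matrix.

```python
from itertools import permutations

# Similarity of each word to London, from the WordNet-derived matrix.
SIM_TO_LONDON = {
    "Moscow": 0.396, "Paris": 0.466, "north": 0.106, "south": 0.103,
    "east": 0.076, "west": 0.078, "Germany": 0.322, "Poland": 0.299,
    "Russia": 0.303,
}


def relation_for(scores):
    """Return the pairs (w1, w2) with R(ω, w1, w2) for a fixed reference word ω.

    w1 precedes w2 when w1 is more similar to ω, i.e. has the higher score.
    Tied scores leave the pair unordered, which is why the result is a
    partial order rather than a total one.
    """
    return {(w1, w2) for w1, w2 in permutations(scores, 2)
            if scores[w1] > scores[w2]}


R_london = relation_for(SIM_TO_LONDON)

# Check the three properties stated on the slide.
irreflexive = all(a != b for a, b in R_london)
asymmetric = all((b, a) not in R_london for a, b in R_london)
transitive = all((a, d) in R_london
                 for a, b in R_london for c, d in R_london if b == c)
print(irreflexive, asymmetric, transitive)
print(("Paris", "Moscow") in R_london)
```

Repeating this for each of the N words gives the N partial orders, one per reference word.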


London

Language data         Brain data
London   1.000        London   0.275
Paris    0.466        Paris    0.133
Moscow   0.396        Moscow   0.108
Germany  0.322        Germany  0.075
Russia   0.303        north    0.042
Poland   0.299        Russia   0.033
north    0.106        Poland   0.025
south    0.103        east     0.008
west     0.078        south    0.000
east     0.076        west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: partial-order diagrams for London. Node labels recovered for the language data: Poland, north, south, west, east; for the brain data: north, Poland, east, south, west.]


Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity, we have two relational structures:

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not the correlation is statistically significant.
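The comparison can be sketched for one word. The example below computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks, using the London values from the language/brain table on the earlier slide. The slides do not say how significance was computed, so the one-sided permutation test here is a swapped-in assumption, and all function names are illustrative.

```python
import random

WORDS = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
# Values for London, listed in the order of WORDS.
LANGUAGE = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
BRAIN = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, trials=10_000, seed=0):
    """One-sided p-value: share of shuffles with rho at least as large."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if spearman(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


rho = spearman(LANGUAGE, BRAIN)
p = permutation_p_value(LANGUAGE, BRAIN)
print(f"rho = {rho:.3f}, one-sided permutation p = {p:.4f}")
```

Averaging the ranks of tied values matters here because south and west share a brain-data estimate of 0.000.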


[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.
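One way to read "invariant with respect to the brain and language data" is as the set of pairs that both data sets order the same way. The following is a minimal sketch under that assumption (the slides do not spell out the construction), using the London values from the earlier language/brain table. The names `strict_order` and `covering_pairs` are illustrative.

```python
# Scores relative to London (London itself omitted).
LANGUAGE = {"Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106,
            "south": 0.103, "west": 0.078, "east": 0.076}
BRAIN = {"Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025,
         "east": 0.008, "south": 0.000, "west": 0.000}


def strict_order(scores):
    """Pairs (a, b) where a scores strictly higher than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


def covering_pairs(order):
    """Drop pairs implied by transitivity, leaving the edges of the diagram."""
    items = {x for pair in order for x in pair}
    return {(a, b) for a, b in order
            if not any((a, c) in order and (c, b) in order for c in items)}


# Assumed construction: keep only the pairs on which both orders agree.
invariant = strict_order(LANGUAGE) & strict_order(BRAIN)

for upper, lower in sorted(covering_pairs(invariant)):
    print(upper, ">", lower)
```

Pairs on which the two data sets disagree, such as Russia and north, are left unordered, so the result is in general a partial order even when each input order is nearly total.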


[Figure (brain data): similarity tree; axis from 0.2 to 0.6; leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure (brain data): similarity tree; axis from 0.1 to 0.5; leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]


1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
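The resampling in step 1 can be sketched as follows. This is an illustration only: the slides do not describe how each resample is fed to the classifier, so `classify` is a hypothetical callback, and `count_distinct` is a stand-in that merely reports how many distinct samples a resample contains.

```python
import random

N_SAMPLES = 640          # brain-wave samples, as stated in step 1
N_CLASSIFICATIONS = 30   # M in step 1


def resample_indices(n_samples, rng):
    """Draw n_samples indices uniformly at random, with replacement."""
    return rng.choices(range(n_samples), k=n_samples)


def run_classifications(classify, n_runs=N_CLASSIFICATIONS, seed=0):
    """Call classify once per bootstrap resample and collect the results."""
    rng = random.Random(seed)
    return [classify(resample_indices(N_SAMPLES, rng)) for _ in range(n_runs)]


def count_distinct(indices):
    """Hypothetical stand-in for the real single-trial classifier."""
    return len(set(indices))


results = run_classifications(count_distinct)
print(len(results), min(results), max(results))
```

Because sampling is with replacement, each resample repeats some samples and omits others, which is why every classification yields its own conditional probability estimates.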


[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]


A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.


Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810. August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145. February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407-428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.


Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Miller RE, Thatcher JW (Eds.), Complexity of Computer Computations (Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY). New York: Plenum, pp. 85-103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp. 621-647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.


Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum C (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013


Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41. Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209–212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep;34(5):907-26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 95, 14965–14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.


Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975: 35-82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.


Database Cambridge MA MIT Press Hauk O Davis MH Ford M Pulvermuumlller F Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data Neuroimage 2006 May 130(4)1383-400 Epub 2006 Feb 7

42 copy Colleen E Crangle 2013

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284


BACK TO THE BRAIN …

London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia

Now let's see how to compare the EEG data and the language data…

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

[Figure: two similarity trees, labeled LANGUAGE DATA and BRAIN DATA, each with an axis running from 0.2 to 0.6. Leaf order in one tree: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east. Leaf order in the other: Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]

[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.

BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
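As a rough illustration of the language-data relation, the sketch below builds R for one reference word from a single row of a similarity matrix and checks the three stated properties. The function names and the similarity values are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from itertools import permutations


def similarity_order(sim_row, candidates):
    """Return the pairs (w1, w2) for which R(ref, w1, w2) holds, i.e.
    w1 is strictly more similar to the reference word than w2 is.
    sim_row maps each candidate word to its similarity with the reference."""
    return {
        (w1, w2)
        for w1, w2 in permutations(candidates, 2)
        if sim_row[w1] > sim_row[w2]
    }


def is_strict_partial_order(pairs):
    """Check that a set of ordered pairs is irreflexive, asymmetric and transitive."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all(
        (a, d) in pairs for a, b in pairs for c, d in pairs if b == c
    )
    return irreflexive and asymmetric and transitive


# Hypothetical similarities to the reference word "London" (illustration only).
london_row = {"Paris": 0.5, "Moscow": 0.4, "north": 0.1, "south": 0.1}
order = similarity_order(london_row, list(london_row))
```

Because "north" and "south" are tied in this toy row, neither is ordered before the other, which is why the result is a partial order rather than a total one.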

London

Language data        Brain data
London    1.000      London    0.275
Paris     0.466      Paris     0.133
Moscow    0.396      Moscow    0.108
Germany   0.322      Germany   0.075
Russia    0.303      north     0.042
Poland    0.299      Russia    0.033
north     0.106      Poland    0.025
south     0.103      east      0.008
west      0.078      south     0.000
east      0.076      west      0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: diagrams of the two partial orders. Node labels recovered from one diagram: Poland, north, south, west, east; from the other: north, Poland, east, south, west.]
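The comparison of the two London columns can be sketched as a Spearman rank correlation. The code below is a minimal pure-Python version (Pearson correlation of average ranks, so tied values share a rank) applied to the table values with their decimal points restored. The helper names are assumptions made for this sketch, no significance test is included, and the result is not expected to reproduce the coefficients quoted on the slides, which come from the authors' own figures.

```python
def average_ranks(values):
    """Rank values in descending order (1 = largest); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


words = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
language = {"London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
            "west": 0.078, "east": 0.076}
brain = {"London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
         "south": 0.000, "west": 0.000}

rho = spearman([language[w] for w in words], [brain[w] for w in words])
```

With these rounded table values the coefficient comes out near 0.92.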

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974) The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data.

Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Caption: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Caption: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
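One way to picture the invariant partial order is as the set of ordered pairs on which the brain and language orders agree, that is, their intersection. The slides do not spell out the construction, so this reading is an assumption, as are the function names. The values are four entries from the London table, with decimal points restored.

```python
def strict_order(scores):
    """Pairs (a, b) such that a has a strictly higher score than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


def invariant_order(language_scores, brain_scores):
    """Assumed construction: keep only the pairs ordered the same way in both data sets."""
    return strict_order(language_scores) & strict_order(brain_scores)


# Four words relative to London (subset of the London table).
language = {"Germany": 0.322, "Russia": 0.303, "Poland": 0.299, "north": 0.106}
brain = {"Germany": 0.075, "north": 0.042, "Russia": 0.033, "Poland": 0.025}

invariant = invariant_order(language, brain)
```

In this subset the two orders disagree about where "north" sits relative to Russia and Poland, so those pairs drop out and "north" stays comparable only with Germany.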

Brain data

[Figure: similarity tree, axis from 0.2 to 0.6. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

[Figure: similarity tree, axis from 0.1 to 0.5. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs we find the partial order invariant with respect to both.

We performed 60 classifications – that is, we recomputed the classifications of the brain data using random resampling with replacement.

For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data.

And we plotted the results.
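The resampling in step 1 can be sketched as drawing M sets of sample indices with replacement. The sketch assumes each resample has the same size as the original data (640) and leaves out the classifier entirely, since the slides do not describe either detail here. The function name and the seed are illustrative.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """Draw n_rounds index lists, each of length n_samples, sampled with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_rounds)
    ]


# M = 30 resamples of the 640 data samples; each list would select the
# trials fed to one single-trial classification run.
resamples = bootstrap_indices(n_samples=640, n_rounds=30)
```

Each index list picks out the trials for one classification run. Because sampling is with replacement, some trials appear more than once in a run and others not at all.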

[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.

Banerjee S, Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222-254. DOI 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022.

Brants T, Franz A (2006) www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001) Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychological Review 82(6): 407-428.

Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11: 453–482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41: 391-407.

Devereux B, Kelly C, Korhonen A (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, pp. 70–78.

Fellbaum C (Ed.) (1998) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1;30(4):1383-400. Epub 2006 Feb 7.

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, pp. 73–107.

Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, pp. 18–26.

Jiang J, Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, pp. 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972) Reducibility among combinatorial problems. In Miller RE, Thatcher JW (Eds.), Complexity of Computer Computations (Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, pp. 61–69.

Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307: 161-163.

Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62: 621-647.

Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104: 211-240.

Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25: 259-284.

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998) An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24: 178–191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.

Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320: 1191, May 30, 2008. DOI 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 619–627, Singapore, 6-7 August 2009. ACL and AFNLP.

Murphy B, Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First Workshop on Computational Neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, pp. 1–9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54: 436–443.

Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2): 209–212. MR0437404

Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pp. 448-453, Montreal.

Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23: 113–128. MR0115919

Suppes P (1972) Axiomatic Set Theory. New York: Dover.

Suppes P (1974) The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94: 14965–14969.

Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95: 15861–15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96: 12953–12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96: 14658-14663.

Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97: 8738–8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21: 3228–3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56: 95-117.

Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61: 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70: 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39: 1051–1063.

Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975: 35-82.

Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013

Page 31: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Some of the similarity trees show remarkable congruence between the brain and semantic data.

Where exactly does that congruence lie?

Can we devise a quantitative measure of the nature and strength of that congruence?

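[Figure: two hierarchical cluster trees (similarity trees), one for the LANGUAGE DATA and one for the BRAIN DATA, each with axis ticks from 0.2 to 0.6. Leaf orders shown: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; and Germany, Poland, Russia, London, Paris, Moscow, north, south, west, east.]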


[Figure: WordNet-based semantic similarities and EEG conditional probability estimates for London relative to London (L), Moscow (M), Paris (P), north (n), south (s), east (e), west (w), Germany (G), Poland (Po) and Russia (R).]

The Spearman rank correlation for the two sequences in the figure is 0.99, with one-sided significance of 1.84E-10.


BRAIN DATA

For each word ω, we compute from the conditional probability density estimates a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.


LANGUAGE DATA

For each word ω, we compute from the semantic similarity matrix a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than is ω2.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric and transitive.
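The relation R defined above for the brain and language data can be illustrated with a small Python sketch. It fixes one reference word (London), takes the brain-data estimates listed for London in the talk, and orders every pair of words by comparing their estimates, following the "smaller than" wording of the brain-data definition. The names `order_for` and `is_strict_partial_order` are illustrative helpers, not part of the authors' software, and the comparison direction is an assumption taken from that wording.

```python
from itertools import product

# Conditional probability estimates with London as the reference word
# (brain-data values as listed in the talk's London table).
london_brain = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}

def order_for(scores):
    """Return all pairs (w1, w2) whose score for w1 is smaller than for w2."""
    return {(a, b) for a, b in product(scores, repeat=2) if scores[a] < scores[b]}

def is_strict_partial_order(rel, items):
    """Check that rel is irreflexive, asymmetric and transitive on items."""
    irreflexive = all((x, x) not in rel for x in items)
    asymmetric = all((b, a) not in rel for (a, b) in rel)
    transitive = all((a, d) in rel
                     for (a, b) in rel for (c, d) in rel if b == c)
    return irreflexive and asymmetric and transitive

r_london = order_for(london_brain)
print(("Moscow", "Paris") in r_london)   # Moscow's estimate is below Paris's
print(("south", "west") in r_london)     # tied estimates are left unordered
print(is_strict_partial_order(r_london, london_brain))
```

Because tied estimates are left unordered, the result is a partial order rather than a total one, which matches the way the slides describe R.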


London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: the two partial orders for London drawn as diagrams. Node labels recovered from the diagrams: Poland, north, south, west, east; and north, Poland, east, south, west.]


We follow the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, ..., ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
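The Spearman comparison described above can be sketched in plain Python. The example uses the rounded London values from the table in the talk and computes the coefficient as the Pearson correlation of average ranks, so that ties (south and west in the brain data) are handled. The helper names are illustrative, and the result on these rounded values need not reproduce the coefficients reported on the slides, which may rest on different estimates. A significance level is not computed here; a statistics library such as SciPy (`scipy.stats.spearmanr`) could supply one.

```python
from math import sqrt

words = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
# Values for London as the reference word, in the order of `words`.
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

rho = spearman(language, brain)
print(f"Spearman rho for London: {rho:.3f}")
```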


For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Label: Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)]

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia. Label: Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)]
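One natural reading of "the partial order that is invariant with respect to the brain and language data" is the set of word pairs that both data sets order the same way, that is, the intersection of the two relations. The slides do not spell out the construction, so the Python sketch below is an assumption rather than the authors' procedure. It uses the rounded London values from the talk; `order_for` and `invariant_order` are hypothetical helper names.

```python
from itertools import product

# Rounded values for London as the reference word (from the London table).
language = {
    "London": 1.000, "Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
    "Russia": 0.303, "Poland": 0.299, "north": 0.106, "south": 0.103,
    "west": 0.078, "east": 0.076,
}
brain = {
    "London": 0.275, "Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
    "north": 0.042, "Russia": 0.033, "Poland": 0.025, "east": 0.008,
    "south": 0.000, "west": 0.000,
}

def order_for(scores):
    """Return all pairs (w1, w2) whose score for w1 is smaller than for w2."""
    return {(a, b) for a, b in product(scores, repeat=2) if scores[a] < scores[b]}

def invariant_order(scores_a, scores_b):
    """Pairs ordered the same way by both score tables (set intersection)."""
    return order_for(scores_a) & order_for(scores_b)

inv = invariant_order(language, brain)

# Pairs left unordered because the two data sets disagree or one has a tie.
incomparable = sorted(
    (a, b) for a, b in product(language, repeat=2)
    if a < b and (a, b) not in inv and (b, a) not in inv
)
print(len(inv), "ordered pairs survive in the invariant order")
print("unordered pairs:", incomparable)
```

The intersection of two strict partial orders is again a strict partial order, so the result can be drawn as a diagram in the same way as the two orders it was built from.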


[Figure: hierarchical cluster tree (similarity tree) with leaves in the order London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east; axis ticks from 0.2 to 0.6.]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree ...

[Figure (brain data): similarity tree with leaves in the order north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland; axis ticks from 0.1 to 0.5.]


1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
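The resampling step above can be sketched as follows. The slides do not say exactly what is resampled or how the conditional probability estimates are formed, so this Python example makes two loud assumptions: the 640 labelled samples are drawn with replacement M = 30 times, and the estimates are read off a row-normalized confusion matrix. The function `toy_classifier` is a hypothetical stand-in for the real EEG classifier, and its 30% hit rate is arbitrary.

```python
import random
from collections import Counter

WORDS = ["London", "Moscow", "Paris", "north", "south",
         "east", "west", "Germany", "Poland", "Russia"]
N_SAMPLES = 640   # data samples for the 10 words
M = 30            # number of resampled classifications

def resample(n, rng):
    """Draw n sample indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def conditional_probabilities(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix: estimate of P(predicted | true)."""
    counts = Counter(zip(true_labels, predicted_labels))
    table = {}
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        table[t] = {p: (counts[(t, p)] / row_total if row_total else 0.0)
                    for p in classes}
    return table

def toy_classifier(label, rng):
    """Hypothetical stand-in: correct 30% of the time, otherwise a random word."""
    return label if rng.random() < 0.3 else rng.choice(WORDS)

rng = random.Random(0)
true = [WORDS[i % len(WORDS)] for i in range(N_SAMPLES)]
predicted = [toy_classifier(t, rng) for t in true]

estimates = []
for _ in range(M):
    idx = resample(N_SAMPLES, rng)
    estimates.append(conditional_probabilities(
        [true[i] for i in idx], [predicted[i] for i in idx], WORDS))

print(len(estimates), "resampled estimates of the 10-by-10 matrix")
```

Each entry of `estimates` plays the role of one N-by-N estimate from which the per-word partial orders would then be derived and compared with the language data.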


[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: significant invariant partial orders (0 to 300). Horizontal axis: test participants S10, S24, S25, S27, S16, S26, S12, S13, S18. Series: WordNet and LSA.]


A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.


Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805-810, August 9-15, 2003, Acapulco, Mexico.

Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.

Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science 34(2), pp. 222-254. DOI: 10.1111/j.1551-6709.2009.01068.x

Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993-1022.

Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.

Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz

Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review 82(6), 407-428.

Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453-482.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391-407.

Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70-78.

Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.

Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage 2006 May 1; 30(4): 1383-400. Epub 2006 Feb 7.

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73-107.

Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18-26.

Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85-103.

Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61-69.

Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp. 621-647.

Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265-283.

Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296-304.

Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178-191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1; 10: 205.

Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160-173), Dresden, Germany, July 21-25, 2003.

Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.

Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876

Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619-627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36-44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr; 117(1): 12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1-9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436-443.

Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209-212. MR0437404.

Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep; 34(5): 907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113-128. MR0115919.

Suppes P (1972). Axiomatic Set Theory. New York: Dover.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465-479.

Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 95, 14965-14969.

Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861-15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953-12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738-8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228-3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec; 16(12): 1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051-1063.

Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35-82.

Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133-138.

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (editors) Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73–107.

Jelodar AB, Alizadeh M, Khadivi S (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.

Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19-33.

Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622

Karp RM (1972) Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. RE Miller and JW Thatcher). New York: Plenum, pp 85-103.

Kelly C, Devereux B, Korhonen A (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.

Kriegeskorte N, Mur M, Bandettini PA (2008) Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008

Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161-163.

Kutas M, Federmeier KD (2011) Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 62, pp 621-647.

Landauer TK, Dumais ST (1997) A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211-240.

Landauer TK, Foltz PW, Laham D (1998) Introduction to Latent Semantic Analysis. Discourse Processes 25, 259-284.

Leacock C, Chodorow M (1998) Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed) WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp 265-283.

Leech G, Rayson P, Wilson A (2001) Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.

Lin D (1998) An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp 296-304.

Luce RD (1956) Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751

Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics 2009 Jul 1;10:205.

Martin Ph (2003) Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp 160-173), Dresden, Germany, July 21-25 2003.

Miller GA, Nicely P (1955) An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27(2).

Miller GA, Leacock C, Tengi R, Bunker RT (1993) A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.

Miller GA (1995) WordNet: A Lexical Database for English. Communications of the ACM 38(11), 39-41.

Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30 2008. DOI 10.1126/science.1152876

Murphy B, Baroni M & Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.

Murphy B & Poesio M (2010) Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp 36–44). Los Angeles: Association for Computational Linguistics.

Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang 2011 Apr;117(1):12-22. doi:10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp 38-41, Boston, May 2004.

Pereira F, Botvinick M, Detre G (2010) Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601

Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007) Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.

Rabinovitch I (1977) The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15(2), 209–212. MR0437404.

Resnik P (1995) Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.

Roach BJ, Mathalon DH (2008) Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull 2008 Sep;34(5):907-26. Epub 2008 Aug 6.

Scott D, Suppes P (1958) Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.

Suppes P (1972) Axiomatic Set Theory. New York: Dover.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al. (Eds) Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Suppes P, Lu Z-L, Han B (1997) Brain wave recognition of words. Proceedings of the National Academy of Sciences 94, 14965–14969.

Suppes P, Han B, Lu Z-L (1998) Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.

Suppes P, Han B, Epelboim J, Lu Z-L (1999a) Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.

Suppes P, Han B, Epelboim J, Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658-14663.

Suppes P, Han B (2000) Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.

Suppes P, Perreau-Guimaraes M, Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95-117.

Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502). Freiburg, Germany.

Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks 22(1), January 2011.

Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.

Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479-484.

Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373-383.

Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.

Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (eds) Representation and understanding. New York: Academic Press, 1975:35-82.

Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133-138.


BRAIN DATA

For each word ω we compute, from the conditional probability density estimates, a ternary relation R such that R(ω, ω1, ω2) if and only if, with respect to word ω, the conditional probability for word ω1 is smaller than the conditional probability for word ω2; that is, if and only if ω1's similarity difference with ω is smaller than ω2's similarity difference with ω.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.

LANGUAGE DATA

For each word ω we compute, from the semantic similarity matrix, a ternary relation R such that R(ω, ω1, ω2) if and only if the similarity difference of ω1 with ω is smaller than the similarity difference of ω2 with ω; that is, ω1 is more similar to ω than ω2 is.

R is an ordinal relation of similarity differences: a partial order that is irreflexive, asymmetric, and transitive.
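The language-data definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `ternary_relation` and `is_strict_partial_order` are hypothetical, the similarity values are toy numbers rather than the study's WordNet or LSA scores, and "smaller similarity difference" is read as "strictly larger similarity score".

```python
from itertools import permutations


def ternary_relation(similarity, words, target):
    """Return all pairs (w1, w2) for which R(target, w1, w2) holds.

    Assumption: R(target, w1, w2) holds when w1 is strictly more similar
    to the target than w2 is (w1 has the smaller similarity difference).
    """
    others = [w for w in words if w != target]
    return {
        (w1, w2)
        for w1, w2 in permutations(others, 2)
        if similarity[target][w1] > similarity[target][w2]
    }


def is_strict_partial_order(pairs):
    """Check that a set of ordered pairs is irreflexive, asymmetric, transitive."""
    irreflexive = all(a != b for a, b in pairs)
    asymmetric = all((b, a) not in pairs for a, b in pairs)
    transitive = all(
        (a, d) in pairs for a, b in pairs for c, d in pairs if b == c
    )
    return irreflexive and asymmetric and transitive


# Toy similarity row for one target word (illustrative values only).
words = ["London", "Paris", "north", "south"]
similarity = {
    "London": {"London": 1.0, "Paris": 0.5, "north": 0.1, "south": 0.1},
}

r_london = ternary_relation(similarity, words, "London")
print(sorted(r_london))
print(is_strict_partial_order(r_london))
```

Because north and south tie in the toy data, neither precedes the other, which is why the result is a partial order and not a total one.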

London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: two partial-order diagrams for London. Node labels recovered from the language-data diagram: Poland, north, south, west, east. Node labels recovered from the brain-data diagram: north, Poland, east, south, west.]

Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity, we work with two relational structures.

Suppes P (1974) The axiomatic method in the empirical sciences. In L Henkin et al. (Eds) Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp 465-479.

Brain data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: the relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ω1 we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not we have a statistically significant correlation.
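As a rough illustration of this comparison, the sketch below computes Spearman's rank correlation in plain Python for the London values shown earlier in this transcript. Two assumptions are made: the decimal points lost in extraction are restored (1000 read as 1.000, 0275 as 0.275, and so on), and tied values share their average rank. The helper names `average_ranks` and `spearman` are hypothetical, and the significance test is not reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# London rows; decimal points restored by assumption.
words = ["London", "Paris", "Moscow", "Germany", "Russia",
         "Poland", "north", "south", "west", "east"]
language = [1.000, 0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.275, 0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(f"Spearman rho for London: {rho:.3f}")
```

Whether the target word itself belongs in the comparison is not stated on the slide; it is included here only because it appears in both columns.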

For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order. Node labels: London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.]
Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

[Figure: invariant partial order. Node labels: London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.]
Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)
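One natural reading of "the partial order that is invariant" is the set of ordered pairs on which the brain-derived and language-derived orders agree, that is, the intersection of the two relations. The slides do not spell out the construction, so the sketch below rests on that assumption; `strict_order` and `invariant_order` are hypothetical names and the scores are toy values.

```python
def strict_order(scores):
    """Pairs (a, b) such that a scores strictly higher than b."""
    return {(a, b) for a in scores for b in scores if scores[a] > scores[b]}


def invariant_order(brain_scores, language_scores):
    """Pairs ordered the same way by both data sources (assumed: intersection)."""
    return strict_order(brain_scores) & strict_order(language_scores)


# Toy scores relative to one target word (illustrative values only).
language_scores = {"Paris": 0.5, "north": 0.2, "south": 0.1}
brain_scores = {"Paris": 0.3, "north": 0.0, "south": 0.1}

common = invariant_order(brain_scores, language_scores)
print(sorted(common))
```

The intersection of two strict partial orders is again a strict partial order, so pairs on which the sources disagree (north and south here) simply drop out.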

Brain data

[Figure: hierarchical cluster tree; axis from 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east.]

Another hierarchical cluster tree (similarity tree) computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: hierarchical cluster tree; axis from 0.1 to 0.5; leaves in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland.]

1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
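The resampling in step 1 can be sketched as follows. This shows only the drawing of sample indices with replacement; the classifier, and how each resampled set is split or used, are not described on the slide and are left out. The function name `bootstrap_rounds` and the fixed seed are assumptions made for illustration.

```python
import random


def bootstrap_rounds(n_samples, n_rounds, seed=0):
    """Draw n_rounds index lists, each sampling n_samples indices with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_rounds)
    ]


# 640 data samples and M = 30 classifications, as stated in step 1.
rounds = bootstrap_rounds(n_samples=640, n_rounds=30)
print(len(rounds), len(rounds[0]))
```

Each index list would then feed one single-trial classification, which in turn yields its own conditional probability density estimates and its own set of partial orders.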

[Figure: bar chart, "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.



Banerjee S, Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805–810, August 9-15, 2003, Acapulco, Mexico.
Banerjee S, Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136–145, February 17-23, 2002, Mexico City.
Baroni M, Murphy B, Barbu E, Poesio M (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science, vol. 34, no. 2, pp. 222–254. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Brants T, Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407–428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review, 11, 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science, 41, 391–407.
Devereux B, Kelly C, Korhonen A (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 2006 May 1; 30(4):1383–400. Epub 2006 Feb 7.
Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small SL, Cottrell GW, Tanenhaus MK (Eds.), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 73–107.
Jelodar AB, Alizadeh M, Khadivi S (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J, Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19–33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R. E. Miller and J. W. Thatcher). New York: Plenum, pp. 85–103.
Kelly C, Devereux B, Korhonen A (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161–163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology, 62, pp. 621–647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284.
Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum C (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265–283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. London: Longman.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296–304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica, 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics, 2009 Jul 1; 10:205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160–173), Dresden, Germany, July 21-25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am., 27(2).
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3rd DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39–41.
Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, Just MA (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science, 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6-7 August 2009. © 2009 ACL and AFNLP.
Murphy B, Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang., 2011 Apr; 117(1):12–22. doi:10.1016/j.bandl.2010.09.013
Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38–41, Boston, May 2004.
Pereira F, Botvinick M, Detre G (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering, 54, 436–443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology, 15(2), 209–212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448–453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr. Bull., 2008 Sep; 34(5):907–26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic, 23, 113–128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465–479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences, 94, 14965–14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences, 95, 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences, 96, 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences, 96, 14658–14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences, 97, 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation, 21, 3228–3269.
Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology, 56, 95–117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L. De Raedt & P. Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491–502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb. Cortex, 2006 Dec; 16(12):1790–6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing, 61, 479–484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing, 70, 373–383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage, 39, 1051–1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A (Eds.), Representation and understanding. New York: Academic Press, 1975, 35–82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133–138.


London

Language data        Brain data
London   1.000       London   0.275
Paris    0.466       Paris    0.133
Moscow   0.396       Moscow   0.108
Germany  0.322       Germany  0.075
Russia   0.303       north    0.042
Poland   0.299       Russia   0.033
north    0.106       Poland   0.025
south    0.103       east     0.008
west     0.078       south    0.000
east     0.076       west     0.000

Partial orders for London derived from the WordNet semantic similarities of Table 2 and the conditional probability estimates for the brain data of Table 5.

[Figure: two partial-order diagrams for London. Recoverable node labels, first diagram: Poland, north, south, west, east; second diagram: north, Poland, east, south, west]


Following the approach described in Suppes (1974) for the axiomatization of the theory of differences in utility preference or the theory of differences in psychological intensity.

Suppes P (1974). The axiomatic method in the empirical sciences. In L. Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics, 25. Providence, RI: American Mathematical Society, pp. 465–479.

Brain data: The relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N estimate for the conditional probability densities.

Language data: The relational structure (A, R) constructed from R and the finite set A of classes ω1, ω2, …, ωN, together with the N partial orders constructed from the N-by-N similarity matrix.

For each ωi we compare the partial order of the brain data with the partial order of the language data using Spearman's rank correlation coefficient, which we interpret in the usual way to determine whether or not the correlation is statistically significant.
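As a rough illustration of this comparison, the sketch below computes Spearman's rank correlation for the London row of the earlier language and brain tables. It is a minimal sketch under stated assumptions: the slides do not say how ties are handled or whether the word itself is included, so average ranks are used for ties and London is left out; the decimal points in the values are restored from the slide, and the function names are hypothetical. No significance test is included.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# London row, in a fixed word order (London itself omitted; an assumption).
words = ["Paris", "Moscow", "Germany", "Russia", "Poland",
         "north", "south", "west", "east"]
language = [0.466, 0.396, 0.322, 0.303, 0.299, 0.106, 0.103, 0.078, 0.076]
brain = [0.133, 0.108, 0.075, 0.033, 0.025, 0.042, 0.000, 0.000, 0.008]

rho = spearman(language, brain)
print(f"Spearman rho for London: {rho:.3f}")
```

Computing rho as the Pearson correlation of ranks, rather than with the shortcut formula based on squared rank differences, keeps the result correct when ties are present, as they are in the brain data here.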


For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia] Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

[Figure: invariant partial order over London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia] Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)
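One way to read "the partial order that is invariant with respect to the brain and language data" is as the set of ordered pairs on which both orders agree. The sketch below follows that reading and should be taken as an assumption, not as the authors' procedure: each strict order is built from similarity scores with a discrimination threshold (in the spirit of a semiorder), and the invariant order is the intersection of the two relations. The threshold values and function names are hypothetical.

```python
from itertools import permutations


def strict_order(scores, threshold=0.0):
    """Pairs (a, b) where a scores above b by more than the threshold."""
    return {(a, b) for a, b in permutations(scores, 2)
            if scores[a] - scores[b] > threshold}


def invariant_order(order_a, order_b):
    """Pairs ordered the same way in both relations."""
    return order_a & order_b


def is_transitive(relation):
    """Check that (a, b) and (b, c) in the relation imply (a, c)."""
    return all((a, d) in relation
               for a, b in relation
               for c, d in relation if b == c)


# London row of the language and brain tables (decimal points restored).
language = {"Paris": 0.466, "Moscow": 0.396, "Germany": 0.322,
            "Russia": 0.303, "Poland": 0.299, "north": 0.106,
            "south": 0.103, "west": 0.078, "east": 0.076}
brain = {"Paris": 0.133, "Moscow": 0.108, "Germany": 0.075,
         "north": 0.042, "Russia": 0.033, "Poland": 0.025,
         "east": 0.008, "south": 0.000, "west": 0.000}

# Hypothetical thresholds: differences smaller than these are ignored.
lang_order = strict_order(language, threshold=0.05)
brain_order = strict_order(brain, threshold=0.02)
invariant = invariant_order(lang_order, brain_order)

for a, b in sorted(invariant):
    print(f"{a} > {b}")
```

Because each thresholded relation is irreflexive and transitive, their intersection is again a strict partial order, which is why a plain set intersection is enough here.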


[Figure: hierarchical cluster tree, brain data. Axis from 0.2 to 0.6. Leaf order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Another hierarchical cluster tree (similarity tree), computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: hierarchical cluster tree, brain data. Axis from 0.1 to 0.5. Leaf order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]
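To make the idea of a similarity tree concrete, here is a minimal sketch of agglomerative clustering on pairwise similarities. The slides do not state which linkage rule was used, so average linkage is an assumption, and the four-word similarity values below are invented toy numbers, not the study's estimates.

```python
def average_linkage(labels, sim):
    """Repeatedly merge the two most similar clusters.

    sim maps frozenset({a, b}) to the similarity of items a and b.
    Returns the merges in order as (left_cluster, right_cluster, similarity).
    """
    clusters = [[label] for label in labels]
    merges = []

    def cluster_sim(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        total = sum(sim[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_sim(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((clusters[i], clusters[j], s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy similarities (hypothetical values for illustration only).
labels = ["London", "Paris", "north", "south"]
toy_sim = {
    frozenset(("London", "Paris")): 0.90,
    frozenset(("north", "south")): 0.80,
    frozenset(("London", "north")): 0.20,
    frozenset(("London", "south")): 0.10,
    frozenset(("Paris", "north")): 0.15,
    frozenset(("Paris", "south")): 0.10,
}

merges = average_linkage(labels, toy_sim)
for left, right, s in merges:
    print(left, "+", right, f"at similarity {s:.4f}")
```

Each merge corresponds to one internal node of the tree; the similarity at which it happens gives the height at which the two branches join.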


1. We compute M (=30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find for each word the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
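The resampling in step 1 can be sketched as drawing sample indices with replacement, once per classification round. This is only a sketch under assumptions: the slides do not say how the resampled data are split for training and testing, and the function name and seed are hypothetical. The counts (640 samples, 30 rounds) are taken from the steps above.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """Draw n_rounds index lists, each of size n_samples, with replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_rounds)]


rounds = bootstrap_indices(n_samples=640, n_rounds=30)

# Each round would feed one single-trial classification of the brain data.
first = rounds[0]
print(len(rounds), "rounds;", len(set(first)), "distinct samples in round 1")
```

Because sampling is with replacement, each round leaves out some trials and repeats others, so every round yields slightly different conditional probability estimates and therefore its own partial orders.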


[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA]


A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.


Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013


For those instances in which the brain and language partial orders are significantly correlated, we find the partial order that is invariant with respect to the brain and language data. Here are two more examples.

[Figure: partial-order diagram over the words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia]
Significant Invariance - Paris - Spearman 0.88 (p = 6.6795e-004)

[Figure: partial-order diagram over the words London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia]
Significant Invariance - Paris - Spearman 0.90 (p = 3.8716e-004)
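As a rough illustration of the two quantities on this slide, the sketch below computes a Spearman rank correlation between two similarity profiles for one reference word and then keeps only the ordered pairs on which both profiles agree. The similarity values are hypothetical, and reading "invariant partial order" as the intersection of the two strict orders is an assumption made for illustration; the slides do not spell out the construction.

```python
def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def strict_order(sim):
    """Pairs (a, b) meaning 'a is more similar to the reference word than b'."""
    return {(a, b) for a in sim for b in sim if sim[a] > sim[b]}


def invariant_order(brain_sim, language_sim):
    """Assumed reading of 'invariant': pairs ordered the same way in both data sets."""
    return strict_order(brain_sim) & strict_order(language_sim)


# Hypothetical similarities of four words to the reference word "Paris".
brain = {"London": 0.9, "Moscow": 0.7, "Germany": 0.4, "north": 0.1}
language = {"London": 0.8, "Moscow": 0.85, "Germany": 0.5, "north": 0.2}

words = sorted(brain)
rho = spearman([brain[w] for w in words], [language[w] for w in words])
shared = invariant_order(brain, language)
```

In this toy case the two profiles disagree only on the relative position of London and Moscow, so that pair drops out of the shared order while the remaining comparisons survive.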


[Figure: hierarchical cluster tree; axis ticks 0.2 to 0.6; leaves in order: London, Paris, Moscow, Germany, Poland, Russia, north, south, west, east]

Another hierarchical cluster tree (similarity tree), computed from the conditional probability density estimates for the classification of 640 brain wave samples for London, Moscow, Paris, north, south, east, west, Germany, Poland, Russia.

Every single-trial classification produces its own conditional probability density estimates, giving rise to its own similarity tree …

[Figure: hierarchical cluster tree; axis ticks 0.1 to 0.5; leaves in order: north, east, south, west, London, Paris, Germany, Moscow, Russia, Poland]

Brain data
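A similarity tree of this kind can be sketched as follows. The slides do not state how the conditional probability estimates are turned into distances or which linkage rule is used, so the symmetrised dissimilarity and the average-linkage rule below are assumptions, and the probabilities are hypothetical values for four of the ten words.

```python
def distances_from_probabilities(labels, prob):
    """Turn prob[a][b] = P(classified as b | stimulus a) into symmetric
    dissimilarities: 1 minus the mean of the two directions (an assumed choice)."""
    return {
        frozenset((a, b)): 1.0 - (prob[a][b] + prob[b][a]) / 2.0
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }


def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (cluster_a, cluster_b, height),
    where each cluster is a tuple of labels.
    """
    clusters = [(label,) for label in labels]
    merges = []

    def between(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = between(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical conditional probability estimates for four of the ten words.
labels = ["London", "Paris", "north", "south"]
prob = {
    "London": {"London": 0.6, "Paris": 0.3, "north": 0.05, "south": 0.05},
    "Paris": {"London": 0.3, "Paris": 0.6, "north": 0.05, "south": 0.05},
    "north": {"London": 0.05, "Paris": 0.05, "north": 0.5, "south": 0.4},
    "south": {"London": 0.05, "Paris": 0.05, "north": 0.4, "south": 0.5},
}
tree = average_linkage(labels, distances_from_probabilities(labels, prob))
```

Words that the classifier confuses with each other more often end up joined lower in the tree, which is why a fresh set of probability estimates can yield a different tree.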


1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification, we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
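The resampling step in the procedure above can be sketched as below. The sketch assumes each resample draws as many items as there are data samples (640), which the slides do not state, and `analyse` is a hypothetical stand-in for the classification and comparison steps, which are not reproduced here.

```python
import random


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """Return n_rounds index lists, each drawn with replacement from range(n_samples)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_rounds)
    ]


def run_procedure(n_samples, n_rounds, analyse, seed=0):
    """Apply `analyse` (a hypothetical callable) to every resample and collect the results."""
    return [analyse(indices) for indices in bootstrap_indices(n_samples, n_rounds, seed)]


# 30 resamples of the 640 data samples, as in step 1.
resamples = bootstrap_indices(640, 30)

# Placeholder analysis: count the distinct samples drawn in each round.
distinct = run_procedure(640, 30, lambda indices: len(set(indices)))
```

Because sampling is with replacement, each round typically leaves some samples out and repeats others, so every round can produce a different classification and a different set of partial orders.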


[Figure: bar chart titled "Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language". Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]


A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.




46 copy Colleen E Crangle 2013


1. We compute M (= 30) single-trial classifications of the data (640 data samples for our 10 words) using random resampling with replacement.

2. For each classification we find, for each word, the partial orders of the brain and language data that are significantly correlated.

3. For each of these highly correlated partial-order pairs, we find the partial order invariant with respect to both.
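
Step 3 can be illustrated with a minimal sketch. It assumes that, for a given word, each representation (brain or language) supplies a numeric similarity to every other word, and that "invariant with respect to both" means the set of ordered pairs on which the two orderings agree. The word labels `w1`, `w2`, `w3`, the similarity values, and the function names are hypothetical and are not taken from the study. The significance test in step 2 is not shown.

```python
from itertools import permutations


def strict_order(similarity):
    """Return all ordered pairs (a, b) where a is strictly more similar than b."""
    return {
        (a, b)
        for a, b in permutations(similarity, 2)
        if similarity[a] > similarity[b]
    }


def invariant_partial_order(brain_sim, language_sim):
    """Keep only the ordered pairs on which both orderings agree (assumed reading)."""
    return strict_order(brain_sim) & strict_order(language_sim)


# Made-up similarities of three hypothetical words to one target word.
brain = {"w1": 0.9, "w2": 0.6, "w3": 0.4}
language = {"w1": 0.8, "w3": 0.7, "w2": 0.5}

invariant = invariant_partial_order(brain, language)
print(sorted(invariant))  # [('w1', 'w2'), ('w1', 'w3')]
```

The two orderings disagree about `w2` and `w3`, so that pair drops out; what remains is a partial rather than a total order.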

We performed 60 classifications; that is, we recomputed the classifications of the brain data using random resampling with replacement. For half of these 60 classifications we compared the brain data to the WordNet data, and for the other half we compared the brain data to the LSA data. And we plotted the results.
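
The resampling scheme can be sketched as follows. This is only an illustration under stated assumptions: each resample is assumed to draw as many indices as there are data samples (640), and each semantic model is assumed to get its own M = 30 resamples. The function name, the seeds, and the use of Python's `random` module are choices made for this sketch, not details from the study. The classifier that would consume each index list is not shown.

```python
import random


def resample_with_replacement(n_samples, n_rounds, seed=0):
    """Draw n_rounds index lists, each with n_samples indices sampled with replacement."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_rounds)
    ]


# 640 data samples; M = 30 resampled classifications per semantic model (assumed split).
N_SAMPLES, M = 640, 30
rounds = {
    model: resample_with_replacement(N_SAMPLES, M, seed=i)
    for i, model in enumerate(["WordNet", "LSA"])
}

total = sum(len(index_lists) for index_lists in rounds.values())
print(total)  # 60
```

Because sampling is with replacement, a given trial can appear several times in one index list and not at all in another, which is what makes each recomputed classification differ.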

[Figure: Partial Orders of Similarity Differences Invariant between EEG-recorded Brain Data and Semantic Representations of Language. Vertical axis: Significant Invariant Partial Orders (0 to 300). Horizontal axis: Test Participants (S10, S24, S25, S27, S16, S26, S12, S13, S18). Series: WordNet, LSA.]

A higher number of significant structural similarities was found between the brain data and the WordNet data than between the brain data and the LSA data.

This stronger structural similarity between the brain data and the WordNet-derived data supports the contention that, during language comprehension for the complex cognitive task of assessing truth or falsity, the representation of words in the brain has a WordNet-like quality.

This stronger structural similarity is the first evidence we have seen of WordNet-like representations in the brain.


Page 40: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

0

50

100

150

200

250

300

S10 S24 S25 S27 S16 S26 S12 S13 S18

WordNet

LSA

Partial Orders of Similarity Differences Invariant between EEG-

recorded Brain Data and Semantic Representations of Language

Significant

Invariant

Partial

Orders

Test Participants

40 copy Colleen E Crangle 2013

A higher number of significant structural similarities was

found between the brain data and the WordNet data than

the LSA data

This stronger structural similarity between the

brain data and the WordNet-derived data

supports the contention that during language

comprehension for the complex cognitive task

of assessing truth or falsity the representation

of words in the brain has a WordNet-like quality

This stronger structural similarity is the first

evidence we have seen of WordNet-like

representations in the brain

41 copy Colleen E Crangle 2013

Banerjee S and Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence pp 805-810 August 9-15 2003 Acapulco Mexico Banerjee S and Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics pp 136-145 February 17-23 2002 Mexico City Baroni M B Murphy E Barbu M Poesio (2010) Strudel A Corpus-Based Semantic Model Based on Properties and Types Cognitive Science - COGSCI vol 34 no 2 pp 222-254 2010 DOI 101111j1551-6709200901068x Blei D Ng A amp Jordan M (2003) Latent Dirichlet allocation Journal of Machine Learning Research 3 993ndash1022 Brants T and Franz A (2006) wwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC2006T13 Linguistic Data Consortium Philadelphia Budanitsky A Hirst G (2001) Semantic distance in WordNet an experimental application-oriented evaluation of five measures In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources Pittsburgh June 2001 httpwwwseassmuedu~radamwnwpapersWNW-NAACL-101pdfgz Collins AM Loftus EF (1975) A spreading-activation theory of semantic processing Psychological Review 1975 Nov Vol 82(6) 407-428 Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval Artificial Intelligence Review 11 453ndash482 Deerwester S Dumais ST Furnas GW Landauer TK Harshman R (1990) Indexing By Latent Semantic Analysis Journal of the American Society for Information Science 41 391-407 Devereux B C Kelly A Korhonen (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 70mdash78 Fellbaum C (Ed) (1998) WordNet An Electronic Lexical 
Database Cambridge MA MIT Press Hauk O Davis MH Ford M Pulvermuumlller F Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data Neuroimage 2006 May 130(4)1383-400 Epub 2006 Feb 7

42 copy Colleen E Crangle 2013

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284

43 copy Colleen E Crangle 2013

Leacock C Chodorow M (1998) Combining local context and WordNet similarity for word sense identification In Fellbaum (Ed) WordNet An Electronic Lexical Database Cambridge MA MIT Press pp 265-283 Leech G Rayson P Wilson A (2001) Word Frequencies in Written and Spoken English based on the British National Corpus Longman London Lin D (1998) An information-theoretic definition of similarity In Proc 15th International Conf on Machine Learning Morgan Kaufmann San Francisco CA p 296--304 Luce RD (1956) Semiorders and a theory of utility discrimination Econometrica 24 178ndash191 MR0078632 httpwwwjstororgstable1905751 Matsunaga T Yonemori C Tomita E Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database BMC Bioinformatics 2009 Jul 110205 Martin Ph (2003) Correction and Extension of WordNet 17 ICCS 2003 11th International Conference on Conceptual Structures (copy Springer Verlag LNAI 2746 pp 160-173) Dresden Germany July 21-25 2003 Miller GA Nicely P (1955) An analysis of perceptual confusions among some English consonants JAcoustSocAm 272 Miller GA Leacock C Tengi R Bunker RT (1993) A Semantic Concordance In Proceedings of the 3 DARPA Workshop on Human Language Technology Miller GA (1995) WordNet A Lexical Database for English Communications of the ACM Vol 38 No 11 39-41 Mitchell TM SV Shinkareva A Carlson KM Chang VL Malave R A Mason and M A Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns Science 320 1191 May 30 2008 DOI 101126science1152876 Murphy B Baroni M amp Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing pages 619ndash627 Singapore 6-7 August 2009 c 2009 ACL and AFNLP Murphy B amp Poesio M (2010) Detecting semantic category in simultaneous EEGMEG recordings In First workshop on computational neurolinguistics NAACL HLT 2010 (pp 36ndash44) Los Angeles Association for Computational 
Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013

Page 41: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

A higher number of significant structural similarities was

found between the brain data and the WordNet data than

the LSA data

This stronger structural similarity between the

brain data and the WordNet-derived data

supports the contention that during language

comprehension for the complex cognitive task

of assessing truth or falsity the representation

of words in the brain has a WordNet-like quality

This stronger structural similarity is the first

evidence we have seen of WordNet-like

representations in the brain

41 copy Colleen E Crangle 2013

Banerjee S and Pedersen T (2003) Extended Gloss Overlaps as a Measure of Semantic Relatedness In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence pp 805-810 August 9-15 2003 Acapulco Mexico Banerjee S and Pedersen T (2002) An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics pp 136-145 February 17-23 2002 Mexico City Baroni M B Murphy E Barbu M Poesio (2010) Strudel A Corpus-Based Semantic Model Based on Properties and Types Cognitive Science - COGSCI vol 34 no 2 pp 222-254 2010 DOI 101111j1551-6709200901068x Blei D Ng A amp Jordan M (2003) Latent Dirichlet allocation Journal of Machine Learning Research 3 993ndash1022 Brants T and Franz A (2006) wwwldcupenneduCatalogCatalogEntryjspcatalogId=LDC2006T13 Linguistic Data Consortium Philadelphia Budanitsky A Hirst G (2001) Semantic distance in WordNet an experimental application-oriented evaluation of five measures In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources Pittsburgh June 2001 httpwwwseassmuedu~radamwnwpapersWNW-NAACL-101pdfgz Collins AM Loftus EF (1975) A spreading-activation theory of semantic processing Psychological Review 1975 Nov Vol 82(6) 407-428 Crestani F (1997) Application of Spreading Activation Techniques in Information Retrieval Artificial Intelligence Review 11 453ndash482 Deerwester S Dumais ST Furnas GW Landauer TK Harshman R (1990) Indexing By Latent Semantic Analysis Journal of the American Society for Information Science 41 391-407 Devereux B C Kelly A Korhonen (2010) Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 70mdash78 Fellbaum C (Ed) (1998) WordNet An Electronic Lexical 
Database Cambridge MA MIT Press Hauk O Davis MH Ford M Pulvermuumlller F Marslen-Wilson WD (2006) The time course of visual word recognition as revealed by linear regression analysis of ERP data Neuroimage 2006 May 130(4)1383-400 Epub 2006 Feb 7

42 copy Colleen E Crangle 2013

Hirst G (1988). Resolving lexical ambiguity computationally with spreading activation and Polaroid Words. In Small S L, Cottrell G W, Tanenhaus M K (editors), Lexical ambiguity resolution: Perspectives from psycholinguistics, neuropsychology, and artificial intelligence. San Mateo, CA: Morgan Kaufmann Publishers, 1988, 73–107.
Jelodar A B, M Alizadeh, S Khadivi (2010). WordNet Based Features for Predicting Brain Activity associated with meanings of noun. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 18–26.
Jiang J and Conrath D (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics, Taiwan, 19–33.
Just MA, Cherkassky VL, Aryal S, Mitchell TM (2010). A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes. PLoS ONE 5(1): e8622. doi:10.1371/journal.pone.0008622
Karp RM (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations, Proc. Sympos. IBM Thomas J. Watson Res. Center, Yorktown Heights, NY (Ed. R E Miller and J W Thatcher). New York: Plenum, pp. 85–103.
Kelly C, B Devereux, A Korhonen (2010). Acquiring Human-like Feature-Based Conceptual Representations from Corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 61–69.
Kriegeskorte N, Mur M, Bandettini PA (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. doi:10.3389/neuro.06.004.2008
Kutas M, Hillyard SA (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161–163.
Kutas M, Federmeier KD (2011). Thirty years and counting: Finding meaning in the N400 component of the event related brain potential (ERP). Annual Review of Psychology 2011, 62, pp. 621–647.
Landauer TK, Dumais ST (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104, 211–240.
Landauer TK, Foltz PW, Laham D (1998). Introduction to Latent Semantic Analysis. Discourse Processes 25, 259–284.

Leacock C, Chodorow M (1998). Combining local context and WordNet similarity for word sense identification. In Fellbaum (Ed.), WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, pp. 265–283.
Leech G, Rayson P, Wilson A (2001). Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London.
Lin D (1998). An information-theoretic definition of similarity. In Proc. 15th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, pp. 296–304.
Luce RD (1956). Semiorders and a theory of utility discrimination. Econometrica 24, 178–191. MR0078632. http://www.jstor.org/stable/1905751
Matsunaga T, Yonemori C, Tomita E, Muramatsu M (2009). Clique-based data mining for related genes in a biomedical database. BMC Bioinformatics. 2009 Jul 1;10:205.
Martin Ph (2003). Correction and Extension of WordNet 1.7. ICCS 2003, 11th International Conference on Conceptual Structures (© Springer Verlag, LNAI 2746, pp. 160–173), Dresden, Germany, July 21–25, 2003.
Miller GA, Nicely P (1955). An analysis of perceptual confusions among some English consonants. J. Acoust. Soc. Am. 27:2.
Miller GA, Leacock C, Tengi R, Bunker RT (1993). A Semantic Concordance. In Proceedings of the 3 DARPA Workshop on Human Language Technology.
Miller GA (1995). WordNet: A Lexical Database for English. Communications of the ACM, Vol. 38, No. 11, 39–41.
Mitchell TM, SV Shinkareva, A Carlson, KM Chang, VL Malave, R A Mason, and M A Just (2008). Predicting Human Brain Activity Associated with the Meanings of Nouns. Science 320, 1191, May 30, 2008. DOI: 10.1126/science.1152876
Murphy B, Baroni M, & Poesio M (2009). EEG responds to conceptual stimuli and corpus semantics. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 619–627, Singapore, 6–7 August 2009. © 2009 ACL and AFNLP.
Murphy B, & Poesio M (2010). Detecting semantic category in simultaneous EEG/MEG recordings. In First workshop on computational neurolinguistics, NAACL HLT 2010 (pp. 36–44). Los Angeles: Association for Computational Linguistics.
Murphy B, Poesio M, Bovolo F, Bruzzone L, Dalponte M, Lakany H (2011). EEG decoding of semantic category reveals distributed representations for single concepts. Brain Lang. 2011 Apr;117(1):12–22. doi: 10.1016/j.bandl.2010.09.013

Pedersen T, Patwardhan S, Michelizzi J (2004). WordNet::Similarity - Measuring the Relatedness of Concepts. In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004), pp. 38–41, Boston, May 2004.
Pereira F, M Botvinick, G Detre (2010). Learning semantic features for fMRI data from definitional text. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 1–9. http://www.aclweb.org/anthology/W10-0601
Perreau-Guimaraes M, Wong DK, Uy ET, Grosenick L, Suppes P (2007). Single-trial classification of MEG recordings. IEEE Transactions on Biomedical Engineering 54, 436–443.
Rabinovitch I (1977). The Scott-Suppes theorem on semiorders. J. Mathematical Psychology 15 (2), 209–212. MR0437404.
Resnik P (1995). Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448–453, Montreal.
Roach BJ, Mathalon DH (2008). Event-related EEG time-frequency analysis: an overview of measures and an analysis of early gamma band phase locking in schizophrenia. Schizophr Bull. 2008 Sep;34(5):907–26. Epub 2008 Aug 6.
Scott D, Suppes P (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic 23, 113–128. MR0115919.
Suppes P (1972). Axiomatic Set Theory. New York: Dover.
Suppes P (1974). The axiomatic method in the empirical sciences. L Henkin et al. (Eds.), Proceedings of the Tarski Symposium, Proceedings of Symposia in Pure Mathematics 25. Providence, RI: American Mathematical Society, pp. 465–479.
Suppes P, Lu Z-L, Han B (1997). Brain wave recognition of words. Proceedings of the National Academy of Sciences 95, 14965–14969.
Suppes P, Han B, Lu Z-L (1998). Brain-wave recognition of sentences. Proceedings of the National Academy of Sciences 95, 15861–15866.
Suppes P, Han B, Epelboim J, Lu Z-L (1999a). Invariance between subjects of brain wave representations of language. Proceedings of the National Academy of Sciences 96, 12953–12958.
Suppes P, Han B, Epelboim J, Lu Z-L (1999b). Invariance of brain-wave representations of simple visual images and their names. Proceedings of the National Academy of Sciences 96, 14658–14663.
Suppes P, Han B (2000). Brain-wave representation of words by superposition of a few sine waves. Proceedings of the National Academy of Sciences 97, 8738–8743.
Suppes P, Perreau-Guimaraes M, Wong DK (2009). Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language. Neural Computation 21, 3228–3269.

Suppes P, de Barros JA, Oas G (2012). Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95–117.
Turney P (2001). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds.), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp. 491–502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011). Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol. 22, No. 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006). The role of semantics and grammatical class in the neural representation of words. Cereb Cortex. 2006 Dec;16(12):1790–6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004). Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479–484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006). Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373–383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008). Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.
Woods W (1975). What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds., Representation and understanding. New York: Academic Press, 1975:35–82.
Wu Z, Palmer M (1994). Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp. 133–138.

Banerjee S and Pedersen T (2003). Extended Gloss Overlaps as a Measure of Semantic Relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 805–810, August 9–15, 2003, Acapulco, Mexico.
Banerjee S and Pedersen T (2002). An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136–145, February 17–23, 2002, Mexico City.
Baroni M, B Murphy, E Barbu, M Poesio (2010). Strudel: A Corpus-Based Semantic Model Based on Properties and Types. Cognitive Science - COGSCI, vol. 34, no. 2, pp. 222–254, 2010. DOI: 10.1111/j.1551-6709.2009.01068.x
Blei D, Ng A, & Jordan M (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
Brants T and Franz A (2006). www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13. Linguistic Data Consortium, Philadelphia.
Budanitsky A, Hirst G (2001). Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In Proceedings of the NAACL 2001 Workshop on WordNet and Other Lexical Resources, Pittsburgh, June 2001. http://www.seas.smu.edu/~rada/mwnw/papers/WNW-NAACL-101.pdf.gz
Collins AM, Loftus EF (1975). A spreading-activation theory of semantic processing. Psychological Review, 1975 Nov, Vol. 82(6), 407–428.
Crestani F (1997). Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review 11, 453–482.
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990). Indexing By Latent Semantic Analysis. Journal of the American Society for Information Science 41, 391–407.
Devereux B, C Kelly, A Korhonen (2010). Using fMRI activation to conceptual stimuli to evaluate methods for extracting conceptual representations from corpora. Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, June 2010, Los Angeles, USA. Association for Computational Linguistics, 70–78.
Fellbaum C (Ed.) (1998). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Hauk O, Davis MH, Ford M, Pulvermüller F, Marslen-Wilson WD (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage. 2006 May 1;30(4):1383–400. Epub 2006 Feb 7.

42 copy Colleen E Crangle 2013

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284

43 copy Colleen E Crangle 2013

Leacock C Chodorow M (1998) Combining local context and WordNet similarity for word sense identification In Fellbaum (Ed) WordNet An Electronic Lexical Database Cambridge MA MIT Press pp 265-283 Leech G Rayson P Wilson A (2001) Word Frequencies in Written and Spoken English based on the British National Corpus Longman London Lin D (1998) An information-theoretic definition of similarity In Proc 15th International Conf on Machine Learning Morgan Kaufmann San Francisco CA p 296--304 Luce RD (1956) Semiorders and a theory of utility discrimination Econometrica 24 178ndash191 MR0078632 httpwwwjstororgstable1905751 Matsunaga T Yonemori C Tomita E Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database BMC Bioinformatics 2009 Jul 110205 Martin Ph (2003) Correction and Extension of WordNet 17 ICCS 2003 11th International Conference on Conceptual Structures (copy Springer Verlag LNAI 2746 pp 160-173) Dresden Germany July 21-25 2003 Miller GA Nicely P (1955) An analysis of perceptual confusions among some English consonants JAcoustSocAm 272 Miller GA Leacock C Tengi R Bunker RT (1993) A Semantic Concordance In Proceedings of the 3 DARPA Workshop on Human Language Technology Miller GA (1995) WordNet A Lexical Database for English Communications of the ACM Vol 38 No 11 39-41 Mitchell TM SV Shinkareva A Carlson KM Chang VL Malave R A Mason and M A Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns Science 320 1191 May 30 2008 DOI 101126science1152876 Murphy B Baroni M amp Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing pages 619ndash627 Singapore 6-7 August 2009 c 2009 ACL and AFNLP Murphy B amp Poesio M (2010) Detecting semantic category in simultaneous EEGMEG recordings In First workshop on computational neurolinguistics NAACL HLT 2010 (pp 36ndash44) Los Angeles Association for Computational 
Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013

Page 43: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Hirst G (1988) Resolving lexical ambiguity computationally with spreading activation and Polaroid Words In Small S L Cottrell G W Tanenhaus M K (editors) Lexical ambiguity resolution Perspectives from psycholinguistics neuropsychology and artificial intelligence San Mateo CA Morgan Kaufmann Publishers 1988 73ndash107 Jelodar A B M Alizadeh S Khadivi (2010) WordNet Based Features for Predicting Brain Activity associated with meanings of noun Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 18mdash26 Jiang J and Conrath D (1997) Semantic similarity based on corpus statistics and lexical taxonomy In Proceedings of International Conference on Research in Computational Linguistics Taiwan 19-33 Just MA Cherkassky VL Aryal S Mitchell TM (2010) A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes PLoS ONE 5(1) e8622 doi101371journalpone0008622 Karp RM (1972) Reducibility among combinatorial problems In Complexity of Computer Computations Proc Sympos IBM Thomas J Watson Res Center Yorktown Heights NY (Ed R E Miller and J W Thatcher) New York Plenum pp 85-103 Kelly C B Devereux A Korhonen (2010) Acquiring Human-like Feature-Based Conceptual Representations from Corpora Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 61mdash69 Kriegeskorte N Mur M Bandettini PA (2008) Representational similarity analysis ndash connecting the branches of systems neuroscience Frontiers in Systems Neuroscience doi103389neuro060042008 Kutas M Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association Nature 307 161-163 Kutas M Federmeier KD (2011) Thirty years and counting Finding meaning in the N400 component of the event related brain potential (ERP) Annual Review of Psychology 2011 62 pp 621-647 Landauer TK Dumais ST 
(1997) A solution to Platos problem The Latent Semantic Analysis theory of the acquisition induction and representation of knowledge Psychological Review 104 211-240 Landauer TK Foltz PW Laham D (1998) Introduction to Latent Semantic Analysis Discourse Processes 25 259-284

43 copy Colleen E Crangle 2013

Leacock C Chodorow M (1998) Combining local context and WordNet similarity for word sense identification In Fellbaum (Ed) WordNet An Electronic Lexical Database Cambridge MA MIT Press pp 265-283 Leech G Rayson P Wilson A (2001) Word Frequencies in Written and Spoken English based on the British National Corpus Longman London Lin D (1998) An information-theoretic definition of similarity In Proc 15th International Conf on Machine Learning Morgan Kaufmann San Francisco CA p 296--304 Luce RD (1956) Semiorders and a theory of utility discrimination Econometrica 24 178ndash191 MR0078632 httpwwwjstororgstable1905751 Matsunaga T Yonemori C Tomita E Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database BMC Bioinformatics 2009 Jul 110205 Martin Ph (2003) Correction and Extension of WordNet 17 ICCS 2003 11th International Conference on Conceptual Structures (copy Springer Verlag LNAI 2746 pp 160-173) Dresden Germany July 21-25 2003 Miller GA Nicely P (1955) An analysis of perceptual confusions among some English consonants JAcoustSocAm 272 Miller GA Leacock C Tengi R Bunker RT (1993) A Semantic Concordance In Proceedings of the 3 DARPA Workshop on Human Language Technology Miller GA (1995) WordNet A Lexical Database for English Communications of the ACM Vol 38 No 11 39-41 Mitchell TM SV Shinkareva A Carlson KM Chang VL Malave R A Mason and M A Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns Science 320 1191 May 30 2008 DOI 101126science1152876 Murphy B Baroni M amp Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing pages 619ndash627 Singapore 6-7 August 2009 c 2009 ACL and AFNLP Murphy B amp Poesio M (2010) Detecting semantic category in simultaneous EEGMEG recordings In First workshop on computational neurolinguistics NAACL HLT 2010 (pp 36ndash44) Los Angeles Association for Computational 
Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

45 copy Colleen E Crangle 2013

Suppes P de Barros JA Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection Journal of Mathematical Psychology 56 95-117 Turney P (2001) Mining the Web for Synonyms PMI-IR versus LSA on TOEFL In L De Raedt amp P Flach (Eds) Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491-502) Freiburg Germany Vassilieva E Pinto G de Barros JA Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators IEEE Transactions on Neural Networks Vol 22 No 1 January 2011 Vigliocco G Warren J Siri S Arciuli J Scott S Wise R (2006) The role of semantics and grammatical class in the neural representation of words Cereb Cortex 2006 Dec16(12)1790-6 Epub 2006 Jan 18 Wong DK Perreau-Guimaraes M Uy ET Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences Neurocomputing 61 479-484 Wong DK Uy ET Perreau-Guimaraes M Yang W Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification Neurocomputing 70 373-383 Wong DK Grosenick L Uy ET Perreau-Guimaraes M Carvalhaes CG Desain P Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses NeuroImage 39 1051ndash1063 Woods W (1975) Whatrsquos in a link Foundations for semantic networks In Bobrow D Collins A eds Representation and understanding New York Academic Press 197535-82 Wu Z Palmer M (1994) Verb Semantics and Lexical Selection In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics Las Cruces New Mexico pp 133--138

46 copy Colleen E Crangle 2013

Page 44: What the brain can tell us about language: evidence from machine learning, WordNet …ucrel.lancs.ac.uk/crs/attachments/UCRELCRS-2013-01-24-Cr... · 2013-05-08 · What the brain

Leacock C Chodorow M (1998) Combining local context and WordNet similarity for word sense identification In Fellbaum (Ed) WordNet An Electronic Lexical Database Cambridge MA MIT Press pp 265-283 Leech G Rayson P Wilson A (2001) Word Frequencies in Written and Spoken English based on the British National Corpus Longman London Lin D (1998) An information-theoretic definition of similarity In Proc 15th International Conf on Machine Learning Morgan Kaufmann San Francisco CA p 296--304 Luce RD (1956) Semiorders and a theory of utility discrimination Econometrica 24 178ndash191 MR0078632 httpwwwjstororgstable1905751 Matsunaga T Yonemori C Tomita E Muramatsu M (2009) Clique-based data mining for related genes in a biomedical database BMC Bioinformatics 2009 Jul 110205 Martin Ph (2003) Correction and Extension of WordNet 17 ICCS 2003 11th International Conference on Conceptual Structures (copy Springer Verlag LNAI 2746 pp 160-173) Dresden Germany July 21-25 2003 Miller GA Nicely P (1955) An analysis of perceptual confusions among some English consonants JAcoustSocAm 272 Miller GA Leacock C Tengi R Bunker RT (1993) A Semantic Concordance In Proceedings of the 3 DARPA Workshop on Human Language Technology Miller GA (1995) WordNet A Lexical Database for English Communications of the ACM Vol 38 No 11 39-41 Mitchell TM SV Shinkareva A Carlson KM Chang VL Malave R A Mason and M A Just (2008) Predicting Human Brain Activity Associated with the Meanings of Nouns Science 320 1191 May 30 2008 DOI 101126science1152876 Murphy B Baroni M amp Poesio M (2009) EEG responds to conceptual stimuli and corpus semantics In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing pages 619ndash627 Singapore 6-7 August 2009 c 2009 ACL and AFNLP Murphy B amp Poesio M (2010) Detecting semantic category in simultaneous EEGMEG recordings In First workshop on computational neurolinguistics NAACL HLT 2010 (pp 36ndash44) Los Angeles Association for Computational 
Linguistics Murphy B Poesio M Bovolo F Bruzzone L Dalponte M Lakany H (2011) EEG decoding of semantic category reveals distributed representations for single concepts Brain Lang 2011 Apr117(1)12-22 doi 101016jbandl201009013

44 copy Colleen E Crangle 2013

Pedersen T Patwardhan S Michelizzi J (2004) WordNetSimilarity - Measuring the Relatedness of Concepts In Proceedings of Fifth Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2004) pp 38-41 Boston May 2004 Pereira F M Botvinick G Detre (2010) Learning semantic features for fMRI data from definitional text Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics June 2010Los Angeles USA Association for Computational Linguistics 1mdash9 httpwwwaclweborganthologyW10-0601 Perreau-Guimaraes M Wong DK Uy ET Grosenick L Suppes P (2007) Single-trial classification of MEG recordings IEEE Transactions on Biomedical Engineering 54 436ndash443 Rabinovitch I (1977) The Scott-Suppes theorem on semiorders J Mathematical Psychology 15 (2) 209ndash212 MR0437404 Resnik P (1995) Using information content to evaluate semantic similarity In Proceedings of the 14th International Joint Conference on Artificial Intelligence pages 448-453 Montreal Roach BJ Mathalon DH (2008) Event-related EEG time-frequency analysis an overview of measures and an analysis of early gamma band phase locking in schizophrenia Schizophr Bull 2008 Sep34(5)907-26 Epub 2008 Aug 6 Scott D Suppes P (1958) Foundational aspects of theories of measurement The Journal of Symbolic Logic 23 113ndash128 MR0115919 Suppes P (1972) Axiomatic Set Theory New York Dover Suppes P (1974) The axiomatic method in the empirical sciences L Henkin et al (Eds) Proceedings of the Tarski Symposium Proceedings of Symposia in Pure Mathematics 25 Providence RI American Mathematical Society pp 465-479 Suppes P Lu Z-L Han B (1997) Brain wave recognition of words Proceedings of the National Academy of Sciences 95 14965ndash14969 Suppes P Han B Lu Z-L (1998) Brain-wave recognition of sentences Proceedings of the National Academy of Sciences 95 15861ndash15866 Suppes P Han B Epelboim J Lu Z-L (1999a) Invariance between subjects of brain wave representations of language 
Proceedings of the National Academy of Sciences 96 12953ndash12958 Suppes P Han B Epelboim J Lu Z-L (1999b) Invariance of brain-wave representations of simple visual images and their names Proceedings of the National Academy of Sciences 96 14658-14663 Suppes P Han B (2000) Brain-wave representation of words by superposition of a few sine waves Proceedings of the National Academy of Sciences 97 8738ndash8743 Suppes P Perreau-Guimaraes M Wong DK (2009) Partial Orders of Similarity Differences Invariant Between EEG-Recorded Brain and Perceptual Representations of Language Neural Computation 21 3228ndash3269

Suppes P, de Barros JA, Oas G (2012) Phase-Oscillator computations as neural models of stimulus-response conditioning and response selection. Journal of Mathematical Psychology 56, 95–117.
Turney P (2001) Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In L De Raedt & P Flach (Eds), Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001) (pp 491–502). Freiburg, Germany.
Vassilieva E, Pinto G, de Barros JA, Suppes P (2011) Learning Pattern Recognition Through Quasi-Synchronization of Phase Oscillators. IEEE Transactions on Neural Networks, Vol 22, No 1, January 2011.
Vigliocco G, Warren J, Siri S, Arciuli J, Scott S, Wise R (2006) The role of semantics and grammatical class in the neural representation of words. Cereb Cortex 2006 Dec;16(12):1790-6. Epub 2006 Jan 18.
Wong DK, Perreau-Guimaraes M, Uy ET, Suppes P (2004) Classification of individual trials based on the best independent component of EEG-recorded sentences. Neurocomputing 61, 479–484.
Wong DK, Uy ET, Perreau-Guimaraes M, Yang W, Suppes P (2006) Interpretation of perceptron weights as constructed time series for EEG classification. Neurocomputing 70, 373–383.
Wong DK, Grosenick L, Uy ET, Perreau-Guimaraes M, Carvalhaes CG, Desain P, Suppes P (2008) Quantifying inter-subject agreement in brain-imaging analyses. NeuroImage 39, 1051–1063.
Woods W (1975) What's in a link: Foundations for semantic networks. In Bobrow D, Collins A, eds. Representation and understanding. New York: Academic Press, 1975:35-82.
Wu Z, Palmer M (1994) Verb Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, pp 133–138.
