Ekaterina Vylomova/Brown Bag seminar presentation


description

Associative thesauri, Russian Associative Thesauri

Transcript of Ekaterina Vylomova/Brown Bag seminar presentation

Page 1: Ekaterina Vylomova/Brown Bag seminar presentation

Introduction
Associative Experiments
Associative Thesauri
Russian Associative Thesaurus'98
Associative Network (Graph)
Modelling of Associative Network
Future work

Associative thesauri: structure and analysis

Brown bag seminar

Ekaterina Vylomova

Fulbright scholar at Montclair State University

February 21, 2014

E. Vylomova Associative thesauri

Page 2: Ekaterina Vylomova/Brown Bag seminar presentation


Brief Bio

2011: MSc, Bauman Moscow State Technical University

2009: BSc, Bauman Moscow State Technical University

2009: Yandex School of Data Analysis (Moscow Institute of Physics & Technology)

Page 5: Ekaterina Vylomova/Brown Bag seminar presentation

Associative Experiments

What's AE?

The associative experiment is one of the methods of psycholinguistics. It is based on the method of free association.

Sir Francis Galton conducted the first such experiment in 1879.

Types of AE

Single Free Association

Multiple Free Associations

Single Controlled Association (synonym, noun, verb, hyponym, etc.)

Multiple Controlled Associations

Page 7: Ekaterina Vylomova/Brown Bag seminar presentation

What's associative thesaurus?

Page 8: Ekaterina Vylomova/Brown Bag seminar presentation

Example of data

EAT Word Associations

CAT stimulated the following associations:

Response  Count  Proportion
DOG       49     0.52
MOUSE     8      0.08
BLACK     4      0.04
MAT       3      0.03
ANIMAL    2      0.02
EYES      2      0.02
GUT       2      0.02
KITTEN    2      0.02

Page 9: Ekaterina Vylomova/Brown Bag seminar presentation

AT for different languages

English
The Structure of Associations in Language and Thought (Deese, 1965)
Word association (Cramer, 1968)
An associative thesaurus of English and its computer analysis (Kiss et al., 1973)
Word association, rhyme and fragment norms (Nelson, McEvoy & Schreiber, 1999)

Page 10: Ekaterina Vylomova/Brown Bag seminar presentation

AT for different languages

Dutch
Word association norms with response times (De Groot, 1988)
Word associations: Norms for 1,424 Dutch words in a continuous task (De Deyne & Storms, 2008)

Swedish
A Swedish Associative Thesaurus (Lonngren, 1998)

Page 11: Ekaterina Vylomova/Brown Bag seminar presentation

AT for different languages

Japanese
Construction of associative concept dictionary with distance information, and comparison with electronic concept dictionary (Okamoto & Ishizaki, 2001)
Building a word association database for basic Japanese vocabulary (Joyce, 2005)

Korean
Network analysis of Korean Word Associations (Jung et al., 2010)

Page 12: Ekaterina Vylomova/Brown Bag seminar presentation

AT for different languages

Czech
Volne slovni parove asociace v cestine (Novak, 1988)

Hebrew
Free association norms in the Hebrew language (Rubinsten, 2005)

Page 13: Ekaterina Vylomova/Brown Bag seminar presentation

Slavic Associative Thesauri

Dictionary of associative norms in Russian (Leontiev, 1973)
Russian Associative Thesaurus (Karaulov et al., 2002)
Slavic Associative Thesaurus (Russian, Belorussian, Bulgarian, Ukrainian) (Ufimtseva et al., 2004)
Normas asociativas del espanol y del ruso (Sanchez Puig, Karaulov, Cherkasova, 2000)

Page 14: Ekaterina Vylomova/Brown Bag seminar presentation

Russian associative experiment description

Time frame: 1988-1998
Participants: 11,000 first- to third-year students; 34 specialities
Stimuli: 6,624 (initial list: 1,277)
Associative pairs: 1,032,522 (distinct: 462,500)
Reactions: 102,926

Subset used for analysis
Stimuli: 6,577
Reactions: 21,312
Associative pairs: 102,516

Dataset
Set of triplets $\langle c_i, r_j, w_{ij} \rangle$, where $w_{ij} = \frac{freq_{ij}}{\sum_{j=1}^{n} freq_{ij}}$.
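The normalisation behind the triplets can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `association_weights` and the toy counts are hypothetical and are not taken from RAT.

```python
from collections import defaultdict

def association_weights(counts):
    """Turn raw (stimulus, reaction, frequency) counts into weighted triplets.

    Each weight is the frequency of a reaction divided by the total
    frequency of all reactions given to the same stimulus.
    """
    totals = defaultdict(int)
    for stimulus, _reaction, freq in counts:
        totals[stimulus] += freq
    return [(s, r, freq / totals[s]) for s, r, freq in counts]

# Hypothetical toy counts, not taken from RAT.
toy_counts = [("cat", "dog", 6), ("cat", "mouse", 3), ("cat", "black", 1),
              ("day", "night", 4), ("day", "sun", 4)]
triplets = association_weights(toy_counts)
```

By construction, the weights of all reactions to one stimulus sum to 1.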

Page 15: Ekaterina Vylomova/Brown Bag seminar presentation

Comparison with frequency dictionary of Russian language

Frequency dictionary

Frequency dictionary of modern Russian language (Lyashevskaya, Sharov, 2009).
Based on texts from the Russian National Corpus (www.ruscorpora.ru); includes information about the 20,000 most common words in Russian.

RAT Lemmatisation
RAT -> MyStem (Segalovich, 2003) -> lemmas

Page 16: Ekaterina Vylomova/Brown Bag seminar presentation

Comparison with frequency dictionary of Russian language

TOP-11 Nouns

RAT          FreqDict
Human        Year
Home, House  Human
Money        Time
Day          Business
Friend       Life
Home         Day
Male         Hand
Fool         Work
Business     Word
Life         Place
Illness      Friend

Page 17: Ekaterina Vylomova/Brown Bag seminar presentation


Comparison with frequency dictionary of Russian language

Semantic primes?

Concept "Human": "human", "child", "friend", "male"
Concept "Time": "day", "time"
Adjectives: "good", "bad", "big".

These concepts do not change over time.

Positive correlation with semantic primes (Wierzbicka)

Page 18: Ekaterina Vylomova/Brown Bag seminar presentation

Description

Nodes correspond to words (lemmas)
Edges correspond to associations
Edge weights correspond to association strength

Page 19: Ekaterina Vylomova/Brown Bag seminar presentation

Main characteristics of the network

Nodes: |V| = 23,195, among them:
nodes with only outgoing edges (stimuli only): |S| = 1,883
nodes with only incoming edges (reactions only): |R| = 16,618
nodes with both types of edges: |SR| = 4,694

Edges: |E| = 102,516
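The three node counts add up to the total (1,883 + 16,618 + 4,694 = 23,195), which suggests the sets are disjoint: stimulus-only, reaction-only, and both. A minimal Python sketch of such a partition, assuming edges are stored as (stimulus, reaction, weight) tuples; the function name and the toy edges are hypothetical:

```python
def partition_nodes(edges):
    """Split nodes into stimulus-only, reaction-only and mixed sets."""
    sources = {s for s, _r, _w in edges}
    targets = {r for _s, r, _w in edges}
    only_out = sources - targets   # S: nodes with outgoing edges only
    only_in = targets - sources    # R: nodes with incoming edges only
    both = sources & targets       # SR: nodes with both kinds of edges
    return only_out, only_in, both

# Hypothetical toy edges, not taken from RAT.
toy_edges = [("cat", "dog", 0.6), ("cat", "mouse", 0.3),
             ("dog", "cat", 0.5), ("dog", "bone", 0.5),
             ("day", "sun", 1.0)]
S, R, SR = partition_nodes(toy_edges)
```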

Page 20: Ekaterina Vylomova/Brown Bag seminar presentation

Table of network characteristics

Sign  Description                           Directed  Undirected
N     Number of nodes                       23,195    23,195
L     Average shortest path length          3.98      3.83
D     Diameter                              9         8
⟨k⟩   Average node degree                   4.42      8.83
ψ     Exponent of degree distribution P(k)  2.2       1.85

Directed to undirected
$\tilde{w}_{ij} = \tilde{w}_{ji} = w_{ij} + w_{ji}$

Degree distribution function
$P(k) \approx k^{-\psi}$
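The directed-to-undirected rule can be illustrated with a short Python sketch. The dictionary layout keyed by (i, j) pairs, the function name and the toy weights are assumptions made for the example. As a sanity check on the table, the directed average degree is consistent with |E| / N = 102,516 / 23,195 ≈ 4.42.

```python
from collections import defaultdict

def to_undirected(directed):
    """Merge w_ij and w_ji into one symmetric weight per unordered pair."""
    undirected = defaultdict(float)
    for (i, j), w in directed.items():
        # frozenset makes (i, j) and (j, i) share one key
        undirected[frozenset((i, j))] += w
    return dict(undirected)

# Hypothetical toy weights, not taken from RAT.
toy = {("cat", "dog"): 0.6, ("dog", "cat"): 0.5, ("cat", "mouse"): 0.3}
sym = to_undirected(toy)
```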

Page 21: Ekaterina Vylomova/Brown Bag seminar presentation

Small-world networks

Definition
Introduced by Milgram, 1967 ("The small world problem").
$L \propto \log(N)$, i.e. the distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network.
Also known as "six degrees of separation".

Examples
World Wide Web (WWW; Adamic, 1999; Albert, Jeong, & Barabasi, 1999), networks of scientific collaboration (Newman, 2001), metabolic networks in biology (Jeong, Tombor, Albert, Oltvai, & Barabasi, 2000)
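The quantities L (average shortest path length) and D (diameter) that characterise a small-world network can be computed with breadth-first search. The sketch below is an illustration only: it treats the graph as unweighted, averages over reachable ordered pairs, and uses a hypothetical four-node chain as input.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def path_stats(adj):
    """Return (average shortest path length, diameter) over reachable ordered pairs."""
    total, pairs, diameter = 0, 0, 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

# Hypothetical chain a - b - c - d, stored as adjacency lists.
toy_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg_len, diam = path_stats(toy_adj)
```

Running one search per node costs O(N * (N + E)) overall, which is feasible for a graph of about 23 thousand nodes but would need sampling on much larger ones.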

Page 22: Ekaterina Vylomova/Brown Bag seminar presentation

Scale-free networks

Description

Amaral, Scala et al. (2000) studied small-world networks and compared their degree distribution functions P(k). Two types of distribution:
exponential (power grid system in the USA, neural system of C. elegans)
power law (WWW, metabolic networks): $P(k) = k^{-\psi}$, $\psi \in (2, 4)$

Networks with a power-law degree distribution are called scale-free. Scale-free networks provide better signal propagation.
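One rough way to estimate the exponent ψ from a list of node degrees is a least-squares line through the points (log k, log P(k)). The talk does not say how ψ was estimated, so this method is an assumption; it is known to be a crude estimator, and the function name and toy degrees are hypothetical.

```python
import math
from collections import Counter

def powerlaw_exponent(degrees):
    """Least-squares slope of log P(k) against log k, returned as psi = -slope."""
    counts = Counter(d for d in degrees if d > 0)
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Hypothetical degrees whose counts fall off exactly as k^-2.
toy_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
psi = powerlaw_exponent(toy_degrees)
```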

Page 23: Ekaterina Vylomova/Brown Bag seminar presentation

Scale-free networks

Other examples

Similar results were obtained for Roget's thesaurus (Roget, 1911), WordNet and associative networks (Steyvers and Tenenbaum, 2005).

Page 24: Ekaterina Vylomova/Brown Bag seminar presentation

Three Models of Associative Network

Concept-based model

Vector-based models

Multidimensional scaling (Torgerson, 1958)
Latent Semantic Analysis (Landauer, Dumais, 1997)

Page 25: Ekaterina Vylomova/Brown Bag seminar presentation

Data

Core of the network: 4,692 lemmas with 59,392 connections
The structure is similar to the associative network (nodes are lemmas, edges are associations)

Activity accumulation
1. Initial state: random activity
2. Spreading of activation: $S_i^t = S_i^{t-1} + \sum_j w_{ij} S_j^{t-1}$, where $S_i^t$ is the activity of neuron i at the moment t.
3. If the activation exceeds the threshold, the neuron produces the reaction and its activity is reset: $S_i^t = 0$.
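The three steps above can be sketched as one synchronous update in Python. This is a minimal illustration, not the implementation used in the talk: the nested-dictionary layout `weights[i][j]`, the threshold value and the fixed start state (used here instead of random activity so the result is reproducible) are all assumptions.

```python
def spread_step(state, weights, threshold):
    """One synchronous update of S_i^t = S_i^{t-1} + sum_j w_ij * S_j^{t-1}.

    Nodes whose activity exceeds the threshold are reported as fired
    and reset to zero, as in step 3.
    """
    new_state, fired = {}, []
    for i, s_i in state.items():
        incoming = sum(w * state.get(j, 0.0) for j, w in weights.get(i, {}).items())
        activity = s_i + incoming
        if activity > threshold:
            fired.append(i)
            activity = 0.0
        new_state[i] = activity
    return new_state, fired

# Hypothetical weights and start state.
weights = {"dog": {"cat": 0.5}, "mouse": {"cat": 0.25}, "cat": {}}
state = {"cat": 1.0, "dog": 0.25, "mouse": 0.0}
state, fired = spread_step(state, weights, threshold=0.9)
```

Here "cat" starts above the threshold, so it fires and is reset, while its activity is passed on to "dog" and "mouse" in the same step.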

Page 26: Ekaterina Vylomova/Brown Bag seminar presentation

Pros and cons

Pros

very simple model

easy to understand

easy to modify (no need to re-estimate the model)

Cons
unclear how to choose the threshold value (we ran a series of experiments to find the optimal value)
once activation is released, should the activity of neighbouring neurons also be modified?

Page 27: Ekaterina Vylomova/Brown Bag seminar presentation

Multidimensional scaling

From concept to vector

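Distance matrix:

$$\Delta = \begin{pmatrix} \delta_{1,1} & \delta_{1,2} & \cdots & \delta_{1,I} \\ \delta_{2,1} & \delta_{2,2} & \cdots & \delta_{2,I} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{I,1} & \delta_{I,2} & \cdots & \delta_{I,I} \end{pmatrix}$$

where I is the number of objects (words).

Our goal is to find vectors $x_1, \dots, x_I \in \mathbb{R}^N$ such that $\lVert x_i - x_j \rVert \approx \delta_{ij}$ for all $i, j \in \{1, \dots, I\}$.

In other words:

$$\min_{x_1, \dots, x_I} \sum_{i < j} \left( \lVert x_i - x_j \rVert - \delta_{ij} \right)^2$$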
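The objective being minimised can be evaluated directly for any candidate set of vectors. The Python sketch below only computes this objective (often called stress); it does not perform the minimisation, and the function name and the toy distance matrix are hypothetical.

```python
import math

def stress(points, delta):
    """Sum over pairs i < j of (||x_i - x_j|| - delta_ij)^2."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += (math.dist(points[i], points[j]) - delta[i][j]) ** 2
    return total

# Hypothetical distances of a 3-4-5 right triangle.
delta = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]      # reproduces delta exactly
collapsed = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]  # third point misplaced
```

An embedding that reproduces every distance has zero stress, while the misplaced point contributes (0 - 4)^2 + (3 - 5)^2 = 20. Note that `math.dist` requires Python 3.8 or later.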

Page 28: Ekaterina Vylomova/Brown Bag seminar presentation


Latent Semantic Analysis

From concept to vector-2

A technique for analysing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.

In my case
Terms are lemmas; a document is the set of associations for a given stimulus.

Inputs: term-document matrix with TF*IDF values
Term frequency: $TF = w_{ij} = \frac{freq_{ij}}{\sum_{j=1}^{N} freq_{ij}}$
Inverse document frequency: $IDF = \log \frac{|S|}{|\{s \in S : r \in s\}|}$, where |S| is the total number of stimuli.

Singular Value Decomposition => vector representations.
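The TF*IDF weighting can be sketched in plain Python, treating each stimulus as a document and each reaction as a term. The nested-dictionary layout, the function name and the toy frequencies are assumptions for illustration; the SVD step is not shown.

```python
import math

def tfidf_matrix(associations):
    """Map {stimulus: {reaction: frequency}} to {stimulus: {reaction: tf * idf}}."""
    n_stimuli = len(associations)
    # Document frequency: in how many stimuli does each reaction occur?
    doc_freq = {}
    for reactions in associations.values():
        for r in reactions:
            doc_freq[r] = doc_freq.get(r, 0) + 1
    matrix = {}
    for s, reactions in associations.items():
        total = sum(reactions.values())
        matrix[s] = {r: (f / total) * math.log(n_stimuli / doc_freq[r])
                     for r, f in reactions.items()}
    return matrix

# Hypothetical toy frequencies, not taken from RAT.
toy = {"cat": {"dog": 6, "mouse": 4}, "bone": {"dog": 5, "skeleton": 5}}
m = tfidf_matrix(toy)
```

A reaction that occurs for every stimulus ("dog" here) gets IDF = log(1) = 0, so it carries no weight. The resulting matrix could then be passed to an SVD routine, for example `numpy.linalg.svd`, to obtain the vector representations.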

Page 29: Ekaterina Vylomova/Brown Bag seminar presentation

Clustering

k-means

So, we've got vectors. What's next?

Let's evaluate similarity:

First, set a distance metric, e.g. $d_{ij} = \sqrt[r]{\sum_{k=1}^{N} |x_{ik} - x_{jk}|^r}$

And use it with k-means clustering:

$$\min \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2,$$

where k is the number of clusters, $S_i$ are the resulting clusters, and $\mu_i$ are the centers of the clusters.

So, the technique is based on finding the nearest cluster center.
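A minimal sketch of this procedure in Python, using the distance metric above with a configurable r. It is an illustration under assumptions: the initial centers are passed in explicitly (in practice they are usually chosen at random), the number of iterations is fixed, and all names and points are hypothetical. The mean update is the exact minimiser only for r = 2; for other values of r it is a heuristic.

```python
def minkowski(a, b, r=2):
    """d = (sum_k |a_k - b_k|^r)^(1/r)."""
    return sum(abs(x - y) ** r for x, y in zip(a, b)) ** (1.0 / r)

def kmeans(points, centers, iterations=10, r=2):
    """Lloyd's algorithm: assign each point to the nearest center, then recompute means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: minkowski(p, centers[c], r))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```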

Page 30: Ekaterina Vylomova/Brown Bag seminar presentation

Clustering

Page 31: Ekaterina Vylomova/Brown Bag seminar presentation

Pros and cons

Pros

easy to operate on vectors: add, multiply, subtract, etc.
possible to set the preferred dimensionality and visualize

Cons
problem with storage: the matrices are huge
complexity: MDS and LSA are based on SVD, which takes $O(n^3)$ time
choosing the optimal number of clusters and dimensionality

Page 32: Ekaterina Vylomova/Brown Bag seminar presentation

"Tip of the tongue" application

Data & Method
Data: RAT + Abramov's synonym dictionary
Method: LSA + k-means
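Once words are represented as vectors, a tip-of-the-tongue query could be answered by listing the words closest to a cue. The sketch below is a hypothetical illustration rather than the method used in the talk: cosine distance, the optional first-letter filter, the function names and the toy vectors are all assumptions.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def candidates(query_vec, vectors, k=3, first_letter=None):
    """Return the k words nearest to the query, optionally filtered by first letter."""
    pool = [(cosine_distance(query_vec, v), w) for w, v in vectors.items()
            if first_letter is None or w.startswith(first_letter)]
    return [w for _d, w in sorted(pool)[:k]]

# Hypothetical 2-D word vectors; real LSA vectors would have many more dimensions.
vectors = {"soup": (1.0, 0.0), "borscht": (1.0, 0.1),
           "bread": (0.9, 0.5), "car": (0.0, 1.0)}
nearest = candidates((1.0, 0.0), vectors, k=2)
nearest_b = candidates((1.0, 0.0), vectors, k=2, first_letter="b")
```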

Page 33: Ekaterina Vylomova/Brown Bag seminar presentation

"Tip of the tongue" application

Usage of associative thesauri for solving tasks related to the "tip of the tongue" phenomenon
Ekaterina Vylomova
Bauman Moscow State Technical University, Moscow State University of Printing Arts
RHF #12-04-12039B
E-mail: [email protected]

INTRODUCTION

The tip-of-the-tongue (TOT) phenomenon is the failure to retrieve a word from memory, combined with partial recall and the feeling that retrieval is imminent. People in a tip-of-the-tongue state can often recall one or more features of the target word, such as the first letter, its syllabic stress, and words similar in sound and/or meaning.
TOT appears to be universal (Brennen et al. 2007)
An occasional tip-of-the-tongue state is normal for people of all ages
TOT becomes more frequent as people age.

R. Brown, D. McNeill and A. Luria consider the processes of recalling and naming words as a probabilistic choice of a word from a chain of involuntary associations, and relate them to the construction of human semantic memory.

"Hmm... What's the name of that Ukrainian food?"

DATA

Abramov. Dictionary of Russian synonyms and similar expressions, 1890-1999
19,297 words & phrases
18,136 synonym articles

Karaulov Y., Tarasov E., Sorokin Y., Ufimtseva N., Cherkasova G. 1999. Associative thesaurus of modern Russian language. RAS, Moscow.
56,540 associative pairs
50,923 associative pairs (after lemmatization)
26,803 lemmas

Overall (synonym and associative pairs combined together): 316,018

METHODOLOGY

[Diagram: RAT and Abramov dict. -> Lemmatization -> RAT & Abramov lemmas -> LSA & k-NN]

Lemmatization using the Yandex mystem stemmer
Apply Latent Semantic Analysis to get vector representations of words, and k-nearest neighbours for clustering
Result: clusters containing words similar in meaning and association

EXAMPLE

Associative thesauri + Abramov dictionary: Комильфо (comme il faut) - приличие (propriety)

After clustering:
не выходить из пределов благопристойности (to stay within the bounds of decency) 0.001
степенный (staid) 0.591
чинный (decorous) 0.591
благочинный (orderly) 0.646
бонтонный (well-mannered, bon ton) 0.646
комильфотный (comme il faut, proper) 0.646
пристойный (seemly) 0.646
благонравный (well-behaved) 0.684
благоприличный (respectable) 0.684
благопристойный (decent) 0.684
корректный (correct, proper) 0.684

FUTURE PLANS

1. Expand synonym and associative thesauri with new ones
2. Add first letter filtering (see above)
3. Add hyponyms and hypernyms

REFERENCES

1. Brown, R., and McNeill, D. (1966). The "tip of the tongue" phenomenon. Journal of Verbal Learning and Verbal Behavior 5, 325-337.
2. Karaulov Yu.N., Tarasov E.F., Sorokin Yu.A., Ufimtseva N.V., Cherkasova G.A. (1999). Associative thesaurus of the modern Russian language. RAS. (in Russian)
3. Luria A.R. (1979). Language and Consciousness. Ed. Khomskaya E.D., MSU, Moscow, 320 pp. (in Russian)

Page 34: Ekaterina Vylomova/Brown Bag seminar presentation

RAT'10

Time frame: 2009-2010, 20 years after the first experiment
Location: different regions of Russia.
Stimuli: the 1,000 most frequent words in the Russian language.
Participants: young people aged 17-25.

Page 35: Ekaterina Vylomova/Brown Bag seminar presentation

Thank you!

Questions?
