Multi-Relational Latent Semantic Analysis

Post on 21-Feb-2016


Transcript of Multi-Relational Latent Semantic Analysis

Multi-Relational Latent Semantic Analysis

Kai-Wei Chang, joint work with Scott Wen-tau Yih and Chris Meek (Microsoft Research)

Natural Language Understanding
Build an intelligent system that can interact with humans using natural language.

Research challenges:
- Meaning representation of text
- Support for useful inferential tasks

Semantic word representation is the foundation

Language is compositional; the word is the basic semantic unit.

Continuous Semantic Representations

A lot of popular methods for creating word vectors!

- Vector Space Model [Salton & McGill 83]
- Latent Semantic Analysis [Deerwester+ 90]
- Latent Dirichlet Allocation [Blei+ 01]
- Deep Neural Networks [Collobert & Weston 08]

These methods encode term co-occurrence information and measure semantic similarity well.
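As a minimal illustration of measuring similarity from co-occurrence information, cosine similarity can be computed over toy co-occurrence vectors (the counts below are invented, purely to show the computation):

```python
import numpy as np

# Toy co-occurrence counts for three words over four context words
# (invented numbers, just to illustrate the similarity computation).
vec = {
    "sunny": np.array([8.0, 7.0, 1.0, 0.0]),
    "rainy": np.array([7.0, 8.0, 2.0, 0.0]),
    "car":   np.array([0.0, 1.0, 9.0, 6.0]),
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vec["sunny"], vec["rainy"]))  # high: related weather words
print(cosine(vec["sunny"], vec["car"]))    # low: unrelated words
```

Words appearing in similar contexts get similar vectors, hence high cosine scores.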

Continuous Semantic Representations

[Illustration: words cluster by meaning in the vector space, e.g., {sunny, rainy, windy, cloudy}, {car, wheel, cab}, {sad, joy, emotion, feeling}.]

Semantics Needs More Than Similarity
“Tomorrow will be rainy.” / “Tomorrow will be sunny.”

Are rainy and sunny similar, or opposite? Similarity alone cannot tell.

Leverage Linguistic Resources
Can’t we just use the existing linguistic resources?

- Knowledge in these resources is never complete
- They often lack the degree of a relation

Create a continuous semantic representation that
- leverages existing rich linguistic resources,
- discovers new relations,
- enables us to measure the degree of multiple relations (not just similarity).

Roadmap
- Introduction
- Background
  - Latent Semantic Analysis (LSA)
  - Polarity Inducing LSA (PILSA)
- Multi-Relational Latent Semantic Analysis (MRLSA)
  - Encoding multi-relational data in a tensor
  - Tensor decomposition & measuring degree of a relation
- Experiments
- Conclusions


Latent Semantic Analysis [Deerwester+ 1990]

- Data representation: encode single-relational data in a matrix
  - Co-occurrence (e.g., from a general corpus)
  - Synonyms (e.g., from a thesaurus)
- Factorization: apply SVD to the matrix to find latent components
- Measuring degree of relation: cosine of latent vectors


Encode Synonyms in a Matrix
Input: synonym groups from a thesaurus, e.g.
- Joyfulness: joy, gladden
- Sad: sorrow, sadden

Each target word (group) is a row vector; each term is a column vector:

                        joy  gladden  sorrow  sadden  goodwill
Group 1 “joyfulness”:    1      1       0       0        0
Group 2 “sad”:           0      0       1       1        0
Group 3 “affection”:     0      0       0       0        1

SVD generalizes the original data:
- Uncovers relationships not explicit in the thesaurus
- Term vectors are projected to a k-dimensional latent space
- Word similarity: cosine of the two corresponding column vectors in the latent space

Mapping to Latent Space via SVD

W ≈ U Σ Vᵀ

where W is the d×n target-word-by-term matrix, U is d×k, Σ is k×k, and Vᵀ is k×n.
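A minimal NumPy sketch of this pipeline, using the toy synonym matrix from the slide above and an illustrative rank k = 2 (both choices are for illustration only):

```python
import numpy as np

# Toy target-word x term matrix from the thesaurus example:
# rows = groups (joyfulness, sad, affection),
# columns = terms (joy, gladden, sorrow, sadden, goodwill).
W = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float)

# Truncated SVD: W ~= U_k Sigma_k V_k^T with k latent dimensions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
term_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim latent vector per term

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms in the same synonym group share a column pattern,
# so their latent vectors coincide.
print(cosine(term_vecs[0], term_vecs[1]))  # joy vs. gladden -> 1.0
print(cosine(term_vecs[0], term_vecs[2]))  # joy vs. sorrow  -> 0.0
```

On real data, the truncation (k much smaller than n) is what uncovers relations not explicit in the thesaurus.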

Problem: Handling Two Opposite Relations (Synonyms & Antonyms)
LSA cannot distinguish antonyms [Landauer 2002].
“Distinguishing synonyms and antonyms is still perceived as a difficult open problem.” [Poon & Domingos 09]

Polarity Inducing LSA [Yih, Zweig, Platt 2012]

- Data representation: encode two opposite relations in a matrix using “polarity”
  - Synonyms & antonyms (e.g., from a thesaurus)
- Factorization: apply SVD to the matrix to find latent components
- Measuring degree of relation: cosine of latent vectors

Encode Synonyms & Antonyms in a Matrix
Inducing polarity: in each row, synonyms of the group get +1 and antonyms get −1, e.g.
- Joyfulness: joy, gladden; antonyms: sorrow, sadden
- Sad: sorrow, sadden; antonyms: joy, gladden

                        joy  gladden  sorrow  sadden  goodwill
Group 1 “joyfulness”:    1      1      −1      −1        0
Group 2 “sad”:          −1     −1       1       1        0
Group 3 “affection”:     0      0       0       0        1

Target word: row vector; term: column vector. The cosine score between a synonym pair is positive; between an antonym pair it is negative.

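A quick check of the polarity trick on the toy matrix above (raw column vectors only; the full method would additionally apply SVD):

```python
import numpy as np

# Polarity-inducing matrix: synonyms of the group get +1, antonyms get -1.
# rows = groups (joyfulness, sad, affection),
# columns = terms (joy, gladden, sorrow, sadden, goodwill).
W = np.array([
    [ 1,  1, -1, -1, 0],
    [-1, -1,  1,  1, 0],
    [ 0,  0,  0,  0, 1],
], dtype=float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Column vectors: synonyms stay positive, antonyms turn negative.
joy, gladden, sorrow, sadden, goodwill = W.T
print(cosine(joy, gladden))  # 1.0  (synonyms)
print(cosine(joy, sadden))   # -1.0 (antonyms)
```

The sign of the cosine now distinguishes synonyms from antonyms, which plain LSA cannot do.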

Problem: How to Handle More Relations?
Limitation of the matrix representation: each entry captures a particular type of relation between two entities, or two opposite relations with the polarity trick. We also want to encode other binary relations:
- Is-A (hyponym): an ostrich is a bird
- Part-whole: an engine is a part of a car

Solution: encode multiple relations in a 3-way tensor (a 3-dimensional array)!

Multi-Relational LSA

- Data representation: encode multiple relations in a tensor
  - Synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
- Factorization: apply tensor decomposition to the tensor to find latent components
- Measuring degree of relation: cosine of latent vectors after projection


Encode Multiple Relations in a Tensor
Represent word relations using a tensor: each slice encodes one relation between terms and target words. Construct a tensor with two slices (rows: joyfulness, gladden, sad, anger; columns: joy, gladden, sadden, feeling):

Synonym layer:
1 1 0 0
1 1 0 0
0 0 1 0
0 0 0 0

Antonym layer:
0 0 0 0
0 0 1 0
1 0 0 0
0 0 0 0

Further relations can be encoded as additional slices of the same tensor, e.g. a hyponym (is-a) layer (rows and columns as above):

0 0 0 1
0 0 0 0
0 0 0 1
0 0 0 1
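The slice-per-relation construction above can be sketched directly with a 3-dimensional array (the tiny vocabulary comes from the toy slides; a real model would use a thesaurus-scale vocabulary):

```python
import numpy as np

words = ["joyfulness", "gladden", "sad", "anger"]  # target words (rows)
terms = ["joy", "gladden", "sadden", "feeling"]    # terms (columns)
relations = ["syn", "ant", "hyper"]                # one slice per relation

# 3-way tensor: W[i, j, k] = 1 iff relation k holds between
# target word i and term j (values from the toy slides above).
W = np.zeros((len(words), len(terms), len(relations)))
W[:, :, 0] = [[1, 1, 0, 0],   # synonym slice
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0]]
W[:, :, 1] = [[0, 0, 0, 0],   # antonym slice
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]]
W[:, :, 2] = [[0, 0, 0, 1],   # is-a slice
              [0, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]]

# e.g. "sad"/"sadden" are synonyms; "joyfulness" is-a "feeling".
print(W[words.index("sad"), terms.index("sadden"), relations.index("syn")])
print(W[words.index("joyfulness"), terms.index("feeling"), relations.index("hyper")])
```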


Tensor Decomposition – Analogy to SVD
Derive a low-rank approximation to generalize the data and to discover unseen relations: apply Tucker decomposition and reformulate the results.

The d × n × m tensor (target words w₁, …, w_d × terms t₁, …, t_n × relations R₁, …, R_m) is approximated by an r × r × r core tensor multiplied along each mode by a factor matrix:
- the word-mode factor gives the latent representation of words,
- the relation-mode factor gives the latent representation of a relation.
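As a minimal sketch of a Tucker-style decomposition, the higher-order SVD (HOSVD) below builds factor matrices from the SVD of each unfolding; this is a simplification, and MRLSA's actual decomposition reformulates Tucker differently. The random tensor is a stand-in for the word × term × relation tensor:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    # Multiply tensor T by matrix M along the given mode.
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    # HOSVD: one factor matrix per mode from the SVD of that unfolding,
    # then project T onto the factors to obtain the core tensor.
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = mode_multiply(core, U.T, mode)
    return core, factors

# Small random stand-in for the 4 x 4 x 3 relation tensor, rank 2 per mode.
rng = np.random.default_rng(0)
T = rng.random((4, 4, 3))
core, (U, V, R) = hosvd(T, ranks=(2, 2, 2))

# Reconstruct the low-rank approximation: core x_1 U x_2 V x_3 R.
approx = core
for mode, F in enumerate((U, V, R)):
    approx = mode_multiply(approx, F, mode)
print(core.shape, approx.shape)  # (2, 2, 2) (4, 4, 3)
```

Here V plays the role of the latent representation of terms and R that of relations.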


Measure Degree of Relation
- Similarity: cosine of the latent vectors
- Any other relation (both symmetric and asymmetric):
  - take the latent matrix of the pivot relation (synonym),
  - take the latent matrix of the target relation,
  - compute the cosine of the latent vectors after projection.

Measure Degree of Relation – Raw Representation

ant(joy, sadden) = cos(𝒲₍:, joy, syn₎, 𝒲₍:, sadden, ant₎)

i.e., the cosine of the joy column taken from the synonym layer and the sadden column taken from the antonym layer (layers as shown above).


Likewise for the is-a relation:

Hyper(joy, feeling) = cos(𝒲₍:, joy, syn₎, 𝒲₍:, feeling, hyper₎)

with the joy column taken from the synonym layer and the feeling column from the hypernym (is-a) layer shown above.

In general, for a relation rel between words wᵢ and wⱼ:

rel(wᵢ, wⱼ) = cos(𝒲₍:, wᵢ, syn₎, 𝒲₍:, wⱼ, rel₎)

where the first vector comes from the synonym (pivot) slice and the second from the slice of the specific relation.

Measure Degree of Relation – Latent Representation
After the decomposition, the same score is computed from the latent factors:

rel(wᵢ, wⱼ) = cos(S₍:, :, syn₎ Vᵢ,:ᵀ, S₍:, :, rel₎ Vⱼ,:ᵀ)

where Vᵢ,: is the latent vector of term wᵢ, and S₍:, :, syn₎, S₍:, :, rel₎ are the core-tensor slices of the pivot (synonym) relation and of the relation rel.
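The projected-cosine scoring can be sketched as follows, assuming a core tensor S and latent term matrix V produced by the decomposition (random stand-ins here, so the printed score itself carries no meaning):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def relation_score(S, V, rel_index, syn_index, i, j):
    """Degree of relation `rel` between terms i and j: cosine of the
    latent term vectors after projecting term i through the pivot
    (synonym) core slice and term j through the relation's core slice."""
    a = S[:, :, syn_index] @ V[i]  # pivot (synonym) projection of term i
    b = S[:, :, rel_index] @ V[j]  # relation projection of term j
    return cosine(a, b)

# Toy latent factors standing in for the decomposition output:
rng = np.random.default_rng(1)
S = rng.random((3, 3, 2))  # core tensor; slice 0 = syn, slice 1 = ant
V = rng.random((4, 3))     # one 3-dim latent vector per term
score = relation_score(S, V, rel_index=1, syn_index=0, i=0, j=2)
print(-1.0 <= score <= 1.0)  # True: a valid cosine score
```

Asymmetric relations are handled naturally: swapping i and j projects through different slices and generally yields a different score.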

Roadmap
- Introduction
- Background
  - Latent Semantic Analysis (LSA)
  - Polarity Inducing LSA (PILSA)
- Multi-Relational Latent Semantic Analysis (MRLSA)
  - Encoding multi-relational data in a tensor
  - Tensor decomposition & measuring degree of a relation
- Experiments
- Conclusions

Experiment: Data for Building the MRLSA Model
- Encarta Thesaurus: records synonyms and antonyms of target words; vocabulary of 50k terms and 47k target words
- WordNet: has synonym, antonym, hyponym, and hypernym relations; vocabulary of 149k terms and 117k target words

Goals:
- Show that MRLSA generalizes LSA to model multiple relations
- Improve performance by combining heterogeneous data

Example Antonyms Output by MRLSA

Target      High-scoring words
inanimate   alive, living, bodily, in-the-flesh, incarnate
alleviate   exacerbate, make-worse, in-flame, amplify, stir-up
relish      detest, abhor, abominate, despise, loathe

* In the original slides, antonyms listed in the Encarta thesaurus were shown in blue.

Results – GRE Antonym Test
Task: GRE closest-opposite questions, e.g.
Which is the closest opposite of adulterate?
(a) renounce (b) forbid (c) purify (d) criticize (e) correct

[Bar chart: accuracy of the four compared systems — 0.64, 0.56, 0.74, 0.77.]

Example Hyponyms Output by MRLSA

Target      High-scoring words
bird        ostrich, gamecock, nighthawk, amazon, parrot
automobile  minivan, wagon, taxi, minicab, gypsy cab
vegetable   buttercrunch, yellow turnip, romaine, chipotle, chilli

Results – Relational Similarity (SemEval-2012)
Task: Class-Inclusion relation (“is a kind of”) — pick the most/least illustrative word pairs, e.g.
(a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown

[Bar chart: accuracy of the three compared systems — 0.34, 0.37, 0.56.]

Conclusions
A continuous semantic representation that
- leverages existing rich linguistic resources,
- discovers new relations,
- enables us to measure the degree of multiple relations.

Approaches: better data representation; matrix/tensor decomposition.

Challenges & future work: capture more types of knowledge in the model; support more sophisticated inferential tasks.