Transcript of: Bilingual Word Embeddings: Supervision, Modeling, Evaluation, Application (47 slides)

Page 1:

Bilingual Word Embeddings: Supervision, Modeling, Evaluation, Application

Ivan Vulić

University of Cambridge

South England NLP Meetup; London; September 22, 2016

[email protected]

1 / 47

Page 2:

Applications: Embeddings are Everywhere (Monolingual)

Intrinsic (semantic) tasks:

Semantic similarity and synonym detection, word analogies
Concept categorization

Selectional preferences, language grounding

Extrinsic (downstream) tasks:

Named entity recognition, part-of-speech tagging, chunking
Semantic role labeling, dependency parsing

Sentiment analysis

Beyond NLP:

Information retrieval, Web search

Question answering, dialogue systems

2 / 47

Page 3:

Applications: Embeddings are Everywhere (Bilingual)

Intrinsic (semantic) tasks:

Cross-lingual and multilingual semantic similarity

Bilingual lexicon learning, multi-modal representations

Extrinsic (downstream) tasks:

Cross-lingual SRL, POS tagging, NER
Cross-lingual dependency parsing and sentiment analysis
Cross-lingual annotation and model transfer

Statistical/Neural machine translation

Beyond NLP:

(Cross-Lingual) Information retrieval

(Cross-Lingual) Question answering

3 / 47

Page 4:

Word Embeddings

Dense representations → real-valued low-dimensional vectors

Word embedding induction → learn word-level features which generalise well across tasks and languages

Word embeddings capture interesting and universal regularities:

4 / 47

Page 5:

Word Embeddings

Dense representations → real-valued low-dimensional vectors

Word embedding induction → learn word-level features which generalise well across tasks and languages

Word embeddings capture interesting and universal regularities:

5 / 47

Page 6:

Motivation

The NLP community has developed useful features for several tasks, but finding features that are...

1. task-invariant (POS tagging, SRL, NER, parsing, ...)

(monolingual word embeddings)

2. language-invariant (English, Dutch, Chinese, Spanish, ...)

(bilingual word embeddings → this talk)

...is non-trivial and time-consuming (20+ years of feature engineering...)

6 / 47

Page 7:

Motivation

The NLP community has developed useful features for several tasks, but finding features that are...

1. task-invariant (POS tagging, SRL, NER, parsing, ...)

(monolingual word embeddings)

2. language-invariant (English, Dutch, Chinese, Spanish, ...)

(bilingual word embeddings → this talk)

...is non-trivial and time-consuming (20+ years of feature engineering...)

Learn word-level features which generalise across tasks and languages

7 / 47

Page 8:

Learning Word Representations

Key idea

Distributional hypothesis → words with similar meanings are likely to appear in similar contexts

[Harris, Word 1954]

shouts:

"Meaning as use!"

calmly states:

"You shall know a word by the company it keeps."

8 / 47

Page 9:

Word Embeddings

9 / 47

Page 10:

Word Embedding Models

Models differ based on:

1. How they compute the context: bag-of-words, positional, dependency-based, ...

2. How they map the context to the target, $w_t = f(con)$: linear, bilinear, non-linear, ...

3. How they measure the loss between $w_t$ and $f(con)$ and how they are trained: negative sampling, sampled rank loss, squared error, ...

10 / 47

Page 11:

Word Embeddings

Similar contexts → "similar" embeddings?

For a fixed context, all distributionally similar words will/should get updated towards a common point

11 / 47

Page 12:

Monolingual...

Skip-gram with negative sampling (SGNS) [Mikolov et al., NIPS 2013]

12 / 47

Page 13:

Monolingual...

Skip-gram with negative sampling (SGNS) [Mikolov et al., NIPS 2013]

13 / 47

Page 14:

Still Monolingual...

Skip-gram with negative sampling (SGNS) [Mikolov et al., NIPS 2013]

Learning from the set D of (word, context) pairs observed in a corpus:

$(w, v) = (w(t), w(t \pm i))$, $i = 1, \ldots, cs$; $cs$ = context window size

SG learns to predict the context of the pivot word

John saw a cute gray huhblub running in the field.

D = {(huhblub, cute), (huhblub, gray), (huhblub, running), (huhblub, in)}

vec(huhblub) = [−0.23, 0.44,−0.76, 0.33, 0.19, . . .]
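As a hedged illustration (not the original word2vec code), extracting the set D of (word, context) pairs for a given window size might look as follows; the helper name context_pairs is an assumption:

```python
# Minimal sketch of skip-gram (word, context) pair extraction.
# Illustrative only; real word2vec also subsamples frequent words.

def context_pairs(tokens, cs=2):
    """Yield (pivot, context) pairs within a window of size cs."""
    pairs = []
    for t, pivot in enumerate(tokens):
        for i in range(1, cs + 1):
            if t - i >= 0:
                pairs.append((pivot, tokens[t - i]))
            if t + i < len(tokens):
                pairs.append((pivot, tokens[t + i]))
    return pairs

sentence = "John saw a cute gray huhblub running in the field".split()
D = [p for p in context_pairs(sentence, cs=2) if p[0] == "huhblub"]
# -> [('huhblub', 'gray'), ('huhblub', 'running'),
#     ('huhblub', 'cute'), ('huhblub', 'in')]
```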

14 / 47

Page 15:

Still Monolingual...

Representation of each word w ∈ V :

$vec(w) = [f_1, f_2, \ldots, f_{dim}]$

Word representations in the same shared semantic (or embedding) space!

Image courtesy of [Gouws et al., ICML 2015]

15 / 47

Page 16:

Bilingual Word Embeddings (BWEs)

Representation of a word $w_1^S \in V^S$:

$vec(w_1^S) = [f_1^1, f_2^1, \ldots, f_{dim}^1]$

Exactly the same representation for $w_2^T \in V^T$:

$vec(w_2^T) = [f_1^2, f_2^2, \ldots, f_{dim}^2]$

Language-independent word representations in the same shared semantic (or embedding) space

16 / 47

Page 17:

Bilingual Word Embeddings

Monolingual vs. Bilingual

Q1 → How to align semantic spaces in two different languages?

Q2 → Which bilingual signals are used for the alignment?

See also:

[Upadhyay et al.: Cross-Lingual Models of Word Embeddings: An Empirical Comparison; ACL 2016]

17 / 47

Page 18:

Bilingual Signals?

(a) Word alignments?

(b) Sentence alignments?

(c) Word translation pairs?

(d) Document alignments?

18 / 47

Page 19:

Bilingual Word Embeddings

Two desirable properties:

P1 → Leverage (large) monolingual training sets tied together through a bilingual signal

P2 → Use as inexpensive a bilingual signal as possible, in order to learn a shared space in a scalable and widely applicable manner across languages and domains

19 / 47

Page 20:

BWEs and Bilingual Signals

(Type 1) Jointly learn and align BWEs using parallel-only data

[Hermann and Blunsom, ACL 2014; Chandar et al., NIPS 2014]

(Type 2) Jointly learn and align BWEs using monolingual and parallel data

[Gouws et al., ICML 2015; Soyer et al., ICLR 2015; Shi et al., ACL 2015]

(Type 3) Learn BWEs from comparable document-aligned data
[Vulić and Moens, ACL 2015; JAIR 2016]

(Type 4) Align pretrained monolingual embedding spaces using seed lexicons
[Mikolov et al., arXiv 2013; Lazaridou et al., ACL 2015]

20 / 47

Page 21:

BWEs and Bilingual Signals

(Type 1) Jointly learn and align BWEs using parallel-only data

[Hermann and Blunsom, ACL 2014; Chandar et al., NIPS 2014]

(Type 2) Jointly learn and align BWEs using monolingual and parallel data

[Gouws et al., ICML 2015; Soyer et al., ICLR 2015; Shi et al., ACL 2015]

(Type 3) Learn BWEs from comparable document-aligned data
[Vulić and Moens, ACL 2015; JAIR 2016]

(Type 4) Align pretrained monolingual embedding spaces using seed lexicons
[Mikolov et al., arXiv 2013; Lazaridou et al., ACL 2015]

21 / 47

Page 22:

Type 1: Parallel-Only

22 / 47

Page 23:

Type 1: Example 1

[Hermann and Blunsom, ACL 2014]

$E_{bi}(a, b) = \|f(a) - g(b)\|^2$

$E_{hl}(a, b, n) = [m + E_{bi}(a, b) - E_{bi}(a, n)]_{+}$
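A minimal numpy sketch of these two energies, assuming f(a) and g(b) have already composed the two aligned sentences a and b into fixed-width vectors (the margin default m = 1.0 is an illustrative assumption):

```python
import numpy as np

def E_bi(a_vec, b_vec):
    """Alignment energy: squared L2 distance between sentence vectors."""
    return np.sum((a_vec - b_vec) ** 2)

def E_hl(a_vec, b_vec, n_vec, m=1.0):
    """Hinge loss: the aligned pair (a, b) should be at least margin m
    closer than the noise pair (a, n)."""
    return max(0.0, m + E_bi(a_vec, b_vec) - E_bi(a_vec, n_vec))
```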

23 / 47

Page 24:

Type 1: Example 2

[Luong et al., NAACL 2015]

→ Using hard word alignments

→ Effectively: training 4 SGNS models, L1 → L1, L1 → L2, L2 → L1, L2 → L2
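A rough sketch of how the four prediction directions could generate training pairs from one aligned sentence pair; representing hard alignment links as (source index, target index) tuples and the helper name biskip_pairs are assumptions, not the paper's actual code:

```python
def biskip_pairs(src, tgt, alignment, cs=2):
    """For each hard alignment link (i, j), the source word predicts
    its own context (L1 -> L1) and the target word predicts the source
    context (L2 -> L1); symmetrically on the target side."""
    pairs = []
    for i, j in alignment:
        for k in range(max(0, i - cs), min(len(src), i + cs + 1)):
            if k != i:
                pairs.append((src[i], src[k]))  # L1 -> L1
                pairs.append((tgt[j], src[k]))  # L2 -> L1
        for k in range(max(0, j - cs), min(len(tgt), j + cs + 1)):
            if k != j:
                pairs.append((tgt[j], tgt[k]))  # L2 -> L2
                pairs.append((src[i], tgt[k]))  # L1 -> L2
    return pairs
```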

24 / 47

Page 25:

BWEs and Bilingual Signals

(Type 1) Jointly learn and align BWEs using parallel-only data

[Hermann and Blunsom, ACL 2014; Chandar et al., NIPS 2014]

(Type 2) Jointly learn and align BWEs using monolingual and parallel data

[Gouws et al., ICML 2015; Soyer et al., ICLR 2015; Shi et al., ACL 2015]

(Type 3) Learn BWEs from comparable document-aligned data
[Vulić and Moens, ACL 2015; JAIR 2016]

(Type 4) Align pretrained monolingual embedding spaces using seed lexicons
[Mikolov et al., arXiv 2013; Lazaridou et al., ACL 2015]

25 / 47

Page 26:

Type 2: Joint Mono+CL

A (very) simple formulation:

$\gamma_1 \cdot Monolingual_S + \gamma_2 \cdot Monolingual_T + \delta \cdot Bilingual$
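In code, this objective is simply a weighted sum of three loss terms computed on minibatches from the respective corpora; a sketch with illustrative names and default weights:

```python
def joint_objective(mono_loss_s, mono_loss_t, bi_loss,
                    gamma1=1.0, gamma2=1.0, delta=1.0):
    """Type 2 objective: two monolingual loss terms (e.g. SGNS on each
    corpus) tied together by a cross-lingual term computed on parallel
    data. Weights and defaults here are illustrative assumptions."""
    return gamma1 * mono_loss_s + gamma2 * mono_loss_t + delta * bi_loss
```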

26 / 47

Page 27:

Type 2 Example: BilBOWA

[Gouws et al., ICML 2015]

27 / 47

Page 28:

Type 2 Example: BilBOWA

[Gouws et al., ICML 2015]

28 / 47

Page 29:

BWEs and Bilingual Signals

(Type 1) Jointly learn and align BWEs using parallel-only data

[Hermann and Blunsom, ACL 2014; Chandar et al., NIPS 2014]

(Type 2) Jointly learn and align BWEs using monolingual and parallel data

[Gouws et al., ICML 2015; Soyer et al., ICLR 2015; Shi et al., ACL 2015]

(Type 3) Learn BWEs from comparable document-aligned data
[Vulić and Moens, ACL 2015; JAIR 2016]

(Type 4) Align pretrained monolingual embedding spaces using seed lexicons
[Mikolov et al., arXiv 2013; Lazaridou et al., ACL 2015]

29 / 47

Page 30:

Type 3 Example

[Vulić and Moens, ACL 2015; JAIR 2016]

→ Length-Ratio Shuffle: training an SGNS model (or any other monolingual model!) on shuffled "pseudo-bilingual" documents
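A sketch that approximates the idea: merge each aligned document pair into one pseudo-bilingual token stream whose interleaving respects the length ratio (the exact shuffling procedure in the paper may differ):

```python
def length_ratio_merge(doc_s, doc_t):
    """Interleave two token lists so that tokens from each side appear
    in proportion to the documents' length ratio."""
    merged, i, j = [], 0, 0
    while i < len(doc_s) or j < len(doc_t):
        # Advance the side that is "behind" relative to its own length.
        if j >= len(doc_t) or (i < len(doc_s) and
                               i / len(doc_s) <= j / len(doc_t)):
            merged.append(doc_s[i]); i += 1
        else:
            merged.append(doc_t[j]); j += 1
    return merged

merged = length_ratio_merge("la reina habla".split(),
                            "the queen speaks today".split())
# A plain monolingual SGNS is then trained on such merged documents.
```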

30 / 47

Page 31:

Type 4 Example

Post-Hoc Mapping with Seed Lexicons

31 / 47

Page 32:

Type 4 Example

Post-Hoc Mapping with Seed Lexicons

Learn to transform the pre-trained source language embeddings into a space where the distance between a word and its translation pair is minimised

Bilingual signal → word translation pairs

32 / 47

Page 33:

Type 4 Example

Monolingual WE model on monolingual corpora

Bilingual signal → $N$ word translation pairs $(x_i, y_i)$, $i = 1, \ldots, N$

Transformation between spaces → we assume a linear mapping [Mikolov et al., arXiv 2013; Dinu et al., ICLR WS 2015]:

$\min_{W \in \mathbb{R}^{d_S \times d_T}} \|XW - Y\|_F^2 + \lambda \|W\|_F^2$

X → Source language vectors for words from a training set

Y → Target language vectors for words from a training set

W → "Translation" (or transformation) matrix
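This regularised least-squares problem has a closed-form solution; a minimal numpy sketch under that assumption (the function name and default λ are illustrative):

```python
import numpy as np

def fit_mapping(X, Y, lam=1e-2):
    """Closed-form ridge regression solution to
    min_W ||XW - Y||_F^2 + lam * ||W||_F^2.
    X: (N, d_S) source vectors; Y: (N, d_T) target vectors."""
    d_s = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d_s), X.T @ Y)

# Map the whole source space afterwards: translated = source_embs @ W
```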

33 / 47

Page 34:

Other Models/Upgrades

→ A Hybrid Model: Type 3 + Type 4 → a type-hybrid procedure which retains only highly reliable translation pairs obtained by a Type 3 model as the seed lexicon for Type 4 models. [Vulić and Korhonen, ACL 2016]

→ Bilingual or Multilingual? → Type 1 and Type 2 models encoding more than 2 languages in their objectives. [Coulmance et al., EMNLP 2015]

→ Beyond the word level? → Learning cross-lingual representations for phrases and sentences. [Soyer et al., ICLR 2015]

→ Sentence IDs and Document IDs as Bilingual Signals → Inverted indexing plus dimensionality reduction. [Søgaard et al., ACL 2015; Levy and Søgaard, arXiv 2016]

→ ...

34 / 47

Page 35:

Ranked Lists in Bilingual Spaces

Ranked lists for three language pairs: (1) monolingual neighbours, (2) cross-lingual neighbours, (3) combined.

Spanish-English (ES-EN), query: reina

  (1) Spanish   (2) English   (3) Combined
  rey           queen(+)      queen(+)
  trono         heir          rey
  monarca       throne        trono
  heredero      king          heir
  matrimonio    royal         throne
  hijo          reign         monarca
  reino         succession    heredero
  reinado       princess      king
  regencia      marriage      matrimonio
  duque         prince        royal

Italian-English (IT-EN), query: madre

  (1) Italian   (2) English   (3) Combined
  padre         mother(+)     mother(+)
  moglie        father        padre
  sorella       sister        moglie
  figlia        wife          father
  figlio        daughter      sorella
  fratello      son           figlia
  casa          friend        figlio
  amico         childhood     sister
  marito        family        fratello
  donna         cousin        wife

Dutch-English (NL-EN), query: schilder

  (1) Dutch           (2) English    (3) Combined
  kunstschilder       painter(+)     painter(+)
  schilderij          painting       kunstschilder
  kunstenaar          portrait       painting
  olieverf            artist         schilderij
  olieverfschilderij  canvas         kunstenaar
  schilderen          impressionist  portrait
  frans               cubism         olieverf
  nederlands          art            olieverfschilderij
  componist           poet           schilderen
  beeldhouwer         drawing        artist

vec(reina) − vec(woman) + vec(man) ≈ vec(rey)

vec(queen) − vec(mujer) + vec(hombre) ≈ vec(king)

vec(reina) − vec(mujer) + vec(hombre) ≈ vec(rey)
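A hedged sketch of this analogy arithmetic via nearest-neighbour search over the shared space (assuming a dict from words to numpy vectors; excluding the input words follows common practice):

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Nearest neighbour of vec(a) - vec(b) + vec(c) by cosine
    similarity, excluding the three input words."""
    target = vectors[a] - vectors[b] + vectors[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, v in vectors.items():
        if word in (a, b, c):
            continue
        sim = (v @ target) / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# In a shared ES-EN space: analogy("reina", "mujer", "hombre", vecs)
# should return "rey" (or "king").
```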

35 / 47

Page 36:

Example: Cross-Lingual IR

Beyond the level of words: we learn word embeddings:
vec(huhblub) = [−0.23, 0.44, −0.76, 0.33, 0.19, . . .]
vec(fluffy) = [0.31, 0.02, −0.11, −0.28, 0.52, . . .]

→ How to build document and query embeddings?
vec(huhblub is fluffy) = ??

Adapting the framework from compositional distributional semantics: [Mitchell and Lapata, ACL 2008; Socher et al., EMNLP 2011; Milajevs et al., EMNLP 2014] and many more...

A generic composition with a bag-of-words assumption ($d = \{w_1, w_2, \ldots, w_{|N_d|}\}$):

$\vec{d} = \vec{w}_1 \star \vec{w}_2 \star \ldots \star \vec{w}_{|N_d|}$

$\star$ = compositional vector operator (addition, multiplication, tensor product, ...)

36 / 47

Page 37:

Example: Cross-Lingual IR

Document and Query Embeddings

A general framework → in this work, the simple and effective additive composition: [Mitchell and Lapata, ACL 2008]

$\vec{d} = \vec{w}_1 + \vec{w}_2 + \ldots + \vec{w}_{|N_d|}$

The dim-dimensional document embedding in the same bilingual word embedding space:

$\vec{d} = [f_{d,1}, \ldots, f_{d,k}, \ldots, f_{d,dim}]$
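A minimal numpy sketch of additive composition (skipping out-of-vocabulary tokens is an assumption here, not prescribed by the slide):

```python
import numpy as np

def embed_document(tokens, word_vectors, dim):
    """Additive composition: sum the embeddings of in-vocabulary
    tokens; out-of-vocabulary tokens are simply skipped."""
    vecs = [word_vectors[w] for w in tokens if w in word_vectors]
    return np.sum(vecs, axis=0) if vecs else np.zeros(dim)
```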

37 / 47

Page 38:

Example: Cross-Lingual IR

Document and Query Embeddings

→ The same principles with queries

$\vec{Q} = \vec{q}_1 + \vec{q}_2 + \ldots + \vec{q}_m$

The dim-dimensional query embedding in the same bilingual word embedding space:

$\vec{Q} = [f_{Q,1}, \ldots, f_{Q,k}, \ldots, f_{Q,dim}]$

38 / 47

Page 39:

Example: Cross-Lingual IR I

Final (Simple) CLIR Model [Vulić and Moens, SIGIR 2015]

1. Induce a bilingual embedding space using any BWE model.

2. Given a target document collection $DC = \{d'_1, \ldots, d'_{N'}\}$, compute a dim-dimensional document embedding $\vec{d'}$ for each $d' \in DC$, using the dim-dimensional WEs obtained in the previous step and a semantic composition model.

3. After the query $Q = \{q_1, \ldots, q_m\}$ is issued in language $L_S$, compute a dim-dimensional query embedding.

39 / 47

Page 40:

Example: Cross-Lingual IR II

4. For each $d' \in DC$, compute the semantic similarity score $sim(d', Q)$, which quantifies each document's relevance to the query $Q$:

$sim(d', Q) = SF(d', Q) = \dfrac{\vec{d'} \cdot \vec{Q}}{|\vec{d'}| \cdot |\vec{Q}|}$

5. Rank all documents from DC according to their similarity scores from the previous step.
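Steps 4 and 5 together, as a minimal numpy sketch (the small epsilon guarding against zero-length vectors is an added assumption):

```python
import numpy as np

def rank_documents(q_vec, doc_vecs):
    """Rank documents by cosine similarity to the query embedding.
    doc_vecs: (N', dim) matrix of document embeddings."""
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    return np.argsort(-sims)  # document indices, most relevant first
```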

40 / 47

Page 41:

Example Experiment: BLL

Task → Bilingual lexicon learning (BLL)

Goal → to build a non-probabilistic bilingual lexicon of word translations

Test Sets → ground truth word translation pairs built for three language pairs: Spanish-English (ES-EN), Dutch-English (NL-EN), Italian-English (IT-EN)

[Vulić and Moens, NAACL 2013; EMNLP 2013] (similar relative performance on other BLL test sets)

Evaluation Metric → Top 1 accuracy (Acc1)

(Similar model rankings with Acc5 and Acc10)
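A minimal sketch of Acc1 over a test set of gold translation pairs (assuming L2-normalised embedding matrices; names are illustrative):

```python
import numpy as np

def acc1(src_vecs, tgt_vecs, gold_indices):
    """Top-1 accuracy: fraction of source words whose nearest
    target-language neighbour (cosine) is the gold translation.
    Rows of src_vecs / tgt_vecs are assumed L2-normalised."""
    preds = np.argmax(src_vecs @ tgt_vecs.T, axis=1)
    return float(np.mean(preds == np.asarray(gold_indices)))
```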

41 / 47

Page 42:

Example Experiment: BLL (5K seed lexicons)

Model             ES-EN   NL-EN   IT-EN
BiCVM (Type 1)    0.532   0.583   0.569
BilBOWA (Type 2)  0.632   0.636   0.647
BWESG (Type 3)    0.676   0.626   0.643
BNC+GT (Type 4)   0.677   0.641   0.646
ORTHO             0.233   0.506   0.224
BNC+HYB+ASYM      0.673   0.626   0.644
BNC+HYB+SYM       0.681   0.658*  0.663*  (3388; 2738; 3145)
HFQ+HYB+ASYM      0.673   0.596   0.635
HFQ+HYB+SYM       0.695*  0.657*  0.667*

42 / 47

Page 43:

Evaluation and Comparison?

1. Cross-lingual document classification (CLDC)

- Problematic? Word-level embeddings → document-level evaluation?
- Limited? Only for English-German; more language pairs required

2. Bilingual lexicon extraction

- Limited? Often ad-hoc and manually designed datasets.
- Problematic? Contradictory results.

3. Cross-lingual POS tagging

- Too coarse-grained? Can it capture subtle differences between the models?

...We need to find a better way of evaluating bilingual word embeddings.

43 / 47

Page 44:

Evaluation and Comparison?

We need better evaluation protocols for cross-lingual embeddings...

How can we evaluate:

- phrase-level cross-lingual representations?

- sentence-level cross-lingual representations?

- document-level cross-lingual representations?

44 / 47

Page 45:

Applications: Embeddings are Everywhere (Bilingual)

Intrinsic (semantic) tasks:

Cross-lingual and multilingual semantic similarity

Bilingual lexicon learning, multi-modal representations

Extrinsic (downstream) tasks:

Cross-lingual SRL, POS tagging, NER
Cross-lingual dependency parsing and sentiment analysis
Cross-lingual annotation and model transfer

Statistical/Neural machine translation

Beyond NLP:

(Cross-Lingual) Information retrieval

(Cross-Lingual) Question answering

45 / 47

Page 46:

Language Technology Lab @Cam

46 / 47

Page 47:

Questions?

47 / 47