
NLU lecture 5: Word representations and morphology
Adam Lopez
alopez@inf.ed.ac.uk

• Essential epistemology
• Word representations and word2vec
• Word representations and compositional morphology

Reading: Mikolov et al. 2013, Luong et al. 2013

Essential epistemology

Exact sciencesEm

pirical sciences

Engineering

Deals withAxiom

s & theorem

sFacts & theories

Artifacts

Truth isForever

Temporary

It works

Examples

Mathem

atics C.S. theory F.L. theory

Physics Biology

Linguistics

Many, including applied C.S.

e.g. NLP


The table is then built up column by column for morphology:

• Linguistics deals with the morphological properties of words (facts).
• Optimality theory is a theory of those facts.
• C.S. theory tells us that Optimality theory is finite-state.
• Engineering: we can represent the morphological properties of words with finite-state automata.
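The last bullet can be made concrete with a toy automaton. This is a minimal sketch, not a serious morphological analyzer: the states, transitions, and the tiny "love" paradigm are invented purely for illustration.

```python
# Toy finite-state automaton over characters for a fragment of English
# verbal morphology: stem "love" plus the affixes "-s" and "-d".
# States and transitions are hypothetical, chosen for illustration.

# transitions: (state, input character) -> next state
TRANSITIONS = {
    (0, "l"): 1, (1, "o"): 2, (2, "v"): 3, (3, "e"): 4,  # stem "love"
    (4, "s"): 5,  # "loves" (3rd.SG.PRES)
    (4, "d"): 5,  # "loved" (past)
}
ACCEPTING = {4, 5}  # bare stem, or stem + affix

def accepts(word):
    """Run the automaton; return True iff the word is recognized."""
    state = 0
    for ch in word:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

Real morphological analyzers use finite-state transducers, which also output an analysis (e.g. love+3rd.SG.PRES) rather than just accepting or rejecting.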


Remember the bandwagon.

Word representations

Feedforward model

p(e) = \prod_{i=1}^{|e|} p(e_i \mid e_{i-n+1}, \ldots, e_{i-1})

[Figure: each context word e_{i-3}, e_{i-2}, e_{i-1} is looked up in embedding matrix C; the concatenated embeddings feed a tanh hidden layer with weights W, then a softmax output layer with weights V predicts e_i.]
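The forward pass in the diagram can be sketched in a few lines. Everything here is illustrative: the layer sizes and the random, untrained parameters C, W, and V stand in for the learned ones.

```python
import math
import random

# Sketch of the feedforward n-gram LM forward pass: look up an
# embedding for each context word, concatenate, apply a tanh hidden
# layer, then a softmax over the vocabulary.
random.seed(0)
V_SIZE, D, H, CONTEXT = 10, 4, 8, 3  # vocab, embed dim, hidden dim, context length

def mat(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

C = mat(V_SIZE, D)        # embedding matrix: one row per word
W = mat(H, CONTEXT * D)   # hidden-layer weights
V = mat(V_SIZE, H)        # output-layer weights

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def next_word_distribution(context_ids):
    """p(e_i | e_{i-n+1}, ..., e_{i-1}) over all V_SIZE candidate words."""
    x = [val for j in context_ids for val in C[j]]   # concatenate embeddings
    hidden = [math.tanh(a) for a in matvec(W, x)]    # tanh hidden layer
    logits = matvec(V, hidden)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]         # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]

p = next_word_distribution([1, 5, 7])
```

The softmax at the end is exactly the expensive step discussed later: its cost grows linearly with the vocabulary size.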

The same model diagram is annotated with a series of observations:

• Every word is a vector (a one-hot vector); the concatenation of these vectors is an n-gram.
• Word embeddings are vectors: continuous representations of each word.
• n-grams are vectors: continuous representations of n-grams (or, via recursion, larger structures).
• A discrete probability distribution over V outcomes is a vector: V non-negative reals summing to 1.
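The first observation can be sketched in a few lines; the toy vocabulary is an assumption made for illustration only.

```python
# Sketch: a word is a one-hot vector, and an n-gram context is the
# concatenation of its words' vectors. The vocabulary is invented.

vocab = ["<s>", "the", "cat", "sat", "</s>"]
V = len(vocab)

def one_hot(word):
    """All zeros except a 1 at the word's vocabulary index."""
    vec = [0] * V
    vec[vocab.index(word)] = 1
    return vec

# A trigram context as one concatenated vector of length 3 * V:
context = ["<s>", "the", "cat"]
ngram_vector = [x for w in context for x in one_hot(w)]
```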


No matter what we do in NLP, we'll (almost) always have words… Can we reuse these vectors?

Design a POS tagger using an RNNLM.

What are some difficulties with this? What limitation do you have in learning a POS tagger that you don't have when learning a LM?

One big problem: LIMITED DATA.


"You shall know a word by the company it keeps"
–John Rupert Firth (1957)

Learning word representations using language modeling

• Idea: we'll learn word representations using a language model, then reuse them in our POS tagger (or any other thing we predict from words).
• Problem: the Bengio language model is slow. Imagine computing a softmax over 10,000 words!

Continuous bag-of-words (CBOW)

Skip-gram
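The difference between the two setups is easiest to see in the training pairs they generate: CBOW predicts the centre word from its context, while skip-gram predicts each context word from the centre word. A minimal sketch, assuming an invented toy sentence and a symmetric window of size 2:

```python
# Sketch of the training pairs generated by CBOW vs skip-gram.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
WINDOW = 2

def cbow_pairs(words, window):
    """(context words, centre word) pairs: predict centre from context."""
    pairs = []
    for i, centre in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        ctx = [words[j] for j in range(lo, hi) if j != i]
        pairs.append((ctx, centre))
    return pairs

def skipgram_pairs(words, window):
    """(centre word, one context word) pairs: predict context from centre."""
    return [(centre, c)
            for ctx, centre in cbow_pairs(words, window)
            for c in ctx]

cbow = cbow_pairs(sentence, WINDOW)
skip = skipgram_pairs(sentence, WINDOW)
```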


Learning skip-gram

Word representations capture some world knowledge


Continuous word representations

[Figure: word vectors with parallel offsets, semantic (man → woman, king → queen) and syntactic (walk → walks, read → reads).]

Will it learn this?
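The analogy picture can be sketched with hand-built 2-d vectors, one axis for gender and one for royalty. These vectors are invented for illustration; real embeddings are learned, not designed.

```python
import math

# Toy 2-d "embeddings": axis 0 encodes gender, axis 1 royalty.
emb = {
    "man":   [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king":  [1.0, 1.0],
    "queen": [-1.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(vec, exclude):
    """Most cosine-similar vocabulary word, excluding the query words."""
    candidates = [(w, cosine(vec, v)) for w, v in emb.items() if w not in exclude]
    return max(candidates, key=lambda wc: wc[1])[0]

# king - man + woman lands exactly on queen in this toy space:
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]
```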

(Additional) limitations of word2vec

• Closed vocabulary assumption
• Cannot exploit functional relationships in learning

Is this language?

What our data contains:
A Lorillard spokeswoman said, "This is an old story."

What word2vec thinks our data contains:
A UNK UNK said, "This is an old story."


Is it ok to ignore words?

What we know about linguistic structure

Morpheme: the smallest meaningful unit of language

"loves" = love + s
  root/stem: love
  affix: -s
  morphological analysis: 3rd.SG.PRES

What if we embed morphemes rather than words?

Basic idea: compute a word's representation recursively from its children.

Vectors in green are morpheme embeddings (parameters). Vectors in grey are computed as above (functions). f is an activation function (e.g. tanh).
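The composition step can be sketched as p = f(W·[x_stem; x_affix]) with f = tanh. The morpheme embeddings and W below are random illustrative parameters, not trained ones.

```python
import math
import random

# Sketch of recursive composition: a word vector (grey) is computed
# from its children's morpheme embeddings (green). Parameters random.
random.seed(0)
D = 3  # embedding dimension

morpheme_emb = {
    "love": [random.gauss(0, 1) for _ in range(D)],  # green: stem embedding
    "-s":   [random.gauss(0, 1) for _ in range(D)],  # green: affix embedding
}
W = [[random.gauss(0, 1) for _ in range(2 * D)] for _ in range(D)]

def compose(stem, affix):
    """Grey vector: p = tanh(W . [x_stem; x_affix]); computed, not a parameter."""
    x = morpheme_emb[stem] + morpheme_emb[affix]  # concatenation [x_stem; x_affix]
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

loves_vec = compose("love", "-s")
```

Because compose returns a vector of the same dimension D as its inputs, the same function can be applied recursively to words with more than one affix.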


Train the compositional morpheme model by minimizing the distance to a reference vector.

Target output: reference vector p_r; the constructed vector is p_c. Minimize ‖p_r − p_c‖². Or, train in context using backpropagation (basically a feedforward LM).

Vectors in green are morpheme embeddings (parameters). Vectors in grey are computed as above (functions). Vectors in blue are word or n-gram embeddings (parameters).
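A minimal sketch of the training criterion, assuming squared Euclidean distance (a natural reading of "minimizing distance to reference vector"); the example vectors are invented.

```python
# Sketch: the loss is the squared Euclidean distance between the
# constructed vector p_c and the reference vector p_r.

def squared_distance(pc, pr):
    """||p_r - p_c||^2, the quantity being minimized."""
    return sum((c - r) ** 2 for c, r in zip(pc, pr))

p_r = [0.5, -0.2, 0.1]  # reference vector, e.g. a pretrained word embedding
p_c = [0.4, 0.0, 0.0]   # constructed vector from the morpheme model

loss = squared_distance(p_c, p_r)
```

In training, gradients of this loss flow back through the composition function into the morpheme embeddings and W.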

Where do we get morphemes?

• Use an unsupervised morphological analyzer (we'll talk about unsupervised learning later on).
• How many morphemes are there? New stems are invented every day! fleeking, fleeked, and fleeker are all attested…


Representations learned by the compositional morphology model

Summary

• Deep learning is not magic and will not solve all of your problems, but representation learning is a very powerful idea.
• Word representations can be transferred between models.
• word2vec trains word representations using an objective based on language modeling, so it can be trained on unlabeled data.
• Sometimes called unsupervised, but the objective is supervised!
• Vocabulary is not finite.
• Compositional representations based on morphemes move our models closer to open vocabulary.