
Lecture 8: NLP and Word Embeddings

Alireza Akhavan Pour

CLASS.VISION


NLP and Word Embeddings: Word representation


1-hot representation

V = [a, aaron, ..., zulu, <UNK>],  |V| = 10,000

Each word is represented by a one-hot vector, e.g. O_5391, O_9853, ... (a 1 in the word's position and 0 everywhere else).

I want a glass of orange ______ .

I want a glass of apple ______ .

The first blank is "juice"; can the model predict "juice" for the second sentence as well?

The problem: the Euclidean distance between any two one-hot vectors is the same (and their inner product is zero), so the model cannot generalize from the words it has seen during training (e.g. "orange juice") to related words (e.g. "apple juice").
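A small NumPy illustration of the problem (indices 6257 for "orange" and 456 for "apple" follow the later slides; 1234 is just an arbitrary third word): every pair of distinct one-hot vectors has inner product 0 and the same Euclidean distance, so nothing in this representation says that orange is more similar to apple than to any other word.

import numpy as np

V = 10_000                                    # vocabulary size

def one_hot(i, size=V):
    v = np.zeros(size)
    v[i] = 1.0
    return v

o_orange = one_hot(6257)
o_apple  = one_hot(456)
o_other  = one_hot(1234)                      # any unrelated word

print(np.dot(o_orange, o_apple))              # 0.0 -- all distinct pairs are orthogonal
print(np.linalg.norm(o_orange - o_apple))     # sqrt(2) ~= 1.414
print(np.linalg.norm(o_orange - o_other))     # sqrt(2) again: no notion of similarity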

Featurized representation: word embedding

Instead of one-hot vectors, represent each word by a vector of learned features, for example:

Feature    Man     Woman   King    Queen   Apple   Orange
Gender     -1       1      -0.95    0.97    0.00    0.01
Royal       0.01    0.02    0.93    0.94   -0.01    0.00
Age         0.03    0.02    0.71    0.69    0.03   -0.02
Food       -0.01    0.01    0.02    0.00    0.96    0.97
...        (other features: Size, Alive, Cost, Verb, ...)

With 300 such features, each word becomes a 300-dimensional embedding vector, e.g. e_6257 for "orange" and e_456 for "apple".

I want a glass of orange ______ .  ->  juice

I want a glass of apple ______ .  ->  juice

Because the embeddings of "orange" and "apple" are now similar, a model that has learned "orange juice" can generalize and predict "juice" after "apple" as well.
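A toy numeric sketch of why this helps, reusing a few hand-picked values in the spirit of the table above (not learned embeddings): "orange" ends up close to "apple" and far from "king", so patterns learned for one fruit transfer to the other.

import numpy as np

# Toy 4-D "featurized" vectors: (Gender, Royal, Age, Food)
e = {
    "king":   np.array([-0.95, 0.93, 0.71, 0.02]),
    "queen":  np.array([ 0.97, 0.94, 0.69, 0.00]),
    "apple":  np.array([ 0.00, -0.01, 0.03, 0.96]),
    "orange": np.array([ 0.01, 0.00, -0.02, 0.97]),
}

print(np.linalg.norm(e["orange"] - e["apple"]))   # small distance: similar words
print(np.linalg.norm(e["orange"] - e["king"]))    # much larger distance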

Visualizing word embeddings


[van der Maaten and Hinton, 2008. Visualizing data using t-SNE]
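A minimal sketch of such a visualization with scikit-learn's t-SNE, assuming `E` is a (num_words, 300) array of embeddings and `words` the matching labels (random stand-ins here):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

E = np.random.randn(50, 300)                  # stand-in for real word embeddings
words = [f"word_{i}" for i in range(50)]      # stand-in for the real vocabulary

# Project the 300-D embeddings down to 2-D for plotting.
E_2d = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(E)

plt.scatter(E_2d[:, 0], E_2d[:, 1], s=5)
for (x, y), w in zip(E_2d, words):
    plt.annotate(w, (x, y), fontsize=6)
plt.show()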


Using word embeddings: Named entity recognition example


Robert Lin is an apple farmer

Sally Johnson is an orange farmer

Robert Lin is a durian cultivator


Now if you test your model on the sentence "Robert Lin is a durian cultivator", the network should still recognize "Robert Lin" as a person's name even if it has not seen the word "durian" during training. That is the power of word representations.

The algorithms used to learn word embeddings can examine billions of words of unlabeled text (for example, 100 billion words) and learn the representation from them.


Transfer learning and word embeddings

I. Learn word embeddings from a large text corpus (1-100 billion words),

or download pre-trained embeddings online.

II. Transfer the embedding to a new task with a smaller training set (say, 100k words).

III. Optional: continue to fine-tune the word embeddings with the new data. This is only worth doing if the smaller training set (from step II) is big enough (see the Keras sketch below).

Another advantage is the reduced input dimensionality: instead of a 10,000-dimensional one-hot vector, we will work with a 300-dimensional vector.
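A minimal Keras sketch of steps I-III, assuming `embedding_matrix` is a (10000, 300) NumPy array of pre-trained vectors (e.g. downloaded GloVe vectors) whose rows line up with your tokenizer's word indices; the pooling-plus-dense head is only an illustrative placeholder.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embed_dim = 10_000, 300
embedding_matrix = np.random.randn(vocab_size, embed_dim)  # stand-in for real pre-trained vectors

# Step II: transfer the pre-trained embedding and freeze it.
embedding_layer = layers.Embedding(
    vocab_size, embed_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False,
)

model = keras.Sequential([
    embedding_layer,
    layers.GlobalAveragePooling1D(),          # placeholder task head
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

# Step III (optional): if the new training set is big enough, unfreeze and
# re-compile to fine-tune the embeddings together with the rest of the model.
# embedding_layer.trainable = True
# model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])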

Relation to face encoding (Embeddings)


[Taigman et al., 2014. DeepFace: Closing the gap to human-level performance]

Word embeddings have an interesting relationship to the face recognition task:

o In face recognition, we encode each face image into a vector and then check how similar these vectors are.

o "Encoding" and "embedding" mean essentially the same thing here.

o One difference: in the word embeddings task we learn a fixed representation for each word in a finite vocabulary, whereas in face encoding we must be able to map any new, unseen image to an n-dimensional vector.


Properties of word embeddings

• Analogies


[Mikolov et al., 2013. Linguistic regularities in continuous space word representations]

Can we conclude this relation: Man ==> Woman as King ==> ??

Using the embedding vectors e_Man, e_Woman, e_King, e_Queen:

e_Man - e_Woman ≈ [-2, 0, 0, 0]^T ≈ e_King - e_Queen

The difference vector mainly captures the gender feature, so the analogy suggests King ==> Queen.


Analogies using word vectors


[Figure: man, woman, king, and queen as 300-dimensional embedding vectors (shown in a 2-D t-SNE projection); the vector from man to woman is roughly parallel to the vector from king to queen.]

Find word w:  argmax_w  sim(e_w, e_king - e_man + e_woman)
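A minimal sketch of this search, assuming `embeddings` is a dict mapping each vocabulary word to its vector (the dict and the word choices are illustrative, not a specific library API).

import numpy as np

def cosine_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def find_analogy(word_a, word_b, word_c, embeddings):
    """Solve "a is to b as c is to ?" by maximizing sim(e_w, e_c - e_a + e_b)."""
    target = embeddings[word_c] - embeddings[word_a] + embeddings[word_b]
    best_word, best_sim = None, -np.inf
    for w, e_w in embeddings.items():
        if w in (word_a, word_b, word_c):     # exclude the query words themselves
            continue
        s = cosine_sim(e_w, target)
        if s > best_sim:
            best_word, best_sim = w, s
    return best_word

# e.g. find_analogy("man", "woman", "king", embeddings)  ->  ideally "queen"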


Cosine similarity

sim(u, v) = (u^T v) / (||u||_2 * ||v||_2)

This is the cosine of the angle between u and v: it equals 1 for vectors pointing in the same direction, 0 for orthogonal vectors, and -1 for opposite directions.

For the analogy task we evaluate sim(e_w, e_king - e_man + e_woman).

An alternative (dis)similarity measure is the squared Euclidean distance ||u - v||^2.

Example analogies captured by word vectors:

Ottawa:Canada as Tehran:Iran

Man:Woman as Boy:Girl

Big:Bigger as Tall:Taller

Yen:Japan as Ruble:Russia
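A short NumPy illustration of the cosine similarity values (1, 0, -1) and the squared-Euclidean alternative, using made-up 2-D vectors.

import numpy as np

def cosine_sim(u, v):
    # sim(u, v) = u^T v / (||u||_2 * ||v||_2)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 0.0])
print(cosine_sim(u, np.array([ 2.0, 0.0])))   #  1.0  same direction
print(cosine_sim(u, np.array([-3.0, 0.0])))   # -1.0  opposite direction
print(cosine_sim(u, np.array([ 0.0, 5.0])))   #  0.0  orthogonal

# Squared Euclidean distance as an alternative (dis)similarity measure
v = np.array([0.0, 5.0])
print(np.sum((u - v) ** 2))                   # 26.0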


Embedding matrix


The embedding matrix E has shape 300 x 10,000: one 300-dimensional column per vocabulary word. For example, "orange" is word number 6257:

        here    ...   orange   example   ...   <UNK>
        -0.2    ...   -0.67    -0.2      ...    0.2
         0.7    ...    0.3     -0.5      ...    0.1
         0.85   ...    0.25     0.3      ...    1
        -0.04   ...   -0.18     0.33     ...   -0.1
         ...
         0.5    ...    1        0.3      ...    0.2

O_6257 is the one-hot column vector for "orange": 10,000 entries, all 0 except a 1 in position 6257.



E · O_6257 = e_6257

(300 x 10,000) · (10,000 x 1) = (300 x 1)

If O_6257 is the one-hot encoding of the word "orange", with shape (10000, 1), then np.dot(E, O_6257) = e_6257, whose shape is (300, 1).

Generally, np.dot(E, O_j) = e_j (the embedding for word j).
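A small NumPy check of this (a random matrix stands in for a learned E). Note that the full matrix-vector product is wasteful in practice: selecting the column directly, which is what an embedding lookup layer does, gives the same vector.

import numpy as np

vocab_size, embed_dim = 10_000, 300
E = np.random.randn(embed_dim, vocab_size)    # stand-in for a learned (300, 10000) matrix

O_6257 = np.zeros((vocab_size, 1))            # one-hot column vector for word 6257
O_6257[6257] = 1.0

e_6257 = np.dot(E, O_6257)                    # shape (300, 1)
print(e_6257.shape)

# Same vector, without the multiply: just pick column 6257 of E.
assert np.allclose(e_6257[:, 0], E[:, 6257])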


Embedding matrix


The Embedding layer is best understood as a dictionary mapping integer indices (which stand for specific words) to dense vectors. It takes integers as input, looks up these integers in an internal dictionary, and returns the associated vectors. It is effectively a dictionary lookup.

https://keras.io/layers/embeddings/
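A minimal illustration of that lookup behaviour with keras.layers.Embedding (the weights here are randomly initialized, and the word indices are just example integers).

import numpy as np
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=10_000, output_dim=300)

word_indices = np.array([[6257, 456, 9853]])   # batch of 1 sequence with 3 word ids
vectors = embedding(word_indices)              # dictionary-style lookup
print(vectors.shape)                           # (1, 3, 300)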


References

• https://www.coursera.org/specializations/deep-learning

• https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.1-using-word-embeddings.ipynb

• https://github.com/mbadry1/DeepLearning.ai-Summary/