Text Representation & Fixed-Size Ordinally-Forgetting Encoding Approach


Survey on Text Representation &
Shallow Sentence Embedding: Fixed-Size Ordinally-Forgetting Encoding Approach

Ahmed H. AlGhidani
MSc Student in Computer Science at Cairo University

Research and SDE at RDI Egypt

ahmed.hani@rdi-eg.com

Agenda

• Word Representation
  - 1-hot Encoding

• Word Embedding
  - Word2Vec
  - GloVe

• Sentence Representation
  - Bag of Words (BoW)

• Sentence Embedding
  - Doc2Vec
  - Fixed-Size Ordinally-Forgetting Encoding (FOFE)
  - Others

Word Representation

• Transform a word into a vector space model

• Each word has a unique vector that differentiates it from the others

• The vector dimensionality varies according to the method used for the transformation

Word Representation (Cont.)

• 1-hot Encoding

Word Representation (Cont.)

• Pros
  - Easy to understand and implement

• Cons
  - Depends on the vocabulary size (memory issues)
  - Doesn't capture the semantics of words
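As a quick illustration, a minimal 1-hot encoding sketch in Python (the toy vocabulary and words below are assumptions for illustration):

```python
import numpy as np

# toy vocabulary; in practice this is built from the whole corpus
vocab = ["cat", "dog", "hat", "sat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return the 1-hot vector of `word` (dimension = vocabulary size)."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("dog"))  # [0. 1. 0. 0.]
```

Note that the vector length grows with the vocabulary, which is exactly the memory issue listed above.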

Word Embedding

• A vector space model

• Each word has a fixed-size unique vector representation

• It shows the semantic relations between words

Word Embedding (Cont.)

• Word2Vec (Mikolov et al., 2013)

• To represent a word, we need to use its context

• A group of models based on shallow neural networks

• We will talk about Skip-gram and Continuous Bag of Words (CBOW)

Word Embedding (Cont.)

• Skip-gram model

• 2 layers of a multi-layer perceptron (MLP) neural network

• Input is a word and the output is the group of context words that are likely to occur given that word

• A softmax layer gives the probability of each output word
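As a minimal sketch, this is how skip-gram (center, context) training pairs can be generated from a sentence; the example sentence and window size are illustrative assumptions:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) training pairs used by the skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs("the cat sat on the hat".split()))
# e.g. ('cat', 'the'), ('cat', 'sat'), ('cat', 'on'), ...
```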

Word Embedding (Cont.)

• Continuous Bag of Words (CBOW)

• 2 layers of MLP neural network

• Input is a word's context and the output is the word itself

• A softmax layer gives the probability of the output word
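A sketch of training both variants with the gensim library (assuming gensim 4.x; the toy corpus and hyper-parameters are illustrative):

```python
from gensim.models import Word2Vec

sentences = [
    "the cat sat on the hat".split(),
    "the dog ate the cat and the hat".split(),
]

# sg=1 -> skip-gram, sg=0 -> CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

print(skipgram.wv["cat"].shape)     # (50,) fixed-size word vector
print(cbow.wv.most_similar("cat"))  # nearest words by cosine similarity
```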

Word Embedding (Cont.)

• Pros
  - Fast and efficient
  - Fixed-size dimensionality

• Cons
  - We don't fully know why it works (including Mikolov himself!)

Word Embedding (Cont.)

• Global Vectors (GloVe)

• We build a co-occurrence matrix for the whole corpus

• Factorize this matrix into word vectors and context vectors

Word Embedding (Cont.)

Corpus: A D C E A D F E B A C E D
Window-size: 2 (two words on each side)

Co-occurrence matrix X (F is omitted on the slide):

    A  B  C  D  E
A   0  1  3  2  3
B   1  0  1  0  1
C   3  1  0  2  2
D   2  0  2  0  4
E   3  1  2  4  0
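A minimal sketch that reproduces the matrix above from the toy corpus, assuming a symmetric window of two words on each side:

```python
from collections import defaultdict

corpus = "A D C E A D F E B A C E D".split()
window = 2  # two words on each side

# X[w][c] = number of times c appears within `window` positions of w
X = defaultdict(lambda: defaultdict(int))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            X[w][corpus[j]] += 1

vocab = ["A", "B", "C", "D", "E"]  # F is left out, as on the slide
print("  " + " ".join(vocab))
for w in vocab:
    print(w, " ".join(str(X[w][c]) for c in vocab))
```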

Word Embedding (Cont.)

We want to use the co-occurrence information to produce the word vectors, so we use this function for each pair of words

Our target is to minimize this objective function over all word pairs in the corpus
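For reference, the objective is the weighted least-squares loss from the GloVe paper (Pennington et al., 2014):

J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2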

Word Embedding (Cont.)

Where,
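per the GloVe paper, f is the weighting function that down-weights rare and very frequent pairs (the paper's defaults are x_max = 100 and α = 3/4):

f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}

and w_i, \tilde{w}_j are the word and context vectors, b_i, \tilde{b}_j their biases, X_{ij} the co-occurrence count, and V the vocabulary size.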

Word Embedding (Cont.)

• Pros
  - Statistical approach
  - Combines count-based statistical methods and the skip-gram model to produce the word vectors

Sentence Representation

• Given a context of words, the target is to vectorize the whole context into a vector space model

Sentence Representation (Cont.)

• Bag of Words Model (BoW)

• Depends on term-frequency in the context

• The vector dimensionality varies according to the number of unique words in the corpus

Sentence Representation (Cont.)

Corpus:
Sentence 1: “The cat sat on the hat”
Sentence 2: “The dog ate the cat and the hat”

Unique words:
{the, cat, sat, on, hat, dog, ate, and}

Sentence 1: [2, 1, 1, 1, 1, 0, 0, 0]
Sentence 2: [3, 1, 0, 0, 1, 1, 1, 1]
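A minimal sketch that reproduces the BoW vectors above (the vector order follows the unique-word list on the slide):

```python
from collections import Counter

vocab = ["the", "cat", "sat", "on", "hat", "dog", "ate", "and"]

def bow_vector(sentence):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]

print(bow_vector("The cat sat on the hat"))           # [2, 1, 1, 1, 1, 0, 0, 0]
print(bow_vector("The dog ate the cat and the hat"))  # [3, 1, 0, 0, 1, 1, 1, 1]
```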

Sentence Representation (Cont.)

• Pros
  - Easy to understand and implement

• Cons
  - Depends on the vocabulary size (memory issues)
  - Doesn't capture the semantics of words

Sentence Embedding

• A vector space model

• Each document has a fixed-size unique vector representation

• It shows the semantic similarity between documents

Sentence Embedding (Cont.)

• Doc2Vec (Le & Mikolov, 2014)

• Predicts words that follow the document's semantics

• A group of models based on shallow neural networks

• We will talk about the PV-DM and PV-DBOW models

Sentence Embedding (Cont.)

• Distributed Memory Model of Paragraph Vector (PV-DM)

• 2 layers of a multi-layer perceptron (MLP) neural network

• Input is a paragraph vector (from the paragraph matrix) plus context words, and the output is a predicted word given the paragraph and the context

Sentence Embedding (Cont.)

• Distributed Bag of Words of Paragraph Vector (PV-DBOW)

• 2 layers of a multi-layer perceptron (MLP) neural network

• Input is a paragraph vector and the output is the group of words likely to occur in that paragraph (similar to extracting keywords)
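A sketch of training both paragraph-vector variants with gensim (assuming gensim 4.x; the toy documents and hyper-parameters are illustrative):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument("the cat sat on the hat".split(), [0]),
    TaggedDocument("the dog ate the cat and the hat".split(), [1]),
]

# dm=1 -> PV-DM, dm=0 -> PV-DBOW
pv_dm = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, dm=1)
pv_dbow = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40, dm=0)

# Infer a fixed-size embedding for a new, unseen sentence
print(pv_dm.infer_vector("the cat ate the hat".split()).shape)  # (50,)
```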

Sentence Embedding (Cont.)

• Fixed-Size Ordinally-Forgetting Encoding (FOFE) (Zhang et al., 2015)

• Produces a fixed-size vector, with dimensionality given by the vocabulary size

• The produced encodings are (almost always) unique, which helps distinguish sentences semantically

• Used mainly for the NLP language modeling task with a regular feed-forward neural network

Vocabulary:
{A, B, C}

Initially:
A = [1, 0, 0]
B = [0, 1, 0]
C = [0, 0, 1]

Sentence 1: {ABC}
Sentence 2: {ABCBC}

Target:
Get a fixed-size vector that represents each sentence

Encoding Function
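Following the FOFE paper (Zhang et al., 2015), the code z_t of a word sequence is computed recursively:

z_t = \alpha \cdot z_{t-1} + e_t, \qquad z_0 = \mathbf{0}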

where α ∈ (0, 1) is the forgetting factor, e_t is the 1-hot vector of the t-th word, and z_T (for a sentence of length T) is the sentence's FOFE code.

A = [1, 0, 0], B = [0, 1, 0], C = [0, 0, 1]
Sentence 1: {ABC},

Sentence 2: {ABCBC}
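A minimal sketch of the FOFE computation for the two example sentences, assuming a forgetting factor α = 0.5 as in the paper's example:

```python
import numpy as np

vocab = {"A": 0, "B": 1, "C": 2}
alpha = 0.5  # forgetting factor, 0 < alpha < 1

def fofe(sentence):
    """Compute the FOFE code z_T = alpha * z_{T-1} + e_T over the sentence."""
    z = np.zeros(len(vocab))
    for word in sentence:
        e = np.zeros(len(vocab))
        e[vocab[word]] = 1.0
        z = alpha * z + e
    return z

print(fofe("ABC"))    # [0.25   0.5    1.  ]
print(fofe("ABCBC"))  # [0.0625 0.625  1.25]
```

Note that the two sentences get different fixed-size codes even though they share the same vocabulary, which is the "mostly unique" property mentioned above.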

Sentence Embedding (Cont.)

• Deep Sentence Embedding Using Long Short-Term Memory Networks