Deep Learning Architectures for NLP (Hungarian NLP Meetup 2016-09-07)
Deep Learning for NLP
Deep Learning for Natural Language Processing
Bargava Subramanian @bargava
Amit Kapoor @amitkaps
1
Put these adjectives in order: [adj.] + [Knife]
— old
— French
— lovely
— green
— rectangular
— whittling
— silver
— little
3
Which order is correct?
lovely old silver rectangular green little French whittling knife
old lovely French rectangular green little whittling silver knife
lovely little old rectangular green French silver whittling knife
4
Grammar has rules
opinion - size - age - shape - colour - origin - material - purpose [Noun]
The right version: lovely little old rectangular green French silver whittling knife
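The ordering rule above can be sketched as a sort over category ranks. This is only an illustration: the category assigned to each adjective is hand-written here (an assumption), and the function name `order_adjectives` is hypothetical.

```python
# Category order from the slide.
ORDER = ["opinion", "size", "age", "shape", "colour", "origin", "material", "purpose"]

# Hand-assigned categories for the eight adjectives in the exercise (an assumption;
# a real system would need a lexicon or a learned model).
CATEGORY = {
    "lovely": "opinion",
    "little": "size",
    "old": "age",
    "rectangular": "shape",
    "green": "colour",
    "French": "origin",
    "silver": "material",
    "whittling": "purpose",
}

def order_adjectives(adjectives, noun):
    """Sort adjectives by the rank of their category, then append the noun."""
    ranked = sorted(adjectives, key=lambda adj: ORDER.index(CATEGORY[adj]))
    return " ".join(ranked + [noun])

phrase = order_adjectives(
    ["old", "French", "lovely", "green", "rectangular", "whittling", "silver", "little"],
    "knife",
)
print(phrase)
```

Hand-written rules like this work for a fixed word list but do not scale to open vocabulary, which is part of the motivation for learned approaches.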
5
Natural Language Processing Problems
— Summarization
— Text Classification (e.g. spam)
— Sentiment / Emotion Analysis
— Topic Modelling
— Recommendations
— Text Evaluation (e.g. grading)
8
Plan for this Session
— Moving beyond Statistical Learning
— Take first steps in NLP with Deep Learning
— Showcase an example
— Practical challenges to overcome
9
NLP Learning Process
[1] Frame: Problem definition
[2] Acquire: Text ingestion
[3] Refine: Text wrangling
[4] Transform: Feature creation
[5] Explore: Feature selection
[6] Model: Model selection
[7] Insight: Solution communication
10
Demonetisation in India
On Nov 8th, 2016, the national government announced that existing INR 1000 and INR 500 notes would no longer be legal tender.
12
Traditional way of framing
1. Someone writes a tweet.
2. Run it through the classifier.
3. If the predicted probability is high, post it.
4. Else, go to step 1.
The prediction is the probability that a new tweet will go viral.
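A minimal sketch of this write-score-post loop, assuming some trained classifier is available as a function that returns a probability. The names `first_postable` and `toy_scorer` and the 0.8 threshold are hypothetical; the scorer below is a stand-in, not a real model.

```python
def first_postable(drafts, predict_viral_probability, threshold=0.8):
    """Return the first draft whose predicted probability clears the threshold, else None."""
    for draft in drafts:
        if predict_viral_probability(draft) >= threshold:
            return draft
    return None  # every draft was rejected: back to step 1, write more tweets

# Stand-in scorer (purely hypothetical): a real one would be a trained classifier.
def toy_scorer(text):
    return 0.9 if "#demonetisation" in text.lower() else 0.1

drafts = ["Long queue at the ATM today", "Cash crunch continues #demonetisation"]
chosen = first_postable(drafts, toy_scorer)
print(chosen)
```

The classifier only filters drafts that a person has already written, which is the limitation the next slide addresses.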
16
Generating tweets
— Can we learn from historical tweets algorithmically to generate a viral tweet?
— Not possible to do using traditional methods
17
Get Tweets on #demonetisation
Write your own Twitter API client to fetch the JSON, or use a Python package like Tweepy; either way you need to manage rate limiting.
We used tweezer, an open-source project, to get the Twitter data.
Raw dataset: 30,000+ tweets from the past week.
20
Simple Approach for Labelling
IF retweets + favourites >= 100
THEN Label = viral
ELSE Label = normal
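The labelling rule translates directly to Python. The sketch assumes each tweet is a dict carrying `retweet_count` and `favorite_count` fields; those field names are an assumption about the downloaded JSON, so adjust them to match the actual data.

```python
VIRAL_THRESHOLD = 100  # from the slide: retweets + favourites >= 100

def label_tweet(tweet):
    """Label a tweet 'viral' or 'normal' from its engagement counts."""
    # Field names are assumed; missing counts are treated as zero.
    engagement = tweet.get("retweet_count", 0) + tweet.get("favorite_count", 0)
    return "viral" if engagement >= VIRAL_THRESHOLD else "normal"

tweets = [
    {"text": "a", "retweet_count": 80, "favorite_count": 20},
    {"text": "b", "retweet_count": 3, "favorite_count": 5},
]
labels = [label_tweet(t) for t in tweets]
print(labels)
```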
22
Traditional methods to convert text to numeric features
— TF-IDF: Measures the importance of a word in a document relative to the corpus
— Bag-of-Words: Count of occurrences of each word in a document
— n-grams: Count of every 1-word, 2-word, etc. sequence in a document
— Entity & POS tagging: Tag each word with its part of speech, extract entities, and encode them
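The first three encodings can be sketched in plain Python on a toy corpus. The three sentences are made up for illustration, and the TF-IDF variant here (term frequency times log of N over document frequency) is only one of several in common use; libraries often add smoothing.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
docs = [
    "cash queue at the bank",
    "long queue at the atm",
    "new notes at the bank",
]
tokenised = [doc.split() for doc in docs]

def bag_of_words(tokens):
    """Count of each word in one document."""
    return Counter(tokens)

def ngrams(tokens, n):
    """All contiguous n-word sequences in one document."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(tokens, corpus):
    """tf * log(N / df) for each word in one document."""
    counts = Counter(tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for doc in corpus if word in doc)
        scores[word] = tf * math.log(len(corpus) / df)
    return scores

bow = bag_of_words(tokenised[0])
bigrams = ngrams(tokenised[0], 2)
weights = tf_idf(tokenised[0], tokenised)
```

Words that occur in every document ("at", "the") get a weight of zero, while a word unique to one document ("cash") gets the highest weight.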
25
Challenges in traditional methods of encoding
— Sparse inputs
— Input data space explodes
— Context lost in encoding
A quiet crowd entered the historic church != A historic crowd entered the quiet church
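A quick check of the "context lost" point, using the two sentences above: their bag-of-words encodings are identical even though the word order, and so the meaning, differs. The helper name `bag` is just for this sketch.

```python
from collections import Counter

a = "A quiet crowd entered the historic church"
b = "A historic crowd entered the quiet church"

def bag(sentence):
    """Lower-cased bag-of-words counts."""
    return Counter(sentence.lower().split())

same_bag = bag(a) == bag(b)                        # encodings match
same_order = a.lower().split() == b.lower().split()  # word sequences do not
print(same_bag, same_order)
```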
26
Deep Learning Approach
Low-dimensional dense vectors for representation.
— Tokenise characters (Faster)
— Tokenise words (More accurate, but needs more memory)
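A small sketch of both tokenisation options, mapping tokens to integer ids that an embedding layer could consume. Reserving index 0 for padding is a common convention and an assumption here; `build_vocab` is a hypothetical helper.

```python
def build_vocab(tokens):
    """Assign each distinct token an integer id, starting at 1 (0 is kept for padding)."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
    return vocab

text = "cash is king"
word_tokens = text.split()   # word-level tokens
char_tokens = list(text)     # character-level tokens, spaces included

word_vocab = build_vocab(word_tokens)
char_vocab = build_vocab(char_tokens)

word_ids = [word_vocab[t] for t in word_tokens]
char_ids = [char_vocab[t] for t in char_tokens]
print(word_ids, len(char_ids), len(char_vocab))
```

On real text the character vocabulary stays small while the word vocabulary grows with the corpus, which is where the extra memory for word-level models comes from; the character sequences are correspondingly longer.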
27
Word Embedding
— Learn high-quality word vectors
— Similar words need to be close to each other
— Words can have multiple degrees of similarity
28
Word Embedding using word2vec
Offers two model architectures
— skip-gram: Predicting the context given a word
— continuous bag-of-words: Predicting a word given its context
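To make the two training setups concrete: in the usual word2vec formulation, skip-gram takes the centre word as input and predicts each surrounding word, while continuous bag-of-words takes the surrounding words as input and predicts the centre word. The sketch below only builds the training pairs, not the model; the function names and the window size of 1 are illustrative choices.

```python
def skipgram_pairs(tokens, window=1):
    """(centre, context) pairs: the centre word is the input, each neighbour a target."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """(context words, centre) pairs: the neighbours are the input, the centre the target."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, centre))
    return pairs

tokens = "old notes are invalid".split()
sg = skipgram_pairs(tokens)
cb = cbow_pairs(tokens)
print(sg[:3])
print(cb[:2])
```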
29
word2vec: Example
vec[queen] − vec[king] ≈ vec[woman] − vec[man]
Source: https://www.tensorflow.org/versions/r0.12/tutorials/word2vec/index.html
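The vector arithmetic can be tried on hand-made toy vectors. These are not trained embeddings: the two dimensions are invented so that one roughly tracks "royalty" and the other "gender", and the helper names are hypothetical. With real word2vec vectors the relation holds only approximately.

```python
import math

# Hand-made 2-d vectors, NOT trained embeddings: axis 0 ~ royalty, axis 1 ~ gender.
vec = {
    "king":  [0.9, 0.1],
    "queen": [0.9, 0.9],
    "man":   [0.1, 0.1],
    "woman": [0.1, 0.9],
    "knife": [0.5, 0.0],  # unrelated distractor word
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Word closest to vec[a] - vec[b] + vec[c], excluding the three input words."""
    target = [x - y + z for x, y, z in zip(vec[a], vec[b], vec[c])]
    candidates = [w for w in vec if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vec[w]))

result = analogy("king", "man", "woman")
print(result)
```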
30
Feature Selection
— Manual process in Traditional Approach
— In Deep Learning, the network learns useful features during training
32
Recurrent Neural Network (RNN)
— Network with loops
— Allows information to persist
— Enables connecting previous information to present task
— Context preserved
I grew up in Brazil and I speak ______________. (Portuguese)
34
Unrolling over Time
[1] Think sequences, in input & output
- Recognize Image -> Explain in words
- Sentence(s) -> Sentiment Analysis
- English -> Spanish Translation
- Video -> Task classification
35
Unrolled RNN
[2] Multiple copies of the same network
[3] Each passes a message to its successor
Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
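A minimal sketch of the unrolled view, assuming the simplest (vanilla) recurrence with a scalar input and a scalar hidden state. The weights below are arbitrary, untrained numbers chosen for illustration; real RNNs use weight matrices learned by backpropagation.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One recurrence: h_t = tanh(w_xh * x_t + w_hh * h_prev + b)."""
    return math.tanh(w_xh * x + w_hh * h_prev + b)

def run_rnn(inputs, w_xh=0.5, w_hh=0.8, b=0.0, h0=0.0):
    """Apply the same step, with the same weights, to every element of the sequence."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)  # the state is the message passed onward
        states.append(h)
    return states

# Signal only at the first time step; later inputs are zero.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The hidden state stays non-zero after the input has gone quiet, which is the sense in which earlier information persists and reaches later steps.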
36
Deep Learning Challenges
— Data Size: RNNs don't generalize well on small datasets
— Relevant Corpus: Required to create domain-specific word embeddings
— Deeper Networks: Empirically, deeper networks have better accuracy
— Training Time: RNNs take a long time to train
40
Tools to get started: Software
Python Stack
- Use spacy for NLP preprocessing
- Use gensim for word2vec training
- Start with keras
- Have tensorflow as backend
Use pre-trained models like word2vec for word embedding and similarly for RNNs
42
Tools to get started: Hardware
Work on GPUs
- Nvidia TitanX (suitable for consumers)
- Tesla K80 (suitable for professionals)
For detailed hardware choices: http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/
43
Reference: Deep Learning for NLP
Notebooks and Material @ https://github.com/rouseguy/DeepLearningNLP_Py
- What is deep learning?
- Motivation: Some use cases
- Building blocks of Neural Networks (Neuron, Activation Function)
- Backpropagation Algorithm
- Word Embedding
- word2vec
- Introduction to keras
- Multi-layer perceptron
- Convolutional Neural Network
- Recurrent Neural Network
- Challenges in Deep Learning
45
Contact
Bargava Subramanian @bargava
Amit Kapoor @amitkaps amitkaps.com
46