Deep Learning for Natural Language Processing

Bargava Subramanian @bargava

Amit Kapoor @amitkaps


Language Challenge


Put these adjectives in order: [adj.] + [Knife]

— old

— French

— lovely

— green

— rectangular

— whittling

— silver

— little

Which order is correct?

lovely old silver rectangular green little French whittling knife

old lovely French rectangular green little whittling silver knife

lovely little old rectangular green French silver whittling knife


Grammar has rules

opinion - size - age - shape - colour - origin - material - purpose + [Noun]

The right version: lovely little old rectangular green French silver whittling knife


We speak the grammar, yet we don't know it


Natural Language Problems are hard


Natural Language Processing Problems

— Summarization

— Text Classification (e.g. spam)

— Sentiment / Emotion Analysis

— Topic Modelling

— Recommendations

— Text Evaluation (e.g. grading)


Plan for this Session

— Moving beyond Statistical Learning

— Take first steps in NLP with Deep Learning

— Showcase an example

— Practical challenges to overcome


NLP Learning Process

[1] Frame: Problem definition

[2] Acquire: Text ingestion

[3] Refine: Text wrangling

[4] Transform: Feature creation

[5] Explore: Feature selection

[6] Model: Model selection

[7] Insight: Solution communication


Simple Case

Demonetisation in India


Demonetisation in India

On Nov 8th, 2016, the Indian government announced that existing INR 1000 and INR 500 notes would no longer be legal tender.



Reactions on Twitter

People started tweeting with the tag: #demonetisation


[1] Frame

Create a viral tweet on #demonetisation


Traditional way of framing

1. Someone writes a tweet.

2. Run it through the classifier.

3. If the predicted probability is high, post it.

4. Else, go to step 1.

The prediction is the probability that a new tweet will go viral.


Generating tweets

— Can we learn from historical tweets to algorithmically generate a viral tweet?

— This is not possible with traditional methods


Revised framing for Text Generation

Algorithmically generate a tweet that is likely to go viral


[2] Acquire

Get the raw tweets data


Get Tweets on #demonetisation

Write your own Twitter API client to fetch the JSON, or use a Python package like Tweepy; either way you need to manage rate limiting etc.

We used tweezer, an open-source project for fetching Twitter data.

Raw dataset: 30,000+ tweets from the past week.
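For illustration, a rough sketch of the Tweepy route using the Twitter API v2 client; the bearer token is a placeholder and the query details are assumptions, not the tweezer setup we actually used:

    import tweepy

    # Hypothetical Tweepy (v4.x) sketch, not the tweezer pipeline from the talk.
    client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

    # Recent search covers roughly the last 7 days, matching our 1-week window.
    response = client.search_recent_tweets(
        query="#demonetisation -is:retweet",
        tweet_fields=["public_metrics"],
        max_results=100,
    )
    for tweet in response.data:
        print(tweet.public_metrics["retweet_count"], tweet.text)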


[3] Refine

How to categorise a tweet as viral or not?


Simple Approach for Labelling

IF retweets + favourites >= 100
THEN label = viral
ELSE label = normal
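The same rule as a pandas sketch; tweets.csv and its column names are hypothetical stand-ins for the acquired data:

    import pandas as pd

    # Hypothetical dump of the acquired tweets; column names are assumptions.
    df = pd.read_csv("tweets.csv")

    # Label a tweet viral when retweets + favourites >= 100.
    df["label"] = (df["retweets"] + df["favourites"] >= 100).map(
        {True: "viral", False: "normal"}
    )
    print(df["label"].value_counts())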


Sanitizing Tweets

— Stopword removal

— Stemming

— Remove urls

— Remove 'RT'

— Remove '\n'
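A minimal sanitising sketch covering these steps, here with NLTK (assumes the stopword corpus is downloaded; the regexes are illustrative):

    import re
    from nltk.corpus import stopwords        # needs nltk.download("stopwords")
    from nltk.stem import PorterStemmer

    STOP = set(stopwords.words("english"))
    stemmer = PorterStemmer()

    def sanitize(tweet):
        tweet = re.sub(r"http\S+", "", tweet)    # remove urls
        tweet = tweet.replace("\n", " ")         # remove '\n'
        tweet = re.sub(r"\bRT\b", "", tweet)     # remove 'RT'
        tokens = [t for t in tweet.lower().split() if t not in STOP]  # stopwords
        return " ".join(stemmer.stem(t) for t in tokens)              # stemming

    print(sanitize("RT Long queues outside banks\nhttps://t.co/abc"))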


[4] Transform

Creating Features from Text


Traditional methods to convert text to numeric form

— TF-IDF: Measures importance of a word in a document relative to the corpus

— Bag-of-Words: Count of occurrences of each word in a document

— n-grams: Count of every 1-word, 2-word, etc. combination in a document

— Entity & POS tagging: Transform a sentence into parts of speech, extract entities, and encode them
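A quick scikit-learn sketch of the first three encodings; the two toy documents are chosen to preview the context problem on the next slide, since their bag-of-words vectors come out identical:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = ["a quiet crowd entered the historic church",
            "a historic crowd entered the quiet church"]

    bow = CountVectorizer()                       # bag-of-words counts
    ngrams = CountVectorizer(ngram_range=(1, 2))  # unigrams + bigrams
    tfidf = TfidfVectorizer()                     # corpus-relative importance

    print(bow.fit_transform(docs).toarray())      # identical rows: order is lost
    print(ngrams.fit_transform(docs).toarray())   # bigrams recover some order
    print(tfidf.fit_transform(docs).toarray().round(2))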


Challenges in traditional methods of encoding

— Sparse inputs

— Input data space explodes

— Context lost in encoding

A quiet crowd entered the historic church != A historic crowd entered the quiet church


Deep Learning Approach

Low-dimensional dense vectors for representation.

— Tokenise characters (Faster)

— Tokenise words (More accurate, but needs more memory)


Word Embedding

— Learn high-quality word vectors

— Similar words need to be close to each other

— Words can have multiple degrees of similarity


Word Embedding using word2vec

Offers two approaches

— skip-gram: predicting the context given a word

— continuous bag-of-words: predicting a word given its context


word2vec: Example

vec[queen] − vec[king] = vec[woman] − vec[man]


1 https://www.tensorflow.org/versions/r0.12/tutorials/word2vec/index.html
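A minimal gensim sketch of training such embeddings on the tweet corpus; the toy sentences and hyperparameters are assumptions (parameter names follow gensim 4.x):

    from gensim.models import Word2Vec

    # Tokenised tweets from the refine step; a toy corpus stands in here.
    sentences = [["long", "queues", "outside", "banks"],
                 ["bank", "queues", "after", "demonetisation"],
                 ["new", "notes", "at", "banks"]]

    # sg=1 selects skip-gram; sg=0 selects continuous bag-of-words.
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

    # Analogy arithmetic as above, e.g. vec[woman] + vec[king] - vec[man]:
    #   model.wv.most_similar(positive=["woman", "king"], negative=["man"])
    print(model.wv.most_similar("banks", topn=2))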


[5] Explore

Feature Selection


Feature Selection

— A manual process in the traditional approach

— Happens automatically in deep learning


[6] Model

Model Selection


Recurrent Neural Network (RNN)

— Network with loops

— Allows information to persist

— Enables connecting previous information to present task

— Context preserved

I grew up in Brazil and I speak ______________ (Portuguese).


Unrolling over Time

[1] Think sequences, in input & output:

— Recognise an image -> explain it in words

— Sentence(s) -> sentiment analysis

— English -> Spanish translation

— Video -> task classification


Unrolled RNN

[2] Multiple copies of the same network

[3] Each passes a message to its successor


2 http://colah.github.io/posts/2015-08-Understanding-LSTMs/


Architecture Overview
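In sketch form, the architecture is a character-level LSTM language model trained on the tweet corpus. A minimal Keras version follows; the sequence length, vocabulary size, and layer width are assumptions, not the exact settings from the talk:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    SEQ_LEN, VOCAB = 40, 60   # assumed window length and character-set size

    # Given SEQ_LEN one-hot-encoded characters, predict the next character.
    model = Sequential([
        LSTM(128, input_shape=(SEQ_LEN, VOCAB)),
        Dense(VOCAB, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")

    # X: (samples, SEQ_LEN, VOCAB) windows over tweets; y: the next character.
    X, y = np.zeros((1, SEQ_LEN, VOCAB)), np.zeros((1, VOCAB))
    y[0, 0] = 1
    model.fit(X, y, epochs=1, verbose=0)   # toy call; train on the real corpus

Tweets are then generated by seeding the model with a prefix window, sampling the next character from the softmax, sliding the window, and repeating.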


[7] Insight

Solution Communication


Generated Tweets


Deep Learning Challenges

— Data Size: RNNs don't generalize well on small datasets

— Relevant Corpus: Required to create domain-specific word embeddings

— Deeper Networks: Empirically, deeper networks have better accuracy

— Training Time: RNNs take a long time to train


Use case: Chat Bots

— Bookings

— Customer Support

— Help Desk Automation

— ...


Tools to get started: Software

Python stack:

— Use spacy for NLP preprocessing

— Use gensim for word2vec training

— Start with keras

— Have tensorflow as backend

Use pre-trained models, like word2vec for word embeddings; pre-trained options exist for RNNs too.
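As a taste of the preprocessing step, a small spacy sketch (assumes the en_core_web_sm model has been installed):

    import spacy

    # Needs: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("Long queues outside banks after the demonetisation move")
    # Lemmatise and drop stopwords/punctuation in one pass.
    print([t.lemma_ for t in doc if not (t.is_stop or t.is_punct)])
    print([(ent.text, ent.label_) for ent in doc.ents])   # entity tagging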


Tools to get started: Hardware

Work on GPUs:

— Nvidia Titan X (suitable for consumers)

— Tesla K80 (suitable for professionals)

For detailed hardware choices: http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/


Closing thoughts


Reference: Deep Learning for NLP

Notebooks and Material @ https://github.com/rouseguy/DeepLearningNLP_Py

— What is deep learning?

— Motivation: Some use cases

— Building blocks of Neural Networks (Neuron, Activation Function)

— Backpropagation Algorithm

— Word Embedding

— word2vec

— Introduction to keras

— Multi-layer perceptron

— Convolutional Neural Network

— Recurrent Neural Network

— Challenges in Deep Learning


Contact

Bargava Subramanian @bargava

Amit Kapoor @amitkaps
amitkaps.com
