Deep Learning for Text Analytics - Sas Institute · Text Structure Word Order Two random corpus...

Copyright © SAS Institute Inc. All rights reserved. Deep Learning for Text Analytics SAS User Group Malaysia 12th April 2018

Deep Learning for Text Analytics

SAS User Group Malaysia

12th April 2018

Using Deep Learning in Natural Language

SAS in a Chatbot

Natural Language Interaction

Natural Language Processing (NLP)

Natural Language Understanding (NLU)

Natural Language Generation (NLG)

Natural Language Interaction (NLI)

Natural Language Processing

NLP Layer (Natural Language Processing)

Knowledge Base (Source Content)

Data Storage (Interaction History & Analytics)

Recurrent Neural Network

Deep Learning: Recurrent Neural Network (RNN)

• Type of Neural Network

• Recurring over time

• Sequential data

• Words

• Time

• Common Methods

• GRU (Gated Recurrent Unit)

• LSTM (Long Short-Term Memory)

Output

Input
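To make "recurring over time" concrete, here is a minimal sketch of a plain (Elman-style) recurrent step in pure Python. It is not the GRU or LSTM variant named on the slide, and the weights `W_XH`, `W_HH`, and `B_H` are hypothetical toy values chosen only for illustration. The point is that the hidden state carries information forward, so the same inputs in a different order give a different final state.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One plain RNN step: h_t = tanh(W_xh * x_t + W_hh * h_prev + b_h)."""
    h_new = []
    for j in range(len(h_prev)):
        total = b_h[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(sequence, w_xh, w_hh, b_h):
    """Feed a sequence of input vectors through the cell, one time step at a time."""
    h = [0.0] * len(b_h)  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# Hypothetical toy weights: 2-dimensional input, 2-dimensional hidden state.
W_XH = [[0.5, -0.3], [0.1, 0.8]]
W_HH = [[0.2, 0.0], [-0.4, 0.6]]
B_H = [0.0, 0.0]

# Two "words" as one-hot vectors, presented in both orders.
seq_ab = [[1.0, 0.0], [0.0, 1.0]]
seq_ba = [[0.0, 1.0], [1.0, 0.0]]
final_ab = run_rnn(seq_ab, W_XH, W_HH, B_H)[-1]
final_ba = run_rnn(seq_ba, W_XH, W_HH, B_H)[-1]
```

Comparing `final_ab` with `final_ba` shows that the final hidden state depends on the order of the inputs, which is what makes recurrent networks suited to sequential data such as words.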

Recurrent Neural Network (RNN): Word Vector

Unlabeled Corpus

The 15th American President

The 16th American President

The 17th American President

Alex reads this sentence

Alex read this sentence

Alex is reading this sentence

[Figure: word vectors, with 15th, 16th, 17th grouped together and read, reads, reading grouped together]

Word Vector Algorithm

Words with similar context should have similar vectors
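The idea that words with similar context get similar vectors can be illustrated with a simple count-based sketch on the unlabeled corpus from this slide. The slide does not name a specific word vector algorithm, so the co-occurrence counting below is a stand-in chosen for illustration, not the method used in the presentation. The helper names `context_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict

CORPUS = [
    "The 15th American President",
    "The 16th American President",
    "The 17th American President",
    "Alex reads this sentence",
    "Alex read this sentence",
    "Alex is reading this sentence",
]

def context_vectors(sentences):
    """Represent each word by counts of the other words in its sentences."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

VECTORS = context_vectors(CORPUS)
SIM_ORDINALS = cosine(VECTORS["15th"], VECTORS["16th"])   # identical contexts
SIM_VERBS = cosine(VECTORS["reads"], VECTORS["reading"])  # overlapping contexts
SIM_MIXED = cosine(VECTORS["16th"], VECTORS["reads"])     # no shared context
```

In this toy setting the ordinals share exactly the same context words, the verb forms share most of them, and an ordinal and a verb share none, which mirrors the two groups on the slide.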

Recurrent Neural Network (RNN): Text Classification

Text Classification

Raw Text Document

Technology

Politics

Sport

Recurrent Neural Network (RNN): Text Classification

Text Classification

The 16th American President

number

order

entity

context

Recurrent Neural Network (RNN): Text Generation

Translating vector back to text

Convert text into vectorized input

RNN

Calculate vector weight

Vector representing a sentence based on the text

Use weight vector to refine model
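The two conversion steps in this flow (converting text into vectorized input, and translating a vector back to text) can be sketched with a simple vocabulary lookup. This is a simplification under stated assumptions: it maps words to integer ids instead of learned vectors, and the helpers `build_vocab`, `encode`, and `decode` are hypothetical names, not part of any SAS action set.

```python
def build_vocab(sentences):
    """Collect the distinct lowercase tokens in first-seen order."""
    vocab = []
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab.append(token)
    return vocab

def encode(text, vocab):
    """Convert text into a list of integer ids (the vectorized input)."""
    return [vocab.index(token) for token in text.lower().split()]

def decode(ids, vocab):
    """Translate a list of ids back to text."""
    return " ".join(vocab[i] for i in ids)

VOCAB = build_vocab(["The 16th American President"])
IDS = encode("the 16th president", VOCAB)
ROUND_TRIP = decode(IDS, VOCAB)
```

A real model would sit between `encode` and `decode`, producing new ids instead of echoing the input; the sketch only shows that the mapping between text and numbers is reversible.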

Text Generation: Text Structure – Word Order

Who is the 16th American President

The 16th President who is American

Text Structure: Word Order

Two random sentences

• I don’t like this director but I like this movie

• I like this director but I don’t like this movie

• Specific words can be strong indicators

• Sentiment – boring, exciting

• Topic – deep learning, Siri

Positive sentiment

Negative sentiment
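A short sketch shows why word order matters for these two sentences. A bag-of-words view, which only counts words, cannot tell them apart, while their token sequences differ. The `bag_of_words` helper is a hypothetical name used for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercase tokens, ignoring their order."""
    return Counter(text.lower().split())

A = "I don't like this director but I like this movie"
B = "I like this director but I don't like this movie"

# Same words with the same counts, so the bags are identical.
SAME_BAG = bag_of_words(A) == bag_of_words(B)

# Different order, so the sequences differ.
SAME_SEQUENCE = A.lower().split() == B.lower().split()
```

Here `SAME_BAG` is `True` and `SAME_SEQUENCE` is `False`: any model that sees only word counts receives identical input for both sentences, which is the motivation for order-aware models such as RNNs.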

Creating, Training, Scoring an RNN: Using Deep Learning

Sample RNN Model: Loading the Action Sets

Sample RNN Model: The Dataset

Sample RNN Model: Text Classification Model

Sample RNN Model: Training the Model

Sample RNN Model: Scoring the Model

Sample RNN Model: Scoring the Model

Sample RNN Model: Text Generation Model

Sample RNN Model: Training the Model

Sample RNN Model: Scoring the Model

Sample RNN Model: Text Generation Output

Useful Links

• What’s New In SAS Deep Learning (Documentation)

http://go.documentation.sas.com/?docsetId=casdlpg&docsetTarget=n0gv3jjm5obouun1uvducbzl8nlf.htm&docsetVersion=8.2&locale=en

• Understanding Recurrent Neural Networks

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

• RNN Simplified

https://www.youtube.com/watch?v=_aCuOwF1ZjU

sas.com

Thank You