NLP with H2O

Michal Kurka, Megan Kurka

Transcript of NLP with H2O

Page 1: NLP with H2O


Page 2: NLP with H2O

Agenda

Page 3: NLP with H2O

Today’s Talk

H2O Overview
• What is H2O?
• H2O in R

Natural Language Processing
• Word2Vec Algorithm
• Converting Text to Vectors: An Example

Supervised Learning
• Introduction to Machine Learning
• Overview of Supervised Learning Algorithms
• Training Models in R and Flow
• Hyper-Parameter Tuning

Page 4: NLP with H2O

High-Level Architecture

Page 5: NLP with H2O

H2O in R

Page 6: NLP with H2O

Using H2O with R

• The REST API drives H2O from R

• All computations are performed in performance-optimized Java code in the H2O cluster
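
A minimal sketch of that connection: h2o.init() starts (or attaches to) an H2O cluster, and every subsequent R call is translated into REST requests against it.

library(h2o)
# Start a local H2O cluster, or connect to a running one;
# R then drives it over the REST API
h2o.init()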

Page 7: NLP with H2O

Reading Data into H2O with R
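
The slide’s code isn’t captured in the transcript; a minimal sketch using h2o.importFile (the file path is illustrative):

# Parse a CSV directly into the H2O cluster
movies <- h2o.importFile("/data/movies.csv")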

Page 8: NLP with H2O

Reading Data from HDFS into H2O with R

Page 9: NLP with H2O

Reading Data from HDFS into H2O with R
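
The same function reads from HDFS when given an hdfs:// URL; a sketch with an illustrative path:

# H2O parses the file in parallel inside the cluster
movies <- h2o.importFile("hdfs://namenode/user/h2o/movies.csv")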

Page 10: NLP with H2O

Natural Language Processing

Page 11: NLP with H2O

Word2Vec

• Learns vector representations of words by analyzing a large text corpus
• Word embeddings = a mapping of words to vectors in a high-dimensional space (100–1000 dimensions)
• Text sources: Google News, Wikipedia, Tweets, …

• Embeddings capture the meaning of a word
• Semantically similar words are close to each other
• Can be used to find synonyms & analogies
• Famous example:

» man is to woman as king is to ??? (queen)
» Vec(king) – Vec(man) + Vec(woman) = Vec(queen)

• Different variations of word2vec
  • Skip-Gram, CBOW
  • Hierarchical Softmax, Negative Sampling

[Figure: example embedding vectors for “car” and “alien”, e.g. (-0.09, -0.14, -0.12, -0.06, 0.16) and (-0.38, -0.11, 0.10, -0.26, -0.24)]
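
A plain-R sketch of the analogy arithmetic; vec() is a hypothetical helper that looks up a word’s embedding as a numeric vector:

# Cosine similarity between two embedding vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# vec() is a hypothetical lookup into the trained embeddings
target <- vec("king") - vec("man") + vec("woman")
# the vocabulary word whose embedding maximizes cosine(target, .) should be "queen"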

Page 12: NLP with H2O

Word2Vec

Page 13: NLP with H2O

Word2Vec

Source: twitter.com/DanilBaibak/status/844647217885581312

Page 14: NLP with H2O

Word2Vec Algorithm

[Diagram: Input Layer → Hidden Layer → Output Layer]

• word2vec trains a neural network
  • Single hidden layer
  • Number of neurons in the hidden layer = length of the embeddings
• Uses a trick to formulate the problem as a supervised problem
  • The network is trained to predict target words based on given input words (a window sliding over the text)
  • The weights on the edges connecting the input layer to the hidden layer constitute the word embeddings (weight matrix)
  • Similar approach to auto-encoders
• Optimizations
  • Hierarchical Softmax – a binary tree to represent output probabilities (implemented in H2O)
  • Negative Sampling – sample the output vectors to be updated
  • HogWild! – lock-free parallelization (ignores conflicts)
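
In H2O’s R API the hidden-layer width (and hence the embedding length) is set with the vec_size parameter of h2o.word2vec; a minimal sketch, with 100 as an illustrative size:

# Each of the 100 hidden neurons corresponds to one embedding dimension
w2v_model <- h2o.word2vec(words, vec_size = 100, epochs = 10)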

Page 15: NLP with H2O

Word2Vec Algorithm

"A seemingly indestructible humanoid cyborg is sent from 2029 to 1984 to assassinate a waitress, whose unborn son will lead humanity in a war against the machines, while a soldier from that war is sent to protect her at all costs."

Given: humanoid → Predict: indestructible and cyborg

[Diagram: "humanoid", "indestructible", "cyborg" flowing through Input Layer → Hidden Layer → Output Layer]

• H2O supports the Skip-Gram architecture
• The network tries to predict the surrounding words (the context) based on a given input word
• It treats each context–target pair as a new observation
• Skip-Gram works well with larger training data
  • Represents even rare words and phrases well
• CBOW support is planned
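
A plain-R sketch (not H2O code) of how Skip-Gram generates its training observations: every (input word, context word) combination inside the sliding window becomes one pair.

# Generate (input, target) training pairs from a token sequence
skipgram_pairs <- function(tokens, window = 2) {
  pairs <- list()
  for (i in seq_along(tokens)) {
    ctx <- setdiff(max(1, i - window):min(length(tokens), i + window), i)
    for (j in ctx)
      pairs[[length(pairs) + 1]] <- c(input = tokens[i], target = tokens[j])
  }
  do.call(rbind, pairs)
}

skipgram_pairs(c("indestructible", "humanoid", "cyborg"), window = 1)
# includes (humanoid -> indestructible) and (humanoid -> cyborg), as in the slide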

Page 16: NLP with H2O

Word2Vec – Usage

• word2vec embeddings are typically used in a pre-processing step for another ML algorithm
• We need to aggregate the word embeddings for word sequences of variable length (sentences, paragraphs, ...)
  • Pooling
    • Averaging (aka "mean pooling")
    • (TF-IDF) weighted averaging
    • Not good for use cases where the order of words is important (Sentiment Analysis)
    • http://jxieeducation.com/static/research/documentembedding_poster.pdf
    • PCA & Fisher Vectors: http://www.cs.tau.ac.il/~wolf/papers/qagg.pdf
  • Simply concatenate the embeddings
    • For short inputs, e.g. job titles
  • Use an extension of word2vec
    • Paragraph Vectors / doc2vec
    • Adds a "paragraph token" as an input of the training process; it acts as a memory that remembers what is missing from the current context – or the topic of the paragraph
    • https://cs.stanford.edu/~quocle/paragraph_vector.pdf
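
A plain-R sketch of the (TF-IDF) weighted-averaging variant; emb (a matrix with one word embedding per row) and tfidf (the matching per-word weights) are assumed inputs, not H2O objects:

# Scale each word's embedding by its TF-IDF weight, then normalize
pool_tfidf <- function(emb, tfidf) {
  colSums(emb * tfidf) / sum(tfidf)
}
# Plain averaging ("mean pooling") is the special case of all-equal weights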

Page 17: NLP with H2O

Word2Vec in an Example ML Workflow

Use Case: Predict the movie genre from a movie synopsis / plot summary.

1. word2vec workflow
   1. Tokenize plots: break up the plots into separate words
   2. Filter words: remove stop words like "and", "the", "of"; remove too-short words (see the sketch after this list)
   3. Train a word2vec model
   4. Sanity check: find synonyms based on the word embeddings
   5. (optional) Evaluate the semantic accuracy of the model
      • But the best measure is always the performance on the overall problem (how well we predict the genres)
2. Use the model to transform plots to vectors
3. Use the plot vector representations with a supervised machine learning algorithm to predict the genre
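
A sketch of the filtering step (step 2), following the pattern used in H2O’s word2vec demos; the stop-word list and the length cutoff are illustrative, and the NA rows that mark document boundaries are deliberately kept:

STOP_WORDS <- c("and", "the", "of", "a", "an", "in", "to")  # illustrative list
words <- h2o.tolower(words)
lengths <- h2o.nchar(words)
words <- words[is.na(lengths) || lengths >= 3, ]            # drop too-short words
words <- words[is.na(words) || !(words %in% STOP_WORDS), ]  # drop stop words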

Page 18: NLP with H2O

Word2Vec in H2O – Key Functions

# Break plots into sequences of words
words <- h2o.tokenize(movies$plot, "\\\\W+")

# Build a word2vec model
w2v_model <- h2o.word2vec(words, epochs = 10)

# Sanity check - find synonyms for the word 'teacher'
h2o.findSynonyms(w2v_model, "teacher", count = 5)

# Transform words into embeddings and aggregate them for each plot
plot_vecs <- h2o.transform(w2v_model, words, aggregate_method = "AVERAGE")

Page 19: NLP with H2O

Machine Learning

Page 20: NLP with H2O

Rule-Based Model

[Diagram: classifying Movie Plots with hand-written keyword rules]

If the plot contains:
• Alien, Future, Scientist → Sci-Fi
• Dog, Magic, Princess → Family
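
The diagram amounts to hand-written keyword rules; a toy R sketch of such a classifier (the function and rules are illustrative):

# Classify a plot by keyword matching, mirroring the rules above
classify_plot <- function(plot) {
  if (grepl("alien|future|scientist", plot, ignore.case = TRUE)) {
    "Sci-Fi"
  } else if (grepl("dog|magic|princess", plot, ignore.case = TRUE)) {
    "Family"
  } else {
    NA_character_  # no rule fires
  }
}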

Page 21: NLP with H2O

Machine Learning Model

[Chart: model predictions on a 0–100 scale for the Sci-Fi and Family classes]

Page 22: NLP with H2O

Machine Learning Model

[Diagram: Movie Plots classified by the model into Sci-Fi / Family]

Page 23: NLP with H2O

Supervised Learning

Page 24: NLP with H2O

Supervised Learning

• Regression: How much will a movie cost to make?
• Classification: What is the genre of a movie? Is it a Comedy? Drama? Action?

[Illustrations: a numeric scale (0, 50, 100) for regression; genre classes for classification]

Page 25: NLP with H2O

Overfitting

[Figure: "Finding the Signal in the Data" vs. "Memorizing the Data"]

Page 26: NLP with H2O

Demo Time

Page 27: NLP with H2O

Establish a Baseline

In R:

print("Build Gradient Boosted Machine with default parameters")
gbm_model <- h2o.gbm(x = myX, y = "genre", training_frame = train, validation_frame = test)

In Flow: [screenshot]

Page 28: NLP with H2O

Performance Metrics in Flow

[Screenshots: Hit Ratio table and Scoring History plot]

Page 29: NLP with H2O

Performance Metrics in Flow

[Screenshot: Confusion Matrix on Validation Data]

Page 30: NLP with H2O

Examining Results

We see that the model predicted a lot of movies that were Thrillers as Drama.

Why?

Plot of a Drama from the Training Data: "Cordelia Gray is the reluctant owner of a ramshackle investigation agency following the suicide of her boss. Watching over her as she hunts down clues in the murky and sinister world of crime, is her straight-laced and intuitive office assistant Edith Sparshott."

Page 31: NLP with H2O

Demo Time

Page 32: NLP with H2O

Adding New Features

Features                                              Mean Per Class Error
• Pre-trained Word Embeddings                         0.637
• Pre-trained Word Embeddings + H2O Word Embeddings   0.542

Page 33: NLP with H2O

Examine Results

Variable Importance – words with a high value in embedding component C58:

• spirituals

• percussion

• philharmonic

• orchestral

• concerto

• sonata

• hadyn

Page 34: NLP with H2O

Examine Results

Variable Importance – words with a high value in embedding component C74:

• einsteins

• patrolmen

• upperclassman

• speakeasies

• razzle

Page 35: NLP with H2O

Early Stopping

Early stopping once the validation mean per class error doesn’t improve by at least 0.01% for 5 consecutive scoring events:

• stopping_rounds = 5
• stopping_tolerance = 1e-4
• stopping_metric = "mean_per_class_error"

[Chart: validation error over training, marking the early-stopping point and the overfitting region]
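
These three parameters plug directly into the earlier h2o.gbm baseline call; a sketch:

gbm_model <- h2o.gbm(x = myX, y = "genre",
                     training_frame = train, validation_frame = test,
                     stopping_rounds = 5,
                     stopping_tolerance = 1e-4,
                     stopping_metric = "mean_per_class_error")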

Page 36: NLP with H2O

Examine Results

Model                                                                 Mean Per Class Error
• Pre-trained Word Embeddings                                         0.637
• Pre-trained Word Embeddings + H2O Word Embeddings                   0.542
• Pre-trained Word Embeddings + H2O Word Embeddings + Early Stopping  0.503

Page 37: NLP with H2O

Grid Search

In R:

grid <- h2o.grid(algorithm = "gbm",
                 hyper_params = list(max_depth = c(1:20)),
                 search_criteria = list(strategy = "RandomDiscrete", max_runtime_secs = 3600),
                 ...)

In Flow: [screenshot]
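
Once the grid finishes, the models can be ranked by the validation metric; a sketch using h2o.getGrid (the sort metric name is assumed to match the one used above):

# Sort the grid by validation mean per class error and pick the winner
grid_sorted <- h2o.getGrid(grid@grid_id, sort_by = "mean_per_class_error",
                           decreasing = FALSE)
best_model <- h2o.getModel(grid_sorted@model_ids[[1]])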

Page 38: NLP with H2O

Questions?