Making Deep Learning Understandable for Analyzing ... · Making Deep Learning Understandable for...
Transcript of Making Deep Learning Understandable for Analyzing ... · Making Deep Learning Understandable for...
MakingDeepLearningUnderstandableforAnalyzingSequentialDataabout
GeneRegulation
Dr.YanjunQiDepartmentofComputerScience
UniversityofVirginia
8/29/18 YanjunQi/UVACS 1
Tutorial@ACMBCB-2018
Today
• MachineLearning:aquickreview• DeepLearning:aquickreview• BackgroundBiology:aquickreview• DeepLearningforanalyzingSequentialDataaboutRegulation:
• DeepChrome• AttentiveChrome• DeepMotif
8/29/18 2
https://www.deepchrome.org
https://qdata.github.io/deep2Read/
YanjunQi/UVACS
8/29/18 YanjunQi/UVACS 3
• Biomedicine• Patient records, brain imaging, MRI & CT scans, …• Genomic sequences, bio-structure, drug effect info, …
• Science• Historical documents, scanned books, databases from
astronomy, environmental data, climate records, …
• Social media• Social interactions data, twitter, facebook records, online
reviews, …
• Business• Stock market transactions, corporate sales, airline traffic,
…
OUR DATA-RICH WORLD
Challenge of Data Explosion in Biomedicine
Molecular signatures oftumor / blood sample
Signs &Symptoms
Genetic Data
Public HealthData
Patient MedicalHistory &Demographics
Medical Images
Mobile medicalsensor data
TraditionalApproaches
Data-DrivenApproaches
MachineLearning
48/29/18 YanjunQi/UVACS
BASICS OF MACHINE LEARNING
• �The goal of machine learning is to build computer systems that can learn and adapt from their experience.� – Tom Dietterich
• �Experience� in the form of available dataexamples (also called as instances, samples)
• Available examples are described with properties (data points in feature space X)
8/29/18 YanjunQi/UVACS 5
8/29/18 YanjunQi/UVACS 6
e.g. SUPERVISED LEARNING• Find function to map input space X to output
space Y
• So that the difference between y and f(x) of each example x is small.
Ibelievethatthisbookisnotatallhelpfulsinceitdoesnotexplainthoroughlythematerial.itjustprovidesthereaderwithtablesandcalculationsthatsometimesarenoteasilyunderstood…
x
y-1
InputX:e.g.apieceofEnglishtext
OutputY:{1/Yes,-1/No}e.g.Isthisapositiveproduct review?
e.g.
SUPERVISED Linear Binary Classifier
• NowletuscheckoutaVERYSIMPLEcaseof
8/29/18 YanjunQi/UVACS 7
e.g.:Binaryy /Linearf/XasR2
f x y
f(x,w,b) = sign(wT x + b)
X =(x_1,x_2)
SUPERVISED Linear Binary Classifier
f x y
f(x,w,b) = sign(wT x + b)
wT x +b<0
CourtesyslidefromProf.AndrewMoore’stutorial
?
?
wTx +b>0
denotes +1 pointdenotes -1 pointdenotes future points
?
8/29/18 YanjunQi/UVACS 8
X =(x_1,x_2)
x_1
X_2
8/29/18 YanjunQi/UVACS 9
• Training (i.e. learning parameters w,b ) • Training set includes
• available examples' feature represenation: x1,…,xL• available corresponding labels y1,…,yL
• Find (w,b) by minimizing loss (i.e. difference between y and f(x) on available examples in training set)
(W, b) = argmin
W, b
Basic Concepts
• Testing (i.e. evaluating performance on �future�points)
• Difference between true y? and the predicted f(x?) on a set of testing examples (i.e. testing set)
• Key: example x? not in the training set
• Generalisation:learnfunction/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”new dataexamples
8/29/18
Basic Concepts
YanjunQi/UVACS 10
8/29/18 YanjunQi/UVACS 11
• Loss function • e.g. hinge loss for binary
classification task
• Regularization • E.g. additional information addedon loss function to control f
Basic Concepts
MaximizeSeparationMargin=>Minimize
TYPICAL MACHINE LEARNING SYSTEM
8/29/18
Low-level sensing
Pre-processing
Feature Extract
Feature Select
Inference, Prediction, Recognition
Label Collection
YanjunQi/UVACS 15
Evaluation
TYPICAL MACHINE LEARNING SYSTEM
8/29/18
Low-level sensing
Pre-processing
Feature Extract
Feature Select
Inference, Prediction, Recognition
Label Collection
Data Complexity in X
Data Complexity
in Y
YanjunQi/UVACS 16
Evaluation
UNSUPERVISED LEARNING : [ COMPLEXITY OF Y ]
• No labels are provided (e.g. No Y provided)• Find patterns from unlabeled data, e.g. clustering
8/29/18
e.g.clustering=>tofind�natural� groupingofinstancesgivenun-labeleddata
YanjunQi/UVACS 17
Structured Output Prediction: [ COMPLEXITY in Y ]
• Many prediction tasks involve output labels having structured correlations or constraints among instances
8/29/18
Manymorepossible structuresbetweeny_i ,e.g.spatial,temporal, relational…
Thedogchasedthecat
APAFSVSPASGACGPECA…
TreeSequence GridStructured Dependency between Examples’ Y
Input
Output
CCEEEEECCCCCHHHCCC…
YanjunQi/UVACS 18
Original Space Feature Space
Structured Input: Kernel Methods [ COMPLEXITY OF X ]
Vectorvs.Relationaldata
e.g.Graphs,Sequences,3Dstructures,
8/29/18 YanjunQi/UVACS 19
�
�
�
More Recent: Representation Learning[ COMPLEXITY OF X ]
Deep Learning Supervised Embedding
8/29/18
Layer-wise Pretraining
YanjunQi/UVACS 20
WhentouseMachineLearning?
• 1.Extractknowledgefromdata• Relationshipsandcorrelationscanbehiddenwithinlargeamountsofdata• Theamountofknowledgeavailableaboutcertaintasksissimplytoolargeforexplicitencoding(e.g.rules)byhumans
• 2.Learntasksthataredifficulttoformalise• Hard todefinewell,exceptbyexamples(e.g.facerecognition)
• 3.Createsoftwarethatimprovesovertime• Newknowledgeisconstantlybeingdiscovered.• Ruleorhumanencoding-basedsystemisdifficulttocontinuouslyre-design�byhand�.
228/29/18 YanjunQi/UVACS
Today
• MachineLearning:aquickreview• DeepLearning:aquickreview• BackgroundBiology:aquickreview• DeepLearningforanalyzingSequentialDataaboutRegulation:
• DeepChrome• AttentiveChrome• DeepMotif
8/29/18 24
https://www.deepchrome.org
https://qdata.github.io/deep2Read/
YanjunQi/UVACS
• DeepLearning• Whyisthisabreakthrough?• Basics• History• AFewRecenttrends
8/29/18 25
https://qdata.github.io/deep2Read/
YanjunQi/UVACS
Whybreakthrough?
8/29/18 27
DeepLearning DeepReinforcementLearning
GenerativeAdversarialNetwork(GAN)
YanjunQi/UVACS
Breakthrough from 2012 Large-Scale Visual Recognition Challenge (ImageNet)
In one �very large-scale� benchmark competition(1.2 million images [X] vs.1000 different word labels [Y])
288/29/18
10%improvewithdeepCNN
YanjunQi/UVACS
DNNshelpusbuildmoreintelligentcomputers
• Perceivetheworld,• e.g.,objectiverecognition,speechrecognition,…
•Understandtheworld,• e.g.,machinetranslation,textsemanticunderstanding
• Interactwiththeworld,• e.g.,AlphaGo,AlphaZero,self-drivingcars,…
• Beingabletothink/reason,• e.g.,learntocodeprograms,learntosearchdeepNN,…
• Beingabletoimagine/tomakeanalogy,• e.g.,learntodrawwithstyles,……
8/29/18 30YanjunQi/UVACS
DeepLearningWay: LearningRepresentationfromdata
Feature Engineering ü Most critical for accuracy ü� Account for most of the computation ü �Most time-consuming in development cycle ü� Often hand-craft and task dependent in practice
Feature Learning ü Easily adaptable to new similar tasks ü Learn layerwise representation from data
318/29/18 YanjunQi/UVACS
Basics
•BasicNeuralNetwork(NN)• singleneuron,e.g.logisticregressionunit• multilayerperceptron(MLP)• variouslossfunction
• E.g.,whenformulti-classclassification,softmax layer• trainingNNwithbackprop algorithm
8/29/18 32YanjunQi/UVACS
One“Neuron”:ExpandedLogisticRegression
x1
x2
x3
Σ
+1
z
z = wT . x + b
y = sigmoid(z) =33
ez
1 + ez
p = 3
w1
w2
w3
b1SummingFunction
SigmoidFunction
Multiplybyweights
ŷ = P(Y=1|x,w)
Input x
E.g.,ManyPossibleNonlinearityFunctions(akatransferoractivationfunctions)
x w
34https://en.wikipedia.org/wiki/Activation_function#Comparison_of_activation_functions
Name Plot Equation Derivative(w.r.tx )
usuallyworksbest inpractice
ez
1 + ez
One“Neuron”:ExpandedLogisticRegression
x1
x2
x3
Σ
+1
z
z = wT . x + b
y = sigmoid(z) =35
p = 3
w1
w2
w3
b1SummingFunction
SigmoidFunction
Multiplybyweights
ŷ = P(Y=1|x,w)
Input x
=>“NeuronView”
Multi-LayerPerceptron(MLP)- (Feed-ForwardNN)
36
1st
hiddenlayer
2nd
hiddenlayer
Outputlayer
x1
x2
x3
x ŷ
3-layerMLP-NN
W1
w3
W2
Historyè Perceptron:1-NeuronUnitwithStep
−FirstproposedbyRosenblatt(1958)−Asimpleneuronthatisusedtoclassifyitsinputintooneoftwocategories.−Aperceptronusesa stepfunction
φ(z)= +1ifz ≥0
−1ifz <0⎧⎨⎩
8/29/18 37
x1
x2
x3
Σ
SummingFunction
Step Function
w1
w2
w3
+1
b1
z
Multiplybyweights
YanjunQi/UVACS
z1
z2
z3
38
x1
x2
x3
x
Σ
Σ
Σ
ŷ1
ŷ2
ŷ3
E.g.,Cross-EntropyLossforMulti-ClassClassification
“Softmax”function. Normalizingfunctionwhichconvertseachclassoutputtoaprobability.
EW (ŷ,y) = loss = - yj ln ŷjΣj = 1.. .K
= P( ŷi = 1 | x )
W1 W3
W2
ŷi
Cross-entropyloss
K = 3
“BlockView”
x
1sthidden layer
2ndhidden layer Output layer
39
*W1
*W2
*W3
z1 z2 z3h1 h2
LossModule
“Softmax”
E (ŷ,y)ŷ
TrainingNeuralNetworks
41
Howdowelearntheoptimal weightsWL forour task??● StochasticGradientdescent:
LeCunet.al.EfficientBackpropagation. 1998
WLt = WL
t-1 - ! " E" WL
Buthowdowegetgradientsoflowerlayers?● Backpropagation!
○ Repeatedapplicationofchainruleofcalculus○ Locallyminimizetheobjective○ Requiresall“blocks”ofthenetworktobedifferentiable
x ŷ
W1
w3
W2
EW (ŷ,y)
– MainIdea:errorinhidden layers
IllustratingObjectiveLossFunction(extremelysimplified)andGradientDescent(2Dcase)
8/29/18 YanjunQi/UVACS 42
EW
W1 W2
E{xi,yi}(W1, W2)
Thegradientpointsinthedirection(inthevariablespace)ofthegreatestrateofincreaseofthefunctionanditsmagnitude istheslopeofthesurfacegraphinthatdirection
ImportantBlock:ConvolutionalNeuralNetworks(CNN)
• Prof.Yann LeCun inventedCNNin1998• FirstNNsuccessfullytrainedwithmanylayers
44Y.LeCun,L.Bottou,Y.Bengio,andP.Haffner,Gradient-basedlearningappliedtodocument recognition,ProceedingsoftheIEEE86(11):2278–2324,1998.
8/29/18 45AdaptfromFromNIPS2017DLTrendTutorial
CNNmodelsLocalityandTranslationInvariance
Makefully-connectedlayerlocally-connectedandsharing weight
YanjunQi/UVACS
• Prof.Schmidhuber invented"Longshort-termmemory”– RecurrentNN(LSTM-RNN) modelin1997
8/29/18 46
Sepp Hochreiter;JürgenSchmidhuber (1997)."Longshort-termmemory".NeuralComputation.9(8):1735–1780.
ImageCreditsfromChristopherOlah
ImportantBlock:RecurrentNeuralNetworks(RNN)
YanjunQi/UVACS
RNNmodelsdynamictemporaldependency
47
Imagecredit:wildML
• Makefully-connectedlayermodeleachunitrecurrently• Unitsformadirectedchaingraphalongasequence• Eachunitusesrecenthistoryandcurrentinputinmodeling
LSTMforMachineTranslation(GermanytoEnglish)
• DeepLearning• Whyisthisabreakthrough?• Basics• History• AFewRecenttrends
8/29/18 48
https://qdata.github.io/deep2Read/
YanjunQi/UVACS
Manyclassificationmodelsinventedsincelate80’s• Neuralnetworks• Boosting• SupportVectorMachine• MaximumEntropy• RandomForest• ……
8/29/18 49YanjunQi/UVACS
DeepLearning(CNN)inthe90’s• Prof.Yann LeCun inventedConvolutionalNeuralNetworks(CNN)in1998• FirstNNsuccessfullytrainedwithmanylayers
8/29/18 50
Y.LeCun,L.Bottou,Y.Bengio,andP.Haffner,Gradient-basedlearningappliedtodocument recognition,ProceedingsoftheIEEE86(11):2278–2324,1998.
YanjunQi/UVACS
DeepLearning(RNN)inthe90’s
• Prof.Schmidhuber invented"Longshort-termmemory”– RecurrentNN(LSTM-RNN) modelin1997
8/29/18 51
Sepp Hochreiter;JürgenSchmidhuber (1997)."Longshort-termmemory".NeuralComputation.9(8):1735–1780.
ImageCreditsfromChristopherOlahYanjunQi/UVACS
Between~2000to~2011MachineLearningFieldInterest
• LearningwithStructures!+ConvexFormulation!• Kernellearning• ManifoldLearning• SparseLearning• Structuredinput-outputlearning…• Graphicalmodel• TransferLearning• Semi-supervised• Matrixfactorization• ……
8/29/18 52YanjunQi/UVACS
“WinterofNeuralNetworks”Since90’sto~2011
• Non-convex
• Needalotoftrickstoplaywith• Howmanylayers?• Howmanyhiddenunitsperlayer?• Whattopologyamonglayers?…….
• Hardtoperformtheoreticalanalysis
8/29/18 53YanjunQi/UVACS
Breakthrough in 2012 Large-Scale Visual Recognition Challenge (ImageNet) : Milestones in Recent Vision/AI Fields
8/29/18 YanjunQi/UVACS 54
- 2013,GoogleAcquiredDeepNeuralNetworksCompanyheadedbyUtoronto “DeepLearning”ProfessorHinton
- 2013,FacebookBuiltNewArtificialIntelligenceLabheadedbyNYU“DeepLearning”ProfessorLeCun- 2016,Google'sDeepMind defeatslegendaryGoplayerLeeSe-dol inhistoricvictory/2017AlphaZero
Reason:Plentyof(Labeled)Data
• Text:trillionsofwordsofEnglish+otherlanguages• Visual:billionsofimagesandvideos• Audio: thousandsofhoursofspeechperday• Useractivity:queries,userpageclicks,maprequests,etc,• Knowledgegraph:billionsoflabeledrelationaltriplets
• ………
8/29/18 55Dr.JeffDean’stalk
YanjunQi/UVACS
Reason:AdvancedComputerArchitecturethatfitsDNNs
8/29/18 56YanjunQi/UVACS
http://www.nvidia.com/content/events/geoInt2015/LBrown_DL.pdf
SomeRecentTrends• 1.Autoencoder/layer-wisetraining• 2.CNN /Residual/Dynamicparameter• 3.RNN /Attention /Seq2Seq,…• 4.NeuralArchitecturewithexplicitMemory• 5.NTM4programinduction/sequentialdecisions• 6.Learningtooptimize/LearningDNNarchitectures• 7.Learningtolearn/meta-learning/few-shots• 8.DNNongraphs/trees/sets• 9.DeepGenerativemodels,e.g.,autoregressive• 10.GenerativeAdversarialNetworks(GAN)• 11.Deepreinforcementlearning• 12.Validate/Evade/Test/Understand /VerifyDNNs
8/29/18 57
https://qdata.github.io/deep2Read/
YanjunQi/UVACS
Recap
8/29/18 58AdaptfromFromNIPS2017DLTrendTutorial
LearnedModels
https://qdata.github.io/deep2Read/
YanjunQi/UVACS