How to Train Confident Neural Networks
alinlab.kaist.ac.kr/resource/iclr_2018_confident...
Algorithmic Intelligence Lab
Outline
• Introduction
  • Predictive uncertainty of deep neural networks
  • Summary of contributions
• How to train confident neural networks
  • Training Confidence-Calibrated Classifiers for Detecting Out-of-Distribution Samples [Lee'18a]
• Applications
  • Hierarchical novelty detection [Lee'18b]
• Conclusion
  • Future work
[Lee'18a] Lee, K., Lee, H., Lee, K. and Shin, J. Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples. In ICLR, 2018.
[Lee'18b] Lee, K., Lee, K. Min., Zhang, Y., Shin, J. and Lee, H. Hierarchical Novelty Detection for Visual Object Recognition. In CVPR, 2018.
• Supervised learning (e.g., regression and classification)
  • Objective: finding an unknown target distribution, i.e., P(Y|X)
• Recent advances in deep learning have dramatically improved accuracy on several supervised learning tasks
Introduction: Predictive uncertainty of deep neural networks (DNNs)
[Amodei'16] Amodei, D., Ananthanarayanan, S., Anubhai, R., et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. In ICML, 2016.
[He'16] He, K., Zhang, X., Ren, S. and Sun, J. Deep Residual Learning for Image Recognition. In CVPR, 2016.
[Hershey'17] Hershey, S., Chaudhuri, S., Ellis, D.P., et al. CNN Architectures for Large-Scale Audio Classification. In ICASSP, 2017.
[Girshick'15] Girshick, R. Fast R-CNN. In ICCV, 2015.
[Figure: example tasks mapping an input space to an output space]
• Object detection [Girshick'15]
• Speech recognition [Amodei'16]
• Image classification [He'16]
• Audio recognition [Hershey'17]
• Uncertainty of the predictive distribution is important in DNN applications
  • What is predictive uncertainty?
  • As an example, consider a classification task
  • It represents the confidence of a prediction!
• For example, it can be measured as follows:
  • Entropy of the predictive distribution [Lakshminarayanan'17]
  • Maximum value of the predictive distribution [Hendrycks'17]
[Lakshminarayanan'17] Lakshminarayanan, B., Pritzel, A. and Blundell, C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. In NIPS, 2017.
[Hendrycks'17] Hendrycks, D. and Gimpel, K. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks. In ICLR, 2017.
[Figure: an uncertain prediction spreads probability mass (e.g., Persian cat 0.12, tiger cat 0.18), while a confident one concentrates it (e.g., Persian cat 0.99)]
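As an illustrative sketch (not from the slides), both uncertainty measures can be computed directly from a predictive distribution:

```python
# Illustrative sketch: two common uncertainty measures over a predictive
# distribution p (a list of class probabilities). Example values are ours.
import math

def entropy(p):
    """Entropy of the predictive distribution; higher = more uncertain."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def max_prob(p):
    """Maximum value of the predictive distribution; higher = more confident."""
    return max(p)

confident = [0.99, 0.005, 0.005]   # e.g., "Persian cat" with probability 0.99
uncertain = [0.12, 0.18, 0.70]
```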
• Predictive uncertainty is related to many machine learning problems:
• Predictive uncertainty is also indispensable when deploying DNNs in real-world systems [Dario'16]
[Figure: real-world systems, e.g., autonomous driving and secure authentication]
[Dario'16] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J. and Mané, D. Concrete Problems in AI Safety. arXiv:1606.06565, 2016.
[Guo'17] Guo, C., Pleiss, G., Sun, Y. and Weinberger, K.Q. On Calibration of Modern Neural Networks. In ICML, 2017.
[Goodfellow'14] Goodfellow, I.J., Shlens, J. and Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv:1412.6572, 2014.
[Srivastava'14] Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR, 2014.
• Novelty detection [Hendrycks'17]
• Adversarial detection [Song'18]
• Ensemble learning [Lee'17]
• However, DNNs do not capture their predictive uncertainty well
  • E.g., DNNs trained to classify MNIST images often produce highly confident predictions (e.g., 91%) even for random noise [Hendrycks'17]
• The challenge is to improve the quality of the predictive uncertainty!
• Main topic of this presentation
  • How to train confident neural networks?
    • Training confidence-calibrated classifiers for detecting out-of-distribution samples [Lee'18a]
  • Applications
    • Confident multiple choice learning [Lee'17]
    • Hierarchical novelty detection [Lee'18b]
[Figure: a classifier trained on cats vs. dogs assigns 99% confidence even to an unknown image]
[Lee'17] Lee, K., Hwang, C., Park, K. and Shin, J. Confident Multiple Choice Learning. In ICML, 2017.
• Related problem
  • Detecting out-of-distribution samples [Hendrycks'17, Liang'18]
    • Detect whether a test sample is from the in-distribution (i.e., the classifier's training distribution) or out-of-distribution
How to Train Confident Neural Networks?
[Liang'18] Liang, S., Li, Y. and Srikant, R. Principled Detection of Out-of-Distribution Examples in Neural Networks. In ICLR, 2018.
• E.g., image classification
  • Assume a classifier is trained on handwritten digits (denoted as the in-distribution)
  • Detect out-of-distribution samples
• The performance of the detector reflects the confidence of the predictive distribution!
[Figure: data and predictive distributions for in-distribution vs. out-of-distribution samples]
• Threshold-based detector [Guo'17, Hendrycks'17, Liang'18]
Related Work
[Figure: input → classifier → confidence score]
If score > 𝜖: in-distribution
Else: out-of-distribution
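A minimal sketch of this decision rule, assuming the score is the maximum predicted probability (the baseline choice):

```python
# Hedged sketch of the threshold-based detector: score an input with the
# classifier's maximum predicted probability, then compare against epsilon.
def detect(probs, eps):
    """Return True if the sample looks in-distribution (score > eps)."""
    score = max(probs)  # baseline confidence score [Hendrycks'17]
    return score > eps
```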
• How to define the score?
  • Baseline detector [Hendrycks'17]
    • Confidence score = maximum value of the predictive distribution
  • Temperature scaling [Guo'17]
    • Confidence score = maximum value of the temperature-scaled predictive distribution
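A sketch of the temperature-scaled score: the logits are divided by a temperature T before the softmax (a standalone illustration, not the slides' code):

```python
# Illustrative temperature scaling: T > 1 flattens the predictive
# distribution, lowering overconfident scores; T is tuned on held-out data.
import math

def scaled_softmax(logits, T=1.0):
    """Softmax over temperature-scaled logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits, T=1.0):
    """Confidence score = maximum value of the scaled predictive distribution."""
    return max(scaled_softmax(logits, T))
```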
(the score is computed from the output of the neural network)
• Limitations
  • The detection performance of prior works highly depends on how the classifiers are trained
• One can consider
  • Bayesian neural networks [Yingzhen'17]
  • Training or performing inference with these models is computationally expensive
Our Contributions
  • Ensemble of classifiers [Balaji'17]
[Yingzhen'17] Li, Y. and Gal, Y. Dropout Inference in Bayesian Neural Networks with Alpha-divergences. In ICML, 2017.
[Balaji'17] Lakshminarayanan, B., Pritzel, A. and Blundell, C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. In NIPS, 2017.
• Our contribution
  1. Confidence loss for training more plausible, simple DNNs
  2. GAN for generating out-of-distribution samples
  3. Joint training method of the classifier and GAN
• Experimental results
  • Our method drastically improves the detection performance
  • E.g., a VGGNet trained by our method improves TPR compared to the baseline: 14.0% → 39.1% on CIFAR-10 and 46.3% → 98.9% on SVHN
• Confidence loss
  • Minimize the KL divergence between the predictive distribution and the uniform distribution on data from the out-of-distribution
• Interpretation
  • Assign higher maximum prediction values to in-distribution samples than to out-of-distribution ones
Contribution 1: Confidence Loss
[Figure: data from the in-distribution should follow the data distribution; data from the out-of-distribution should be mapped to the uniform distribution ("zero confidence")]
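Putting the description above into a formula (our reconstruction of the loss in [Lee'18a]; β is a penalty weight and 𝒰(y) the uniform distribution over labels):

```latex
\min_{\theta}\;
\mathbb{E}_{(x,y)\sim P_{\text{in}}}\!\left[-\log P_\theta(y\mid x)\right]
\;+\;\beta\,
\mathbb{E}_{x\sim P_{\text{out}}}\!\left[\mathrm{KL}\!\left(\mathcal{U}(y)\,\|\,P_\theta(y\mid x)\right)\right]
```

The first term is the usual cross-entropy on in-distribution data; the KL term pushes predictions on out-of-distribution data toward the uniform distribution.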
• Effects of the confidence loss
  • Fraction of the maximum prediction value from simple CNNs (2 conv + 3 FC layers)
[Figure: histograms of maximum prediction values on CIFAR-10, SVHN, TinyImageNet, and LSUN]
  • In-distribution: SVHN
  • The KL divergence term is optimized using CIFAR-10 training data
• Main issues of the confidence loss
  • How to optimize the KL divergence loss?
Contribution 2. GAN for Generating Out-of-Distribution Samples
  • The number of out-of-distribution samples needed to cover the entire space might be almost infinite
• Our intuition
  • Samples close to the in-distribution could be more effective for improving the detection performance
[Figure: generated samples grouped by the classifier's maximum prediction value: [0, 0.2), [0.2, 0.8), [0.8, 1)]
• New GAN objective
  • Term (a) forces the generator to generate low-density samples
    • (Approximately) minimizing the negative log-likelihood under the in-distribution
  • Term (b) corresponds to the original GAN loss
    • Generating out-of-distribution samples close to the in-distribution
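A sketch of the resulting two-term objective (our reconstruction from the description above; D is the discriminator, G the generator with sample distribution P_G, P_θ the classifier, β a weight):

```latex
\min_{G}\max_{D}\;
\underbrace{\beta\,\mathbb{E}_{x\sim P_{G}}\!\left[\mathrm{KL}\!\left(\mathcal{U}(y)\,\|\,P_\theta(y\mid x)\right)\right]}_{(a)}
\;+\;
\underbrace{\mathbb{E}_{x\sim P_{\text{in}}}\!\left[\log D(x)\right]
+\mathbb{E}_{x\sim P_{G}}\!\left[\log\!\left(1-D(x)\right)\right]}_{(b)}
```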
[Figure: the KL term is computed with a classifier trained on the in-distribution]
• Experimental results on a toy example and MNIST
• We suggest training the proposed GAN using a confident classifier
  • The converse is also possible
Contribution 3. Joint Confidence Loss
• We propose a joint confidence loss
  • Classifier's confidence loss: (c) + (d)
  • GAN loss: (d) + (e)
• Alternating algorithm for optimizing the joint confidence loss
[Figure: alternating optimization; Step 1 updates the GAN, Step 2 updates the classifier]
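The alternating procedure can be sketched in plain Python (the update functions are hypothetical placeholders, not the authors' implementation):

```python
# Sketch of the alternating optimization described above. `update_gan` and
# `update_classifier` are placeholder callbacks standing in for real
# gradient steps on the GAN loss and the joint confidence loss.
def train_joint(num_steps, update_gan, update_classifier):
    """Alternate: Step 1 updates the GAN, Step 2 updates the classifier."""
    history = []
    for step in range(num_steps):
        update_gan()          # Step 1: generator/discriminator step
        update_classifier()   # Step 2: classifier step on the joint loss
        history.append(step)
    return history
```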
• Model: 13-layer VGGNet [Christian'15]
• In-distribution: CIFAR-10 or SVHN
• Out-of-distribution: (resized) TinyImageNet and LSUN
Experimental Results: Dataset & Model
• CIFAR-10 [Krizhevsky'09]: 32×32 RGB, 10 classes, 50,000 training / 10,000 test images
• SVHN [Netzer'11]: 32×32 RGB, 10 classes, 73,257 training / 26,032 test images
• TinyImageNet: 32×32 RGB, 200 classes, 10,000 test images
• LSUN: 32×32 RGB, 10 classes, 10,000 test images
[Krizhevsky'09] Krizhevsky, A. and Hinton, G. Learning Multiple Layers of Features from Tiny Images. Master's thesis, Department of Computer Science, University of Toronto, 2009.
[Netzer'11] Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. and Ng, A.Y. Reading Digits in Natural Images with Unsupervised Feature Learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[Christian'15] Szegedy, C., Liu, W., Jia, Y., et al. Going Deeper with Convolutions. In CVPR, 2015.
• Notation: TP = true positive, FN = false negative, TN = true negative, FP = false positive
• Metrics
  • FPR at 95% TPR
    • FPR = FP / (FP + TN), TPR = TP / (TP + FN)
  • AUROC (Area Under the Receiver Operating Characteristic curve)
    • ROC curve = relationship between TPR and FPR
  • Detection error
    • Minimum misclassification probability over all thresholds
  • AUPR (Area Under the Precision-Recall curve)
    • PR curve = relationship between precision = TP / (TP + FP) and recall = TP / (TP + FN)
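Two of these metrics can be sketched as follows (illustrative helper names; we assume a higher score means "more likely in-distribution"):

```python
# Illustrative implementations of two detection metrics; function names and
# conventions are ours, not from the slides.
def fpr_at_tpr(in_scores, out_scores, tpr=0.95):
    """FPR at a fixed TPR: pick the threshold that keeps `tpr` of the
    in-distribution samples, then count out-of-dist samples that slip past."""
    thr = sorted(in_scores)[int((1 - tpr) * len(in_scores))]
    return sum(s >= thr for s in out_scores) / len(out_scores)

def auroc(in_scores, out_scores):
    """AUROC via its rank interpretation: P(in score > out score),
    counting ties as half."""
    wins = sum((i > o) + 0.5 * (i == o)
               for i in in_scores for o in out_scores)
    return wins / (len(in_scores) * len(out_scores))
```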
Experimental Results: Metrics
• Measure the detection performance of threshold-based detectors
  • Confidence loss trained with an explicit out-of-distribution dataset
• A classifier trained by our method drastically improves the detection performance across all out-of-distributions
Experimental Results
Realistic images such as TinyImageNet (aqua line) and LSUN (green line) are more useful than synthetic datasets (orange line) for improving the detection performance.
• Joint confidence loss
  • Confidence loss with the original GAN (orange bar) is often useful for improving the detection performance
  • The joint confidence loss (blue bar) still outperforms all baselines in all cases
• Comparison with ODIN [Liang'18]
• Interpretability of the trained classifier
  • A classifier trained with the cross-entropy loss shows sharp gradient maps for samples from both the in- and out-of-distributions
  • Classifiers trained with the confidence losses do so only for samples from the in-distribution
• Novelty detection
  • 1. Find the closest known (super-)category in the taxonomy
  • 2. Perform fine-grained classification of novel categories (i.e., out-of-distribution samples)
Hierarchical Novelty Detection
Figure 1. An illustration of our hierarchical novelty detection task.
• Top-down method (TD)
  • p(child) = Σ_super p(child | super) · p(super)
  • Inference
  • Definition of confidence:
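The marginalization in the TD method above can be sketched as follows (class names and probabilities are illustrative, not from the slides):

```python
# Hypothetical sketch of top-down marginalization over a two-level taxonomy:
# p(child) = sum over superclasses of p(child | super) * p(super).
def child_prob(p_super, p_child_given_super, child):
    """Marginal probability of a child class under a two-level taxonomy."""
    return sum(p_super[s] * p_child_given_super[s].get(child, 0.0)
               for s in p_super)
```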
Two Main Approaches
  • Objective
• ImageNet dataset
  • 22K classes
  • Taxonomy
    • 396 superclasses of the 1K known leaf classes
    • The rest of the 21K classes can be used as novel classes
  • Example
Experimental Results on the ImageNet Dataset
[Deng'12] Deng, J., Krause, J., Berg, A.C. and Fei-Fei, L. Hedging Your Bets: Optimizing Accuracy-Specificity Trade-offs in Large Scale Visual Recognition. In CVPR, 2012.
• Hierarchical novelty detection performance
  • Baseline: DARTS [Deng'12]
  • One can note that our methods achieve higher novel-class accuracy than DARTS at the same known-class accuracy in most regions
• We propose a new method for training confident deep neural networks
  • It produces the uniform distribution when the input is not from the target distribution
• We show that it can be applied to many machine learning problems:
  • Detecting out-of-distribution samples [Lee'18a]
  • Ensemble learning using deep neural networks [Lee'17]
  • Hierarchical novelty detection [Lee'18b]
• We believe that our new approach brings a refreshing angle for developing confident deep networks in many related applications:
  • Network calibration
  • Adversarial example detection
  • Bayesian probabilistic models
  • Semi-supervised learning
Conclusion