2016 SRI International
Deep Temporal Models (Benchmarks and Applications Analysis)
Sek Chai, SRI International
Presented at: NICE 2016, March 7, 2016
High-dimensional Data Set
Project Summary
SRI International: Sek Chai (PI), Mohamed Amer, David Zhang, Tim Shields
U. Montreal: Roland Memisevic, Yoshua Bengio
U. Guelph: Graham Taylor, Dhanesh Ramachandram
Goals: Analyze Deep Temporal Models (DTMs). Find approaches to reduce training time, lower memory size, and use low precision.
[Diagram: high-dimensional data (audio, video, gesture) -> Deep Temporal Models (RNN, LSTM, CRBM) -> Benchmarks and Applications Analysis -> Processor Architecture]
Seeing Humans
• ChaLearn 2014 - This dataset consists of a single user recorded in front of a depth camera, performing natural communicative gestures and speaking in fluent Italian. The dataset focuses on the user-independent automatic recognition of a vocabulary of 20 Italian cultural/anthropological signs in image sequences.
• Challenges:
  • Multimodal visual cues (RGB-D) and audio
  • Multiple timescales, unreliable depth cues
  • No information about the number of gestures within each sequence
  • High intra-class variability of gesture samples
  • Low inter-class variability for some gesture categories
  • Several distractor gestures (out of the vocabulary) are present
S. Escalera, et al., "ChaLearn Looking at People Challenge 2014: Dataset and Results", ECCV-W 2014
Image: Neverova et al. (2015)
DeepGesture Architecture (for ChaLearn Dataset)
State-of-the-art: 88.1% recognition rate
N. Neverova, et al. (2015), "ModDrop: Adaptive Multimodal Gesture Recognition", IEEE PAMI (In Press)
[Plot: validation error (%) vs. training stage, for different temporal strides]
Key Insights
• We adopted a strategy where "like modalities" are fused first; it resembles the brain's multimodal fusion strategy.
• Most previous work on multimodal learning has fused data:
  - At the input feature level (early fusion); or
  - At the level of per-modality classifier outputs (late fusion)
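The two conventional strategies can be contrasted with a toy sketch (this is not the ModDrop code; the linear classifiers, dimensions, and random features are illustrative stand-ins for the per-modality networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy per-modality feature vectors for one sample.
video = rng.random(8)
audio = rng.random(4)

# Early fusion: concatenate raw features, then classify jointly
# over the 20-gesture vocabulary.
W_early = rng.random((20, 12))
early_scores = softmax(W_early @ np.concatenate([video, audio]))

# Late fusion: classify each modality separately, then average
# the per-modality classifier outputs.
W_v, W_a = rng.random((20, 8)), rng.random((20, 4))
late_scores = 0.5 * (softmax(W_v @ video) + softmax(W_a @ audio))

print(early_scores.shape, late_scores.shape)
```

The insight on the slide is that an intermediate point exists: fuse "like modalities" (e.g., intensity and depth video) inside the network before the final classifier.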
Example Training Complexity (for ChaLearn Dataset)
Classifier     | Modality                | Training Data Size | Training Time           | Epochs  | Time (sec)/epoch
Motion detect  | Motion capture          | 1.663 GB           | 20,174 sec (5.6 hrs)    | 200,000 | 0.100
Skeleton       | Motion capture          | 1.663 GB           | 69,344 sec (19.25 hrs)  | 200,000 | 0.347
Video feature  | Intensity + depth video | 12.211 GB          | 82,655 sec (22.9 hrs)   | 819     | 100.922
Video          | Intensity + depth video | 12.211 GB          | 170,223 sec (47.28 hrs) | 796     | 213.848
Multimodal     | All                     | 13.874 GB          | 93,247 sec (25.9 hrs)   | 449     | 207.677

Classifier descriptions:
• Motion detect: shallow MLP used to detect the start-frame and stop-frame of a given gesture in an action set.
• Skeleton: convolutional network trained to extract features from motion capture data (Path M).
• Video feature: 3D convolutional layer followed by a 2D convolutional layer, using depth and intensity video (ConvC1 -> ConvC2, ConvD1 -> ConvD2).
• Video: shared hidden layers taking inputs from the previous convolutional layers (HLV1 + HLV2).
• Multimodal: fully connected shared hidden layer where the multimodal inputs are fused (HLS).
Summary
• Total of 5 days to process 42 GB of training data on the Sharcnet Copper Cluster (64 GPUs, 128 CPU cores, 24 cores/node, 64 GB/node, x86, 80 TB RAID attached storage, InfiniBand, 4 Tesla K80s/node).
• Total # parameters = 7,836,457
Low Precision Neural Networks
Needs: Memory is the main bottleneck, especially for embedded solutions. Estimates: a 1B-connection neural network consumes 12W*.
Current Approaches:
• Network pruning - *Image: Han, et al. (NIPS 2015)
• BinaryConnect - Image: Courbariaux, et al. (NIPS 2015)
• Stochastic rounding - Gupta, et al. (arXiv 2015)
Subband/Wavelet Decomposition
Subband decomposition enables data reduction by discarding information about certain frequencies where the human visual system is less sensitive.* Can we do the same for learnt representations?
*Good representation: used extensively in image compression [1], reconstruction [2], and fusion [3].
[1] Burt et al., "The Laplacian Pyramid as a Compact Image Code"
[2] Denton et al., "Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks"
[3] Stathaki, "Image Fusion: Algorithms and Applications"
The MNIST database contains black-and-white handwritten digits, normalized to 20x20 pixel size. There are 60,000 training images and 10,000 testing images.
Conventional Approach for MNIST Data
[Diagram: a CNN operating directly on the original image I0. *Image: LeCun, et al. (1998)]
Our Approach: Separate Networks for Subband Learning
[Diagram: the image (I0, 28x28) is decomposed into a Laplacian band (L0, 32x32), which captures edges and spatial information, and a Gaussian band (G1, 16x16), which carries background cues; a separate network is trained on each subband and the outputs are combined by fusion.]
Basic Idea: We separate imagery into different frequency bands (i.e., with different information content) such that the neural net can better learn using fewer bits.
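The one-level subband split can be sketched as follows (an illustrative reconstruction, not SRI's implementation; the zero-padding to 32x32, the blur sigma, and the bilinear expansion are assumptions chosen to match the sizes on the slide):

```python
import numpy as np
from scipy import ndimage

def subbands(img):
    """Split an image into a Laplacian band L0 (edges) and a
    Gaussian band G1 (low-frequency background), as in a
    one-level Laplacian pyramid."""
    # Pad 28x28 MNIST digits to 32x32 so the size halves cleanly.
    padded = np.pad(img, 2)                      # 28x28 -> 32x32
    blurred = ndimage.gaussian_filter(padded, sigma=1.0)
    g1 = blurred[::2, ::2]                       # downsample -> 16x16
    # Expand G1 back to 32x32 and subtract: the residual keeps
    # only the high-frequency (edge) content.
    expanded = ndimage.zoom(g1, 2, order=1)      # 16x16 -> 32x32
    l0 = padded - expanded
    return l0, g1

img = np.random.rand(28, 28)
l0, g1 = subbands(img)
print(l0.shape, g1.shape)   # (32, 32) (16, 16)
```

Each band is then fed to its own CNN, matching the I0/L0/G1 sizes shown above.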
[Plot: Validation Error (log scale) vs. Epochs on MNIST; curves for Original I0 (28x28), Laplace L0 (32x32), Gaussian G1 (16x16), and their Fusion.]
Discussions
• A CNN trained on the Laplacian focuses more on edges (a good feature).
• CNN(L0) beats CNN(I0) on this dataset.
Robustness to Low Bit Precision Weights

Stochastic rounding after every epoch (weight bits):
           32-bit  16-bit  8-bit  4-bit
Original    1.13    1.13   1.11   1.32
GBlur       1.38    1.41   1.36   1.38
Laplace     1.02    1.00   0.91   1.66
Fusion      0.95    0.89   0.89   1.13

Stochastic rounding after final epoch (weight bits):
           32-bit  16-bit  8-bit  4-bit
Original    1.13    1.22   1.14   4.45
GBlur       1.38    1.34   1.35   5.24
Laplace     1.02    0.96   1.03   6.03
Fusion      0.89    0.91   0.92   6.03
Discussions
• Fusion results are comparable to the original, using half the number of bits.
• Stochastic rounding after every epoch guides the learning, and is especially useful for low precision.
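Stochastic rounding in the style of Gupta et al. (2015) can be sketched as follows (a minimal illustration; the fixed-point grid, bit width, and random seed are assumptions, not the slide's exact scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(w, frac_bits):
    """Round weights onto a fixed-point grid with `frac_bits`
    fractional bits, rounding up with probability equal to the
    fractional remainder (unbiased in expectation)."""
    scale = 2.0 ** frac_bits
    scaled = w * scale
    floor = np.floor(scaled)
    # Probability of rounding up = distance past the lower grid point.
    up = rng.random(w.shape) < (scaled - floor)
    return (floor + up) / scale

w = np.array([0.30, -0.30, 0.125])
wq = stochastic_round(w, 3)     # grid spacing 1/8 = 0.125
print(wq)                       # each entry is a multiple of 0.125
```

Because the expected value of the rounded weight equals the original weight, applying this after every epoch lets small gradient updates still accumulate, which matches the observation above.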
• Simple fusion: equi-weighted average of the softmax outputs. Score: 0.5*(s1(x) + s2(x)), where s1(x), s2(x) ∈ [0,1]^10 and ||s(x)||_1 = 1.
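The simple fusion rule can be written out directly (the logits below are made up; `softmax` stands in for each subband CNN's 10-class output layer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse(s1, s2):
    """Equi-weighted average of two softmax score vectors; the
    result is still a valid distribution over the 10 classes."""
    return 0.5 * (s1 + s2)

s1 = softmax(np.array([2.0, 1.0, 0.1, 0, 0, 0, 0, 0, 0, 0]))  # e.g. CNN(L0)
s2 = softmax(np.array([1.5, 1.2, 0.3, 0, 0, 0, 0, 0, 0, 0]))  # e.g. CNN(G1)
s = fuse(s1, s2)
print(s.sum(), s.argmax())
```

Since each s_i is non-negative and sums to 1, the fused score also sums to 1 and needs no renormalization.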
CIFAR-10 Data Set
Discussions:
• Cluttered color images, more challenging than MNIST.
• Contains background cues and context that can help recognition (e.g., blue sky for an airplane or water for a ship).
• The architecture is the same as before (LeNet-5), with 30 and 60 feature maps.
• Our goal is to show comparative results for low precision. No data augmentation is used.
The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
[Plot: CIFAR-10 Performance - test error vs. epoch for Original, Laplacian, GBlur, and Fusion. Annotations: cluttered images combined with a high learning rate; Gaussian blur removes noise in clutter; the Laplacian enhances the foreground.]
Conclusion
• Hybrid multimodal neural networks improve algorithmic performance.
• Fusion of learnt representations is important.
• Low-precision networks show promise.
• Stop by and visit the poster at NICE 2016.
Chai, et al., "Low Precision Neural Networks using Subband Decomposition", CogArch, April 2016