Deep Learning With Keras: Beginner’s Guide To Deep Learning With Keras
By Frank Millstein
WHAT IS IN THE BOOK?
INTRODUCTION
HOW DEEP LEARNING IS DIFFERENT FROM MACHINE LEARNING
DEEPER INTO DEEP LEARNING
CHAPTER 1: A FIRST LOOK AT NEURAL NETWORKS
CONVOLUTIONAL NEURAL NETWORK
RECURRENT NEURAL NETWORK
RNN SEQUENCE TO SEQUENCE MODEL
AUTOENCODERS
REINFORCEMENT DEEP LEARNING
GENERATIVE ADVERSARIAL NETWORK
CHAPTER 2: GETTING STARTED WITH KERAS
BUILDING DEEP LEARNING MODELS WITH KERAS
CHAPTER 3: MULTI-LAYER PERCEPTRON NETWORK MODELS
MODEL LAYERS
MODEL COMPILATION
MODEL TRAINING
MODEL PREDICTION
CHAPTER 4: ACTIVATION FUNCTIONS FOR NEURAL NETWORKS
SIGMOID ACTIVATION FUNCTION
TANH ACTIVATION FUNCTION
RELU ACTIVATION FUNCTION
CHAPTER 5: MNIST HANDWRITTEN RECOGNITION
CHAPTER 6: NEURAL NETWORK MODELS FOR MULTI-CLASS CLASSIFICATION PROBLEMS
ONE-HOT ENCODING
DEFINING NEURAL NETWORK MODELS WITH SCIKIT-LEARN
EVALUATING MODELS WITH K-FOLD CROSS VALIDATION
CHAPTER 7: RECURRENT NEURAL NETWORKS
SEQUENCE CLASSIFICATION WITH LSTM RECURRENT NEURAL NETWORKS
WORD EMBEDDING
APPLYING DROPOUT
NATURAL LANGUAGE PROCESSING WITH RECURRENT NEURAL NETWORKS
LAST WORDS
Copyright © 2018 by Frank Millstein - All rights reserved.
This document is geared towards providing exact and reliable information in regards to the topic and issue covered. The publication is sold with the idea that the publisher is not required to render accounting, officially permitted, or otherwise, qualified services. If advice is necessary, legal or professional, a practiced individual in the profession should be ordered.

From a Declaration of Principles which was accepted and approved equally by a Committee of the American Bar Association and a Committee of Publishers and Associations.

In no way is it legal to reproduce, duplicate, or transmit any part of this document by either electronic means or in printed format. Recording of this publication is strictly prohibited, and any storage of this document is not allowed unless with written permission from the publisher. All rights reserved.

The information provided herein is stated to be truthful and consistent, in that any liability, in terms of inattention or otherwise, by any usage or abuse of any policies, processes, or directions contained within is the solitary and utter responsibility of the recipient reader. Under no circumstances will any legal responsibility or blame be held against the publisher for any reparation, damages, or monetary loss due to the information herein, either directly or indirectly. Respective authors own all copyrights not held by the publisher.

The information herein is offered for informational purposes solely and is universal as so. The presentation of the information is without contract or any type of guarantee assurance.

The trademarks that are used are without any consent, and the publication of the trademark is without permission or backing by the trademark owner. All trademarks and brands within this book are for clarifying purposes only and are owned by the owners themselves, not affiliated with this document.
INTRODUCTION
Neural networks and deep learning are increasingly important studies and concepts in computer science, with amazing strides being made by major tech companies like Google. Over the years, you may have heard words like backpropagation, neural networks, and deep learning tossed around a lot. As we hear them more often, there is little wonder why these terms have seized your curiosity.

Deep learning is an important area of active research today in the field of computer science. If you are involved in this scientific area, I am sure you have come across these terms at least once. Deep learning and neural networks may be an intimidating concept, but since the topic is increasingly popular these days, it is most definitely worth your attention.
Google and other large global tech companies are making great strides with deep learning projects, like the Google Brain project and Google's acquisition DeepMind. Moreover, many deep learning methods are beating the traditional machine learning methods on every single metric.
HOW DEEP LEARNING IS DIFFERENT FROM MACHINE LEARNING
Before going further into this subject, we must take a step back so you get to learn more about the broader field of machine learning. Very often, we encounter problems for which it is hard to write a computer program. For instance, if you want to program your computer to recognize handwritten digits, you can try to devise a collection of rules to distinguish every individual digit. In this case, a zero is one closed loop, but what if you did not perfectly close this loop? On the other hand, what if the top right of your loop closes at the point where the top left of your loop starts?

Issues like this happen routinely, as a zero may be very difficult to distinguish from a six algorithmically. You could establish a kind of cutoff, but you will have problems deciding on the cutoff in the first place. Therefore, it quickly becomes very complicated to compile a list of guesses and rules that will accurately classify your handwritten digits.
There are many more kinds of issues that fall into this category, such as comprehending speech, recognizing objects, and understanding concepts. We have trouble writing computer programs for these tasks, as we do not know how they are done by the human brain. And even when you have a relatively good idea of how to do this, the program may be very complicated.

Therefore, instead of writing a program, you can try to develop an algorithm which your computer can use to look at thousands of examples with correct answers. The computer can then use the experience it has gained to solve the same problem in numerous other situations. Our main goal with this subject is to teach our computers to solve problems by example, in much the same way you can teach your child to distinguish a dog from a cat.
Deep learning was first theorized back in the early 1980s and was one of the main paradigms for performing broader machine learning. Over the past few decades, computer scientists have successfully developed a wide range of different algorithms which try to allow computers to learn to solve problems through examples. Because of the flurry of modern technological advancements and modern research, deep learning is on the rise, since it has proven to be extremely good when it comes to teaching our computers to do what the human brain can do naturally and effortlessly.
One of the main challenges with traditional machine learning models is a process named feature extraction. More specifically, the programmer must tell the computer what kind of features and information it should be looking for when trying to make a choice or decision.

Feeding the algorithm raw data in fact rarely works, so this process of feature extraction is one of the critical parts of the traditional machine learning workflow. Moreover, this places a massive burden on the programmer, as the effectiveness of the algorithm relies mainly on the insight of the programmer. For more complex issues, such as handwriting recognition or object recognition, this is one of the main challenges.

Fortunately, we have deep learning methods by which we can surely circumvent these challenges regarding feature extraction. This is mainly because deep learning algorithms are capable of learning to focus only on the right, informative features by themselves, while at the same time requiring very little guidance from the programmer. This makes deep learning an amazingly powerful tool for machine learning.
Machine learning uses our computers to run predictive models, which are capable of learning from already existing data to forecast future outcomes, behaviors, and trends. Deep learning, in turn, is an important subfield of machine learning in which the algorithms or models are inspired by how the human brain works. These deep learning models are expressed mathematically, and the number of parameters that define a model can be in the order of several thousand to millions. In deep learning models, everything is learned automatically.

Moreover, deep learning is one of the main keys enabling the artificial-intelligence-powered technologies that are being developed around the globe every day. In the following sections of the book, you are going to learn how to build complex models which help machines solve distinct real-world issues with human-like intelligence. You will learn how to build and derive many insights from these models using Keras running on your Linux machine.

The book, in fact, provides the level of detail needed for data scientists and engineers to develop a greater intuitive understanding of the main concepts of deep learning. You will also learn powerful motifs that can be used in building numerous deep learning models and much more.
Machine learning and deep learning have one thing in common: they are both related to artificial intelligence. Artificial intelligence regards computer systems which mimic or replicate human intelligence, while the broader field of machine learning allows machines to learn entirely on their own. Deep learning, in turn, regards the many computer algorithms which attempt to model high-level abstractions contained in data in order to determine high-level meaning.

For instance, if artificial intelligence is used to recognize emotions in pictures, then machine learning models would input hundreds or thousands of pictures of human faces into the system, while deep learning will help that system recognize countless patterns in the human faces and the emotions they share.

This is a very simple explanation of the three; the reality is more complex. Deep learning is by far the most confusing, as it works with neural networks, data, and math. Machine learning, by contrast, analyzes and crunches numbers and data, learns from it, and uses that information to make innumerable predictions, truth statements, and determinations depending on the scenario.
In this case, the machine is being trained, or it is training itself, on how to perform tasks correctly after learning from the numbers and data it has previously analyzed. Therefore, machine learning models build their own solutions and logic. Machine learning can be done with several algorithms, like the random forests and decision trees used by Netflix, for instance, to suggest movies to its customers based on their star ratings.

Another common machine learning model is linear regression, which predicts the value of a numerical outcome with limitless possible results, like figuring out how much money you can sell your car for based on the current market flow. Other machine learning models include logistic regression, which predicts the value of categorical outcomes based on a limited number of possible values.

Classification and naive Bayes models additionally count as machine learning. Machine learning classification puts data into distinct groups, like sorting emails or filing documents, while naive Bayes covers a family of algorithms which all share the common principle that every feature is classified independently of the other features. This may go on, as there are many other machine learning models. There are also two broad types of machine learning: supervised and unsupervised.

Supervised learning models require a human to input both the data and the solution, and then allow the machine to figure out on its own the relationship between the two. Unsupervised machine learning models, on the other hand, involve putting in raw data and numbers for a specific situation and asking the machine or computer to find a relationship and solution. Therefore, machine learning eliminates the need for someone to constantly analyze or code data to arrive at logic and a solution.
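The supervised/unsupervised split above can be illustrated with a toy sketch in plain NumPy. All the numbers are made up; the `np.polyfit` line fit stands in for a supervised model, and the crude mean cutoff stands in for an unsupervised clustering step:

```python
import numpy as np

# Supervised: both the inputs and the known answers are provided,
# and the model figures out the relationship between the two.
hours = np.array([1, 2, 3, 4, 5], dtype=float)
scores = np.array([2, 4, 6, 8, 10], dtype=float)  # labels supplied by a human
slope, intercept = np.polyfit(hours, scores, 1)   # learns scores = 2 * hours
print(slope)

# Unsupervised: only raw data goes in; the structure (here, two obvious
# clusters) must be discovered without any labels at all.
points = np.array([0.1, 0.2, 0.15, 9.8, 10.1, 9.9])
cutoff = points.mean()          # a crude split into two groups
print(points < cutoff)          # the two groups emerge without labels
```

A real unsupervised model would use a proper clustering algorithm rather than a mean cutoff, but the division of labor is the same: no answers are supplied, only data.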
DEEPER INTO DEEP LEARNING
The biggest difference between machine learning and deep learning is that deep learning crunches much more data. If you have only a little data to analyze, machine learning is the way to go; if you have a lot of data to analyze, deep learning is your solution. Deep learning models are extremely powerful, and they need a lot of data to give you the best possible outcome or solution. Deep learning models also need more powerful machines, while machine learning models do not.

More powerful machines are required for deep learning, as deep learning models do more complicated things, such as the matrix multiplications that call for a GPU, or graphics processing unit. Deep learning models also try to learn high-level features, so in the case of facial recognition, a deep learning model will get an image quite close to the raw version, while a machine learning model will get a blurry image. Another powerful deep learning trait is forming end-to-end solutions instead of breaking problems and solutions down into parts.
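Those matrix multiplications are easy to see in miniature: the forward pass of a single fully-connected layer is essentially one matrix product. The shapes below are invented for illustration:

```python
import numpy as np

# A dense layer's forward pass is essentially one matrix multiplication:
# a batch of inputs (one sample per row) times a weight matrix, plus a bias.
batch = np.random.rand(32, 8)     # 32 samples, 8 input features each
weights = np.random.rand(8, 12)   # a layer with 12 neurons
bias = np.zeros(12)

activations = batch @ weights + bias
print(activations.shape)          # one 12-value output row per sample
```

A deep network repeats this for every layer on every batch, millions of times during training, which is why GPUs, built for exactly this operation, matter so much.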
Deep learning is one of the most powerful tools used by major global tech companies, although it takes a long time to process data and find correct solutions. Just keep in mind, it may be challenging at the very beginning, but you will get there eventually. Fortunately, you have this book to start off your deep learning journey.
CHAPTER 1: A FIRST LOOK AT NEURAL NETWORKS
In recent years, neural networks, or more specifically deep neural networks, have won numerous contests in machine learning and pattern recognition. Deep learners are mainly distinguished by the depth of their paths, the chains of possibly learnable causal relationships between actions and effects.

Deep learning algorithms, in very simple words, are deep, large artificial neural nets. An NN, or neural network, can easily be presented as a directed acyclic graph in which the initial input layer takes in signal vectors, and one or more hidden layers then process the outputs of the previous layers.
In fact, the main concept behind neural networks can be traced back half a century. There is more talk about the idea today because we have a lot more data and significantly more powerful computers, which were not available decades ago.
A deep neural network has many more layers and many more nodes in every layer, which results in exponentially more parameters to tune. When we do not have enough data, we are not able to learn those parameters efficiently. In addition, without powerful machines or computers, learning would be insufficient as well as too slow.

When it comes to small datasets, traditional machine learning algorithms such as random forests, regression, GBM, SVM, and statistical learning do an amazing job. However, when the data scale goes up to a large amount of information, large, deep neural networks quickly outperform the traditional ones.

This happens primarily because, compared to a traditional machine learning algorithm, a deep neural network model has a wider range of parameters and is capable of learning more complex nonlinear functions. Therefore, we expect a deep neural network model to automatically pick the most important and helpful features on its own, without too much manual engineering.
As already mentioned, deep learning is one of the main forms of machine learning, using a model of computing that is very much inspired by the structure of the human brain; hence these models are called neural networks. The basic unit of any neural network is the neuron. Every neuron has a specific set of inputs, and every input is given a specific weight. Neurons have the power of computing functions based on these weighted inputs. For instance, a linear neuron takes a linear combination of its weighted inputs, while a sigmoidal neuron feeds the weighted sum of its inputs into a logistic function.

The logistic function always returns a value between 0 and 1. When the weighted sum is very negative, the return value is close to 0. On the other hand, when the weighted sum is large and positive, the return value is close to 1. Mathematically, the logistic function is a logical choice, since it has nice-looking derivatives that make the learning process simpler.
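The behavior just described can be sketched in a few lines of NumPy; the input values and weights below are arbitrary, chosen only to illustrate a single sigmoidal neuron:

```python
import numpy as np

def logistic(z):
    # The logistic (sigmoid) function maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# A sigmoidal neuron: the weighted sum of its inputs is fed
# through the logistic function to produce the neuron's output.
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.8, 0.2, 0.4])
output = logistic(np.dot(weights, inputs))

print(logistic(-10.0))  # a large negative weighted sum: close to 0
print(logistic(10.0))   # a large positive weighted sum: close to 1
print(output)
```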
Whatever function the neuron uses, the value it computes is immediately transmitted to the other connected neurons as the neuron's output. In practice, sigmoidal neurons are used more often than linear neurons, since they enable much more versatile deep learning models.
A deep neural network arises when you start connecting neurons to each other, to your input data, and eventually to the outputs that correspond to your network's answer to the learning problem. To make this structure easier to visualize, think of the weights as attached to the links that connect the initial layer to the other layers of the network.

Very similar to how neurons are organized in layers in the brain, neurons in deep neural nets are typically organized in layers. Neurons situated in the bottom layers are those that receive signals from the inputs, while neurons situated in the top layers are connected to the answer through their outputs. Usually, there are no connections between neurons in the same layer, as more complex connectivity between neurons requires more mathematical analysis.

When there are no connections leading from a neuron in a higher layer back to neurons in lower layers, we call the network a feed-forward neural network. Opposed to these are recurrent neural networks, which are much more complicated to train and analyze. Now, we will go through several of the most commonly used deep neural networks.
CONVOLUTIONAL NEURAL NETWORK
Convolutional neural networks, also known as CNNs, are one of the most commonly used types of feed-forward neural networks, in which the connectivity pattern between neurons is based on the organization of the visual cortex system. In that system, V1, the primary visual cortex, does edge detection on the raw visual input obtained from the retina. Then V2, the secondary visual cortex, receives the edge features from the primary visual cortex and extracts simple visual properties such as spatial frequency, orientation, and color.

The visual area V4, another visual cortex, mainly handles more complicated attributes of fine-grained objects. All of these processed visual features then flow into the final unit, the inferior temporal gyrus, or IT, for further object recognition. The shortcut between the V1 layer and the V4 layer, in fact, inspired a certain type of convolutional neural network with connections between non-adjacent layers, named a residual net. Residual nets contain residual blocks that allow the inputs of one layer to be passed readily to later layers.

Therefore, convolutional neural networks are commonly used for edge detection, for extracting simple visual properties such as spatial frequency, orientation, and color, for detecting object features of intermediate complexity, and for object recognition.
Convolution is a term commonly used in mathematics, referring to an operation between matrices. Convolutional layers generally have a small matrix named a filter or kernel. As the filter or kernel is sliding, or convolving, across the matrix of input image values, it is, at the same time, computing the element-wise multiplication of the values contained in the kernel matrix with the original image values, and summing the results.

Therefore, specifically designed filters or kernels are capable of processing images for very specific purposes, such as image sharpening, blurring, and edge detection, efficiently and rapidly.
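The sliding-kernel operation described above can be written out directly in NumPy. The tiny image and the vertical edge-detection kernel below are made up for illustration (and, as in most deep learning libraries, what is computed is technically cross-correlation, without flipping the kernel):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel across the image; at each position, take the
    # element-wise product with the covered patch and sum it
    # ("valid" convolution: no padding, so the output shrinks).
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A classic vertical edge-detection kernel, applied to a tiny image
# whose left half is dark (0) and right half is bright (1).
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

print(convolve2d(image, kernel))  # strongly nonzero where the edge is
```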
RECURRENT NEURAL NETWORK
A neural network sequence model is commonly designed to transform input sequences into output sequences which live in a different domain. Another common type of deep neural network, the RNN, or recurrent neural network, is greatly suitable for these purposes, as RNNs have shown amazing improvements on problems like speech recognition, handwriting recognition, and machine translation.

An RNN model is born with an amazing capability of processing long sequential data and tackling very complex tasks with context spread over a period of time. The recurrent model, in fact, processes a single element of the sequence at a time. After each computation, the newly updated unit state is passed down to the next time step to facilitate the computation of the next element.

Imagine the case when a recurrent neural network model reads all articles on Wikipedia, character by character. Simple perceptron neurons, on the other hand, which linearly combine the current input element with the last unit state, typically lose these long-term dependencies.
For instance, we can start a sentence with Susan is working at… Then, after a whole paragraph, we want to start our next sentence with He or She correctly. If the recurrent neural model forgets the character's name we used, we can never know which to pick. To resolve this issue, engineers have created a special deep neuron with a more complicated internal structure designed to memorize long-term context, named LSTM, or long short-term memory.

LSTM models are smart enough to learn long-term context. These models can learn for how long they should memorize old information, when to forget information, when to use newly updated data, and when to combine the new input with old memory. Using the power of LSTM and RNN cells, you can build a character-based RNN model that will be able to learn the specific relationships between characters that form words and sentences, without any previous knowledge of English vocabulary. Such an RNN model, in fact, can achieve very good performance even without a large set of training data.
RNN SEQUENCE TO SEQUENCE MODEL
The common sequence-to-sequence model is very often seen as an extended version of the recurrent neural model, but its application field is more distinguishable. The same as recurrent neural networks, sequence-to-sequence models operate on sequential data, but they are commonly used to develop personal assistants or chatbots by generating meaningful responses to numerous input questions.

A common sequence-to-sequence model consists of two recurrent neural networks: an encoder and a decoder. The encoder learns everything about the contextual information obtained from the various input words. Then the encoder hands this knowledge down to the decoder through a specific context vector, also known as a thought vector. Eventually, the decoder consumes these context vectors and generates correct responses.
AUTOENCODERS
Autoencoders are different from the previous deep learning models, as they are used only for unsupervised deep learning. Autoencoders are designed mainly to learn a low-dimensional representation of a high-dimensional dataset, very similar to what PCA, or principal components analysis, does. The autoencoder model is capable of learning approximation functions that reproduce the input data.

What restricts these models is a bottleneck layer situated in the middle, containing a very small number of nodes. With this very limited number of nodes, these models come with very limited capacity, so they are forced to form a specific, very efficient encoding of the data, which is the low-dimensional code we obtain.

You can use autoencoder models to compress your documents on a variety of topics, within the limitation that these models come with a bottleneck layer that contains only a few neurons.

However, when you use both an autoencoder and PCA to reduce your documents to two dimensions, the autoencoder model will demonstrate a much better outcome. With the help of these models, you can do very efficient data compression to speed up the overall process of information retrieval, for both images and documents.
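The PCA baseline that the autoencoder is being compared against can be sketched in a few lines of NumPy via the SVD. The random "documents" below are placeholder data; a real autoencoder would learn a nonlinear version of the same encode/decode round trip:

```python
import numpy as np

# PCA-style reduction to two dimensions: the linear analogue of an
# autoencoder's bottleneck encoding.
rng = np.random.default_rng(0)
docs = rng.random((100, 50))              # 100 "documents", 50 features each
centered = docs - docs.mean(axis=0)

# Rows of Vt are the principal directions; keep only the first two.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
codes = centered @ Vt[:2].T               # the low-dimensional 2-D codes

reconstruction = codes @ Vt[:2]           # decode back to 50 dimensions
print(codes.shape)                        # each document compressed to 2 numbers
```

The bottleneck here is the hard cut to two components; an autoencoder replaces the two matrix products with learned nonlinear encoder and decoder networks, which is why it can demonstrate a better outcome on the same two-dimensional budget.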
REINFORCEMENT DEEP LEARNING
Reinforcement learning is one of the secrets behind many successful AI projects done in the past. Reinforcement learning is a subfield of machine learning that allows software agents and machines to automatically determine the optimal behavior within a given context, with the main goal of maximizing long-term performance as measured by a given metric.

Most reinforcement learning projects start with a specific supervised learning process that trains a fast rollout policy as well as a policy network, mainly relying on manually curated training data. The reinforcement learning policy network is then improved by playing against earlier versions of itself. Therefore, with more and more data obtained, it gets stronger and stronger without requiring any additional external training data.
GENERATIVE ADVERSARIAL NETWORK
Another commonly used type of deep neural network is the generative adversarial network, or GAN. This is a type of deep generative algorithm. A GAN has the power of creating new examples after going through and learning from some real data. A common type of GAN consists of two models that compete against each other in a zero-sum game.

The generative adversarial network setup mainly contains real-world examples, a generator, generated fake samples, a discriminator, and fine-tune training. These models can tell the fake data apart from the true data by its data distribution. GANs were initially proposed to generate meaningful images after learning from real-world photos.

The GAN model proposed in the original GAN paper was composed of two independent models: the discriminator and the generator. In this case, the generator produced fake images and sent the output to the discriminator model. The discriminator then worked in a manner very similar to a judge, since it was fully optimized to identify the fake photos from the real ones.

At the same time, the generator model was trying hard to cheat the discriminator, while the judge was trying very hard not to be cheated by the generator. This was a very interesting zero-sum game occurring between these GAN models, which motivated both models to further improve their functionalities and develop their designed skills.
After learning about these deep neural network models, you probably wonder how you can implement them and use them in real problem solving with deep learning. Fortunately, there are many open-source libraries and toolkits you can use for building your own deep learning models. TensorFlow is arguably one of the most popular, and has attracted a lot of attention. In terms of popularity, Theano follows TensorFlow very closely. Those two are the best numerical platforms in Python, providing the basis for innumerable deep learning projects.

Both are very powerful libraries and can be used for the different tasks involved in creating deep learning models. Another powerful tool in Python's library ecosystem is Keras, which we are going to use in this book. Keras is an amazingly powerful high-level neural network API, with the astonishing ability to run on top of Theano, TensorFlow, or CNTK. It was written in Python and developed with a focus on enabling fast and efficient deep learning experimentation.
CHAPTER 2: GETTING STARTED WITH KERAS
We are going to use Keras as it allows fast and easy prototyping, supports both recurrent and convolutional networks as well as combinations of the two, and runs seamlessly on GPU and CPU. Designed to enable fast deep learning modeling and experimentation with neural networks, it focuses on being modular, minimal, and extensible.

Therefore, with Keras you can build a wide range of different deep learning models which run on top of TensorFlow or Theano effortlessly and efficiently. Keras is a free, open-source neural network library, so you will find and install it easily. The core data structure of Keras is a model, which is a way of organizing multiple layers. Before you delve deeper into Keras, you must install it, of course. Be aware that this popular deep learning framework uses either Theano or TensorFlow behind the scenes instead of providing all the functionality by itself.

Keras is very simple to install if you have been working in a SciPy and Python environment. Make sure you have an installation of TensorFlow or Theano on your system already before you install Keras. Keras can be very easily installed using PyPI:
sudo pip install keras
python -c "import keras; print(keras.__version__)"
1.1.0
sudo pip install --upgrade keras
Using the same method, you can upgrade your version of Keras. Assuming you have installed both TensorFlow and Theano, you are able to configure the backend used by Keras. The best way is by editing or adding the Keras configuration file in your home directory:
~/.keras/keras.json
{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
In this configuration file, you can change the backend property from tensorflow to theano. Keras will use your new configuration the next time you run it. In addition, you can easily check the configured Keras backend as follows:
python -c "from keras import backend; print(backend._BACKEND)"
Using TensorFlow backend.
tensorflow
In addition, you can specify which backend you want Keras to use on the command line, as shown below:
KERAS_BACKEND=theano python -c "from keras import backend; print(backend._BACKEND)"
Running this script, you get the output illustrated below:
Using Theano backend.
theano
BUILDING DEEP LEARNING MODELS WITH KERAS
The focus of Keras is the model. The main kind of model built in Keras is called Sequential, and it contains a linear stack of multiple layers. Therefore, you create a Sequential model and gradually add layers to it in the order you want the computation to be performed. Once you define your model, you must compile it, so that it makes use of the specific underlying framework to optimize the entire computation to be performed on your deep learning model.

In this case, you must specify the loss function and the optimizer which will be used. Once you compile your model, the model must be fit to the data, one batch of data at a time. In fact, this is where all the computation occurs. Once you train your model, you can use it to make new predictions on your data.

In summary, the construction of deep learning models in Keras can be explained as defining your model, compiling your model, fitting your model, and making predictions. To define it, you create a Sequential model and add multiple layers. Once done, you compile your model by specifying the loss function and optimizer. Then, you fit your model by executing it against data. Finally, you make predictions on new data.
As already mentioned, Keras is amazingly powerful and easy to use for developing and evaluating varied deep learning models. It wraps the efficient numerical computation libraries like TensorFlow and Theano and allows you to define and properly train your neural network models in several lines of code.
In the following, you are going to learn how to create your first network model in Python using Keras. Before you begin, make sure you have Python 2 or a newer version installed and configured. You also need NumPy and SciPy installed and configured, and, of course, you need to have Keras and TensorFlow or Theano installed and configured. Once you have these up and running, create a new file as follows:
keras_first_network.py
In the following sections, you will learn how to load data, define your model, compile the model, fit the model, evaluate your model, and tie it all together to work on your future models.
Whenever we work with deep learning models which use a stochastic process, like random numbers, it is a very good idea to set the random number seed. This way you will be able to run the same code over and over and get the same result. This is also very useful when you need to demonstrate a result, compare different models using the same source of randomness, or debug a part of your code. For the initialization of the random number generator, use the following script:
from keras.models import Sequential
from keras.layers import Dense
import numpy
numpy.random.seed(7)
Once done, you can load your data. To do so in this example, we are going to use the very popular Pima Indians onset of diabetes dataset, a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical data for Pima Indians and the onset of diabetes within five years. This is a binary classification problem, and all the input variables are numerical, which makes it really easy to use this dataset directly with a neural network in Keras. To use it, download the dataset and place it in your working directory, which is the same directory as your previously created Python file.

Now we continue with building your model with Keras. You can load the file directly using the specific NumPy function. There are eight input variables and one output variable in the last column. Once you load the data, you can split your dataset into the input variables, denoted as X, and the output variable, denoted as Y.
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
X = dataset[:, 0:8]
Y = dataset[:, 8]
Make sure you have initialized your random number generator, to ensure your results are reproducible, and properly loaded your data. Now, you must define your neural network model.
As already mentioned, models in Keras are defined as a specific sequence of multiple layers. To define your model, you create a Sequential model and then add one layer at a time until you are satisfied with your network topology. The first thing you must do is ensure your input layer has the proper number of inputs. You can specify this when you create your first layer, using the input_dim argument. Make sure you set it to eight for the eight input variables.
Now you probably wonder: how do you know the right number of layers and their types? Well, this is a complex question. There are some heuristics you can use, but the best network structure is found through a process of trial and error. You will need a network large enough to capture the core structure of the problem.
Further on, we are going to see a fully-connected neural network structure containing three layers. Take into consideration that fully-connected layers are defined using the Dense class. You can specify the number of neurons contained in the layer as your first argument, the weight initialization method with the init argument, and your activation function with the activation argument.

In the following case, we are going to initialize the network weights to small random numbers generated from a uniform distribution, in this case between 0 and 0.05, as this is the default uniform weight initialization in Keras. There is also another alternative, named normal, for small random numbers generated from a Gaussian distribution.
In our example, we are going to use the relu, or rectifier, activation function on the first two layers, and the sigmoid function in the output layer. It used to be common to use the tanh and sigmoid activation functions for all layers.

However, these days this is not the case, as better performance is achieved with the rectifier activation function, combined with a sigmoid function on the output layer to ensure your network output is between zero and one. This way it is very easy to map the output to classes with a default threshold. Then, you can piece everything together by adding layers. Your first layer will have twelve neurons and expects eight input variables. Your second, hidden layer will have eight neurons, while your output layer will have one neuron to predict the class.
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
The next step is to compile your model. To compile your neural network model, Keras uses an efficient numerical library, called a backend, like TensorFlow or Theano.

When compiling, the backend automatically chooses the best possible way of representing your network for making predictions and training, and decides how that will run on your hardware, like a GPU or CPU, and sometimes even distributed.
When compiling your model, you must first specify some additional properties which are required for training your neural network. Be cognizant that training your network means finding the best possible set of weights for making the right predictions for your specific problem.
To evaluate a set of weights, you must first specify the loss function. You must also specify the optimizer, which is used to search through different weights for your network, as well as any optional metrics you would like to collect and report during model training. In this specific case, you will use logarithmic loss, which is defined as binary_crossentropy for binary classification problems.

You will use the gradient descent algorithm adam, simply because it is a very efficient default. Since this is a classification problem, you will also collect and report the classification accuracy metric.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
Once you are done compiling your model, the next step is fitting it. Once you define and compile your model, it is ready for efficient computation, so now is the right time to execute the model on the data. You can train or fit your neural network model on your loaded data by calling the fit function on your model.
Consider that the training process will run for a fixed number of iterations through your dataset, called epochs, which you specify using the epochs argument. You can also set the number of instances that are evaluated before the weights are updated. This is called the batch size, and you set it using the batch_size argument. For this specific problem, you will run a small number of iterations, 150, and use a small batch size of ten. As mentioned before, these values can be chosen experimentally through trial and error. During this step, the computation occurs on your GPU or CPU.
model.fit(X, Y, epochs=150, batch_size=10)
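As a quick back-of-the-envelope check on those numbers (a sketch for illustration, assuming the 768 rows of the Pima Indians diabetes dataset used above): with a batch size of ten, each epoch performs about 77 weight updates, so 150 epochs perform roughly 11,550 in total.

```python
import math

n_samples = 768   # rows in the Pima Indians diabetes dataset
batch_size = 10
epochs = 150

# The last batch of each epoch is partial (8 rows), hence the ceiling.
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 77
print(total_updates)      # 11550
```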
Once you are done fitting your model, the next step is to evaluate it. Up to this point, you have trained your neural network on the entire dataset, so you can now evaluate its performance on that same dataset. This will give you an idea of how well you have modeled the dataset, also known as the train accuracy.
However, it tells you nothing about how well the model may perform on new data. You have done this mainly for simplicity, but the ideal path is to separate your data into train and test datasets for training and evaluating your model.
You can evaluate your model on your training dataset using the evaluate function on your model, passing the same input and output you used to train it.
This will generate a prediction for every input and output pair and collect scores, including the average loss and any other metrics you have configured, such as accuracy.
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Once you tie everything together, you get the first neural network model you have created in Keras. The complete code looks as follows.
from keras.models import Sequential
from keras.layers import Dense
import numpy
numpy.random.seed(7)
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)
scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Running this, you should see a message for each of the 150 epochs that prints the loss and accuracy, followed by the final evaluation of the trained model on the training dataset. You should get messages like the following.
...
Epoch 145/150
768/768 [==============================] - 0s - loss: 0.5105 - acc: 0.7396
Epoch 146/150
768/768 [==============================] - 0s - loss: 0.4900 - acc: 0.7591
Epoch 147/150
768/768 [==============================] - 0s - loss: 0.4939 - acc: 0.7565
Epoch 148/150
768/768 [==============================] - 0s - loss: 0.4766 - acc: 0.7773
Epoch 149/150
768/768 [==============================] - 0s - loss: 0.4883 - acc: 0.7591
Epoch 150/150
768/768 [==============================] - 0s - loss: 0.4827 - acc: 0.7656
32/768 [>.............................] - ETA: 0s
acc: 78.26%
You can use the neural network model you have just created for making predictions as well. However, you must adapt the example above just a bit to use it for making predictions. Making predictions is very easy once you call the model's predict function.
In this case, because you used a sigmoid activation function on your output layer, you get predictions in the range between zero and one. In addition, you can quickly convert them into binary predictions for your classification task by simply rounding them.
To run predictions for every record contained in your training data, run the code shown below.
from keras.models import Sequential
from keras.layers import Dense
import numpy
seed = 7
numpy.random.seed(seed)
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
model = Sequential()
model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=2)
predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]
print(rounded)
Running this neural network model will print the predictions for every input pattern. You can use these predictions directly in an application, if required.
Now that you know how to create a neural network in Keras, we are going to move on to more complex deep learning tasks that you can efficiently execute using the powerful Keras Python library.
CHAPTER 3: MULTI-LAYER PERCEPTRON NETWORK MODELS
The powerful Keras Python library for deep learning focuses mainly on the creation of models as a sequence of multiple layers. In the following section of the book, you are going to learn how to use simple components to create multi-layer perceptron network models in Keras.
As already mentioned, the simplest model you can create is defined by the Sequential class, which is a linear stack of multiple layers. You can create a sequential model and define all included layers at once, as seen below.
from keras.models import Sequential
model = Sequential(...)
However, a more useful idiom is to first create your sequential model and then add the layers in the order of computation, as illustrated below.
from keras.models import Sequential
model = Sequential()
model.add(...)
model.add(...)
model.add(...)
Once done, you must define the model inputs. Be mindful that the first layer in your neural network model must specify the shape of the input. This is the number of input attributes, as defined by the input_dim argument, which expects an integer. For instance, you can readily define the input in terms of eight inputs for a Dense layer as follows.
Dense(16, input_dim=8)
MODEL LAYERS
Once done, you must define the layers of your neural model. Remember that layers of different kinds usually have several properties in common, especially their activation functions and their weight initialization functions. For your models, the weight initialization used by a given layer is specified via the kernel_initializer argument (called init in older versions of Keras).
Some of the most commonly used weight initializations include uniform, normal and zero. With uniform weight initialization, weights are initialized to small, uniformly random values between 0 and 0.05. With normal weight initialization, weights are initialized to small Gaussian random values with zero mean and a standard deviation of 0.05. The last type is zero, where all weights are set to zero.
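To make those three schemes concrete, here is a small numpy sketch of the distributions just described (an illustration only, not the Keras internals):

```python
import numpy as np

rng = np.random.RandomState(7)
shape = (8, 12)  # e.g. weights between an 8-input layer and 12 neurons

# 'uniform': small uniformly random values in [0, 0.05]
w_uniform = rng.uniform(low=0.0, high=0.05, size=shape)

# 'normal': small Gaussian values with zero mean and stddev 0.05
w_normal = rng.normal(loc=0.0, scale=0.05, size=shape)

# 'zero': every weight starts at exactly zero
w_zero = np.zeros(shape)

print(w_uniform.min() >= 0.0 and w_uniform.max() <= 0.05)  # True
print(w_zero.sum() == 0.0)                                 # True
```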
Keras supports many standard neuron activation functions as well, such as rectifier, sigmoid, tanh and softmax. You will ordinarily specify the activation function used by a specific layer in the activation argument, which takes a string value. Alternatively, you can create an Activation object and add it directly to your model, in which case the activation function is applied to the output of the preceding layer.
There is a wide range of core layers in Keras used for standard neural networks. Some of the most useful and routinely used core layer types include Dense, Dropout and Merge layers. Dense is a fully-connected layer, used most often in multi-layer perceptron models. Dropout core layers apply dropout to the model by setting a fraction of inputs to zero, to reduce the very common issue of overfitting. Merge core layers combine the inputs from several Keras models into a single Keras model.
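The mechanics of what a Dropout layer does during training can be sketched in plain numpy (a simplified illustration using the common "inverted dropout" convention, not the actual Keras implementation):

```python
import numpy as np

def dropout(activations, rate, rng):
    """Randomly zero a fraction `rate` of activations and rescale the
    survivors by 1/(1 - rate), so the expected output is unchanged."""
    keep_prob = 1.0 - rate
    mask = rng.uniform(size=activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(7)
x = np.ones(1000)               # pretend these are 1000 neuron outputs
y = dropout(x, rate=0.2, rng=rng)

# Every output is either 0 (dropped) or 1/0.8 = 1.25 (kept and rescaled).
print(sorted(set(y.tolist())))  # [0.0, 1.25]
```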
MODEL COMPILATION
As you already know, once you are done defining your model, you must compile it. Model compilation creates the highly efficient structure that will be used by the underlying backend, TensorFlow or Theano, to execute your neural network model during training. You compile your neural network model using the compile function. It accepts three important attributes of your model: the loss function, the model optimizer, and metrics.
model.compile(optimizer=..., loss=..., metrics=...)
When it comes to model optimizers, the optimizer is the search technique used to update the weights in your neural network model. You can create an optimizer object and pass it to the compile function using the optimizer argument. This allows you to configure the optimization process with its own arguments, such as a specific learning rate.
sgd = SGD(...)
model.compile(optimizer=sgd)
You can also use the default parameters of an optimizer by simply passing the name of the optimizer to the optimizer argument, as shown below.
model.compile(optimizer='sgd')
Some frequently used gradient descent optimizers include SGD, RMSprop and Adam. SGD is stochastic gradient descent, with good support for momentum. RMSprop is an adaptive learning rate optimization method, while Adam, short for Adaptive Moment Estimation, also uses adaptive learning rates.
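The update rule these optimizers build on can itself be sketched in a few lines of numpy. This is plain SGD with momentum, the textbook formula rather than the keras.optimizers source, and the gradient values here are hypothetical:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: the velocity accumulates a decaying
    average of past gradients, and the weights move along it."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
grad = np.array([0.5, -0.5])  # pretend gradient of the loss w.r.t. w

w, v = sgd_momentum_step(w, grad, v)
print(w.tolist())  # [0.995, -1.995]
```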
Once you have chosen an optimizer, you must specify the model loss function, also called the objective function. It is the evaluation of the model that the optimizer uses to navigate the weight space.
You can simply pass the name of your loss function to the compile function. Some of the most commonly used loss function arguments include mean_squared_error (MSE), categorical_crossentropy for multi-class logarithmic loss, and binary_crossentropy for binary logarithmic loss.
Once you have chosen your loss function, you move on to metrics. Metrics are evaluated during model training. Consider that only one metric is supported at this time, and that is accuracy.
MODEL TRAINING
Your neural network model is trained on NumPy arrays, using the fit function as seen below.
model.fit(X, y, epochs=..., batch_size=...)
Model training specifies both the number of epochs to train for and the batch size. As already mentioned, epochs are the total number of times the model is exposed to the training dataset, while the batch size is the number of training instances shown to the model before a weight update is performed.
The fit function also allows a basic evaluation to be performed during training. You can set the validation_split value to hold back a fraction of your training dataset for validation, evaluated each epoch, or you can provide a validation_data tuple of X and y data to evaluate. Moreover, fitting your model returns a history object with the metrics and details calculated for the model at each epoch. This is commonly used for graphing your model's performance.
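The idea behind holding back a validation fraction can be sketched in numpy (a simplified illustration: the held-out fraction is sliced from the end of the arrays, and the split value here is hypothetical):

```python
import numpy as np

def validation_split(X, y, split=0.2):
    """Hold back the last `split` fraction of the data for validation,
    mirroring the idea behind the validation_split argument."""
    n_val = int(len(X) * split)
    n_train = len(X) - n_val
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features each
y = np.arange(10)

(X_tr, y_tr), (X_val, y_val) = validation_split(X, y, split=0.2)
print(len(X_tr), len(X_val))  # 8 2
```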
MODEL PREDICTION
Once you are done training your neural network model, you can use it to make predictions on your test data or on new data. There is a range of different output types you can calculate from your trained model, each calculated with a different function called on the model.
For instance, you can use model.evaluate to calculate the loss for your input data, or model.predict to generate the network output for your input data. You can use model.predict_proba to generate class probabilities for your input data, or model.predict_classes to generate class outputs for your input data. On classification problems, you use predict_classes to make crisp class predictions for new data instances or test data.
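For a binary sigmoid output, the relationship between predict and predict_classes can be sketched without Keras at all (the probability values below are hypothetical, purely for illustration):

```python
import numpy as np

# Pretend model.predict returned these sigmoid probabilities, one per sample.
probabilities = np.array([[0.91], [0.12], [0.55], [0.43]])

# predict_classes on a binary model is then just thresholding at 0.5.
classes = (probabilities > 0.5).astype(int)

print(classes.ravel().tolist())  # [1, 0, 1, 0]
```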
Once you are happy with your model and its properties, you can finalize it. If you need a summary of your model, you can readily display one by calling the summary function as follows.
model.summary()
You also have the option to retrieve the model configuration using the get_config function as follows.
model.get_config()
Finally, you have the option to create an image of your neural network model structure as seen below.
from keras.utils.vis_utils import plot_model
plot_model(model, to_file='model.png')
In this section of the book, you discovered the Keras API, which you can use to create innumerable deep learning and artificial neural network models. You have learned how to construct a multi-layer neural network model and how to add multiple layers, including activations and weight initialization. You have also learned how to compile your neural network model using several optimizers, along with metrics and loss functions. Now, you know how to fit your models, including batch size and epochs, as well as how to make predictions and summarize your model.
CHAPTER 4: ACTIVATION FUNCTIONS FOR NEURAL NETWORKS
In this section of the book, we are going to give more attention to the most regularly used activation functions in neural networks. In this example, we are going to use MNIST. MNIST is a set of approximately 70,000 photos of miscellaneous handwritten digits, where each photo is black and white and 28x28 pixels in size. We are going to solve this problem using a fully connected neural network with several different activation functions.
Our input data will have shape (70000, 784), while our output shape will be (70000, 10). We use a fully connected neural network model with one hidden layer. There are 784 neurons in the input layer, one for every pixel in the photos, and there are 512 neurons in the hidden layer. In the output layer, there are 10 neurons, one for every digit. Using Keras, we can use a different activation function for every layer in our neural network model. This means that in this case, we must decide which activation function should be used in the output layer and which should be used in the hidden layer. There are many different activation functions, but the most often used are relu, tanh and sigmoid. Firstly, we will build a basic sequential model without any activation function in the hidden layer.
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
As already mentioned, there are 784 neurons in the input, 512 in the hidden layer, and 10 neurons in the output layer. Before you train your model, you can look at your neural network structure and parameters using the model summary function, as illustrated below.
Layers (input ==> output)
--------------------------
dense_1 (None, 784) ==> (None, 512)
dense_2 (None, 512) ==> (None, 10)

Summary
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               401920
_________________________________________________________________
output (Dense)               (None, 10)                5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
None
Once you are sure about the structure of your model, train it for five epochs.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 3s - loss: 0.3813 - acc: 0.8901 - val_loss: 0.2985 - val_acc: 0.9178
Epoch 2/5
60000/60000 [==============================] - 3s - loss: 0.3100 - acc: 0.9132 - val_loss: 0.2977 - val_acc: 0.9196
Epoch 3/5
60000/60000 [==============================] - 3s - loss: 0.2965 - acc: 0.9172 - val_loss: 0.2955 - val_acc: 0.9186
Epoch 4/5
60000/60000 [==============================] - 3s - loss: 0.2873 - acc: 0.9209 - val_loss: 0.2857 - val_acc: 0.9245
Epoch 5/5
60000/60000 [==============================] - 3s - loss: 0.2829 - acc: 0.9214 - val_loss: 0.2982 - val_acc: 0.9185
Test loss: 0.299
Test accuracy: 0.918
As you can see, our result of 91.8% on MNIST is quite bad. When you plot the losses, you will see that the validation loss is far from improving, and it will not improve even after a hundred epochs.
Therefore, we must try something different. We need techniques to make our neural network learn better and work smarter. We can achieve this by using one of the most customarily used activation functions, the sigmoid activation function.
SIGMOID ACTIVATION FUNCTION
To improve our neural network model, we will use the sigmoid activation function. It squashes its input into the (0, 1) interval.
model = Sequential()
model.add(Dense(512, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
You will see that the structure of your neural network remains the same, as you have only changed the activation function of your dense layer. Train the same way for five epochs.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 3s - loss: 0.4224 - acc: 0.8864 - val_loss: 0.2617 - val_acc: 0.9237
Epoch 2/5
60000/60000 [==============================] - 3s - loss: 0.2359 - acc: 0.9310 - val_loss: 0.1989 - val_acc: 0.9409
Epoch 3/5
60000/60000 [==============================] - 3s - loss: 0.1785 - acc: 0.9477 - val_loss: 0.1501 - val_acc: 0.9550
Epoch 4/5
60000/60000 [==============================] - 3s - loss: 0.1379 - acc: 0.9598 - val_loss: 0.1272 - val_acc: 0.9629
Epoch 5/5
60000/60000 [==============================] - 3s - loss: 0.1116 - acc: 0.9673 - val_loss: 0.1131 - val_acc: 0.9668
Test loss: 0.113
Test accuracy: 0.967
This looks much better. Now consider what happens without any activation function: even after stacking many layers, you still get just a linear combination of your input with the weights and bias, which is very similar to a neural network without any hidden layers. You can add some more layers, without activations, just to see what occurs, as shown below.
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
for i in range(5):
    model.add(Dense(512))
model.add(Dense(10, activation='softmax'))
When you do this, your neural network model looks as indicated below.
dense_1 (None, 784) ==> (None, 512)
dense_2 (None, 512) ==> (None, 512)
dense_3 (None, 512) ==> (None, 512)
dense_4 (None, 512) ==> (None, 512)
dense_5 (None, 512) ==> (None, 512)
dense_6 (None, 512) ==> (None, 512)
dense_7 (None, 512) ==> (None, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               401920
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656
_________________________________________________________________
dense_3 (Dense)              (None, 512)               262656
_________________________________________________________________
dense_4 (Dense)              (None, 512)               262656
_________________________________________________________________
dense_5 (Dense)              (None, 512)               262656
_________________________________________________________________
dense_6 (Dense)              (None, 512)               262656
_________________________________________________________________
dense_7 (Dense)              (None, 10)                5130
=================================================================
Total params: 1,720,330
Trainable params: 1,720,330
Non-trainable params: 0
_________________________________________________________________
None
You get results for five epochs as follows.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 17s - loss: 1.3217 - acc: 0.7310 - val_loss: 0.7553 - val_acc: 0.7928
Epoch 2/5
60000/60000 [==============================] - 16s - loss: 0.5304 - acc: 0.8425 - val_loss: 0.4121 - val_acc: 0.8787
Epoch 3/5
60000/60000 [==============================] - 15s - loss: 0.4325 - acc: 0.8724 - val_loss: 0.3683 - val_acc: 0.9005
Epoch 4/5
60000/60000 [==============================] - 16s - loss: 0.3936 - acc: 0.8852 - val_loss: 0.3638 - val_acc: 0.8953
Epoch 5/5
60000/60000 [==============================] - 16s - loss: 0.3712 - acc: 0.8945 - val_loss: 0.4163 - val_acc: 0.8767
Test loss: 0.416
Test accuracy: 0.877
This is quite bad. You can see that your neural network model is simply unable to learn what you want. This happened because without nonlinearity, your neural network is just a basic linear classifier, incapable of acquiring any nonlinear relationships.
The sigmoid, on the other hand, is a nonlinear function, so we cannot represent it as a linear combination of our input. That is exactly what brings nonlinearity to your neural network model, so it can learn nonlinear relationships. Now, train the same deeper model using sigmoid activations.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 16s - loss: 0.8012 - acc: 0.7228 - val_loss: 0.3798 - val_acc: 0.8949
Epoch 2/5
60000/60000 [==============================] - 15s - loss: 0.3078 - acc: 0.9131 - val_loss: 0.2642 - val_acc: 0.9264
Epoch 3/5
60000/60000 [==============================] - 15s - loss: 0.2031 - acc: 0.9419 - val_loss: 0.2095 - val_acc: 0.9408
Epoch 4/5
60000/60000 [==============================] - 15s - loss: 0.1545 - acc: 0.9544 - val_loss: 0.2434 - val_acc: 0.9282
Epoch 5/5
60000/60000 [==============================] - 15s - loss: 0.1236 - acc: 0.9633 - val_loss: 0.1504 - val_acc: 0.9548
Test loss: 0.15
Test accuracy: 0.955
This is much better. In this case, you are probably overfitting, but you can see that you got a great boost in your model's performance just by using an activation function. Sigmoid activation functions are great, as they have many phenomenal properties, like differentiability and nonlinearity, and their (0, 1) range makes the outputs nicely interpretable as probabilities.
However, this approach has its drawbacks. For instance, when you use backpropagation, you must propagate the derivative from the output back to the initial weights. You want to pass your regression or classification error at the final output value back through your whole neural network.
Therefore, you must differentiate through your layers as well as update all the weights. However, with sigmoid, there is an issue with the derivative: its maximum value is quite small, just 0.25. This means you can only pass a small fraction of your error to the previous neural network layers. This issue may cause your model to learn slowly, so it needs more epochs and data. To solve this problem, you can use the tanh function.
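You can verify that 0.25 figure numerically. A quick numpy check of the sigmoid derivative, which is s(x)·(1 − s(x)):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 10001)
derivative = sigmoid(x) * (1.0 - sigmoid(x))  # d/dx sigmoid(x)

# The derivative peaks at x = 0, where sigmoid(0) = 0.5, so the
# maximum is 0.5 * (1 - 0.5) = 0.25 -- the backpropagation bottleneck.
print(round(derivative.max(), 4))  # 0.25
```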
TANH ACTIVATION FUNCTION
The tanh activation function, just like sigmoid, is differentiable and nonlinear. Tanh gives output in the (-1, 1) range, which is not as convenient as the (0, 1) range for probabilities, but is perfectly fine for neural network hidden layers. Tanh also has a larger maximum derivative (1 rather than 0.25), which is good for our issue here, as we can pass more of the error backwards than was possible with sigmoid.
To use the tanh activation function, you must change the activation attribute of your dense layer.
model = Sequential()
model.add(Dense(512, activation='tanh', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
Again, you can see that the structure of your neural network is the same. Now, train for five epochs.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 5s - loss: 0.3333 - acc: 0.9006 - val_loss: 0.2106 - val_acc: 0.9383
Epoch 2/5
60000/60000 [==============================] - 3s - loss: 0.1754 - acc: 0.9489 - val_loss: 0.1485 - val_acc: 0.9567
Epoch 3/5
60000/60000 [==============================] - 3s - loss: 0.1165 - acc: 0.9657 - val_loss: 0.1082 - val_acc: 0.9670
Epoch 4/5
60000/60000 [==============================] - 3s - loss: 0.0843 - acc: 0.9750 - val_loss: 0.0920 - val_acc: 0.9717
Epoch 5/5
60000/60000 [==============================] - 3s - loss: 0.0653 - acc: 0.9806 - val_loss: 0.0730 - val_acc: 0.9782
Test loss: 0.073
Test accuracy: 0.978
You can see that you improved your test accuracy by more than one percent just by using a different activation function. Now, you probably wonder, can you do better? Fortunately, you can, thanks to the relu activation function.
RELU ACTIVATION FUNCTION
The range of the relu activation function is 0 to infinity. However, unlike the tanh and sigmoid functions, relu is not differentiable at zero, although there are practical solutions to this.
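The function itself is tiny. A numpy sketch of relu and its gradient, using the common convention of assigning gradient zero at exactly x = 0:

```python
import numpy as np

def relu(x):
    """max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """1 for positive inputs, 0 otherwise -- the kink at x = 0 is
    conventionally assigned gradient 0."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x).tolist())       # [0.0, 0.0, 0.0, 0.5, 2.0]
print(relu_grad(x).tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0]
```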
The best thing about the relu activation function is its gradient, which is equal to one for all positive inputs, so you can pass the maximum amount of the error through during backpropagation. Now, train your model and see the results.
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 5s - loss: 0.2553 - acc: 0.9263 - val_loss: 0.1505 - val_acc: 0.9516
Epoch 2/5
60000/60000 [==============================] - 3s - loss: 0.1041 - acc: 0.9693 - val_loss: 0.0920 - val_acc: 0.9719
Epoch 3/5
60000/60000 [==============================] - 3s - loss: 0.0690 - acc: 0.9790 - val_loss: 0.0833 - val_acc: 0.9744
Epoch 4/5
60000/60000 [==============================] - 4s - loss: 0.0493 - acc: 0.9844 - val_loss: 0.0715 - val_acc: 0.9781
Epoch 5/5
60000/60000 [==============================] - 3s - loss: 0.0376 - acc: 0.9885 - val_loss: 0.0645 - val_acc: 0.9823
Test loss: 0.064
Test accuracy: 0.982
Now, you got the best result, 98.2%, which is quite amazing given that you did not add any extra hidden layers.
It is very important to say that there is no single best activation function. One may be better in some cases while another is better in other instances. Another important thing to note, as these experiments show, is that the choice of activation function very much affects what your neural network can learn and how fast.
CHAPTER 5: MNIST HANDWRITTEN RECOGNITION
In this section of the book, we are going to build a simple neural network in Keras and train it on a GPU-enabled server. This model will be able to recognize handwritten digits thanks to the MNIST dataset. As you already know, MNIST contains 70,000 images, 10,000 for testing and 60,000 for training. All images are 28x28 pixels, centered to reduce preprocessing time.
To start, you must set up your environment with Keras, using Theano or TensorFlow as the backend. In this example, we are going to use the TensorFlow and Keras packages, installed as shown below.
conda install -qy -c anaconda tensorflow-gpu h5py
pip install keras
Once done, you must add the imports as follows. These imports are quite standard: array handling with NumPy and plotting with matplotlib, followed by the Keras classes themselves. You will also keep the TensorFlow backend quiet by setting a logging environment variable.
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
from keras.datasets import mnist
from keras.models import Sequential, load_model
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils
After that is done, you can have Keras load the dataset and build your neural network on it. The following step is to prepare the dataset we are going to use, MNIST. You load the dataset using this very handy function, which splits MNIST into train and test sets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
The next step is to inspect several examples. Take into consideration that MNIST contains only grayscale images; for more advanced datasets we would use RGB, or three color channels.
fig = plt.figure()
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.tight_layout()
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.title("Class {}".format(y_train[i]))
    plt.xticks([])
    plt.yticks([])
fig
Next, you prepare to train your model to classify images. To do this, you must unroll the width x height pixel format into one long vector, your input vector. First, graph the distribution of your pixel values as follows.
fig = plt.figure()
plt.subplot(2, 1, 1)
plt.imshow(X_train[0], cmap='gray', interpolation='none')
plt.title("Class {}".format(y_train[0]))
plt.xticks([])
plt.yticks([])
plt.subplot(2, 1, 2)
plt.hist(X_train[0].reshape(784))
plt.title("Pixel Value Distribution")
fig
Just as expected, the pixel values range from zero to 255. The background majority is close to zero, while the pixels closer to 255 represent the MNIST digits. To speed up model training, you should normalize the input data. By normalizing your input data, you also reduce the chance of getting stuck in local optima, since you are using stochastic gradient descent to find the optimal weights for your neural network.
The next step is reshaping your inputs to a single vector and normalizing the pixel values to be between zero and one. Before you reshape and normalize, print the shapes.
print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)
After that, you must build your input vectors from the 28x28 pixels as seen below.
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
The next step is to normalize the data to boost model training.
X_train /= 255
X_test /= 255
The following step is to print your final input shape, which is ready for training.
print("Train matrix shape", X_train.shape)
print("Test matrix shape", X_test.shape)
('X_train shape', (60000, 28, 28))
('y_train shape', (60000,))
('X_test shape', (10000, 28, 28))
('y_test shape', (10000,))
('Train matrix shape', (60000, 784))
('Test matrix shape', (10000, 784))
As you can see, y in this training set holds integer values from zero to nine. You can confirm this before using it for model training.
print(np.unique(y_train, return_counts=True))
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8), array([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]))
The next step is to encode your categories, the digits from zero to nine, using one-hot encoding. The result is a vector with a length equal to your number of categories, which is all zeros except for a one in the position corresponding to the class.
n_classes = 10
print("", y_train.shape)
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)
print(":", Y_train.shape)
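Under the hood, one-hot encoding like to_categorical's can be sketched with plain numpy, using np.eye to supply the identity rows (an illustration of the encoding, not the np_utils implementation):

```python
import numpy as np

def one_hot(labels, n_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(n_classes)[labels]

labels = np.array([3, 0, 9])
encoded = one_hot(labels, n_classes=10)

print(encoded.shape)        # (3, 10)
print(encoded[0].tolist())  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```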
The next step is to turn to Keras to build your neural network. At this point, your pixel vector serves as the input. There are two hidden 512-node layers, which gives the model enough complexity to recognize digits. Because this is multi-class classification, we must add a final fully-connected layer for the ten different output classes. We are going to use the sequential model, stacking layers with the add function.
When you add the first layer to a Keras sequential model, you must specify the input shape so Keras can create the proper matrices; the shapes of the other layers are inferred by Keras automatically. To introduce nonlinearities into your network and take it beyond the capabilities of a basic perceptron, you must add activation functions to your hidden layers.
The differentiation needed for training with backpropagation happens behind the scenes. You will add dropout as the best way of preventing model overfitting, and you will use the softmax activation, the standard for multi-class targets, on the output. When building your model, the first step is to build a linear stack of layers.
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))
Following that, you must compile your model using the compile function. In this step, you must specify your objective, or loss, function. In this example, we are going to use categorical cross entropy, but you can use other loss functions.
When it comes to the optimizer, in this example we are going to use the Adam optimizer with default settings. You could also instantiate an optimizer and set its parameters just before calling compile. Finally, you must choose which metrics you want to evaluate during model training and testing; these metrics can be displayed during both stages if you like. To compile your model, do as follows.
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
Once you compile your model, you can move on to model training. You must specify how many times to iterate over the training set, the epochs, and how many samples are used per weight update, the batch size.
Keep in mind that the bigger the batch, the more stable the stochastic gradient descent updates become. However, be aware of GPU memory limitations. In this example, we are going with a batch size of 128 and 8 epochs.
To monitor your model training process properly, you should graph the learning curve of your model, looking at the model accuracy and loss. Before you continue, you should also save your model. Once completed, you get to work with your trained model and finally evaluate its performance. To save the training metrics, run as shown below.
history = model.fit(X_train, Y_train,
                    batch_size=128, epochs=8,
                    verbose=2,
                    validation_data=(X_test, Y_test))
Then, you must save your model.
save_dir = "/results/"
model_name = 'keras_mnist.h5'
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s' % model_path)
The next step is to plot the metrics.
fig = plt.figure()
plt.subplot(2, 1, 1)
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='lower right')
plt.subplot(2, 1, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.tight_layout()
fig
You will notice that the loss on your training set decreases rapidly over the first two epochs, which means your model is learning to classify handwritten digits quite fast. On the test set, the loss does not decrease as fast, but it stays within range of the training loss, which means your model is able to generalize well to unseen data.
The following step is to evaluate your model's performance on the given test set. To assess your model, use the model's evaluate function, which computes every metric defined during model compilation. In this example, the model accuracy is computed on the 10,000 test images, using the model weights stored in our saved model.
mnist_model = load_model(model_path)
loss_and_metrics = mnist_model.evaluate(X_test, Y_test, verbose=2)
print("Test Loss", loss_and_metrics[0])
print("Test Accuracy", loss_and_metrics[1])
('Test Loss', 0.06264158328680787)
('Test Accuracy', 0.98299999999999998)
The model accuracy you get looks quite good. However, you should also look at nine examples each of correctly and incorrectly classified digits. The first step is to load the model and create predictions on your test set.
mnist_model = load_model(model_path)
predicted_classes = mnist_model.predict_classes(X_test)
Then, see what you predicted correctly and incorrectly.
correct_indices = np.nonzero(predicted_classes == y_test)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]
print()
print(len(correct_indices), "classified correctly")
print(len(incorrect_indices), "classified incorrectly")
The following step is to adapt the figure size to accommodate eighteen subplots as follows.
plt.rcParams['figure.figsize'] = (7, 14)
figure_evaluation = plt.figure()
Then,youmustplotninecorrectandnineincorrectpredictions.
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(6, 3, i + 1)
    plt.imshow(X_test[correct].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], y_test[correct]))
    plt.xticks([])
    plt.yticks([])
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(6, 3, i + 10)
    plt.imshow(X_test[incorrect].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], y_test[incorrect]))
    plt.xticks([])
    plt.yticks([])
figure_evaluation
9696/10000 [============================>.] - ETA: 0s
(9830, 'classified correctly')
(170, 'classified incorrectly')
As you can see, these incorrect predictions are quite forgivable, as in some cases the digit is hard even for a human reader to recognize. In this section of the book, we used Keras with its TensorFlow backend on a GPU-enabled server to train our neural network to recognize handwritten digits in just under 20 seconds of overall training time.
CHAPTER 6: NEURAL NETWORK MODELS FOR MULTI-CLASS CLASSIFICATION PROBLEMS
As you already know, Keras is a highly powerful Python library for deep learning which wraps the efficient numerical libraries TensorFlow and Theano. In this section of the book, you are going to learn how to use Keras to develop and evaluate neural network models for assorted multi-class classification problems.
After that, you will know how to load data from CSV into Keras, how to prepare multi-class classification data for further modeling with your neural networks, and how to evaluate your Keras neural network models using scikit-learn.
In this specific example, we are going to work with one of the standard machine learning problems, the iris flowers dataset. This problem is well studied, so it is a great example to use when you want to practice with neural networks: all four input variables are numeric and share the same scale, measured in centimeters. In addition, every instance describes the measured properties of a flower, and the output variable is a specific iris species.
This is a standard multi-class classification problem. This means there are more than two classes to predict; in fact, there are three flower species.
This is a good illustration to use when you want to practice with neural network models in Keras, because these three class values require specialized handling. Since the iris flower dataset is a very common and well-studied problem, expect a model accuracy somewhere between 95% and 97%. To start, download the iris flower dataset from the UCI Machine Learning Repository. Once downloaded, place it in your working directory. The next step is to import all the classes and functions you need. This includes data loading with pandas and data preparation, as well as model evaluation from scikit-learn.
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
The next step is to initialize the random number generator to a constant value of seven. This is very important, as you want to ensure that the results you get from this neural network model can in fact be achieved again. This step ensures that the stochastic process of model training can be reproduced. Therefore, your next step is to fix the random seed for reproducibility, as seen below.
seed=7
numpy.random.seed(seed)
Once fixed, you must load the dataset. Because the output variable contains strings, the easiest way is to load the data using pandas. You can then split the attributes into input variables X and output variables Y.
dataframe = pandas.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:4].astype(float)
Y = dataset[:, 4]
ONE-HOT ENCODING
As already mentioned, in this example the output variable contains three string values, so you must encode it. When you model multi-class classification problems with neural networks, the best practice is to reshape the output attribute from a vector containing one value per instance into a matrix that has a Boolean for every class value for each instance. This is called creating dummy variables, or one-hot encoding of categorical variables.
For instance, in this problem there are three class values: Iris-setosa, Iris-versicolor and Iris-virginica.
Iris-setosa
Iris-versicolor
Iris-virginica
In this case, you can turn this into a one-hot encoded binary matrix for every data instance, which would look as shown below.
Iris-setosa, Iris-versicolor, Iris-virginica
1, 0, 0
0, 1, 0
0, 0, 1
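As a quick sanity check of the matrix above, here is a minimal NumPy sketch (an illustration, not the Keras utility used later in this chapter) that builds the same one-hot rows from integer class indices:

```python
import numpy as np

# Integer-encoded classes: 0 = Iris-setosa, 1 = Iris-versicolor, 2 = Iris-virginica
classes = np.array([0, 1, 2])

# np.eye(3) is the 3x3 identity matrix; indexing it by the class index
# turns each integer label into the corresponding one-hot row
one_hot = np.eye(3)[classes]
print(one_hot)
```

Each row has exactly one 1, in the column for its class, which is precisely the matrix shown above.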
You can do this by first encoding the strings as integers with the scikit-learn class LabelEncoder. Once completed, you can convert the integer vector to a one-hot encoding using the to_categorical function in Keras, as demonstrated below.
encoder=LabelEncoder()
encoder.fit(Y)
encoded_Y=encoder.transform(Y)
dummy_y = np_utils.to_categorical(encoded_Y)
DEFINING NEURAL NETWORK MODELS WITH SCIKIT-LEARN
Once done with one-hot encoding, you must define your neural network model. The Keras library comes with wrapper classes that allow you to use the neural network models you create in Keras within scikit-learn. In particular, there is a KerasClassifier class that can be used as a scikit-learn estimator.
The KerasClassifier takes the name of a function as an argument. This function must return your neural network model, ready for training. Next, we are going to create such a function for this iris classification problem.
Once you run the code, you will create a simple, fully connected neural network containing one hidden layer with eight neurons. This hidden layer uses a rectified linear activation function, which is good practice. Since we already one-hot encoded the iris dataset, the output layer must create three output values, one for each class. The output with the largest value then becomes the class predicted by your neural network model, as follows.
4 inputs -> [8 hidden nodes] -> 3 outputs
One thing should be noted. We used a softmax activation function in the output layer to ensure that the output values of the model are in the range of zero to one, so they may be used as predicted probabilities.
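For intuition, here is a minimal NumPy sketch of what softmax does to three raw output scores (the scores are hypothetical values chosen for illustration, not taken from the model):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize the exponentials
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for the 3 classes
probs = softmax(scores)
print(probs)        # each value lies strictly between 0 and 1
print(probs.sum())  # the three values sum to 1
```

The largest raw score yields the largest probability, which is why the class with the biggest output value becomes the prediction.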
Finally, we must use the efficient Adam gradient descent optimization algorithm alongside a logarithmic loss function, represented by the categorical_crossentropy argument in Keras. Therefore, the next step is to define your baseline model. Once defined, you must create and compile it as below.
def baseline_model():
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
After that, you can finally create your KerasClassifier for use in scikit-learn. You can also pass arguments during the construction of the KerasClassifier; these are passed on to the fit function used internally to train your neural network model. Here, we pass the number of epochs as 200 and the batch size as 5 to use during model training. Bear in mind that debugging output is also turned off, as we set verbose to zero.
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)
EVALUATING MODELS WITH K-FOLD CROSS VALIDATION
Once finished with the previous step, you must evaluate your neural network model on the training data. The powerful scikit-learn library has excellent capabilities for evaluating models using several different techniques. A gold standard for evaluating neural network models is k-fold cross validation.
Using k-fold cross validation, you can evaluate your model on the dataset; here we use a ten-fold split defined with the KFold class. The evaluation process will take about ten seconds. When finished, cross_val_score returns an object describing the evaluation of the ten models constructed, one for each split of the dataset, as shown below.
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean() * 100, results.std() * 100))
Once completed, you will see the results summarized as both the mean and the standard deviation of your model's accuracy on the dataset.
This is a very reasonable estimate of the performance of your model on unseen data. It is well within the realm of known results for this specific problem, as you get accuracy as seen below.
Accuracy: 97.33% (4.42%)
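For intuition about what the ten-fold split actually does, here is a hand-rolled sketch of the partitioning (without shuffling, for clarity; this illustrates the idea rather than reproducing scikit-learn's KFold):

```python
import numpy as np

n_samples, n_splits = 150, 10          # the iris dataset has 150 rows; we use 10 folds
indices = np.arange(n_samples)
folds = np.array_split(indices, n_splits)

for i, test_idx in enumerate(folds[:2]):   # inspect the first two folds
    # Each fold becomes the test set once; the other nine folds form the train set
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print("fold", i, "train:", len(train_idx), "test:", len(test_idx))
# fold 0 train: 135 test: 15
# fold 1 train: 135 test: 15
```

A model is trained and scored once per fold, and the ten accuracies are what cross_val_score averages.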
CHAPTER 7: RECURRENT NEURAL NETWORKS
In this last section of the book, you are going to learn how to create recurrent neural networks in Keras. Recurrent neural networks are a class of neural network models which exploit the sequential nature of their input. Such inputs can be speech, text, time series, and anything else where the occurrence of an element in the sequence depends on the elements that appeared before it.
An RNN model can be thought of as a graph of recurrent neural network cells, where every cell performs the same operation on each element in the sequence.
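To make that concrete, here is a minimal NumPy sketch (an illustration with hypothetical sizes, not the book's code) of a single RNN cell applying the same weights at every step while carrying a hidden state forward:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))   # input-to-hidden weights (hypothetical sizes)
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights
h = np.zeros(4)                 # hidden state: the cell's "memory"

sequence = rng.normal(size=(5, 3))  # 5 timesteps, 3 features each
for x_t in sequence:
    # The same operation is applied to every element of the sequence;
    # h summarizes everything seen so far
    h = np.tanh(W_x @ x_t + W_h @ h)

print(h.shape)  # (4,)
```

The final h depends on the whole sequence, which is what lets the network exploit order.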
Recurrent neural networks are very flexible, so they have been used to solve diverse problems like language modeling, speech recognition, sentiment analysis, machine translation and image captioning, to name a few.
Recurrent neural networks can be readily adapted to many kinds of problems just by rearranging the way the cells are situated in the graph. In this section of the book, you are going to learn more about LSTM (long short-term memory) and GRU (gated recurrent unit) models, including their powers and their limitations.
Both GRU and LSTM are drop-in replacements for the basic recurrent neural network cell, so just by replacing the basic cell with one of these two variants you can get a major performance boost in your network.
While GRU and LSTM are not the only variants, they have proven to be the most effective for solving most sequence problems.
SEQUENCE CLASSIFICATION WITH LSTM RECURRENT NEURAL NETWORKS
Sequence classification is a common predictive modeling problem in which you have a sequence of inputs placed over time or space, and your task is to predict a specific category for that sequence. A powerful type of neural network created to handle problems like this is the LSTM recurrent neural network.
The long short-term memory network is a type of recurrent neural network commonly used in deep learning because large architectures based on it can be trained successfully. In this section, you are going to learn sequence classification in Keras using LSTM recurrent neural networks.
What makes this problem difficult is that the sequences can vary in length, they may contain a very large vocabulary of input symbols, and they may require your model to learn the long-term context of the dependencies between symbols in the input sequence.
The problem we are going to solve is the IMDB movie sentiment classification problem. Each movie review on IMDB is a variable-length sequence of words, and the sentiment of each review must be classified. We will use the IMDB dataset, which contains 25,000 movie reviews, good and bad, split between training and testing. The task is to determine whether a given movie review has a negative or positive sentiment.
Keras comes with built-in access to the IMDB dataset. To load it, use the imdb.load_data function. Once loaded, you can use it for your deep learning models. The words have been replaced by integers indicating the ordered frequency of each word in the dataset, so the sentences in each movie review are comprised of a sequence of integers.
WORD EMBEDDING
Our first move is to map each movie review into a real vector domain. This is a very popular technique for text, known as word embedding. In this technique, words are encoded as real-valued vectors in a high-dimensional space, in which similarity between words, in terms of their meaning, translates to closeness in that vector space.
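Mechanically, an embedding layer is just a lookup table from integer word indices to rows of a weight matrix. A minimal NumPy sketch of the idea (with small hypothetical sizes, separate from the Keras Embedding layer used below):

```python
import numpy as np

vocab_size, embed_dim = 10, 4               # hypothetical tiny vocabulary
rng = np.random.default_rng(1)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

review = np.array([3, 7, 7, 1])             # a "review" as integer word indices
vectors = embedding_matrix[review]          # row lookup: one vector per word

print(vectors.shape)  # (4, 4): 4 words, each mapped to a 4-dimensional vector
```

Note that the repeated word index 7 maps to the same vector both times; during training, those rows are adjusted so that similar words end up close together.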
Keras is good for this, as it provides a highly effective and convenient way of converting positive integer word representations into word embeddings through its Embedding layer. We are going to map each word onto a 32-length real-valued vector. In addition, we are going to limit the total number of words we model to the five thousand most frequent words, and zero out the rest.
Moreover, we are going to constrain each movie review to five hundred words, truncating longer reviews and padding shorter reviews with zero values. The first step is to prepare and model the data. Once done, you are ready to create your LSTM model, which will classify the sentiment of movie reviews.
Your first step is to quickly develop a basic LSTM for this IMDB problem. Start by importing the functions and classes required for this model. Then initialize the random number generator to a constant value to make sure you can effortlessly reproduce your results.
import numpy
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
numpy.random.seed(7)
Once done, you must load the IMDB dataset, constraining it to the top five thousand words. You must also split the dataset into train and test sets.
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
The following step is truncating and padding the input sequences so they are all the same length. The model will learn that the zero values carry no information; same-length vectors are simply required for the computation.
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
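Under the hood, pad_sequences simply normalizes lengths. A minimal pure-Python sketch of the idea, pre-padding with zeros and truncating from the front, which mirrors the Keras defaults:

```python
def pad_or_truncate(seq, maxlen):
    # Keep the last maxlen elements (truncate from the front),
    # then left-pad with zeros up to maxlen
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq

print(pad_or_truncate([5, 8, 3], 5))           # [0, 0, 5, 8, 3]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 5))  # [2, 3, 4, 5, 6]
```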
Once completed, you must define, compile and finally fit your LSTM model.
The first layer is the embedding layer, which uses 32-length vectors to represent each word. The following layer is the LSTM layer with one hundred smart, or memory, units. Finally, you must use a dense output layer with a single neuron and a sigmoid activation function to make zero-or-one predictions for the two classes in the problem, good and bad reviews.
Since this is a binary classification problem, you must use log loss as your loss function, in combination with the efficient Adam optimizer. The model is fit for only three epochs, because it quickly overfits this problem. To space out weight updates, you will use a large batch size of 64 movie reviews.
embedding_vecor_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)
The next step is to estimate the performance of the model on unseen movie reviews, as follows.
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
Running this code, you will get output as indicated below.
Epoch 1/3
16750/16750 [==============================] - 107s - loss: 0.5570 - acc: 0.7149
Epoch 2/3
16750/16750 [==============================] - 107s - loss: 0.3530 - acc: 0.8577
Epoch 3/3
16750/16750 [==============================] - 107s - loss: 0.2559 - acc: 0.9019
Accuracy: 86.79%
APPLYING DROPOUT
This very simple LSTM model with little tuning achieves great results on the IMDB problem. Use it as a template you can adapt to LSTM networks for your own sequence classification problems.
Recurrent neural networks like LSTMs frequently suffer from overfitting, which you can combat by applying the Keras Dropout layer between layers. Simply add new Dropout layers between the Embedding and LSTM layers and between the LSTM and Dense output layers, as follows.
from keras.layers import Dropout
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
Running this, you will get the following result.
Epoch 1/3
16750/16750 [==============================] - 112s - loss: 0.6623 - acc: 0.5935
Epoch 2/3
16750/16750 [==============================] - 113s - loss: 0.5159 - acc: 0.7484
Epoch 3/3
16750/16750 [==============================] - 113s - loss: 0.4502 - acc: 0.7981
Accuracy: 82.82%
As you can see, the dropout layers have an impact on training, giving a slightly lower final accuracy and a slower trend in convergence. This LSTM model could probably use several more epochs of training for better skill. Dropout can also be applied to the input and recurrent connections of the memory units of the LSTM separately and precisely.
Keras provides this capability directly on the LSTM layer: you can use the dropout and recurrent_dropout parameters to configure the input dropout and the recurrent dropout. You can modify the code to add dropout to the recurrent connections and to the input as illustrated below.
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
You will see that this LSTM-specific dropout has a more pronounced effect on the convergence of the network than the layer-wise dropout. Dropout is a very powerful technique you should use for combating overfitting in your LSTM models, so make sure you try both methods, even though you may get better results with this gate-specific dropout method.
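For intuition about what a dropout rate of 0.2 actually does, here is a minimal NumPy sketch of inverted dropout at training time (an illustration of the mechanism, not the Keras implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

def dropout(activations, rate):
    # Zero each activation with probability `rate`, then scale the
    # survivors by 1/(1 - rate) so the expected value is unchanged
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

acts = np.ones(10)
dropped = dropout(acts, 0.2)
print(dropped)  # each value is either 0.0 (dropped) or 1.25 (survived, rescaled)
```

Randomly silencing units this way prevents the network from relying too heavily on any single activation, which is the regularizing effect you see in the training curves above.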
NATURAL LANGUAGE PROCESSING WITH RECURRENT NEURAL NETWORKS
In this section of the book, we are going to solve a natural language processing problem using recurrent neural networks in Keras. The broad goal of natural language processing is to extract the meaning of speech utterances. We are going to break this down into the practical, solvable problem of understanding the speaker in a limited context: identifying the intent of a speaker asking for information about flights.
We are going to use the Airline Travel Information System (ATIS) dataset, collected by DARPA back in the early 90s. The dataset consists of spoken queries about numerous flights. ATIS contains 4,978 sentences and 56,590 words across the train and test sets, and the number of classes is 128. Our approach is to use recurrent neural networks with word embedding.
As you already know, word embedding maps words to vectors in a high-dimensional space. When learned well, the embedding captures the syntactic and semantic information of the words in this space. The embedding space will be learned by the model you define later.
For this problem, a convolutional layer can do a great job of pooling local information, but it cannot capture the sequential nature of the data, so we are going to use recurrent neural networks, which are built to handle consecutive information such as natural language.
A recurrent neural network model has a memory that stores a summary of the sequence the model has seen so far.
This means you can use recurrent neural networks to solve complex word tagging problems like POS (part of speech) tagging or slot filling, as in this problem.
For this problem, you must pass the sequence of word embeddings as the input to your recurrent neural network.
As you are going to use the IOB representation for your labels, it is necessary to calculate the scores of your model accordingly. You will run the code shown below for score calculation. Prior to that, however, you must download the corresponding ATIS files.
git clone https://github.com/chsasank/ATIS.keras.git
cd ATIS.keras
I recommend you use Jupyter Notebook. After that, you must load your data using the data.load.atisfull function. Keras will download the data the first time you run it.
Labels and words are encoded as indexes into the dataset vocabulary, and the vocabularies are stored in words2idx and labels2idx.
import numpy as np
import data.load

train_set, valid_set, dicts = data.load.atisfull()
w2idx, labels2idx = dicts['words2idx'], dicts['labels2idx']

train_x, _, train_label = train_set
val_x, _, val_label = valid_set
The next step is to create index-to-word and index-to-label dicts, as seen below.
idx2w = {w2idx[k]: k for k in w2idx}
idx2la = {labels2idx[k]: k for k in labels2idx}
Then, prepare the words and labels for the conlleval script, as follows.
words_train = [list(map(lambda x: idx2w[x], w)) for w in train_x]
labels_train = [list(map(lambda x: idx2la[x], y)) for y in train_label]
words_val = [list(map(lambda x: idx2w[x], w)) for w in val_x]
labels_val = [list(map(lambda x: idx2la[x], y)) for y in val_label]
n_classes = len(idx2la)
n_vocab = len(idx2w)
The next step is to print an example sentence and its label.
print("Example sentence: {}".format(words_train[0]))
print("Encoded form: {}".format(train_x[0]))
print()
print("Its label: {}".format(labels_train[0]))
print("Encoded form: {}".format(train_label[0]))
This is what you get.
Example sentence: [...]
Encoded form: [232 542 502 196 208 77 62 10 35 40 58 234 137 62 11 234 481 321]
Its label: [...]
Encoded form: [126 126 126 126 126 48 126 35 99 126 126 126 78 126 14 126 126 12]
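The labels use the IOB (inside-outside-beginning) tagging scheme. As a tiny, hypothetical illustration of how IOB tags mark slots (tokens and slot names invented for clarity, not taken from the ATIS vocabulary):

```python
# Each token gets a tag: B- begins a slot, I- continues it, O is outside any slot
tokens = ["flights", "from", "new", "york", "to", "boston"]
tags = ["O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]

# Recover the slots by grouping consecutive B-/I- runs
slots, current = {}, None
for tok, tag in zip(tokens, tags):
    if tag.startswith("B-"):
        current = tag[2:]
        slots[current] = [tok]
    elif tag.startswith("I-") and current == tag[2:]:
        slots[current].append(tok)
    else:
        current = None
print(slots)  # {'fromloc': ['new', 'york'], 'toloc': ['boston']}
```

The conlleval script scores exactly this kind of grouping, which is why whole-slot precision and recall, not per-token accuracy, are reported later.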
The next step is to define your Keras model. Keras comes with a built-in Embedding layer you can use for word embeddings; it expects integer indices.
You must also wrap the output layer in TimeDistributed, which passes the output of your recurrent neural network at each time step to a fully connected layer. If you skip this step, only the output of the final time step will be passed to the next layer.
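Shape-wise, TimeDistributed applies the same dense weights independently at every time step. A minimal NumPy sketch of that idea (with hypothetical sizes, separate from the Keras layer):

```python
import numpy as np

rng = np.random.default_rng(3)
timesteps, hidden, n_classes = 6, 100, 127       # hypothetical sizes
rnn_out = rng.normal(size=(timesteps, hidden))   # one hidden vector per step
W = rng.normal(size=(hidden, n_classes))         # a single shared weight matrix

# The same W is applied at every time step, yielding one score vector per step
per_step_scores = rnn_out @ W
print(per_step_scores.shape)  # (6, 127)
```

This is what produces one label prediction per word, as slot filling requires.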
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import SimpleRNN
from keras.layers.core import Dense, Dropout
from keras.layers.wrappers import TimeDistributed
from keras.layers import Convolution1D
model = Sequential()
model.add(Embedding(n_vocab, 100))
model.add(Dropout(0.25))
model.add(SimpleRNN(100, return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')
The next step is to train your model. You will pass every sentence as a batch to the model. You cannot use model.fit here, as it expects all sentences to be the same size; therefore, you are going to use model.train_on_batch instead.
import progressbar
n_epochs = 30
for i in range(n_epochs):
    print("Training epoch {}".format(i))
    bar = progressbar.ProgressBar(max_value=len(train_x))
    for n_batch, sent in bar(enumerate(train_x)):
        label = train_label[n_batch]
Then, you must make the labels one-hot. When that step is finished, you feed each sentence to the model as a batch of one, as indicated below.
        label = np.eye(n_classes)[label][np.newaxis, :]
        sent = sent[np.newaxis, :]
        model.train_on_batch(sent, label)
To measure the accuracy of your model, you are going to use model.predict_on_batch together with the conlleval function from metrics.accuracy.
from metrics.accuracy import conlleval

labels_pred_val = []
bar = progressbar.ProgressBar(max_value=len(val_x))
for n_batch, sent in bar(enumerate(val_x)):
    label = val_label[n_batch]
    label = np.eye(n_classes)[label][np.newaxis, :]
    sent = sent[np.newaxis, :]
    pred = model.predict_on_batch(sent)
    pred = np.argmax(pred, -1)[0]
    labels_pred_val.append(pred)

labels_pred_val = [list(map(lambda x: idx2la[x], y)) \
                   for y in labels_pred_val]
con_dict = conlleval(labels_pred_val, labels_val,
                     words_val, 'measure.txt')
print('Precision = {}, Recall = {}, F1 = {}'.format(
    con_dict['p'], con_dict['r'], con_dict['f1']))
With this model, you should get an F1 score of around ninety-two. One drawback of this model is that there is no lookahead: the model cannot use upcoming words when tagging the current one. You can handily implement lookahead by adding a convolutional layer just after the word embeddings and before the recurrent layers, as follows.
from keras.layers.recurrent import GRU

model = Sequential()
model.add(Embedding(n_vocab, 100))
model.add(Convolution1D(128, 5, border_mode='same', activation='relu'))
model.add(Dropout(0.25))
model.add(GRU(100, return_sequences=True))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile('rmsprop', 'categorical_crossentropy')
With this improved model, you should get an F1 score of around ninety-four. To improve your model even further, you can try word embeddings trained on other corpora, such as Wikipedia, and experiment with other recurrent neural network variants like GRU and LSTM.
LAST WORDS
Deep learning is a newer area of the broader machine learning field, introduced with the main goal of moving machine learning closer to artificial intelligence, which was one of its original goals. If you want to break deeper into artificial intelligence, you first need to focus on deep learning and its powers. Deep learning is arguably one of the most highly sought technical skills.
This book will help you become good at deep learning basics and start your deep learning journey properly. Having finished the reading, you know a lot about neural network models, how to build them, and how to solve different deep learning problems like natural language processing and speech recognition. Therefore, you can focus on more advanced deep learning problems in the future.
In the book, you surveyed several neural network models and their applications to real-world problems. You can use this knowledge to solve your own deep learning tasks as you build your own neural network models using Keras. One thing is for sure: you should take advantage of the knowledge you gained through the book and focus on more complex deep learning problems.
Deep learning is the field of AI that went viral, and its future looks very bright. Therefore, you should not stop here. Focus on improving your skills and gaining more knowledge. Machine learning already plays a massive part in your everyday life, and deep learning is not far away from becoming a larger part of modern society as well.
Machine learning was just the beginning, as more and more tech companies like Microsoft, Google and Facebook spend millions on deep learning and advanced neural network research while computers get smarter every day.
However, deep learning is not about self-aware machines. It is about how ingenious neural network models and code are giving machines the ability to do things we previously thought impossible. Therefore, deep learning does concern our future. Let this book be your guide into that world, but do not stop here; take a step further by learning something new every day.