Getting Started with TensorFlow

Table of Contents

Getting Started with TensorFlow
Credits
About the Author
About the Reviewer
www.PacktPub.com
    eBooks, discount offers, and more
    Why subscribe?
Preface
    What this book covers
    What you need for this book
    Who this book is for
    Conventions
    Reader feedback
    Customer support
        Downloading the example code
        Downloading the color images of this book
        Errata
        Piracy
        Questions
1. TensorFlow – Basic Concepts
    Machine learning and deep learning basics
        Supervised learning
        Unsupervised learning
        Deep learning
    TensorFlow – A general overview
    Python basics
        Syntax
        Data types
        Strings
        Control flow
        Functions
        Classes
        Exceptions
        Importing a library
    Installing TensorFlow
        Installing on Mac or Linux distributions
        Installing on Windows
        Installation from source
        Testing your TensorFlow installation
    First working session
    Data Flow Graphs
    TensorFlow programming model
    How to use TensorBoard
    Summary
2. Doing Math with TensorFlow
    The tensor data structure
        One-dimensional tensors
        Two-dimensional tensors
            Tensor handling
        Three-dimensional tensors
        Handling tensors with TensorFlow
            Prepare the input data
    Complex numbers and fractals
        Prepare the data for Mandelbrot's set
        Build and execute the Data Flow Graph for Mandelbrot's set
        Visualize the result for Mandelbrot's set
        Prepare the data for Julia's set
        Build and execute the Data Flow Graph for Julia's set
        Visualize the result
    Computing gradients
    Random numbers
        Uniform distribution
        Normal distribution
        Generating random numbers with seeds
        Monte Carlo's method
    Solving partial differential equations
        Initial condition
        Model building
        Graph execution
        Computational function used
    Summary
3. Starting with Machine Learning
    The linear regression algorithm
        Data model
        Cost functions and gradient descent
        Testing the model
    The MNIST dataset
        Downloading and preparing the data
    Classifiers
        The nearest neighbor algorithm
            Building the training set
            Cost function and optimization
            Testing and algorithm evaluation
    Data clustering
        The k-means algorithm
            Building the training set
            Cost functions and optimization
            Testing and algorithm evaluation
    Summary
4. Introducing Neural Networks
    What are artificial neural networks?
        Neural network architectures
    Single Layer Perceptron
        The logistic regression
        TensorFlow implementation
            Building the model
            Launch the session
            Test evaluation
            Source code
    Multi Layer Perceptron
        Multi Layer Perceptron classification
            Build the model
            Launch the session
            Source code
        Multi Layer Perceptron function approximation
            Build the model
            Launch the session
    Summary
5. Deep Learning
    Deep learning techniques
    Convolutional neural networks
        CNN architecture
        TensorFlow implementation of a CNN
            Initialization step
            First convolutional layer
            Second convolutional layer
            Densely connected layer
            Readout layer
            Testing and training the model
            Launching the session
            Source code
    Recurrent neural networks
        RNN architecture
        LSTM networks
        NLP with TensorFlow
            Download the data
            Building the model
            Running the code
    Summary
6. GPU Programming and Serving with TensorFlow
    GPU programming
    TensorFlow Serving
        How to install TensorFlow Serving
            Bazel
            gRPC
            TensorFlow Serving dependencies
            Install Serving
        How to use TensorFlow Serving
            Training and exporting the TensorFlow model
            Running a session
            Loading and exporting a TensorFlow model
            Test the server
    Summary
Getting Started with TensorFlow

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2016

Production reference: 1190716

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78646-857-4

www.packtpub.com
Credits

Author: Giancarlo Zaccone
Reviewer: Jayani Withanawasam
Commissioning Editor: Veena Pagare
Acquisition Editor: Vinay Argekar
Content Development Editor: Sumeet Sawant
Technical Editor: Deepti Tuscano
Copy Editor: Alpha Singh
Project Coordinator: Shweta H Birwatkar
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Production Coordinator: Nilesh Mohite
Cover Work: Nilesh Mohite
About the Author

Giancarlo Zaccone has more than 10 years of experience managing research projects in both the scientific and industrial domains. He worked as a researcher at the C.N.R., the National Research Council, where he was involved in projects related to parallel numerical computing and scientific visualization.

Currently, he is a senior software engineer at a consulting company, developing and maintaining software systems for space and defence applications.

Giancarlo holds a master's degree in physics from the Federico II of Naples and a 2nd-level postgraduate master's course in scientific computing from La Sapienza of Rome.

He has already been a Packt author for the following book: Python Parallel Programming Cookbook.

You can contact him at https://it.linkedin.com/in/giancarlozaccone
About the Reviewer

Jayani Withanawasam is a senior software engineer in the Zaizi Asia Research and Development team. She is the author of the book Apache Mahout Essentials, on scalable machine learning. She was a summit speaker at Alfresco Summit 2014, London; her talk was about applications of machine learning techniques in smart enterprise content management (ECM) solutions. She presented her research "Content Extraction and Context Inference based Information Retrieval" at the Women in Machine Learning (WiML) 2015 workshop, which was co-located with the Neural Information Processing Systems (NIPS) 2015 conference in Montreal, Canada.

Jayani is currently pursuing an MSc in Artificial Intelligence at the University of Moratuwa, Sri Lanka. She has strong research interests in machine learning and computer vision.

You can contact her at https://lk.linkedin.com/in/jayaniwithanawasam
www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at customercare@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Preface

TensorFlow is an open source software library used to implement machine learning and deep learning systems.

Behind these two names hide a series of powerful algorithms that share a common challenge: allowing a computer to learn how to automatically recognize complex patterns and make the smartest decisions possible.

Machine learning algorithms are supervised or unsupervised; simplifying as much as possible, we can say that the biggest difference is that in supervised learning the programmer instructs the computer how to do something, whereas in unsupervised learning the computer learns all by itself.

Deep learning is instead a newer area of machine learning research, introduced with the objective of moving machine learning closer to artificial intelligence goals. This means that deep learning algorithms try to operate like the human brain.

With the aim of conducting research in these fascinating areas, the Google team developed TensorFlow, which is the subject of this book.

To introduce TensorFlow's programming features, we have used the Python programming language. Python is fun and easy to use; it is a true general-purpose language and is quickly becoming a must-have tool in the arsenal of any self-respecting programmer.

It is not the aim of this book to completely describe all TensorFlow objects and methods; instead, we will introduce the important system concepts and lead you up the learning curve as fast and efficiently as we can. Each chapter of the book presents a different aspect of TensorFlow, accompanied by several programming examples that reflect typical issues of machine and deep learning.

Although it is large and complex, TensorFlow is designed to be easy to use once you learn about its basic design and programming methodology.

The purpose of Getting Started with TensorFlow is to help you do just that.

Enjoy reading!
What this book covers

Chapter 1, TensorFlow – Basic Concepts, contains general information on the structure of TensorFlow and the issues for which it was developed. It also provides basic programming guidelines for the Python language and a first TensorFlow working session after the installation procedure. The chapter ends with a description of TensorBoard, a powerful tool for optimization and debugging.

Chapter 2, Doing Math with TensorFlow, describes TensorFlow's mathematical processing capabilities. It covers programming examples ranging from basic algebra up to partial differential equations. The basic data structure in TensorFlow, the tensor, is also explained.

Chapter 3, Starting with Machine Learning, introduces some machine learning models. We start by implementing the linear regression algorithm, which is concerned with modeling relationships between data. The main focus of the chapter is on solving two basic problems in machine learning: classification, that is, how to assign each new input to one of the possible given categories; and data clustering, which is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

Chapter 4, Introducing Neural Networks, provides a quick and detailed introduction to neural networks. These are mathematical models that represent the interconnection between elements, the artificial neurons. They are mathematical constructs that to some extent mimic the properties of living neurons. Neural networks build the foundation on which the architecture of deep learning algorithms rests. Two basic types of neural nets are then implemented: the Single Layer Perceptron and the Multi Layer Perceptron, for classification problems.

Chapter 5, Deep Learning, gives an overview of deep learning algorithms. Only in recent years has deep learning collected a large number of results considered unthinkable a few years ago. We'll show how to implement two fundamental deep learning architectures, convolutional neural networks (CNN) and recurrent neural networks (RNN), for image recognition and speech translation problems respectively.

Chapter 6, GPU Programming and Serving with TensorFlow, shows the TensorFlow facilities for GPU computing and introduces TensorFlow Serving, a high-performance open source serving system for machine learning models, designed for production environments and optimized for TensorFlow.
What you need for this book

All the examples have been implemented using Python version 2.7 on an Ubuntu Linux 64-bit machine, with the TensorFlow library version 0.7.1.

You will also need the following Python modules (preferably the latest versions):

Pip
Bazel
Matplotlib
NumPy
Pandas
Who this book is for

The reader should have a basic knowledge of programming and math concepts and, at the same time, want to be introduced to the topics of machine and deep learning. After reading this book, you will be able to master TensorFlow's features to build powerful applications.
Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The instructions for flow control are if, for, and while."

Any command-line input or output is written as follows:

>>> myvar = 3
>>> myvar += 2
>>> myvar
5
>>> myvar -= 1
>>> myvar
4

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "The shortcuts in this book are based on the Mac OS X 10.5+ scheme."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.
Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book - what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Getting-Started-with-TensorFlow. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/GettingStartedwithTensorFlow_ColorImages.pdf

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books - maybe a mistake in the text or the code - we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
Chapter 1. TensorFlow – Basic Concepts

In this chapter, we'll cover the following topics:

Machine learning and deep learning basics
TensorFlow – A general overview
Python basics
Installing TensorFlow
First working session
Data Flow Graphs
TensorFlow programming model
How to use TensorBoard
Machine learning and deep learning basics

Machine learning is a branch of artificial intelligence, and more specifically of computer science, which deals with the study of systems and algorithms that can learn from data, synthesizing new knowledge from it.

The word learn intuitively suggests that a system based on machine learning may, on the basis of the observation of previously processed data, improve its knowledge in order to achieve better results in the future, or provide output closer to the desired output for that particular system.

The ability of a program or a system based on machine learning to improve its performance in a particular task, thanks to past experience, is strongly linked to its ability to recognize patterns in the data. This theme, called pattern recognition, is therefore of vital importance and of increasing interest in the context of artificial intelligence; it is the basis of all machine learning techniques.

The training of a machine learning system can be done in different ways:
Supervised learning
Unsupervised learning

Supervised learning

Supervised learning is the most common form of machine learning. With supervised learning, a set of examples, the training set, is submitted as input to the system during the training phase, where each example is labeled with the respective desired output value. For example, let's consider a classification problem, where the system must attribute some experimental observations to one of the N different classes already known. In this problem, the training set is presented as a sequence of pairs of the type {(X1, Y1), ....., (Xn, Yn)} where Xi are the input vectors (feature vectors) and Yi represents the desired class for the corresponding input vector. Most supervised learning algorithms share one characteristic: the training is performed by the minimization of a particular loss function (cost function), which represents the output error with respect to the desired output.

The cost function most used for this type of training computes the mean squared error between the desired output and the one supplied by the system. After training, the accuracy of the model is measured on a set of examples disjoint from the training set, the so-called validation set.

Supervised learning workflow

In this phase, the model's generalization capability is verified: we test whether the output is correct for an input not used during the training phase.
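The loss-minimization idea above can be sketched in a few lines of plain Python (a toy illustration, not code from the book): we fit a one-parameter model y = w*x to a small labeled training set by gradient descent on the mean squared error. The data points and the learning rate are invented for the example.

```python
# Toy supervised learning: fit y = w * x to labeled pairs (x_i, y_i)
# by gradient descent on the mean squared error (the cost function).
train_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, desired output)

def mse(w):
    # mean squared error between desired and produced outputs
    return sum((y - w * x) ** 2 for x, y in train_set) / len(train_set)

w = 0.0                        # initial guess for the model parameter
learning_rate = 0.01
for _ in range(1000):
    # analytic gradient of the MSE with respect to w
    grad = sum(-2 * x * (y - w * x) for x, y in train_set) / len(train_set)
    w -= learning_rate * grad  # move against the gradient

print(round(w, 2))             # prints 2.04, the least-squares slope
```

After training, the learned w could then be checked on a separate validation set, exactly as the text describes.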
Unsupervised learning

In unsupervised learning, the training examples provided to the system are not labeled with the class they belong to. The system, therefore, develops and organizes the data, looking for common characteristics among them, and changing its organization based on its internal knowledge.

Unsupervised learning algorithms are particularly used in clustering problems, in which a number of input examples are present, you do not know the class a priori, and you do not even know what the possible classes are, or how numerous they are. This is a clear case where you cannot use supervised learning, because you do not know a priori the number of classes.

Unsupervised learning workflow
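To make the clustering idea concrete, here is a minimal sketch in plain Python (not from the book) of the k-means algorithm on unlabeled one-dimensional points; the data values and the two-cluster choice are invented for the example.

```python
# Toy unsupervised learning: group unlabeled 1-D points into two
# clusters with k-means (alternating assignment and update steps).
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]  # no class labels are given
centroids = [points[0], points[3]]       # naive initialization

for _ in range(10):
    # assignment step: attach each point to its nearest centroid
    clusters = [[], []]
    for p in points:
        nearest = min((0, 1), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # update step: move each centroid to the mean of its cluster
    centroids = [sum(c) / len(c) for c in clusters]

print([round(c, 2) for c in centroids])  # prints [1.0, 8.07]
```

The algorithm discovers the two groups purely from the structure of the data, without ever being told what the classes are.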
Deep learning

Deep learning techniques represent a remarkable step forward taken by machine learning in recent decades, having provided results never seen before in many applications, such as image and speech recognition or Natural Language Processing (NLP). There are several reasons that led to deep learning being developed and placed at the center of the field of machine learning only in recent decades. One reason, perhaps the main one, is surely represented by progress in hardware, with the availability of new processors, such as graphics processing units (GPUs), which have greatly reduced the time needed for training networks, lowering it by a factor of 10 or 20. Another reason is certainly the ever more numerous datasets on which to train a system, needed to train architectures of a certain depth and with a high dimensionality for the input data.

Deep learning workflow

Deep learning is based on the way the human brain processes information and learns, responding to external stimuli. It consists of a machine learning model at several levels of representation, in which the deeper levels take as input the outputs of the previous levels, transforming them and always abstracting more. Each level in this hypothetical model corresponds to a different area of the cerebral cortex: when the brain receives images, it processes them through various stages such as edge detection and form perception, that is, from a primitive representation level to the most complex. For example, in an image classification problem, each block gradually extracts the features, at various levels of abstraction, taking as input data already processed by means of filtering operations.
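The layered processing just described can be caricatured in plain Python (an invented illustration, not code from the book): each "level" consumes the output of the previous one and produces a slightly more abstract representation of the raw input.

```python
# Toy layered representation: level 2 feeds on the output of level 1,
# abstracting the raw input a little more at each stage.
raw_signal = [0.2, 0.8, 0.8, 0.2]  # a primitive input representation

def detect_edges(signal):
    # level 1 ("edge detection"): differences between neighbours
    return [b - a for a, b in zip(signal, signal[1:])]

def perceive_form(edge_map):
    # level 2 ("form perception"): summarize the edges into one feature
    return sum(abs(e) for e in edge_map)

representation = perceive_form(detect_edges(raw_signal))
print(round(representation, 2))  # prints 1.2
```

A real deep network stacks many such transformations and learns them from data instead of hand-coding them, but the flow of information from level to level is the same.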
TensorFlow – A general overview

TensorFlow (https://www.tensorflow.org/) is a software library, developed by the Google Brain Team within Google's Machine Intelligence research organization, for the purposes of conducting machine learning and deep neural network research. TensorFlow combines computational algebra with compilation optimization techniques, making easy the calculation of many mathematical expressions where the problem is the time required to perform the computation.

The main features include:

Defining, optimizing, and efficiently calculating mathematical expressions involving multi-dimensional arrays (tensors).
Programming support for deep neural networks and machine learning techniques.
Transparent use of GPU computing, automating the management and optimization of the memory and of the data used. You can write the same code and run it either on CPUs or GPUs. More specifically, TensorFlow will figure out which parts of the computation should be moved to the GPU.
High scalability of computation across machines and huge datasets.

TensorFlow homepage

TensorFlow is available with Python and C++ support, and we shall use Python 2.7 for learning, as indeed the Python API is better supported and much easier to learn. The Python installation depends on your system; the download page (https://www.python.org/downloads/) contains all the information needed for its installation. In the next section, we explain very briefly the main features of the Python language, with some programming examples.
Python basics

Python is a strongly typed and dynamic language (data types are enforced, but it is not necessary to declare them explicitly), case-sensitive (var and VAR are two different variables), and object-oriented (everything in Python is an object).
Syntax

In Python, a line terminator is not required, and blocks are specified with indentation. Indent to begin a block and remove indentation to conclude it; that's all. Instructions that require an indented block end with a colon (:). Comments begin with the hash sign (#) and are single-line. Strings on multiple lines are used for multi-line comments. Assignments are accomplished with the equal sign (=). For equality tests, we use the double equal (==) symbol. You can increase and decrease a value by using += and -= followed by the addend. This works with many data types, including strings. You can assign and use multiple variables on the same line.

Following are some examples:

>>> myvar = 3
>>> myvar += 2
>>> myvar
5
>>> myvar -= 1
>>> myvar
4

"""This is a comment"""

>>> mystring = "Hello"
>>> mystring += " world."
>>> print mystring
Hello world.

The following code swaps two variables in one line:

>>> myvar, mystring = mystring, myvar
Data types

The most significant structures in Python are lists, tuples, and dictionaries. Sets have been integrated into Python since version 2.5 (for previous versions, they are available in the sets library). Lists are similar to one-dimensional arrays, but you can create lists that contain other lists. Dictionaries are arrays that contain pairs of keys and values (hash tables), and tuples are immutable mono-dimensional objects. In Python, arrays can be of any type, so you can mix integers, strings, and so on in your lists/dictionaries and tuples. The index of the first object in any type of array is always zero. Negative indices are allowed and count from the end of the array; -1 is the last element. Variables can refer to functions.

>>> example = [1, ["list1", "list2"], ("one", "tuple")]
>>> mylist = ["Element 1", 2, 3.14]
>>> mylist[0]
"Element 1"
>>> mylist[-1]
3.14
>>> mydict = {"Key 1": "Val 1", 2: 3, "pi": 3.14}
>>> mydict["pi"]
3.14
>>> mytuple = (1, 2, 3)
>>> myfunc = len
>>> print myfunc(mylist)
3

You can get an array range using a colon (:). Not specifying the starting index of the range implies the first element; not indicating the final index implies the last element. Negative indices count from the last element (-1 is the last element). Then run the following command:

>>> mylist = ["first element", 2, 3.14]
>>> print mylist[:]
['first element', 2, 3.1400000000000001]
>>> print mylist[0:2]
['first element', 2]
>>> print mylist[-3:-1]
['first element', 2]
>>> print mylist[1:]
[2, 3.14]
Strings

Python strings are indicated either with a single quotation mark (') or a double one ("), and a string delimited with one kind of quote may contain the other kind ("He said 'hello'." is valid). Strings of multiple lines are enclosed in triple (double or single) quotes ("""). Python supports Unicode; just use the syntax u"This is a unicode string". To insert values into a string, use the % operator (modulo) and a tuple. Each %s is replaced by a tuple element, from left to right, and you are also permitted to use a dictionary for the replacements.

>>> print "Name: %s\nNumber: %s\nString: %s" % (myclass.name, 3, 3 * "-")
Name: Poromenos
Number: 3
String: ---

strString = """this is a string
on multiple lines."""

>>> print "This %(verb)s a %(name)s." % {"name": "test", "verb": "is"}
This is a test.
Control flow

The instructions for flow control are if, for, and while. There is no switch control flow; in its place, we use if. The for control flow is used to enumerate the members of a list. To get a list of numbers, you use range(number).

rangelist = range(10)
>>> print rangelist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Let's check if number is one of the numbers in the tuple:

for number in rangelist:
    if number in (3, 4, 7, 9):
        # "break" ends the for instruction without the else clause
        break
    else:
        # "continue" continues with the next iteration of the loop
        continue
else:
    # this is an optional "else"
    # executed only if the loop is not interrupted with "break"
    pass  # it does nothing

if rangelist[1] == 2:
    print "the second element (lists are 0-based) is 2"
elif rangelist[1] == 3:
    print "the second element is 3"
else:
    print "I don't know"

while rangelist[1] == 1:
    pass
Functions

Functions are declared with the keyword def. Any optional arguments must be declared after those that are mandatory and must have a value assigned. When calling functions using named arguments, you must also pass the value. Functions can return a tuple (tuple unpacking enables the return of multiple values). Lambda functions are in-line. Parameters are passed by reference, but immutable types (tuples, integers, strings, and so on) cannot be changed in the function. This happens because only the position of the element in memory is passed, and assigning another object to the variable results in the loss of the earlier object reference.

For example:

# equal to a def f(x): return x + 1
funzionevar = lambda x: x + 1
>>> print funzionevar(1)
2

def passing_example(my_list, my_int):
    my_list.append("new element")
    my_int = 4
    return my_list, my_int

>>> input_my_list = [1, 2, 3]
>>> input_my_int = 10
>>> print passing_example(input_my_list, input_my_int)
([1, 2, 3, 'new element'], 10)
>>> input_my_list
[1, 2, 3, 'new element']
>>> input_my_int
10
Classes

Python supports multiple inheritance of classes. Variables and private methods are declared by convention (it is not a rule of the language) by preceding them with two underscores (__). We can assign attributes (properties) to arbitrary instances of a class.

The following is an example:

class Myclass:
    common = 10
    def __init__(self):
        self.myvariable = 3
    def myfunc(self, arg1, arg2):
        return self.myvariable

# We create an instance of the class
>>> instance = Myclass()
>>> instance.myfunc(1, 2)
3

# This variable is shared by all instances
>>> instance2 = Myclass()
>>> instance.common
10
>>> instance2.common
10

# Note here how we use the class name
# instead of the instance.
>>> Myclass.common = 30
>>> instance.common
30
>>> instance2.common
30

# This does not update the variable in the class;
# instead, it assigns a new object to the variable
# of the first instance.
>>> instance.common = 10
>>> instance.common
10
>>> instance2.common
30
>>> Myclass.common = 50

# The value is not changed because "common" is now an instance variable.
>>> instance.common
10
>>> instance2.common
50

# This class inherits from Myclass. Multiple inheritance
# is declared like this:
# class AnotherClass(Myclass1, Myclass2, MyclassN)
class AnotherClass(Myclass):
    # The argument "self" is automatically passed
    # and makes reference to the instance of the class, so you can
    # set instance variables as above, but from within the class.
    def __init__(self, arg1):
        self.myvariable = 3
        print arg1

>>> instance = AnotherClass("hello")
hello
>>> instance.myfunc(1, 2)
3

# This class does not have a member (property) test, but
# we can add one to an instance whenever we want. Note
# that test will be a member of only that one instance.
>>> instance.test = 10
>>> instance.test
10
Exceptions

Exceptions in Python are handled with try-except [exception_name] blocks:

def my_func():
    try:
        # Division by zero causes an exception
        10 / 0
    except ZeroDivisionError:
        print "Oops, error"
    else:
        # no exception, let's proceed
        pass
    finally:
        # This code is executed when the
        # try..except block has finished and all exceptions
        # were handled, even if a new exception
        # is raised directly in the block.
        print "finish"

>>> my_func()
Oops, error
finish
Importing a library

External libraries are imported with import [library name]. You can also use the form from [library name] import [function name] to import individual functions. Here's an example:

import random
from time import clock

randomint = random.randint(1, 100)
>>> print randomint
64
Installing TensorFlow

The TensorFlow Python API supports Python 2.7 and Python 3.3+. The GPU version (Linux only) requires the Cuda Toolkit >= 7.0 and cuDNN >= v2.

When working in a Python environment, it is recommended you use virtualenv. It will isolate your Python configuration for different projects; using virtualenv will not overwrite existing versions of Python packages required by TensorFlow.
Installing on Mac or Linux distributions

The following are the steps to install TensorFlow on Mac and Linux systems:

1. First install pip and virtualenv (optional) if they are not already installed:

For Ubuntu/Linux 64-bit:

$ sudo apt-get install python-pip python-dev python-virtualenv

For Mac OS X:

$ sudo easy_install pip
$ sudo pip install --upgrade virtualenv

2. Then you can create a virtual environment. The following command creates a virtual environment in the ~/tensorflow directory:

$ virtualenv --system-site-packages ~/tensorflow

3. The next step is to activate the virtual environment as follows:

$ source ~/tensorflow/bin/activate.csh
(tensorflow)$

4. Henceforth, the name of the environment we're working in precedes the command line. Once activated, pip is used to install TensorFlow within it.

For Ubuntu/Linux 64-bit, CPU:

(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.5.0-cp27-none-linux_x86_64.whl

For Mac OS X, CPU:

(tensorflow)$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.5.0-py2-none-any.whl

If you want to use your GPU card with TensorFlow, then install another package. I recommend you visit the official documentation to see if your GPU meets the specifications required to support TensorFlow.

Note

To enable your GPU with TensorFlow, you can refer to https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#optional-linux-enable-gpu-support for a complete description.

Finally, when you've finished, you must deactivate the virtual environment:

(tensorflow)$ deactivate

Note

Given the introductory nature of this book, I suggest the reader visit the download and setup TensorFlow page at https://www.tensorflow.org/versions/r0.7/get_started/os_setup.html#download-and-setup to find more information about other ways to install TensorFlow.
Installing on Windows

If you can't get a Linux-based system, you can install Ubuntu on a virtual machine; just use a free application called VirtualBox, which lets you create a virtual PC on Windows and install Ubuntu in it. This way, you can try the operating system without creating partitions or dealing with cumbersome procedures.

Note

After installing VirtualBox, you can install Ubuntu (www.ubuntu.com) and then follow the installation for Linux machines to install TensorFlow.
Installation from source

It may happen that the pip installation causes problems, particularly when using the visualization tool TensorBoard (see https://github.com/tensorflow/tensorflow/issues/530). To fix this problem, I suggest you build and install TensorFlow starting from the source files, through the following steps:

1. Clone the TensorFlow repository:

git clone --recurse-submodules https://github.com/tensorflow/tensorflow

2. Install Bazel (dependencies and installer), following the instructions at http://bazel.io/docs/install.html.

3. Run the Bazel installer:

chmod +x bazel-version-installer-os.sh
./bazel-version-installer-os.sh --user

4. Install the Python dependencies:

sudo apt-get install python-numpy swig python-dev

5. Configure your installation (GPU or no GPU?) in the downloaded TensorFlow repository:

./configure

6. Create your own TensorFlow pip package using bazel:

bazel build -c opt //tensorflow/tools/pip_package:build_pip_package

7. To build with GPU support, use bazel build -c opt --config=cuda followed again by:

//tensorflow/tools/pip_package:build_pip_package

8. Finally, install TensorFlow; the name of the .whl file will depend on your platform:

pip install /tmp/tensorflow_pkg/tensorflow-0.7.1-py2-none-linux_x86_64.whl

9. Good luck!

Note

Please refer to https://www.tensorflow.org/versions/r0.7/get_started/os_setup.html#installation-for-linux for further information.
Testing your TensorFlow installation

Open a terminal and type the following lines of code:

>>> import tensorflow as tf
>>> hello = tf.constant("Hello TensorFlow!")
>>> sess = tf.Session()

To verify your installation, just type:

>>> print(sess.run(hello))

You should have the following output:

Hello TensorFlow!
>>>
FirstworkingsessionFinallyitistimetomovefromtheorytopractice.IwillusethePython2.7IDEtowritealltheexamples.TogetaninitialideaofhowtouseTensorFlow,openthePythoneditorandwritethefollowinglinesofcode:
x=1
y=x+9
print(y)
importtensorflowastf
x=tf.constant(1,name='x')
y=tf.Variable(x+9,name='y')
print(y)
As you can easily understand, in the first three lines the constant x, set equal to 1, is added to 9 to set the new value of the variable y; the end result of the variable y is then printed on the screen.

In the last four lines, we have translated the first three lines according to the TensorFlow library.

If we run the program, we have the following output:

10
<tensorflow.python.ops.variables.Variable object at 0x7f30ccbf9190>
The TensorFlow translation of the first three lines of the program example produces a different result. Let's analyze them:

1. The following statement should never be missed if you want to use the TensorFlow library. It tells us that we are importing the library and calling it tf:

import tensorflow as tf

2. We create a constant value called x, with a value equal to one:

x = tf.constant(1, name='x')

3. Then we create a variable called y. This variable is defined with the simple equation y = x + 9:

y = tf.Variable(x + 9, name='y')

4. Finally, print out the result:

print(y)
So how do we explain the different result? The difference lies in the variable definition. In fact, the variable y doesn't represent the current value of x + 9; instead it means: when the variable y is computed, take the value of the constant x and add 9 to it. This is the reason why the value of y has never been carried out. In the next section, I'll try to fix it.
So we open the Python IDE and enter the following lines:

import tensorflow as tf
x = tf.constant(1, name='x')
y = tf.Variable(x + 9, name='y')
model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    print(session.run(y))

Running the preceding code, the output result is finally as follows:
10
We have removed the print instruction, but we have initialized the model variables:

model = tf.initialize_all_variables()

And, most importantly, we have created a session for computing values. In the next step, we run the model created previously, and finally run just the variable y and print out its current value:

with tf.Session() as session:
    session.run(model)
    print(session.run(y))

This is the magic trick that permits the correct result. In this fundamental step, the execution graph, called the Data Flow Graph, is created in the session, with all the dependencies between the variables. The y variable depends on the variable x, whose value is transformed by adding 9 to it. The value is not computed until the session is executed.
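The same deferred-execution idea can be sketched in plain Python, without TensorFlow: a node records how to compute its value, and nothing is evaluated until an explicit run call. The Constant and AddNine classes below are illustrative inventions for this sketch, not part of the TensorFlow API:

```python
# Minimal sketch of deferred (graph-style) execution: building a node only
# records the computation; calling run() actually evaluates the expression.
class Constant:
    def __init__(self, value):
        self.value = value
    def run(self):
        return self.value

class AddNine:
    def __init__(self, source):
        self.source = source      # dependency, like y depending on x
    def run(self):
        # evaluated only now, just as session.run(y) triggers evaluation
        return self.source.run() + 9

x = Constant(1)
y = AddNine(x)
print(y)        # prints an object description, not 10 -- like the Variable above
print(y.run())  # prints 10, like session.run(y)
```

This mirrors why printing the TensorFlow variable directly shows an object, while running it inside a session yields 10.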
This last example introduced another important feature in TensorFlow, the Data Flow Graph.
Data Flow Graphs

A machine learning application is the result of the repeated computation of complex mathematical expressions. In TensorFlow, a computation is described using a Data Flow Graph, where each node in the graph represents the instance of a mathematical operation (multiply, add, divide, and so on), and each edge is a multi-dimensional data set (tensor) on which the operations are performed.

TensorFlow supports these constructs and these operators. Let's see in detail how nodes and edges are managed by TensorFlow:
Node: In TensorFlow, each node represents the instantiation of an operation. Each operation has zero or more inputs and zero or more outputs.
Edges: In TensorFlow, there are two types of edge:

Normal edges: They are carriers of data structures (tensors), where an output of one operation (from one node) becomes the input for another operation.
Special edges: These edges are not data carriers between the output of a node (operator) and the input of another node. A special edge indicates a control dependency between two nodes. Let's suppose we have two nodes, A and B, and a special edge connecting A to B; it means that B will start its operation only when the operation in A ends. Special edges are used in the Data Flow Graph to set the happens-before relationship between operations on the tensors.
Let's explore some features of the Data Flow Graph in greater detail:

Operation: This represents an abstract computation, such as adding or multiplying matrices. An operation manages tensors. It can be polymorphic: the same operation can manipulate different tensor element types. For example, the addition of two int32 tensors, the addition of two float tensors, and so on.
Kernel: This represents the concrete implementation of an operation. A kernel defines the implementation of the operation on a particular device. For example, an add matrix operation can have a CPU implementation and a GPU one.
Session: When the client program has to establish communication with the TensorFlow runtime system, a session must be created. As soon as the session is created for a client, an initial graph is created, and it is empty. It has two fundamental methods:

session.extend: In a computation, the user can extend the execution graph, requesting to add more operations (nodes) and edges (data).
session.run: Using TensorFlow, sessions are created with some graphs, and these full graphs are executed to get some outputs; or sometimes, subgraphs are executed thousands/millions of times using run invocations. Basically, the method runs the execution graph to provide outputs.
Features in the Data Flow Graph
TensorFlow programming model

By adopting a Data Flow Graph as the execution model, you separate the data flow design (graph building) from its execution (on CPUs, GPU cards, or a combination), using a single programming interface that hides all the complexities. It also defines what the programming model should be like in TensorFlow.

Let's consider the simple problem of multiplying two integers, namely a and b.

The following are the steps required for this simple problem:
1. Define and initialize the variables. Each variable should define the state of a current execution. After importing the TensorFlow module in Python:

import tensorflow as tf

2. We define the variables a and b involved in the computation. These are defined via a more basic structure, called the placeholder:

a = tf.placeholder("int32")
b = tf.placeholder("int32")

3. A placeholder allows us to create our operations and to build our computation graph without needing the data.
4. Then we use these variables as inputs for TensorFlow's function mul:

y = tf.mul(a, b)

This function will return the result of the multiplication of the input integers a and b.

5. Manage the execution flow; this means that we must build a session:

sess = tf.Session()

6. Visualize the results. We run our model on the variables a and b, feeding data into the Data Flow Graph through the placeholders previously defined:

print sess.run(y, feed_dict={a: 2, b: 5})
How to use TensorBoard

TensorBoard is a visualization tool, devoted to analyzing the Data Flow Graph and also to better understanding machine learning models. It can graphically view different types of statistics about the parameters and the details of any part of a computation graph. It often happens that a graph of computation can be very complex. A deep neural network can have up to 36,000 nodes. For this reason, TensorBoard collapses nodes into high-level blocks, highlighting the groups with identical structures. Doing so allows a better analysis of the graph, focusing only on the core sections of the computation graph. Also, the visualization process is interactive; the user can pan, zoom, and expand the nodes to display the details.
The following figure shows a neural network model with TensorBoard:

A TensorBoard visualization example

TensorBoard's algorithms collapse nodes into high-level blocks and highlight groups with the same structures, while also separating out high-degree nodes. The visualization tool is also interactive: the users can pan, zoom in, expand, and collapse the nodes.

TensorBoard is equally useful in the development and tuning of a machine learning model. For this reason, TensorFlow lets you insert so-called summary operations into the graph. These summary operations monitor changing values (during the execution of a computation) written to a log file. TensorBoard is then configured to watch this log file with summary information and display how this information changes over time.
Let's consider a basic example to understand the usage of TensorBoard. We have the following example:

import tensorflow as tf

a = tf.constant(10, name="a")
b = tf.constant(90, name="b")
y = tf.Variable(a + b * 2, name="y")
model = tf.initialize_all_variables()

with tf.Session() as session:
    merged = tf.merge_all_summaries()
    writer = tf.train.SummaryWriter("/tmp/tensorflowlogs", session.graph)
    session.run(model)
    print(session.run(y))

That gives the following result:

190
Let's look into the session management. The first instruction to consider is as follows:

merged = tf.merge_all_summaries()

This instruction merges all the summaries collected in the default graph.

Then we create a SummaryWriter. It will write all the summaries (in this case, the execution graph) obtained from the code's execution into the /tmp/tensorflowlogs directory:

writer = tf.train.SummaryWriter("/tmp/tensorflowlogs", session.graph)

Finally, we run the model and so build the Data Flow Graph:

session.run(model)
print(session.run(y))

The use of TensorBoard is very simple. Let's open a terminal and enter the following:

$ tensorboard --logdir=/tmp/tensorflowlogs

A message such as the following should appear:

Starting TensorBoard on port 6006
Then, by opening a web browser, we should see the Data Flow Graph with auxiliary nodes:

Data Flow Graph display with TensorBoard

Now we will be able to explore the Data Flow Graph:

Explore the Data Flow Graph display with TensorBoard

TensorBoard uses special icons for constants and summary nodes. To summarize, we report in the next figure the table of node symbols displayed:

Node symbols in TensorBoard
Summary

In this chapter, we introduced the main topics: machine learning and deep learning. While machine learning explores the study and construction of algorithms that can learn from, and make predictions on, data, deep learning is based precisely on the way the human brain processes information and learns, responding to external stimuli.

In this vast scientific research and practical application area, we can firmly place the TensorFlow software library, developed by Google's research group for artificial intelligence (the Google Brain Project) and released as open source software on November 9, 2015.

After electing the Python programming language as the development tool for our examples and applications, we saw how to install and compile the library, and then carried out a first working session. This allowed us to introduce the execution model of TensorFlow and the Data Flow Graph. It led us to define what our programming model should be.

The chapter ended with an example of how to use an important tool for debugging machine learning applications: TensorBoard.

In the next chapter, we will continue our journey into the TensorFlow library, with the intention of showing its versatility. Starting from the fundamental concept, tensors, we will see how to use the library for purely math applications.
Chapter 2. Doing Math with TensorFlow

In this chapter, we will cover the following topics:

The tensor data structure
Handling tensors with TensorFlow
Complex numbers and fractals
Computing gradients
Random numbers
Solving partial differential equations
The tensor data structure

Tensors are the basic data structures in TensorFlow. As we have already said, they represent the connecting edges in a Data Flow Graph. A tensor simply identifies a multidimensional array or list.

It can be identified by three parameters: rank, shape, and type:

Rank: Each tensor is described by a unit of dimensionality called rank. It identifies the number of dimensions of the tensor. For this reason, the rank is also known as the order or n-dimensions of a tensor (for example, a rank 2 tensor is a matrix and a rank 1 tensor is a vector).
Shape: The shape of a tensor is the number of rows and columns it has.
Type: It is the data type assigned to the tensor's elements.

Now that we are familiar with this fundamental data structure, to build a tensor we can:

Build an n-dimensional array; for example, by using the NumPy library
Convert the n-dimensional array into a TensorFlow tensor

Once we obtain the tensor, we can handle it using the TensorFlow operators. The following figure provides a visual explanation of the concepts introduced:

Visualization of multidimensional tensors
One-dimensional tensors

To build a one-dimensional tensor, we use the NumPy array(s) command, where s is a Python list:

>>> import numpy as np
>>> tensor_1d = np.array([1.3, 1, 4.0, 23.99])

Unlike in a Python list, the commas between the elements are not shown when the array is printed:

>>> print tensor_1d
[  1.3    1.     4.    23.99]

The indexing is the same as for Python lists. The first element has position 0, the third element has position 2, and so on:

>>> print tensor_1d[0]
1.3
>>> print tensor_1d[2]
4.0

Finally, you can view the basic attributes of the tensor. The rank of the tensor:

>>> tensor_1d.ndim
1

The tuple of the tensor's dimensions is as follows:

>>> tensor_1d.shape
(4L,)

The tensor's shape has just four values in a row.

The data type in the tensor:

>>> tensor_1d.dtype
dtype('float64')
Now, let's see how to convert a NumPy array into a TensorFlow tensor:

import tensorflow as tf

The TensorFlow function tf.convert_to_tensor converts Python objects of various types to tensor objects. It accepts tensor objects, NumPy arrays, Python lists, and Python scalars:

tf_tensor = tf.convert_to_tensor(tensor_1d, dtype=tf.float64)

Running the session, we can visualize the tensor and its elements as follows:

with tf.Session() as sess:
    print sess.run(tf_tensor)
    print sess.run(tf_tensor[0])
    print sess.run(tf_tensor[2])

That gives the following results:

>>>
[  1.3    1.     4.    23.99]
1.3
4.0
>>>
Two-dimensional tensors

To create a two-dimensional tensor or matrix, we again use array(s), but s will be a sequence of arrays:

>>> import numpy as np
>>> tensor_2d = np.array([(1, 2, 3, 4), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15)])
>>> print tensor_2d
[[ 1  2  3  4]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
>>>

A value in tensor_2d is identified by the expression tensor_2d[row, col], where row is the row position and col is the column position:

>>> tensor_2d[3][3]
15

You can also use the slice operator : to extract a submatrix:

>>> tensor_2d[0:2, 0:2]
array([[1, 2],
       [4, 5]])

In this case, we extracted a 2×2 submatrix containing rows 0 and 1, and columns 0 and 1 of tensor_2d. TensorFlow has its own slice operator. In the next subsection, we will see how to use it.
Tensor handling

Let's see how we can apply some slightly more complex operations to these data structures. Consider the following code:

1. Import the libraries:

import tensorflow as tf
import numpy as np

2. Let's build two integer arrays. These represent two 3×3 matrices:

matrix1 = np.array([(2, 2, 2), (2, 2, 2), (2, 2, 2)], dtype='int32')
matrix2 = np.array([(1, 1, 1), (1, 1, 1), (1, 1, 1)], dtype='int32')

3. Visualize them:

print "matrix1 ="
print matrix1
print "matrix2 ="
print matrix2

4. To use these matrices in our TensorFlow environment, they must be transformed into a tensor data structure:

matrix1 = tf.constant(matrix1)
matrix2 = tf.constant(matrix2)

5. We used the TensorFlow constant operator to perform the transformation.
6. The matrices are ready to be manipulated with TensorFlow operators. In this case, we calculate a matrix multiplication and a matrix sum:

matrix_product = tf.matmul(matrix1, matrix2)
matrix_sum = tf.add(matrix1, matrix2)
7. The following matrix will be used to compute a matrix determinant:

matrix_3 = np.array([(2, 7, 2), (1, 4, 2), (9, 0, 2)], dtype='float32')
print "matrix3 ="
print matrix_3

matrix_det = tf.matrix_determinant(matrix_3)

8. It's time to create our graph and run the session, with the tensors and operators created:

with tf.Session() as sess:
    result1 = sess.run(matrix_product)
    result2 = sess.run(matrix_sum)
    result3 = sess.run(matrix_det)

9. The results will be printed out by running the following commands:

print "matrix1 * matrix2 ="
print result1
print "matrix1 + matrix2 ="
print result2
print "matrix3 determinant result ="
print result3

The following figure shows the results after running the code:
TensorFlow provides numerous math operations on tensors. The following table summarizes them:

TensorFlow operator   Description
tf.add                Returns the sum
tf.sub                Returns the subtraction
tf.mul                Returns the multiplication
tf.div                Returns the division
tf.mod                Returns the modulus
tf.abs                Returns the absolute value
tf.neg                Returns the negative value
tf.sign               Returns the sign
tf.inv                Returns the inverse
tf.square             Returns the square
tf.round              Returns the nearest integer
tf.sqrt               Returns the square root
tf.pow                Returns the power
tf.exp                Returns the exponential
tf.log                Returns the logarithm
tf.maximum            Returns the maximum
tf.minimum            Returns the minimum
tf.cos                Returns the cosine
tf.sin                Returns the sine
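Most of these operators work elementwise on tensors. As a quick illustrative sketch, the following uses the NumPy analogues of a few entries in the table (np.add, np.multiply, and so on stand in for the TensorFlow calls, so the snippet runs without a session):

```python
import numpy as np

# NumPy analogues of a few elementwise operators from the table:
# tf.add -> np.add, tf.mul -> np.multiply, tf.sqrt -> np.sqrt, tf.maximum -> np.maximum
a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 2.0])

print(np.add(a, b))       # elementwise sum, like tf.add
print(np.multiply(a, b))  # elementwise product, like tf.mul
print(np.sqrt(a))         # elementwise square root, like tf.sqrt -> [1. 2. 3.]
print(np.maximum(a, b))   # elementwise maximum, like tf.maximum
```

The TensorFlow versions behave the same way, except that they build graph nodes that are evaluated inside a session.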
Three-dimensional tensors

The following commands build a three-dimensional tensor:

>>> import numpy as np
>>> tensor_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> print tensor_3d
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
>>>

The three-dimensional tensor created is a 2×2×2 array:

>>> tensor_3d.shape
(2L, 2L, 2L)

To retrieve an element from a three-dimensional tensor, we use an expression of the following form:

tensor_3d[plane, row, col]

Following these settings:

Matrix 3×3 representation

So the four elements in the first plane are all identified by the value of the variable plane equal to zero:

>>> tensor_3d[0, 0, 0]
1
>>> tensor_3d[0, 0, 1]
2
>>> tensor_3d[0, 1, 0]
3
>>> tensor_3d[0, 1, 1]
4

Three-dimensional tensors allow us to introduce the next topic, linked to the manipulation of images, but more generally they introduce us to simple transformations on tensors.
Handling tensors with TensorFlow

TensorFlow is designed to handle tensors of all sizes, and provides operators that can be used to manipulate them. In this example, in order to see some array manipulations, we are going to work with a digital image. As you probably know, a color digital image is an M×N×3 size matrix (a third-order tensor), whose components correspond to the red, green, and blue components of the image (RGB space). This means that each feature in the rectangular box of the RGB image is specified by three coordinates, i, j, and k.

The RGB tensor

The first thing I want to show you is how to load an image, and then how to extract a sub-image from the original, using the TensorFlow slice operator.
Prepare the input data

Using the imread command in matplotlib, we import a digital image in a standard color format (JPG, BMP, TIF):

import matplotlib.image as mp_image
filename = "packt.jpeg"
input_image = mp_image.imread(filename)

We can then see the rank and the shape of the tensor:

print 'input dim = {}'.format(input_image.ndim)
print 'input shape = {}'.format(input_image.shape)

The output shows that the shape is (80, 144, 3). This means the image is 80 pixels high, 144 pixels wide, and 3 colors deep.

Finally, using matplotlib, it is possible to visualize the imported image:

import matplotlib.pyplot as plt
plt.imshow(input_image)
plt.show()

The starting image
In this example, the slice is a bidimensional segment of the starting image, where each pixel has its RGB components, so we need a placeholder to store all the values of the slice:

import tensorflow as tf
my_image = tf.placeholder("uint8", [None, None, 3])

For the last dimension, we'll need only three values. Then we use the TensorFlow operator slice to create a sub-image:

slice = tf.slice(my_image, [10, 0, 0], [16, -1, -1])

The last step is to build a TensorFlow working session:

with tf.Session() as session:
    result = session.run(slice, feed_dict={my_image: input_image})
    print(result.shape)

plt.imshow(result)
plt.show()

The resulting image is then as follows:

The input image after the slice
In this next example, we will perform a geometric transformation of the input image, using the transpose operator:

import tensorflow as tf

We associate the input image with a variable we call x:

x = tf.Variable(input_image, name='x')

We then initialize our model:

model = tf.initialize_all_variables()

Next, we build up the session within which we run our code:

with tf.Session() as session:

To perform the transpose of our matrix, we use the transpose function of TensorFlow. This method performs a swap between axes 0 and 1 of the input matrix, while the z axis is left unchanged:

    x = tf.transpose(x, perm=[1, 0, 2])
    session.run(model)
    result = session.run(x)

plt.imshow(result)
plt.show()

The result is the following:

The transposed image
Complex numbers and fractals

First of all, let's look at how Python handles complex numbers. It is a simple matter. For example, to set x = 5 + 4j in Python, we write the following:

>>> x = 5. + 4j

This means that x is equal to 5 + 4j.

At the same time, you can write the following:

>>> x = complex(5, 4)
>>> x
(5+4j)

We also note the following:

Python uses j to denote √-1, instead of the i used in math. If you put a number before the j, Python will consider it an imaginary number; otherwise, it is a variable. This means that if you want to write the imaginary number i, you must write 1j rather than j.

To get the real and imaginary parts of a Python complex number, you can use the following:

>>> x.real
5.0
>>> x.imag
4.0
>>>
We turn now to our problem, namely how to display fractals with TensorFlow. The Mandelbrot set is one of the most famous fractals. A fractal is a geometric object whose structure is repeated at different scales. Fractals are very common in nature; an example is the coast of Great Britain.

The Mandelbrot set is defined as the set of complex numbers c for which the following succession remains bounded:

Z(n+1) = Z(n)² + c, where Z(0) = 0

The set is named after Benoît Mandelbrot, a Polish-born mathematician famous for his work on fractals. However, he was able to give a shape or graphic representation to the Mandelbrot set only with the help of computer programming. In 1985, he published in Scientific American the first algorithm to compute the Mandelbrot set. The algorithm (for each complex point c) is as follows:

1. Z has an initial value equal to 0: Z(0) = 0.
2. Choose the complex number c as the current point. In the Cartesian plane, the abscissa axis (horizontal line) represents the real part, while the axis of ordinates (vertical line) represents the imaginary part of c.
3. Iterate Z(n+1) = Z(n)² + c, stopping when the magnitude of Z(n) grows larger than a maximum radius.
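The algorithm above can be sketched directly in NumPy before translating it to TensorFlow. This is an illustrative stand-alone version: the grid bounds and escape radius mirror the TensorFlow example that follows, while the (coarse) grid step and the clamping trick are choices made here just to keep the sketch fast and overflow-free:

```python
import numpy as np

# NumPy sketch of the Mandelbrot iteration Z(n+1) = Z(n)**2 + c, Z(0) = 0.
# ns counts, per grid point, how many iterations stayed below the escape radius.
Y, X = np.mgrid[-1.3:1.3:0.05, -2:1:0.05]   # coarse grid for illustration
c = X + 1j * Y
zs = np.zeros_like(c)
ns = np.zeros(c.shape, dtype=np.float32)

for _ in range(200):
    zs = zs * zs + c
    not_diverged = np.abs(zs) < 4            # same stop condition as the book uses
    ns += not_diverged
    zs = np.where(not_diverged, zs, 4)       # clamp diverged points to avoid overflow

# Points with ns == 200 never diverged within 200 steps: they belong to the set.
print(int((ns == 200).sum()) > 0)  # prints True (e.g. c = 0 never diverges)
```

Plotting ns with plt.imshow gives the familiar Mandelbrot picture, exactly as the TensorFlow version below does with ns.eval().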
Now let's see, through simple steps, how we can translate the algorithm mentioned earlier using TensorFlow.

Prepare the data for Mandelbrot's set

Import the necessary libraries for our example:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

We build a complex grid that will contain our Mandelbrot set. The region of the complex plane is between -2 and +1 on the real axis and between -1.3j and +1.3j on the imaginary axis. Each pixel location in the grid will represent a different complex value, c:

Y, X = np.mgrid[-1.3:1.3:0.005, -2:1:0.005]
Z = X + 1j * Y
c = tf.constant(Z.astype(np.complex64))

Then we define the data structures, that is, the TensorFlow tensors, that contain all the data to be included in the calculation. We define two variables. The first is the one on which we will make our iteration. It has the same dimensions as the complex grid, but it is declared as a variable, that is, its values will change in the course of the calculation:

zs = tf.Variable(c)

The next variable is initialized to zero. It also has the same size as the variable zs:

ns = tf.Variable(tf.zeros_like(c, tf.float32))
Build and execute the Data Flow Graph for Mandelbrot's set

Instead of introducing a session, we instantiate an InteractiveSession():

sess = tf.InteractiveSession()

It requires, as we shall see, the Tensor.eval() and Operation.run() methods. Then we initialize all the variables involved through the run() method:

tf.initialize_all_variables().run()

Start the iteration:

zs_ = zs * zs + c

Define the stop condition of the iteration:

not_diverged = tf.complex_abs(zs_) < 4

Then we use the group operator, which groups multiple operations:

step = tf.group(zs.assign(zs_), \
                ns.assign_add(tf.cast(not_diverged, tf.float32)))

The first operation is the iteration step Z(n+1) = Z(n)² + c to create a new value.

The second operation adds 1 to the corresponding element of ns for every point that has not yet diverged. When this op finishes, all ops in its input have finished. This operator has no output.

Then we run the operator for two hundred steps:

for i in range(200): step.run()
Visualize the result for Mandelbrot's set

The result will be in the tensor ns. Using matplotlib, let's visualize it:

plt.imshow(ns.eval())
plt.show()

The Mandelbrot set

Of course, the Mandelbrot set is not the only fractal we can visualize. Julia sets are fractals named after Gaston Maurice Julia for his work in this field. Their building process is very similar to that used for the Mandelbrot set.
Prepare the data for Julia's set

Let's define the output complex plane. It is between -2 and +2 on the real axis and between -2j and +2j on the imaginary axis:

Y, X = np.mgrid[-2:2:0.005, -2:2:0.005]

And the current point location:

Z = X + 1j * Y

The definition of the Julia set requires redefining Z as a constant tensor:

Z = tf.constant(Z.astype("complex64"))

Thus the input tensors supporting our calculation are as follows:

zs = tf.Variable(Z)
ns = tf.Variable(tf.zeros_like(Z, "float32"))

Build and execute the Data Flow Graph for Julia's set

As in the previous example, we create our own interactive session:

sess = tf.InteractiveSession()

We then initialize the input tensors:

tf.initialize_all_variables().run()

To compute the new values of the Julia set, we will use the iterative formula Z(n+1) = Z(n)² - c, where the constant c will be equal to the imaginary number 0.75j:

c = complex(0.0, 0.75)
zs_ = zs * zs - c

The grouping operator and the stop condition of the iteration will be the same as in the Mandelbrot computation:

not_diverged = tf.complex_abs(zs_) < 4
step = tf.group(zs.assign(zs_), \
                ns.assign_add(tf.cast(not_diverged, "float32")))

Finally, we run the operator for two hundred steps:

for i in range(200): step.run()
Visualize the result

To visualize the result, run the following commands:

plt.imshow(ns.eval())
plt.show()

The Julia set
Computing gradients

TensorFlow has functions to solve other, more complex tasks. For example, we will use a mathematical operator that calculates the derivative of y with respect to its parameter x. For this purpose, we use the tf.gradients() function.

Let us consider the math function y = 2x². We want to compute the gradient dy/dx at x = 1. The following is the code to compute this gradient:

1. First, import the TensorFlow library:

import tensorflow as tf

2. The x variable is the independent variable of the function:

x = tf.placeholder(tf.float32)

3. Let's build the function:

y = 2 * x * x

4. Finally, we call the tf.gradients() function with y and x as arguments:

var_grad = tf.gradients(y, x)

5. To evaluate the gradient, we must build a session:

with tf.Session() as session:

6. The gradient will be evaluated at x = 1:

    var_grad_val = session.run(var_grad, feed_dict={x: 1})

7. The var_grad_val value is the feed result, to be printed:

    print(var_grad_val)

8. That gives the following result:

>>>
[4.0]
>>>
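As a quick sanity check of the analytic result, the same derivative can be approximated without TensorFlow using a central finite difference (a stand-alone numerical sketch; the helper names below are inventions for this example, not part of any library):

```python
# Central finite-difference approximation of dy/dx for y = 2x**2.
# The exact derivative is dy/dx = 4x, so at x = 1 we expect 4.
def y(x):
    return 2 * x * x

def numerical_gradient(f, x, h=1e-5):
    # symmetric difference quotient: (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_gradient(y, 1.0))  # approximately 4.0, matching tf.gradients
```

For a quadratic function, the central difference is exact up to floating-point error, which is why it reproduces the [4.0] computed by tf.gradients.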
Random numbers

The generation of random numbers is essential in machine learning and in training algorithms. When random numbers are generated by a computer, they are produced by a pseudo-random number generator (PRNG). The term pseudo comes from the fact that the computer is a deterministic machine running instructions that can only simulate randomness. Despite this logical limitation, computers are very efficient at generating random numbers. TensorFlow provides operators to create random tensors with different distributions.
Uniform distribution

Generally, when we need to work with random numbers, we want values that occur with the same frequency, uniformly distributed. The TensorFlow operator random_uniform provides values between minval and maxval, all with the same probability. Its signature is as follows:

random_uniform(shape, minval, maxval, dtype, seed, name)

We import the TensorFlow library and matplotlib to display the results:

import tensorflow as tf
import matplotlib.pyplot as plt

The uniform variable is a one-dimensional tensor of 100 elements, containing values ranging from 0 to 1, distributed with the same probability:

uniform = tf.random_uniform([100], minval=0, maxval=1, dtype=tf.float32)

Let's define the session:

sess = tf.Session()

In our session, we evaluate the tensor uniform, using the eval() operator:

with tf.Session() as session:
    print uniform.eval()
    plt.hist(uniform.eval(), normed=True)
    plt.show()

As you can see, all intermediate values between 0 and 1 have approximately the same frequency. This behavior is called a uniform distribution. The result of the execution is therefore as follows:

Uniform distribution
Normal distribution

In some specific cases, you may instead need to generate random numbers that cluster around a central value. In this case, we use the normal distribution of random numbers, also called the Gaussian distribution, in which values close to the mean (here, 0) have the highest probability of being drawn, while draws far from the mean, toward the margins of the range, have a very low chance of occurring; the standard deviation controls how spread out the draws are. The following is the implementation with TensorFlow:

import tensorflow as tf
import matplotlib.pyplot as plt

norm = tf.random_normal([100], mean=0, stddev=2)
with tf.Session() as session:
    plt.hist(norm.eval(), normed=True)
    plt.show()

We created a 1d tensor of shape [100] consisting of random normal values, with mean equal to 0 and standard deviation equal to 2, using the operator tf.random_normal. The following is the result:

Normal distribution
Generating random numbers with seeds

We recall that our sequence is pseudo-random, because the values are calculated using a deterministic algorithm and probability plays no real role. The seed is just a starting point for the sequence: if you start from the same seed, you will end up with the same sequence. This is very useful, for example, for debugging your code: when you are searching for an error in a program, you must be able to reproduce the problem, because otherwise every run would be different.

Consider the following example, where we have two uniform distributions:

uniform_with_seed = tf.random_uniform([1], seed=1)
uniform_without_seed = tf.random_uniform([1])

In the first uniform distribution, we started with the seed = 1. This means that, repeatedly evaluating the two distributions, the first uniform distribution will always generate the same sequence of values:
print("FirstRun")
withtf.Session()asfirst_session:
print("uniformwith(seed=1)={}"\
.format(first_session.run(uniform_with_seed)))
print("uniformwith(seed=1)={}"\
.format(first_session.run(uniform_with_seed)))
print("uniformwithoutseed={}"\
.format(first_session.run(uniform_without_seed)))
print("uniformwithoutseed={}"\
.format(first_session.run(uniform_without_seed)))
print("SecondRun")
withtf.Session()assecond_session:
print("uniformwith(seed=1)={}\
.format(second_session.run(uniform_with_seed)))
print("uniformwith(seed=1)={}\
.format(second_session.run(uniform_with_seed)))
print("uniformwithoutseed={}"\
.format(second_session.run(uniform_without_seed)))
print("uniformwithoutseed={}"\
.format(second_session.run(uniform_without_seed)))
As you can see from the end result, the uniform distribution with seed = 1 always gives the same sequence across runs:

>>>
First Run
uniform with (seed = 1) = [ 0.23903739]
uniform with (seed = 1) = [ 0.22267115]
uniform without seed = [ 0.92157185]
uniform without seed = [ 0.43226039]
Second Run
uniform with (seed = 1) = [ 0.23903739]
uniform with (seed = 1) = [ 0.22267115]
uniform without seed = [ 0.50188708]
uniform without seed = [ 0.21324408]
>>>
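The same reproducibility property can be demonstrated with NumPy's PRNG. This is a stand-alone sketch using numpy.random rather than TensorFlow's seeding API, but the principle is identical: same seed, same sequence:

```python
import numpy as np

# Two generators seeded identically produce identical sequences;
# a generator with a different seed produces a different one.
rng_a = np.random.RandomState(seed=1)
rng_b = np.random.RandomState(seed=1)
rng_c = np.random.RandomState(seed=2)

seq_a = rng_a.uniform(size=4)
seq_b = rng_b.uniform(size=4)
seq_c = rng_c.uniform(size=4)

print(np.array_equal(seq_a, seq_b))   # prints True: same seed, same sequence
print(np.array_equal(seq_a, seq_c))   # prints False: different seed
```

This is exactly the behavior exploited above when debugging: fixing the seed makes every run reproducible.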
Monte Carlo's method

We end the section on random numbers with a brief note about the Monte Carlo method. It is a numerical probabilistic method widely used in high-performance scientific computing. In our example, we will calculate the value of π:

import tensorflow as tf
import matplotlib.pyplot as plt

trials = 100
hits = 0

Generate pseudo-random points inside the square [-1, 1] × [-1, 1], using the random_uniform function:

x = tf.random_uniform([1], minval=-1, maxval=1, dtype=tf.float32)
y = tf.random_uniform([1], minval=-1, maxval=1, dtype=tf.float32)
pi = []

Start the session:

sess = tf.Session()

Inside the session, we calculate the value of π: the area of the unit circle is π and that of the square is 4, so the fraction of generated points that fall inside the circle x² + y² < 1 converges (very slowly) to π/4; multiplying by 4 gives the estimate of π:

with sess.as_default():
    for i in range(1, trials):
        for j in range(1, trials):
            if x.eval()**2 + y.eval()**2 < 1:
                hits = hits + 1
        pi.append((4 * float(hits) / i) / trials)

plt.plot(pi)
plt.show()

The figure shows the convergence of the estimate to the π value as the number of trials grows.
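The same estimate can be computed much faster in vectorized NumPy, which also makes the area argument explicit. This is an illustrative stand-alone sketch; the sample count and the fixed seed are choices made here, not part of the book's code:

```python
import numpy as np

# Vectorized Monte Carlo estimate of pi: draw N points uniformly in
# [-1, 1] x [-1, 1]; the fraction landing inside the unit circle tends to pi/4.
rng = np.random.RandomState(seed=0)   # fixed seed for reproducibility
n_samples = 100000
x = rng.uniform(-1, 1, n_samples)
y = rng.uniform(-1, 1, n_samples)

inside = (x**2 + y**2) < 1            # boolean mask of points inside the circle
pi_estimate = 4.0 * inside.mean()
print(pi_estimate)                    # close to 3.14159
```

With 100,000 samples the standard error of the estimate is roughly 0.005, which is why the book's loop-based version, with far fewer samples, converges so slowly.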
Solving partial differential equations

A partial differential equation (PDE) is a differential equation involving partial derivatives of an unknown function of several independent variables. PDEs are commonly used to formulate and solve major physical problems in various fields, from quantum mechanics to financial markets. In this section, we take the example from https://www.TensorFlow.org/versions/r0.8/tutorials/pdes/index.html, showing the use of TensorFlow in a two-dimensional PDE solution that models the surface of a square pond with a few raindrops landing on it. The effect will be to produce two-dimensional waves on the pond itself. We won't concentrate on the computational aspects of the problem, as this is beyond the scope of this book; instead, we will focus on using TensorFlow to define the problem.

The starting point is to import the fundamental libraries:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
Initial condition

First, we have to define the dimensions of the problem. Let's imagine that our pond is a 500×500 square:

N = 500

The following two-dimensional tensor is the pond at time t = 0, that is, the initial condition of our problem:

u_init = np.zeros([N, N], dtype=np.float32)

We have 40 random raindrops on it:

for n in range(40):
    a, b = np.random.randint(0, N, 2)
    u_init[a, b] = np.random.uniform()

np.random.randint(0, N, 2) is a NumPy function that returns two random integers between 0 and N.

Using matplotlib, we can show the initial square pond:

plt.imshow(u_init)
plt.show()

Zooming in on the pond in its initial condition: the colored dots represent the fallen raindrops

Then we define the following tensor:

ut_init = np.zeros([N, N], dtype=np.float32)

It describes the temporal evolution of the pond. At the final time, it will contain the final state of the pond.
Model building

We must define some fundamental parameters (using TensorFlow placeholders), starting with the time step of the simulation:

eps = tf.placeholder(tf.float32, shape=())

We must also define a physical parameter of the model, namely the damping coefficient:

damping = tf.placeholder(tf.float32, shape=())

Then we redefine our starting tensors as TensorFlow variables, since their values will change over the course of the simulation:

U = tf.Variable(u_init)
Ut = tf.Variable(ut_init)

Finally, we build our PDE model. It represents the evolution in time of the pond after the raindrops have fallen:

U_ = U + eps * Ut
Ut_ = Ut + eps * (laplace(U) - damping * Ut)

As you can see, we introduced the laplace(U) function to resolve the PDE (it will be described in the last part of this section).

Using the TensorFlow group operator, we define how our pond should evolve in time:

step = tf.group(
    U.assign(U_),
    Ut.assign(Ut_))

Let's recall that the group operator groups multiple operations as a single one.
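The two update equations can be sketched in plain NumPy to see what one step does: compute a discrete Laplacian of U, then advance U and Ut by the time step eps. This stand-alone illustrative version uses the same 9-point kernel as the book's laplace function, but with periodic boundaries (np.roll) instead of the zero padding that depthwise_conv2d with SAME padding gives, and a small grid and a single raindrop just to keep it quick:

```python
import numpy as np

def laplace_np(u):
    # 9-point discrete Laplacian, matching the kernel used in the book:
    # [[0.5, 1.0, 0.5], [1.0, -6.0, 1.0], [0.5, 1.0, 0.5]]
    # (periodic boundaries via np.roll; the TF version zero-pads instead)
    diag = (np.roll(np.roll(u, 1, 0), 1, 1) + np.roll(np.roll(u, 1, 0), -1, 1)
            + np.roll(np.roll(u, -1, 0), 1, 1) + np.roll(np.roll(u, -1, 0), -1, 1))
    cross = np.roll(u, 1, 0) + np.roll(u, -1, 0) + np.roll(u, 1, 1) + np.roll(u, -1, 1)
    return 0.5 * diag + cross - 6.0 * u

eps, damping = 0.03, 0.04
U = np.zeros((50, 50), dtype=np.float32)
Ut = np.zeros_like(U)
U[25, 25] = 1.0                      # a single raindrop

# Same simultaneous update rule as the TensorFlow graph:
# U_ = U + eps*Ut ; Ut_ = Ut + eps*(laplace(U) - damping*Ut)
for _ in range(100):
    U, Ut = U + eps * Ut, Ut + eps * (laplace_np(U) - damping * Ut)

print(U.shape)  # → (50, 50): the wave has spread outward from the drop
```

Each pass of the loop corresponds to one step.run() call in the TensorFlow version below.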
Graph execution

In our session, we will see the evolution in time of the pond over 1000 steps, where each time step is equal to 0.03 s, while the damping coefficient is set equal to 0.04.

Let's initialize the TensorFlow variables:

tf.initialize_all_variables().run()

Then we run the simulation:

for i in range(1000):
    step.run({eps: 0.03, damping: 0.04})
    if i % 50 == 0:
        clear_output()
        plt.imshow(U.eval())
        plt.show()

The clear_output function comes from IPython.display. Every 50 steps, the simulation result will be displayed as follows:

The pond after 400 simulation steps
Computational function used

Let's now look at the laplace(U) function and the ancillary functions used:

def make_kernel(a):
    a = np.asarray(a)
    a = a.reshape(list(a.shape) + [1, 1])
    return tf.constant(a, dtype=1)

def simple_conv(x, k):
    x = tf.expand_dims(tf.expand_dims(x, 0), -1)
    y = tf.nn.depthwise_conv2d(x, k, [1, 1, 1, 1], padding='SAME')
    return y[0, :, :, 0]

def laplace(x):
    laplace_k = make_kernel([[0.5, 1.0, 0.5],
                             [1.0, -6., 1.0],
                             [0.5, 1.0, 0.5]])
    return simple_conv(x, laplace_k)

These functions describe the physics of the model, that is, how the wave is created and propagates in the pond. I will not go into the details of these functions, the understanding of which is beyond the scope of this book.
The following figure shows the waves on the pond after the raindrops have fallen.

Zooming in on the pond

Summary

In this chapter, we looked at some of the mathematical potential of TensorFlow. Starting from the fundamental definition of a tensor, the basic data structure for any type of computation, we saw through examples how to handle these data structures using TensorFlow's math operators. Using complex numbers, we explored the world of fractals. Then we introduced the concept of random numbers; these are in fact used in machine learning for model development and testing. Finally, the chapter ended with an example of defining and solving a mathematical problem using partial differential equations.

In the next chapter, we'll finally start to see TensorFlow in action in the field for which it was developed: machine learning, solving complex problems such as classification and data clustering.
Chapter 3. Starting with Machine Learning
In this chapter, we will cover the following topics:
Linear regression
The MNIST dataset
Classifiers
The nearest neighbor algorithm
Data clustering
The k-means algorithm
The linear regression algorithm
In this section, we begin our exploration of machine learning techniques with the linear regression algorithm. Our goal is to build a model by which to predict the values of a dependent variable from the values of one or more independent variables.
The relationship between these two variables is linear; that is, if y is the dependent variable and x the independent one, then the linear relationship between the two variables will look like this: y = Ax + b.
The linear regression algorithm adapts to a great variety of situations; for its versatility, it is used extensively in the field of applied sciences, for example, biology and economics.
Furthermore, the implementation of this algorithm allows us to introduce, in a totally clear and understandable way, the two important concepts of machine learning: the cost function and the gradient descent algorithm.
Data model
The first crucial step is to build our data model. We mentioned earlier that the relationship between our variables is linear, that is: y = Ax + b, where A and b are constants. To test our algorithm, we need data points in a two-dimensional space.
We start by importing the Python library NumPy:
import numpy as np
Then we define the number of points we want to draw:
number_of_points = 500
We initialize the following two lists:
x_point = []
y_point = []
These lists will contain the generated points.
We then set the two constants that will appear in the linear relation of y with x:
a = 0.22
b = 0.78
Via NumPy's random.normal function, we generate 500 random points around the regression equation y = 0.22x + 0.78:
for i in range(number_of_points):
    x = np.random.normal(0.0, 0.5)
    y = a*x + b + np.random.normal(0.0, 0.1)
    x_point.append([x])
    y_point.append([y])
Finally, view the generated points with matplotlib:
import matplotlib.pyplot as plt
plt.plot(x_point, y_point, 'o', label='Input Data')
plt.legend()
plt.show()
Linear regression: the data model
Cost functions and gradient descent
The machine learning algorithm that we want to implement with TensorFlow must predict values of y as a function of x data, according to our data model. The linear regression algorithm will determine the values of the constants A and b (fixed for our data model), which are then the true unknowns of the problem.
The first step is to import the tensorflow library:
import tensorflow as tf
Then define the A and b unknowns, using the TensorFlow tf.Variable:
A = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
The unknown factor A was initialized with a random value between -1 and 1, while the variable b is initially set to zero:
b = tf.Variable(tf.zeros([1]))
So we write the linear relationship that binds y to x:
y = A * x_point + b
Now we introduce the cost function: it takes the pair of values A and b to be determined and returns a value that estimates how good those parameters are. In this example, our cost function is the mean squared error:
cost_function = tf.reduce_mean(tf.square(y - y_point))
It provides an estimate of the variability of the measures, or more precisely, of the dispersion of values around the average value; a small value of this function corresponds to a better estimate of the unknown parameters A and b.
To minimize cost_function, we use an optimization algorithm based on gradient descent. Given a mathematical function of several variables, gradient descent allows us to find a local minimum of this function. The technique is as follows:
Evaluate, at an arbitrary first point of the function's domain, the function itself and its gradient. The gradient indicates the direction in which the function tends to a minimum. Select a second point in the direction indicated by the gradient. If the function at this second point has a value lower than the value calculated at the first point, the descent can continue.
You can refer to the following figure for a visual explanation of the algorithm:
The gradient descent algorithm
We also remark that gradient descent finds only a local minimum of the function; however, it can also be used in the search for a global minimum, by randomly choosing a new start point once a local minimum has been found and repeating the process many times. If the number of minima of the function is limited and there is a very high number of attempts, then there is a good chance that sooner or later the global minimum will be identified.
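The procedure just described can be sketched in plain NumPy for our linear data model; the starting point, learning rate, and iteration count below are illustrative choices, not the book's TensorFlow code:

```python
import numpy as np

# Generate data around y = 0.22x + 0.78, as in the data model section
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.5, 500)
y = 0.22 * x + 0.78 + rng.normal(0.0, 0.1, 500)

A, b = 0.0, 0.0          # arbitrary first point in the (A, b) domain
learning_rate = 0.5
for _ in range(100):
    error = A * x + b - y
    # Gradient of the mean squared error with respect to A and b
    grad_A = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move to the next point in the direction opposite to the gradient
    A -= learning_rate * grad_A
    b -= learning_rate * grad_b

print(A, b)  # both approach the true values 0.22 and 0.78
```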
Using TensorFlow, the application of this algorithm is very simple. The instructions are as follows:
optimizer = tf.train.GradientDescentOptimizer(0.5)
Here, 0.5 is the learning rate of the algorithm.
The learning rate determines how fast or slow we move towards the optimal weights. If it is very large, we skip the optimal solution, and if it is too small, we need too many iterations to converge to the best values.
An intermediate value (0.5) is provided, but it must be tuned in order to improve the performance of the entire procedure.
We define train as the result of applying the optimizer to the cost_function, through its minimize function:
train = optimizer.minimize(cost_function)
Testing the model
Now we can test the gradient descent algorithm on the data model created earlier. As usual, we have to initialize all the variables:
model = tf.initialize_all_variables()
So we build our iteration (20 computation steps), allowing us to determine the best values of A and b, which define the line that best fits the data model. Instantiate the evaluation graph:
with tf.Session() as session:
We perform the simulation on our model:
    session.run(model)
    for step in range(0, 21):
For each iteration, we execute the optimization step:
        session.run(train)
Every five steps, we print our pattern of dots:
        if (step % 5) == 0:
            plt.plot(x_point, y_point, 'o',
                     label='step = {}'
                     .format(step))
And the straight lines are obtained with the following command:
            plt.plot(x_point,
                     session.run(A) *
                     x_point +
                     session.run(b))
plt.legend()
plt.show()
The following figure shows the convergence of the implemented algorithm:
Linear regression: start of computation (step = 0)
After just five steps, we can already see (in the next figure) a substantial improvement in the fit of the line:
Linear regression: situation after 5 computation steps
The following (and final) figure shows the definitive result after 20 steps. We can see the efficiency of the algorithm used, with the straight line running perfectly across the cloud of points.
Linear regression: final result
Finally, to further our understanding, we report the complete code:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

number_of_points = 200
x_point = []
y_point = []
a = 0.22
b = 0.78
for i in range(number_of_points):
    x = np.random.normal(0.0, 0.5)
    y = a*x + b + np.random.normal(0.0, 0.1)
    x_point.append([x])
    y_point.append([y])
plt.plot(x_point, y_point, 'o', label='Input Data')
plt.legend()
plt.show()
A = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
B = tf.Variable(tf.zeros([1]))
y = A * x_point + B
cost_function = tf.reduce_mean(tf.square(y - y_point))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(cost_function)
model = tf.initialize_all_variables()
with tf.Session() as session:
    session.run(model)
    for step in range(0, 21):
        session.run(train)
        if (step % 5) == 0:
            plt.plot(x_point, y_point, 'o',
                     label='step = {}'
                     .format(step))
            plt.plot(x_point,
                     session.run(A) *
                     x_point +
                     session.run(B))
plt.legend()
plt.show()
The MNIST dataset
The MNIST dataset (available at http://yann.lecun.com/exdb/mnist/) is widely used for training and testing in the field of machine learning, and we will use it in the examples of this book. It contains black and white images of handwritten digits from 0 to 9.
The dataset is divided into two groups: 60,000 images to train the model and an additional 10,000 to test it. The original black and white images were normalized to fit into a box of size 28×28 pixels and centered by calculating the center of mass of the pixels. The following figure represents how the digits are represented in the MNIST dataset:
MNIST digit sampling
Each MNIST data point is an array of numbers describing how dark each pixel is. For example, for the following digit (the digit 1), we could have:
Pixel representation of the digit 1
Downloading and preparing the data
The following code imports the MNIST data files that we are going to classify. I am using a script from Google that can be downloaded from:
https://github.com/tensorflow/tensorflow/blob/r0.7/tensorflow/examples/tutorials/mnist/input_data.py
It must be run in the same folder where the files are located.
Now we will show how to load and display the data:
import input_data
import numpy as np
import matplotlib.pyplot as plt
Using input_data, we load the datasets:
mnist_images = input_data.read_data_sets\
               ("MNIST_data/",\
                one_hot=False)
train.next_batch(10) returns the first 10 images:
pixels, real_values = mnist_images.train.next_batch(10)
This also returns two lists: the matrix of the pixels loaded and the list that contains the real values loaded:
print "list of values loaded ", real_values
example_to_visualize = 5
print "element N° " + str(example_to_visualize + 1)\
      + " of the list plotted"
>>
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
list of values loaded [7 3 4 6 1 8 1 0 9 8]
element N° 6 of the list plotted
>>
To display an element, we can use matplotlib, as follows:
image = pixels[example_to_visualize,:]
image = np.reshape(image, [28, 28])
plt.imshow(image)
plt.show()
Here is the result:
An MNIST image of the number eight
Classifiers
In the context of machine learning, the term classification identifies an algorithmic procedure that assigns each new input datum (instance) to one of the possible categories (classes). If we consider only two classes, we talk about binary classification; otherwise we have a multi-class classification.
Classification falls into the supervised learning category, which permits us to classify new instances based on the so-called training set. The basic steps to follow to resolve a supervised classification problem are as follows:
1. Build the training examples in order to represent the actual context and application on which to accomplish the classification.
2. Choose the classifier and the corresponding algorithm implementation.
3. Train the algorithm on the training set and set any control parameters through validation.
4. Evaluate the accuracy and performance of the classifier by applying a set of new instances (the test set).
The nearest neighbor algorithm
The K-nearest neighbor (KNN) algorithm is a supervised learning method for both classification and regression. It is a system that assigns the class of the tested sample according to its distance from the objects stored in memory.
The distance, d, is defined as the Euclidean distance between two points:

d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2)

Here n is the dimension of the space. The advantage of this method of classification is the ability to classify objects whose classes are not linearly separable. It is a stable classifier, given that small perturbations of the training data do not significantly affect the results obtained. The most obvious disadvantage, however, is that it does not provide a true mathematical model; instead, every new classification is carried out by adding the new datum to all the initial instances and repeating the calculation procedure for the selected K value.
Moreover, it requires a fairly high amount of data to make realistic predictions and is sensitive to the noise of the analyzed data.
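Before moving to MNIST, the nearest neighbor idea can be shown with a toy plain-NumPy sketch; the stored points, labels, and test point below are invented for illustration:

```python
import numpy as np

# Objects stored in memory, with their class labels
train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = np.array([0, 0, 1])
test = np.array([4.5, 4.8])

# Euclidean distance d from the test sample to every stored object
distances = np.sqrt(np.sum((train - test) ** 2, axis=1))

# The class assigned is that of the closest stored object
nearest = np.argmin(distances)
print(labels[nearest])  # 1: the test point is closest to [5.0, 5.0]
```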
In the next example, we will implement the KNN algorithm using the MNIST dataset.
Building the training set
Let's start with the import of the libraries needed for the simulation:
import numpy as np
import tensorflow as tf
import input_data
To construct the data model for the training set, use the input_data.read_data_sets function, introduced earlier:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
In our example, the training phase will consist of 100 MNIST images:
train_pixels, train_list_values = mnist.train.next_batch(100)
While we test our algorithm on 10 images:
test_pixels, test_list_of_values = mnist.test.next_batch(10)
Finally, we define the tensors train_pixel_tensor and test_pixel_tensor that we use to construct our classifier:
train_pixel_tensor = tf.placeholder\
                     ("float", [None, 784])
test_pixel_tensor = tf.placeholder\
                    ("float", [784])
Cost function and optimization
The cost function is represented by the distance in terms of pixels:
distance = tf.reduce_sum\
           (tf.abs\
            (tf.add(train_pixel_tensor,\
                    tf.neg(test_pixel_tensor))),\
            reduction_indices=1)
The tf.reduce_sum function computes the sum of elements across the dimensions of a tensor. For example (from the TensorFlow online manual):
# 'x' is [[1, 1, 1]
#         [1, 1, 1]]
tf.reduce_sum(x) ==> 6
tf.reduce_sum(x, 0) ==> [2, 2, 2]
tf.reduce_sum(x, 1) ==> [3, 3]
tf.reduce_sum(x, 1, keep_dims=True) ==> [[3], [3]]
tf.reduce_sum(x, [0, 1]) ==> 6
Finally, to minimize the distance function, we use arg_min, which returns the index with the smallest distance (the nearest neighbor):
pred = tf.arg_min(distance, 0)
Testing and algorithm evaluation
Accuracy is a parameter that helps us to compute the final result of the classifier:
accuracy = 0
Initialize the variables:
init = tf.initialize_all_variables()
Start the simulation:
with tf.Session() as sess:
    sess.run(init)
    for i in range(len(test_list_of_values)):
Then we evaluate the nearest neighbor index, using the pred function defined earlier:
        nn_index = sess.run(pred,\
                            feed_dict={train_pixel_tensor: train_pixels,\
                                       test_pixel_tensor: test_pixels[i, :]})
Finally, we find the nearest neighbor class label and compare it to its true label:
        print "Test N° ", i, "Predicted Class: ",\
              np.argmax(train_list_values[nn_index]),\
              "True Class: ", np.argmax(test_list_of_values[i])
        if np.argmax(train_list_values[nn_index])\
           == np.argmax(test_list_of_values[i]):
Then we evaluate and report the accuracy of the classifier:
            accuracy += 1./len(test_pixels)
    print "Result = ", accuracy
As we can see, almost every element of the test set is correctly classified. The result of the simulation shows the predicted class next to the real class, and finally the total accuracy of the simulation is reported:
>>>
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Test N° 0 Predicted Class: 7 True Class: 7
Test N° 1 Predicted Class: 2 True Class: 2
Test N° 2 Predicted Class: 1 True Class: 1
Test N° 3 Predicted Class: 0 True Class: 0
Test N° 4 Predicted Class: 4 True Class: 4
Test N° 5 Predicted Class: 1 True Class: 1
Test N° 6 Predicted Class: 4 True Class: 4
Test N° 7 Predicted Class: 9 True Class: 9
Test N° 8 Predicted Class: 6 True Class: 5
Test N° 9 Predicted Class: 9 True Class: 9
Result = 0.9
>>>
The result is not 100% accurate; the reason lies in a wrong evaluation of test no. 8: instead of 5, the classifier predicted 6.
Finally, we report the complete code for the KNN classification:
import numpy as np
import tensorflow as tf
import input_data
# Build the Training Set
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
train_pixels, train_list_values = mnist.train.next_batch(100)
test_pixels, test_list_of_values = mnist.test.next_batch(10)
train_pixel_tensor = tf.placeholder\
                     ("float", [None, 784])
test_pixel_tensor = tf.placeholder\
                    ("float", [784])
# Cost Function and distance optimization
distance = tf.reduce_sum\
           (tf.abs\
            (tf.add(train_pixel_tensor,\
                    tf.neg(test_pixel_tensor))),\
            reduction_indices=1)
pred = tf.arg_min(distance, 0)
# Testing and algorithm evaluation
accuracy = 0.
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    for i in range(len(test_list_of_values)):
        nn_index = sess.run(pred,\
                            feed_dict={train_pixel_tensor: train_pixels,\
                                       test_pixel_tensor: test_pixels[i, :]})
        print "Test N° ", i, "Predicted Class: ",\
              np.argmax(train_list_values[nn_index]),\
              "True Class: ", np.argmax(test_list_of_values[i])
        if np.argmax(train_list_values[nn_index])\
           == np.argmax(test_list_of_values[i]):
            accuracy += 1./len(test_pixels)
    print "Result = ", accuracy
Data clustering
A clustering problem consists in the selection and grouping of homogeneous items from a set of initial data. To solve this problem, we must:
Identify a resemblance measure between elements
Find out if there are subsets of elements that are similar according to the chosen measure
The algorithm determines which elements form a cluster and what degree of similarity unites them within the cluster.
Clustering algorithms fall into the unsupervised methods, because we do not assume any prior information on the structures and characteristics of the clusters.
The k-means algorithm
One of the most common and simple clustering algorithms is k-means, which subdivides groups of objects into k partitions on the basis of their attributes. Each cluster is identified by a point called the centroid, the average of its points.
The algorithm follows an iterative procedure:
1. Randomly select K points as the initial centroids.
2. Repeat:
3. Form K clusters by assigning all points to the closest centroid.
4. Recompute the centroid of each cluster.
5. Until the centroids don't change.
The popularity of k-means comes from its convergence speed and its ease of implementation. In terms of the quality of the solutions, the algorithm does not guarantee reaching the global optimum; the quality of the final solution depends largely on the initial set of clusters and may, in practice, be much worse than the global optimum. Since the algorithm is extremely fast, you can apply it several times and choose the most satisfying among the solutions produced. Another disadvantage of the algorithm is that it requires you to choose the number of clusters (k) to find.
If the data is not naturally partitioned, you will end up getting strange results. Furthermore, the algorithm works well only when identifiable spherical clusters are present in the data.
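The iterative procedure listed above can be sketched in a few lines of plain NumPy before we turn to TensorFlow; the data, the value of k, and the iteration cap are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(size=(200, 2))   # toy two-dimensional data
k = 2

# Step 1: randomly select k points as the initial centroids
centroids = points[rng.choice(len(points), k, replace=False)]

for _ in range(10):
    # Step 3: assign every point to its closest centroid
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :],
                               axis=2)
    assignments = np.argmin(distances, axis=1)
    # Step 4: recompute each centroid as the mean of its cluster
    # (keeping the old centroid if a cluster happens to be empty)
    new_centroids = np.array([points[assignments == j].mean(axis=0)
                              if np.any(assignments == j) else centroids[j]
                              for j in range(k)])
    # Step 5: stop when the centroids no longer change
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
```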
Let us now see how to implement k-means using the TensorFlow library.
Building the training set
Import all the libraries necessary for our simulation:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import pandas as pd
Note
Pandas is an open source library providing easy-to-use data structures and data analysis tools for the Python programming language. To install it, type the following command:
sudo pip install pandas
We must define the parameters of our problem. The total number of points that we want to cluster is 1000:
num_vectors = 1000
The number of partitions we want to obtain:
num_clusters = 4
We set the number of computational steps of the k-means algorithm:
num_steps = 100
We initialize the input data structures:
x_values = []
y_values = []
vector_values = []
The training set is a random set of points, which is why we use the random.normal NumPy function, allowing us to build the x_values and y_values vectors:
for i in xrange(num_vectors):
    if np.random.random() > 0.5:
        x_values.append(np.random.normal(0.4, 0.7))
        y_values.append(np.random.normal(0.2, 0.8))
    else:
        x_values.append(np.random.normal(0.6, 0.4))
        y_values.append(np.random.normal(0.8, 0.5))
We use the Python zip function to obtain the complete list of vector_values:
vector_values = zip(x_values, y_values)
Then vector_values is converted into a constant, usable by TensorFlow:
vectors = tf.constant(vector_values)
We can see our training set for the clustering algorithm with the following commands:
plt.plot(x_values, y_values, 'o', label='Input Data')
plt.legend()
plt.show()
The training set for k-means
After randomly building the training set, we have to generate the (k = 4) initial centroids, determining their indices using tf.random_shuffle:
n_samples = tf.shape(vector_values)[0]
random_indices = tf.random_shuffle(tf.range(0, n_samples))
By adopting this procedure, we are able to determine four random indices:
begin = [0,]
size = [num_clusters,]
size[0] = num_clusters
These are the indices of our initial centroids:
centroid_indices = tf.slice(random_indices, begin, size)
centroids = tf.Variable(tf.gather\
                        (vector_values, centroid_indices))
Cost functions and optimization
The cost function we want to minimize for this problem is again the Euclidean distance between two points:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2)
In order to manage the previously defined tensors, vectors and centroids, we use the TensorFlow function expand_dims, which automatically expands the size of the two arguments:
expanded_vectors = tf.expand_dims(vectors, 0)
expanded_centroids = tf.expand_dims(centroids, 1)
This function allows you to standardize the shapes of the two tensors, in order to evaluate their difference with the tf.sub method:
vectors_subtration = tf.sub(expanded_vectors, expanded_centroids)
Finally, we build the euclidean_distances cost function, using the tf.reduce_sum function, which computes the sum of elements across the dimensions of a tensor, while the tf.square function computes the square of the vectors_subtration tensor element-wise:
euclidean_distances = tf.reduce_sum(tf.square\
                                    (vectors_subtration), 2)
assignments = tf.to_int32(tf.argmin(euclidean_distances, 0))
Here assignments is the index with the smallest distance across the euclidean_distances tensor. Let us now turn to the optimization phase, the purpose of which is to improve the choice of centroids, on which the construction of the clusters depends. We partition the vectors (our training set) into num_clusters tensors, using the indices from assignments.
The following code takes the nearest indices for each sample, and grabs them out as separate groups, using tf.dynamic_partition:
partitions = tf.dynamic_partition\
             (vectors, assignments, num_clusters)
Finally, we update the centroids, using tf.reduce_mean on each single group to find its average, forming the new centroid:
update_centroids = tf.concat(0,\
                             [tf.expand_dims\
                              (tf.reduce_mean(partition, 0), 0)\
                              for partition in partitions])
To form the update_centroids tensor, we use tf.concat to concatenate the single centroids into one tensor.
Testing and algorithm evaluation
It's time to test and evaluate the algorithm. The first procedure is to initialize all the variables and instantiate the evaluation graph:
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
Now we start the computation:
for step in xrange(num_steps):
    _, centroid_values, assignment_values =\
       sess.run([update_centroids,\
                 centroids,\
                 assignments])
To display the result, we implement the following function:
display_partition(x_values, y_values, assignment_values)
This takes the x_values and y_values vectors of the training set, together with the assignment_values vector, and draws the clusters.
The code for this visualization function is as follows:
def display_partition(x_values, y_values, assignment_values):
    labels = []
    colors = ["red", "blue", "green", "yellow"]
    for i in xrange(len(assignment_values)):
        labels.append(colors[(assignment_values[i])])
    color = labels
    df = pd.DataFrame\
         (dict(x=x_values, y=y_values, color=labels))
    fig, ax = plt.subplots()
    ax.scatter(df['x'], df['y'], c=df['color'])
    plt.show()
It associates a color with each cluster by means of the following data structure:
colors = ["red", "blue", "green", "yellow"]
It then draws them through the scatter function of matplotlib:
ax.scatter(df['x'], df['y'], c=df['color'])
Let's display the result:
Final result of the k-means algorithm
Here is the complete code of the k-means algorithm:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

def display_partition(x_values, y_values, assignment_values):
    labels = []
    colors = ["red", "blue", "green", "yellow"]
    for i in xrange(len(assignment_values)):
        labels.append(colors[(assignment_values[i])])
    color = labels
    df = pd.DataFrame\
         (dict(x=x_values, y=y_values, color=labels))
    fig, ax = plt.subplots()
    ax.scatter(df['x'], df['y'], c=df['color'])
    plt.show()

num_vectors = 2000
num_clusters = 4
n_samples_per_cluster = 500
num_steps = 1000
x_values = []
y_values = []
vector_values = []
# CREATE RANDOM DATA
for i in xrange(num_vectors):
    if np.random.random() > 0.5:
        x_values.append(np.random.normal(0.4, 0.7))
        y_values.append(np.random.normal(0.2, 0.8))
    else:
        x_values.append(np.random.normal(0.6, 0.4))
        y_values.append(np.random.normal(0.8, 0.5))
vector_values = zip(x_values, y_values)
vectors = tf.constant(vector_values)
n_samples = tf.shape(vector_values)[0]
random_indices = tf.random_shuffle(tf.range(0, n_samples))
begin = [0,]
size = [num_clusters,]
size[0] = num_clusters
centroid_indices = tf.slice(random_indices, begin, size)
centroids = tf.Variable(tf.gather(vector_values, centroid_indices))
expanded_vectors = tf.expand_dims(vectors, 0)
expanded_centroids = tf.expand_dims(centroids, 1)
vectors_subtration = tf.sub(expanded_vectors, expanded_centroids)
euclidean_distances = \
    tf.reduce_sum(tf.square(vectors_subtration), 2)
assignments = tf.to_int32(tf.argmin(euclidean_distances, 0))
# tf.dynamic_partition example (from the TensorFlow manual):
# partitions = [0, 0, 1, 1, 0]
# num_partitions = 2
# data = [10, 20, 30, 40, 50]
# outputs[0] = [10, 20, 50]
# outputs[1] = [30, 40]
partitions = tf.dynamic_partition(vectors, assignments, num_clusters)
update_centroids = tf.concat(0, [tf.expand_dims
                                 (tf.reduce_mean(partition, 0), 0)\
                                 for partition in partitions])
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
for step in xrange(num_steps):
    _, centroid_values, assignment_values =\
       sess.run([update_centroids,\
                 centroids,\
                 assignments])
display_partition(x_values, y_values, assignment_values)
plt.plot(x_values, y_values, 'o', label='Input Data')
plt.legend()
plt.show()
Summary
In this chapter, we began to explore the potential of TensorFlow for some typical problems in machine learning. With the linear regression algorithm, the important concepts of cost function and optimization using gradient descent were explained. We then described the MNIST dataset of handwritten digits. We also implemented a multi-class classifier using the nearest neighbor algorithm, which falls into the supervised learning category. The chapter then concluded with an example of unsupervised learning, implementing the k-means algorithm to solve a data clustering problem.
In the next chapter, we will introduce neural networks. These are mathematical models that represent the interconnection between elements defined as artificial neurons, namely mathematical constructs that mimic the properties of living neurons.
We'll also implement some neural network learning models using TensorFlow.
Chapter 4. Introducing Neural Networks
In this chapter, we will cover the following topics:
What are neural networks?
Single Layer Perceptron
Logistic regression
Multi Layer Perceptron
Multi Layer Perceptron classification
Multi Layer Perceptron function approximation
What are artificial neural networks?
An artificial neural network (ANN) is an information processing system whose operating mechanism is inspired by biological neural circuits. Thanks to their characteristics, neural networks are the protagonists of a real revolution in machine learning systems and, more specifically, in the context of artificial intelligence. An ANN possesses many simple processing units, variously connected to each other according to various architectures. If we look at the schema of an ANN reported later, it can be seen that the hidden units communicate with the external layer, both in input and output, while the input and output units communicate only with the hidden layer of the network.
Each unit or node simulates the role of the neuron in biological neural networks. Each node, called an artificial neuron, has a very simple operation: it becomes active if the total quantity of signal that it receives exceeds its activation threshold, defined by the so-called activation function. If a node becomes active, it emits a signal that is transmitted along the transmission channels up to the other units to which it is connected. Each connection point acts as a filter that converts the message into an inhibitory or excitatory signal, increasing or decreasing its intensity according to its individual characteristics. The connection points simulate biological synapses and have the fundamental function of weighing the intensity of the transmitted signals, multiplying them by the weights whose values depend on the connection itself.
ANN schematic diagram
Neural network architectures
The way the nodes are connected, the total number of layers (that is, the levels of nodes between input and output), and the number of neurons per layer all define the architecture of a neural network. For example, in multilayer networks (we introduce these in the second part of this chapter), one can identify the artificial neurons of the layers such that:
Each neuron is connected with all those of the next layer
There are no connections between neurons belonging to the same layer
The number of layers and of neurons per layer depends on the problem to be solved
Now we start our exploration of neural network models, introducing the most simple neural network model: the Single Layer Perceptron, or the so-called Rosenblatt's Perceptron.
Single Layer Perceptron
The Single Layer Perceptron was the first neural network model, proposed in 1958 by Frank Rosenblatt. In this model, the content of the local memory of the neuron consists of a vector of weights, W = (w1, w2, ..., wn). The computation is performed as a sum over the input vector X = (x1, x2, ..., xn), each element of which is multiplied by the corresponding element of the vector of weights; then the value provided in output (that is, the weighted sum) will be the input of an activation function. This function returns 1 if the result is greater than a certain threshold; otherwise it returns -1. In the following figure, the activation function is the so-called sign function:
sign(x) = +1 if x > 0; −1 otherwise
It is possible to use other activation functions, preferably non-linear (such as the sigmoid function, which we will see in the next section). The learning procedure of the net is iterative: for each learning cycle (called an epoch), it slightly modifies the synaptic weights using a selected set called the training set. At each cycle, the weights must be modified to minimize a cost function, which is specific to the problem under consideration. Finally, when the perceptron has been trained on the training set, it will be tested on other inputs (the test set) in order to verify its capacity for generalization.
Schema of Rosenblatt's Perceptron
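The forward step of the perceptron, the weighted sum followed by the sign activation, can be sketched as follows; the weight and input vectors are made up, and the learning rule itself is not shown:

```python
import numpy as np

def sign(x):
    # The sign activation function described above
    return 1 if x > 0 else -1

W = np.array([0.5, -0.3, 0.8])   # local memory: the weight vector
X = np.array([1.0, 2.0, 0.5])    # input vector

# Weighted sum of the inputs, then thresholding
output = sign(np.dot(W, X))
print(output)  # 1, since 0.5*1.0 - 0.3*2.0 + 0.8*0.5 = 0.3 > 0
```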
Let us now see how to implement a single layer neural network for an image classification problem using TensorFlow.
The logistic regression
This algorithm has nothing to do with the canonical linear regression we saw in Chapter 3, Starting with Machine Learning; rather, it is an algorithm that allows us to solve supervised classification problems. In fact, to estimate the dependent variable, we now make use of the so-called logistic function, or sigmoid. It is precisely because of this feature that we call this algorithm logistic regression. The sigmoid function has the following pattern:
Sigmoid function
As we can see, the dependent variable takes values strictly between 0 and 1, which is precisely what serves us. In the case of logistic regression, we want our function to tell us the probability of an element belonging to a particular class. We recall again that supervised learning by the neural network is configured as an iterative process of optimization of the weights; these are then modified on the basis of the network's performance on the training set. Indeed, the aim is to minimize the loss function, which indicates the degree to which the behavior of the network deviates from the desired one. The performance of the network is then verified on a test set, consisting of images other than the trained ones.
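The sigmoid shown in the figure above can be written in a couple of lines; the sample inputs are arbitrary:

```python
import numpy as np

def sigmoid(x):
    # The logistic function: its output lies strictly between 0 and 1,
    # so it can be read as a probability of class membership
    return 1.0 / (1.0 + np.exp(-x))

for value in (-5.0, 0.0, 5.0):
    print(value, sigmoid(value))
```

sigmoid(0) is exactly 0.5, and large positive or negative inputs saturate towards 1 or 0.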
The basic steps of the training that we're going to implement are as follows:
The weights are initialized with random values at the beginning of the training.
For each element of the training set, the error is calculated, that is, the difference between the desired output and the actual output. This error is used to adjust the weights.
The process is repeated, resubmitting to the network, in a random order, all the examples of the training set, until the error made on the entire training set falls below a certain threshold, or until the maximum number of iterations is reached.
Let us now see in detail how to implement logistic regression with TensorFlow. The problem we want to solve is to classify images from the MNIST dataset, which, as explained in Chapter 3, Starting with Machine Learning, is a database of handwritten digits.
TensorFlow implementation
The TensorFlow implementation requires the following steps:
1. First of all, we have to import all the necessary libraries:
import input_data
import tensorflow as tf
import matplotlib.pyplot as plt
2. We use the input_data.read function, introduced in Chapter 3, Starting with Machine Learning, in the MNIST dataset section, to upload the images for our problem:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
3. Then we set the total number of epochs for the training phase:
training_epochs = 25
4. We must also define other parameters that are necessary to build the model:
learning_rate = 0.01
batch_size = 100
display_step = 1
5. Now we move on to the construction of the model.
Building the model
Define x as the input tensor; it represents an MNIST data image of 28×28 = 784 pixels:
x = tf.placeholder("float", [None, 784])
We recall that our problem consists of assigning a probability value to each of the possible classes of membership (the digits from 0 to 9). At the end of this calculation, we will use a probability distribution, which tells us how confident we are of our prediction.
So the output we're going to get will be an output tensor with 10 probabilities, each one corresponding to a digit (of course, the sum of the probabilities must be one):
y = tf.placeholder("float", [None, 10])
To assign probabilities to each image, we will use the so-called softmax activation function.
The softmax function is specified in two main steps:
Calculate the evidence that a certain image belongs to a particular class
Convert the evidence into probabilities of belonging to each of the 10 possible classes
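The two steps can be sketched numerically with NumPy; the evidence scores below are invented for a single hypothetical image:

```python
import numpy as np

# Step 1 (result): hypothetical evidence scores for the 10 digit classes
evidence = np.array([1.2, 0.3, -0.5, 2.0, 0.1, -1.0, 0.4, 0.0, 1.5, -0.2])

# Step 2: convert the evidence into probabilities (the softmax function)
exp_evidence = np.exp(evidence)
probabilities = exp_evidence / exp_evidence.sum()

print(probabilities.sum())     # sums to one (up to rounding)
print(probabilities.argmax())  # 3: the class with the largest evidence
```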
To evaluate the evidence, we first define the weights input tensor as W:
W = tf.Variable(tf.zeros([784, 10]))
For a given image, we can evaluate the evidence for each class i by simply multiplying the tensor W by the input tensor x. Using TensorFlow, we should have something like the following:
evidence = tf.matmul(x, W)
In general, models include an extra parameter representing the bias, which indicates a certain degree of uncertainty. In our case, the final formula for the evidence is as follows:
evidence = tf.matmul(x, W) + b
This means that for every i (from 0 to 9) we have a matrix Wi of 784 elements (28×28), where each element j of the matrix is multiplied by the corresponding component j of the input image (of 784 components), and then the corresponding bias element bi is added.
So to define the evidence, we must define the following tensor of biases:
b = tf.Variable(tf.zeros([10]))
The second step is to finally use the softmax function to obtain the output vector of probabilities, namely activation:
activation = tf.nn.softmax(tf.matmul(x, W) + b)
TensorFlow's tf.nn.softmax function provides a probability-based output from the input evidence tensor. Once we have implemented the model, we can specify the necessary code to find the weights W and biases b of the network through the iterative training algorithm. In each iteration, the training algorithm takes the training data, applies the neural network, and compares the result with the expected one.
Note
TensorFlow provides many other activation functions. See https://www.tensorflow.org/versions/r0.8/api_docs/index.html for better references.
Inordertotrainourmodelandknowwhenwehaveagoodone,wemustdefinehowtodefinetheaccuracyofourmodel.OurgoalistotrytogetvaluesofparametersWandbthatminimizethevalueofthemetricthatindicateshowbadthemodelis.
Differentmetricscalculateddegreeoferrorbetweenthedesiredoutputandthetrainingdataoutputs.AcommonmeasureoferroristhemeansquarederrorortheSquaredEuclideanDistance.However,therearesomeresearchfindingsthatsuggesttouseothermetricstoaneuralnetworklikethis.
In this example, we use the so-called cross-entropy error function. It is defined as follows:
cross_entropy = y * tf.log(activation)
In order to minimize cross_entropy, we can use the following combination of tf.reduce_mean and tf.reduce_sum to build the cost function:
cost = tf.reduce_mean\
       (-tf.reduce_sum\
        (cross_entropy, reduction_indices=1))
Then we must minimize it using the gradient descent optimization algorithm:
optimizer = tf.train.GradientDescentOptimizer\
            (learning_rate).minimize(cost)
Just a few lines of code to build a neural net model!
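The cost built by the reduce_sum/reduce_mean combination above is easy to verify by hand: for each example, the cross-entropy is -Σj yj log(activationj) over the classes, and the cost is the mean over the batch. A plain-Python sketch with toy labels and probabilities (illustrative values, not the book's MNIST data):

```python
import math

def cross_entropy_cost(labels, activations):
    # labels: one-hot vectors; activations: predicted probability vectors
    per_example = []
    for y_vec, a_vec in zip(labels, activations):
        # -sum(y * log(a)) over the classes (reduction_indices=1)
        per_example.append(-sum(y * math.log(a) for y, a in zip(y_vec, a_vec)))
    # tf.reduce_mean: average over the batch
    return sum(per_example) / len(per_example)

labels = [[0, 1], [1, 0]]
activations = [[0.2, 0.8], [0.9, 0.1]]
cost = cross_entropy_cost(labels, activations)
print(round(cost, 4))  # -> 0.1643
```

Because the labels are one-hot, only the log probability assigned to the true class contributes, so confident correct predictions drive the cost toward zero.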
Launch the session
It's time to build the session and launch our neural net model.
We fix the following lists to visualize the training session:
avg_set = []
epoch_set = []
Then we initialize the TensorFlow variables:
init = tf.initialize_all_variables()
Start the session:
with tf.Session() as sess:
    sess.run(init)
As explained, each epoch is a training cycle:
for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(mnist.train.num_examples/batch_size)
Then we loop over all the batches:
for i in range(total_batch):
    batch_xs, batch_ys = \
              mnist.train.next_batch(batch_size)
Fit the training using the batch data:
sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
Compute the average loss by running the cost function with the given image values (x) and the real output (y):
avg_cost += sess.run\
            (cost, feed_dict={x: batch_xs,\
                              y: batch_ys})/total_batch
During computation, we display a log per epoch step:
if epoch % display_step == 0:
    print "Epoch:",\
          '%04d' % (epoch+1),\
          "cost=", "{:.9f}".format(avg_cost)
print "Training phase finished"
Let's get the accuracy of our model. It is correct if the index with the highest y value is the same as in the real digit vector; the mean of correct_prediction gives us the accuracy. We need to run the accuracy function with our test set (mnist.test).
We use the keys images and labels for x and y:
correct_prediction = tf.equal\
                     (tf.argmax(activation, 1),\
                      tf.argmax(y, 1))
accuracy = tf.reduce_mean\
           (tf.cast(correct_prediction, "float"))
print "MODEL accuracy:", accuracy.eval({x: mnist.test.images,\
                                        y: mnist.test.labels})
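The argmax-equality test above can be reproduced numerically: count how often the index of the largest predicted probability matches the index of the 1 in the one-hot label, cast to float, and average. A plain-Python sketch with illustrative values, mirroring what tf.equal, tf.cast, and tf.reduce_mean compute:

```python
def argmax(v):
    # Index of the largest value, like tf.argmax along one row
    return v.index(max(v))

predictions = [[0.1, 0.7, 0.2],   # predicted class 1
               [0.8, 0.1, 0.1],   # predicted class 0
               [0.3, 0.3, 0.4],   # predicted class 2
               [0.6, 0.2, 0.2]]   # predicted class 0
labels = [[0, 1, 0],              # true class 1 -> correct
          [0, 0, 1],              # true class 2 -> wrong
          [0, 0, 1],              # true class 2 -> correct
          [1, 0, 0]]              # true class 0 -> correct

correct = [argmax(p) == argmax(y) for p, y in zip(predictions, labels)]
# Cast the booleans to float and average them
accuracy = sum(correct) / float(len(correct))
print(accuracy)  # -> 0.75
```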
Test evaluation
We previously showed the training phase, and for each epoch we have printed the relative cost function:
Python 2.7.10 (default, Oct 14 2015, 16:09:02) [GCC 5.2.1 20151010]
on linux2 Type "copyright", "credits" or "license()" for more
information. >>> ======================= RESTART
============================
>>>
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 1.174406662
Epoch: 0002 cost= 0.661956009
Epoch: 0003 cost= 0.550468774
Epoch: 0004 cost= 0.496588717
Epoch: 0005 cost= 0.463674555
Epoch: 0006 cost= 0.440907706
Epoch: 0007 cost= 0.423837747
Epoch: 0008 cost= 0.410590841
Epoch: 0009 cost= 0.399881751
Epoch: 0010 cost= 0.390916621
Epoch: 0011 cost= 0.383320325
Epoch: 0012 cost= 0.376767031
Epoch: 0013 cost= 0.371007620
Epoch: 0014 cost= 0.365922904
Epoch: 0015 cost= 0.361327561
Epoch: 0016 cost= 0.357258660
Epoch: 0017 cost= 0.353508228
Epoch: 0018 cost= 0.350164634
Epoch: 0019 cost= 0.347015593
Epoch: 0020 cost= 0.344140861
Epoch: 0021 cost= 0.341420144
Epoch: 0022 cost= 0.338980592
Epoch: 0023 cost= 0.336655581
Epoch: 0024 cost= 0.334488012
Epoch: 0025 cost= 0.332488823
Training phase finished
As you can see, during the training phase the cost function is minimized. At the end of the test, we show how accurate the implemented model is:
Model Accuracy: 0.9475
>>>
Finally, using the following lines of code, we can visualize the training phase of the net:
plt.plot(epoch_set, avg_set, 'o',\
         label='Logistic Regression Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()
Training phase in logistic regression
Source code
# Import MNIST data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
import matplotlib.pyplot as plt
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1
# tf Graph Input
x = tf.placeholder("float", [None, 784])
y = tf.placeholder("float", [None, 10])
# Create model
# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
# Construct model
activation = tf.nn.softmax(tf.matmul(x, W) + b)
# Minimize error using cross entropy
cross_entropy = y * tf.log(activation)
cost = tf.reduce_mean\
       (-tf.reduce_sum\
        (cross_entropy, reduction_indices=1))
optimizer = tf.train.\
            GradientDescentOptimizer(learning_rate).minimize(cost)
# Plot settings
avg_set = []
epoch_set = []
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = \
                      mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer,\
                     feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict=\
                                 {x: batch_xs,\
                                  y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1),\
                  "cost=", "{:.9f}".format(avg_cost)
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print "Training phase finished"
    plt.plot(epoch_set, avg_set, 'o',\
             label='Logistic Regression Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal\
                         (tf.argmax(activation, 1),\
                          tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Model accuracy:", accuracy.eval({x: mnist.test.images,\
                                            y: mnist.test.labels})
Multi Layer Perceptron
A more complex and efficient architecture is that of the Multi Layer Perceptron (MLP). It is substantially formed from multiple layers of perceptrons, and therefore by the presence of at least one hidden layer, that is, one connected neither to the inputs nor to the outputs of the network:
The MLP architecture
A network of this type is typically trained using supervised learning, according to the principles outlined in the previous paragraph. In particular, a typical learning algorithm for MLP networks is the so-called backpropagation algorithm.
Note
The backpropagation algorithm is a learning algorithm for neural networks. It compares the output value of the system with the desired value. On the basis of the difference thus calculated (namely, the error), the algorithm modifies the synaptic weights of the neural network, progressively converging the set of output values toward the desired ones.
It is important to note that in MLP networks, although you don't know the desired outputs of the neurons of the hidden layers of the network, it is always possible to apply a supervised learning method based on the minimization of an error function via the application of gradient-descent techniques.
In the following example, we show the implementation with MLP for an image classification problem (MNIST).
Multi Layer Perceptron classification
Import the necessary libraries:
import input_data
import tensorflow as tf
import matplotlib.pyplot as plt
Load the images to classify:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
Fix some parameters for the MLP model:
Learning rate of the net:
learning_rate = 0.001
The epochs:
training_epochs = 20
The number of images to classify:
batch_size = 100
display_step = 1
The number of neurons for the first layer:
n_hidden_1 = 256
The number of neurons for the second layer:
n_hidden_2 = 256
The size of the input (each image has 784 pixels):
n_input = 784 # MNIST data input (img shape: 28*28)
The size of the output classes:
n_classes = 10
It should therefore be noted that while for a given application the input and output sizes are perfectly defined, there are no strict criteria for how to define the number of hidden layers and the number of neurons for each layer.
Every choice must be based on experience of similar applications, as in our case:
When increasing the number of hidden layers, we must also increase the size of the training set, and increase the number of connections to be updated during the learning phase. This results in an increase in the training time. Also, if there are too many neurons in the hidden layer, not only are there more weights to be updated, but the network also has a tendency to learn too much from the training examples set, resulting in a poor generalization ability. But if the hidden neurons are too few, the network is not able to learn even with the training set.
Build the model
The input layer is the x tensor [1 × 784], which represents the image to classify:
x = tf.placeholder("float", [None, n_input])
The output tensor y is equal to the number of classes:
y = tf.placeholder("float", [None, n_classes])
In the middle, we have two hidden layers. The first layer is constituted by the h tensor of weights, whose size is [784 × 256], where 256 is the total number of nodes of the layer:
h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
For layer 1, we therefore have to define the respective biases tensor:
bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))
Each neuron receives the pixels of the input image to be classified, combined with the hij weight connections and added to the respective values of the biases tensor:
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, h), bias_layer_1))
It sends its output to the neurons of the next layer through the activation function. It must be said that the functions can differ from one neuron to another, but in practice we adopt a common one for all the neurons, typically of the sigmoidal type. Sometimes the output neurons are equipped with a linear activation function. It is interesting to note that the activation functions of the neurons in the hidden layers cannot be linear because, in this case, the MLP network would be equivalent to a network with two layers and therefore no longer of the MLP type. The second layer must perform the same steps as the first.
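The per-neuron computation of the sigmoid layer above, sigmoid of the weighted inputs plus a bias, can be sketched in plain Python. The weights and inputs here are toy values for illustration, not the randomly initialized tensors of the model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_sigmoid_layer(x, weights, biases):
    # weights[j][i]: connection from input j to neuron i, as in tf.matmul(x, h)
    out = []
    for i in range(len(biases)):
        z = sum(x[j] * weights[j][i] for j in range(len(x))) + biases[i]
        out.append(sigmoid(z))
    return out

# Two inputs feeding three hidden neurons (toy values)
x = [1.0, -1.0]
weights = [[0.5, -0.5, 1.0],
           [0.5,  0.5, 0.0]]
biases = [0.0, 0.0, -1.0]
out = dense_sigmoid_layer(x, weights, biases)
print([round(v, 3) for v in out])  # -> [0.5, 0.269, 0.5]
```

Each output lies in (0, 1), which is what makes the layer non-linear and prevents the two-layer collapse mentioned above.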
The second intermediate layer is represented by the shape of the weights tensor [256 × 256]:
w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
With the tensor of biases:
bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))
Each neuron in this second layer receives inputs from the neurons of layer 1, combined with the weight Wij connections and added to the respective biases of layer 2:
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, w), bias_layer_2))
It sends its output to the next layer, namely the output layer:
output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
bias_output = tf.Variable(tf.random_normal([n_classes]))
output_layer = tf.matmul(layer_2, output) + bias_output
The output layer receives as input the n stimuli (256) coming from layer 2, which are converted into the respective class probabilities for each digit.
As for the logistic regression, we then define the cost function:
cost = tf.reduce_mean\
       (tf.nn.softmax_cross_entropy_with_logits\
        (output_layer, y))
The TensorFlow function tf.nn.softmax_cross_entropy_with_logits computes the cost for a softmax layer. It is only used during training. The logits are the unnormalized log probabilities output by the model (the values output before the softmax normalization is applied to them).
The corresponding optimizer that minimizes the cost function is:
optimizer = tf.train.AdamOptimizer\
            (learning_rate=learning_rate).minimize(cost)
tf.train.AdamOptimizer uses Kingma and Ba's Adam algorithm to control the learning rate. Adam offers several advantages over the simple tf.train.GradientDescentOptimizer. In fact, it uses a larger effective step size, and the algorithm will converge to this step size without fine tuning.
A simple tf.train.GradientDescentOptimizer could equally be used in your MLP, but would require more hyperparameter tuning before it could converge as quickly.
Note
TensorFlow provides the optimizer base class to compute gradients for a loss and apply gradients to variables. This class defines the API to add ops to train a model. You never use this class directly, but instead instantiate one of its subclasses. See https://www.tensorflow.org/versions/r0.8/api_docs/python/train.html#Optimizer to see the optimizers implemented.
Launch the session
The following are the steps to launch the session:
1. Plot the settings:
avg_set = []
epoch_set = []
2. Initialize the variables:
init = tf.initialize_all_variables()
3. Launch the graph:
with tf.Session() as sess:
    sess.run(init)
4. Define the training cycle:
for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(mnist.train.num_examples/batch_size)
5. Loop over all the batches (of 100 images each):
for i in range(total_batch):
    batch_xs, batch_ys = \
              mnist.train.next_batch(batch_size)
6. Fit training using the batch data:
sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
7. Compute the average loss:
avg_cost += sess.run(cost, feed_dict={x: batch_xs,\
                                      y: batch_ys})/total_batch
Display logs per epoch step:
if epoch % display_step == 0:
    print "Epoch:", '%04d' % (epoch+1),\
          "cost=", "{:.9f}".format(avg_cost)
avg_set.append(avg_cost)
epoch_set.append(epoch+1)
print "Training phase finished"
8. With these lines of code, we plot the training phase:
plt.plot(epoch_set, avg_set, 'o', label='MLP Training phase')
plt.ylabel('cost')
plt.xlabel('epoch')
plt.legend()
plt.show()
9. Finally, we can test the MLP model:
correct_prediction = tf.equal(tf.argmax(output_layer, 1),\
                              tf.argmax(y, 1))
evaluating its accuracy:
accuracy = tf.reduce_mean(tf.cast(correct_prediction,
                                  "float"))
print "Model Accuracy:", accuracy.eval({x:
                                        mnist.test.images,\
                                        y: mnist.test.labels})
10. Here is the output result after 20 epochs:
Python 2.7.10 (default, Oct 14 2015, 16:09:02) [GCC 5.2.1
20151010] on linux2 Type "copyright", "credits" or "license()" for
more information.
>>> ========================== RESTART
==============================
>>>
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 1.723947845
Epoch: 0002 cost= 0.539266024
Epoch: 0003 cost= 0.362600502
Epoch: 0004 cost= 0.266637279
Epoch: 0005 cost= 0.205345784
Epoch: 0006 cost= 0.159139332
Epoch: 0007 cost= 0.125232637
Epoch: 0008 cost= 0.098572041
Epoch: 0009 cost= 0.077509963
Epoch: 0010 cost= 0.061127526
Epoch: 0011 cost= 0.048033808
Epoch: 0012 cost= 0.037297983
Epoch: 0013 cost= 0.028884999
Epoch: 0014 cost= 0.022818390
Epoch: 0015 cost= 0.017447586
Epoch: 0016 cost= 0.013652348
Epoch: 0017 cost= 0.010417282
Epoch: 0018 cost= 0.008079228
Epoch: 0019 cost= 0.006203546
Epoch: 0020 cost= 0.004961207
Training phase finished
Model Accuracy: 0.9775
>>>
We show the training phase in the following figure:
Training phase in Multi Layer Perceptron
Source code
# Import MNIST data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
import matplotlib.pyplot as plt
# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 100
display_step = 1
# Network Parameters
n_hidden_1 = 256 # 1st layer num features
n_hidden_2 = 256 # 2nd layer num features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
# weights layer 1
h = tf.Variable(tf.random_normal([n_input, n_hidden_1]))
# bias layer 1
bias_layer_1 = tf.Variable(tf.random_normal([n_hidden_1]))
# layer 1
layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, h), bias_layer_1))
# weights layer 2
w = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]))
# bias layer 2
bias_layer_2 = tf.Variable(tf.random_normal([n_hidden_2]))
# layer 2
layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, w), bias_layer_2))
# weights output layer
output = tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
# bias output layer
bias_output = tf.Variable(tf.random_normal([n_classes]))
# output layer
output_layer = tf.matmul(layer_2, output) + bias_output
# cost function
cost = tf.reduce_mean\
       (tf.nn.softmax_cross_entropy_with_logits(output_layer, y))
# optimizer
optimizer = tf.train.AdamOptimizer\
            (learning_rate=learning_rate).minimize(cost)
# Plot settings
avg_set = []
epoch_set = []
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost,\
                                 feed_dict={x: batch_xs,\
                                            y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1),\
                  "cost=", "{:.9f}".format(avg_cost)
        avg_set.append(avg_cost)
        epoch_set.append(epoch+1)
    print "Training phase finished"
    plt.plot(epoch_set, avg_set, 'o', label='MLP Training phase')
    plt.ylabel('cost')
    plt.xlabel('epoch')
    plt.legend()
    plt.show()
    # Test model
    correct_prediction = tf.equal(tf.argmax(output_layer, 1),\
                                  tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Model Accuracy:", accuracy.eval({x: mnist.test.images,\
                                            y: mnist.test.labels})
Multi Layer Perceptron function approximation
In the following example, we implement an MLP network that will be able to learn the trend of an arbitrary function f(x). In the training phase the network will have to learn from a known set of points, that is, x and f(x), while in the test phase the network will deduce the values of f(x) only from the x values.
This very simple network will be built with a single hidden layer.
Import the necessary libraries:
import tensorflow as tf
import numpy as np
import math, random
import matplotlib.pyplot as plt
We build the data model. The function to be learned will follow the trend of the cosine function, evaluated for 1000 points, to which we add a very little random error (noise) to reproduce a real case:
NUM_points = 1000
np.random.seed(NUM_points)
function_to_learn = lambda x: np.cos(x) +\
                    0.1*np.random.randn(*x.shape)
Our MLP network will be formed by a hidden layer of 10 neurons:
layer_1_neurons = 10
The network learns 100 points at a time, for a total of 1500 learning cycles (epochs):
batch_size = 100
NUM_EPOCHS = 1500
Finally, we construct the training set and the test set. The all_x tensor contains all the points:
all_x = np.float32(np.random.uniform\
                   (-2*math.pi, 2*math.pi,\
                    (1, NUM_points))).T
np.random.shuffle(all_x)
train_size = int(900)
The first 900 points are in the training set:
x_training = all_x[:train_size]
y_training = function_to_learn(x_training)
The last 100 will be in the validation set:
x_validation = all_x[train_size:]
y_validation = function_to_learn(x_validation)
Using matplotlib, we display these sets:
plt.figure(1)
plt.scatter(x_training, y_training, c='blue', label='train')
plt.scatter(x_validation, y_validation, c='red', label='validation')
plt.legend()
plt.show()
Training and validation set
Build the model
First, we create the placeholders for the input tensor (X) and the output tensor (Y):
X = tf.placeholder(tf.float32, [None, 1], name="X")
Y = tf.placeholder(tf.float32, [None, 1], name="Y")
Then we build the hidden layer of [1 x 10] dimensions:
w_h = tf.Variable(tf.random_uniform([1, layer_1_neurons],\
                                    minval=-1, maxval=1,\
                                    dtype=tf.float32))
b_h = tf.Variable(tf.zeros([1, layer_1_neurons],\
                           dtype=tf.float32))
It receives the input value from the X input tensor, combined with the w_hij weight connections and added to the respective biases of layer 1:
h = tf.nn.sigmoid(tf.matmul(X, w_h) + b_h)
The output layer is a [10 x 1] tensor:
w_o = tf.Variable(tf.random_uniform([layer_1_neurons, 1],\
                                    minval=-1, maxval=1,\
                                    dtype=tf.float32))
b_o = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
Each neuron in this second layer receives inputs from the neurons of layer 1, combined with the w_oij weight connections and added together with the respective biases of the output layer:
model = tf.matmul(h, w_o) + b_o
We then define our optimizer for the newly defined model:
train_op = tf.train.AdamOptimizer().minimize\
           (tf.nn.l2_loss(model - Y))
We also note that in this case, the cost function adopted is the following:
tf.nn.l2_loss(model - Y)
The tf.nn.l2_loss function is a TensorFlow function that computes half the L2 norm of a tensor, without the sqrt; that is, the output of the preceding expression is as follows:
output = sum((model - Y) ** 2) / 2
The tf.nn.l2_loss function can be a viable cost function for our example.
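That formula is easy to verify by hand. Below is a plain-Python stand-in for tf.nn.l2_loss applied to the residuals, on a small illustrative vector (toy values, not the network's outputs):

```python
def l2_loss(t):
    # Half the sum of squares, matching sum(t ** 2) / 2 above
    return sum(v * v for v in t) / 2.0

model_out = [1.0, 2.0, 0.5]   # hypothetical model outputs
targets   = [0.0, 2.0, 1.5]   # hypothetical target values
residuals = [m - y for m, y in zip(model_out, targets)]
print(l2_loss(residuals))  # -> 1.0  (squares 1 + 0 + 1, halved)
```

The factor of 1/2 is a convenience: it cancels when the loss is differentiated, leaving a gradient that is simply the residual.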
Launch the session
Let's build the evaluation graph:
sess = tf.Session()
sess.run(tf.initialize_all_variables())
Now we can launch the learning session:
errors = []
for i in range(NUM_EPOCHS):
    for start, end in zip(range(0, len(x_training), batch_size),\
                          range(batch_size,\
                                len(x_training), batch_size)):
        sess.run(train_op, feed_dict={X: x_training[start:end],\
                                      Y: y_training[start:end]})
    cost = sess.run(tf.nn.l2_loss(model - y_validation),\
                    feed_dict={X: x_validation})
    errors.append(cost)
    if i % 100 == 0: print "epoch %d, cost = %g" % (i, cost)
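The zip(range(...), range(...)) idiom in the inner loop pairs each batch's start index with its end index. A standalone sketch of how it slices the 900 training points into batches of 100:

```python
batch_size = 100
n_points = 900  # size of the training set in this example

starts = range(0, n_points, batch_size)        # 0, 100, ..., 800
ends = range(batch_size, n_points, batch_size)  # 100, 200, ..., 800
batches = list(zip(starts, ends))

print(batches[0])    # -> (0, 100)
print(batches[-1])   # -> (700, 800)
print(len(batches))  # -> 8
```

Note that because the second range also stops before 900, this pairing yields eight batches and the final slice (points 800 to 900) is never fed to the optimizer; with 900 points that costs one batch per epoch.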
Running this network for 1400 epochs, we'll see the error progressively reducing and eventually converging:
Python 2.7.10 (default, Oct 14 2015, 16:09:02) [GCC 5.2.1 20151010]
on linux2 Type "copyright", "credits" or "license()" for more
information.
>>> ======================= RESTART ============================
>>>
epoch 0, cost = 55.9286
epoch 100, cost = 22.0084
epoch 200, cost = 18.033
epoch 300, cost = 14.0481
epoch 400, cost = 9.74721
epoch 500, cost = 5.83419
epoch 600, cost = 3.05434
epoch 700, cost = 1.53706
epoch 800, cost = 0.91719
epoch 900, cost = 0.726675
epoch 1000, cost = 0.668316
epoch 1100, cost = 0.633737
epoch 1200, cost = 0.608306
epoch 1300, cost = 0.590429
epoch 1400, cost = 0.574602
>>>
The following lines of code allow us to display how the cost changes over the running epochs:
plt.plot(errors, label='MLP Function Approximation')
plt.xlabel('epochs')
plt.ylabel('cost')
plt.legend()
plt.show()
Training phase in Multi Layer Perceptron
Summary
In this chapter, we introduced artificial neural networks. An artificial neuron is a mathematical model that to some extent mimics the properties of a living neuron. Each neuron of the network has a very simple operation, which consists of becoming active if the total amount of signal that it receives exceeds its activation threshold. The learning process is typically supervised: the neural net uses a training set to infer the relationship between the input and the corresponding output, while the learning algorithm modifies the weights of the net in order to minimize a cost function that represents the forecast error relating to the training set. If the training is successful, the neural net will be able to make forecasts even where the output is not known a priori. In this chapter we implemented, using TensorFlow, some examples involving neural networks. We have seen neural nets used to solve classification and regression problems, such as the logistic regression algorithm in a classification problem using Rosenblatt's Perceptron. At the end of the chapter, we introduced the Multi Layer Perceptron architecture, which we have seen in action first in the implementation of an image classifier, then as a simulator of mathematical functions.
In the next chapter, we finally introduce deep learning models; we will examine and implement more complex neural network architectures, such as the convolutional neural network and the recurrent neural network.
Chapter 5. Deep Learning
In this chapter, we will cover the following topics:
Deep learning techniques
Convolutional neural network (CNN)
CNN architecture
TensorFlow implementation of a CNN
Recurrent neural network (RNN)
RNN architecture
Natural Language Processing with TensorFlow
Deep learning techniques
Deep learning techniques are a crucial step forward taken by machine learning researchers in recent decades, having provided successful results never seen before in many applications, such as image recognition and speech recognition.
There are several reasons that led to deep learning being developed and placed at the center of attention in the scope of machine learning. One of these reasons is represented by the progress in hardware, with the availability of new processors, such as graphics processing units (GPUs), which have greatly reduced the time needed for training networks, lowering it by 10 to 20 times.
Another reason is certainly the increasing ease of finding ever more numerous datasets on which to train a system, needed to train architectures of a certain depth and with high dimensionality of the input data. Deep learning consists of a set of methods that allow a system to obtain a hierarchical representation of the data on multiple levels. This is achieved by combining simple non-linear units, each of which transforms the representation at its own level, starting from the input level, into a representation at a higher, slightly more abstract level. With a sufficient number of these transformations, considerably complex input-output functions can be learned.
With reference to a classification problem, for example, the highest levels of representation highlight the aspects of the input data that are relevant for the classification, suppressing the ones that have no effect on the classification purposes.
Hierarchical feature extraction in an image classification system
The preceding scheme describes the features of the image classification system (a face recognizer): each block gradually extracts the features of the input image, processing data already pre-processed by the previous blocks, extracting increasingly complex features of the input image, and thus building the hierarchical data representation that characterizes a deep learning-based system.
A possible representation of the features of the hierarchy could be as follows:
pixel --> edge --> texture --> motif --> part --> object
In a text recognition problem, however, the hierarchical representation can be structured as follows:
character --> word --> word group --> clause --> sentence --> story
A deep learning architecture is, therefore, a multi-level architecture consisting of simple units, all subject to training, many of which carry out non-linear transformations. Each unit transforms its input to improve its ability to select and amplify only the aspects relevant for classification purposes, and its invariance, namely its propensity to ignore the irrelevant and negligible aspects.
With multiple levels of non-linear transformations, therefore, with a depth approximately between 5 and 20 levels, a deep learning system can learn and implement extremely intricate and complex functions, simultaneously very sensitive to the smallest relevant details and extremely insensitive and indifferent to large variations of irrelevant aspects of the input data, which can be, in the case of object recognition: the image's background, brightness, or the position of the represented object.
The following sections will illustrate, with the aid of TensorFlow, two important types of deep neural networks: the convolutional neural networks (CNNs), mainly addressed to classification problems, and then the recurrent neural networks (RNNs), targeting Natural Language Processing (NLP) issues.
Convolutional neural networks
Convolutional neural networks (CNNs) are a particular type of deep-learning-oriented neural network that have achieved excellent results in many practical applications, in particular object recognition in images.
In fact, CNNs are designed to process data represented in the form of multiple arrays, for example, color images, representable by means of three two-dimensional arrays containing the pixels' color intensities. The substantial difference between CNNs and ordinary neural networks is that the former operate directly on the images, while the latter operate on features extracted from them. The input of a CNN, therefore, unlike that of an ordinary neural network, will be two-dimensional, and the features will be the pixels of the input image.
The CNN is the dominant approach for almost all recognition problems. The spectacular performance offered by networks of this type has in fact prompted the biggest technology companies, such as Google and Facebook, to invest in research and development projects for networks of this kind, and to develop and distribute image recognition products based on CNNs.
CNN architecture
CNNs use three basic ideas: local receptive fields, convolution, and pooling.
In convolutional networks, we consider the input as something similar to what is shown in the following figure:
Input neurons
One of the concepts behind CNNs is local connectivity. CNNs, in fact, utilize spatial correlations that may exist within the input data. Each neuron of the first subsequent layer connects to only some of the input neurons. This region is called the local receptive field. In the following figure, it is represented by the black 5x5 square that converges to a hidden neuron:
From input to hidden neurons
The hidden neuron, of course, will only process the input data inside its receptive field, not registering changes outside of it. However, it is easy to see that, by superimposing several locally connected layers, moving up through the levels you will have units that process more and more global data compared to the input, in accordance with the basic principle of deep learning: bringing the representation to an ever-growing level of abstraction.
Note
The reason for the local connectivity resides in the fact that in data in array form, such as images, the values are often highly correlated, forming distinct groups of data that can be easily identified.
Each connection learns a weight (so there will be 5x5 = 25 of them), while the hidden neuron also learns an associated total bias; then we are going to connect the regions to individual neurons by performing a shift each time, as in the following figures:
The convolution operation
This operation is called convolution. Doing so, if we have an image of 28x28 inputs and 5x5 regions, we will get 24x24 neurons in the hidden layer. We said that each neuron has a bias and 5x5 weights connected to the region: we will use these weights and biases for all 24x24 neurons. This means that all the neurons in the first hidden layer will recognize the same features, just placed differently in the input image. For this reason, the map of connections from the input layer to the hidden feature map is called shared weights, and the bias is called shared bias, since they are in fact shared.
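The 28x28-inputs-with-5x5-regions arithmetic generalizes: with no padding and a stride of 1, an n x n input convolved with a k x k receptive field yields (n - k + 1) x (n - k + 1) hidden neurons. A quick sketch of this sizing rule:

```python
def conv_output_size(input_size, field_size, stride=1):
    # Valid (no-padding) convolution: number of positions the
    # receptive field can occupy along one axis
    return (input_size - field_size) // stride + 1

print(conv_output_size(28, 5))  # -> 24, as in the example above
print(conv_output_size(28, 5, 2))  # with stride 2 the map shrinks further
```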
Obviously, we need to recognize an image with more than one map of features, so a complete convolutional layer is made from multiple feature maps.
Multiple feature maps
In the preceding figure, we see three feature maps; of course, their number can increase in practice, and you can get to use convolutional layers with even 20 or 40 feature maps. A great advantage of the sharing of weights and biases is the significant reduction of the parameters involved in a convolutional network. Considering our example, for each feature map we need 25 weights (5x5) and a (shared) bias; that is, 26 parameters in total. Assuming we have 20 feature maps, we will have 520 parameters to be defined. With a fully connected network, with 784 input neurons and, for example, 30 hidden layer neurons, we need 784x30 weights plus 30 biases, reaching a total of 23,550 parameters.
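The parameter counts quoted above can be reproduced directly: 5x5 shared weights plus one shared bias per feature map, versus one weight per input-hidden pair plus one bias per hidden neuron in the fully connected case:

```python
# Convolutional layer: each feature map shares 5x5 weights and 1 bias
field = 5
params_per_map = field * field + 1            # 26
feature_maps = 20
conv_params = feature_maps * params_per_map

# Fully connected layer: 784 inputs to 30 hidden neurons, plus 30 biases
fc_params = 784 * 30 + 30

print(conv_params)  # -> 520
print(fc_params)    # -> 23550
```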
The difference is evident. The convolutional networks also use pooling layers, which are layers positioned immediately after the convolutional layers; these simplify the output information of the previous (convolutional) layer. A pooling layer takes the input feature maps coming out of the convolutional layer and prepares a condensed feature map. For example, we can say that each of its units could summarize a 2x2 region of neurons of the previous layer.
This technique is called pooling and can be summarized with the following scheme:
The pooling operation helps to simplify the information from one layer to the next
Obviously, we usually have more feature maps, and we apply the max pooling to each of them separately.
From the input layer to the second hidden layer
So we have three feature maps of size 24x24 for the first hidden layer, and the second hidden layer will be of size 12x12, since we are assuming that every unit summarizes a 2x2 region.
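The 24x24-to-12x12 reduction follows from summarizing non-overlapping 2x2 regions. A sketch of max pooling on one small 4x4 feature map (toy values, not MNIST activations), which halves each dimension just as 24 becomes 12:

```python
def max_pool_2x2(fmap):
    # Summarize each non-overlapping 2x2 region by its maximum
    n = len(fmap)
    return [[max(fmap[i][j], fmap[i][j+1],
                 fmap[i+1][j], fmap[i+1][j+1])
             for j in range(0, n, 2)]
            for i in range(0, n, 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 1]]
pooled = max_pool_2x2(fmap)
print(pooled)  # -> [[4, 2], [2, 7]]
```

Each pooled unit reports only whether (and how strongly) a feature was found somewhere in its 2x2 region, discarding its exact position.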
Combining these three ideas, we form a complete convolutional network. Its architecture can be displayed as follows:
A CNN architectural schema
Let's summarize: there are the 28x28 input neurons, followed by a convolutional layer with a 5x5 local receptive field and 3 feature maps. We obtain as a result a hidden layer of 3x24x24 neurons. Then 2x2 max pooling is applied to the 3 feature maps, giving a hidden layer of 3x12x12. The last layer is fully connected: it connects all the neurons of the max-pooling layer to all 10 output neurons, useful to recognize the corresponding output.
This network will then be trained by gradient descent and the backpropagation algorithm.
TensorFlow implementation of a CNN
In the following example, we will see the CNN in action on a problem of image classification. We want to show the process of building a CNN network: what the steps to execute are, what reasoning needs to be done to run a proper dimensioning of the entire network, and of course how to implement it with TensorFlow.
Initialization step
1. Load and prepare the MNIST data:
import tensorflow as tf
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
2. Define all the CNN parameters:
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
3. MNIST data input (each image is a 28x28 array of pixels):
n_input = 784
4. The MNIST total classes (0-9 digits):
n_classes = 10
5. To reduce overfitting, we apply the dropout technique. This term refers to dropping out units (hidden, input, and output) in a neural network. Deciding which neurons to eliminate is random; one way is to apply a probability, as we shall see in our code. For this reason, we define the following parameter (to be tuned):
dropout = 0.75
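The dropout mechanism can be sketched in plain Python: each unit's output is kept with probability keep_prob and zeroed otherwise, and kept values are scaled by 1/keep_prob so the expected total activation is unchanged. This is a toy illustration of the idea, not tf.nn.dropout itself:

```python
import random

def dropout(values, keep_prob, rng):
    # Zero each value with probability 1 - keep_prob; scale survivors
    # by 1/keep_prob so the expected sum stays the same
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0] * 1000, 0.75, rng)
kept = sum(1 for v in out if v != 0.0)
print(kept)  # roughly 750 of the 1000 units survive
```

At test time no units are dropped, which is why keep_prob is fed as 1.0 during evaluation.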
6. Define the placeholders for the input graph. The x placeholder contains the MNIST data input (exactly 784 pixels):
x = tf.placeholder(tf.float32, [None, n_input])
7. Then we reshape the input images to a 4D tensor, using the TensorFlow reshape operator:
_X = tf.reshape(x, shape=[-1, 28, 28, 1])
The second and third dimensions correspond to the width and height of the image, while the last dimension is the total number of color channels (in our case, 1).
So we can display our input image as a two-dimensional tensor, of size 28x28:
The input tensor for our problem
The output tensor will contain the output probability for each digit to classify:
y = tf.placeholder(tf.float32, [None, n_classes])
First convolutional layer
Each neuron of the hidden layer is connected to a small subset of the input tensor, of dimension 5x5. This implies that the hidden layer will have a 24x24 size. We also define and initialize the tensors of shared weights and shared bias:
wc1 = tf.Variable(tf.random_normal([5, 5, 1, 32]))
bc1 = tf.Variable(tf.random_normal([32]))
Recall that in order to recognize an image, we need more than one map of features. The number is just the number of feature maps we are considering for this first layer. In our case, the convolutional layer is composed of 32 feature maps.
The next step is the construction of the first convolution layer, conv1:
conv1 = conv2d(_X, wc1, bc1)
Here, conv2d is the following function:
def conv2d(img, w, b):
    return tf.nn.relu(tf.nn.bias_add\
                      (tf.nn.conv2d(img, w,\
                                    strides=[1, 1, 1, 1],\
                                    padding='SAME'), b))
For this purpose, we used the TensorFlow tf.nn.conv2d function. It computes a 2D convolution from the input tensor and the tensor of shared weights. The result of this operation is then added to the bc1 bias matrix, while tf.nn.relu is the ReLU (Rectified Linear Unit) function, the usual activation function in the hidden layers of a deep neural network.
We apply this activation function to the return value of the convolution function. The padding value is 'SAME', which indicates that the output tensor will have the same size as the input tensor.
One way to represent the convolutional layer, namely conv1, is as follows:
The first hidden layer
After the convolution operation, we apply the pooling step, which simplifies the output information of the previously created convolutional layer.
In our example, we take a 2x2 region of the convolution layer and summarize the information at each point in the pooling layer:
conv1 = max_pool(conv1, k=2)
Here, for the pooling operation, we have implemented the following function:
def max_pool(img, k):
    return tf.nn.max_pool(img,
                          ksize=[1, k, k, 1],
                          strides=[1, k, k, 1],
                          padding='SAME')
The tf.nn.max_pool function performs max pooling on the input. Of course, we apply max pooling for each convolutional layer, and there will be many layers of pooling and convolution. At the end of this pooling phase, we'll have a 14x14x32 convolutional hidden layer.
The next figure shows the CNN layers after the pooling and convolution operations:
The CNN after the first convolution and pooling operations
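The layer sizes quoted in this chapter can be checked with a short sketch. Assuming stride-1 'SAME' convolutions (output size = ceil(input/stride)) and 2x2 'SAME' max pooling with stride 2:

```python
import math

def same_conv(size, stride=1):
    # 'SAME' padding keeps the spatial size at ceil(in / stride).
    return math.ceil(size / stride)

def pool(size, k=2):
    # 2x2 'SAME' max pool with stride k halves each dimension.
    return math.ceil(size / k)

h = 28                   # MNIST input is 28x28
h = pool(same_conv(h))   # first conv + pool -> 14 (14x14x32 maps)
print(h)
h2 = pool(same_conv(h))  # second conv + pool -> 7 (7x7x64 maps)
print(h2)                # matches the 7*7*64 densely connected input
```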
The last operation reduces overfitting by applying the tf.nn.dropout TensorFlow operator on the convolutional layer. To do this, we create a placeholder for the probability (keep_prob) that a neuron's output is kept during dropout:
keep_prob = tf.placeholder(tf.float32)
conv1 = tf.nn.dropout(conv1, keep_prob)
Second convolutional layer
For the second hidden layer, we must apply the same operations as in the first layer, and so we define and initialize the tensors of shared weights and shared bias:
wc2 = tf.Variable(tf.random_normal([5, 5, 32, 64]))
bc2 = tf.Variable(tf.random_normal([64]))
As you can see, this second hidden layer will have 64 features for a 5x5 window, while the number of input channels is given by the first convolutional layer. We next apply a second layer to the conv1 tensor, but this time we apply 64 sets of 5x5 filters, each to the 32 conv1 feature maps:
conv2 = conv2d(conv1, wc2, bc2)
It gives us 64 14x14 arrays, which we reduce with max pooling to 64 7x7 arrays:
conv2 = max_pool(conv2, k=2)
Finally, we again use the dropout operation:
conv2 = tf.nn.dropout(conv2, keep_prob)
The resulting layer is a 7x7x64 tensor: the 14x14 maps from the first pooling stage keep their size under the stride-1 'SAME' convolution and are then halved by the 2x2 pooling.
Building the second hidden layer
Densely connected layer
In this step, we build a densely connected layer that we use to process the entire image. The weight and bias tensors are as follows:
wd1 = tf.Variable(tf.random_normal([7*7*64, 1024]))
bd1 = tf.Variable(tf.random_normal([1024]))
As you can see, this layer will be formed by 1024 neurons.
Then we reshape the tensor from the second convolutional layer into a batch of vectors:
dense1 = tf.reshape(conv2, [-1, wd1.get_shape().as_list()[0]])
We multiply this tensor by the weight matrix, wd1, add the bias tensor, bd1, and apply a ReLU operation:
dense1 = tf.nn.relu(tf.add(tf.matmul(dense1, wd1), bd1))
We complete this layer by again using the dropout operator:
dense1 = tf.nn.dropout(dense1, keep_prob)
Readout layer
The last layer defines the tensors wout and bout:
wout = tf.Variable(tf.random_normal([1024, n_classes]))
bout = tf.Variable(tf.random_normal([n_classes]))
Before applying the softmax function, we must calculate the evidence that the image belongs to a certain class:
pred = tf.add(tf.matmul(dense1, wout), bout)
Testing and training the model
The evidence must be converted into probabilities for each of the 10 possible classes (the method is identical to what we saw in Chapter 4, Introducing Neural Networks). So we define the cost function, which evaluates the quality of our model, by applying the softmax function:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
We optimize this function using the TensorFlow AdamOptimizer function:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
The following tensors will serve in the evaluation phase of the model:
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
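What the correct_pred and accuracy tensors compute can be sketched with NumPy on made-up data: take the argmax over class scores, compare it with the argmax of the one-hot labels, and average the resulting 0/1 correctness values.

```python
import numpy as np

# Three illustrative predictions over three classes (made-up data).
pred = np.array([[0.1, 0.8, 0.1],    # predicted class 1
                 [0.6, 0.3, 0.1],    # predicted class 0
                 [0.2, 0.2, 0.6]])   # predicted class 2
labels = np.array([[0, 1, 0],        # true class 1 (correct)
                   [0, 1, 0],        # true class 1 (wrong)
                   [0, 0, 1]])       # true class 2 (correct)

correct = np.argmax(pred, 1) == np.argmax(labels, 1)
accuracy = np.mean(correct.astype(np.float32))
print(accuracy)   # 2 of 3 correct
```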
Launching the session
Initialize the variables:
init = tf.initialize_all_variables()
Build the evaluation graph:
with tf.Session() as sess:
    sess.run(init)
    step = 1
Let's train the net until training_iters:
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
Fit the training using the batch data:
        sess.run(optimizer, feed_dict={x: batch_xs,
                                       y: batch_ys,
                                       keep_prob: dropout})
        if step % display_step == 0:
Calculate the accuracy:
            acc = sess.run(accuracy, feed_dict={x: batch_xs,
                                                y: batch_ys,
                                                keep_prob: 1.})
Calculate the loss:
            loss = sess.run(cost, feed_dict={x: batch_xs,
                                             y: batch_ys,
                                             keep_prob: 1.})
            print "Iter " + str(step * batch_size) + \
                  ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + \
                  ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
We print the accuracy for the first 256 MNIST test images:
    print "Testing Accuracy:", \
        sess.run(accuracy,
                 feed_dict={x: mnist.test.images[:256],
                            y: mnist.test.labels[:256],
                            keep_prob: 1.})
Running the code, we have the following output:
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Iter 1280, Minibatch Loss= 27900.769531, Training Accuracy= 0.17188
Iter 2560, Minibatch Loss= 17168.949219, Training Accuracy= 0.21094
Iter 3840, Minibatch Loss= 15000.724609, Training Accuracy= 0.41406
Iter 5120, Minibatch Loss= 8000.896484, Training Accuracy= 0.49219
Iter 6400, Minibatch Loss= 4587.275391, Training Accuracy= 0.61719
Iter 7680, Minibatch Loss= 5949.988281, Training Accuracy= 0.69531
Iter 8960, Minibatch Loss= 4932.690430, Training Accuracy= 0.70312
Iter 10240, Minibatch Loss= 5066.223633, Training Accuracy= 0.70312
...................
Iter 81920, Minibatch Loss= 442.895020, Training Accuracy= 0.93750
Iter 83200, Minibatch Loss= 273.936676, Training Accuracy= 0.93750
Iter 84480, Minibatch Loss= 1169.810303, Training Accuracy= 0.89062
Iter 85760, Minibatch Loss= 737.561157, Training Accuracy= 0.90625
Iter 87040, Minibatch Loss= 583.576965, Training Accuracy= 0.89844
Iter 88320, Minibatch Loss= 375.274475, Training Accuracy= 0.93750
Iter 89600, Minibatch Loss= 183.815613, Training Accuracy= 0.94531
Iter 90880, Minibatch Loss= 410.157867, Training Accuracy= 0.89844
Iter 92160, Minibatch Loss= 895.187683, Training Accuracy= 0.84375
Iter 93440, Minibatch Loss= 819.893555, Training Accuracy= 0.89062
Iter 94720, Minibatch Loss= 460.179779, Training Accuracy= 0.90625
Iter 96000, Minibatch Loss= 514.344482, Training Accuracy= 0.87500
Iter 97280, Minibatch Loss= 507.836975, Training Accuracy= 0.89844
Iter 98560, Minibatch Loss= 353.565735, Training Accuracy= 0.92188
Iter 99840, Minibatch Loss= 195.138626, Training Accuracy= 0.93750
Optimization Finished!
Testing Accuracy: 0.921875
It provides an accuracy of about 92%. Obviously, this does not represent the state of the art, because the purpose of the example is just to show how to build a CNN. The model can be further refined to give better results.
Source code
# Import MNIST data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10
# Network Parameters
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Dropout, probability to keep units
# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
# dropout (keep probability)
keep_prob = tf.placeholder(tf.float32)
# Create model
def conv2d(img, w, b):
    return tf.nn.relu(tf.nn.bias_add(
        tf.nn.conv2d(img, w,
                     strides=[1, 1, 1, 1],
                     padding='SAME'), b))
def max_pool(img, k):
    return tf.nn.max_pool(img,
                          ksize=[1, k, k, 1],
                          strides=[1, k, k, 1],
                          padding='SAME')
# Store layers weight & bias
# 5x5 conv, 1 input, 32 outputs
wc1 = tf.Variable(tf.random_normal([5, 5, 1, 32]))
bc1 = tf.Variable(tf.random_normal([32]))
# 5x5 conv, 32 inputs, 64 outputs
wc2 = tf.Variable(tf.random_normal([5, 5, 32, 64]))
bc2 = tf.Variable(tf.random_normal([64]))
# fully connected, 7*7*64 inputs, 1024 outputs
wd1 = tf.Variable(tf.random_normal([7*7*64, 1024]))
# 1024 inputs, 10 outputs (class prediction)
wout = tf.Variable(tf.random_normal([1024, n_classes]))
bd1 = tf.Variable(tf.random_normal([1024]))
bout = tf.Variable(tf.random_normal([n_classes]))
# Construct model
_X = tf.reshape(x, shape=[-1, 28, 28, 1])
# Convolution Layer
conv1 = conv2d(_X, wc1, bc1)
# Max Pooling (down-sampling)
conv1 = max_pool(conv1, k=2)
# Apply Dropout
conv1 = tf.nn.dropout(conv1, keep_prob)
# Convolution Layer
conv2 = conv2d(conv1, wc2, bc2)
# Max Pooling (down-sampling)
conv2 = max_pool(conv2, k=2)
# Apply Dropout
conv2 = tf.nn.dropout(conv2, keep_prob)
# Fully connected layer
# Reshape conv2 output to fit dense layer input
dense1 = tf.reshape(conv2, [-1, wd1.get_shape().as_list()[0]])
# Relu activation
dense1 = tf.nn.relu(tf.add(tf.matmul(dense1, wd1), bd1))
# Apply Dropout
dense1 = tf.nn.dropout(dense1, keep_prob)
# Output, class prediction
pred = tf.add(tf.matmul(dense1, wout), bout)
# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = \
    tf.train.AdamOptimizer(
        learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initializing the variables
init = tf.initialize_all_variables()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Fit training using batch data
        sess.run(optimizer, feed_dict={x: batch_xs,
                                       y: batch_ys,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_xs,
                                                y: batch_ys,
                                                keep_prob: 1.})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_xs,
                                             y: batch_ys,
                                             keep_prob: 1.})
            print "Iter " + str(step * batch_size) + \
                  ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + \
                  ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
    # Calculate accuracy for 256 mnist test images
    print "Testing Accuracy:", \
        sess.run(accuracy,
                 feed_dict={x: mnist.test.images[:256],
                            y: mnist.test.labels[:256],
                            keep_prob: 1.})
Recurrent neural networks
Another deep learning-oriented architecture is that of the so-called recurrent neural networks (RNNs). The basic idea of RNNs is to make use of the sequential nature of the input. In feedforward neural networks, we typically assume that each input and output is independent of all the others. For many types of problems, however, this assumption does not hold. For example, if you want to predict the next word of a phrase, it is certainly important to know the words that precede it. These neural nets are called recurrent because they perform the same computation for every element of a sequence of inputs, and the output for each element depends, in addition to the current input, on all previous computations.
RNN architecture
RNNs process a sequential input one item at a time, maintaining a sort of updated state vector that contains information about all past elements of the sequence. In general, an RNN has a shape of the following type:
RNN architecture schema
The preceding figure shows an RNN together with its unfolded version, which makes the network structure explicit for the whole sequence of inputs, at each instant of time. It becomes clear that, differently from the typical multi-level neural networks, which use different parameters at each level, an RNN always uses the same parameters, denominated U, V, and W (see the previous figure). Furthermore, an RNN performs the same computation at each instant, on the successive elements of the same input sequence. Sharing the same parameters strongly reduces the number of parameters that the network must learn during the training phase, thus also improving the training time.
It is also evident how you can train networks of this type: in fact, because the parameters are shared across instants of time, the gradient calculated for each output depends not only on the current computation but also on the previous ones. For example, to calculate the gradient at time t = 4, it is necessary to backpropagate the gradient through the three previous instants of time and then sum the gradients thus obtained. Also, the entire input sequence is typically considered to be a single element of the training set.
However, the training of this type of network suffers from the so-called vanishing/exploding gradient problem: the gradients, computed and backpropagated, tend to increase or decrease at each instant of time and then, after a certain number of instants of time, diverge to infinity or converge to zero.
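The effect can be sketched numerically. Backpropagation through time repeatedly multiplies the gradient by (roughly) the recurrent weight matrix, so over T steps the gradient norm scales like the T-th power of the matrix norm. The matrices below are toy placeholders, not a trained network:

```python
import numpy as np

g_norms = {}
for scale in (0.5, 1.5):        # contracting vs expanding recurrent matrix
    W = scale * np.eye(4)       # toy recurrent weight matrix
    g = np.ones(4)              # gradient arriving at the last time step
    for _ in range(20):         # 20 steps of backpropagation through time
        g = W.T @ g             # one multiplication per time step
    g_norms[scale] = np.linalg.norm(g)
    print(scale, g_norms[scale])   # tiny for 0.5 (vanishing), huge for 1.5
```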
Let us now examine how an RNN operates. Xt is the network input at instant t, which could be, for example, a vector that represents a word of a sentence, while St is the state vector of the net. It can be considered a sort of memory of the system, which contains information on all the previous elements of the input sequence. The state vector at instant t is evaluated starting from the current input (time t) and the state evaluated at the previous instant (time t-1), through the U and W parameters:
St = f(U*Xt + W*St-1)
The function f is a nonlinear function such as the rectified linear unit (ReLU), while Ot is the output at instant t, calculated using the parameter V.
The output will depend on the type of problem for which the network is used. For example, if you want to predict the next word of a sentence, it could be a probability vector over each word in the vocabulary of the system.
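The recurrence above can be written out as a minimal NumPy sketch. The sizes (input dimension 3, state dimension 4, output dimension 2) and the random weights are illustrative only; the names U, W, and V follow the figure, and tanh stands in for the nonlinearity f:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((4, 3)) * 0.1   # input-to-state weights
W = rng.standard_normal((4, 4)) * 0.1   # state-to-state weights
V = rng.standard_normal((2, 4)) * 0.1   # state-to-output weights

S = np.zeros(4)                          # initial state S_0
for x_t in rng.standard_normal((5, 3)):  # a sequence of 5 inputs
    S = np.tanh(U @ x_t + W @ S)         # S_t = f(U x_t + W S_{t-1})
O = V @ S                                # output at the last instant
print(S.shape, O.shape)
```

Note that the same U, W, and V are reused at every step: this is the parameter sharing discussed above.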
LSTM networks
Long Short-Term Memory (LSTM) networks are an extension of the basic RNN architecture. The main idea is to improve the network by providing it with an explicit memory. LSTM networks, in fact, despite not having an essentially different architecture from RNNs, are equipped with special hidden units, called memory cells, whose behavior is to remember the previous input for a long time.
A LSTM unit
The LSTM unit has three gates and four input weights, xt (from the data to the input and the three gates), while ht is the output of the unit.
An LSTM block contains gates that determine whether an input is significant enough to be saved. This block is formed by four units:
Input gate: Allows the value to enter the structure
Forget gate: Eliminates the values contained in the structure
Output gate: Determines when the unit will output the values trapped in the structure
Cell: Enables or disables the memory cell
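A single step of the standard LSTM formulation (a sketch on made-up sizes and random placeholder weights, not the book's TensorFlow code) shows how the three gates and the candidate update combine into the memory cell c_t and the unit output h_t:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wx = rng.standard_normal((4 * n_hid, n_in)) * 0.1   # four stacked weight blocks
Wh = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)

x_t = rng.standard_normal(n_in)                     # current input
h_prev, c_prev = np.zeros(n_hid), np.zeros(n_hid)   # previous output/cell

z = Wx @ x_t + Wh @ h_prev + b
i, f, o, g = np.split(z, 4)                          # input/forget/output gates, candidate
c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # gated memory cell update
h_t = sigmoid(o) * np.tanh(c_t)                      # gated unit output
print(h_t.shape)
```

The forget gate is what lets the cell retain (or discard) information over many steps, which is how LSTMs mitigate the vanishing gradient problem described earlier.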
In the next example, we will see a TensorFlow implementation of an LSTM network in a language processing problem.
NLP with TensorFlow
RNNs have proved to have excellent performance in problems such as predicting the next character in a text or, similarly, predicting the next word in a sentence. However, they are also used for more complex problems, such as machine translation. In this case, the network takes as input a sequence of words in a source language and outputs the corresponding sequence of words in a target language. Finally, another application of great importance in which RNNs are widely used is speech recognition. In the following, we will develop a computational model that can predict the next word in a text based on the sequence of the preceding words. To measure the accuracy of the model, we will use the Penn Tree Bank (PTB) dataset, which is the benchmark used to measure the precision of these models.
This example refers to the files that you find in the /rnn/ptb directory of your TensorFlow distribution. It comprises the following two files:
ptb_word_lm.py: The code to train a language model on the PTB dataset
reader.py: The code to read the dataset
Unlike previous examples, we will present only the pseudocode of the implemented procedure, in order to understand the main ideas behind the construction of the model without getting bogged down in unnecessary implementation details. The source code is quite long, and a line-by-line explanation of the code would be too cumbersome.
Note
See https://www.tensorflow.org/versions/r0.8/tutorials/recurrent/index.html for other references.
Download the data
You can download the data from the web page http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz and then extract the data folder. The dataset is preprocessed and contains 10,000 different words, including the end-of-sentence marker and a special symbol (<unk>) for rare words. In reader.py, we convert all of them to unique integer identifiers to make it easy for the neural network to process.
To extract a .tgz file with tar, you need to use the following:
tar -xvzf /path/to/yourfile.tgz
Building the model
This model implements an RNN architecture using LSTM. In fact, it extends the RNN architecture by including storage units that allow saving information regarding long-term temporal dependencies.
The TensorFlow library allows you to create an LSTM through the following command:
lstm = rnn_cell.BasicLSTMCell(size)
Here, size is the number of units in the LSTM. The LSTM memory is initialized to zero:
state = tf.zeros([batch_size, lstm.state_size])
In the course of the computation, after each word examined the state value is updated with the output value. The following is the pseudocode list of the implemented steps:
loss = 0.0
for current_batch_of_words in words_in_dataset:
    output, state = lstm(current_batch_of_words, state)
The output is then used to make a prediction of the next word:
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)
The loss function minimizes the average negative log probability of the target words; it is the TensorFlow function:
tf.nn.seq2seq.sequence_loss_by_example
It computes the average per-word perplexity; this value measures the accuracy of the model (lower values correspond to better performance) and will be monitored throughout the training process.
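Per-word perplexity is simply the exponential of the average negative log probability the model assigns to each target word; a model that always assigned probability 1 would have perplexity 1. A sketch on made-up probabilities:

```python
import numpy as np

# p(word_t | history) as assigned by the model to each target word
# in a toy 4-word sequence (illustrative values only).
target_probs = np.array([0.2, 0.1, 0.5, 0.25])

# perplexity = exp(mean negative log probability); lower is better.
perplexity = np.exp(-np.mean(np.log(target_probs)))
print(perplexity)
```

This is why the training log below reports steadily decreasing perplexity values as the model improves.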
Running the code
The implemented model supports three types of configurations: small, medium, and large. The difference between them lies in the size of the LSTMs and the set of hyperparameters used for training. The larger the model, the better the results it should get. The small model should be able to reach a perplexity below 120 on the test set and the large one below 80, though it might take several hours to train.
To execute the model, simply type the following:
python ptb_word_lm.py --data_path=/tmp/simple-examples/data/ --model small
In /tmp/simple-examples/data/, you must have downloaded the data from the PTB dataset.
The following list shows the run after 8 hours of training (13 epochs for a small configuration):
Epoch: 1 Learning rate: 1.000
0.004 perplexity: 5263.762 speed: 391 wps
0.104 perplexity: 837.607 speed: 429 wps
0.204 perplexity: 617.207 speed: 442 wps
0.304 perplexity: 498.160 speed: 438 wps
0.404 perplexity: 430.516 speed: 436 wps
0.504 perplexity: 386.339 speed: 427 wps
0.604 perplexity: 348.393 speed: 431 wps
0.703 perplexity: 322.351 speed: 432 wps
0.803 perplexity: 301.630 speed: 431 wps
0.903 perplexity: 282.417 speed: 434 wps
Epoch: 1 Train Perplexity: 268.124
Epoch: 1 Valid Perplexity: 180.210
Epoch: 2 Learning rate: 1.000
0.004 perplexity: 209.082 speed: 448 wps
0.104 perplexity: 150.589 speed: 437 wps
0.204 perplexity: 157.965 speed: 436 wps
0.304 perplexity: 152.896 speed: 453 wps
0.404 perplexity: 150.299 speed: 458 wps
0.504 perplexity: 147.984 speed: 462 wps
0.604 perplexity: 143.367 speed: 462 wps
0.703 perplexity: 141.246 speed: 446 wps
0.803 perplexity: 139.299 speed: 436 wps
0.903 perplexity: 135.632 speed: 435 wps
Epoch: 2 Train Perplexity: 133.576
Epoch: 2 Valid Perplexity: 143.072
............................................................
Epoch: 12 Learning rate: 0.008
0.004 perplexity: 57.011 speed: 347 wps
0.104 perplexity: 41.305 speed: 356 wps
0.204 perplexity: 45.136 speed: 356 wps
0.304 perplexity: 43.386 speed: 357 wps
0.404 perplexity: 42.624 speed: 358 wps
0.504 perplexity: 41.980 speed: 358 wps
0.604 perplexity: 40.549 speed: 357 wps
0.703 perplexity: 39.943 speed: 357 wps
0.803 perplexity: 39.287 speed: 358 wps
0.903 perplexity: 37.949 speed: 359 wps
Epoch: 12 Train Perplexity: 37.125
Epoch: 12 Valid Perplexity: 123.571
Epoch: 13 Learning rate: 0.004
0.004 perplexity: 56.576 speed: 365 wps
0.104 perplexity: 40.989 speed: 358 wps
0.204 perplexity: 44.809 speed: 358 wps
0.304 perplexity: 43.082 speed: 356 wps
0.404 perplexity: 42.332 speed: 356 wps
0.504 perplexity: 41.694 speed: 356 wps
0.604 perplexity: 40.275 speed: 357 wps
0.703 perplexity: 39.673 speed: 356 wps
0.803 perplexity: 39.021 speed: 356 wps
0.903 perplexity: 37.690 speed: 356 wps
Epoch: 13 Train Perplexity: 36.869
Epoch: 13 Valid Perplexity: 123.358
Test Perplexity: 117.171
As you can see, the perplexity becomes lower after each epoch.
Summary
In this chapter, we gave an overview of deep learning techniques, examining two of the deep learning architectures in use, CNNs and RNNs. Through the TensorFlow library, we developed a convolutional neural network architecture for an image classification problem. The last part of the chapter was devoted to RNNs, where we described TensorFlow's tutorial for RNNs, in which an LSTM network is built to predict the next word in an English sentence.
The next chapter shows the TensorFlow facilities for GPU computing and introduces TensorFlow Serving, a high-performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow.
Chapter 6. GPU Programming and Serving with TensorFlow
In this chapter, we will cover the following topics:
GPU programming
TensorFlow Serving:
How to install TensorFlow Serving
How to use TensorFlow Serving
How to load and export a TensorFlow model
GPU programming
In Chapter 5, Deep Learning, where we trained a recurrent neural network (RNN) for an NLP application, we saw that deep learning applications can be computationally intensive. However, you can reduce the training time by using parallel programming techniques through a graphics processing unit (GPU). In fact, the computational resources of modern graphics units enable them to execute code portions in parallel, ensuring high performance.
The GPU programming model is a programming strategy that consists of offloading work from the CPU to a GPU to accelerate the execution of a variety of applications. The range of applications of this strategy is very large and is growing day by day; GPUs are currently able to reduce the execution time of applications across different platforms, from cars to mobile phones, and from tablets to drones and robots.
The following diagram shows how the GPU programming model works. In the application, there are calls that tell the CPU to hand specific parts of the code over to the GPU and let it run them to get higher execution speed. The speedup for such parts comes from the GPU architecture: a GPU has many Streaming Multiprocessors (SMs), each with many computational cores. These cores are capable of performing ALU and other operations following the Single Instruction Multiple Thread (SIMT) model, which reduces the execution time drastically.
In the GPU programming model, some pieces of code are executed sequentially on the CPU, while other parts are executed in parallel by the GPU
TensorFlow possesses capabilities that let you take advantage of this programming model (if you have an NVIDIA GPU); the package version that supports GPUs requires the Cuda Toolkit 7.0 and cuDNN 6.5 v2.
Note
For the installation of the Cuda environment, we suggest referring to the Cuda installation page: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#axzz49w1XvzNj
TensorFlow refers to these devices in the following way:
/cpu:0: To reference the server CPU
/gpu:0: The server's GPU, if there is only one
/gpu:1: The second GPU of the server, and so on
To find out which devices are assigned to our operations and tensors, we need to create the session with the log_device_placement option set to True.
Consider the following example.
We create a computational graph; a and b will be two matrices:
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
In c we put the matrix multiplication of these two input tensors:
c = tf.matmul(a, b)
Then we build a session with log_device_placement set to True:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
Finally, we launch the session:
print sess.run(c)
You should see the following output:
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
If you would like a particular operation to run on a device of your choice instead of what's automatically selected for you, you can use tf.device to create a device context, so that all the operations within that context will have the same device assignment.
Let's create the same computational graph using the tf.device instruction:
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
Again, we build the session graph and launch it:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print sess.run(c)
You will see that now a and b are assigned to cpu:0:
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
If you have more than one GPU, you can select one explicitly with tf.device; setting allow_soft_placement to True in the configuration options when creating the session lets TensorFlow fall back automatically to a supported device if the requested one is not available.
TensorFlow Serving
Serving is a TensorFlow package that has been developed to take machine learning models into production systems. It means that a developer can use TensorFlow Serving's APIs to build a server to serve the implemented model.
The served model will be able to make inferences and predictions on the data presented by its clients each time, allowing the model to be improved.
To communicate with the serving system, the clients use a high-performance open source remote procedure call (RPC) interface developed by Google, called gRPC.
The typical pipeline (see the following figure) is that training data is fed to the learner, which outputs a model. After being validated, it is ready to be deployed to the TensorFlow Serving system. It is quite common to launch and iterate on our model over time, as new data becomes available or as you improve the model.
TensorFlow Serving pipeline
How to install TensorFlow Serving
To compile and use TensorFlow Serving, you need to set up some prerequisites.
Bazel
TensorFlow Serving requires Bazel 0.2.0 (http://www.bazel.io/) or higher. Download bazel-0.2.0-installer-linux-x86_64.sh.
Note
Bazel is a tool that automates software builds and tests. Supported build tasks include running compilers and linkers to produce executable programs and libraries, and assembling deployable packages.
Run the following commands:
chmod +x bazel-0.2.0-installer-linux-x86_64.sh
./bazel-0.2.0-installer-linux-x86_64.sh --user
Finally, set up your environment by exporting this in your ~/.bashrc file:
export PATH="$PATH:$HOME/bin"
gRPC
Our tutorials use gRPC (0.13 or higher) as the RPC framework.
Note
You can find other references at https://github.com/grpc.
TensorFlow Serving dependencies
To install the TensorFlow Serving dependencies, execute the following:
sudo apt-get update && sudo apt-get install -y \
        build-essential \
        curl \
        git \
        libfreetype6-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        python-dev \
        python-numpy \
        python-pip \
        software-properties-common \
        swig \
        zip \
        zlib1g-dev
Then configure TensorFlow by running the following commands:
cd tensorflow
./configure
cd ..
Install Serving
Use Git to clone the repository:
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving
The --recurse-submodules option is required to fetch TensorFlow, gRPC, and the other libraries that TensorFlow Serving depends on. To build TensorFlow Serving, you must use Bazel:
bazel build tensorflow_serving/...
The binaries will be placed in the bazel-bin directory, and can be run using a command such as the following:
bazel-bin/tensorflow_serving/example/mnist_inference
Finally, you can test the installation by executing the following command:
bazel test tensorflow_serving/...
How to use TensorFlow Serving
In this tutorial, we will show how to export a trained TensorFlow model and build a server to serve the exported model. The implemented model is a Softmax Regression model for handwritten image classification (MNIST data).
The code consists of two parts:
A Python file (mnist_export.py) that trains and exports the model
A C++ file (mnist_inference.cc) that loads the exported model and runs a gRPC service to serve it
In the following sections, we report the basic steps to use TensorFlow Serving. For other references, you can view https://tensorflow.github.io/serving/serving_basic.
Training and exporting the TensorFlow model
As you can see in mnist_export.py, the training is done the same way as in the MNIST beginners tutorial, available at the following link:
https://www.tensorflow.org/versions/r0.9/tutorials/mnist/beginners/index.html
The TensorFlow graph is launched in the TensorFlow session sess, with the input tensor (image) as x and the output tensor (Softmax score) as y. Then we use the TensorFlow Serving exporter to export the model; it builds a snapshot of the trained model so that it can be loaded later for inference. Let's now see the main functions used to export a trained model.
Import the exporter to serialize the model:
from tensorflow_serving.session_bundle import exporter
Then you must define saver, using the TensorFlow function tf.train.Saver with the sharded parameter equal to True:
saver = tf.train.Saver(sharded=True)
saver is used to serialize graph variable values to the model export so that they can be properly restored later.
The next step is to define model_exporter:
model_exporter = exporter.Exporter(saver)
signature = exporter.classification_signature(input_tensor=x, scores_tensor=y)
model_exporter.init(sess.graph.as_graph_def(),
                    default_graph_signature=signature)
model_exporter takes the following two arguments:
sess.graph.as_graph_def() is the protobuf of the graph. Exporting will serialize the protobuf to the model export so that the TensorFlow graph can be properly restored later.
default_graph_signature=signature specifies a model export signature. The signature specifies what type of model is being exported, and the input/output tensors to bind to when running inference. In this case, you use exporter.classification_signature to specify that the model is a classification model.
Finally, we create our export:
model_exporter.export(export_path, tf.constant(FLAGS.export_version), sess)
model_exporter.export takes the following arguments:
export_path is the path of the export directory. Export will create the directory if it does not exist.
tf.constant(FLAGS.export_version) is a tensor that specifies the version of the model. You should specify a larger integer value when exporting a newer version of the same model. Each version will be exported to a different sub-directory under the given path.
sess is the TensorFlow session that holds the trained model you are exporting.
Running a session
To export the model, first clear the export directory:
$> rm -rf /tmp/mnist_model
Then, using Bazel, build the mnist_export example:
$> bazel build //tensorflow_serving/example:mnist_export
Finally, you can run the following example:
$> bazel-bin/tensorflow_serving/example/mnist_export /tmp/mnist_model
Training model...
Done training!
Exporting trained model to /tmp/mnist_model
Done exporting!
Looking in the export directory, we should have a sub-directory for each exported version of the model:
$> ls /tmp/mnist_model
00000001
The corresponding sub-directory has the default value of 1, because we specified tf.constant(FLAGS.export_version) as the model version earlier, and FLAGS.export_version has the default value of 1.
Each version sub-directory contains the following files:
export.meta is the serialized tensorflow::MetaGraphDef of the model. It includes the graph definition of the model, as well as metadata of the model, such as signatures.
export-?????-of-????? are the files that hold the serialized variables of the graph.
$> ls /tmp/mnist_model/00000001
checkpoint export-00000-of-00001 export.meta
Loading and exporting a TensorFlow model
The C++ code for loading the exported TensorFlow model is in the main() function in mnist_inference.cc. Here we report an excerpt; we do not consider the parameters for batching. If you want to adjust the maximum batch size, timeout threshold, or the number of background threads used for batched inference, you can do so by setting more values in BatchingParameters:
int main(int argc, char** argv)
{
  SessionBundleConfig session_bundle_config;
  ... // Here batching parameters
  std::unique_ptr<SessionBundleFactory> bundle_factory;
  TF_QCHECK_OK(
      SessionBundleFactory::Create(session_bundle_config,
                                   &bundle_factory));
  std::unique_ptr<SessionBundle> bundle(new SessionBundle);
  TF_QCHECK_OK(bundle_factory->CreateSessionBundle(bundle_path,
                                                   &bundle));
  ......
  RunServer(FLAGS_port, std::move(bundle));
  return 0;
}
SessionBundle is a component of TensorFlow Serving. Let's consider the include file SessionBundle.h:
struct SessionBundle {
  std::unique_ptr<tensorflow::Session> session;
  tensorflow::MetaGraphDef meta_graph_def;
};
The session parameter is a TensorFlow session that has the original graph, with the necessary variables properly restored.
SessionBundleFactory::CreateSessionBundle() loads the exported TensorFlow model from bundle_path and creates a SessionBundle object for running inference with the model.
RunServer brings up a gRPC server that exports a single Classify() API.
Each inference request will be processed in the following steps:
1. Verify the input. The server expects exactly one MNIST-format image for each inference request.
2. Transform the input into an inference input tensor and create an output tensor placeholder.
3. Run the inference.
To run an inference, you must type the following commands:
$> bazel build //tensorflow_serving/example:mnist_inference
$> bazel-bin/tensorflow_serving/example/mnist_inference --port=9000 /tmp/mnist_model/00000001
Test the server
To test the server, we use the mnist_client.py utility (https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/mnist_client.py).
This client downloads MNIST test data, sends it as requests to the server, and calculates the inference error rate.
To run it, type the following commands:
$> bazel build //tensorflow_serving/example:mnist_client
$> bazel-bin/tensorflow_serving/example/mnist_client --num_tests=1000 --server=localhost:9000
Inference error rate: 10.5%
The result confirms that the server loads and runs the trained model successfully. In fact, a 10.5% inference error rate on 1,000 images gives us about 89.5% accuracy, close to the roughly 91% expected from a trained Softmax model.
Summary
We described two important features of TensorFlow in this chapter. The first was the possibility of using the programming model known as GPU computing, with which it becomes possible to speed up code (for example, the training phase of a neural network). The second part of the chapter was devoted to describing the TensorFlow Serving framework. It is a high-performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow. This powerful framework can run multiple large-scale models that change over time, based on real-world data, enabling a more efficient use of GPU resources and allowing developers to improve their own machine learning models.