Scientific Programming with the SciPy Stack
Transcript of Scientific Programming with the SciPy Stack
ScientificProgrammingwiththeSciPyStackShaunWalbridge
KevinButler
https://github.com/scw/scipy-devsummit-2016-talk
HandoutPDF
HighQualityPDF(28MB)
ResourcesSection
ScientificComputing
Theapplicationofcomputationalmethodstoallaspectsoftheprocessofscientificinvestigation–dataacquisition,datamanagement,analysis,visualization,andsharingofmethodsandresults.
ScientificComputing
Python
WhyPython?Accessiblefornew-comers,andthe
Extensivepackagecollection(56kon ),broaduser-baseStronggluelanguageusedtobindtogethermanyenvironments,bothopensourceandcommercialOpensourcewithliberallicense—dowhatyouwant
mosttaughtfirstlanguageinUSuniversites
PyPI
WhyPython?Accessiblefornew-comers,andthe
Extensivepackagecollection(56kon ),broaduser-baseStronggluelanguageusedtobindtogethermanyenvironments,bothopensourceandcommercialOpensourcewithliberallicense—dowhatyouwant
BrandnewtoPython?ThistalkmaybechallengingResourcesincludematerialsthatforgettingstarted
mosttaughtfirstlanguageinUSuniversites
PyPI
PythoninArcGISPythonAPIfordrivingArcGISDesktopandServerAfullyintegratedmodule:import arcpyInteractiveWindow,PythonAddins,PythonTooboxesExtensions:
SpatialAnalyst:arcpy.saMapDocument:arcpy.mappingNetworkAnalyst:arcpy.naGeostatistics:arcpy.gaFastcursors:arcpy.da
PythoninArcGISPython3.4inPro( )
arcpy.mpinsteadofarcpy.mappingContinuetoaddmodules:NetCDF4,xlrd,xlwt,PyPDF2,dateutil,pip
,witha usingSciPyforontheflyvisualizations
DesktopvsProPython
Pythonrasterfunction repositoryofexamples
PythoninArcGISHere,focusonSciPystack,what’sincludedoutoftheboxMovetowardmaintainable,reusablecodeandbeyondthe“one-off”Recurringtheme:multi-dimensionaldatastructuresRelatedtalkstoday:
2:30PM,thisroom(SantaRosa)
4:00PM,MesquiteGH
GettingDataSciencewithRandArcGIS
PythoninArcGISUsingtheCondaDistribution
SciPy
WhySciPy?Mostlanguagesdon’tsupportthingsusefulforscience,e.g.:
VectorprimitivesComplexnumbersStatistics
Objectorientedprogrammingisn’talwaystherightparadigmforanalysisapplications,butistheonlywaytogoinmanymodernlanguagesSciPybringsthepiecesthatmatterforscientificproblemstoPython.
IncludedSciPyPackage KLOC Contributors Stars
118 426 3359
7 79 912
236 405 2683
183 407 5834
387 375 2150
243 427 2672
Totals 1174 1784
matplotlib
Nose
NumPy
Pandas
SciPy
SymPy
TestingwithNose—aPythonframeworkfortesting
Testsimproveyourproductivity,andcreaterobustcodeNosebuildsonunittestframework,extendsittomaketestingeasy.Pluginarchitecture, andcanbeextendedwith .
Nose
includesanumberofpluginsthird-partyplugins
1. Anarrayobjectofarbitraryhomogeneousitems2. Fastmathematicaloperationsoverarrays3. RandomNumberGeneration
,CC-BYSciPyLectures
ArcGIS+NumPyArcGISandNumPycaninteroperateonraster,table,andfeaturedata.SeeIn-memorydatamodel.Examplescriptto ifworkingwithlargerdata.
WorkingwithNumPyinArcGISprocessbyblocks
ArcGIS+NumPy
PlottinglibraryandAPIforNumPydataMatplotlibGallery
Computationalmethodsfor:
Integration( )Optimization( )Interpolation( )FourierTransforms( )SignalProcessing( )LinearAlgebra( )Spatial( )Statistics( )Multidimensionalimageprocessing( )
scipy.integratescipy.optimizescipy.interpolate
scipy.fftpackscipy.signal
scipy.linalgscipy.spatialscipy.stats
scipy.ndimage
SciPy:GeometricMean
CalculatingageometricmeanofanentirerasterusingSciPy( )source
import scipy.stats rast_in = 'data/input_raster.tif'rast_as_numpy_array = arcpy.RasterToNumPyArray(rast_in)raster_geometric_mean = scipy.stats.stats.gmean( rast_as_numpy_array, axis=None)
UseCase:BenthicTerrainModeler
BenthicTerrainModeler
APythonAdd-inandPythontoolboxforgeomorphologyOpensource,canborrowcodeforyourownprojects:
Activecommunityofusers,primarilymarinescientists,butalsousefulforotherapplications
https://github.com/EsriOceans/btm
LightweightSciPyIntegration
Usingscipy.ndimagetoperformbasicmultiscaleanalysisUsingscipy.statstocomputecircularstatistics
LightweightSciPyIntegration
Examplesource
import arcpyimport scipy.ndimage as ndfrom matplotlib import pyplot as plt
ras = "data/input_raster.tif"r = arcpy.RasterToNumPyArray(ras, "", 200, 200, 0)
fig = plt.figure(figsize=(10, 10))
LightweightSciPyIntegration
for i in xrange(25): size = (i+1) * 3 print "running {}".format(size) med = nd.median_filter(r, size)
a = fig.add_subplot(5, 5,i+1) plt.imshow(med, interpolation='nearest') a.set_title('{}x{}'.format(size, size)) plt.axis('off') plt.subplots_adjust(hspace = 0.1) prev = med
plt.savefig("btm-scale-compare.png", bbox_inches='tight')
SciPyStatistics
Breakdownaspectintosin()andcos()variablesAspectisacircularvariable—withoutthis0and360areoppositesinsteadofbeingthesamevalue
SciPyStatisticsSummarystatisticsfromSciPyincludecircularstatistics( ).source
import scipy.stats.morestats
ras = "data/aspect_raster.tif"r = arcpy.RasterToNumPyArray(ras)
morestats.circmean(r)morestats.circstd(r)morestats.circvar(r)
Demo:SciPy
MultidimensionalData
NetCDF4Fast,HDF5andNetCDF4read+writesupport,OPeNDAPHeirarchicaldatastructuresWidelyusedinmeterology,oceanography,climatecommunitiesEasier:MultidimensionalToolbox,butcanbeuseful
( )Source
import netCDF4nc = netCDF4.Dataset('test.nc', 'r', format='NETCDF4')print nc.file_format# outputs: NETCDF4nc.close()
MultidimensionalImprovements
Multidimensionalformats:HDF,GRIB,NetCDFAccessviaOPeNDAP,vectorrenderer,RasterFunctionChaining
Multi-DsupportedasWMS,andinMosaicdatasets(10.2.1+)Anexamplewhichcombinesmutli-Dwithtime
Pandas
PanelData—likeR"dataframes"BringarobustdataanalysisworkflowtoPythonDataframesarefundamental—treattabular(andmulti-dimensional)dataasalabeled,indexedseriesofobservations.
( )Source
import pandas
data = pandas.read_csv('data/season-ratings.csv')data.columns
Index([u'season', u'households', u'rank', u'tv_households', \ u'net_indep', u'primetime_pct'], dtype='object')
majority_simpsons = data[data.primetime_pct > 50]
season households tv_households net_indep primetime_pct0 1 13.4m[41] 92.1 51.6 80.7511741 2 12.2m[n2] 92.1 50.4 78.5046732 3 12.0m[n3] 92.1 48.4 76.5822783 4 12.1m[48] 93.1 46.2 72.7559064 5 10.5m[n4] 93.1 46.5 72.0930235 6 9.0m[50] 95.4 46.1 71.0323576 7 8.0m[51] 95.9 46.6 70.7132027 8 8.6m[52] 97.0 44.2 67.5840988 9 9.1m[53] 98.0 42.3 64.3835629 10 7.9m[54] 99.4 39.9 60.91603110 11 8.2m[55] 100.8 38.1 57.46606311 12 14.7m[56] 102.2 36.8 53.95894412 13 12.4m[57] 105.5 35.0 51.094891
PandasDemo
SymPy
AComputerAlgebraSystem(CAS),solvemathequations( )source
from sympy import *x = symbol('x')eq = Eq(x**3 + 2*x**2 + 4*x + 8, 0)
solve(eq, x)
SymPyDemo
WhereandHowFast?
WhereCanIRunThis?
Now:ArcGISPro(64-bit)ArcGISDesktopat10.4:32-bit,BackgroundGeoprocessing(64-bit),Server(64-bit),Engine(32-bit)
Bothnowshipwith (sansIPython)MKLenabledNumPyandSciPyeverywhereOlderreleases:NumPy:ArcGIS9.2+,matplotlib:ArcGIS10.1+,SciPy:10.4+,Pandas:10.4+
Upcoming:IPythonCondaformanagingfullPythonenvironments
StandalonePythonInstallforPro
ScipyStack
HowDoesItperform?BuiltwithIntel’s andcompilers—highlyoptimizedFortranandCunderthehood.Automatedparallelizationforexecutedcode
MathKernelLibrary(MKL)
MKLPerformanceChart
fromfutureimport*
OpeningDoorsMachinelearning(scikit-learn,scikit-image,...)Deeplearning(theano,...)Bayesianstatistics(PyMC)
MarkovChainMonteCarlo(MCMC)Frequentiststatistics(statsmodels)
Resources
OtherSessionsHarnessingthePowerofPythoninArcGISUsingtheCondaDistributionGettingDataSciencewithRandArcGISAutomatedLandSurfaceTemperatureEstimationUsingPythonandArcGISProWritingImageProcessingAlgorithmsusingthePythonRasterFunctionPython:WorkingwithScientificDataPython:DevelopingGeoprocessingToolsIntegratingOpen-sourceStatisticalPackageswithArcGIS
NewtoPythonCourses:
Books:
ProgrammingforEverybodyCodecademy:PythonTrack
LearnPythontheHardWayHowtoThinkLikeaComputerScientist
GISFocusedPythonScriptingforArcGISArcPyandArcGIS-GeospatialAnalysiswithPythonPythonDevelopersGeoNetCommunityGISStackexchange
ScientificCourses:
PythonScientificLectureNotesHighPerformanceScientificComputingCodingtheMatrix:LinearAlgebrathroughComputerScienceApplicationsTheDataScientist’sToolbox
ScientificBooks:
Free:
verycompellingbookonBayesianmethodsinPython,usesSciPy+PyMC.
ProbabilisticProgramming&BayesianMethodsforHackers
KalmanandBayesianFiltersinPython
ScientificPaid:
HowtouselinearalgebraandPythontosolveamazingproblems.
ThecannonicalbookonPandasandanalysis.
CodingtheMatrix
PythonforDataAnalysis:DataWranglingwithPandas,NumPy,andIPython
PackagesOnlyrequireSciPyStack:
Scikit-learn:
IncludesSVMs,canusethoseforimageprocessingamongotherthings...
FilterPy,Kalmanfilteringandoptimalestimation:
Lecturematerial
FilterPyonGitHubAnextensivelistofmachinelearningpackages
Code
AnopensourcecollectionoffunctionchainstoshowhowtodocomplexthingsusingNumPy+scipyontheflyforvisualizationpurposes
withahandfulofdescriptivestatisticsincludedinPython3.4.TIP:WantacodebasethatrunsinPython2and3?
,whichhelpsmaintainasinglecodebasethatsupportsboth.Includesthefuturizescripttoinitiallyaprojectwrittenforoneversion.
ArcPy+SciPyonGithubraster-functions
statisticslibrary
Checkoutfuture
ScientificArcGISExtensions
CombinesPython,R,andMATLABtosolveawidevarietyofproblems
speciesdistribution&maximumentropymodels
PySALArcGISToolboxMovementEcologyToolsforArcGIS(ArcMET)MarineGeospatialEcologyTools(MGET)
SDMToolbox
BenthicTerrainModelerGeospatialModelingEnvironmentCircuitScape
ConferencesThelargestgatheringofPythonistasintheworld
AmeetingofScientificPythonusersfromallwalks
ThePythoneventforPythonandGeoenthusiasts
TalksfromPythonconferencesaroundtheworldavailablefreelyonline.
PyCon
SciPy
GeoPython
PyVideo
PyVideoGIStalks
Closing
ThanksGeoprocessingTeamThemanyamazingcontributorstotheprojectsdemonstratedhere.
Getinvolved!AllareonGitHubandhappilyacceptcontributions.
RateThisSessioniOS,Android:Feedbackfromwithintheapp
RateThisSessioniOS,Android:Feedbackfromwithintheapp
WindowsPhone,ornosmartphone?Cuneiformtabletsaccepted.