Scientific Programming with the SciPy Stack

61
Scientific Programming with the SciPy Stack Shaun Walbridge Kevin Butler

Transcript of Scientific Programming with the SciPy Stack

Page 1: Scientific Programming with the SciPy Stack

ScientificProgrammingwiththeSciPyStackShaunWalbridge

KevinButler

Page 3: Scientific Programming with the SciPy Stack

ScientificComputing

Page 4: Scientific Programming with the SciPy Stack

Theapplicationofcomputationalmethodstoallaspectsoftheprocessofscientificinvestigation–dataacquisition,datamanagement,analysis,visualization,andsharingofmethodsandresults.

ScientificComputing

Page 5: Scientific Programming with the SciPy Stack

Python

Page 6: Scientific Programming with the SciPy Stack

WhyPython?Accessiblefornew-comers,andthe

Extensivepackagecollection(56kon ),broaduser-baseStronggluelanguageusedtobindtogethermanyenvironments,bothopensourceandcommercialOpensourcewithliberallicense—dowhatyouwant

mosttaughtfirstlanguageinUSuniversites

PyPI

Page 7: Scientific Programming with the SciPy Stack

WhyPython?Accessiblefornew-comers,andthe

Extensivepackagecollection(56kon ),broaduser-baseStronggluelanguageusedtobindtogethermanyenvironments,bothopensourceandcommercialOpensourcewithliberallicense—dowhatyouwant

BrandnewtoPython?ThistalkmaybechallengingResourcesincludematerialsthatforgettingstarted

mosttaughtfirstlanguageinUSuniversites

PyPI

Page 8: Scientific Programming with the SciPy Stack

PythoninArcGISPythonAPIfordrivingArcGISDesktopandServerAfullyintegratedmodule:import arcpyInteractiveWindow,PythonAddins,PythonTooboxesExtensions:

SpatialAnalyst:arcpy.saMapDocument:arcpy.mappingNetworkAnalyst:arcpy.naGeostatistics:arcpy.gaFastcursors:arcpy.da

Page 9: Scientific Programming with the SciPy Stack

PythoninArcGISPython3.4inPro( )

arcpy.mpinsteadofarcpy.mappingContinuetoaddmodules:NetCDF4,xlrd,xlwt,PyPDF2,dateutil,pip

,witha usingSciPyforontheflyvisualizations

DesktopvsProPython

Pythonrasterfunction repositoryofexamples

Page 10: Scientific Programming with the SciPy Stack

PythoninArcGISHere,focusonSciPystack,what’sincludedoutoftheboxMovetowardmaintainable,reusablecodeandbeyondthe“one-off”Recurringtheme:multi-dimensionaldatastructuresRelatedtalkstoday:

2:30PM,thisroom(SantaRosa)

4:00PM,MesquiteGH

GettingDataSciencewithRandArcGIS

PythoninArcGISUsingtheCondaDistribution

Page 11: Scientific Programming with the SciPy Stack

SciPy

Page 12: Scientific Programming with the SciPy Stack

WhySciPy?Mostlanguagesdon’tsupportthingsusefulforscience,e.g.:

VectorprimitivesComplexnumbersStatistics

Objectorientedprogrammingisn’talwaystherightparadigmforanalysisapplications,butistheonlywaytogoinmanymodernlanguagesSciPybringsthepiecesthatmatterforscientificproblemstoPython.

Page 13: Scientific Programming with the SciPy Stack

IncludedSciPyPackage KLOC Contributors Stars

118 426 3359

7 79 912

236 405 2683

183 407 5834

387 375 2150

243 427 2672

Totals 1174 1784

matplotlib

Nose

NumPy

Pandas

SciPy

SymPy

Page 14: Scientific Programming with the SciPy Stack

TestingwithNose—aPythonframeworkfortesting

Testsimproveyourproductivity,andcreaterobustcodeNosebuildsonunittestframework,extendsittomaketestingeasy.Pluginarchitecture, andcanbeextendedwith .

Nose

includesanumberofpluginsthird-partyplugins

Page 15: Scientific Programming with the SciPy Stack

1. Anarrayobjectofarbitraryhomogeneousitems2. Fastmathematicaloperationsoverarrays3. RandomNumberGeneration

,CC-BYSciPyLectures

Page 16: Scientific Programming with the SciPy Stack

ArcGIS+NumPyArcGISandNumPycaninteroperateonraster,table,andfeaturedata.SeeIn-memorydatamodel.Examplescriptto ifworkingwithlargerdata.

WorkingwithNumPyinArcGISprocessbyblocks

Page 17: Scientific Programming with the SciPy Stack

ArcGIS+NumPy

Page 18: Scientific Programming with the SciPy Stack

PlottinglibraryandAPIforNumPydataMatplotlibGallery

Page 20: Scientific Programming with the SciPy Stack

SciPy:GeometricMean

CalculatingageometricmeanofanentirerasterusingSciPy( )source

import scipy.stats rast_in = 'data/input_raster.tif'rast_as_numpy_array = arcpy.RasterToNumPyArray(rast_in)raster_geometric_mean = scipy.stats.stats.gmean( rast_as_numpy_array, axis=None)

Page 21: Scientific Programming with the SciPy Stack

UseCase:BenthicTerrainModeler

Page 22: Scientific Programming with the SciPy Stack

BenthicTerrainModeler

APythonAdd-inandPythontoolboxforgeomorphologyOpensource,canborrowcodeforyourownprojects:

Activecommunityofusers,primarilymarinescientists,butalsousefulforotherapplications

https://github.com/EsriOceans/btm

Page 23: Scientific Programming with the SciPy Stack

LightweightSciPyIntegration

Usingscipy.ndimagetoperformbasicmultiscaleanalysisUsingscipy.statstocomputecircularstatistics

Page 24: Scientific Programming with the SciPy Stack

LightweightSciPyIntegration

Examplesource

import arcpyimport scipy.ndimage as ndfrom matplotlib import pyplot as plt

ras = "data/input_raster.tif"r = arcpy.RasterToNumPyArray(ras, "", 200, 200, 0)

fig = plt.figure(figsize=(10, 10))

Page 25: Scientific Programming with the SciPy Stack

LightweightSciPyIntegration

for i in xrange(25): size = (i+1) * 3 print "running {}".format(size) med = nd.median_filter(r, size)

a = fig.add_subplot(5, 5,i+1) plt.imshow(med, interpolation='nearest') a.set_title('{}x{}'.format(size, size)) plt.axis('off') plt.subplots_adjust(hspace = 0.1) prev = med

plt.savefig("btm-scale-compare.png", bbox_inches='tight')

Page 26: Scientific Programming with the SciPy Stack
Page 27: Scientific Programming with the SciPy Stack

SciPyStatistics

Breakdownaspectintosin()andcos()variablesAspectisacircularvariable—withoutthis0and360areoppositesinsteadofbeingthesamevalue

Page 28: Scientific Programming with the SciPy Stack

SciPyStatisticsSummarystatisticsfromSciPyincludecircularstatistics( ).source

import scipy.stats.morestats

ras = "data/aspect_raster.tif"r = arcpy.RasterToNumPyArray(ras)

morestats.circmean(r)morestats.circstd(r)morestats.circvar(r)

Page 29: Scientific Programming with the SciPy Stack

Demo:SciPy

Page 30: Scientific Programming with the SciPy Stack

MultidimensionalData

Page 31: Scientific Programming with the SciPy Stack

NetCDF4Fast,HDF5andNetCDF4read+writesupport,OPeNDAPHeirarchicaldatastructuresWidelyusedinmeterology,oceanography,climatecommunitiesEasier:MultidimensionalToolbox,butcanbeuseful

( )Source

import netCDF4nc = netCDF4.Dataset('test.nc', 'r', format='NETCDF4')print nc.file_format# outputs: NETCDF4nc.close()

Page 32: Scientific Programming with the SciPy Stack

MultidimensionalImprovements

Multidimensionalformats:HDF,GRIB,NetCDFAccessviaOPeNDAP,vectorrenderer,RasterFunctionChaining

Multi-DsupportedasWMS,andinMosaicdatasets(10.2.1+)Anexamplewhichcombinesmutli-Dwithtime

Page 33: Scientific Programming with the SciPy Stack

Pandas

Page 34: Scientific Programming with the SciPy Stack

PanelData—likeR"dataframes"BringarobustdataanalysisworkflowtoPythonDataframesarefundamental—treattabular(andmulti-dimensional)dataasalabeled,indexedseriesofobservations.

Page 35: Scientific Programming with the SciPy Stack

( )Source

import pandas

data = pandas.read_csv('data/season-ratings.csv')data.columns

Index([u'season', u'households', u'rank', u'tv_households', \ u'net_indep', u'primetime_pct'], dtype='object')

Page 36: Scientific Programming with the SciPy Stack

majority_simpsons = data[data.primetime_pct > 50]

season households tv_households net_indep primetime_pct0 1 13.4m[41] 92.1 51.6 80.7511741 2 12.2m[n2] 92.1 50.4 78.5046732 3 12.0m[n3] 92.1 48.4 76.5822783 4 12.1m[48] 93.1 46.2 72.7559064 5 10.5m[n4] 93.1 46.5 72.0930235 6 9.0m[50] 95.4 46.1 71.0323576 7 8.0m[51] 95.9 46.6 70.7132027 8 8.6m[52] 97.0 44.2 67.5840988 9 9.1m[53] 98.0 42.3 64.3835629 10 7.9m[54] 99.4 39.9 60.91603110 11 8.2m[55] 100.8 38.1 57.46606311 12 14.7m[56] 102.2 36.8 53.95894412 13 12.4m[57] 105.5 35.0 51.094891

Page 37: Scientific Programming with the SciPy Stack

PandasDemo

Page 38: Scientific Programming with the SciPy Stack

SymPy

Page 39: Scientific Programming with the SciPy Stack

AComputerAlgebraSystem(CAS),solvemathequations( )source

from sympy import *x = symbol('x')eq = Eq(x**3 + 2*x**2 + 4*x + 8, 0)

solve(eq, x)

Page 40: Scientific Programming with the SciPy Stack

SymPyDemo

Page 41: Scientific Programming with the SciPy Stack

WhereandHowFast?

Page 42: Scientific Programming with the SciPy Stack

WhereCanIRunThis?

Now:ArcGISPro(64-bit)ArcGISDesktopat10.4:32-bit,BackgroundGeoprocessing(64-bit),Server(64-bit),Engine(32-bit)

Bothnowshipwith (sansIPython)MKLenabledNumPyandSciPyeverywhereOlderreleases:NumPy:ArcGIS9.2+,matplotlib:ArcGIS10.1+,SciPy:10.4+,Pandas:10.4+

Upcoming:IPythonCondaformanagingfullPythonenvironments

StandalonePythonInstallforPro

ScipyStack

Page 43: Scientific Programming with the SciPy Stack

HowDoesItperform?BuiltwithIntel’s andcompilers—highlyoptimizedFortranandCunderthehood.Automatedparallelizationforexecutedcode

MathKernelLibrary(MKL)

MKLPerformanceChart

Page 44: Scientific Programming with the SciPy Stack

fromfutureimport*

Page 45: Scientific Programming with the SciPy Stack

OpeningDoorsMachinelearning(scikit-learn,scikit-image,...)Deeplearning(theano,...)Bayesianstatistics(PyMC)

MarkovChainMonteCarlo(MCMC)Frequentiststatistics(statsmodels)

Page 46: Scientific Programming with the SciPy Stack

Resources

Page 48: Scientific Programming with the SciPy Stack

NewtoPythonCourses:

Books:

ProgrammingforEverybodyCodecademy:PythonTrack

LearnPythontheHardWayHowtoThinkLikeaComputerScientist

Page 50: Scientific Programming with the SciPy Stack

ScientificCourses:

PythonScientificLectureNotesHighPerformanceScientificComputingCodingtheMatrix:LinearAlgebrathroughComputerScienceApplicationsTheDataScientist’sToolbox

Page 51: Scientific Programming with the SciPy Stack

ScientificBooks:

Free:

verycompellingbookonBayesianmethodsinPython,usesSciPy+PyMC.

ProbabilisticProgramming&BayesianMethodsforHackers

KalmanandBayesianFiltersinPython

Page 52: Scientific Programming with the SciPy Stack

ScientificPaid:

HowtouselinearalgebraandPythontosolveamazingproblems.

ThecannonicalbookonPandasandanalysis.

CodingtheMatrix

PythonforDataAnalysis:DataWranglingwithPandas,NumPy,andIPython

Page 53: Scientific Programming with the SciPy Stack

PackagesOnlyrequireSciPyStack:

Scikit-learn:

IncludesSVMs,canusethoseforimageprocessingamongotherthings...

FilterPy,Kalmanfilteringandoptimalestimation:

Lecturematerial

FilterPyonGitHubAnextensivelistofmachinelearningpackages

Page 54: Scientific Programming with the SciPy Stack

Code

AnopensourcecollectionoffunctionchainstoshowhowtodocomplexthingsusingNumPy+scipyontheflyforvisualizationpurposes

withahandfulofdescriptivestatisticsincludedinPython3.4.TIP:WantacodebasethatrunsinPython2and3?

,whichhelpsmaintainasinglecodebasethatsupportsboth.Includesthefuturizescripttoinitiallyaprojectwrittenforoneversion.

ArcPy+SciPyonGithubraster-functions

statisticslibrary

Checkoutfuture

Page 55: Scientific Programming with the SciPy Stack

ScientificArcGISExtensions

CombinesPython,R,andMATLABtosolveawidevarietyofproblems

speciesdistribution&maximumentropymodels

PySALArcGISToolboxMovementEcologyToolsforArcGIS(ArcMET)MarineGeospatialEcologyTools(MGET)

SDMToolbox

BenthicTerrainModelerGeospatialModelingEnvironmentCircuitScape

Page 56: Scientific Programming with the SciPy Stack

ConferencesThelargestgatheringofPythonistasintheworld

AmeetingofScientificPythonusersfromallwalks

ThePythoneventforPythonandGeoenthusiasts

TalksfromPythonconferencesaroundtheworldavailablefreelyonline.

PyCon

SciPy

GeoPython

PyVideo

PyVideoGIStalks

Page 57: Scientific Programming with the SciPy Stack

Closing

Page 58: Scientific Programming with the SciPy Stack

ThanksGeoprocessingTeamThemanyamazingcontributorstotheprojectsdemonstratedhere.

Getinvolved!AllareonGitHubandhappilyacceptcontributions.

Page 59: Scientific Programming with the SciPy Stack

RateThisSessioniOS,Android:Feedbackfromwithintheapp

Page 60: Scientific Programming with the SciPy Stack

RateThisSessioniOS,Android:Feedbackfromwithintheapp

WindowsPhone,ornosmartphone?Cuneiformtabletsaccepted.

Page 61: Scientific Programming with the SciPy Stack