Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.
Motivation
• I’m interested in (re-)coding a general solver for sc/mcDESPOT relaxometry mapping
  – Open source
  – Extensibility to new/additional sequences with better sensitivity to certain parameters, e.g. B0 and MWF
  – Better parallelization
• But:
  – Large-scale code development in Matlab is cumbersome
  – Matlab is slow
  – C is hard (to write, read, debug)
• This creates a large barrier for others to contribute
Matlab
Pros
• Ubiquitous; code is cross-platform
• Can be fast with vectorized code
• Data visualization
• Quick development time
• Great IDE for general research
  – Poor for large projects
• Many useful native libraries/toolboxes
• Built-in profiling tools
Cons
• Requires a license; not free (though there is Octave)
• Vectorized code is often non-intuitive to write and hard to read
• Slow for general computations
• Limited parallel computing and GPU support
C/C++
Pros
• Fast
• Great IDEs for large coding projects
  – Not as great for general science work
• Strong parallel computing support and CUDA
• Community libraries for scientific computing
• Profiling support depends on the IDE
Cons
• Steep learning curve and long development time
• No built-in data visualization
• Compiled code is platform specific
• A compiler is not installed by default on OSX or Windows
Python
Pros
• Preinstalled on OSX and Linux-based systems
• Readability is a core tenet (“pythonic”)
• Quick development time
• Native parallel computing support (the multiprocessing module) and community GPU modules
• Extensive community support
  – Including neuroimaging-specific packages: Nipype, NiBabel
• Built-in profiling module and some IDE tools
Cons
• Slow for general computation
• Mixed bag of IDEs: some are great for coding, others for research
• Out of the box it’s a poor alternative: no linear algebra or data visualization
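The built-in profiling module mentioned under Pros is cProfile. As a rough illustration, the sketch below (Python 3) profiles two hypothetical helpers, `loop_sum` and `builtin_sum`, that compute the same sum in different ways, and formats the top entries with pstats. The helper names and loop sizes are assumptions made for this example.

```python
import cProfile
import io
import pstats


def loop_sum(n):
    """Hypothetical helper: add 0..n with an explicit Python loop."""
    s = 0
    for i in range(n + 1):
        s += i
    return s


def builtin_sum(n):
    """Hypothetical helper: same result using the built-in sum()."""
    return sum(range(n + 1))


def profile_report(n=1_000_000, top=5):
    """Profile both helpers and return the pstats report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    a = loop_sum(n)
    b = builtin_sum(n)
    profiler.disable()
    assert a == b  # both approaches must agree before timings mean anything
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)  # top entries by cumulative time
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

Sorting by cumulative time surfaces the functions where the program spends most of its time, which is where optimization effort (e.g. moving a loop to NumPy or Cython) is likely to pay off.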
Python & Friends
Cons
• Slow for general computation
• Mixed bag of IDEs: some are great for coding, others for research
• Out of the box it’s a poor alternative: no linear algebra or data visualization
Solutions
• Cython, JIT compilers like PyPy
• There are a few good options out there that I’ve found:
  – Eclipse + PyDev, NetBeans
  – Spyder: closest to MATLAB
  – Sage Math Notebook, IPython: like Mathematica
  – It may come down to preference.
• NumPy + SciPy + Matplotlib = PyLab
  – Sage Math includes these as well as other capabilities like symbolic math and graph theory
Pythonic?
• A term of praise used by the community to refer to clean code that is readable, intuitive, explicit, and takes advantage of coding idioms
• Python
people = ['John Doe', 'Jane Doe', 'John Smith']

# Explicit loop
smith_family = []
for name in people:
    if 'Smith' in name:
        smith_family.append(name)

# Same result as a list comprehension, the more "pythonic" idiom
smith_family = [name for name in people if 'Smith' in name]
• Matlab
people = {'John Doe', 'Jane Doe', 'John Smith'};
smith_family = {};
for name = people  % each iteration yields a 1x1 cell, hence name{1}
    if strfind(name{1}, 'Smith')
        smith_family = [smith_family name];
    end
end
Installation
• On any OS:
  – Sage Math (http://www.sagemath.org/): easy unzip installation but many “extraneous” packages (500MB)
    • Some issues on OSX with matplotlib
• On OSX:
  – Use MacPorts to install Python (2.7), SciPy, matplotlib, and Cython
    • Requires the gcc compiler, available through Apple Developer
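After installing, a quick check that the stack is importable can save some debugging. The sketch below (Python 3) only asks whether each package can be found on the import path; it does not check versions or whether a working C compiler is available. The package list is an assumption based on the tools named in these slides (note the import names `Cython` and `IPython` are capitalized).

```python
import importlib.util

# Packages named on these slides, spelled as their import names.
STACK = ("numpy", "scipy", "matplotlib", "Cython", "IPython")


def check_stack(packages=STACK):
    """Return {package: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_stack().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```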
NumPy + SciPy vs Matlab
• Same core libraries underneath: LAPACK
• Equivalent functionality, but the syntax is not trying to be similar
• http://www.scipy.org/NumPy_for_Matlab_Users
• Key differences:
  – Python uses 0 (zero) based indexing. The initial element of a sequence is found using [0].
  – In NumPy, arrays have pass-by-reference semantics. Slice operations are views into an array.
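Both key differences can be seen in a few lines of NumPy. This is a minimal sketch (Python 3); the array values are arbitrary.

```python
import numpy as np

a = np.arange(5)            # [0, 1, 2, 3, 4]
first = a[0]                # zero-based: Matlab's a(1) is NumPy's a[0]

view = a[1:3]               # slicing returns a view onto a, not a copy
view[0] = 99                # writing through the view...
print(a)                    # ...changes a itself: a[1] is now 99

snapshot = a[1:3].copy()    # explicit copy when Matlab-style value semantics are wanted
snapshot[0] = -1            # modifying the copy leaves a untouched
print(a)
```

Code ported from Matlab often assumes that `b = a(2:3)` produces an independent array; in NumPy, adding `.copy()` makes that assumption explicit.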
Syntax
Matlab          NumPy
a\b             linalg.solve(a,b) for square a; linalg.lstsq(a,b) in general
max(a(:))       a.max()
a(end-4:end)    a[-5:]
[0:9]           arange(10.) or r_[:10.]
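The equivalences in the table can be checked with a short script (Python 3; the 2x2 system is an arbitrary example). Note that `linalg.lstsq` returns a tuple whose first element is the solution, whereas `linalg.solve` returns the solution directly.

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(a, b)                     # Matlab a\b for a square, nonsingular a
x_ls, *_ = np.linalg.lstsq(a, b, rcond=None)  # least-squares form; also handles non-square a
largest = a.max()                             # Matlab max(a(:)): maximum over all elements
v = np.arange(10.)                            # Matlab [0:9], as floats 0.0 ... 9.0
tail = v[-5:]                                 # Matlab v(end-4:end): last five elements

print(x, x_ls, largest, tail)
```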
Cython
• Requires a C compiler
• Cython is Python with C data types.
  – Dynamic typing in Python has overhead, which makes it slow for computation
• Allows seamless coding of Python and embedded C-speed routines
• Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible
  – This means that when debugging C-level code, we can use all the plotting tools available in Python
• The process is sort of like EPIC:
  1. Write a .pyx source file
  2. Run the Cython compiler to generate a C file
  3. Run a C compiler to generate a compiled library
  4. Run the Python interpreter and ask it to import the module
Code Comparison – Matlab
• Let’s try a really basic speed comparison test
s = 0;
tic
for i = 1:1e8
    s = s + i;
end
toc

tic
x = 1:1e8;
sum(x)
toc
Code Comparison – C
#include <time.h>
#include <stdio.h>

int main()
{
    unsigned long long sum = 0;
    unsigned long long i = 0;
    unsigned long long max = 100000000;

    clock_t tic = clock();
    for (i = 0; i <= max; i++) {
        sum = sum + i;
    }
    clock_t toc = clock();

    /* %llu matches the unsigned long long type of sum */
    printf("%15llu, Elapsed: %f seconds\n", sum, (double)(toc - tic) / CLOCKS_PER_SEC);

    return 0;
}
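Code Comparison – Python
import time
import numpy as np  # 'from numpy import *' would silently shadow the built-in sum()

s = 0
t = time.time()
for i in xrange(100000001):  # Python 2.7: xrange avoids building a 100M-element list
    s += i
print time.time() - t

t = time.time()
x = np.arange(100000001)
x.sum()  # NumPy's sum, not the built-in one
print time.time() - t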
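For anyone repeating the Python rows of the benchmark under Python 3 (where `xrange` became `range` and `print` is a function), a sketch using `time.perf_counter`, which is intended for measuring short intervals, might look like the following. The function names and the smaller `n` used in the main block are assumptions for quick runs; timings will differ from the table below.

```python
import time

import numpy as np


def time_loop(n):
    """Sum 0..n with a plain Python loop; return (result, seconds)."""
    t = time.perf_counter()
    s = 0
    for i in range(n + 1):
        s += i
    return s, time.perf_counter() - t


def time_numpy(n):
    """Sum 0..n with NumPy, timing only the sum (array creation excluded)."""
    x = np.arange(n + 1, dtype=np.int64)  # explicit 64-bit ints: the default int is 32-bit on some platforms
    t = time.perf_counter()
    s = int(x.sum())
    return s, time.perf_counter() - t


if __name__ == "__main__":
    n = 10_000_000
    for label, fn in (("Python/For loop", time_loop), ("Python/NumPy sum", time_numpy)):
        result, seconds = fn(n)
        print(f"{label:18s} {result} {seconds:.3f} s")
```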
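Code Comparison – Cython
• addCy.pyx
# cython: language_level=2
import time

cdef long long int n = 100000000
cdef long long int s = 0
cdef long long int i = 0
t = time.time()
for i in xrange(n+1):  # i is a C integer, so Cython compiles this into a plain C loop
    s += i
print time.time() - t

• runCy.py
import pyximport; pyximport.install()  # compiles .pyx files automatically on import
import addCy  # importing the module runs the benchmark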
Speed Comparison
Language/Implementation    Time (sec)
Matlab/For loop            0.547
Matlab/Vector sum          0.817 (0.036 for sum only!)
Python/For loop            15.944
Python/NumPy sum           0.648 (0.135 for sum only)
C/For loop                 0.222
Cython/For loop            0.068 (!)
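All of the loops above compute the same sum, so their printed results can be checked against the closed form: the sum of i for i = 0..n is n(n+1)/2. For n = 10^8 that is 5,000,000,050,000,000, well below the largest signed 64-bit integer, so the `long long` types in the C and Cython versions do not overflow. The sketch below (Python 3) checks this and expresses the table's timings as speedups relative to the pure Python loop; the timings are copied from the table, not re-measured.

```python
n = 100_000_000
closed_form = n * (n + 1) // 2      # sum of 0..n
INT64_MAX = 2**63 - 1               # largest signed 64-bit value

# Timings (seconds) copied from the Speed Comparison table above.
timings = {
    "Matlab/For loop": 0.547,
    "Matlab/Vector sum": 0.817,
    "Python/For loop": 15.944,
    "Python/NumPy sum": 0.648,
    "C/For loop": 0.222,
    "Cython/For loop": 0.068,
}

baseline = timings["Python/For loop"]
speedups = {name: baseline / secs for name, secs in timings.items()}

if __name__ == "__main__":
    print(f"expected sum: {closed_form}, fits in int64: {closed_form <= INT64_MAX}")
    for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:18s} {factor:7.1f}x relative to the Python loop")
```

By this measure the Cython loop is roughly 234 times faster than the pure Python loop.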
Summary
• Python
  – Full-featured programming language with an emphasis on “pythonic” readability
• NumPy/SciPy
  – Core libraries for linear algebra and computation (FFT, optimization)
• Cython
  – Allows as much optimization as you want, degrading gracefully from high-level Python to low-level C
  – Profile; don’t over-optimize too early!