Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.


Page 1: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Scientific Computing Beyond Matlab

Nov 19, 2012

Jason Su

Page 2: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Motivation

• I’m interested in (re-)coding a general solver for sc/mcDESPOT relaxometry mapping
– Open source
– Extensibility to new/additional sequences with better sensitivity to certain parameters, e.g. B0 and MWF
– Better parallelization

• But:
– Large-scale code development in Matlab is cumbersome
– Matlab is slow
– C is hard (to write, read, debug)

• This creates a large barrier for others to contribute

Page 3: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Matlab

Pros
• Ubiquitous, code is cross-platform
• Can be fast with vectorized code
• Data visualization
• Quick development time
• Great IDE for general research
– Poor for large projects
• Many useful native libraries/toolboxes
• Built-in profiling tools

Cons
• Requires license, not free (though there is Octave)
• Vectorized code is often non-intuitive to write and hard to read
• Slow for general computations
• Limited parallel computing and GPU support

Page 4: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

C/C++

Pros
• Fast
• Great IDEs for large coding projects
– Not as great for general science work
• Strong parallel computing support and CUDA
• Community libraries for scientific computing
• Profiling dependent on IDE

Cons
• Steep learning curve and long development time
• No data visualization
• Compiled code is platform specific
• Compiler is not generally installed with OSX and Windows

Page 5: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Python

Pros
• Preinstalled with OSX and Linux-based systems
• Readability is a core tenet (“pythonic”)
• Quick development time
• Native parallel computing support and community GPU modules
• Extensive community support
– Including neuroimaging-specific: NiPype, NiBabel
• Built-in profiling module and some IDE tools

Cons
• Slow for general computation
• Mixed bag of IDEs, some are great for coding, others for research
• Out of the box it’s a poor alternative: no linear algebra or data visualization

Page 6: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Python & Friends

Cons
• Slow for general computation
• Mixed bag of IDEs, some are great for coding, others for research
• Out of the box it’s a poor alternative: no linear algebra or data visualization

Solutions
• Cython, JIT compilers like PyPy
• There are a few good options out there that I’ve found:
– Eclipse + PyDev, NetBeans
– Spyder: closest to MATLAB
– Sage Math Notebook, IPython: like Mathematica
– It may come down to preference.
• NumPy + SciPy + Matplotlib = PyLab
– Sage Math includes these as well as other capabilities like symbolic math and graph theory

Page 7: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Pythonic?

• A term of praise used by the community to refer to clean code that is readable, intuitive, explicit, and takes advantage of coding idioms

• Python

    people = ['John Doe', 'Jane Doe', 'John Smith']
    smith_family = []
    for name in people:
        if 'Smith' in name:
            smith_family.append(name)

    smith_family = [name for name in people if 'Smith' in name]

• Matlab

    people = {'John Doe', 'Jane Doe', 'John Smith'};
    smith_family = {};
    for name = people
        if strfind(name{1}, 'Smith')
            smith_family = [smith_family name];
        end
    end

Page 8: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Installation

• On any OS:
– Sage Math (http://www.sagemath.org/), easy unzip installation but many “extraneous” packages (500MB)
  • Some issues on OSX with matplotlib

• On OSX:
– Use MacPorts to install Python (2.7), SciPy, matplotlib, and Cython
  • Requires gcc compiler available through Apple Developer

Page 9: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

NumPy + SciPy vs Matlab

• Same core libraries: LAPACK
• Equivalent syntax exists, but NumPy is not trying to imitate Matlab
• http://www.scipy.org/NumPy_for_Matlab_Users
• Key differences:
– Python uses 0 (zero) based indexing. The initial element of a sequence is found using [0].
– In NumPy, arrays have pass-by-reference semantics. Slice operations are views into an array.
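The two key differences above can be seen in a few lines. This is a minimal sketch written for Python 3 with NumPy imported as `np` (the slides themselves target Python 2.7); the variable names are only illustrative.

```python
import numpy as np

a = np.arange(10)        # the integers 0 through 9

# Zero-based indexing: the initial element is a[0], not a(1) as in Matlab.
first = a[0]

# A slice is a view: it shares memory with the original array,
# so writing through the view also changes `a`.
view = a[2:5]
view[0] = 100

# An explicit copy is independent of `a`, which is what a Matlab
# user would expect from an assignment.
independent = a[2:5].copy()
independent[1] = -1
```

After this runs, `a[2]` is 100 because the write went through the view, while `a[3]` is still 3 because the copy was detached.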

Page 10: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Syntax

Matlab              NumPy
a\b                 linalg.lstsq(a,b)
max(a(:))           a.max()
a(end-4:end)        a[-5:]
[0:9]               arange(10.) or r_[:10.]
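The NumPy column can be tried directly. The sketch below assumes Python 3 and a recent NumPy, and uses a small made-up 2x2 system purely for illustration; `rcond=None` is passed to `lstsq` only to select the current default behaviour.

```python
import numpy as np

# Hypothetical example system: 3x + y = 9, x + 2y = 8
a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Matlab: a\b  (least-squares solve; lstsq also returns diagnostics)
x, residuals, rank, singular_values = np.linalg.lstsq(a, b, rcond=None)

# Matlab: max(a(:))
largest = a.max()

# Matlab: [0:9]
v = np.arange(10.)
same_v = np.r_[:10.]

# Matlab: v(end-4:end)
last_five = v[-5:]
```

Note that `lstsq` returns a tuple, so the solution has to be unpacked, whereas Matlab's backslash returns the solution alone.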

Page 11: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Cython

• Requires a C compiler
• Cython is Python with C data types.
– Dynamic typing of Python has overhead, slow for computation
• Allows seamless coding of Python and embedded C-speed routines
• Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible
– This means for debugging C-level code, we can use all the plotting tools available in Python

• Process is sort of like EPIC
1. Write a .pyx source file
2. Run the Cython compiler to generate a C file
3. Run a C compiler to generate a compiled library
4. Run the Python interpreter and ask it to import the module

Page 12: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Code Comparison – Matlab

• Let’s try a really basic speed comparison test

    s = 0
    tic
    for i = 1:1e8
        s = s + i;
    end
    toc

    tic
    x = 1:1e8;
    sum(x)
    toc

Page 13: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Code Comparison – C

    #include <time.h>
    #include <stdio.h>

    int main()
    {
        long long unsigned int sum = 0;
        long long unsigned int i = 0;
        long long unsigned int max = 100000000;

        /* Time only the accumulation loop. */
        clock_t tic = clock();
        for (i = 0; i <= max; i++) {
            sum = sum + i;
        }
        clock_t toc = clock();

        /* %llu matches the unsigned long long type of sum. */
        printf("%15llu, Elapsed: %f seconds\n", sum, (double)(toc - tic) / CLOCKS_PER_SEC);

        return 0;
    }

Page 14: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Code Comparison – Python

    import time
    from numpy import *

    s = 0
    t = time.time()
    for i in xrange(100000001):
        s += i
    print time.time() - t

    t = time.time()
    x = arange(100000001)
    sum(x)
    print time.time() - t
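The slide code is Python 2 (`xrange`, `print` statement). For readers on Python 3, the same comparison might be sketched as follows. The helper names `loop_sum`, `numpy_sum`, and `timed` are hypothetical, and `n` is reduced from the slide's 1e8 to 1e6 so the loop finishes quickly; raise it to reproduce the original workload.

```python
import time

import numpy as np


def loop_sum(n):
    """Sum 0..n with a plain interpreted for loop."""
    s = 0
    for i in range(n + 1):
        s += i
    return s


def numpy_sum(n):
    """Sum 0..n by building an array and reducing it in compiled code."""
    return int(np.arange(n + 1, dtype=np.int64).sum())


def timed(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


n = 10**6  # assumption: smaller than the slide's 1e8 to keep the run short
loop_result, loop_seconds = timed(loop_sum, n)
numpy_result, numpy_seconds = timed(numpy_sum, n)

print("for loop :", loop_result, loop_seconds, "s")
print("NumPy sum:", numpy_result, numpy_seconds, "s")
```

As in the slide, the NumPy timing includes building the array, which is why the deck reports a separate "sum only" figure.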

Page 15: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Code Comparison – Cython

• addCy.pyx

    import time

    cdef long long int n = 100000000
    cdef long long int s = 0
    cdef long long int i = 0
    t = time.time()
    for i in xrange(n+1):
        s += i
    print time.time() - t

• runCy.py

    import pyximport; pyximport.install()
    import addCy

Page 16: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Speed Comparison

Language/Implementation    Time (sec)
Matlab/For loop            0.547
Matlab/Vector sum          0.817 (0.036 for sum only!)
Python/For loop            15.944
Python/NumPy sum           0.648 (0.135 for sum only)
C/For loop                 0.222
Cython/For loop            0.068 (!)
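To make the gaps easier to read, the timings reported on this slide can be turned into speedup factors relative to the plain Python for loop. The numbers below are copied from the table (using the total times, not the "sum only" figures); the dictionary and variable names are just for illustration.

```python
# Timings in seconds, as reported on the slide.
timings = {
    "Matlab/For loop": 0.547,
    "Matlab/Vector sum": 0.817,
    "Python/For loop": 15.944,
    "Python/NumPy sum": 0.648,
    "C/For loop": 0.222,
    "Cython/For loop": 0.068,
}

# Speedup of each implementation relative to the pure Python loop.
baseline = timings["Python/For loop"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

# Print fastest first.
for name, factor in sorted(speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20s} {factor:7.1f}x")
```

On these figures the Cython loop comes out a little over 230 times faster than the interpreted Python loop, and roughly 3 times faster than the C loop as measured here.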

Page 17: Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su.

Summary

• Python
– Full featured programming language with an emphasis on “pythonic” readability

• NumPy/SciPy
– Core libraries for linear algebra and computation (fft, optimization)

• Cython
– Allows as much optimization as you want, degrading gracefully from high-level Python to low-level C
– Profile first, don’t optimize too early!
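The advice to profile before optimizing can be followed with Python's built-in `cProfile` and `pstats` modules. The sketch below is for Python 3; `slow_sum`, `fast_sum`, and `workload` are hypothetical stand-ins for real code, chosen so that one function clearly dominates the report.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Interpreted loop: the kind of hotspot worth moving to Cython."""
    s = 0
    for i in range(n + 1):
        s += i
    return s


def fast_sum(n):
    """Closed-form sum, effectively free by comparison."""
    return n * (n + 1) // 2


def workload():
    return slow_sum(200_000), fast_sum(200_000)


# Collect profiling data for a single run of the workload.
profiler = cProfile.Profile()
profiler.enable()
results = workload()
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report should show `slow_sum` accounting for nearly all of the time, which is the signal for where typed Cython code would pay off.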