Python for Sciences and Engineering

  • 7/30/2019 Python for Sciences and Engineering

    1/89

    Python for Science and Engineering

    Dr Edward Schofield

    A*STAR / Singapore Computational Sciences Club Seminar

    June 14, 2011

  • 7/30/2019 Python for Sciences and Engineering

    2/89

    Scientific programming in 2011

    Most scientists and engineers are:

    programming for 50+% of their work time (and rising)

    self-taught programmers

    using inefficient programming practices

    using the wrong programming languages: C++, FORTRAN, C#, PHP, Java, ...

  • 7/30/2019 Python for Sciences and Engineering

    3/89

    Scientific programming needs

    Rapid prototyping

    Efficiency for computational kernels

    Pre-written packages!

    Vectors, matrices, modelling, simulations, visualisation

    Extensibility; web front-ends; database backends; ...

  • 7/30/2019 Python for Sciences and Engineering

    4/89

    Ed's story: How I found Python

    PhD in statistical pattern recognition: 2001-2006

    Needed good tools for my research!

    Discovered Python in 2002 after frustration with C++, Matlab, Java, Perl

    Contributed to NumPy and SciPy:

    maxent, sparse matrices, optimization, Monte Carlo, etc.

    Managed six releases of SciPy in 2005-6

  • 7/30/2019 Python for Sciences and Engineering

    5/89

    1. Why Python?

  • 7/30/2019 Python for Sciences and Engineering

    6/89

    Introducing Python

    What is it?

    What is it good for?

    Who uses it?

  • 7/30/2019 Python for Sciences and Engineering

    7/89

    What is Python?

    interpreted

    strongly but dynamically typed

    object-oriented

    intuitive, readable

    open source, free

    batteries included

  • 7/30/2019 Python for Sciences and Engineering

    8/89

    batteries included

    Python's standard library is:

    very large

    well-supported

    well-documented

  • 7/30/2019 Python for Sciences and Engineering

    9/89

    Python's standard library

    data types         strings           networking   threads
    operating system   compression       GUI          arguments
    CGI                complex numbers   FTP          cryptography
    testing            multimedia        databases    CSV files
    calendar           email             XML          serialization

  • 7/30/2019 Python for Sciences and Engineering

    10/89

    What is an efficient programming language?

    Native Python code executes 10x more slowly than C and FORTRAN

  • 7/30/2019 Python for Sciences and Engineering

    11/89

    Would you build a racing car ...

    ... to get to Kuala Lumpur ASAP?

  • 7/30/2019 Python for Sciences and Engineering

    12/89

    Date        Cost per GFLOPS (US $)   Technology
    1961        $1.1 trillion            17 million IBM 1620s
    1984        $15,000,000              Cray X-MP
    1997        $30,000                  Two 16-CPU clusters of Pentiums
    2000, Apr   $1,000                   Bunyip Beowulf cluster
    2003, Aug   $82                      KASY0
    2007, Mar   $0.42                    Ambric AM2045
    2009, Sep   $0.13                    ATI Radeon R800

    Source: Wikipedia: FLOPS

  • 7/30/2019 Python for Sciences and Engineering

    13/89

    Unit labor cost growth

    Proxy for cost of programmer time

  • 7/30/2019 Python for Sciences and Engineering

    14/89

    Efficiency

    When FORTRAN was invented, computer time was more expensive than programmer time.

    In the 1980s and 1990s that reversed.

  • 7/30/2019 Python for Sciences and Engineering

    15/89

    Efficient programming

    Python code is 10x faster to write than C and FORTRAN

  • 7/30/2019 Python for Sciences and Engineering

    16/89

    What if ...

    ... you now need to reach Sydney?

  • 7/30/2019 Python for Sciences and Engineering

    17/89

    Advantages of Python

    Easy to write

    Easy to maintain

    Great standard libraries

    Thriving ecosystem of third-party packages

    Open source

  • 7/30/2019 Python for Sciences and Engineering

    18/89

    Batteries included

    Python's standard library is:

    very large

    well supported

    well documented

  • 7/30/2019 Python for Sciences and Engineering

    19/89

    Python's standard library

    data types         strings           networking   threads
    operating system   compression       GUI          arguments
    CGI                complex numbers   FTP          cryptography
    testing            multimedia        databases    CSV files
    calendar           email             XML          serialization

  • 7/30/2019 Python for Sciences and Engineering

    20/89

    Question

    What is the date 177 days from now?
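
    One possible answer to the question above, sketched with the standard library's datetime module. The fixed start date (14 June 2011, the seminar date) is used only so the result is reproducible, and the helper name days_from is an illustrative choice rather than something from the slides.

```python
import datetime


def days_from(start, days):
    """Return the calendar date `days` days after `start`."""
    return start + datetime.timedelta(days=days)


# Reproducible example: start from the seminar date.
seminar_day = datetime.date(2011, 6, 14)
print(days_from(seminar_day, 177))  # 2011-12-08

# The question as asked: 177 days from today (result depends on the run date).
print(days_from(datetime.date.today(), 177))
```

    The arithmetic handles month lengths and leap years, which is the "batteries included" point being made.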

  • 7/30/2019 Python for Sciences and Engineering

    21/89

    Natural applications of Python

    Rapid prototyping

    Plotting, visualisation, 3D

    Numerical computing

    Web and database programming

    All-purpose glue

  • 7/30/2019 Python for Sciences and Engineering

    22/89

    Python vs other languages

  • 7/30/2019 Python for Sciences and Engineering

    23/89

    Languages used at CSIRO

    Python Fortran Java

    Matlab C VB.net

    IDL C++ R

    Perl C# +5-10 others!

  • 7/30/2019 Python for Sciences and Engineering

    24/89

    Which language do I choose?

    A different language for each task?

    A language you know?

    A language others in your team are using: support and help?

  • 7/30/2019 Python for Sciences and Engineering

    25/89

                                 Python     Matlab
    Interpreted                  Yes        Yes
    Powerful data input/output   Yes        Yes
    Great plotting               Yes        Yes
    General-purpose language     Powerful   Limited
    Cost                         Free       $$$
    Open source                  Yes        No

  • 7/30/2019 Python for Sciences and Engineering

    26/89

                                 Python   C++
    Powerful                     Yes      Yes
    Portable                     Yes      In theory
    Standard libraries           Vast     Limited
    Easy to write and maintain   Yes      No
    Easy to learn                Yes      No

  • 7/30/2019 Python for Sciences and Engineering

    27/89

                                                                      Python   C
    Fast to write                                                     Yes      No
    Good for embedded systems, device drivers and operating systems   No       Yes
    Good for most other high-level tasks                              Yes      No
    Standard library                                                  Vast     Limited

  • 7/30/2019 Python for Sciences and Engineering

    28/89

  • 7/30/2019 Python for Sciences and Engineering

    29/89

    Open source

    Python is open source software

    Benefits:

    No vendor lock-in

    Cross-platform

    Insurance against bugs in the platform

    Free

  • 7/30/2019 Python for Sciences and Engineering

    30/89

    Python success stories

    Computer graphics:

    Industrial Light & Magic

    Web:

    Google: News, Groups, Maps, Gmail

    Legacy system integration:

    AstraZeneca - collaborative drug discovery

  • 7/30/2019 Python for Sciences and Engineering

    31/89

    Python success stories (2)

    Aerospace:

    NASA

    Research:

    universities worldwide ...

    Others:

    YouTube, Reddit, BitTorrent, Civilization IV, ...

  • 7/30/2019 Python for Sciences and Engineering

    32/89

  • 7/30/2019 Python for Sciences and Engineering

    33/89

    United Space Alliance

    A common sentiment:

    "We achieve immediate functioning code so much faster in Python than in any other language that it's staggering."

    - Robin Friedrich, Senior Project Engineer

  • 7/30/2019 Python for Sciences and Engineering

    34/89

    Case study: air-traffic control

    Eric Newton, Python for Critical Applications: http://metaslash.com/brochure/recall.html

    Metaslash, Inc: 1999 to 2001

    Mission-critical system for air-traffic control

    Replicated, fault-tolerant data storage

  • 7/30/2019 Python for Sciences and Engineering

    35/89

    Case study: air-traffic control

    Python prototype -> C++ implementation -> Python again

    Why?

    C++ dependencies were buggy

    C++ threads, STL were not portable enough

    Python's advantages over C++

    More portable

    75% less code: more productivity, fewer bugs

  • 7/30/2019 Python for Sciences and Engineering

    36/89

    More case studies

    See http://www.python.org/about/success/ for lots more case studies and success stories

  • 7/30/2019 Python for Sciences and Engineering

    37/89

    2. The scientific Python ecosystem

  • 7/30/2019 Python for Sciences and Engineering

    38/89

  • 7/30/2019 Python for Sciences and Engineering

    39/89

    NumPy

    An n-dimensional array/matrix package

  • 7/30/2019 Python for Sciences and Engineering

    40/89

    NumPy

    Centre of Python's numerical computing ecosystem

  • 7/30/2019 Python for Sciences and Engineering

    41/89

    NumPy

    The most fundamental tool for numerical computing in Python

    Fast multi-dimensional array capability

  • 7/30/2019 Python for Sciences and Engineering

    42/89

    What NumPy defines:

    Two fundamental objects:

    1. n-dimensional array

    2. universal function

    a rich set of numerical data types

    nearly 400 functions and methods on arrays:

    type conversions

    mathematical

    logical

  • 7/30/2019 Python for Sciences and Engineering

    43/89

    NumPy's features

    Fast. Written in C with BLAS/LAPACK hooks.

    Rich set of data types

    Linear algebra: matrix inversion, decompositions, ...

    Discrete Fourier transforms

    Random number generation

    Trig, hypergeometric functions, etc.

  • 7/30/2019 Python for Sciences and Engineering

    44/89

    Elementwise array operations

    Loops are mostly unnecessary

    Operate on entire arrays!

    >>> a = numpy.array([20, 30, 40, 50])
    >>> a < 35
    array([True, True, False, False], dtype=bool)
    >>> b = numpy.arange(4)
    >>> a - b
    array([20, 29, 38, 47])
    >>> b**2
    array([0, 1, 4, 9])

  • 7/30/2019 Python for Sciences and Engineering

    45/89

    Universal functions

    NumPy defines 'ufuncs' that operate on entire arrays and other sequences (hence 'universal')

    Example: sin()

    >>> a = numpy.array([20, 30, 40, 50])
    >>> c = 10 * numpy.sin(a)
    >>> c
    array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])

  • 7/30/2019 Python for Sciences and Engineering

    46/89

    Array slicing

    Arrays can be sliced and indexed powerfully:

    >>> a = numpy.arange(10)**3
    >>> a
    array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
    >>> a[2:5]
    array([ 8, 27, 64])

  • 7/30/2019 Python for Sciences and Engineering

    47/89

    Fancy indexing

    Arrays can be used as indices into other arrays:

    >>> a = numpy.arange(12)**2
    >>> ind = numpy.array([ 1, 1, 3, 8, 5 ])
    >>> a[ind]
    array([ 1,  1,  9, 64, 25])

  • 7/30/2019 Python for Sciences and Engineering

    48/89

    Other linear algebra features

    Matrix inversion: mat(A).I

    Or: linalg.inv(A)

    Linear solvers: linalg.solve(A, x)

    Pseudoinverse: linalg.pinv(A)
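
    A minimal sketch of the three calls listed above, using plain arrays and the numpy.linalg namespace. The 2x2 matrix A and right-hand side b are made-up illustration values, not taken from the slides.

```python
import numpy as np

# Illustrative (assumed) system: A is invertible, det(A) = 10.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

A_inv = np.linalg.inv(A)    # explicit matrix inverse
x = np.linalg.solve(A, b)   # solves A x = b without forming the inverse
A_pinv = np.linalg.pinv(A)  # Moore-Penrose pseudoinverse

print(x)
print(np.allclose(A @ x, b))
```

    For solving a linear system, solve is generally preferable to multiplying by an explicit inverse; for an invertible square matrix the pseudoinverse coincides with the inverse.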

  • 7/30/2019 Python for Sciences and Engineering

    49/89

    What is SciPy?

    A community

    A conference

    A package of scientific libraries

  • 7/30/2019 Python for Sciences and Engineering

    50/89

    Python for scientific software

    Back-end: computational work

    Front-end: input / output, visualization, GUIs

    Dozens of great scientific packages exist

  • 7/30/2019 Python for Sciences and Engineering

    51/89

    Python in science (2)

    NumPy: numerical / array module

    Matplotlib: great 2D and 3D plotting library

    IPython: nice interactive Python shell

    SciPy: set of scientific libraries: sparse matrices, signal processing, ...

    RPy: integration with the R statistical environment

  • 7/30/2019 Python for Sciences and Engineering

    52/89

    Python in science (3)

    Cython: C language extensions

    Mayavi: 3D graphics, volumetric rendering

    Nitime, Nipype: Python tools for neuroimaging

    SymPy: symbolic mathematics library

  • 7/30/2019 Python for Sciences and Engineering

    53/89

    Python in science (4)

    VPython: easy, real-time 3D programming

    UCSF Chimera, PyMOL, VMD: molecular graphics

    PyRAF: Hubble Space Telescope interface to IRAF astronomical data

    BioPython: computational molecular biology

    Natural language toolkit: symbolic + statistical NLP

    Physics: PyROOT

  • 7/30/2019 Python for Sciences and Engineering

    54/89

    The SciPy package

    BSD-licensed software for maths, science, engineering

    integration     signal processing        sparse matrices
    optimization    linear algebra           maximum entropy
    interpolation   ODEs                     statistics
    FFTs            n-dim image processing   scientific constants
    clustering      interpolation            C/C++ and Fortran integration

  • 7/30/2019 Python for Sciences and Engineering

    55/89

    SciPy optimisation example

    Fit a model to noisy data: $y = \frac{a}{x^b} \sin(cx) + \varepsilon$

  • 7/30/2019 Python for Sciences and Engineering

    56/89

    Example: fitting a model with scipy.optimize

    Task: Fit a model of the form $y = \frac{a}{x^b} \sin(cx) + \varepsilon$ to noisy data.

    Spec:

    1. Generate noisy data

    2. Choose parameters (a, b, c) to minimize sum squared errors

    3. Plot the data and fitted model (next session)

  • 7/30/2019 Python for Sciences and Engineering

    57/89

    SciPy optimisation example

    import numpy
    import pylab
    from scipy.optimize import leastsq

    def myfunc(params, x):
        (a, b, c) = params
        return a / (x**b) * numpy.sin(c * x)

    true_params = [1.5, 0.1, 2.]

    def f(x):
        return myfunc(true_params, x)

    def err(params, x, y):  # error function
        return myfunc(params, x) - y

  • 7/30/2019 Python for Sciences and Engineering

    58/89

    SciPy optimisation example

    # Generate noisy data to fit
    n = 30; xmin = 0.1; xmax = 5
    x = numpy.linspace(xmin, xmax, n)
    y = f(x)
    # numpy.random.rand draws uniform noise on [0, 1)
    y += numpy.random.rand(len(x)) * 0.2 * (y.max() - y.min())

    v0 = [3., 1., 4.]  # initial param estimate

    # Fitting
    v, success = leastsq(err, v0, args=(x, y), maxfev=10000)

    print('Estimated parameters: ', v)
    print('True parameters: ', true_params)
    X = numpy.linspace(xmin, xmax, 5 * n)
    pylab.plot(x, y, 'ro', X, myfunc(v, X))
    pylab.show()

  • 7/30/2019 Python for Sciences and Engineering

    59/89

    SciPy optimisation example

    Fit a model to noisy data: $y = \frac{a}{x^b} \sin(cx) + \varepsilon$

  • 7/30/2019 Python for Sciences and Engineering

    60/89

    Ingredients for this example

    numpy.linspace

    numpy.random.rand for the noise model (uniform)

    scipy.optimize.leastsq

  • 7/30/2019 Python for Sciences and Engineering

    61/89

    Sparse matrix example

    Construct and solve a sparse linear system

  • 7/30/2019 Python for Sciences and Engineering

    62/89

    Sparse matrices

    Sparse matrices are mostly zeros.

    They can be symmetric or asymmetric.

    Sparsity patterns vary:

    block sparse, band matrices, ...

    They can be huge!

    Only non-zeros are stored.

  • 7/30/2019 Python for Sciences and Engineering

    63/89

    Sparse matrices in SciPy

    SciPy supports seven sparse storage schemes

    ... and sparse solvers in Fortran.

  • 7/30/2019 Python for Sciences and Engineering

    64/89

    Sparse matrix creation

    To construct a 1000x1000 lil_matrix and add values:

    >>> from scipy.sparse import lil_matrix
    >>> from numpy.random import rand
    >>> from scipy.sparse.linalg import spsolve

    >>> A = lil_matrix((1000, 1000))
    >>> A[0, :100] = rand(100)
    >>> A[1, 100:200] = A[0, :100]
    >>> A.setdiag(rand(1000))

  • 7/30/2019 Python for Sciences and Engineering

    65/89

    Solving sparse matrix systems

    Now convert the matrix to CSR format and solve Ax=b:

    >>> A = A.tocsr()
    >>> b = rand(1000)
    >>> x = spsolve(A, b)

    # Convert it to a dense matrix and solve, and check that the result is the same:
    >>> from numpy.linalg import solve, norm
    >>> x_ = solve(A.todense(), b)
    # Compute norm of the error:
    >>> err = norm(x - x_)
    >>> err < 1e-10
    True

  • 7/30/2019 Python for Sciences and Engineering

    66/89

    Matplotlib

    Great plotting package in Python

    Matlab-like syntax

    Great rendering: anti-aliasing etc.

    Many backends: Cairo, GTK, Cocoa, PDF

    Flexible output: to EPS, PS, PDF, TIFF, PNG, ...

  • 7/30/2019 Python for Sciences and Engineering

    67/89

    Matplotlib: worked examples

    Search the web for 'Matplotlib gallery'

  • 7/30/2019 Python for Sciences and Engineering

    68/89

    Example: NumPy vectorization

    1. Use a Monte Carlo algorithm to estimate π:

    1. Generate uniform random variates (x, y) over [0, 1].

    2. Estimate π from the proportion p that land in the unit circle.

    2. Time two ways of doing this:

    1. Using for loops

    2. Using array operations (vectorized)
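
    The exercise above could be sketched as follows. Points drawn uniformly from the unit square fall inside the quarter of the unit circle with probability π/4, so π is estimated as 4p. The function names, the sample size and the fixed seeds are illustrative assumptions, and numpy.random.default_rng requires a newer NumPy than the one available when the talk was given.

```python
import random
import timeit

import numpy as np


def estimate_pi_loop(n, seed=0):
    """Estimate pi with a plain Python for loop."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


def estimate_pi_vectorized(n, seed=0):
    """Estimate pi with NumPy array operations (no explicit loop)."""
    rng = np.random.default_rng(seed)
    xy = rng.random((n, 2))                       # n points in [0, 1) x [0, 1)
    inside = np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)
    return 4.0 * inside / n


if __name__ == "__main__":
    n = 200_000
    for func in (estimate_pi_loop, estimate_pi_vectorized):
        seconds = timeit.timeit(lambda: func(n), number=3) / 3
        print(f"{func.__name__}: pi ~ {func(n):.4f}, {seconds:.4f} s per run")
```

    Timings depend on the machine, so none are quoted here; running the script shows the gap between the two versions directly.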

  • 7/30/2019 Python for Sciences and Engineering

    69/89

    3. Scaling

  • 7/30/2019 Python for Sciences and Engineering

    70/89

    HPC

    High-performance computing

  • 7/30/2019 Python for Sciences and Engineering

    71/89

    Aspects to HPC

    Supercomputers Distributed clusters / grids

    Parallel programming Scripting

    Caches, shared memory Job control

    Code porting Specialized hardware

  • 7/30/2019 Python for Sciences and Engineering

    72/89

    Python for HPC

    Advantages Disadvantages

    Portability Global interpreter lock

    Easy scripting, glue Less control than C

    Maintainability Native loops are slow

    Profiling to identify hotspots

    Vectorization with NumPy

  • 7/30/2019 Python for Sciences and Engineering

    73/89

    Large data sets

    Useful Python language features:

    Generators, iterators

    Useful packages:

    Great HDF5 support from PyTables!
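
    A small sketch of why generators and iterators help with large data sets: values are produced one at a time, so only a running count and sum are held in memory. The function names and the one-number-per-line input format are assumptions made for illustration.

```python
def read_values(lines):
    """Lazily yield one float per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values):
    """Consume an iterator once, keeping only a count and a sum in memory."""
    count, total = 0, 0.0
    for value in values:
        count += 1
        total += value
    return total / count if count else float("nan")


# Any iterable of lines works, including an open file object.
sample_lines = ["1.0\n", "\n", "2.0\n", "3.0\n"]
print(running_mean(read_values(sample_lines)))  # 2.0
```

    Passing an open file instead of the list lets the same code process a file much larger than memory, because file objects iterate line by line.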

  • 7/30/2019 Python for Sciences and Engineering

    74/89

    Hierarchical data

    Databases without the relational baggage

  • 7/30/2019 Python for Sciences and Engineering

    75/89

    Great interface for HDF5 data

    Efficient support for massive data sets

  • 7/30/2019 Python for Sciences and Engineering

    76/89

    Applications of PyTables

    aeronautics telecommunications

    drug discovery data mining

    financial analysis statistical analysis

    climate prediction etc.

  • 7/30/2019 Python for Sciences and Engineering

    77/89

    Breaking news: June 2011

    PyTables Pro is now being open sourced.

    Indexed searches for speed

    Merging with PyTables

    Working project name: NewPyTables

  • 7/30/2019 Python for Sciences and Engineering

    78/89

    PyTables performance

    OPSI indexing engine speed:

    Querying 10 billion rows can take hundredths of a second!

    Target use-case:

    mostly read-only or append-only data

  • 7/30/2019 Python for Sciences and Engineering

    79/89

    Principles for efficient code

  • 7/30/2019 Python for Sciences and Engineering

    80/89

    Important principles

    1. "Premature optimization is the root of all evil"

    Don't write cryptic code just to make it more efficient!

    2. 1-5% of the code takes up the vast majority of the

    computing time!

    ... and it might not be the 1-5% that you think!

  • 7/30/2019 Python for Sciences and Engineering

    81/89

    Checklist for efficient code

    From most to least important:

    1. Check: Do you really need to make it more efficient?

    2. Check: Are you using the right algorithms and data structures?

    3. Check: Are you reusing pre-written libraries wherever

    possible?

    4. Check: Which parts of the code are expensive?

    Measure, don't guess!
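
    One way to act on "measure, don't guess" is the standard library's cProfile module. The sketch below profiles a made-up workload; the function names and the workload itself are illustrative assumptions.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Same result from the closed form (n-1) n (2n-1) / 6."""
    return (n - 1) * n * (2 * n - 1) // 6


def work():
    slow_sum_of_squares(200_000)
    fast_sum_of_squares(200_000)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


report = profile_report(work)
print(report)
```

    The report lists cumulative time per function, which shows where the time actually goes before any optimisation effort is spent.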

  • 7/30/2019 Python for Sciences and Engineering

    82/89

    Relative efficiency gains

    Exponential-order and polynomial-order speedups are

    possible by choosing the right algorithm for a task.

    These require the right data structures!

    These dwarf 10-25x linear-order speedups from:

    using lower-level languages

    using different language constructs.
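
    A small illustration of the data-structure point, assuming a membership-test workload chosen for this sketch: looking values up in a list scans it element by element, while a set uses hashing, so the gap widens as the collection grows.

```python
import timeit

n = 10_000
items_list = list(range(n))
items_set = set(items_list)
queries = range(n - 200, n)  # values near the end: worst case for a list scan


def count_hits(container):
    """Count how many queries are present in the container."""
    return sum(1 for q in queries if q in container)


t_list = timeit.timeit(lambda: count_hits(items_list), number=3)
t_set = timeit.timeit(lambda: count_hits(items_set), number=3)
print(f"list: {t_list:.4f} s, set: {t_set:.4f} s")
```

    Both containers give the same answer; only the cost per lookup differs, and the difference grows with n rather than staying at a fixed factor.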

  • 7/30/2019 Python for Sciences and Engineering

    83/89

    4. About Python Charmers

  • 7/30/2019 Python for Sciences and Engineering

    84/89

    The largest Python training provider in South-East Asia

    Delighted customers include:

  • 7/30/2019 Python for Sciences and Engineering

    85/89

    Most popular course topics

    Python for Programmers 3 days

    Python for Scientists and Engineers 4 days

    Python for Geoscientists 4 days

    Python for Bioinformaticians 4 days

    Python for Financial Engineers 4 days

    Python for IT Security Professionals 3 days

    New courses:


  • 7/30/2019 Python for Sciences and Engineering

    86/89

    Python Charmers:

    Topics of expertise

    Python: beginners, advanced

    Scientific data processing with Python

    Software engineering with Python

    Large-scale problems: HPC, huge data sets, grids

    Statistics and Monte Carlo problems


  • 7/30/2019 Python for Sciences and Engineering

    87/89

    Python Charmers:

    Topics of expertise (2)

    Spatial data analysis / GIS

    General scripting, job control, glue

    GUIs with PyQt

    Integrating with other languages: R, C, C++, Fortran, ...

    Web development in Django

  • 7/30/2019 Python for Sciences and Engineering

    88/89

    How to get in touch

    See PythonCharmers.com

    or email us at:

    [email protected]

  • 7/30/2019 Python for Sciences and Engineering

    89/89