A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA...
Transcript of A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA...
![Page 1: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/1.jpg)
A Short History of Array Computingin PythonWolf Vollprecht, PyParis 2018
![Page 2: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/2.jpg)
TOC - Array computing in general- History up to NumPy- Libraries “after” NumPy
- Pure Python libraries- JIT / AOT compilers- Deep Learning
- NumPy extension proposal
![Page 3: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/3.jpg)
Arrays
- Used practically in all scientific domains- Physics, Controls, Biological System, Big Data, Deep
Learning, Autonomous Cars …
![Page 4: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/4.jpg)
Array computing
Generalize operations on scalars to … Arrays
C ← A + B
![Page 5: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/5.jpg)
What is an n-dimensional Array?
- memory region (buffer)- dimension- shape- Often strides
Layout Row Major (C) 0 1 2 3 0 1 2 3 4 5 6 7 8 9 10 11Shape 3, 4 4 5 6 7
Strides 4, 1 8 9 10 11
Layout Col Major (F) 0 1 2 3 0 4 8 1 5 9 2 6 10 3 7 11Shape 3, 4 4 5 6 7
Strides 1, 3 8 9 10 11
4 el’s
1 el
![Page 6: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/6.jpg)
1957 / 1977 Fortran 77
- One of the oldest languages for scientific computing- Still a reference in benchmarks- Original implementation of BLAS & LAPACK in Fortran- Maximum of 7 dimensions
![Page 7: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/7.jpg)
1966 APL: Honorable Mention
- Seriously dense language
→ Try it online: https://tryapl.org/
![Page 8: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/8.jpg)
1987 Matlab
- Proprietary software from Mathworks- Dynamic interface to Fortran- Pioneered interactive
computing + visualization
![Page 9: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/9.jpg)
1995 Numeric
- Python numerical computing package- Inspired additions to Python (indexing syntax)
![Page 10: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/10.jpg)
~2003 NumArray
- More flexible than Numeric- Slower for small arrays, better for large arrays- Split in the community:
- SciPy remained on Numeric...
![Page 11: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/11.jpg)
2006: NumPy
- “Merge” of Numeric and NumArray- Fast & flexible array computing in Python- Typed memory block- Notion of broadcasting- Vector Loops in C
![Page 12: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/12.jpg)
NumPy Broadcasting
- Broadcasting: what to do when dimensions don’t match up?
![Page 13: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/13.jpg)
NumPy ufunc
- Function that has specified input/output- np.sin:
- nin = 1, nout = 1- signature: f -> f, d -> d...
- np.add:- nin = 2, nout = 1- signature: ff -> f, dd -> d...
![Page 14: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/14.jpg)
NumPy as a Standard
- Computing needs have shifted- More specialized data containers needed- Parallelization, speed, GPU, data size …
NumPy interface de-facto standard!
![Page 15: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/15.jpg)
2007 numexpr
- Avoid temporaries- R = A + B + C
-> T1 = B + C-> T2 = A + T1-> R = T2
- Evaluate in chunks
![Page 16: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/16.jpg)
2007 numexpr
![Page 17: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/17.jpg)
2014 Dask
- Distributed array computing- Can handle large data- Execution of function
distributed
![Page 18: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/18.jpg)
2014 Dask
![Page 19: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/19.jpg)
2017 pydata/sparse
- Support for sparse ndarrays- Advantages
- Higher data compression- Faster computation
- Reuses scipy.sparse (but nD!)
![Page 20: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/20.jpg)
2017 pydata/sparse
- Store data in COO (coordinate list) model
![Page 21: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/21.jpg)
GPUs for computation
- Massively parallel- Great for large data- Cost of memory transfer from CPU → GPU- Other programming model
![Page 22: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/22.jpg)
2015 CuPy
- CUDA-aware NumPy implementation- Part of the Chainer DL framework
![Page 23: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/23.jpg)
2017 xnd
3 libraries:
- ndtypes: shape, type & memory- gumath: dispatch math functions on memory container- xnd: python bridge for typed memory
![Page 24: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/24.jpg)
JIT & AOT compilers
- Just in Time compilation for numeric code- Can give incredible speed ups
![Page 25: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/25.jpg)
2012 Pythran
- A Python/NumPy to C++ AOT compiler- Supports high level optimizations in Python- C++ implementation of NumPy with expression
templates- Cython integration
(Don’t miss the talk by Serge later today!)
![Page 26: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/26.jpg)
2012 Pythran
![Page 27: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/27.jpg)
2012 Numba
- A Python to LLVM JIT- Takes Python and compiles it to Machine Code- GPU support (Cuda + AMD)- For High Performance: need to write explicit “for” loops
![Page 28: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/28.jpg)
2012 Numba
![Page 29: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/29.jpg)
Numba + ufunc
![Page 30: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/30.jpg)
Numba + GPU
![Page 31: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/31.jpg)
The AI winter is over …
- Deep learning revolution- Python ecosystem benefits heavily- Lot’s of array computing
![Page 32: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/32.jpg)
Computation Graph
a = b = inputc = a + bd = b + 1e = c * d
![Page 33: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/33.jpg)
Computation Graph
- Abstraction of computation- Benefit: allows automatic differentiation- Optimization opportunities
- Common Subexpression Elimination- Algebraic simplifications: (y * x) / y → (x)- Constant folding (2 * 3 + a) → (6 + a)- Fuse ops
![Page 34: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/34.jpg)
2007 Theano
- One of the first “Deep Learning” libraries- Works on a computation graph- Lazy evaluation- Compiles kernels to C & CUDA
![Page 35: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/35.jpg)
2015 TensorFlow
- Big library from Google- Killed many others (including Theano)- Same principle as Theano- At the beginning: no compilation stage
![Page 36: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/36.jpg)
2015 TensorFlow
![Page 37: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/37.jpg)
2015 TensorFlow + XLA
- An experimental compiler for TensorFlow graphs- JIT + AOT modes- Uses LLVM under the hood
![Page 38: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/38.jpg)
2016 PyTorch
- Deep Learning Framework from Facebook- Computation Graph, but dynamic (no deferred graph
model)- Easier to have control flow
![Page 39: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/39.jpg)
PyTorch JIT & TorchScript
- Subset of Python that can be compiled- Generates CUDA & CPU code
![Page 40: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/40.jpg)
Conclusion
- NumPy is the best … API- Many NumPy implementations- Many downstream projects
- Pandas- xarray- scikit-..., scipy
![Page 41: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/41.jpg)
The array extension proposal
- 6 months ago started by M. Rocklin- Problem: it’s hard to write generic code- Already extension points: __array__, __array_ufunc__
![Page 42: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/42.jpg)
The array extension proposal
- E.g. CuPy input → CuPy output desired- Arguments allowed to overload __array_function__
NEP 18 numpy.org/neps/nep-0018-array-function-protocol.html
![Page 43: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/43.jpg)
Trends
● Ecosystem has become much richer in the past years● More compilation● More specialized NumPy implementations● __array_function__ will make it easy to write
implementation independent code
![Page 44: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/44.jpg)
Thanks
● Questions?
Check out xtensor & xtensor-python
NumPy for C++ ;)
Follow me on Twitter @wuoulf or GitHub @wolfv
![Page 45: A Short History of Array Computing in Pythonpyparis.org/static/slides/Wolf+Vollprecht-0cc86b66.pdfA Short History of Array Computing in Python Wolf Vollprecht, PyParis 2018 TOC - Array](https://reader034.fdocuments.us/reader034/viewer/2022050409/5f8664e0da834d2fa6418542/html5/thumbnails/45.jpg)
NumPy ufunc
- Automatic broadcasting- ufunc supports:
- __call__- reduce- reduceat- accumulate- outer- inner