PARALLEL PROCESSING IN PYTHON - COSMOS
COSMOS - 1/28/2020
BY JOSEPH KREADY
LAYOUT
- What is Parallel Processing
- History of Parallel Computation
- Parallel Processing and Python
- Google Colab Example
SERIAL PROCESSING VS. PARALLEL PROCESSING
Serial Processing: One object at a time
Parallel Processing: Multiple objects at a time
CPU ARCHITECTURE
Serial Computation
FREQUENCY SCALING
The method used for improving computer performance from the 1980s to the 2000s
Frequency Scaling Equation: Power consumption = capacitance * voltage^2 * frequency
The resulting growth in power consumption led to the demise of frequency scaling
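Plugging hypothetical numbers into the equation above shows why: raising frequency usually requires raising voltage too, so power grows much faster than clock speed (all values below are assumed for illustration only):

```python
# Dynamic power: P = C * V^2 * f (values are illustrative, not real chip data)
C = 1e-9        # switched capacitance in farads (assumed)
V = 1.0         # supply voltage in volts (assumed)
f = 2e9         # clock frequency in hertz (assumed)

P1 = C * V**2 * f

# Doubling the frequency often requires, say, 1.3x the voltage to stay stable:
P2 = C * (1.3 * V)**2 * (2 * f)

ratio = P2 / P1
print(f"power ratio: {ratio:.2f}x for 2x the frequency")
```

With these numbers, doubling the clock costs roughly 3.4x the power, which is why vendors stopped scaling frequency and started adding cores instead.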
SUPERSCALAR ARCHITECTURE
Executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor.
MULTI-CORE PROCESSORS
Each CPU is made of independent ‘Cores’ that can access the same memory concurrently
Moore’s Law: the number of transistors per chip doubles roughly every 18-24 months; with frequency scaling over, those transistors now go into additional cores
Operating systems ensure programs run on available cores, but developers must design their programs to take advantage of parallel processing.
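One practical consequence: a program that wants to exploit parallelism should size its worker pools to the hardware. A minimal check of the logical core count:

```python
import os

# Number of logical cores visible to the OS.
# os.cpu_count() can return None on unusual platforms.
cores = os.cpu_count()
print(f"logical cores available: {cores}")
```

Libraries such as `multiprocessing` use this value as the default pool size when you don't specify one.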
PROCESS VS. THREAD VS. MULTI-THREADING VS. HYPER-THREADING
- Processes are made of threads
- Threads of the same process share memory
- Processes run in separate memory spaces
- Hyper-threading: allows scheduling of 2 threads on 1 CPU core
- Multi-threading: multiple instruction streams operate on separate data in parallel
THE GLOBAL INTERPRETER LOCK
- Global Interpreter Lock (GIL): every Python thread must hold the GIL to execute bytecode; thus threads execute one at a time
- The GIL makes the single-threaded case faster
- It is also faster in the multi-threaded case for I/O-bound programs, which release the GIL while waiting
- It is also fine in the multi-threaded case for CPU-bound programs that do their compute-intensive work in C libraries (e.g., NumPy), which release the GIL
- The GIL only becomes a problem when doing CPU-intensive work in pure Python
- Not all Python implementations use a GIL: Jython, IronPython, PyPy-STM
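A small sketch of the I/O-bound case: `time.sleep` releases the GIL while waiting, so five 0.2-second waits overlap when run on threads instead of taking a full second back to back (timings are approximate):

```python
import threading
import time

def io_task():
    time.sleep(0.2)  # simulates blocking I/O; the GIL is released while waiting

start = time.perf_counter()
threads = [threading.Thread(target=io_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run serially this would take ~1.0 s; with threads it finishes in ~0.2 s.
print(f"elapsed: {elapsed:.2f}s")
```

Replace `time.sleep` with a pure-Python CPU loop and the speedup disappears, because only one thread can hold the GIL at a time.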
YOUR CHOICES
MULTI-THREADING
- Threads act as sub-tasks of a single process
- Threads share the same memory space
- Great for background tasks / waiting for asynchronous functions
- Can lead to conflicts (race conditions) when writing to the same memory location at the same time
MULTI-PROCESSING
- Separate processes act as individual jobs
- Processes run in their own memory space
- Great for complex calculations / running multiple instances of a whole project
- Higher overhead, but processes are isolated from one another
WHY DON’T WE JUST USE THREADS?
Problem
- Race conditions: multiple threads reading and writing the same object will cause unexpected results
- The operating system schedules threads dynamically; there is no way to guarantee the order in which they run
Solution
- Synchronization using a lock: guard the critical section of your function with a `threading.Lock` (e.g., `with lock:`), which admits only one thread at a time
- This can lead to deadlocks, where a lock is never released properly, or where you call sub-functions that try to acquire a lock already held by another thread
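A minimal sketch of lock-based synchronization: four threads increment a shared counter, and the `with lock:` block ensures only one thread updates it at a time, so the final total is exact:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # only one thread may execute this block at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 -- guaranteed correct with the lock
```

Without the lock, the read-modify-write of `counter += 1` can interleave between threads and silently lose updates.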
WHY DON’T WE JUST USE PROCESSES?
Problem
- Serialization using pickle: converting Python objects to byte streams
- Individual processes run in separate memory spaces and need a way to communicate; this is done through pickling
- Pickling has limitations: lambdas, nested functions, and many other objects cannot be pickled, so you must design your functions with pickling in mind
Solution
- Serialization using dill
- dill extends pickle, allowing arbitrary classes and functions to be sent as byte streams; dill can serialize almost any Python object
- The pathos.multiprocessing library is a fork of Python's multiprocessing that uses dill instead of pickle
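A quick illustration of the pickling limitation: the standard `pickle` module refuses a lambda, because it serializes functions by importable name and a lambda has none (dill, if installed, handles the same object via `dill.dumps`):

```python
import pickle

square = lambda x: x * x  # a lambda has no importable qualified name

try:
    pickle.dumps(square)
    pickled_ok = True
except Exception:  # pickle raises PicklingError (or AttributeError) here
    pickled_ok = False

print(pickled_ok)  # False: plain pickle cannot serialize lambdas
```

This is exactly why passing a lambda as the target of a `multiprocessing.Pool` fails, while pathos (built on dill) accepts it.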
AMDAHL’S LAW
- The small part of a program that cannot be parallelized will limit the overall speedup:
  S_latency(s) = 1 / ((1 - p) + p / s)
- S_latency is the potential speedup in latency of the execution of the whole task
- s is the speedup in latency of the execution of the parallelizable part of the task
- p is the fraction of the execution time of the whole task that the parallelizable part occupied before parallelization
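Amdahl's law is easy to evaluate numerically; a small helper (the function name is mine) shows how hard the serial fraction caps the gains:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Even with 95% of the work parallelized across 8 workers,
# the remaining serial 5% caps the overall speedup well below 8x:
print(round(amdahl_speedup(0.95, 8), 2))  # ~5.93
```

Note that as s grows without bound, the speedup approaches 1 / (1 - p): with p = 0.5, no amount of parallel hardware can ever make the program more than 2x faster.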
Google Colab Example: https://colab.research.google.com/drive/1TkjjiIrzq5wE1BF2DbOAqTKhmRgzSYVh
Works Cited
“An Introduction to Parallel Programming Using Python's Multiprocessing Module.” Dr.
Sebastian Raschka, 20 June 2014,
sebastianraschka.com/Articles/2014_multiprocessing.html. Accessed 31 Jan. 2020.
“Dill.” PyPI, pypi.org/project/dill/. Accessed 31 Jan. 2020.
Fomite, et al. “Why Was Python Written with the GIL?” Software Engineering Stack
Exchange, softwareengineering.stackexchange.com/questions/186889/why-was-python-written-with-the-gil. Accessed 31 Jan. 2020.
“Has the Python GIL Been Slain?” Hacker Noon, hackernoon.com/has-the-python-gil-been-slain-9440d28fa93d. Accessed 31 Jan. 2020.
“Hyper-Threading.” Wikipedia, Wikimedia Foundation, 19 Jan. 2020,
en.wikipedia.org/wiki/Hyper-threading. Accessed 31 Jan. 2020.
“Multithreading (Computer Architecture).” Wikipedia, Wikimedia Foundation, 2 Jan. 2020,
en.wikipedia.org/wiki/Multithreading_(computer_architecture). Accessed 31 Jan. 2020.
“Parallel Computing.” Wikipedia, Wikimedia Foundation, 26 Dec. 2019,
en.wikipedia.org/wiki/Parallel_computing. Accessed 31 Jan. 2020.
“Pickle - Python Object Serialization.” Python 3.8.1
Documentation, docs.python.org/3/library/pickle.html. Accessed 31 Jan. 2020.
Real Python. “An Intro to Threading in Python.” Real Python, Real Python, 25 May 2019,
realpython.com/intro-to-python-threading/. Accessed 31 Jan. 2020.
Rocklin, Matthew. “Parallelism and Serialization How Poor Pickling Breaks Multiprocessing.”
Parallelism and Serialization, matthewrocklin.com/blog/work/2013/12/05/Parallelism-
and-Serialization. Accessed 31 Jan. 2020.
“Superscalar Processor.” Wikipedia, Wikimedia Foundation, 7 May 2019,
en.wikipedia.org/wiki/Superscalar_processor. Accessed 31 Jan. 2020.