Creating Profiling Tools to Analyze and Optimize FiPy Poster


Creating Profiling Tools to Analyze and Optimize FiPy

Mira L. Holford

Abstract

Many physical processes in materials science are governed by partial differential equations (PDEs), including microstructure evolution, electrodeposition, and phase boundary motion, among many others. Solving the governing equations with appropriate numerical methods is a vital part of understanding these processes. FiPy provides a high-level interface for posing and solving PDEs typical of materials science with appropriate numerical methods. It is an open-source, Python-based code that both uses and integrates well with the existing scientific Python stack. Although FiPy is a well-established tool, it still requires improvements in both runtime and memory performance, as the emphasis during development has tended to be on usage and functionality rather than efficiency. In particular, FiPy's memory usage limits its capability to handle large simulations. This presentation describes efforts to measure FiPy's performance, identify bottlenecks, and then improve FiPy by optimizing those bottlenecks. An important part of this effort is creating general tools for time and memory profiling that integrate well with FiPy. In particular, the developed tools gather, compile, and cache profiling data for many simulations, and produce graphs that show performance scaling against simulation size.

Advisors: Jonathan E. Guyer and Daniel Wheeler

Introduction

FiPy is written in Python, which provides a powerful high-level language for posing and solving scientific problems but can suffer from inefficiencies common in Python-based numerical computation. These inefficiencies are often caused by the frequency of calls into the underlying C libraries on which Python's numerical stack is built. In this work we develop profiling tools to measure how these inefficiencies scale with system size and to identify regions of code that are highly inefficient in terms of both time and memory.

Profiling Tools

Acknowledgements Many thanks to my mentors, Dr. Jonathan Guyer and Dr. Daniel Wheeler, as well as Danya Murali and the SHIP Staff and Peers.

This work uses Python's standard cProfile module to profile speed and the third-party memory_profiler module to profile memory. The memory_profiler module enables line-by-line memory profiling of a single function, while the cProfile module profiles every part of the code. Obtaining consistent, reproducible results can be extremely challenging for both speed and memory profiling. The challenges include inconsistent memory deallocation, competition for system resources, and conflation of simulation and profiler resource usage.
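A minimal sketch of the speed-profiling side, using only the standard library (run_simulation is a hypothetical stand-in for an actual FiPy run; memory_profiler, not shown, would require a separate install):

```python
import cProfile
import io
import pstats

def run_simulation(ncell):
    # hypothetical stand-in for a FiPy simulation of ncell cells
    return sum(i * i for i in range(ncell))

# collect timing statistics for one run
profiler = cProfile.Profile()
profiler.enable()
run_simulation(100_000)
profiler.disable()

# report the most expensive calls, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # top five entries
print(stream.getvalue())
```

The printed report lists each profiled function with its call count, total time, and cumulative time, which is the raw data the profiling classes below save and plot.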

Two additional classes automate the generation and plotting of profiling data as system size changes. The speed profiling class uses the cProfile module to generate and save data for multiple system sizes. It uses the inspect module to derive a lookup key from a given function pointer, enabling the plotting of scaling data for multiple functions.
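The key lookup can be sketched as follows; make_key, assemble, and run are hypothetical stand-ins, but the (filename, line, name) tuple is genuinely how pstats indexes its statistics dictionary:

```python
import cProfile
import inspect
import pstats

def make_key(function_pointer):
    # pstats keys its statistics on (filename, first line, function name),
    # all recoverable from the function's code object
    code = function_pointer.__code__
    return (inspect.getfile(function_pointer), code.co_firstlineno, code.co_name)

def assemble(ncell):
    # hypothetical stand-in for an expensive FiPy routine
    return [i * i for i in range(ncell)]

def run(ncell):
    return assemble(ncell)

profiler = cProfile.Profile()
profiler.runcall(run, 50_000)

# .stats maps each key to (call count, primitive calls, tottime, cumtime, callers)
stats = pstats.Stats(profiler).stats
cc, nc, tottime, cumtime, callers = stats[make_key(assemble)]
print(nc, cumtime)
```

Given a function pointer, the scaling data for exactly that function can therefore be pulled out of each saved profile and plotted against system size.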

In a similar way, the memory profiling class generates and saves data for multiple system sizes, but automatically decorates (identifies to the memory profiler) a given function pointer for profiling. The class then executes each simulation in a separate subprocess to obtain consistent and accurate memory consumption results.

Code

class FiPyProfileTime(FiPyProfile):
    def __init__(self, runfunc, ncells, regenerate=False):
        ...

    def profile(self, ncell):
        ...

    def get_time_for_function(self, function_key):
        return stats[function_key]

    def get_key_from_function_pointer(function_pointer):
        return inspect.getfile(function_pointer)

    def plot(self, keys, field="cumulative"):
        ...

class MemoryProfiler(object):
    def __init__(self, profileMethod, runfunc):
        ...

    def decorate(self, func):
        def wrapper(*args, **kwargs):
            ...
        return self.codeMap

    def getLineMemory(self, line):
        ...

class MemoryViewer(object):
    def generateData(self):
        def worker(ncell, resultQ, profileMethod, runfunc, lines):
            ...
        process = multiprocessing.Process(...)

FiPy Examples: Polycrystal and Extreme Fill

This work uses two case studies to assess FiPy's time and memory consumption. The first case study models an evolving microstructure consisting of multiple solidifying crystals (polycrystal). The second case study models through-hole via deposition of copper (extreme fill), a process often employed in the fabrication of electronic components.

[Figures: turbine wing and polycrystals; copper deposits]

Extreme Fill

[Plots: speed and memory scaling for the extreme fill example]

The extreme fill example has very similar memory consumption to the polycrystal example, but runs about 10 times slower. Furthermore, the total run time is dominated by the solver, although using "inline" (small kernels of C code that replace NumPy operations) does reduce run times by 20%. The extreme fill example exhibits the same 500-floats-per-cell memory problem as the polycrystal example.

How does FiPy Work?

FiPy can solve a general PDE of the following form:
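As given in the FiPy documentation, this general form combines transient, convection, diffusion, and source terms:

```latex
\frac{\partial (\rho \phi)}{\partial t}
  + \nabla \cdot \left( \vec{u}\, \phi \right)
  = \left[ \nabla \cdot \left( \Gamma_i \nabla \right) \right]^{n} \phi
  + S_{\phi}
```

where \(\phi\) is the solution variable held on each cell, \(\vec{u}\) is a convection velocity, the \(\Gamma_i\) are diffusion coefficients (the exponent \(n\) allowing higher-order diffusion), and \(S_{\phi}\) is a source term.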

Polycrystal

Profiling the polycrystal example reveals that using the Gmsh meshing tool (to replace FiPy's built-in grid classes) decreases run time but increases memory consumption. At these system sizes, both run time and memory consumption scale like N, the number of cells. Due to the complicated diffusion coefficients in the polycrystal example and the symmetric matrices, the solver takes only 10% of the total run time. However, each cell consumes memory comparable to 500 floats, which indicates a significant memory problem.

FiPy breaks down the physical domain into separate cells as part of a mesh. Each cell within the mesh holds a numerical approximation to the solution field. The PDE is discretized using the mesh and then linearized into a matrix representation. Each row in the matrix represents one cell and, thus, the size of the matrix scales with the number of cells.
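That one-row-per-cell scaling can be sketched with a minimal 1-D finite-volume discretization of a diffusion term (a hypothetical stand-in using SciPy's sparse matrices rather than FiPy's own matrix classes): assembling the system for N cells yields an N-by-N matrix, so storage grows with cell count.

```python
import numpy as np
from scipy.sparse import diags

def diffusion_matrix(ncells, dx=1.0, coeff=1.0):
    # 1-D finite-volume discretization of d/dx(coeff * d(phi)/dx):
    # one row per cell, with off-diagonal entries coupling each cell
    # to its two neighbours (zero-flux boundaries at either end)
    w = coeff / dx**2
    main = np.full(ncells, -2.0 * w)
    main[0] = main[-1] = -w  # boundary cells have a single neighbour
    off = np.full(ncells - 1, w)
    return diags([off, main, off], [-1, 0, 1], format="csr")

for n in (10, 100, 1000):
    A = diffusion_matrix(n)
    print(n, A.shape)  # the matrix has one row and one column per cell
```

The sparse tridiagonal structure keeps storage at roughly three entries per row, so the matrix itself is not the source of the 500-floats-per-cell overhead observed above.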

[Plots: speed and memory scaling for the polycrystal example]