
Software Engineering and Process for Scientific Computing: The FLASH Example

Anshu Dubey

Sept 9, 2013

AFRD Simulation and Modeling Meeting

Code Architecture : Considerations


• Identify logically separable functional units of computation
• Encode the logical separation (modularity) into a framework
• Separate what is exposed outside the module from what is private to the module
• Define interfaces through which the modules can interact with each other
• Devise control flow – the driver

While these are good principles to start with, they don’t always work out easily. It may become difficult to untangle the data dependencies, or modularity might dictate code replication. This is where design really becomes important.

FLASH Architecture

• Implemented by the Setup Script, which also configures
• Links together needed physics and tools for an application
  – Parses Config files to
    • Determine a self-consistent set of units to include
    • If a unit has multiple implementations, find out which implementation to include
    • Get list of parameters from units
    • Determine solution data storage
  – Configures Makefiles properly
    • For a particular platform
    • For included Units
  – Implements inheritance with unix directory structure
  – Provides a mechanism for customization
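As a rough illustration of the Config-parsing step above, the sketch below starts from one application and collects a self-consistent set of units plus their runtime parameters. The dictionary layout, the unit and parameter names, and the functions `resolve_units` and `collect_parameters` are all assumptions made for this example; the actual setup script reads Config files from the source tree and does considerably more.

```python
# Hypothetical in-memory stand-in for per-unit Config files.
# Each unit lists the units it requires and the runtime parameters it declares.
CONFIGS = {
    "Simulation/Sedov": {"requires": ["Driver", "Grid", "physics/Hydro"],
                         "params": {"sim_rhoAmbient": 1.0}},
    "Driver": {"requires": [], "params": {"nend": 100}},
    "Grid": {"requires": [], "params": {"nblockx": 1}},
    "physics/Hydro": {"requires": ["physics/Eos"], "params": {"cfl": 0.8}},
    "physics/Eos": {"requires": [], "params": {"gamma": 1.4}},
}


def resolve_units(root, configs):
    """Return every unit reachable from `root` via 'requires', in discovery order."""
    included, stack = [], [root]
    while stack:
        unit = stack.pop()
        if unit in included:
            continue
        if unit not in configs:
            raise KeyError(f"no Config found for unit {unit!r}")
        included.append(unit)
        # Reversed so that requirements are visited in the order they are listed.
        stack.extend(reversed(configs[unit]["requires"]))
    return included


def collect_parameters(units, configs):
    """Merge the runtime parameters declared by the included units."""
    params = {}
    for unit in units:
        params.update(configs[unit]["params"])
    return params


units = resolve_units("Simulation/Sedov", CONFIGS)
params = collect_parameters(units, CONFIGS)
print(units)
print(sorted(params))
```

A unit that is never required (directly or transitively) is simply left out of the build, which is the property the slide calls a self-consistent set of units.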

Config file example

[Figure: annotated Config file. Callouts: alternate local IO routines; runtime parameters and documentation; additional scratch grid variable; required units; enforce geometry or other conditions.]

Data Management

• Defined constants for globally known quantities
• Data ownership by individual units
  – Arbitration on data shared by two or more units
• Definition of scope for groups of data
  – Unit scope data module, one per implementation of the unit
  – Subunit scope data module, one per implementation of the subunit
  – All other data modules follow the general FLASH inheritance
    • The directory in which the module exists, and all of its subdirectories, have access to the data modules
• Other units can access data through available accessor functions
• For large scale manipulations of data residing in two or more units, runtime control transfers back and forth between units
  – Avoids lateral transfer of large amounts of data
  – Avoids performance degradation
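The ownership and accessor ideas above can be sketched in a few lines. This is only an illustration under stated assumptions: the class names `GridData`, `Grid` and `Hydro` and their methods are hypothetical, and Python's leading-underscore convention stands in for the module scoping that the real code gets from its data modules and directory structure.

```python
class GridData:
    """Unit-scope data: owned by the (hypothetical) Grid unit only."""

    def __init__(self, nblocks, ncells):
        self._density = [[1.0] * ncells for _ in range(nblocks)]


class Grid:
    def __init__(self, nblocks=2, ncells=4):
        self._data = GridData(nblocks, ncells)  # private to the unit

    # Accessor functions: the only way other units see Grid-owned data.
    def get_block_count(self):
        return len(self._data._density)

    def get_block(self, block_id):
        # Hands out a reference to one block rather than a copy of all data.
        return self._data._density[block_id]


class Hydro:
    """A client unit: holds no Grid data and asks for one block at a time."""

    def advance(self, grid, factor):
        for block_id in range(grid.get_block_count()):
            block = grid.get_block(block_id)
            for i in range(len(block)):
                block[i] *= factor


grid = Grid()
Hydro().advance(grid, 2.0)
print(grid.get_block(0))
```

Because the client works on a reference to one block at a time, no large lateral copy of the mesh data is ever made between units.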

Unit Hierarchy

[Figure: unit hierarchy. Labels: Unit (API/stubs); UnitMain (common API implementation); UnitSomething (API implementation); Common Impl; Impl_1, Impl_2 and Impl_3 (each with the remaining API implementation); kernels at the lowest level.]

Example of a Unit – Grid (simplified)

[Figure: Grid unit directory tree. Labels: Grid; GridSolvers, GridMain, GridParticles, GridBC; local API; UG, paramesh, Paramesh2, paramesh4, PM4_package, PM4dev_package; Sieve, PttoPt; GPMapToMesh, GPMove; etc.]

Why a local API? Grid_init calls the init functions of all subunits; without a stub for it, the code won’t build when a subunit is not included.
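The role of the local API can be sketched as a null implementation that is always present and is replaced only when the real subunit is included. The class and function names below are hypothetical and chosen for this example; in the actual code the substitution happens at setup time through the directory structure, not through class inheritance.

```python
class GridParticlesStub:
    """Null implementation: present in every build, does nothing."""

    included = False

    def init(self):
        return None


class GridParticlesReal(GridParticlesStub):
    """Real implementation: replaces the stub when the subunit is included."""

    included = True

    def __init__(self):
        self.ready = False

    def init(self):
        self.ready = True


def grid_init(subunits):
    """Call init on every subunit without knowing which ones are real."""
    for sub in subunits:
        sub.init()
    return [type(sub).__name__ for sub in subunits if sub.included]


without_particles = grid_init([GridParticlesStub()])
with_particles = grid_init([GridParticlesReal()])
print(without_particles, with_particles)
```

The caller is identical in both configurations, which is what lets the top-level init routine stay unchanged whether or not a subunit is part of the build.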

Functional Component in Multiple Units

• Example: Particles
  – Position initialization and time integration in Particles unit
  – Data movement in Grid unit
  – Mapping divided between Grid and Particles
• Solve the problem by moving control back and forth between units

[Figure: control flow between units. Driver: Init, Evolve. Particles: Init, Map, Evolve. Grid: Init, Map, Move.]
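A minimal sketch of control moving back and forth between units, assuming a 1D periodic mesh and nearest-cell mapping. The classes, method names and the mapping scheme are illustrative choices for this example, not the actual FLASH routines.

```python
class Grid:
    def __init__(self, ncells, dx):
        self.ncells, self.dx = ncells, dx
        self.velocity = [1.0] * ncells  # mesh data owned by Grid

    def map_to_particle(self, x):
        """Mesh-to-particle mapping: nearest-cell lookup (simplest possible choice)."""
        cell = min(int(x / self.dx), self.ncells - 1)
        return self.velocity[cell]

    def move(self, positions):
        """Data movement: keep particles inside a periodic domain."""
        length = self.ncells * self.dx
        return [x % length for x in positions]


class Particles:
    def init(self, n, dx):
        self.positions = [i * dx for i in range(n)]

    def evolve(self, velocities, dt):
        """Time integration: forward Euler step."""
        self.positions = [x + v * dt for x, v in zip(self.positions, velocities)]


def driver_evolve(grid, particles, dt, nsteps):
    """The driver hands control to Grid, then Particles, then Grid again."""
    for _ in range(nsteps):
        vel = [grid.map_to_particle(x) for x in particles.positions]  # Grid
        particles.evolve(vel, dt)                                     # Particles
        particles.positions = grid.move(particles.positions)         # Grid


grid = Grid(ncells=4, dx=1.0)
particles = Particles()
particles.init(n=2, dx=1.0)
driver_evolve(grid, particles, dt=0.5, nsteps=2)
print(particles.positions)  # [1.0, 2.0]
```

Neither unit reaches into the other's data: the mesh velocities stay in Grid, the positions stay in Particles, and only small per-step values cross the boundary.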

FLASH Evolution : Version 1

• Goal from the beginning
  – Make the code public
  – Use the same code for many different applications
• All target applications were for reactive flows
• Diverging camps from the beginning
  – Camp 1: Produce a well architected modular code
  – Camp 2: Yes, but also use it soon for science
• Both goals hard to meet in the near term
• Two parallel development paths started
  – Not enough resources to sustain both
  – Camp 2 won out
• First release FLASH1.6

Version 1

• Smashed together from three distinct existing codes
  – PARAMESH for AMR
  – Prometheus for Hydro
  – EOS and nuclear burn from other research codes
• F77 style of programming; Common blocks for data sharing
• Inconsistent data structures, divergent coding practices and no coding standards
• Concept of alternative implementations brought in with a script for plugging different EOS
• Beginning of inheriting directory structure
• First release FLASH 1.6

Version 2 : Data Inventory

• Centralized database
  – Common blocks eliminated
  – All data inventoried
  – Different types of variables identified
• Testing got formalized
  – Test-suite version 1
  – Run on multiple platforms
  – Policies about monitoring
• Not much else changed in the architecture

Central Database Disadvantages

• Navigating the source tree became more confusing and Config file dependencies became more verbose
• No possibility of data scoping; every data item was equally accessible to every routine in the code
• When parsing a function, one could not tell the source of data
• Lateral dependencies were further hidden
• Overhead of database querying slowed the code by about 10-15%
• The queries caused huge amount of code replication and source files became ugly

Version 3 : the Current Architecture

• Kept inheriting directory structure, inheritance and customization mechanisms from earlier versions
• Defined naming conventions
  – Differentiate between namespace and organizational directories
  – Differentiate between API and non-API functions in a unit
  – Prefixes indicating the source and scope of data items
• Formalized the unit architecture
  – Defined API for each unit with null implementation at the top level
• Resolved data ownership and scope
• Resolved lateral dependencies for encapsulation
• Introduced subunits and built-in unit test framework

Version 4

• Did not need any change in the architecture
• Primarily a capabilities addition exercise
• Mesh replication was easily introduced for multigroup radiation
• Expanded to other communities such as fluid-structure interaction because of existing Lagrangian framework and elliptic solver
• Has Chombo as an alternative mesh package, but for hydro only applications

Transition to Version 2

• The bias at the time – keep the scientists in control
• Keep the development and production branches synchronized
  – Enforced backward compatibility in the interfaces
  – Precluded needed deep changes
  – Hugely increased developer effort
  – High barrier to entry for a new developer
• Did not get adopted for production in the center for more than two years
  – Development continued in FLASH1.6, and so had to be brought simultaneously into FLASH2 too
  – Database caused performance hit and IPA could not be done, so slower

Transition to Version 3

• Controlled by the developers
• Sufficient time and resources made available to design and prototype
• No attempt at backward compatibility
• No attempt to keep development synchronized with production
• All focus on a forward looking modular, extensible and maintainable code

Two very important factors to remember:
• The scientists had a robust enough production code
• The developers had internalized the vagaries of the solvers

The Methodology

• Build the framework in isolation from the production code base
• Infrastructure units first implemented with a homegrown Uniform Grid
  – Helped define the API and data ownership
• Unit tests for infrastructure built before any physics was brought over
• Hydro and ideal gas EOS were next with Sod problem
• Next was PARAMESH: the Sod problem and the IO implementation were verified
• Test-suite was started on multiple platforms with various configurations (1/2/3D, UG/PARAMESH, HDF5/PnetCDF)
• This took about a year and a half, the framework was very well tested and robust by this time

The Methodology Continued …

• In the next stage the mature solvers (ones that were unlikely to have incremental changes) were transitioned to the code

– Once a code unit became designated for FLASH3, no users could make a change to that unit in FLASH2 without consulting the code group.

• The next transition was the simplest production application (with minimal amount of physics)

• Scientists were in the loop for verification and in prioritizing the units to be transitioned

• FLASH3 was in production in the Center long before its official 3.0 release

– More trust between developers and scientists
– More reliable code; unit tests provided more confidence, and it was easier to add capabilities

Verification

• Codes obviously need to be verified for correctness
• There is no such thing as a bug-free code
• A code is only as robust as the most rigorous test designed for it
• Devising a good test is at least as important as a good algorithm design
• Multi-component code testing needs
  – Unit test to verify a single functionality
    • May need to be done in more than one way
  – Other tests that combine components in many different ways
  – Combinations increase non-linearly with code components
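To make the growth in combinations concrete, the sketch below enumerates the test-suite configuration axes mentioned in the methodology slides (1/2/3D, UG/PARAMESH, HDF5/PnetCDF) and then counts how many pairwise and arbitrary-subset combinations exist for a given number of components. The component counts 5, 10 and 20 are arbitrary values picked for illustration.

```python
from itertools import product
from math import comb

# Configuration axes taken from the talk's test-suite description.
dimensions = ["1D", "2D", "3D"]
grids = ["UG", "PARAMESH"]
io_libs = ["HDF5", "PnetCDF"]

configs = list(product(dimensions, grids, io_libs))
print(len(configs))  # 12 configurations from only three small axes


def pairwise_tests(n_components):
    """Number of distinct pairs of components that could be tested together."""
    return comb(n_components, 2)


def subset_tests(n_components):
    """Number of non-empty subsets of components (the exhaustive upper bound)."""
    return 2 ** n_components - 1


for n in (5, 10, 20):
    print(n, pairwise_tests(n), subset_tests(n))
```

Even the pairwise count grows quadratically, and the exhaustive count grows exponentially, which is why a test suite has to aim for wide coverage rather than complete coverage.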

What makes a good test-suite

• Verifies the code in every possible meaningful configuration (again impossible to achieve)
• In the absence of comprehensive coverage provides a wide coverage with available resources
• Verifies the code on all supported hardware and software stack
• Is able to report on detected errors in easy to interpret ways
• Runs regularly and catches bugs introduced into the code base as early as possible

Maintenance Practices

• Repository management
  – For every development branch, if there is a production schedule there is a corresponding production branch
  – Stable revisions of the development branches are tagged and periodically merged to production branch
  – Campaigns branch off from the production branch
    • No forward merges occur on these branches
    • Backward merges are rare, but they do happen
    • Usually very limited manual merges of individual files or directories
• It all works only if all participants buy into the practice
• Typical pitfall : someone not checking in their work regularly, their working copy diverges from the repo, updates become a headache

Coding Standard Management

• Code is F90 based, compilers tend to be very tolerant of bad code

• Extremely easy to let non-maintainable code proliferate
  – Example : you can violate variable scoping by simply putting in the “use” anywhere, it is valid F90 code
  – Function prototypes (interfaces in F90) are not necessary; a call can silently drop (“eat”) arguments, and you may not find out until the code is so old that it has become hard to debug

• Set of scripts that run nightly and flag the violations in coding and document standards

• Periodically (most often just before releases) those violations get resolved
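A nightly checker of the kind described above could look roughly like the following sketch, which flags `use` statements that pull in another unit's data module. The naming rule (a module called `<Unit>_data` belongs to `<Unit>`), the rule that the unit is the first directory in the path, and the sample file contents are assumptions made for this example; they are not the actual FLASH scripts or their rules.

```python
import re

# Assumed convention for this sketch: a module named <Unit>_data holds data
# private to <Unit>, and only files under that unit's directory may "use" it.
USE_DATA = re.compile(r"^\s*use\s+(\w+)_data\b", re.IGNORECASE | re.MULTILINE)


def owning_unit(path):
    """Hypothetical rule: the unit is the first directory in the path."""
    return path.split("/")[0]


def find_scope_violations(sources):
    """sources maps file path -> Fortran source text; returns (path, module) pairs."""
    violations = []
    for path, text in sorted(sources.items()):
        for match in USE_DATA.finditer(text):
            unit = match.group(1)
            if unit.lower() != owning_unit(path).lower():
                violations.append((path, f"{unit}_data"))
    return violations


sources = {
    "Grid/Grid_init.F90":
        "subroutine Grid_init()\n  use Grid_data, only: gr_nblocks\nend subroutine\n",
    "Hydro/Hydro_step.F90":
        "subroutine Hydro_step()\n  use Grid_data\n  use Hydro_data\nend subroutine\n",
}
print(find_scope_violations(sources))
```

The compiler accepts both files; only a convention check like this one notices that the second file reaches into data it does not own.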

Documentation : How much

• A well maintainable code is likely to have 25-30% of its source as inline documentation
  – More is even better
  – Not doing that is the surest way of a code component to become unsupported (and eventually disappear from the code base) once its developer has moved on
  – Even otherwise, in a common code it is a requirement that others can read and make sense out of your code
  – You might forget why you did what you did
• The APIs should be really well documented in terms of their function, inputs and outputs, the correct range of values for inputs and expected outcome for those values.
  – Examples of use are even better

Documentation : How much

• If the code is public, other type of documentation becomes necessary
  – User’s guide
  – Online resources
  – FAQ’s or equivalent
• If the code accepts contributions from external users then even more documentation becomes necessary
  – Published coding standards
  – Coding examples
  – Developer’s guide

FLASH Example

http://flash.uchicago.edu/site/flashcode/user_support/

Interdisciplinary Interactions

Prioritization – whether to pursue good long term design or meet short term science objectives
– Both have their place
– Initial stages should be driven by science objectives
  • Too early for long term software design
  • Quick and dirty solutions with an eye to learning about code components and their interplay
– Once there is usable code, long term planning and design should occur
  • Willingness to make wholesale changes to the code at least once is necessary
  • At no stage should one lose sight of science objectives

Interdisciplinary Interactions

Partnership model
– Science users who recognize the code as a research instrument
– Even better if they are interested in the code
  • Flash early scientists were
– Developers and computer scientists interested in a product and the science being done with the code
  • Helps to have people with multidisciplinary training
– Comparable resources and autonomy for code group
  • And recognition of their intellectual contribution to scientific discovery
– Careful balance between long term and short term objectives

Lessons Learned

• Public Releases – every 8-10 months – forces discipline
  – Brings the code up to coding standards
  – Reconciles and refreshes the test suite
• Documentation – transient developer population
  – User support documentation
  – Extensive inline documentation
• Backward compatibility is overrated
• Uncluttered infrastructure is the best
• Supporting users is good, letting users drive the capability addition is even better
• Testing the code on multiple platforms is indispensable
• Allowing branches to diverge is a really bad idea

Application Codes Now

Many successful codes provide an infrastructure backbone into which solvers plug in
◦ Mesh, IO, runtime etc

Balancing act between performance and portability
◦ now a new concern : survival
◦ Reducing the size of code : very limited option
◦ tunable parameters : re-factor the codes – but how ?

Software process applied to codes – decade and a half ago
◦ everybody went their own way, but arrived at remarkably similar solution

Is there a lesson in it for the abstractions in the code infrastructures ?

Architecting for Future

• Requirements
  – Maintainable code, support large user community
  – Reliable results within quantified limits
  – Retain code portability and performance
  – Measurable and predictable performance
• The challenges in meeting the requirements; tension between
  – Modularity and performance
  – Readable/maintainable code and portability
  – Easy adaptability to new and heterogeneous architectures and complex multiphysics capabilities
  – Regression test based verification and tolerance for non-reproducibility

One Possibility: Foothold for Abstractions

• Separation of concerns
  – Codes have different types of complexities
    • Physical model, and its numerical algorithms
    • Implementation – data structures and therefore memory access patterns
    • Parallelism
• Expose parallelism opportunities
  – Spatial
  – Operational
• Hardware oblivious solver

Expose Parallelism : Spatial

Mapping to Programming Abstractions

Need programming languages with a richer collection of data structures and high level constructs that allow expression of computations with much less detail

[Figure labels: numerical complexity, memory access complexity, parallel complexity; hardware oblivious solver, micro-block computation, code transformation, dynamic scheduling.]

• Write solvers in the form of interdependent tasks
• Register dependencies with the abstraction layer
• Expose data/operation fusion possibilities

Some useful links

• http://flash.uchicago.edu/site/flashcode
• http://flash.uchicago.edu/site/flashcode/user_support/
• http://flash.uchicago.edu/site/publications/flash_pubs.shtml
• http://flash.uchicago.edu/site/testsuite/home.py


Backup Slides

Further Insights

• Supporting multiple sets of projects from different branches is more recent at FLASH
• A hierarchy of project and production branches
• A stringent merge and test schedule is important
• How we did it :
  – Turned one of the branches into main development branch
  – Turned trunk into the merge area
  – Enforced a merge schedule
  – Enforced a policy of prioritizing the fixing of whatever broke in the merge

The Present State

• The code is large, extensible and well architected
• Just about managing to run well on some of the current architectures – Mira
  – Homogeneous architecture
  – Sufficient memory per core
  – Hybrid MPI-OpenMP parallelism
• Threading at the solvers level
  – The maintainable format by threading on blocks
  – The not so easy to maintain but better performing: threading the nested loops
• The code as is will not be able to effectively use Titan and quite possibly MIC architectures

Separate Complexity: Example

• At present we separate unit complexity from parallel complexity (most good codes do)
• Unit explicitly pulls the data it needs
  – We get a block, cell coordinates and other relevant grid meta data explicitly
  – At the wrapper layer we separate some infrastructural complexity from the numerics, but not all
• Solver has to make receiving data structures conform to the mesh -> has to know them
• Because of data structures, memory access patterns are deeply intertwined with the numerics
• Getting the performance implies second guessing the compiler
• Solver should ideally be written without explicit knowledge of data structures, loop-bounds and nesting
  – Data structures as desired by the solver
  – Possibly the solver written as a stencil
• Deepen the wrapper layer to assemble the data structure

Expose Parallelism : Functional

FLASH Hydro
• update halo
• apply equation of state to halo
• get Riemann state
• compute face fluxes
• conserve fluxes
• update
• apply equation of state

In get Riemann state
– normal state reconstruction using the characteristics
– transverse flux construction
– correct states

• Lots of field variables and meta-data

Expose functional parallelism

• Rewrite solvers as a collection of somewhat independent operations

– Define dependencies in the solvers (operation and data)

• Apply operator fusing at build (code transformations, pre-processing or some combination)

• Make it possible to operate on micro-tiles/blocks : stencil based approaches are the extreme cases

• Data fusing at run and/or compile time

The abstraction layers should do appropriate fusions and code transformations and use dynamic runtime management to orchestrate the computation for performance
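One way to picture solvers as somewhat independent operations with declared dependencies is a small task graph. The sketch below reuses the Hydro step names from the functional-parallelism slide, but the dependency edges are assumptions made for illustration (in particular, that normal reconstruction and transverse flux construction do not depend on each other). It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) as a stand-in for an abstraction layer's scheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on.
# Step names follow the Hydro list; the edges are assumed for this example.
dependencies = {
    "update_halo": set(),
    "eos_on_halo": {"update_halo"},
    "normal_reconstruction": {"eos_on_halo"},
    "transverse_flux": {"eos_on_halo"},
    "correct_states": {"normal_reconstruction", "transverse_flux"},
    "face_fluxes": {"correct_states"},
    "conserve_fluxes": {"face_fluxes"},
    "update": {"conserve_fluxes"},
    "eos": {"update"},
}


def schedule_in_waves(deps):
    """Group tasks into waves; tasks in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = schedule_in_waves(dependencies)
for i, wave in enumerate(waves):
    print(i, wave)
```

Any wave with more than one task marks an opportunity for functional parallelism or operator fusion; a real runtime would dispatch ready tasks dynamically instead of in lock-step waves.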

Simple setup

hostname:Flash3> ./setup MySimulation -auto

The setup script will automatically generate the object directory based on the MySimulation problem you specify.

Sample Units File

INCLUDE Driver/DriverMain/TimeDep
INCLUDE Grid/GridMain/paramesh/Paramesh3/PM3_package/headers
INCLUDE Grid/GridMain/paramesh/Paramesh3/PM3_package/mpi_source
INCLUDE Grid/GridMain/paramesh/Paramesh3/PM3_package/source
INCLUDE Grid/localAPI
INCLUDE IO/IOMain/hdf5/serial/PM
INCLUDE PhysicalConstants/PhysicalConstantsMain
INCLUDE RuntimeParameters/RuntimeParametersMain
INCLUDE Simulation/SimulationMain/Sedov
INCLUDE flashUtilities/general
INCLUDE physics/Eos/EosMain/Gamma
INCLUDE physics/Hydro/HydroMain/split/PPM/PPMKernel
INCLUDE physics/Hydro/HydroMain/utilities

FLASH Example : Makefile

• Each supported site has a specific Makefile.h
  – Variables defined for library locations
  – Variables for compiler being used
  – Flags for using in “debug”, “test” or “opt” mode
  – Other necessary flags
• Every directory can have a makefile snippet
  – Exploits the recursively expanded variables
  – Makes sure to include the source files defined at that level unless they are inherited
  – Specifies local dependencies
• The file snippets are consolidated into Makefile.Unit for every unit
• The Makefile.h and Makefile.Unit are “included” in the generated Makefile


Code Architecture : Important Questions

• What are the essential data structures
  – State data, meta data and scratch data
• What are the different ways in which the data structures are manipulated
  – Solver operations, housekeeping, being moved around
• How do various data structures interact with each other
  – What metadata needed to correctly change state data
  – How much scratch space is needed, where can it be reused
  – What are the data dependencies
• Where are the firewalls between who can use what data and how
  – Which part of the data can be modified by which solver
  – Which variables can only be modified by global state change
  – How should the data be scoped


FLASH Example

• Requirements for infrastructure support:
  – AMR, and also preferably Uniform Grid
  – Input runtime parameters
  – IO
  – Support for multiple species, physical constants etc
• Physics requirements
  – Shock hydrodynamics / MHD
  – Nuclear networks
  – Equation of state and other material properties
  – Time-stepping
  – Lagrangian particles

Example of Unit Design

• Non trivial to design several of the physics units in ways that meet modularity and performance constraints.
• Eos (equation of state) unit is a good example
  – Individual mesh points are independent of each other
  – There are several reusable calculations
  – Other physics units demand great flexibility from it
    • single grid point
    • only the interior cells, or only the ghost cells
    • a row at a time, a column at a time or the entire block at once
    • different grid data structures, and different modes at different times
  – Implementations range from simple ideal gas law to table look up and iterations for degenerate matter and plasma, with widely differing relative contribution in the overall execution time
  – Relative values of overall energy and internal energy play role in accuracy of results
  – Sometimes several derivative quantities are desired as output

EOS interface Design

• Hierarchy in complexity of interfaces
  – For single point calculation, scalar input and output
  – For sections of a block or full block, vectorized input and output
    • wrappers to vectorize and configure the data
    • returning derivative quantities if desired
• Different levels in the hierarchy give different degrees of control to the client routines
  – Most of the complexity is completely hidden from casual users
  – More sophisticated users can bypass the wrappers for greater control
• Done with elaborate machinery of masks and defined constants
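The interface hierarchy above can be sketched with an ideal gas law, P = (gamma - 1) * rho * e. Everything named here is hypothetical: the functions `eos_point`, `eos_vector` and `eos_wrapped`, the mask constants, and the dictionary-of-lists block layout are stand-ins chosen for the example, not the actual FLASH Eos API. The value gamma = 1.5 is picked only so the arithmetic is exact in floating point.

```python
GAMMA = 1.5  # ideal-gas adiabatic index, chosen so the example arithmetic is exact

# Defined constants used as a bit mask to request optional derivative outputs.
EOS_DPDRHO = 1
EOS_DPDE = 2


def eos_point(rho, eint):
    """Lowest level: scalar in, scalar out. Ideal gas: P = (gamma - 1) * rho * e."""
    return (GAMMA - 1.0) * rho * eint


def eos_vector(rho, eint, mask=0):
    """Middle level: vectorized input and output, derivatives only if requested."""
    out = {"pres": [eos_point(r, e) for r, e in zip(rho, eint)]}
    if mask & EOS_DPDRHO:
        out["dpdrho"] = [(GAMMA - 1.0) * e for e in eint]
    if mask & EOS_DPDE:
        out["dpde"] = [(GAMMA - 1.0) * r for r in rho]
    return out


def eos_wrapped(block, rows=None, mask=0):
    """Top level: pulls vectors out of a block (a dict of 2D lists here),
    calls the vectorized routine row by row, and stores pressure back."""
    rows = range(len(block["dens"])) if rows is None else rows
    for j in rows:
        result = eos_vector(block["dens"][j], block["eint"][j], mask)
        block["pres"][j] = result["pres"]
    return block


block = {
    "dens": [[1.0, 2.0], [4.0, 8.0]],
    "eint": [[2.0, 2.0], [2.0, 2.0]],
    "pres": [[0.0, 0.0], [0.0, 0.0]],
}
eos_wrapped(block, rows=[0])  # casual use: update only the first row
print(block["pres"])          # [[1.0, 2.0], [0.0, 0.0]]

# Sophisticated use: bypass the wrapper and ask for derivatives directly.
print(eos_vector([2.0], [4.0], EOS_DPDRHO | EOS_DPDE))
```

Casual callers only touch the wrapper; a caller that needs derivatives or a nonstandard data layout can drop to the vectorized level and select outputs with the mask.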

Coding Standards

• Absolutely essential for code maintainability
  – Consistent code is easier to maintain
  – Someone other than the developer can inspect and make sense out of the code segment
  – Data structures remain more consistent
• Should always include documenting standards also
  – Critical when there is transient population of developers
  – Someone else can understand and maintain your code
  – Easier for users to customize and even contribute code
• Typically involve
  – Naming conventions
  – Inheritance and Code organization

FLASH Example: The Tests Collection

Maintenance Practices

• Repository management
  – Should you have a gatekeeper
  – How far do you allow the branches to diverge
  – How much access control do you apply
• Verification management
  – Monitoring the regression tests
  – Prioritization of efforts : how long do you let a failing test go on failing
• Coding Standards management
  – How do you verify that the new code adheres to coding and documentation standards
• Documentation
  – What fraction of developer time is reasonable

Variety of User Expertise

• Novice users – execute one of included applications
  – change only the runtime parameters
• Most users – generate new problems, analyze
  – Generate new Simulations with initial conditions, parameters
  – Write alternate API routines for specialized output
• Advanced users – Customize existing routines
  – Add small amounts of new code where their application resides
• Expert – new research
  – Completely new algorithms and/or capabilities
  – Can contribute to core functionality

Code Repositories

• Centralized Version Control
  – CVS the first one to be heavily deployed
  – Subversion the most commonly used
• Distributed Version Control
  – Most popular ones are Git and Mercurial
  – Synchronization through exchange of patches
  – One can maintain multiple local branches
  – Makes for a much easier co-existence of production and development
  – Gate keeping can become challenging


Subversion: SVN

• Central Repository system.
  – There is one master version of the state of the code
• Users have “check outs” or “working copy” of the master repository
• Can access the master repository via several mechanisms
  – rsh connection
  – ssh connection
  – svnserve server
  – All user interaction is considered a client-side operation
  – Transactional protocol


Software Process Components

• For All Codes
  – Code Repository
  – Build Process
  – Code Architecture
  – Coding Standards
  – Verification Process
  – Maintenance Practices
• If Publicly Distributed code
  – Distribution Policies
  – Contribution Policies
  – Attribution Policies


Build Process


• Multiple files, individual file compilation does not scale beyond a point

• If the code runs on many different platforms then each software stack will have its own peculiarities

• The code may want to use available libraries, getting them all built consistently may be challenging

• For all of these reasons it is worth investing in a managed build process

• Usually a combination of configuration and make
  – Autoconf, perl scripts, python for configuration
  – GNU Make for compilation
