Perspectives on Scientific Software Recovery and Revitalization.
Bob Apthorpe
March 30, 2018
Acorvid Technical Services Corporation
Overview.
Characteristics of Scientific Software
Properties of scientific software include:
• Numerically sophisticated
• Developed principally or exclusively by subject matter experts (SMEs)
• Favors accuracy ≥ efficiency > elegance
• Long lifespan
Common Themes in Older Scientific Software Projects
Many mature scientific software projects have a common lifecycle:
• Often for internal use only
• Resources allocated for development, not maintenance
• May be maintained by one person or a small group
• Software can outlive its creator
• Maintenance may fall to junior personnel unfamiliar with the code
Search Stack Overflow for the phrase "I recently inherited a FORTRAN application…"
Software Development Lifecycle
Source: https://www.gethow.org/importance-of-software-lifecycle-management
Reasons for Software Recovery
Inactive or retired software may need to be put back in service:
• Legal or regulatory requirements
• Confirmation of prior results
• Renewed interest in subject matter
Examples include:
• Recovering fast reactor safety analysis codes to support GenIV reactor development (SPRAY, SOFIRE-II)
• Extending support code which generated aerosol particle size distributions for severe reactor accident code
• Needed chemical equilibrium analysis code built and tested to ASME NQA-1 standard ("nuclear grade")
Software Recovery Issues
Software recovery involves issues similar to those with legacy code:
• Original or cognizant developers may not be available
• Source code may not be machine-readable
• Limited test cases and documentation
• Original development environment may not exist
  • Different hardware
  • Different operating system
  • Language differences; proprietary extensions
  • Compilation flags
  • Missing dependencies (libraries)
Pragmatic Questions
When encountering new software, basic questions are:
• What does this do?
• How do I know it works?
• How do I make it work?
• How does it work?
• How do I change it without breaking it?
Relevant to a new user, developer, or manager
Case Study: Analysis of Flow in Pipe Networks.
Background
• Question on Stack Overflow about a missing library needed by a pipe network flow solver
• Source code published in [Jep74] and [Jep76] but not in machine-readable format
• Code was in fair to poor condition; missing proprietary library
• Short, tractable problem ideal for illustrating code recovery techniques
• As of March 2018, Google Scholar shows at least 330 citations to [Jep76]
• Code has historical significance [Orm08]
Problem Domain: Flow in Pipe Networks
Basic Theory i
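Bernoulli's equation:
\[ z_1 + \frac{p_1}{\rho g} + \frac{V_1^2}{2g} + \frac{E_1}{g} = z_2 + \frac{p_2}{\rho g} + \frac{V_2^2}{2g} + \frac{E_2}{g} \]
Newton's Law of Viscosity:
\[ \tau = \mu \frac{dv}{dy} \]
Darcy-Weisbach equation:
\[ \Delta p = f_D \frac{L}{D_H} \frac{\rho \langle v \rangle^2}{2} \]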
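As a worked instance of the Darcy-Weisbach equation above, the following Python sketch evaluates the pressure drop for one pipe. The function name and the sample values (friction factor 0.02, 100 m of 0.1 m pipe, water at 2 m/s) are illustrative assumptions, not taken from the slides.

```python
def darcy_weisbach_dp(f_d, length, d_h, rho, v_mean):
    """Pressure drop in Pa: dp = f_D * (L / D_H) * rho * <v>^2 / 2."""
    return f_d * (length / d_h) * rho * v_mean**2 / 2.0


# Hypothetical case: SI units throughout
dp = darcy_weisbach_dp(f_d=0.02, length=100.0, d_h=0.1, rho=1000.0, v_mean=2.0)
print(f"pressure drop = {dp:.0f} Pa")
```

With these assumed inputs the pressure drop comes to about 40 kPa.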
Basic Theory ii
Basic Theory iii
Conserve mass and energy
Assumptions:
• constant material properties
• isothermal
• incompressible flow
• continuity; Kirchhoff's laws
• momentum can be ignored
Reasonable approximation of steady-state single-phase incompressible flow
Suitable for design and analysis of water distribution systems
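To make the mass and energy conservation statements concrete, here is a minimal Python sketch for the smallest possible network: two parallel pipes sharing a total flow. It applies a Hardy Cross style loop correction, which is one classical way to solve such problems; the slides do not say which method the recovered code uses. The head-loss law h = K·Q·|Q| and the names `hardy_cross_parallel`, `k1`, `k2`, and `q_total` are assumptions for illustration.

```python
def hardy_cross_parallel(k1, k2, q_total, tol=1e-12, max_iter=100):
    """Split q_total between two parallel pipes with head loss h = K*Q*|Q|."""
    q1 = q_total / 2.0   # initial guess; q1 + q2 = q_total keeps continuity
    q2 = q_total - q1
    for _ in range(max_iter):
        # Energy: head losses around the loop formed by the two pipes must cancel
        imbalance = k1 * q1 * abs(q1) - k2 * q2 * abs(q2)
        slope = 2.0 * (k1 * abs(q1) + k2 * abs(q2))
        if slope == 0.0:
            break        # no flow anywhere; nothing to correct
        dq = -imbalance / slope
        q1 += dq         # the same correction is applied with opposite sign,
        q2 -= dq         # so continuity still holds after every step
        if abs(dq) < tol:
            break
    return q1, q2


q1, q2 = hardy_cross_parallel(k1=4.0, k2=1.0, q_total=3.0)
print(q1, q2)
```

For this law the exact split is q1 = q_total·√K2 / (√K1 + √K2), which gives a convenient check on the iteration.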
Empirical Correlations for Friction Factor
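The specific correlations shown on this slide are not recoverable from the transcript. As an illustration of what such correlations look like, the Python sketch below computes a Darcy friction factor from three widely used relations: 64/Re for laminar flow, the Swamee-Jain approximation as a starting guess, and fixed-point iteration on the Colebrook-White equation. The function name, the laminar threshold of Re < 2000, and the tolerances are assumptions.

```python
import math


def friction_factor(re, rel_rough, tol=1e-12, max_iter=50):
    """Darcy friction factor for Reynolds number re and relative roughness eps/D."""
    if re < 2000.0:      # assumed laminar threshold
        return 64.0 / re
    # Swamee-Jain explicit approximation, used only as the initial guess
    f = 0.25 / math.log10(rel_rough / 3.7 + 5.74 / re**0.9) ** 2
    for _ in range(max_iter):
        # Colebrook-White: 1/sqrt(f) = -2 log10(eps/(3.7 D) + 2.51/(Re sqrt(f)))
        inv_sqrt = -2.0 * math.log10(rel_rough / 3.7 + 2.51 / (re * math.sqrt(f)))
        f_new = 1.0 / inv_sqrt**2
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


print(friction_factor(1.0e5, 1.0e-4))
```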
Known Challenges
• Code was originally written for UNIVAC 1108
• Uses matrix solver GJR from proprietary Sperry MATH-PACK library
• May use proprietary extensions to FORTRAN IV
• OCR'd source code will be full of scanning artifacts
• Illegible source code in [Jep74]; [Jep76] is legible however…
• No build instructions
• Few test cases
Source Code Legibility
Proprietary Libraries and Extensions
Source Code Errors
Ambiguous Code
Test Errors
Development Plan.
Recovery Phases
Refactoring effort split into three distinct phases:
• Resuscitate: Make code run correctly
• Update: Replace problematic coding constructs with modern equivalents
• Modularize: Extract common elements to modules shared among applications
Resuscitate: Goals
Primary Goal: Make code run correctly
• Convert OCR'd text to well-formed source code
• Set liberal compile and link options
• Set up test cases with sample input and expected output
• Create build scripts
• Put project under revision control
• Set up auto-documentation system (doxygen)
• Note sections of code which may be easy or difficult to modernize in next phase
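One way to set up test cases with sample input and expected output is a regression check that compares the numbers in a program's text output against a stored reference, within a tolerance. The Python sketch below is an assumption about how such a check could look, not the project's actual test harness; `numbers_in`, `outputs_match`, and the tolerance values are hypothetical. Running the recovered executable to obtain `actual` is left out.

```python
import math
import re

# Integers, decimals, and Fortran-style exponents such as 0.125E+01 or 1.25D+01
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[EeDd][-+]?\d+)?")


def numbers_in(text):
    """Extract numeric tokens as floats, mapping 'D' exponents to 'E'."""
    return [float(tok.replace("D", "E").replace("d", "e"))
            for tok in NUMBER.findall(text)]


def outputs_match(actual, expected, rel_tol=1e-6, abs_tol=1e-9):
    """True if both texts hold the same count of numbers, pairwise close."""
    a, e = numbers_in(actual), numbers_in(expected)
    return len(a) == len(e) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, e)
    )


expected = "FLOW 0.1250E+01 HEAD 12.5"
actual = "FLOW 1.2500001  HEAD 1.25D+01"
print(outputs_match(actual, expected))
```

Comparing numbers rather than raw text tolerates the formatting and rounding differences that are likely when old code is rebuilt with a different compiler.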
Resuscitate: Deliverables
Large number of artifacts created in this phase:
• 'Working' executable code
• Well-formed, managed source code
  • Original code
  • Replacement for GJR from Sperry MATH-PACK
• Build scripts
• Test cases
• Documentation
  • Build
  • Test
  • Code
Deliverables should be correct but may not be complete or 'perfect'.
Goal is a Minimum Viable Product
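The replacement for GJR listed among the deliverables has to solve a linear system. Assuming GJR stands for Gauss-Jordan reduction, the underlying technique can be sketched in pure Python as below. This is not the project's replacement routine and does not reproduce the MATH-PACK calling interface, which is not described in these slides; the function name and argument layout are hypothetical.

```python
def gauss_jordan_solve(a, b):
    """Solve a x = b by Gauss-Jordan reduction with partial pivoting."""
    n = len(a)
    # Build an augmented copy [a | b] so the caller's data is left untouched
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the largest remaining entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate the column from every other row, above and below
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


print(gauss_jordan_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```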
Update: Goals i
Primary Goal: Replace problematic coding constructs with modern equivalents
• Convert code to indented free-format
• Replace DO/CONTINUE with DO/ENDDO
• Replace IF/GOTO with IF/ELSEIF/ELSE/ENDIF
• Replace bare GOTO with DO WHILE, CYCLE, or EXIT
• Replace common numeric literals with named constants
• Replace old-style conditional operators with modern equivalents (e.g. convert .GE. to >=)
• Force all variables to be declared by using IMPLICIT NONE
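The simplest of these refactorings, converting old-style relational operators, is mechanical enough to script. The Python sketch below is one possible approach and is not taken from the project; the names `MODERN`, `PATTERN`, and `modernize_relational` are hypothetical. It deliberately ignores string literals and comments, so its output should be reviewed and rerun through the test cases.

```python
import re

# Old-style Fortran relational operators and their modern spellings
MODERN = {"EQ": "==", "NE": "/=", "LT": "<", "LE": "<=", "GT": ">", "GE": ">="}
PATTERN = re.compile(r"\s*\.(EQ|NE|LT|LE|GT|GE)\.\s*", re.IGNORECASE)


def modernize_relational(line):
    """Replace .GE. and friends with >= etc., normalizing surrounding spaces."""
    return PATTERN.sub(lambda m: " " + MODERN[m.group(1).upper()] + " ", line)


print(modernize_relational("      IF (X.GE.1.0 .AND. N .lt. 5) GOTO 10"))
```

Logical operators such as .AND. and .OR. have no symbolic form in Fortran, so the pattern leaves them alone.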
Update: Goals ii
Note repetitive sections of code for isolation into shared modules in next phase
Defer complex refactoring until next phase
Structure becomes apparent as simple refactorings are applied
Update: Deliverables
Fewer new deliverables; more changes to existing work:
• 'Working' executable code
• Readable, modernized source code
• New shared utility routines
• Improved build scripts with common dependencies
• Updated documentation
Modularize: Goals
Goal: Extract common elements to modules shared among applications
• Add unit tests
• Create cohesive modules with sensible grouping of components
Modularize: Deliverables
More new deliverables and substantial changes to existing work:
• 'Working' executable code
• Source code separated into applications and common dependencies
• Unit tests on dependencies
• Improved build scripts with common dependencies
• Updated documentation
Summary.
• Full project detailed in [Apt18]
• Source archive available at https://bitbucket.org/apthorpe/jeppson_pipeflow
Conclusions and Insights i
• Source code recovery went about as well as anticipated
• OCR'd text was surprisingly usable
• Took several passes to disambiguate visually similar characters
• Did not anticipate errors in published code or test cases; errors exist everywhere!
• Phased approach worked well
• Maintained very clear focus on outcome and deliverables
• Simplified decision-making when new issues arose
• Short iterations limited 'scope creep' to manageable levels
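Part of the effort to disambiguate visually similar characters can be automated by flagging lines for manual review. The Python sketch below is a heuristic under stated assumptions, not the procedure used in the project: it assumes fixed-form source (labels in columns 1-5, statements in columns 7-72) written entirely in uppercase, so a letter in the label field or a lowercase letter in a statement suggests an OCR confusion such as O for 0 or l for 1. The function name is hypothetical.

```python
def suspicious_lines(source):
    """Yield (line_number, reason) for fixed-form lines worth a manual look."""
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip() or line[0] in "Cc*":
            continue                      # skip blank lines and comment lines
        label = line[:5].strip()
        if label and not label.isdigit():
            yield number, "non-digit in label field (columns 1-5)"
        statement = line[6:72]
        if any(ch.islower() for ch in statement):
            yield number, "lowercase letter in statement (columns 7-72)"


sample = (
    "C     SAMPLE\n"
    "   1O CONTINUE\n"      # letter O where the digit 0 was intended
    "      X = l.0\n"       # lowercase l where the digit 1 was intended
    "      Y = 2.0\n"
)
for number, reason in suspicious_lines(sample):
    print(number, reason)
```

A check like this only narrows the search; confusions that still form valid code (for example I1 versus 11) need the compiler, the test cases, or a careful read to catch.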
Conclusions and Insights ii
• Reconstruction of UNIVAC MATH-PACK routine was about as complex as expected due to oddness of LAPACK interfaces
• Build and test infrastructure reconstitution was valuable
• Transition from sh to make to CMake was worth the annoyance
• ftncheck is very usable as a simple cross-platform unit test framework; has quirks
• Still looking into CTest and CDash integration
• Still not sold on CMake; haven't found anything better, but system is baroque and unintuitive
• Jupyter Notebook invaluable for prototyping, benchmarking, and test data construction
Questions?.
Thank You!
References.
References i
[Apt18] Robert Apthorpe, Recovery of Jeppson pipe network analysis software, Tech. Report wp-20180209-jeppson, Acorvid Technical Services Corporation, March 2018, http://www.acorvid.com/wp-content/uploads/2018/03/wp-20180307-jeppson-1.pdf.
[Jep74] Roland W. Jeppson, Steady flow analysis of pipe networks: An instructional manual, Tech. Report 300, Utah Water Research Laboratory, September 1974, https://digitalcommons.usu.edu/water_rep/300/.
[Jep76] Roland W. Jeppson, Analysis of flow in pipe networks, Ann Arbor Scientific Publishers, Inc., Ann Arbor, MI, 1976.
References ii
[Orm08] Lindell E. Ormsbee, The history of water distribution network analysis: The computer age, Water Distribution Systems Analysis Symposium 2006, 2008, pp. 1-6, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.510.2637&rep=rep1&type=pdf.
For More Information i
• Stack Overflow question "Univac Math pack subroutines in old-school FORTRAN (pre-77)", https://stackoverflow.com/questions/48265245/univac-math-pack-subroutines-in-old-school-fortran-pre-77
• Citation count for [Jep76] via Google Scholar: https://scholar.google.com/scholar?cites=15501786215094582199&as_sdt=5,44&sciodt=0,44&hl=en
• Jeppson pipe flow code archive available at https://bitbucket.org/apthorpe/jeppson_pipeflow
For More Information ii
• Doxygen documentation generation tool, http://www.stack.nl/~dimitri/doxygen/
• findent source code reformatter, https://sourceforge.net/projects/findent/
• FLIBS, open Fortran utility libraries, https://sourceforge.net/projects/flibs/
• CMake build automation system, https://cmake.org/