Perspectives on Scientific Software Recovery and ...

Perspectives on Scientific Software Recovery and Revitalization
Bob Apthorpe, March 30, 2018
Acorvid Technical Services Corporation

Overview

Characteristics of Scientific Software

Properties of scientific software include:

• Numerically sophisticated
• Developed principally or exclusively by subject matter experts (SMEs)
• Favors accuracy ≥ efficiency > elegance
• Long lifespan

Common Themes in Older Scientific Software Projects

Many mature scientific software projects have a common lifecycle:

• Often for internal use only
• Resources allocated for development, not maintenance
• May be maintained by one person or a small group
• Software can outlive its creator
• Maintenance may fall to junior personnel unfamiliar with the code

Search Stack Overflow for the phrase "I recently inherited a FORTRAN application…"

Software Development Lifecycle

Source: https://www.gethow.org/importance-of-software-lifecycle-management

Reasons for Software Recovery

Inactive or retired software may need to be put back in service because of:

• Legal or regulatory requirements
• Confirmation of prior results
• Renewed interest in subject matter

Examples include:

• Recovering fast reactor safety analysis codes (SPRAY, SOFIRE-II) to support Gen IV reactor development

• Extending a support code that generated aerosol particle size distributions for a severe reactor accident code

• Building and testing a chemical equilibrium analysis code to the ASME NQA-1 standard ("nuclear grade")

Software Recovery Issues

Software recovery involves issues similar to those of legacy code:

• Original or cognizant developers may not be available
• Source code may not be machine-readable
• Limited test cases and documentation
• Original development environment may not exist:
  • Different hardware
  • Different operating system
  • Language differences; proprietary extensions
  • Compilation flags
  • Missing dependencies (libraries)

Pragmatic Questions

When encountering new software, basic questions are:

• What does this do?
• How do I know it works?
• How do I make it work?
• How does it work?
• How do I change it without breaking it?

These questions are relevant to a new user, developer, or manager.

Case Study: Analysis of Flow in Pipe Networks

Background

• Question on Stack Overflow about a missing library needed by a pipe network flow solver

• Source code published in [Jep74] and [Jep76] but not in machine-readable format

• Code was in fair to poor condition; missing proprietary library

• Short, tractable problem ideal for illustrating code recovery techniques

• As of March 2018, Google Scholar shows at least 330 citations to [Jep76]

• Code has historical significance [Orm08]

Problem Domain: Flow in Pipe Networks

Basic Theory i

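Bernoulli's equation:

$$z_1 + \frac{p_1}{\rho g} + \frac{V_1^2}{2g} + \frac{E_1}{g} = z_2 + \frac{p_2}{\rho g} + \frac{V_2^2}{2g} + \frac{E_2}{g}$$

Newton's Law of Viscosity:

$$\tau = \mu \frac{dv}{dy}$$

Darcy-Weisbach equation:

$$\Delta p = f_D \frac{L}{D_H} \frac{\rho \langle v \rangle^2}{2}$$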
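As a quick numerical check of the Darcy-Weisbach relation, the Python sketch below evaluates $\Delta p$ and converts it to head loss $h_f = \Delta p / (\rho g)$. Python is used only for illustration (Jupyter Notebook is the prototyping tool mentioned in the conclusions); the function names, the SI units, and the sample pipe values are assumptions, not taken from Jeppson's code.

```python
def darcy_weisbach_dp(f_d, length, diameter, density, velocity):
    """Pressure drop (Pa) from the Darcy-Weisbach equation.

    f_d       Darcy friction factor f_D (dimensionless)
    length    pipe length L (m)
    diameter  hydraulic diameter D_H (m)
    density   fluid density rho (kg/m^3)
    velocity  mean velocity <v> (m/s)
    """
    return f_d * (length / diameter) * density * velocity**2 / 2.0


def head_loss(dp, density, g=9.80665):
    """Express a pressure drop (Pa) as head loss (m of flowing fluid)."""
    return dp / (density * g)


# Hypothetical case: water at 1.5 m/s through 100 m of 0.2 m pipe, f_D = 0.02
dp = darcy_weisbach_dp(0.02, 100.0, 0.2, 998.0, 1.5)
print(f"dp = {dp:.1f} Pa, head loss = {head_loss(dp, 998.0):.3f} m")
```

For these values $\Delta p \approx 11.2$ kPa, or roughly 1.15 m of head.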

Basic Theory ii

Basic Theory iii

Conserve mass and energy

Assumptions:

• constant material properties
• isothermal
• incompressible flow
• continuity; Kirchhoff's laws
• momentum can be ignored

Reasonable approximation of steady-state, single-phase, incompressible flow

Suitable for design and analysis of water distribution systems
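Kirchhoff's laws here mean that flow is conserved at every junction and that head losses around any closed loop sum to zero. As one illustration of how a solver enforces the loop law, below is a minimal Python sketch of the classic Hardy Cross correction for a single loop, assuming head loss $h = kQ|Q|^{n-1}$ with $n = 2$. The recovered programs may use a different solution method, so this is not a reconstruction of them, and `hardy_cross_loop` is a hypothetical name.

```python
def hardy_cross_loop(k, q, n=2.0, tol=1e-10, max_iter=100):
    """Balance flows around one closed loop with the Hardy Cross method.

    k  resistance coefficient per pipe; head loss h = k * q * |q|**(n - 1)
    q  initial flow guesses per pipe, signed by direction around the loop;
       they must already satisfy continuity at the junctions
    Returns the corrected flows.
    """
    q = [float(qi) for qi in q]
    for _ in range(max_iter):
        # Net head loss around the loop (zero when the loop law holds)
        num = sum(ki * qi * abs(qi) ** (n - 1) for ki, qi in zip(k, q))
        # Derivative of the net head loss with respect to a loop correction
        den = sum(n * ki * abs(qi) ** (n - 1) for ki, qi in zip(k, q))
        dq = -num / den
        # Adding the same correction to every pipe preserves continuity
        q = [qi + dq for qi in q]
        if abs(dq) < tol:
            break
    return q


# Two parallel pipes (k = 1 and k = 4) sharing a total flow of 1.0
flows = hardy_cross_loop([1.0, 4.0], [0.5, -0.5])
print(flows)
```

Equal head loss in both branches requires $Q_1^2 = 4Q_2^2$, so the flow should split 2/3 and 1/3.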

Empirical Correlations for Friction Factor
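Two widely used correlations for the turbulent Darcy friction factor $f_D$ are the implicit Colebrook-White equation, $1/\sqrt{f} = -2\log_{10}\left(\frac{\varepsilon/D}{3.7} + \frac{2.51}{Re\sqrt{f}}\right)$, and the explicit Swamee-Jain approximation. The Python sketch below solves Colebrook-White by fixed-point iteration, starting from the Swamee-Jain estimate. It is an illustration only, not necessarily the correlation used in Jeppson's code, and the function names are hypothetical.

```python
import math


def swamee_jain(re, rel_rough):
    """Explicit Swamee-Jain approximation of the Darcy friction factor."""
    return 0.25 / math.log10(rel_rough / 3.7 + 5.74 / re**0.9) ** 2


def colebrook(re, rel_rough, tol=1e-12, max_iter=50):
    """Solve the implicit Colebrook-White equation by fixed-point iteration.

    re         Reynolds number (turbulent flow, roughly re > 4000)
    rel_rough  relative roughness epsilon / D
    """
    f = swamee_jain(re, rel_rough)  # explicit estimate as the starting point
    for _ in range(max_iter):
        # Evaluate the right-hand side for x = 1/sqrt(f), then recover f
        x = -2.0 * math.log10(rel_rough / 3.7 + 2.51 / (re * math.sqrt(f)))
        f_new = 1.0 / x**2
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


print(f"{colebrook(1e5, 1e-4):.5f} {swamee_jain(1e5, 1e-4):.5f}")
```

The iteration contracts quickly because the logarithm makes the right-hand side only weakly sensitive to $f$.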

Known Challenges

• Code was originally written for the UNIVAC 1108
• Uses the matrix solver GJR from the proprietary Sperry MATH-PACK library
• May use proprietary extensions to FORTRAN IV
• OCR'd source code will be full of scanning artifacts
• Illegible source code in [Jep74]; [Jep76] is legible, however…
• No build instructions
• Few test cases
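Replacing GJR starts with knowing what it computes. Assuming it performs Gauss-Jordan reduction to solve a linear system $Ax = b$, the Python sketch below shows that algorithm with partial pivoting. It does not reproduce GJR's actual argument list, any matrix-inversion mode, or multiple right-hand sides; `gauss_jordan_solve` and the singularity threshold are hypothetical, and a production replacement would more likely wrap a LAPACK solver.

```python
def gauss_jordan_solve(a, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting.

    a  square matrix as a list of row lists (not modified)
    b  right-hand side vector
    Raises ValueError if A is singular or nearly singular.
    """
    n = len(a)
    # Build the augmented matrix [A | b] as floats
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: take the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row, above and below
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    # The last column now holds the solution
    return [row[n] for row in m]


A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
x = gauss_jordan_solve(A, b)
print(x)
```

Unlike Gaussian elimination with back substitution, Gauss-Jordan clears entries above the pivot as well, so the solution can be read directly from the reduced matrix.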

Source Code Legibility

Proprietary Libraries and Extensions

Source Code Errors

Ambiguous Code

Test Errors

Development Plan

Recovery Phases

Refactoring effort split into three distinct phases:

• Resuscitate: Make code run correctly
• Update: Replace problematic coding constructs with modern equivalents
• Modularize: Extract common elements to modules shared among applications

Resuscitate: Goals

Primary Goal: Make code run correctly

• Convert OCR'd text to well-formed source code
• Set liberal compile and link options
• Set up test cases with sample input and expected output
• Create build scripts
• Put project under revision control
• Set up an auto-documentation system (Doxygen)
• Note sections of code which may be easy or difficult to modernize in the next phase
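Much of turning OCR'd text into well-formed source is hunting for misread characters. The Python sketch below is one hypothetical helper for that pass. It applies a few assumed heuristics for fixed-form FORTRAN IV (labels in columns 1-5, statements in columns 7-72, upper-case listings, digits 0 and 1 confused with the letters O and l) and reports suspicious lines for manual review; the checks are illustrative, not exhaustive, and will produce false positives.

```python
import re

# Heuristic: a numeric-looking token that contains the letter O or a
# lower-case l, which OCR often confuses with the digits 0 and 1
SUSPECT_NUMBER = re.compile(r"\b\d[\dOl]*[Ol][\dOl]*\b")


def check_fixed_form_line(line):
    """Return warnings for one line of OCR'd fixed-form FORTRAN IV source.

    Assumed layout: comment marker in column 1, statement label in
    columns 1-5, continuation mark in column 6, statement in 7-72.
    """
    warnings = []
    if line[:1] in ("C", "c", "*"):
        return warnings  # comment line: free text, nothing to check
    if "\t" in line:
        warnings.append("tab character (column positions are ambiguous)")
    label = line[:5]
    if label.strip() and not label.replace(" ", "").isdigit():
        warnings.append(f"non-numeric statement label field: {label!r}")
    if len(line.rstrip()) > 72:
        warnings.append("text beyond column 72 (ignored by the compiler)")
    statement = line[6:72]
    for match in SUSPECT_NUMBER.finditer(statement):
        warnings.append(f"possible O/0 or l/1 confusion: {match.group()!r}")
    if re.search(r"[a-z]", statement):
        warnings.append("lower-case letters (listing is expected to be upper-case)")
    return warnings


# Example run over a few hypothetical OCR'd lines
sample = ["      Q(I) = Q(I) + DQ", "   2O CONTINUE", "      HL = 1O.0*Q"]
for number, text in enumerate(sample, start=1):
    for warning in check_fixed_form_line(text):
        print(f"line {number}: {warning}")
```

Each reported hit still needs a human decision; the script only narrows down where to look.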

Resuscitate: Deliverables

Large number of artifacts created in this phase:

• 'Working' executable code
• Well-formed, managed source code:
  • Original code
  • Replacement for GJR from Sperry MATH-PACK
• Build scripts
• Test cases
• Documentation:
  • Build
  • Test
  • Code

Deliverables should be correct but may not be complete or 'perfect'. The goal is a Minimum Viable Product.
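Because recovered code is rebuilt with a different compiler on different hardware, its output rarely matches published sample output character for character. A minimal Python sketch of a tolerance-based comparison follows; the tolerances, the number pattern, and the function names are assumptions to be tuned per test case, not values from the project.

```python
import math
import re

# FORTRAN-style numeric fields such as 12, -3.5, .25, 0.25E+02 or 1.5D-3
NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[EeDd][-+]?\d+)?")


def extract_numbers(text):
    """Return every numeric field in a block of program output as floats."""
    return [float(tok.replace("D", "E").replace("d", "e"))
            for tok in NUMBER.findall(text)]


def outputs_match(actual, expected, rel_tol=1e-4, abs_tol=1e-6):
    """Compare two output listings field by field within a tolerance.

    Labels and spacing are ignored; only the sequence of numbers matters.
    """
    a, e = extract_numbers(actual), extract_numbers(expected)
    if len(a) != len(e):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
               for x, y in zip(a, e))


print(outputs_match("PIPE 1 FLOW = 0.12345E+01", "PIPE 1 FLOW = 1.23450"))
```

In a test script, `actual` would be the captured output of the rebuilt program and `expected` the sample output transcribed from the published report.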

Update: Goals i

Primary Goal: Replace problematic coding constructs with modern equivalents

• Convert code to indented free-format
• Replace DO/CONTINUE with DO/ENDDO
• Replace IF/GOTO with IF/ELSEIF/ELSE/ENDIF
• Replace bare GOTO with DO WHILE, CYCLE, or EXIT
• Replace common numeric literals with named constants
• Replace old-style relational operators with modern equivalents (e.g. convert .GE. to >=)
• Force all variables to be declared by using IMPLICIT NONE
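Some of these updates are mechanical enough to script. The Python sketch below handles only the relational-operator conversion, mapping .EQ., .NE., .LT., .LE., .GT., and .GE. to the symbolic forms ==, /=, <, <=, >, and >= introduced in Fortran 90. It is a naive, hypothetical helper: it does not skip character constants or Hollerith fields, so each change should be reviewed and the test cases rerun afterwards.

```python
import re

# FORTRAN 77 relational operators and their Fortran 90 symbolic forms
RELATIONAL = {
    "EQ": "==", "NE": "/=", "LT": "<",
    "LE": "<=", "GT": ">", "GE": ">=",
}
OPERATOR = re.compile(r"\.(EQ|NE|LT|LE|GT|GE)\.", re.IGNORECASE)


def modernize_relational(line):
    """Replace .GE.-style relational operators on one line of source.

    Naive sketch: character constants and Hollerith fields are not
    skipped, so every changed line should still be reviewed.
    """
    return OPERATOR.sub(lambda m: RELATIONAL[m.group(1).upper()], line)


print(modernize_relational("      IF (X .GE. 0.0 .AND. N .ne. 3) GOTO 10"))
```

Logical operators such as .AND. and .OR. have no symbolic equivalent in Fortran, so the pattern deliberately leaves them alone.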

Update: Goals ii

Note repetitive sections of code for isolation into shared modules in the next phase

Defer complex refactoring until next phase

Structure becomes apparent as simple refactorings are applied

Update: Deliverables

Fewer new deliverables; more changes to existing work:

• 'Working' executable code
• Readable, modernized source code
• New shared utility routines
• Improved build scripts with common dependencies
• Updated documentation

Modularize: Goals

Goal: Extract common elements to modules shared among applications

• Add unit tests
• Create cohesive modules with sensible grouping of components

Modularize: Deliverables

More new deliverables and substantial changes to existing work:

• 'Working' executable code
• Source code separated into applications and common dependencies
• Unit tests on dependencies
• Improved build scripts with common dependencies
• Updated documentation

Summary

• Full project detailed in [Apt18]
• Source archive available at https://bitbucket.org/apthorpe/jeppson_pipeflow

Conclusions and Insights i

• Source code recovery went about as well as anticipated:
  • OCR'd text was surprisingly usable
  • Took several passes to disambiguate visually similar characters

• Did not anticipate errors in published code or test cases; errors exist everywhere!

• Phased approach worked well:
  • Maintained very clear focus on outcome and deliverables
  • Simplified decision-making when new issues arose
  • Short iterations limited 'scope creep' to manageable levels

Conclusions and Insights ii

• Reconstructing the UNIVAC MATH-PACK routine was about as complex as expected, due to the oddness of LAPACK interfaces

• Build and test infrastructure reconstitution was valuable:
  • Transition from sh to make to CMake was worth the annoyance

• ftnunit is very usable as a simple cross-platform unit test framework; it has quirks

• Still looking into CTest and CDash integration
• Still not sold on CMake; haven't found anything better, but the system is baroque and unintuitive

• Jupyter Notebook was invaluable for prototyping, benchmarking, and test data construction

Questions?

Thank You!

References i

[Apt18] Robert Apthorpe, Recovery of Jeppson pipe network analysis software, Tech. Report wp-20180209-jeppson, Acorvid Technical Services Corporation, March 2018, http://www.acorvid.com/wp-content/uploads/2018/03/wp-20180307-jeppson-1.pdf.

[Jep74] Roland W. Jeppson, Steady flow analysis of pipe networks: An instructional manual, Tech. Report 300, Utah Water Research Laboratory, September 1974, https://digitalcommons.usu.edu/water_rep/300/.

[Jep76] Roland W. Jeppson, Analysis of flow in pipe networks, Ann Arbor Scientific Publishers, Inc., Ann Arbor, MI, 1976.

References ii

[Orm08] Lindell E. Ormsbee, The history of water distribution network analysis: The computer age, Water Distribution Systems Analysis Symposium 2006, 2008, pp. 1-6, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.510.2637&rep=rep1&type=pdf.

For More Information i

• Stack Overflow question "Univac Math pack subroutines in old-school FORTRAN (pre-77)", https://stackoverflow.com/questions/48265245/univac-math-pack-subroutines-in-old-school-fortran-pre-77

• Citation count for [Jep76] via Google Scholar: https://scholar.google.com/scholar?cites=15501786215094582199&as_sdt=5,44&sciodt=0,44&hl=en

• Jeppson pipe flow code archive available at https://bitbucket.org/apthorpe/jeppson_pipeflow

For More Information ii

• Doxygen documentation generation tool, http://www.stack.nl/~dimitri/doxygen/

• findent source code reformatter, https://sourceforge.net/projects/findent/

• FLIBS, open Fortran utility libraries, https://sourceforge.net/projects/flibs/

• CMake build automation system, https://cmake.org/
