Better Scientific Software
a tutorial presented at
Improving Scientific Software
from 9:00 am to 11:20 am and 1:20 pm to 3:40 pm MDT (UTC-6) on Thursday, 7 April 2022
Presenters: David E. Bernholdt (Oak Ridge National Laboratory), Rinku K. Gupta (Argonne National Laboratory), Patricia A. Grubel (Los Alamos National Laboratory), and David M. Rogers (Oak Ridge National Laboratory)
Helpers: Anshu Dubey (Argonne National Laboratory) and Gregory R. Watson (Oak Ridge National Laboratory)
This page provides detailed information specific to the tutorial event above. Expect updates to this page up to, and perhaps shortly after, the date of the tutorial. Pages for other tutorial events can be accessed from the main page of this site.
Quick Links
- Playlist (YouTube)
- Presentation Slides (FigShare)
- Hands-On Code Repository (GitHub)
On this Page
- Description
- Learning Objectives
- Agenda
- Presentation Slides
- How to Participate
- Hands-On Exercises
- Stay in Touch
- Resources from Presentations
- Requested Citation
- Acknowledgements
Description
Producing scientific software is a challenge. The high-performance modeling and simulation community, in particular, is dealing with the confluence of disruptive changes in computing architectures and new opportunities (and demands) for greatly improved simulation capabilities, especially through coupling physics and scales. At the same time, computational science and engineering (CSE), as well as other areas of science, are experiencing increasing focus on scientific reproducibility and software quality.
Computer architecture changes require new software design and implementation strategies, including significant refactoring of existing code. Reproducibility demands require more rigor across the entire software endeavor. Code coupling requires aggregate team interactions including integration of software processes and practices. These challenges demand large investments in scientific software development and improved practices. Focusing on improved developer productivity and software sustainability is both urgent and essential.
This tutorial distills experience from members of the IDEAS Productivity project and the creators of the BSSw.io community website drawn from many scientific software projects over many years. The tutorial will provide information about software practices, processes, and tools explicitly tailored for CSE. Topics to be covered include: Agile methodologies and tools, software design and refactoring, testing and continuous integration, Git workflows for teams, and reproducibility. Material will be mostly at the beginner and intermediate levels. There will also be opportunities to discuss topics raised by the audience.
Learning Objectives
Participants should be able to…
- Describe a range of methods and strategies to improve software development processes, working towards better developer productivity, software sustainability, and scientific reproducibility.
- Customize an approach for tailoring software development processes to the particulars of your project team and explain software value-related trade-offs.
- Increase motivation, inspiration, and awareness of resources to help you in working towards producing better scientific software, and thus better scientific outcomes, in your own projects.
Agenda
This tutorial comprises two conference sessions. The first session, 9:00am-11:20am MDT, will be a series of presentations. The second session, 1:20pm-3:40pm MDT, will be more open-ended, depending on the interests of the participants. There will be opportunities to:
- Gain some experience with hands-on activities based on the presentations (guided or independent);
- Pursue deeper Q&A based on the presentations; or
- Discuss experiences and challenges with software development faced by participants in their own projects.
We will use Zoom breakout rooms to accommodate multiple interests insofar as we can staff them.
Time (MDT) | Module | Title | Presenter |
---|---|---|---|
9:00 AM | 0 | Introduction and Setup | David E. Bernholdt (ORNL) |
9:05 AM | 1 | Motivation and Overview of Best Practices in HPC Software Development | Rinku K. Gupta (ANL) |
9:15 AM | 2 | Agile Methodologies | Patricia A. Grubel (LANL) |
9:45 AM | 3 | Git Workflows | Patricia A. Grubel (LANL) |
10:00 AM | Break | ||
10:20 AM | 4 | Software Testing Introduction | Rinku K. Gupta (ANL) |
10:40 AM | 5 | Scientific Software Design | David E. Bernholdt (ORNL) |
11:00 AM | 6 | Testing Complex Software | Rinku K. Gupta (ANL) |
11:15 AM | Q&A | ||
11:20 AM | Adjourn morning session | ||
11:20 AM | Other conference activities | ||
1:20 PM | 7 | Refactoring Scientific Software | David M. Rogers (ORNL) |
1:45 PM | 8 | Improving Reproducibility Through Better Software Practices | David M. Rogers (ORNL) |
2:00 PM | 9 | Summary | David M. Rogers (ORNL) |
2:05 PM | Hands-on & Discussion (optional) | ||
2:20 PM | Break | ||
2:40 PM | Hands-on & Discussion (optional) | ||
3:40 PM | Adjourn | ||
Presentation Slides
The latest version of the slides will always be available at https://doi.org/10.6084/m9.figshare.19416767.
Note that these files may include additional slides that will not be discussed during the tutorial, but questions are welcome.
How to Participate
- Please use Zoom chat or unmute to ask questions at any time. We will respond in chat or verbally as opportunities permit.
- We will be available during the break for Q&A or further discussion.
- The afternoon session of the tutorial will allow you to try the hands-on activities, ask questions about the presentations, or discuss experiences and challenges with software development that you face in your own projects.
Hands-On Exercises
Introduction
The hands-on exercises for this tutorial are based around a simple numerical model using the one-dimensional heat equation. The example is described briefly in the repository’s README file, and in greater detail in the ATPESC Hands-On lesson. The ATPESC version focuses on the numerical aspects of the model.
But for this tutorial, we’re focused on how to make the software better from a quality perspective, so you don’t need to understand the math to do these exercises.
For the purposes of these hands-on exercises, you should imagine you’ve inherited an early version of the hello-numerical-world software from a colleague who’s left the project, and you’ve been assigned to get it into better shape so that it can be used in the next ATPESC summer school.
The repository you’ll be working with is: bssw-tutorial/hello-numerical-world-2022-04-07-iss.
Note: most of the screenshots will refer to the generic “hello-numerical-world” repository rather than the one specifically for this event.
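Although the exercises do not require understanding the math, it may help to see what a one-dimensional heat equation model can look like in code. The following is a minimal Python sketch, not the repository's implementation: the explicit forward-time, centered-space (FTCS) scheme, the function names `heat_step` and `solve`, and all parameter values are assumptions chosen only for illustration.

```python
def heat_step(u, alpha, dx, dt):
    """Advance the temperatures u by one explicit (FTCS) time step.

    The first and last entries of u are treated as fixed boundary values.
    """
    r = alpha * dt / dx**2
    if r > 0.5:
        # The explicit scheme is only stable for r <= 0.5.
        raise ValueError("unstable step: alpha*dt/dx**2 must be <= 0.5")
    new_u = list(u)
    for i in range(1, len(u) - 1):
        new_u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new_u


def solve(u0, alpha, dx, dt, n_steps):
    """Apply heat_step n_steps times, starting from the profile u0."""
    u = list(u0)
    for _ in range(n_steps):
        u = heat_step(u, alpha, dx, dt)
    return u


if __name__ == "__main__":
    # Hypothetical setup: a rod of length 1 with its ends held at 0 and 1,
    # initially at 0 everywhere in the interior.
    n = 11
    dx = 1.0 / (n - 1)
    u0 = [0.0] * (n - 1) + [1.0]
    u = solve(u0, alpha=0.2, dx=dx, dt=0.01, n_steps=2000)
    print([round(x, 3) for x in u])
```

With fixed end temperatures, the profile should relax toward a straight line between the two boundary values, which gives a simple property to check when experimenting with the code.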
List of Hands-On Exercises
Note that not every presentation module has exercises (yet).
- Setting up the Prerequisites. Set up the accounts needed for these exercises.
- Agile Methodologies. You’ll use GitHub issues and project boards to set up a simple “personal kanban” board.
- Basic Git for Collaboration. You’ll fork our hello-numerical-world repository, create a feature branch, and make a pull request.
- Software Testing. You’ll use an example project to try out test-driven development to add new functionality to a project.
- Refactoring Scientific Software. You’ll perform a small, well-defined refactoring exercise.
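As a rough illustration of the test-driven development cycle used in the Software Testing exercise, the sketch below shows tests that would be written first, followed by the smallest function that makes them pass. It is a hypothetical Python example, not code from the exercise's example project: the function `rms_error` and the test names are invented for illustration.

```python
def rms_error(computed, expected):
    """Root-mean-square difference between two equal-length sequences."""
    if len(computed) != len(expected):
        raise ValueError("sequences must have the same length")
    if not computed:
        raise ValueError("sequences must not be empty")
    total = sum((c - e) ** 2 for c, e in zip(computed, expected))
    return (total / len(computed)) ** 0.5


# In a test-driven workflow, the tests below are written before rms_error
# exists (and fail); the function above is then added to make them pass.

def test_identical_profiles_have_zero_error():
    assert rms_error([1.0, 2.0], [1.0, 2.0]) == 0.0


def test_known_difference():
    # Each entry differs by 3, so the RMS difference is 3.
    assert abs(rms_error([0.0, 0.0], [3.0, 3.0]) - 3.0) < 1e-12


def test_length_mismatch_is_rejected():
    try:
        rms_error([1.0], [1.0, 2.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")
```

The test functions use plain assertions, so they can be collected by a test runner such as pytest (assumed to be installed separately) or simply called directly.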
There are also activities associated with a couple of modules that we didn’t have time to cover in this tutorial. You are welcome to try them out too.
- Agile Redux. You’ll create epic, story, and task issues for the refactoring task and track them on a kanban board.
- Continuous Integration. You’ll establish a simple continuous integration workflow and then refine it, adding code coverage assessment.
Stay in Touch
- After the tutorial, please feel free to email questions or feedback to the BSSw tutorial team at bssw-tutorial@lists.mcs.anl.gov.
- If you want to do the hands-on exercises, we’re happy to provide feedback on your pull requests and issues, even after the end of the tutorial.
- To find out about future events organized by the IDEAS Productivity Project, you can subscribe to our mailing list (usually ~2 messages/month).
- For monthly updates on the Better Scientific Software site, subscribe to our monthly digest.
Resources from Presentations
Links from the tutorial presentations are listed here for convenience.
- Module 0: Introduction and Setup
- Module 1: Motivation and Overview of Best Practices in HPC Software Development
- Best Practices for Scientific Computing
- Good Enough Practices in Scientific Computing
- Linux Foundation Core Infrastructure Initiative (CII) Best Practices Badging Program
- Rate Your Project Assessment Tool
- Progress Tracking Card (PTC) Examples
- Productivity and Sustainability Improvement Planning
- Better Scientific Software (BSSw)
- Module 2: Agile Methodologies
- Module 3: Git Workflows
- Atlassian/BitBucket (Comparing Workflows)
- How to code review in a Pull Request
- Testing and Code Review Practices in Research Software Development (webinar) (previously linked to ideas-productivity.org/events/hpc-best-practices-webinars/#webinar044)
- Git Flow (Driessen’s Original Blog)
- GitHub Flow (previously linked to scottchacon.com/2011/08/31/github-flow.html)
- GitLab Flow (previously linked to docs.gitlab.com/ee/topics/gitlab_flow.html)
- Trilinos
- Open MPI
- FleCSI
- Module 4: Software Testing Introduction
- In the face of uncertainties, NNSA seeks verification and validation
- Python Build and Test Framework: pyscaffold.org
- Build-Link-Test CMake Framework: llnl-blt.readthedocs.io
- Static Source Analysis (C++): clang-tidy
- Static Source Analysis (Python): flake8 and pylint
- Code Coverage Webservices: codecov and coveralls
- Tutorials for code coverage: Online Tutorial, Another example
- Development Practices Survey Article
- CMake Tutorial
- CMake add_test command documentation
- Verification and Validation in Scientific Computing
- Working Effectively with Legacy Code
- Module 5: Scientific Software Design
- The Exascale Computing Project (ECP)
- Findings from the ECP Performance Portability Panel Series
- SC20 Tutorial: Better Scientific Software
- Performance Portability and the Exascale Computing Project
- Kokkos Lecture Series
- Related paper: A Design Proposal for a Next Generation Scientific Software Framework
- Related webinar: Software Design for Longevity with Performance Portability
- Module 6: Testing Complex Software
- Useful resources on testing (formerly linked to ideas-productivity.org/resources/howtos/)
- Related articles:
- Module 7: Refactoring Scientific Software
- Module 8: Improving Reproducibility Through Better Software Practices
- Motivations and Background:
- Definitions, Guidelines, and Organizations:
- National Science Foundation Data Management Plan Requirements
- Findable, Accessible, Interoperable, Re-usable
- SC21 Reproducibility Initiative
- ACM Transactions on Mathematical Software (TOMS)
- ACM Artifact Review and Badging
- http://fursin.net/reproducibility.html
- National Information Standards Organization (NISO) on Reproducibility and Badging
- Helpful Tools
- Floating Point Analysis Tools
- Code Ocean (Cloud platforms - publish and reproduce research code and data)
- DOIs and hosting of data, code, documents:
- Other Resources:
- The FAIR Guiding Principles for Scientific Data Management and Stewardship. Mark D. Wilkinson, et al. 2016
- https://gofair.us
- FAIR4RS (previously linked to www.rd-alliance.org/groups/fair-research-software-fair4rs-wg)
- Editorial: ACM TOMS Replicated Computational Results Initiative. Michael A. Heroux. 2015
- Enhancing Reproducibility for Computational Methods. Victoria Stodden, Marcia McNutt, David H. Bailey, Ewa Deelman, Yolanda Gil, Brooks Hanson, Michael A. Heroux, John P.A. Ioannidis, Michela Taufer. Science (09 Dec 2016), pp. 1240-1241. DOI: 10.1126/science.aah6168
- Simple experiments in reproducibility and technical trust by Mike Heroux and students (work in progress)
- What Every Computer Scientist Should Know About Floating-Point Arithmetic. David Goldberg.
- Module 9: Summary
- COVID-19 epidemiology saga
- https://doi.org/10.25561/77482
- https://www.nicholaslewis.org/imperial-college-uk-covid-19-numbers-dont-seem-to-add-up/
- https://www.nature.com/articles/d41586-020-01003-6
- https://www.foxnews.com/world/imperial-college-britain-coronavirus-lockdown-buggy-mess-unreliable
- https://www.telegraph.co.uk/technology/2020/05/16/coding-led-lockdown-totally-unreliable-buggy-mess-say-experts/
- https://github.com/mrc-ide/covid-sim/
- https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/amp/
- http://doi.org/10.5281/zenodo.3865491
- Productivity and Sustainability Improvement Planning
- Write to the tutorial authors
- Tutorial Material Online
- IDEAS Productivity project
- Better Scientific Software site
Requested Citation
The requested citation for the overall tutorial is:
David E. Bernholdt, Rinku K. Gupta, Patricia A. Grubel, and David M. Rogers, Better Scientific Software tutorial, in Improving Scientific Software, online, 2022. DOI: 10.6084/m9.figshare.19416767.
Individual modules may be cited as Speaker, Module Title, in Better Scientific Software tutorial…
Acknowledgements
This tutorial is produced by the IDEAS Productivity project.
This work was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.