Better Scientific Software
a tutorial presented at
The International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC20)
on Tuesday 10 November 2020, 2:30 pm - 6:30 pm EST
Presenters: David E. Bernholdt (Oak Ridge National Laboratory), Anshu Dubey (Argonne National Laboratory), Patricia A. Grubel (Los Alamos National Laboratory), and Rinku K. Gupta (Argonne National Laboratory)
This page provides detailed information specific to the tutorial event above. Expect updates to this page up to, and perhaps shortly after, the date of the tutorial. Pages for other tutorial events can be accessed from the main page of this site.
Quick Links
- Presentation Slides (FigShare)
- Hands-On Code Repository (GitHub)
On this Page
- Description
- How To Participate
- Stay In Touch
- Presentation Slides
- Agenda
- Hands-On Exercises
- Supplementary Materials
- Resources from Presentations
- Requested Citation
- Acknowledgements
Description
The computational science and engineering (CSE) community faces an extremely challenging period. Computer architecture changes require new software design and implementation strategies, including significant refactoring of existing code. Reproducibility demands require more rigor across the entire software endeavor. Code coupling requires aggregate team interactions including integration of software processes and practices. These challenges demand large investments in scientific software development and improved practices. Focusing on improved developer productivity and software sustainability is both urgent and essential. This tutorial will provide information on software practices, processes, and tools with the goals of improving the productivity of CSE software developers and increasing the sustainability of software artifacts. Topics include agile methodologies, collaboration via git, scientific software design, testing, refactoring, continuous integration, and reproducibility. Hands-on homework will be available to reinforce the presentations.
How To Participate
- Registration for the tutorial program (“TUT”) is required. You can register at any time up to 16 December 2020 to get access to the recorded version of the tutorial for up to 6 months.
- Most presentations have been pre-recorded, though there are several live sessions interspersed.
- Please use the question tool to ask questions at any time. The speaker and the rest of the tutorial team will be monitoring and will respond. If you’re also interested in a question, feel free to add points to make it more visible to the tutorial team. There is also a chat channel, which can be used for extended follow-up discussions.
- The schedule includes a break in the middle, where we plan to do more Q&A and, if there is interest, some demos from the hands-on exercises. There is also a segment at the end for additional Q&A and demos.
- Participation in the Q&A and demonstrations is optional; we know you need breaks too.
- Your feedback is important both to us and to the SC20 organizers. Please remember to evaluate us: follow the Evaluation link in the tutorial’s page on the EventScribe site. Evaluations can be submitted through 20 December 2020.
- You might also be interested in attending these other “software-related” events at SC20.
Stay In Touch
- After the tutorial, or if you’re not able to participate in the live session, please feel free to email the BSSw tutorial team at bssw-tutorial@lists.mcs.anl.gov.
- If you want to do the hands-on exercises on your own, we’re happy to provide feedback on your pull requests.
- To find out about future events organized by the IDEAS Productivity Project, you can subscribe to our mailing list (usually ~2 messages/month).
- For monthly updates on the Better Scientific Software site, subscribe to our monthly digest.
Presentation Slides
- The latest version of the slides will always be available at https://doi.org/10.6084/m9.figshare.12994376.
- Note that these files may include additional slides that will not be discussed during the tutorial, but questions are welcome.
- Errata (References are to versions of the FigShare DOI. The unversioned link above always retrieves the latest. Specific older versions are available if you dig into the interface.)
- v3: Updates to intro slides (module 00) with adjustments for the event platform and a list of additional software-related events at SC20.
- v2: Corrected “License, Citation, and Acknowledgements” slides in modules 02 and 05.
- v1: Same as distributed through SC20.
Agenda
The live presentation takes place 2:30-6:30pm ET, Tuesday 10 November 2020 (19:30-23:30 UTC).
Time (EST) | Module | Topic | Speaker | Time (UTC) |
---|---|---|---|---|
2:30pm-2:35pm | 00 | Introduction | David E. Bernholdt, ORNL | 19:30-19:35 |
2:35pm-2:45pm | 01 | Motivation and Overview of Best Practices in HPC Software Development | David E. Bernholdt, ORNL | 19:35-19:45 |
2:45pm-3:15pm | 02 | Agile Methodologies | Rinku Gupta, ANL | 19:45-20:15 |
3:15pm-3:30pm | 03 | Git Workflows | Patricia Grubel, LANL | 20:15-20:30 |
3:30pm-4:00pm | 04 | Software Design | Anshu Dubey, ANL | 20:30-21:00 |
4:00pm-4:15pm | 05 | Software Testing 1 | Rinku Gupta, ANL | 21:00-21:15 |
4:15pm-4:35pm | | Break (live Q&A and demo of Kanban hands-on activities) | David E. Bernholdt and All | 21:15-21:35 |
4:35pm-4:50pm | 06 | Software Testing 2 | Anshu Dubey, ANL | 21:35-21:50 |
4:50pm-5:35pm | 07 | Refactoring | Anshu Dubey, ANL | 21:50-22:35 |
5:35pm-5:50pm | 08 | Continuous Integration | David E. Bernholdt, ORNL | 22:35-22:50 |
5:50pm-6:05pm | 09 | Reproducibility | Patricia Grubel, LANL | 22:50-23:05 |
6:05pm-6:10pm | 10 | Summary | David E. Bernholdt, ORNL | 23:05-23:10 |
6:10pm-6:30pm | | Live Q&A and demo of CI hands-on activities | David E. Bernholdt and All | 23:10-23:30 |
Hands-On Exercises
Introduction
The hands-on exercises for this tutorial are based around a simple numerical model using the one-dimensional heat equation. The example is described briefly in the repository’s README file, and in greater detail in the ATPESC Hands-On lesson. The ATPESC version focuses on the numerical aspects of the model. But for the Better Scientific Software tutorial, we’re focused on how to make the software better from a quality perspective, so you don’t need to understand the math to do these exercises.
For the purposes of the BSSw hands-on exercises, you should imagine you’ve inherited an early version of the hello-numerical-world software from a colleague who’s left the project, and you’ve been assigned to get it into better shape so that it can be used in the next ATPESC summer school.
The repository you’ll be working with is on GitHub: betterscientificsoftware/hello-numerical-world-sc20. Note: most of the screenshots will refer to the generic “hello-numerical-world” repository rather than the one specifically for this tutorial.
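For readers who would like a concrete picture of the kind of model involved, here is a minimal Python sketch of an explicit finite-difference solver for the one-dimensional heat equation. It is not the code in the hello-numerical-world repository: the function names (`ftcs_step`, `solve_heat`), the parameter values, and the choice of fixed-temperature boundary conditions are all assumptions made for illustration.

```python
def ftcs_step(u, r):
    """Advance the temperature profile by one explicit time step.

    Uses the forward-time, centered-space (FTCS) update, where
    r = alpha * dt / dx**2. The two end values are held fixed.
    """
    new = u[:]  # copy, so the boundary values carry over unchanged
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def solve_heat(n_points=11, alpha=0.2, length=1.0, dt=0.004,
               n_steps=5000, left=0.0, right=1.0):
    """Integrate the 1D heat equation on a rod (illustrative values only).

    The rod starts at temperature 0 with its ends held at `left` and
    `right`. Returns the temperature at each grid point after n_steps.
    """
    dx = length / (n_points - 1)
    r = alpha * dt / dx ** 2
    if r > 0.5:
        # The explicit scheme is only stable for r <= 1/2.
        raise ValueError("time step too large for a stable FTCS update")
    u = [0.0] * n_points
    u[0], u[-1] = left, right
    for _ in range(n_steps):
        u = ftcs_step(u, r)
    return u


if __name__ == "__main__":
    # After enough steps the profile approaches a straight line
    # between the two boundary temperatures.
    print(solve_heat())
```

Even in a sketch this small, the themes of the exercises show up: the stability check is a design decision, the steady-state behavior is something a test can verify, and the hard-coded defaults are candidates for refactoring.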
List of Hands-On Exercises
Note that the exercise numbers align with the presentation modules. Not every module has exercises (yet).
- Exercise 0: Setting up the Prerequisites. Set up the accounts needed for these exercises.
- Exercise 2: Agile Methodologies. You’ll use GitHub issues and project boards to set up a simple “personal kanban” board.
- Exercise 3: Git Workflows. You’ll fork our hello-numerical-world repository, create a feature branch, and make a pull request.
- Exercise 7a: Agile Redux. You’ll create epic, story, and task issues for the refactoring task and track them on a kanban board.
- Exercise 7b: Refactoring Part 1. You’ll perform a small, well-defined refactoring exercise.
- Exercise 7c: Refactoring Part 2. You’ll perform a more open-ended refactoring exercise.
- Exercise 8: Continuous Integration. You’ll establish a simple continuous integration workflow and then refine it, adding code coverage assessment.
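As a rough illustration of the kind of automated check that the testing and continuous integration exercises build toward, the Python sketch below compares a small heat-equation solver against a known analytic solution. Everything here is hypothetical: the helper names, grid sizes, and tolerance are assumptions, and the repository's actual tests and CI configuration are not reproduced on this page.

```python
import math


def heat_ftcs(u0, alpha, dx, dt, n_steps):
    """Explicit (FTCS) time stepping with both end values held fixed."""
    r = alpha * dt / dx ** 2
    u = list(u0)
    for _ in range(n_steps):
        interior = [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                    for i in range(1, len(u) - 1)]
        u = [u[0]] + interior + [u[-1]]
    return u


def max_error_vs_exact(n_points=21, alpha=0.2, dt=0.001, n_steps=500):
    """Largest pointwise error against the analytic solution.

    On a unit-length rod with both ends held at zero and the initial
    profile sin(pi * x), the exact solution is
    exp(-alpha * pi**2 * t) * sin(pi * x).
    """
    dx = 1.0 / (n_points - 1)
    x = [i * dx for i in range(n_points)]
    u0 = [math.sin(math.pi * xi) for xi in x]
    u0[0] = u0[-1] = 0.0  # sin(pi) is not exactly zero in floating point
    u = heat_ftcs(u0, alpha, dx, dt, n_steps)
    decay = math.exp(-alpha * math.pi ** 2 * dt * n_steps)
    return max(abs(ui - decay * math.sin(math.pi * xi))
               for ui, xi in zip(u, x))


def test_matches_analytic_solution():
    # The tolerance is an assumed value chosen for this grid and time step.
    assert max_error_vs_exact() < 1e-3


if __name__ == "__main__":
    test_matches_analytic_solution()
    print("verification test passed")
```

A test runner such as pytest collects functions whose names start with `test_`, so a CI workflow could run a check like this on every pull request and report a failure if a refactoring changes the numerical results.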
Supplementary Materials
We gave a full-day version of this tutorial for the Argonne Training Program on Extreme-Scale Computing (ATPESC) in August 2020. The presentation slides are available at https://doi.org/10.6084/m9.figshare.12719834. If you want to delve a little deeper into some of the topics we’re covering, consider checking out these resources. We particularly want to draw attention to Module 09 Reproducibility, which we were able to spend 45 minutes on at ATPESC, rather than the 15 minutes available today. Recordings of all of the ATPESC sessions will eventually be posted. If you’d like us to notify you when the videos are published, please email us at bssw-tutorial@lists.mcs.anl.gov.
Resources from Presentations
These are the links from the tutorial presentations, collected here for easier access.
- Module 0: Introduction
- Module 1: Motivation and Overview of Best Practices in HPC Software Development
- none
- Module 2: Agile Methodologies
- Module 3: Git Workflows
- Atlassian/BitBucket (Comparing Workflows)
- Git Flow (Driessen’s Original Blog)
- GitHub Flow (previously linked to scottchacon.com/2011/08/31/github-flow.html)
- GitLab Flow (previously linked to docs.gitlab.com/ee/topics/gitlab_flow.html)
- Module 4: Software Design
- Module 5: Software Testing 1
- Tutorials for code coverage: Online Tutorial, Another example
- Useful resources on testing (formerly linked to ideas-productivity.org/resources/howtos/)
- Module 6: Software Testing 2
- Module 7: Refactoring
- Module 8: Continuous Integration
- Module 9: Reproducibility
- Floating Point Analysis Tools
- Code Ocean (Cloud platforms - publish and reproduce research code and data)
- DOIs and hosting of data, code, documents:
- National Science Foundation Data Management Plan Requirements
- SC20 Transparency and Reproducibility Initiative
- ACM Transactions on Mathematical Software (TOMS)
- ACM Artifact Review and Badging
- National Information Standards Organization (NISO) on Reproducibility and Badging
- The FAIR Guiding Principles for Scientific Data Management and Stewardship. Mark D. Wilkinson, et al. 2016
- Editorial: ACM TOMS Replicated Computational Results Initiative. Michael A. Heroux. 2015
- Simple experiments in reproducibility and technical trust by Mike Heroux and students (work in progress)
- Module 10: Summary
- COVID-19 epidemiology saga
- https://doi.org/10.25561/77482
- https://www.nicholaslewis.org/imperial-college-uk-covid-19-numbers-dont-seem-to-add-up/
- https://www.nature.com/articles/d41586-020-01003-6
- https://www.foxnews.com/world/imperial-college-britain-coronavirus-lockdown-buggy-mess-unreliable
- https://www.telegraph.co.uk/technology/2020/05/16/coding-led-lockdown-totally-unreliable-buggy-mess-say-experts/
- https://github.com/mrc-ide/covid-sim/
- https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/amp/
- http://doi.org/10.5281/zenodo.3865491
- Productivity and Sustainability Improvement Planning
- Better Scientific Software web site
Requested Citation
The requested citation for the overall tutorial is:
David E. Bernholdt, Anshu Dubey, Patricia A. Grubel, and Rinku K. Gupta, Better Scientific Software tutorial, in The International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC20), online, 2020. DOI: 10.6084/m9.figshare.12994376.
Individual modules may be cited as Speaker, Module Title, in Better Scientific Software tutorial…
Acknowledgements
This tutorial is produced by the IDEAS Productivity project.
This work was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.