Software Practices for Reproducible Science
a tutorial presented at the
2024 ACM Conference on Reproducibility and Replicability (ACM-REP)
1:30 pm – 5:00 pm CEST, Tuesday 18 June 2024
Presenters: Anshu Dubey (Argonne National Laboratory) and Gregory R. Watson (Oak Ridge National Laboratory)
This page provides detailed information specific to the tutorial event above. Expect updates to this page up to, and perhaps shortly after, the date of the tutorial. Pages for other tutorial events can be accessed from the main page of this site.
Quick Links
- Presentation Slides (FigShare)
On this Page
- Description
- Agenda
- Presentation Slides
- How to Participate
- Stay in Touch
- Resources from Presentations
- Requested Citation
- Acknowledgements
Description
The computational science and engineering (CSE) community is in the midst of an extremely challenging period created by the confluence of disruptive changes in computing architectures, demand for greater scientific reproducibility, and new opportunities for greatly improved simulation capabilities, especially through coupling physics and scales. Computer architecture changes require new software design and implementation strategies, including significant refactoring of existing code. Reproducibility demands require more rigor across the entire software endeavor and for running computational experiments. These challenges demand large investments in scientific software development and improved practices. Focusing on improved developer productivity and software sustainability is both urgent and essential.
This tutorial will provide information and illustrative use cases of software practices, processes, and tools explicitly tailored to enhance reproducibility in computational science. We discuss practices that are relevant for projects of all sizes, with emphasis on complex workflows and reproducible science. Topics include software design, software testing, collaborative development, and methodologies for running reproducible computational experiments.
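As a minimal illustration of the kind of experiment-management practice the tutorial covers, a run can record its seed, environment, and input checksums so it can be repeated exactly. This is a sketch only; the function name and record fields are illustrative, not taken from the tutorial materials.

```python
import hashlib
import json
import platform
import random
import sys
from datetime import datetime, timezone

def record_provenance(seed, input_files, out_path="provenance.json"):
    """Write a small provenance record so a computational run can be repeated."""
    random.seed(seed)  # fix the RNG state for this experiment
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "seed": seed,
        # checksum each input so a later rerun can verify it uses the same data
        "inputs": {
            path: hashlib.sha256(open(path, "rb").read()).hexdigest()
            for path in input_files
        },
    }
    with open(out_path, "w") as f:
        json.dump(record, f, indent=2)
    return record
```

Saving such a record next to each set of outputs is a lightweight step toward the reproducible computational experiments discussed in the "Managing Computational Experiments" module.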
Agenda
Time (CEST) | Title | Presenter
---|---|---
1:30 PM | Introduction | Gregory R. Watson (ORNL) |
1:35 PM | Improving Reproducibility Through Better Software Practices | Gregory R. Watson (ORNL) |
2:15 PM | Software Testing and Verification | Gregory R. Watson (ORNL) |
3:00 PM | Afternoon break | |
3:30 PM | Managing Computational Experiments | Anshu Dubey (ANL) |
4:30 PM | Reproducibility of Workflows | Gregory R. Watson (ORNL) |
5:00 PM | Adjourn | |
Presentation Slides
The latest version of the slides will always be available at https://doi.org/10.6084/m9.figshare.26019469.
Note that these files may include additional slides that will not be discussed during the tutorial, but questions are welcome.
Supplementary Materials
Due to technical difficulties, we were unable to present the “Managing Computational Experiments” module. The following two videos, from our 2022 ATPESC tutorial, taken together, are very similar to what we intended to present in this tutorial.
- Lab Notebooks for Computational Mathematics, Sciences, & Engineering (Jared O’Neal, ANL)
- Managing Computational Experiments (Jared O’Neal and Anshu Dubey, ANL)
How to Participate
- We want to interact with you! These tutorials are most interesting and informative (for everyone) when you ask questions and share your experiences; we learn too!
- Please use the raise-hand feature to ask a question, or put it in the chat if you prefer. We will respond verbally or in chat as opportunities permit.
Stay in Touch
- After the tutorial, please feel free to email questions or feedback to the BSSw tutorial team at bssw-tutorial@lists.mcs.anl.gov.
- To find out about future events organized by the IDEAS Productivity Project, subscribe to our mailing list (usually ~2 messages/month).
- For monthly updates on the Better Scientific Software site, subscribe to our monthly digest.
Resources from Presentations
Links from the tutorial presentations are listed here for convenience.
- Module 1: Introduction
- Module 2: Improving Reproducibility Through Better Software Practices
- Toward a Compatible Reproducibility Taxonomy for Computational and Computing Sciences
- Reproducibility and Replicability in Science
- Many Psychology Findings Not As Strong As Claimed
- The War Over Supercooled Water
- Researchers find bug in Python Script may have affected hundreds of studies
- National Science Foundation Data Management Plan Requirements
- Findable, Accessible, Interoperable, Re-usable
- FAIR Data Principles US
- SC Conference Reproducibility Initiative
- HOWTO for AEC Submitters
- Artifact Evaluation: Tips for Authors
- SIGOPS articles on award winning artifacts
- Github CSArtifacts Resources
- ACM Reproducible Computational Results
- ACM Artifact Review and Badging
- http://fursin.net/reproducibility.html
- National Information Standards Organization (NISO) on Reproducibility and Badging
- Helpful Tools
- Floating Point Analysis Tools
- Code Ocean (Cloud platforms - publish and reproduce research code and data)
- DOIs and hosting of data, code, documents:
- Other Resources:
- The FAIR Guiding Principles for Scientific Data Management and Stewardship. Mark D. Wilkinson, et al. 2016
- FAIR4RS (previously linked to www.rd-alliance.org/groups/fair-research-software-fair4rs-wg)
- Editorial: ACM TOMS Replicated Computational Results Initiative. Michael A. Heroux. 2015
- Enhancing Reproducibility for Computational Methods
- Simple experiments in reproducibility and technical trust by Mike Heroux and students (work in progress)
- What every scientist should know about floating-point arithmetic. David Goldberg.
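Several of the resources above (the Goldberg article and the floating-point analysis tools) address a common source of non-reproducibility: floating-point addition is not associative, so changing summation order, for example under a different parallel decomposition, can change results. A minimal illustration:

```python
# Floating-point addition is not associative: the same three values
# summed in a different order give different results.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # the large terms cancel first, so c survives
right = a + (b + c)  # c is absorbed by b (below its rounding precision) first

print(left, right)   # prints: 1.0 0.0
```

This is why bitwise-identical results across different processor counts or compilers generally require deliberate effort (fixed reduction orders, or tools like those linked under "Floating Point Analysis Tools").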
- Module 3: Software Testing and Verification
- Module 4: Managing Computational Experiments
- Writing the Laboratory Notebook
- DIKW pyramid
- HPC and the Lab Manager
- How to pick an electronic notebook
- Ivo Jimenez
- Popper
- FlashKit
- Dubey A, Calder AC, Daley C, et al. Pragmatic optimizations for better scientific utilization of large supercomputers. The International Journal of High Performance Computing Applications. 2013;27(3):360-373. doi:10.1177/1094342012464404
- Wilfred F. van Gunsteren, and Alan E. Mark. Validation of molecular dynamics simulation. J. Chem. Phys. 108(15), 6109-6116 (1998). doi:10.1063/1.476021
- Module 5: Reproducibility of Workflows
- Improving Max-Min scheduling Algorithm for Reducing the Makespan of Workflow Execution in the Cloud. DOI:10.5120/ijca2017915684.
- Example workflow systems: https://s.apache.org/existing-workflow-systems
- Workflow repositories and registries
- Data/Code Combinator
- Common scientific workflow patterns
- High-performance simulation and modeling: https://doi.org/10.1007/978-3-031-23606-8_9
- High-performance AI: https://www.analyticsvidhya.com/blog/2022/02/a-comprehensive-guide-on-hyperparameter-tuning-and-its-techniques/
- Scientific data lifecycle: https://www.nersc.gov/assets/NERSC-10/Workflows-Archetypes-White-Paper-v1.0.pdf
- Real-time XFEL Data Analysis at SLAC and NERSC: https://doi.org/10.48550/arXiv.2106.11469
- Hybrid: https://doi.org/10.3389/fonc.2019.00984
- https://opencontainers.org/
- Containers and the Truth between HPC & Cloud System Software Convergence. 2021. DOI:10.2172/1859696
- Other resources
- Workflow systems
- AiiDA: https://www.aiida.net
- BEE: https://github.com/lanl/BEE
- COMPSs: https://compss.bsc.es
- Covalent: https://www.covalent.xyz
- Cromwell: http://cromwell.readthedocs.io
- FireWorks: https://materialsproject.github.io/fireworks
- Galaxy: https://galaxyproject.org
- Maestro: https://maestrowf.readthedocs.io
- Nextflow: https://www.nextflow.io
- Pegasus: https://pegasus.isi.edu
- Snakemake: https://snakemake.github.io
- Swift: http://swift-lang.org/Swift-T
- TaskVine: https://ccl.cse.nd.edu/software/taskvine
- Containers
- Apptainer (formerly Singularity): https://apptainer.org
- Charliecloud: https://hpc.github.io/charliecloud
- Docker: https://docker.com
- Podman: https://podman.io
- Sarus: https://sarus.readthedocs.io/en/stable
- Shifter: https://shifter.readthedocs.io/en/latest
- Workflow repositories
- @nf-core (Nextflow) https://nf-co.re
- Snakemake workflow catalog https://snakemake.github.io/snakemake-workflow-catalog
- Metadata frameworks
- Common workflow language https://www.commonwl.org
- Workflow RO-Crate https://w3id.org/workflowhub/workflow-ro-crate/1.0
- Bioschemas Profiles https://bioschemas.org/profiles
- Workflow repositories
- WorkflowHub https://workflowhub.eu
- Dockstore https://dockstore.org
- Data repositories
- Zenodo https://zenodo.org
- Dataverse https://dataverse.org
- Dynamic Provisioning and Execution of HPC Workflows Using Python, doi:10.1109/PyHPC.2016.005.
- Characterization of scientific workflows, doi:10.1109/WORKS.2008.4723958.
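The workflow systems listed under Module 5 all automate the same core idea: steps declare their dependencies, and the system runs them in a valid order (and adds caching, containers, and cluster execution on top). A minimal sketch of that idea, with hypothetical step names, using Python's standard-library topological sorter:

```python
from graphlib import TopologicalSorter

# A hypothetical three-step analysis expressed as a dependency graph.
results = {}

def fetch():
    results["fetch"] = [3, 1, 2]          # stand-in for acquiring raw data

def clean():
    results["clean"] = sorted(results["fetch"])

def analyze():
    results["analyze"] = sum(results["clean"])

steps = {"fetch": set(), "clean": {"fetch"}, "analyze": {"clean"}}
tasks = {"fetch": fetch, "clean": clean, "analyze": analyze}

# Execute steps in an order that respects the declared dependencies --
# the scheduling that systems such as Snakemake or Nextflow perform,
# while also recording provenance that makes the run reproducible.
for name in TopologicalSorter(steps).static_order():
    tasks[name]()

print(results["analyze"])  # prints: 6
```

Real workflow systems add exactly what this sketch lacks: re-running only out-of-date steps, isolating each step in a container, and capturing the metadata needed to reproduce the run elsewhere.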
Requested Citation
The requested citation for the overall tutorial is:
Anshu Dubey and Gregory R. Watson, Software Practices for Reproducible Science tutorial, in 2024 ACM Conference on Reproducibility and Replicability (ACM-REP), Rennes, France and online, 2024. DOI: 10.6084/m9.figshare.26019469.
Individual modules may be cited as: Speaker, Module Title, in Software Practices for Reproducible Science tutorial…
Acknowledgements
This tutorial is produced by the IDEAS Productivity project.
This work was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Next-Generation Scientific Software Technologies (NGSST) program.