Better Software for Science with High-Performance Computing
a tutorial presented at
SupercomputingAsia 2026 (SCA 2026)/The International Conference on High Performance Computing in Asia-Pacific Region 2026 (HPCAsia 2026)
on Monday 26 January 2026, 9:30 am - 4:30 pm JST
Presenters: Anshu Dubey (Argonne National Laboratory) and Akash Dhruv (Argonne National Laboratory)
This page provides detailed information specific to the tutorial event above. Expect updates to this page up to, and perhaps shortly after, the date of the tutorial. Pages for other tutorial events can be accessed from the main page of this site.
Quick Links
On this Page
- Description
- Presenters
- Agenda
- Presentation Slides
- How to Participate
- Hands-On Activities
- Stay in Touch
- Resources from Presentations
- Requested Citation
- Acknowledgements
Description
Producing scientific software is a challenge. The high-performance modeling and simulation community, in particular, faces the confluence of disruptive changes in computing architectures and new opportunities (and demands) for greatly improved simulation capabilities, especially through coupling physics and scales. Simultaneously, computational science and engineering (CSE), as well as other areas of science, are experiencing an increasing focus on scientific reproducibility and software quality. Large language models (LLMs) can significantly increase developer productivity through judicious offloading of tasks. However, models can hallucinate, so a sound methodology is needed to get the most benefit from this approach.
In this tutorial, attendees will learn about practices, processes, and tools to improve the productivity of those who develop CSE software, increase the sustainability of software artifacts, and enhance trustworthiness in their use. We will focus on aspects of scientific software development that are not adequately addressed by resources developed for industrial software engineering. We will also present state-of-the-art approaches for using LLMs to enhance developer productivity in the context of scientific software development and maintenance. Topics include the design, test-driven development, refactoring, code translation, and testing of complex scientific software systems, as well as conducting computational experiments with reproducibility built in.
LLM assistance for coding-related tasks deserves particular attention in any discussion of software productivity, given its potential to change the way development is done. Obtaining this assistance is especially challenging for research software because of limited training data. We have developed methodologies and tools for software development and translation that use LLMs, and hands-on activities with these tools and methodologies will be part of this tutorial.
Presenters
Anshu Dubey
Anshu Dubey is a Senior Computational Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory and a Senior Scientist (CASE) at the University of Chicago. She was previously at Lawrence Berkeley National Laboratory, and before that she was the associate director of the Flash Center for Computational Science at the University of Chicago, where she also led the CS/Applications group, which develops, maintains, and distributes the FLASH code. She received her Ph.D. in computer science from Old Dominion University and a B.Tech. in electrical engineering from the Indian Institute of Technology, New Delhi.
Akash Dhruv
Akash Dhruv is an Assistant Computational Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory, where he performs research in software design and algorithm development for scientific computing applications. His projects range from fundamental physics-driven engineering research to designing computational workflows that integrate numerical simulations with machine learning and artificial intelligence. He received his Ph.D. in mechanical and aerospace engineering from George Washington University, Washington, D.C., and a B.Tech. in mechanical engineering from the National Institute of Technology, Surat.
Agenda
| Time (JST) | Title | Presenter |
|---|---|---|
| 9:30 AM | Introduction | Anshu Dubey (ANL) |
| 9:45 AM | Motivation and Overview of Best Practices in HPC Software Development | Anshu Dubey (ANL) |
| 10:15 AM | Code Translation from Fortran to C++ using CodeScribe | Akash Dhruv (ANL) |
| 10:45 AM | Morning break | |
| 11:15 AM | Scientific Software Design | Anshu Dubey (ANL) |
| 11:50 AM | Software Testing and Verification | Anshu Dubey (ANL) |
| 12:30 PM | Lunch break | |
| 1:30 PM | Refactoring | Anshu Dubey (ANL) |
| 2:00 PM | Improving Reproducibility Through Better Software Practices | Akash Dhruv (ANL) |
| 2:45 PM | Afternoon break | |
| 3:15 PM | Hands-on coding tasks with LLMs | Akash Dhruv (ANL) and Anshu Dubey (ANL) |
| 4:30 PM | Adjourn | |
Presentation Slides
The presentations will be published shortly before the event.
How to Participate
- We want to interact with you! We find these tutorials most interesting and informative (for everyone) if you ask questions and share experiences! We learn too!
- Please raise your hand at any time to ask a question.
Hands-On Activities
Introduction
The hands-on activity for this tutorial involves using a large language model (LLM) for code translation and generation based on specifications (prompts) that you will develop. Participation in these activities is encouraged but not required. After interested participants have had time to try the exercises on their own, the instructors will review their prompts and the resulting code with the class, and these materials will be made available to all participants.
You can participate in the hands-on section in two modes: using the LLM’s web interface, or using CodeScribe, a tool that enables chat-completion through the LLM’s API interface. The code generation objective of the hands-on can be met using the web interface; however, for code translation you will need to do some advance preparation to use CodeScribe. The advantage of using CodeScribe is gaining exposure to the chat-completion technique and becoming familiar with a tool that can be very useful for writing and maintaining code.
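To make the second mode concrete, below is a minimal sketch of a chat-completion request made through an LLM API from Python, roughly the kind of call that a tool such as CodeScribe automates for you. It is purely illustrative: it assumes the OpenAI Python client and an API key in your environment, and the model name and prompt are placeholders rather than the tutorial's actual prompts or CodeScribe's internals.

```python
# Minimal sketch of chat completion through an LLM API (illustrative only).
# Assumes the OpenAI Python client (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your API access provides
    messages=[
        {"role": "system",
         "content": "You are an assistant that writes small, well-documented scientific code."},
        {"role": "user",
         "content": "Write a C function that computes the dot product of two double arrays."},
    ],
)
print(response.choices[0].message.content)
```

You will not normally write such calls yourself during the tutorial; the sketch is only to show what "chat completion through the LLM's API interface" means in practice.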
The instructors will work in Fortran, C, and C++, but only a surface-level understanding of these languages is required to follow the explanations. For your code generation work, you may prompt the LLM to generate code in the language of your choice. Evaluating the generated code (and revising the prompt accordingly) will be part of the hands-on activity. For the purposes of the tutorial, inspecting the code will be sufficient to gauge its appropriateness. However, if you wish to more rigorously validate the generated code, you will need access to an environment in which you can build and run it, either locally or remotely.
The code translation portion is specific to Fortran, C, and C++. In this case, seed prompts will be provided, and you may choose to use the example in the CodeScribe tutorial repository (https://github.com/akashdhruv/codescribe-tutorial) or create your own Fortran example. If you decide to attempt code translation using your own example, please ensure that the code builds successfully and that you have checks (tests) in place to verify correctness. Keep any customized examples minimal to ensure that you obtain meaningful results during the hands-on exercise, and then apply the workflow to more complex problems later. Do not start with a complex problem.
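If you do bring your own example, the checks mentioned above can be as simple as running the original and the translated executables on the same input and comparing their numeric output. The sketch below shows one hypothetical way to do that in Python; the executable names, the input file, and the whitespace-separated numeric output format are all assumptions you would adapt to your own code.

```python
# Hypothetical correctness check for a translated code: run the original and
# the translated executables on the same input and compare their numeric output.
# Executable names, input file, and output format are assumptions; adapt them.
import subprocess

def run_and_parse(exe, input_file="test_input"):
    """Run an executable on an input file and return its stdout as floats."""
    result = subprocess.run([exe, input_file], capture_output=True, text=True, check=True)
    return [float(tok) for tok in result.stdout.split()]

def check_translation(tol=1e-12):
    original = run_and_parse("./orig_fortran_exe")    # built from the original Fortran source
    translated = run_and_parse("./translated_c_exe")  # built from the generated C/C++
    assert len(original) == len(translated), "output lengths differ"
    for a, b in zip(original, translated):
        assert abs(a - b) <= tol * max(1.0, abs(a)), f"values differ: {a} vs {b}"
    print("Outputs agree within tolerance.")

if __name__ == "__main__":
    check_translation()
```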
Advance preparation
If you wish to participate in the hands-on activities, we strongly encourage you to do some preparation before you leave for Osaka. This is especially important if you plan to use CodeScribe, which may require advance interaction with the tool developers to integrate new LLM APIs.
Preparation for using the LLM web interface
- You will need access to an LLM chat tool. The instructors will be using ChatGPT, but any comparable LLM, including institutionally supported tools, should work.
- (OPTIONAL) If you wish to build and run the code generated by the LLM, you will need access (local or remote) to an appropriate environment. As stated above, you may use any language you prefer for your hands-on activity. The instructors will work in Fortran, C, and C++.
Preparation for using CodeScribe
It is important that you complete this preparation with enough lead time that we can assist you if necessary before HPC Asia starts. We will not be able to provide support for CodeScribe setup issues during the tutorial itself.
- You will need API access to an LLM. This goes beyond the web interface and may incur an additional charge on some platforms. However, many institutionally supported LLMs offer API access at no additional cost. You will be responsible for any additional costs incurred. The instructors will be using ChatGPT, but any comparable LLM should work. CodeScribe also supports several freely downloadable models that can be run locally:
  - https://huggingface.co/google/gemma-2-2b-it
  - https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
  - https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf/tree/main
  The first two models are smaller and may run faster on laptops. Download instructions can be found here: https://huggingface.co/docs/hub/en/models-downloading. Note that if you download Hugging Face models using their CLI, they will be placed in the cache directory (~/.cache/huggingface/hub/<model-name>/snapshots/<sha1>, or your operating system's equivalent); if you use git clone, you may place the models wherever you prefer. A Python sketch of downloading a model this way appears after this list.
- CodeScribe is written in Python, so you will need a working Python installation on a system that you can access (local or remote) during the tutorial.
- Download and install CodeScribe from https://github.com/akashdhruv/CodeScribe
  - Installation instructions are provided in the README file: https://github.com/akashdhruv/CodeScribe?tab=readme-ov-file#installation
  - You are encouraged to watch the tutorials on installing and using CodeScribe in this Box folder: https://anl.app.box.com/folder/336154643880?s=zv3zdbphqprdz8rjh1c84xpeqd8yg32u. These tutorials were prepared specifically for the code translation portion.
  - For instructions on using the code generation and update features, you may refer to the tutorial repository, https://github.com/akashdhruv/codescribe-tutorial, which provides minimal examples.
- You will need to integrate your CodeScribe installation with the API of your chosen LLM. Basic instructions are provided in the README file: https://github.com/akashdhruv/CodeScribe?tab=readme-ov-file#integrating-llm-of-choice. You may need to add support for your specific model in CodeScribe. To do so, examine the file https://github.com/akashdhruv/CodeScribe/blob/development/code_scribe/lib/_llm.py, copy the class that most closely resembles your target model, adapt it to your model's API, and create a pull request in the repository. With sufficient lead time, we will do our best to help make this work. You may also file issues on the CodeScribe repository to request assistance with adding support for a specific LLM.
- (OPTIONAL) If you wish to build and run the code generated by the LLM, you will need access (local or remote) to an appropriate environment. As stated above, you may use any language you prefer for your hands-on activity. The instructors will work in Fortran, C, and C++.
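As a companion to the model-download note in the first item above, here is a small Python sketch of fetching one of the listed models into the local Hugging Face cache and exercising it with a short generation. It is illustrative only: it assumes the huggingface_hub and transformers packages (with a PyTorch backend) are installed, the model choice and prompt are placeholders, and gated models such as the Gemma and Llama ones additionally require accepting their license on Hugging Face and authenticating with a token.

```python
# Sketch: download one of the freely available models into the local Hugging Face
# cache and run a short generation with it. Illustrative only; not CodeScribe code.
# Assumes: pip install huggingface_hub transformers torch, and (for gated models
# such as Gemma or Llama) prior authentication, e.g. via huggingface_hub.login().
from huggingface_hub import snapshot_download
from transformers import pipeline

model_id = "google/gemma-2-2b-it"  # placeholder; any of the listed models works similarly
local_path = snapshot_download(repo_id=model_id)
print("Model files cached at:", local_path)  # typically under ~/.cache/huggingface/hub

generator = pipeline("text-generation", model=local_path)
prompt = "Summarize, in two sentences, why regression tests matter when translating code."
print(generator(prompt, max_new_tokens=120)[0]["generated_text"])
```

If you only need to confirm that the download works, the snapshot_download call alone is sufficient; once a model is integrated, CodeScribe drives generation through its own interfaces.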
During the tutorial
Prompts
We provide the instructor’s prompts as examples, but to get the most out of this hands-on exercise, you should develop your own set of prompts rather than simply pasting ours into your LLM.
The tutorial repository (https://github.com/akashdhruv/codescribe-tutorial) also contains a prompts directory with prompts for code translation and minimal examples for code generation.
Generated code
The full set of generated code, along with the Makefile and example test inputs and outputs, can be downloaded as a ZIP file.
The individual files are:
- constants.h
- inputs
- main.c
- Makefile
- mesh.c
- mesh.h
- move_particles.c
- particles.c
- test_input
- test_sample
- verify_mesh.c
- verify_movement.c
- verify_particles.c
The tutorial repository (https://github.com/akashdhruv/codescribe-tutorial) contains the source files used for the code translation example.
Stay in Touch
- After the tutorial, please feel free to email questions or feedback to the BSSw tutorial team at bssw-tutorial@lists.mcs.anl.gov.
- To find out about future events organized by the IDEAS Productivity Project, you can subscribe to our mailing list (usually ~2 messages/month).
- For monthly updates on the Better Scientific Software site, subscribe to our monthly digest.
Resources from Presentations
Links from the tutorial presentations are listed here for convenience.
- Module 1: Introduction
- Module 2: Motivation and Overview of Best Practices in HPC Software Development
- COVID-19 epidemiology saga
- https://doi.org/10.25561/77482
- https://www.nicholaslewis.org/imperial-college-uk-covid-19-numbers-dont-seem-to-add-up/
- https://www.nature.com/articles/d41586-020-01003-6
- https://www.foxnews.com/world/imperial-college-britain-coronavirus-lockdown-buggy-mess-unreliable
- https://www.telegraph.co.uk/technology/2020/05/16/coding-led-lockdown-totally-unreliable-buggy-mess-say-experts/
- https://github.com/mrc-ide/covid-sim/
- https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/amp/
- http://doi.org/10.5281/zenodo.3865491
- Best Practices for Scientific Computing
- Good Enough Practices in Scientific Computing
- OpenSSF Best Practices Badge Program
- Good Practices for High-Quality Scientific Computing
    - Rate Your Project Assessment Tool
- Progress Tracking Card (PTC) Examples
- Productivity and Sustainability Improvement Planning
- Better Scientific Software (BSSw)
- COVID-19 epidemiology saga
- Module 3: Code Translation from Fortran to C++ using CodeScribe
- Module 4: Scientific Software Design
- https://enterprisersproject.com/article/2020/6/technical-debt-explained-plain-english
- Code generation example
- https://github.com/Flash-X/Flash-X/blob/ylee/try_pushTile_spark/source/physics/Hydro/HydroMain/Spark/Hydro_interface.ini
    - This link will work only if you have access to the Flash-X code repository. Please email flash-x@lists.cels.anl.gov with your GitHub username to request access.
- References
    - Dubey, Anshu. “Insights from the software design of a multiphysics multicomponent scientific code.” Computing in Science & Engineering, 2021. DOI:10.1109/MCSE.2021.3069343
- Dubey, Anshu, et al. “Flash-X: A multiphysics simulation software instrument.” SoftwareX 19 (2022): 101168. DOI:10.1016/j.softx.2022.101168
- Rudi, Johann, et al. “CG-Kit: Code Generation Toolkit for Performant and Maintainable Variants of Source Code Applied to Flash-X Hydrodynamics Simulations.” arXiv preprint arXiv:2401.03378 (2024).
- O’Neal, Jared, et al. “Domain-specific runtime to orchestrate computation on heterogeneous platforms.” European Conference on Parallel Processing. Cham: Springer International Publishing, 2021. DOI:10.1007/978-3-031-06156-1_13
- Dubey, Anshu, et al. “A tool and a methodology to use macros for abstracting variations in code for different computational demands.” Future Generation Computer Systems (2023). DOI:10.1016/j.future.2023.07.014
- Module 5: Software Testing and Verification
- Module 6: Improving Reproducibility Through Better Software Practices
- Toward a Compatible Reproducibility Taxonomy for Computational and Computing Sciences
- Reproducibility and Replicability in Science
- Many Psychology Findings Not As Strong As Claimed
- The War Over Supercooled Water
- Researchers find bug in Python Script may have affected hundreds of studies
- National Science Foundation Data Management Plan Requirements
- Introducing the FAIR Principles for research software
- FAIR Principles for Data
- ACM Reproducible Computational Results
- ACM Artifact Review and Badging
- http://fursin.net/reproducibility.html
- National Information Standards Organization (NISO) on Reproducibility and Badging
- HPC and the Lab Manager
- Writing the Laboratory Notebook
- DIKW pyramid
- Laboratory Notebook Example
- Helpful Tools
- Floating Point Analysis Tools
- Code Ocean (Cloud platforms - publish and reproduce research code and data)
- DOIs and hosting of data, code, documents:
- Jobrunner
- Other Resources:
- The FAIR Guiding Principles for Scientific Data Management and Stewardship. Mark D. Wilkinson, et al. 2016
    - FAIR4RS (previously linked to www.rd-alliance.org/groups/fair-research-software-fair4rs-wg)
    - Editorial: ACM TOMS Replicated Computational Results Initiative. Michael A. Heroux. 2015
- Enhancing Reproducibility for Computational Methods
- Simple experiments in reproducibility and technical trust by Mike Heroux and students (work in progress)
- What every scientist should know about floating-point arithmetic. David Goldberg.
- Jupyter4Science: Better Practices for Using Jupyter Notebooks for Science by Nicole Brewer
Requested Citation
The requested citation for the overall tutorial is:
Citation details not currently available.
Individual modules may be cited as Speaker, Module Title, in Better Software for Science with High-Performance Computing tutorial…
Acknowledgements
This tutorial is produced by the Consortium for the Advancement of Scientific Software (CASS).
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program.
This work was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.