Better Software for Science with High-Performance Computing
a tutorial presented at
SupercomputingAsia 2026 (SCA 2026)/The International Conference on High Performance Computing in Asia-Pacific Region 2026 (HPCAsia 2026)
on Monday 26 January 2026, 9:30 am - 4:30 pm JST
Presenters: Anshu Dubey (Argonne National Laboratory) and Akash Dhruv (Argonne National Laboratory)
This page provides detailed information specific to the tutorial event above. Expect updates to this page up to, and perhaps shortly after, the date of the tutorial. Pages for other tutorial events can be accessed from the main page of this site.
Quick Links
On this Page
- Description
- Presenters
- Agenda
- Presentation Slides
- How to Participate
- Hands-On Activities
- Stay in Touch
- Resources from Presentations
- Requested Citation
- Acknowledgements
Description
Producing scientific software is a challenge. The high-performance modeling and simulation community, in particular, faces the confluence of disruptive changes in computing architectures and new opportunities (and demands) for greatly improved simulation capabilities, especially through coupling physics and scales. Simultaneously, computational science and engineering (CSE), as well as other areas of science, are experiencing an increasing focus on scientific reproducibility and software quality. Large language models (LLMs) can significantly increase developer productivity through judicious offloading of tasks. However, models can hallucinate, so a sound methodology is needed to get the most benefit from this approach.
In this tutorial, attendees will learn about practices, processes, and tools to improve the productivity of those who develop CSE software, increase the sustainability of software artifacts, and enhance trustworthiness in their use. We will focus on aspects of scientific software development that are not adequately addressed by resources developed for industrial software engineering. We will also present state-of-the-art approaches for using LLMs to enhance developer productivity in the context of scientific software development and maintenance. Topics include the design, test-driven development, refactoring, code translation, and testing of complex scientific software systems, as well as conducting computational experiments with reproducibility built in.
LLM assistance for coding-related tasks deserves particular attention in any discussion of software productivity, given its potential to change the way development is done. Obtaining this assistance is especially challenging for research software because of limited training data. We have developed methodologies and tools for software development and translation that use LLMs, and hands-on activities with these tools and methodologies will be part of this tutorial.
Presenters
Anshu Dubey
Anshu Dubey is a Senior Computational Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory and a Senior Scientist (CASE) at the University of Chicago. She was previously at Lawrence Berkeley National Laboratory, and before that she was the associate director of the Flash Center for Computational Science at the University of Chicago, where she also led the CS/Applications group, which develops, maintains, and distributes the FLASH code. She received her Ph.D. in computer science from Old Dominion University and a B.Tech. in electrical engineering from the Indian Institute of Technology, New Delhi.
Akash Dhruv
Akash Dhruv is an Assistant Computational Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory, where he performs research in software design and algorithm development for scientific computing applications. His projects range from fundamental physics-driven engineering research to designing computational workflows that integrate numerical simulations with machine learning and artificial intelligence. He received his Ph.D. in mechanical and aerospace engineering from George Washington University, Washington, D.C., and a B.Tech. in mechanical engineering from the National Institute of Technology, Surat.
Agenda
| Time (JST) | Title | Presenter |
|---|---|---|
| 9:30 AM | Introduction | Anshu Dubey (ANL) |
| 9:45 AM | Motivation and Overview of Best Practices in HPC Software Development | Anshu Dubey (ANL) |
| 10:15 AM | Code Translation from Fortran to C++ using CodeScribe | Akash Dhruv (ANL) |
| 10:45 AM | Morning break | |
| 11:15 AM | Scientific Software Design | Anshu Dubey (ANL) |
| 11:50 AM | Software Testing and Verification | Anshu Dubey (ANL) |
| 12:30 PM | Lunch break | |
| 1:30 PM | Refactoring | Anshu Dubey (ANL) |
| 2:00 PM | Improving Reproducibility Through Better Software Practices | Akash Dhruv (ANL) |
| 2:45 PM | Afternoon break | |
| 3:15 PM | Hands-on coding tasks with LLMs | Akash Dhruv (ANL) and Anshu Dubey (ANL) |
| 4:30 PM | Adjourn | |
Presentation Slides
The presentations will be published shortly before the event.
How to Participate
- We want to interact with you! We find these tutorials most interesting and informative (for everyone) if you ask questions and share experiences! We learn too!
- Please raise your hand at any time to ask a question.
Hands-On Activities
Introduction
The hands-on activity for this tutorial involves using a large language model (LLM) for code translation and generation based on specifications (prompts) that you will develop. Participation in these activities is encouraged but not required. After interested participants have had time to try the exercises on their own, the instructors will review their prompts and the resulting code with the class, and these materials will be made available to all participants.
You can participate in the hands-on section in two modes: using the LLM’s web interface, or using CodeScribe, a tool that enables chat-completion through the LLM’s API interface. The code generation objective of the hands-on can be met using the web interface; however, for code translation you will need to do some advance preparation to use CodeScribe. The advantage of using CodeScribe is gaining exposure to the chat-completion technique and becoming familiar with a tool that can be very useful for writing and maintaining code.
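To make the second mode concrete, below is a minimal sketch of a chat-completion request made through an LLM API from Python, roughly the kind of call that a tool such as CodeScribe automates for you. It is purely illustrative: it assumes the OpenAI Python client and an API key in your environment, and the model name and prompt are placeholders rather than the tutorial's actual prompts or CodeScribe's internals.

```python
# Minimal sketch of chat completion through an LLM API (illustrative only).
# Assumes the OpenAI Python client (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever model your API access provides
    messages=[
        {"role": "system",
         "content": "You are an assistant that writes small, well-documented scientific code."},
        {"role": "user",
         "content": "Write a C function that computes the dot product of two double arrays."},
    ],
)
print(response.choices[0].message.content)
```

You will not normally write such calls yourself during the tutorial; the sketch is only to show what "chat completion through the LLM's API interface" means in practice.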
The instructors will work in Fortran, C, and C++, but only a surface-level understanding of these languages is required to follow the explanations. For your code generation work, you may prompt the LLM to generate code in the language of your choice. Evaluating the generated code (and revising the prompt accordingly) will be part of the hands-on activity. For the purposes of the tutorial, inspecting the code will be sufficient to gauge its appropriateness. However, if you wish to more rigorously validate the generated code, you will need access to an environment in which you can build and run it, either locally or remotely.
The code translation portion is specific to Fortran, C, and C++. In this case, seed prompts will be provided, and you may choose to use the example in the CodeScribe tutorial repository (https://github.com/akashdhruv/codescribe-tutorial) or create your own Fortran example. If you decide to attempt code translation using your own example, please ensure that the code builds successfully and that you have checks (tests) in place to verify correctness. Keep any customized examples minimal to ensure that you obtain meaningful results during the hands-on exercise, and then apply the workflow to more complex problems later. Do not start with a complex problem.
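If you do bring your own example, the checks mentioned above can be as simple as running the original and the translated executables on the same input and comparing their numeric output. The sketch below shows one hypothetical way to do that in Python; the executable names, the input file, and the whitespace-separated numeric output format are all assumptions you would adapt to your own code.

```python
# Hypothetical correctness check for a translated code: run the original and
# the translated executables on the same input and compare their numeric output.
# Executable names, input file, and output format are assumptions; adapt them.
import subprocess

def run_and_parse(exe, input_file="test_input"):
    """Run an executable on an input file and return its stdout as floats."""
    result = subprocess.run([exe, input_file], capture_output=True, text=True, check=True)
    return [float(tok) for tok in result.stdout.split()]

def check_translation(tol=1e-12):
    original = run_and_parse("./orig_fortran_exe")    # built from the original Fortran source
    translated = run_and_parse("./translated_c_exe")  # built from the generated C/C++
    assert len(original) == len(translated), "output lengths differ"
    for a, b in zip(original, translated):
        assert abs(a - b) <= tol * max(1.0, abs(a)), f"values differ: {a} vs {b}"
    print("Outputs agree within tolerance.")

if __name__ == "__main__":
    check_translation()
```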
Advance preparation
If you wish to participate in the hands-on activities, we strongly encourage you to do some preparation before you leave for Osaka. This is especially important if you plan to use CodeScribe, which may require advance interaction with the tool developers to integrate new LLM APIs.
Preparation for using the LLM web interface
- You will need access to an LLM chat tool. The instructors will be using ChatGPT, but any comparable LLM, including institutionally supported tools, should work.
- (OPTIONAL) If you wish to build and run the code generated by the LLM, you will need access (local or remote) to an appropriate environment. As stated above, you may use any language you prefer for your hands-on activity. The instructors will work in Fortran, C, and C++.
Preparation for using CodeScribe
It is important that you complete this preparation with enough lead time that we can assist you if necessary before HPC Asia starts. We will not be able to provide support for CodeScribe setup issues during the tutorial itself.
- You will need API access to an LLM. This goes beyond the web interface and may incur an additional charge on some platforms. However, many institutionally supported LLMs offer API access at no additional cost. You will be responsible for any additional costs incurred. The instructors will be using ChatGPT, but any comparable LLM should work. CodeScribe also supports several freely downloadable models that can be run locally:
  - https://huggingface.co/google/gemma-2-2b-it
  - https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
  - https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf/tree/main
  The first two models are smaller and may run faster on laptops. Download instructions can be found here: https://huggingface.co/docs/hub/en/models-downloading. Note that if you download Hugging Face models using their CLI, they will be placed in the cache directory (~/.cache/huggingface/hub/<model-name>/snapshots/<sha1>, or your operating system's equivalent); if you use git clone, you may place the models wherever you prefer. A Python sketch of downloading a model this way appears after this list.
- CodeScribe is written in Python, so you will need a working Python installation on a system that you can access (local or remote) during the tutorial.
- Download and install CodeScribe from https://github.com/akashdhruv/CodeScribe
  - Installation instructions are provided in the README file: https://github.com/akashdhruv/CodeScribe?tab=readme-ov-file#installation
  - You are encouraged to watch the tutorials on installing and using CodeScribe in this Box folder: https://anl.app.box.com/folder/336154643880?s=zv3zdbphqprdz8rjh1c84xpeqd8yg32u. These tutorials were prepared specifically for the code translation portion.
  - For instructions on using the code generation and update features, you may refer to the tutorial repository, https://github.com/akashdhruv/codescribe-tutorial, which provides minimal examples.
- You will need to integrate your CodeScribe installation with the API of your chosen LLM. Basic instructions are provided in the README file: https://github.com/akashdhruv/CodeScribe?tab=readme-ov-file#integrating-llm-of-choice. You may need to add support for your specific model in CodeScribe. To do so, examine the file https://github.com/akashdhruv/CodeScribe/blob/development/code_scribe/lib/_llm.py, copy the class that most closely resembles your target model, adapt it to your model's API, and create a pull request in the repository. With sufficient lead time, we will do our best to help make this work. You may also file issues on the CodeScribe repository to request assistance with adding support for a specific LLM.
- (OPTIONAL) If you wish to build and run the code generated by the LLM, you will need access (local or remote) to an appropriate environment. As stated above, you may use any language you prefer for your hands-on activity. The instructors will work in Fortran, C, and C++.
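As a companion to the model-download note in the first item above, here is a small Python sketch of fetching one of the listed models into the local Hugging Face cache and exercising it with a short generation. It is illustrative only: it assumes the huggingface_hub and transformers packages (with a PyTorch backend) are installed, the model choice and prompt are placeholders, and gated models such as the Gemma and Llama ones additionally require accepting their license on Hugging Face and authenticating with a token.

```python
# Sketch: download one of the freely available models into the local Hugging Face
# cache and run a short generation with it. Illustrative only; not CodeScribe code.
# Assumes: pip install huggingface_hub transformers torch, and (for gated models
# such as Gemma or Llama) prior authentication, e.g. via huggingface_hub.login().
from huggingface_hub import snapshot_download
from transformers import pipeline

model_id = "google/gemma-2-2b-it"  # placeholder; any of the listed models works similarly
local_path = snapshot_download(repo_id=model_id)
print("Model files cached at:", local_path)  # typically under ~/.cache/huggingface/hub

generator = pipeline("text-generation", model=local_path)
prompt = "Summarize, in two sentences, why regression tests matter when translating code."
print(generator(prompt, max_new_tokens=120)[0]["generated_text"])
```

If you only need to confirm that the download works, the snapshot_download call alone is sufficient; once a model is integrated, CodeScribe drives generation through its own interfaces.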
During the tutorial
Prompts
We provide the instructor’s prompts as examples, but to get the most out of this hands-on exercise, you should develop your own set of prompts rather than simply pasting ours into your LLM.
The tutorial repository (https://github.com/akashdhruv/codescribe-tutorial) also contains a prompts directory with prompts for code translation and minimal examples for code generation.
Generated code
The full set of generated code, along with the Makefile and example test inputs and outputs, can be downloaded as a ZIP file.
The individual files are:
- constants.h
- inputs
- main.c
- Makefile
- mesh.c
- mesh.h
- move_particles.c
- particles.c
- test_input
- test_sample
- verify_mesh.c
- verify_movement.c
- verify_particles.c
The tutorial repository (https://github.com/akashdhruv/codescribe-tutorial) contains the source files used for the code translation example.
Stay in Touch
- After the tutorial, please feel free to email questions or feedback to the BSSw tutorial team at bssw-tutorial@lists.mcs.anl.gov.
- To find out about future events organized by the IDEAS Productivity Project, you can subscribe to our mailing list (usually ~2 messages/month).
- For monthly updates on the Better Scientific Software site, subscribe to our monthly digest.
Resources from Presentations
Links from the tutorial presentations are listed here for convenience.
- Module 1: Introduction
- Module 2: Motivation and Overview of Best Practices in HPC Software Development
- COVID-19 epidemiology saga
- https://doi.org/10.25561/77482
- https://www.nicholaslewis.org/imperial-college-uk-covid-19-numbers-dont-seem-to-add-up/
- https://www.nature.com/articles/d41586-020-01003-6
- https://www.foxnews.com/world/imperial-college-britain-coronavirus-lockdown-buggy-mess-unreliable
- https://www.telegraph.co.uk/technology/2020/05/16/coding-led-lockdown-totally-unreliable-buggy-mess-say-experts/
- https://github.com/mrc-ide/covid-sim/
- https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/amp/
- http://doi.org/10.5281/zenodo.3865491
- Best Practices for Scientific Computing
- Good Enough Practices in Scientific Computing
- OpenSSF Best Practices Badge Program
- Good Practices for High-Quality Scientific Computing
    - Rate Your Project Assessment Tool
- Progress Tracking Card (PTC) Examples
- Productivity and Sustainability Improvement Planning
- Better Scientific Software (BSSw)
- COVID-19 epidemiology saga
- Module 3: Code Translation from Fortran to C++ using CodeScribe
- Module 4: Scientific Software Design
- https://enterprisersproject.com/article/2020/6/technical-debt-explained-plain-english
- Code generation example
- https://github.com/Flash-X/Flash-X/blob/ylee/try_pushTile_spark/source/physics/Hydro/HydroMain/Spark/Hydro_interface.ini
    - This link will work only if you have access to the Flash-X code repository. Please email flash-x@lists.cels.anl.gov with your GitHub username to request access.
- References
    - Dubey, Anshu. “Insights from the software design of a multiphysics multicomponent scientific code.” Computing in Science & Engineering, 2021. DOI:10.1109/MCSE.2021.3069343
- Dubey, Anshu, et al. “Flash-X: A multiphysics simulation software instrument.” SoftwareX 19 (2022): 101168. DOI:10.1016/j.softx.2022.101168
- Rudi, Johann, et al. “CG-Kit: Code Generation Toolkit for Performant and Maintainable Variants of Source Code Applied to Flash-X Hydrodynamics Simulations.” arXiv preprint arXiv:2401.03378 (2024).
- O’Neal, Jared, et al. “Domain-specific runtime to orchestrate computation on heterogeneous platforms.” European Conference on Parallel Processing. Cham: Springer International Publishing, 2021. DOI:10.1007/978-3-031-06156-1_13
- Dubey, Anshu, et al. “A tool and a methodology to use macros for abstracting variations in code for different computational demands.” Future Generation Computer Systems (2023). DOI:10.1016/j.future.2023.07.014
- Module 5: Software Testing and Verification
- Module 6: Improving Reproducibility Through Better Software Practices
- Toward a Compatible Reproducibility Taxonomy for Computational and Computing Sciences
- Reproducibility and Replicability in Science
- Many Psychology Findings Not As Strong As Claimed
- The War Over Supercooled Water
- Researchers find bug in Python Script may have affected hundreds of studies
- National Science Foundation Data Management Plan Requirements
- Introducing the FAIR Principles for research software
- FAIR Principles for Data
- ACM Reproducible Computational Results
- ACM Artifact Review and Badging
- http://fursin.net/reproducibility.html
- National Information Standards Organization (NISO) on Reproducibility and Badging
- HPC and the Lab Manager
- Writing the Laboratory Notebook
- DIKW pyramid
- Laboratory Notebook Example
- Helpful Tools
- Floating Point Analysis Tools
- Code Ocean (Cloud platforms - publish and reproduce research code and data)
- DOIs and hosting of data, code, documents:
- Jobrunner
- Other Resources:
- The FAIR Guiding Principles for Scientific Data Management and Stewardship. Mark D. Wilkinson, et al. 2016
    - FAIR4RS (previously linked to www.rd-alliance.org/groups/fair-research-software-fair4rs-wg)
    - Editorial: ACM TOMS Replicated Computational Results Initiative. Michael A. Heroux. 2015
- Enhancing Reproducibility for Computational Methods
- Simple experiments in reproducibility and technical trust by Mike Heroux and students (work in progress)
- What every scientist should know about floating-point arithmetic. David Goldberg.
- Jupyter4Science: Better Practices for Using Jupyter Notebooks for Science by Nicole Brewer
Requested Citation
The requested citation for the overall tutorial is:
Citation details not currently available.
Individual modules may be cited as Speaker, Module Title, in Better Software for Science with High-Performance Computing tutorial…
Acknowledgements
This tutorial is produced by the Consortium for the Advancement of Scientific Software (CASS).
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program.
This work was supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.