Enhancing QUOKKA: Performance Gains in GPU-Optimised RHD Code

Published: Aug. 7, 2022

2022A

In many astrophysical systems, radiation fields play a crucial role in transporting energy and momentum. Numerical modelling of such systems requires codes capable of simulating radiation-hydrodynamics (RHD) – the combined set of equations that describes the co-evolution and interaction of a fluid and a radiation field with which it can exchange energy and momentum.

The QUOKKA code is a cutting-edge open-source adaptive mesh refinement RHD code optimised for GPUs. Designed to tackle the intricate dynamics of star formation and galaxy evolution, QUOKKA leverages GPU supercomputing clusters like OzSTAR to perform its intensive computations. Seeking to further enhance QUOKKA’s performance, the science team, led by Prof. Mark Krumholz, engaged ADACS for expert assistance in optimising the code.

During a thorough initial consultation, the QUOKKA developers put forward several candidate areas for optimisation, which they had identified independently from their experience designing the code. Together, ADACS and the science team determined that refactoring GPU kernels and overlapping computation with MPI communication would yield the largest performance improvements and make the best use of the ADACS developers' strengths.

The primary optimisation effort by ADACS developers centred on refactoring the implementation of the piecewise parabolic method (PPM), a critical component of QUOKKA’s computational workload. By improving the memory access patterns within the GPU kernels, ADACS achieved a 13% speedup on the benchmark problem. This improvement enables faster computations and better utilisation of GPU resources.
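To make the idea of a memory-friendly stencil sweep concrete, the sketch below computes piecewise parabolic interface values along x on a flattened 3D array. It is an illustration only, not QUOKKA's kernel: the summary does not describe the specific changes ADACS made. The sketch assumes that x is the fastest-varying index, and the names `flat_index` and `ppm_interfaces_x` are hypothetical. It uses the standard uniform-grid fourth-order interpolant and omits the limiting steps of a full PPM scheme.

```python
def flat_index(i, j, k, nx, ny):
    """Offset of cell (i, j, k) in a flat array where x varies fastest (assumed layout)."""
    return (k * ny + j) * nx + i


def ppm_interfaces_x(q, nx, ny, nz):
    """Unlimited PPM interface values q_{i+1/2} along x on a uniform grid.

    Uses q_{i+1/2} = 7/12 (q_i + q_{i+1}) - 1/12 (q_{i-1} + q_{i+2}).
    Entries whose four-point stencil would leave the grid stay None.
    """
    face = [None] * (nx * ny * nz)
    for k in range(nz):
        for j in range(ny):
            row = flat_index(0, j, k, nx, ny)  # start of a contiguous x-row
            # The inner loop runs along x, so successive iterations read
            # neighbouring memory locations (unit stride).
            for i in range(1, nx - 2):
                c = row + i
                face[c] = (7.0 * (q[c] + q[c + 1]) - (q[c - 1] + q[c + 2])) / 12.0
    return face


# Linear test profile q = i: the interpolant should reproduce i + 1/2.
nx, ny, nz = 8, 2, 2
q = [float(i) for k in range(nz) for j in range(ny) for i in range(nx)]
face = ppm_interfaces_x(q, nx, ny, nz)
```

On a GPU the same principle applies per thread rather than per loop iteration: neighbouring threads ideally touch neighbouring addresses, so that their loads can be combined into fewer memory transactions.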

Although the science team had anticipated a two-fold performance improvement from the proposed optimisations, detailed profiling of the code revealed that the existing implementation was already nearing optimal performance, given its dependencies. Nevertheless, ADACS explored a proof-of-concept implementation using OpenMP to overlap computation with MPI communication. This experimental approach demonstrated a modest performance gain of a few percent, indicating potential for further improvement.
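The general pattern of overlapping computation with communication can be sketched as follows. This is a toy stand-in rather than the ADACS proof of concept: the project used OpenMP together with MPI, whereas this sketch uses a Python helper thread, and `exchange_halo`, `smooth`, `step_overlapped` and `step_blocking` are hypothetical names. The exchange is started first, interior cells that need no ghost data are updated while it is in flight, and only the boundary cells wait for it to finish.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def exchange_halo(left_edge, right_edge, latency=0.01):
    """Hypothetical stand-in for an MPI halo exchange: wait, then return ghost values."""
    time.sleep(latency)  # pretend network latency
    return left_edge, right_edge


def smooth(left, centre, right):
    """Three-point average used as a placeholder stencil."""
    return (left + centre + right) / 3.0


def step_overlapped(u, left_edge, right_edge):
    """Update u (at least two cells) while the halo exchange is in flight."""
    n = len(u)
    new = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the exchange; it proceeds on a helper thread.
        pending = pool.submit(exchange_halo, left_edge, right_edge)
        # 2. Interior cells need no ghost data, so update them meanwhile.
        for i in range(1, n - 1):
            new[i] = smooth(u[i - 1], u[i], u[i + 1])
        # 3. Block only when the ghost data is actually required.
        ghost_left, ghost_right = pending.result()
    # 4. Finish the two cells that touch the ghost values.
    new[0] = smooth(ghost_left, u[0], u[1])
    new[-1] = smooth(u[-2], u[-1], ghost_right)
    return new


def step_blocking(u, left_edge, right_edge):
    """Reference version: finish the exchange first, then update every cell."""
    ghost_left, ghost_right = exchange_halo(left_edge, right_edge)
    padded = [ghost_left] + list(u) + [ghost_right]
    return [smooth(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]
```

Both functions produce the same result; the overlapped version simply hides part of the exchange latency behind the interior update. How much this helps depends on how much interior work is available relative to the communication time.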

Despite the challenges and the realisation that QUOKKA was already highly optimised, the collaborative effort between ADACS and the QUOKKA developers yielded valuable insights and tangible improvements. The refactored GPU kernels alone deliver a 13% speedup on the benchmark problem, and the proof-of-concept for overlapping communication provides a promising direction for future work.

ADACS provided the QUOKKA team with detailed documentation of the changes made, along with the OpenMP implementation as a reference for future code refactoring. These resources will enable the science team to continue enhancing QUOKKA, potentially implementing the full overlapping communication strategy to unlock further performance gains.


Project Image: A 3D Rayleigh-Taylor instability simulated on a 256x256x256 grid.

Project Details

Node: Swinburne University of Technology
Project Length: 12 weeks
Development Team:
  • Patrick Clearwater (project lead)
  • Conrad Chan
  • David Liptai
Research Scientist: Mark Krumholz
