Optimising GPU Performance for the QUOKKA Code

CI: Dr. M. Krumholz

QUOKKA is a new open-source adaptive mesh refinement (AMR) radiation-hydrodynamics (RHD) code that is optimised for GPUs. The code solves the Euler equations of compressible gas dynamics coupled to a two-moment formulation of the equation of radiative transfer, all on an adaptive mesh. It will be suitable for a wide range of problems in star formation and galaxy evolution in which radiative transfer of energy and momentum plays a significant part in the dynamics. The code passes a wide range of tests that demonstrate its accuracy, and it achieves very high performance, peaking at almost 100 million zone updates per second per GPU for pure hydrodynamics and 20 million zone updates per second per GPU for RHD. QUOKKA's RHD throughput therefore exceeds the pure-hydrodynamics throughput achieved by CPU-based codes.
However, there are likely significant opportunities to improve code performance. Performance drops by a factor of two in going from a single GPU to multiple GPUs on a node (though with little further degradation in going to multiple nodes), likely as a result of limited memory bandwidth. In these multi-GPU runs the code spends 30-40% of its time filling ghost zones, a cost that could likely be reduced by better overlapping communication with computation. Similarly, the code runs approximately a factor of two more slowly in AMR problems, again likely as a result of memory bandwidth limitations that could be mitigated by optimising communication. Moreover, it is likely that the GPU kernel itself could be tuned to achieve better performance by altering loop ordering and memory access patterns. We request ADACS assistance in tuning the code to achieve optimal performance.
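As a purely illustrative sketch of why loop ordering affects memory access patterns (this is not QUOKKA's kernel or its actual data layout), the Python snippet below assumes a column-major layout in which the i index varies fastest. It measures what fraction of consecutive accesses in a loop nest land on adjacent memory locations. The names `flat_index`, `traversal_offsets`, and `unit_stride_fraction` are hypothetical and introduced only for this example.

```python
from itertools import product


def flat_index(i, j, k, nx, ny):
    """Flat memory offset of zone (i, j, k), ASSUMING i varies fastest."""
    return i + nx * (j + ny * k)


def traversal_offsets(loop_order, shape):
    """Yield the flat offsets visited by a triple loop nest.

    loop_order lists the axes from outermost to innermost, e.g. "kji"
    means k is the outer loop and i is the inner loop.
    """
    nx, ny, nz = shape
    extent = {"i": nx, "j": ny, "k": nz}
    # itertools.product advances its last argument fastest, so the last
    # axis named in loop_order plays the role of the innermost loop.
    for idx in product(*(range(extent[axis]) for axis in loop_order)):
        zone = dict(zip(loop_order, idx))
        yield flat_index(zone["i"], zone["j"], zone["k"], nx, ny)


def unit_stride_fraction(loop_order, shape):
    """Fraction of consecutive accesses that move by exactly one element."""
    offsets = list(traversal_offsets(loop_order, shape))
    steps = [b - a for a, b in zip(offsets, offsets[1:])]
    return sum(1 for s in steps if s == 1) / len(steps)


if __name__ == "__main__":
    shape = (4, 4, 4)
    for order in ("kji", "ijk"):
        print(order, unit_stride_fraction(order, shape))
```

With the fastest-varying index in the innermost loop ("kji" under the assumed layout), every access is adjacent to the previous one, whereas the reversed ordering ("ijk") never steps by a single element. On a GPU the analogous question is whether neighbouring threads touch neighbouring addresses, so any real tuning would need to be guided by profiling rather than by this toy model.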