1. Gyselalib++: A Portable, Kokkos-Based Library for Exascale Gyrokinetic Simulations - Etienne Malaboeuf, CINES (10 minutes)
The development of fusion energy in magnetic confinement devices relies heavily on simulations of plasma behavior. Gyselalib++ is a new open-source C++ library under active development by a distributed, multidisciplinary European team of physicists, mathematicians, and computer scientists at EPFL, CEA/IRFM, Maison de la Simulation, IPP Garching, and CINES. Gyselalib++ is itself built on top of PDI, DDC, and Kokkos and provides mathematical tools for gyrokinetic semi-Lagrangian codes for tokamak plasma simulations. This presentation will introduce the library, its design, and the rationale behind its development, and will highlight its key features. It will show how choosing Kokkos made it possible to achieve high performance and performance portability across a wide range of modern hardware, and will explain why DDC was introduced to improve development safety. We will discuss feedback from this experience and analyze the successes and limitations of the approach, especially regarding performance, performance portability, and programmability of the code by a team with highly diverse backgrounds.
2. Expression Templates with Kokkos for Lattice QCD - Travis Whyte, Jülich Supercomputing Centre (10 minutes)
Lattice quantum chromodynamics (QCD) is a first-principles approach to studying the interaction of quarks and gluons. The calculation of observables in lattice QCD requires many different operations between multidimensional arrays of various ranks. In this talk, I will describe an implementation of expression templates using Kokkos that allows lattice QCD practitioners to implement linear algebra operations simply while avoiding temporaries for views of arbitrary rank. This abstraction has the potential to promote high productivity in the development process. The performance of various benchmarks on different architectures will also be discussed.
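As a rough illustration of the expression-template idea named in this abstract, the sketch below uses plain C++ with no Kokkos and is not the speaker's implementation. The names Expr, Field, Sum, and Scaled are hypothetical, and Field is only a one-dimensional stand-in for a view. The point it tries to show is that an expression such as a + 2.0 * b builds a lightweight object, and the arithmetic happens in a single loop at assignment, without intermediate arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: tags a type as a lazily indexable expression.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Owning 1-D array; a hypothetical stand-in for a view in this sketch.
class Field : public Expr<Field> {
public:
    explicit Field(std::size_t n, double value = 0.0) : data_(n, value) {}

    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assignment from an expression: one fused loop, no temporary Field.
    template <class E>
    Field& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Node representing l + r; it stores references and computes on demand.
template <class L, class R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) { assert(l.size() == r.size()); }
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }

private:
    const L& l_;
    const R& r_;
};

// Node representing s * e for a scalar s.
template <class E>
class Scaled : public Expr<Scaled<E>> {
public:
    Scaled(double s, const E& e) : s_(s), e_(e) {}
    double operator[](std::size_t i) const { return s_ * e_[i]; }
    std::size_t size() const { return e_.size(); }

private:
    double s_;
    const E& e_;
};

// Operators only build expression nodes; nothing is evaluated here.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <class E>
Scaled<E> operator*(double s, const Expr<E>& e) {
    return Scaled<E>(s, e.self());
}
```

With these definitions, a statement such as y = a + 2.0 * b; on three Field objects of equal size evaluates each element once inside Field::operator=. Because the nodes hold references, an expression should be assigned in the same statement that builds it rather than stored for later use.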
3. Bridging Parallel Communication and On-Node Computation with Kokkos - Evan Suggs, Tennessee Technological University (20 minutes)
Although MPI and Kokkos have long been used together, there have been no well-defined methods for integrating them effectively. The only approach has been to pass the buffers underlying Kokkos Views to MPI functions.
This causes several major pain points: handling non-contiguous Views, asynchronous operations in both models, and how MPI interacts with Kokkos Profiling. Kokkos Comm is an experimental MPI interface for the Kokkos C++ Performance Portability Programming ecosystem that aims to address these concerns and improve the productivity of Kokkos users.
Currently, Kokkos Comm integrates point-to-point and collective operations, handling of non-contiguous Views, and Kokkos Tools profiling. Kokkos Comm also aims to be a springboard for new and improved features that go beyond MPI and Kokkos, allowing Kokkos to work with MPI, stream-triggered MPI, and non-MPI communication libraries (e.g., NCCL and RCCL). This presentation will cover the Kokkos Comm API, conversion of existing code, best practices, how Kokkos Comm can help address common issues in Kokkos/MPI, and upcoming additions to Kokkos Comm, such as persistent communication and device-initiated communication.
4. Integration of PETSc, Kokkos Core, and Kernels for Performance Portability in the Age of Accelerators - Junchao Zhang, Argonne National Laboratory (20 minutes)
PETSc, the Portable, Extensible Toolkit for Scientific Computation, provides an extensive suite of scalable parallel solvers for linear and nonlinear equations, ordinary differential equation (ODE) integrators, and optimization algorithms. Widely adopted in both industry and academia, PETSc historically achieved performance portability through the C programming language and the Message Passing Interface (MPI) programming model. It used single-threaded MPI processes for both shared and distributed memory systems. This strategy served us very well in the microprocessor age. However, the recent proliferation of accelerator-based architectures, particularly graphics processing units (GPUs), has posed new challenges to this performance portability. To address these challenges, we have integrated PETSc with the Kokkos ecosystem, specifically Kokkos-Core and Kokkos-Kernels. In this presentation, we describe our integration approach, highlight our experiences (both effective strategies and the challenges we encountered), and outline future developments aimed at further enhancing performance portability across evolving computational architectures.
5. Parallel Sweep Algorithms for Cartesian and Honeycomb Grids - Ansar Calloo, CEA (20 minutes)
The linear Boltzmann transport equation (BTE) is the governing equation for the behaviour of neutral particles in a system such as a nuclear reactor. The BTE can be solved for the flux of particles using deterministic methods, whereby the equation is discretised in the phase space of its fundamental variables. This discrete equation is then usually solved using source iteration. In this talk, we will present how the sweep algorithm, which is based upon a wavefront pattern, has been optimised in the context of SMP for CPU, along with some preliminary results on GPU. The goal is to show how to adapt the sweep algorithm to be efficient on new supercomputer architectures. We will briefly introduce DONUT (Discrete Ordinates NeUtron Transport), a modern C++ miniapp for solving the BTE based on the discrete ordinates and discontinuous Galerkin discretisations for Cartesian and honeycomb grids.
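To make the wavefront pattern mentioned in the last abstract concrete, here is a minimal sketch for a 2D Cartesian grid and a single sweep direction pointing towards +x and +y. It is not taken from DONUT. The function name sweep_wavefront and the averaging update are placeholders chosen for illustration, not a real transport kernel. Under these assumptions, cell (i, j) depends only on its left and lower neighbours, so all cells on the anti-diagonal i + j = d can be processed independently once diagonal d - 1 is complete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy sweep over an nx-by-ny grid, stored row by row (index = j * nx + i).
// "source" and "inflow" are hypothetical parameters for this sketch.
std::vector<double> sweep_wavefront(int nx, int ny, double source, double inflow) {
    assert(nx > 0 && ny > 0);
    std::vector<double> psi(static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny), 0.0);
    auto at = [nx](int i, int j) {
        return static_cast<std::size_t>(j) * static_cast<std::size_t>(nx)
             + static_cast<std::size_t>(i);
    };

    // Outer loop: wavefronts (anti-diagonals), which must run in order.
    for (int d = 0; d <= nx + ny - 2; ++d) {
        const int i_min = std::max(0, d - (ny - 1));
        const int i_max = std::min(d, nx - 1);
        // Inner loop: cells of one wavefront. Its iterations do not depend on
        // each other, so this is the loop a threaded or GPU version could
        // execute in parallel.
        for (int i = i_min; i <= i_max; ++i) {
            const int j = d - i;
            const double left  = (i > 0) ? psi[at(i - 1, j)] : inflow;
            const double below = (j > 0) ? psi[at(i, j - 1)] : inflow;
            // Placeholder update standing in for the per-cell solve.
            psi[at(i, j)] = source + 0.5 * (left + below);
        }
    }
    return psi;
}
```

The amount of parallel work varies from one wavefront to the next: it is a single cell at the corners and at most min(nx, ny) cells in the middle of the sweep, which is one reason the pattern needs care on wide architectures.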