May 5-8, 2025
Chicago, IL

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for the event to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to find out more information.

This schedule is displayed in Central Daylight Time (CDT, UTC/GMT -5 hours).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Venue: Salon A-C
Wednesday, May 7
 

9:00am CDT

Updates from the Kokkos Team
Wednesday May 7, 2025 9:00am - 10:20am CDT
1. Welcome and overview, Damien Lebrun-Grandie, Oak Ridge National Laboratory (10 minutes)

2. Update on the Ecosystem and Community, Christian Trott, Sandia National Laboratories (20 minutes)

3. Kokkos Core update, Damien Lebrun-Grandie, Oak Ridge National Laboratory (30 minutes)

4. Kokkos-Kernels update, Luc Berger-Vergiat, Sandia National Laboratories (20 minutes)
Speakers

Luc Berger-Vergiat

Sandia National Laboratories

Damien Lebrun-Grandie

Senior Computational Scientist, Oak Ridge National Laboratory

Christian Trott

Distinguished Member of Technical Staff, Sandia National Laboratories
Christian Trott is a High Performance Computing expert at Sandia National Laboratories, where he co-leads the Kokkos core team, developing performance portability solutions for engineering and science applications. He heads Sandia's delegation to the ISO C++ committee and is a principal...
Salon A-C

10:45am CDT

Kokkos in Applications
Wednesday May 7, 2025 10:45am - 12:05pm CDT
1. FleCSI Applications, Ben Bergen & Hyun Lim, Los Alamos National Laboratory (10 minutes)
The Flexible Computational Science Infrastructure (FleCSI) programming system provides a clutter-free environment that allows developers to focus on the arithmetic operations of their methods without the distraction of computer science details that are often visible in legacy simulation codes. To this end, FleCSI provides light-weight wrappers over the raw Kokkos interface that resemble native C++ keywords, e.g., forall. Using this design philosophy, we have been able to evolve our support to cover various Kokkos policies and execution spaces. HARD is a FleCSI-based application for radiation hydrodynamics that is performance portable across a variety of systems, e.g., El Capitan, Venado, and Crossroads, and inherits FleCSI’s support for multiple distributed-memory and tasking backends, e.g., Legion, HPX, and MPI. In this talk, we will demonstrate the basic data-parallel interface with implementation and usage examples. We will also present results for several test problems in inertial confinement fusion with comparisons between different backends and performance assessments in different heterogeneous computing environments.

2. DDC: A Performance Portable Library Abstracting Computation on Discrete Domains, Thomas Padioleau, CEA Paris-Saclay (10 minutes)
The Discrete Domain Computation (DDC) library is a modern C++ library that aims to offer to the C++ world an equivalent to the xarray.DataArray Python environment. The Xarray library introduces labeled multidimensional arrays, enabling more intuitive data manipulation by associating dimensions with user-provided names rather than relying on positional indexing. This approach simplifies indexing, slicing, and broadcasting while reducing common indexing errors. Inspired by these ideas, DDC extends the Kokkos library providing zero-overhead dimension labeling for multidimensional arrays along with performance-portable multidimensional algorithms. This labeling mechanism enables compile-time detection of indexing and slicing errors, ensuring safer and more expressive array operations in C++. In this presentation, we will introduce the core concepts of DDC and demonstrate its usage through a simple example that highlights its key features.

3. TChem-atm - A Performance Portable Chemistry Solver for Atmospheric Chemistry, Oscar Diaz-Ibarra, Sandia National Laboratories (20 minutes)
TChem-atm (https://github.com/PCLAeroParams/TChem-atm) is a performance-portable software library designed to support atmospheric chemistry applications, specifically computing source term Jacobian matrices. The software utilizes Kokkos as its portability layer, preparing it for next-generation computing architectures. The software interface employs a hierarchical parallelism design to leverage the massive parallelism available on modern computing platforms, including model parallelism, batch parallelism, and nested parallelism for each problem instance. Additionally, TChem-atm is designed to be coupled with third-party libraries that may be used to advance the state of gas and particle species over time, notably interfacing with the Tines, Kokkos-Kernels, and SUNDIALS libraries. We have tested TChem-atm in two scenarios: using a typical reaction mechanism in atmospheric science and an example involving multiple aerosol particles. This testing framework allows us to evaluate our code by varying the number of evaluations and the size of the source term (right-hand side). Finally, we report performance measurements using the CUDA, HIP, and OpenMP backends.

4. GPU Porting of the TRUST CFD Platform with Kokkos, Rémi Bourgeois, French Atomic Energy Commission (CEA) (20 minutes)
TRUST is a High Performance Computing thermohydraulic platform for Computational Fluid Dynamics developed at the French Atomic Energy Commission (CEA). This software is designed for massively parallel (MPI) simulations of conduction, incompressible single-phase, and Low Mach Number (LMN) flows with a Weakly-Compressible multi-species solver and compressible multi-phase flows. It is used as the basis for many specialised applications in the nuclear and new energy fields across CEA. The code is being progressively ported to support GPU acceleration (Nvidia/AMD/Intel) thanks to the Kokkos library, as it is one of the demonstrators of the CExA project. In this talk, we will go over our experience using Kokkos to progressively port our large code base. We will cover the GPU features we have enabled and their performance. We will mention some of the difficulties we encountered as well as the strategies we had to adopt, which sometimes differ from standard good practices due to the specificity of our application.

5. Omega: Towards a Performance-portable Ocean Model using Kokkos, Maciej Waruszewski, Sandia National Laboratories (20 minutes)
High-resolution simulations of the Earth system require resources available only on the world's largest supercomputers, which are increasingly based on GPUs. However, CPU-based systems are still frequently used to conduct simulations at coarse resolutions. To be able to take advantage of all compute platforms, we are developing Omega: the Ocean Model for E3SM Global Applications, a new ocean model written in C++ using Kokkos for performance portability. Omega will replace MPAS-Ocean to become the new ocean component of the DOE’s Energy Exascale Earth System Model (E3SM). Omega is an unstructured mesh ocean model based on the same finite-volume scheme as the current ocean component. Work on Omega began in 2023. Currently, Omega is a layered shallow water model with passive tracers. While still simple, this initial version can run on realistic size meshes and contains computational kernels representative of the full model horizontal numerics. After briefly describing Omega, this talk will go into our experiences with Kokkos and present initial performance results from a variety of compute platforms.
Speakers

Ben Bergen

Scientist, Los Alamos National Laboratory
Ben Bergen is a computational scientist working on runtime systems, data structures, and applications development.

Hyun Lim

Scientist, Los Alamos National Laboratory
Hyun Lim is a staff scientist in CCS-7. Hyun has a background in theoretical and computational astrophysics, gravitational physics, and numerical methods.

Maciej Waruszewski

R&D Computer Science, Sandia National Laboratories
Maciej is a computer scientist at Sandia National Laboratories. He is one of the developers of the DOE’s Energy Exascale Earth System Model (E3SM). He holds a PhD in atmospheric physics from the University of Warsaw.

Oscar Diaz-Ibarra

Senior member of the technical staff, Sandia National Laboratories
Oscar is a senior member of the technical staff at Sandia National Laboratories, specializing in high-performance applications for atmospheric chemistry using Kokkos and modern C++. He holds a Ph.D. in chemical engineering from the University of Utah and has over 7 years of experience...

Rémi Bourgeois

Researcher / Engineer, French Atomic Energy Commission (CEA)
Rémi Bourgeois is a French researcher/engineer at CEA Saclay, specializing in HPC and numerical analysis for the TRUST platform, a massively parallel thermo-hydraulic simulation tool. He earned his PhD at CEA, focusing on MHD convection, developing finite-volume methods and GPU-based...

Thomas Padioleau

Engineer-Researcher, CEA
Dr. Thomas Padioleau is a CEA Engineer-Researcher at Maison de la Simulation. He leads the DDC project and also works on Voice++.
Salon A-C
 
Thursday, May 8
 

9:00am CDT

Broader Kokkos Ecosystem
Thursday May 8, 2025 9:00am - 10:20am CDT
1. kokkos-fft Updates – Yuuichi Asahi, CEA (10 minutes)
kokkos-fft implements local interfaces between Kokkos and de facto standard FFT libraries, including fftw, cufft, hipfft (rocfft), and oneMKL. We aim to provide numpy.fft-like interfaces adapted for Kokkos. The guiding principle is "As easy as numpy, as fast as vendor libraries". In the talk, we will introduce the basic APIs and typical use cases. We will also present future development plans.

2. Fortran Porting Wish List for Kokkos – Matthew Norman, Oak Ridge National Laboratory (10 minutes)
This presentation covers the beginnings of the Yet Another Kernel Launcher (YAKL) C++ portability library, its evolution alongside Kokkos, the use of Kokkos in its current form, and the remaining issues before it can be retired in favor of Kokkos. The primary outstanding issues are the inclusion of arbitrary lower bounds for Fortran-like View behavior and the ability to use an underlying pool allocator for Views for cheap, frequent device allocation and deallocation, so that Views can be locally created and destroyed only where needed rather than existing for the global lifetime of simulations. This may improve readability and reduce the memory high water mark in simulations. A few performance-related issues will be covered as well, mainly limited to MDRangePolicy and parallel_for register usage.

3. Custom Layout and Tiling for Multi-Dimensional Data – Cedric Chevalier & Gabriel Dos Santos, CEA (10 minutes)
Performance optimizations for exascale HPC applications primarily rely on fine-tuning implementations, requiring comprehensive knowledge of heterogeneous hardware architectures that domain experts often lack. One of Kokkos' biggest successes is tying the memory layout of multi-dimensional arrays to the execution backend. This allows codes to exploit memory coalescing or caching, depending on the hardware. Here, we propose to go further and design custom tiled layouts that are generic for C++23's std::mdspan. Instead of running tiled algorithms on flat data, like Kokkos' mdrange, we want to explore how running flat algorithms on tiled data performs. On CPU, the first experimental results with std::mdspan on a naive dense matrix multiplication demonstrate that, by replacing standard layouts with our proposed solution, we achieve an average speedup of over 2.2x, with peak performance improvements of up to 7.8x. Then, we will discuss how external indexing can improve efficiency. We will present how to exploit it with Kokkos' mdrange algorithm, and how it can behave on GPU.
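
As a rough illustration of what a tiled layout means for index arithmetic, the sketch below maps a 2-D index (i, j) to a flat offset for a plain row-major layout and for a tiled one. It is a Python sketch under stated assumptions, not the authors' std::mdspan layout: the function names, the row-major ordering of tiles and of elements inside a tile, and the requirement that the extents divide evenly by the tile sizes are all hypothetical choices made here.

```python
def row_major_offset(i, j, n_rows, n_cols):
    """Flat offset of element (i, j) in a standard row-major layout."""
    return i * n_cols + j


def tiled_offset(i, j, n_rows, n_cols, tile_rows, tile_cols):
    """Flat offset of element (i, j) when the matrix is stored tile by tile.

    Assumption (illustrative only): tiles are ordered row-major, elements
    inside a tile are row-major, and the extents are multiples of the tile sizes.
    """
    if n_rows % tile_rows or n_cols % tile_cols:
        raise ValueError("extents must be multiples of the tile sizes")
    tiles_per_row = n_cols // tile_cols
    tile_i, in_i = divmod(i, tile_rows)   # tile row, row inside the tile
    tile_j, in_j = divmod(j, tile_cols)   # tile column, column inside the tile
    tile_index = tile_i * tiles_per_row + tile_j
    return tile_index * (tile_rows * tile_cols) + in_i * tile_cols + in_j
```

For a 4 x 4 matrix with 2 x 2 tiles, the four elements of the top-left tile land at offsets 0 to 3 in the tiled layout, whereas row-major storage places them at offsets 0, 1, 4, and 5. Keeping a tile contiguous is what lets a flat loop over memory touch one tile at a time.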

4. Runtime Auto-Tuning for Kokkos Applications with APEX – Kevin Huck, University of Oregon (10 minutes)
Traditional GPU programming with libraries like CUDA or HIP requires tuning parameters exposed to the user, for example block sizes or number of teams. Kokkos also exposes portable parameters to the Kokkos user. How can Kokkos application programmers easily tune these Kokkos parameters for their application’s deployment when using any given Kokkos backend, without incurring large overheads? In particular, how do we ensure the tuning itself is portable across platforms? We propose using online, i.e., runtime, autotuning, utilizing the APEX Kokkos Tools connector to tune exposed parameters. Specifically, we discuss the Kokkos Tools Tuning Interface, tuning contexts, variable definition, the APEX runtime auto-tuning library utilizing Kokkos Tools, and distributed Kokkos auto-tuning. Applying our auto-tuning approaches to Kokkos sample kernels on Perlmutter and Frontier, we have obtained promising performance results. These results suggest Kokkos online auto-tuning is beneficial for production applications, and we invite Kokkos users to try these features and Kokkos developers to contribute.
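
The general idea of online tuning can be sketched independently of any tool. The Python example below is not the Kokkos Tools Tuning Interface or the APEX API; it is a hypothetical explore-then-exploit tuner in which every name (`OnlineTuner`, `next_value`, `report`, `best`) is invented here. It tries each candidate value of one parameter a fixed number of times during normal execution, then keeps using the value with the lowest average measured cost.

```python
class OnlineTuner:
    """Hypothetical explore-then-exploit tuner for a single parameter."""

    def __init__(self, candidates, trials_per_candidate=3):
        if not candidates:
            raise ValueError("need at least one candidate value")
        self.candidates = list(candidates)
        self.trials = trials_per_candidate
        self.costs = {c: [] for c in self.candidates}

    def next_value(self):
        """Return a value still being explored, else the best one seen so far."""
        for c in self.candidates:
            if len(self.costs[c]) < self.trials:
                return c
        return self.best()

    def report(self, value, cost):
        """Record the measured cost (for example, elapsed seconds) of one run."""
        self.costs[value].append(cost)

    def best(self):
        """Candidate with the lowest mean cost among those measured."""
        measured = {c: v for c, v in self.costs.items() if v}
        if not measured:
            return self.candidates[0]
        return min(measured, key=lambda c: sum(measured[c]) / len(measured[c]))
```

In an application loop, one would call `next_value()` before each kernel launch, time the kernel, and pass the elapsed time to `report()`. The exploration cost is paid once, inside the production run, which is the sense in which the tuning is online.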

5. Unifying the HPC Ecosystem with std::execution – Mikael Simberg, Swiss National Supercomputing Centre (20 minutes)
Asynchronous programming models are becoming increasingly essential for fully leveraging modern hardware. In the C++ ecosystem, projects typically provide ad-hoc and varying interfaces, making interoperability difficult. Recently approved for C++26, the std::execution library promises to unify the ecosystem by providing a standard, composable interface for asynchronous operations. This talk briefly introduces the motivation and design principles of std::execution, and shares our experiences using it prior to standardization at CSCS in various projects, including Kokkos, HPX, and more. We'll discuss challenges, successes, and opportunities encountered while adopting std::execution.

6. PyKokkos: Performance Portability for Python Developers – Milos Gligoric, The University of Texas at Austin (20 minutes)
Kokkos is a programming model for writing performance portable applications for all major high performance computing platforms. It provides abstractions for data management and common parallel operations, allowing developers to write portable high performance code with minimal knowledge of architecture-specific details. Kokkos is implemented as a heavily-templated C++ library. However, C++ is not ideal for rapid prototyping and quick algorithmic exploration. An increasing number of developers use Python for scientific computing, machine learning, and data analytics. In this talk, I will present a new Python framework, PyKokkos, for writing performance portable applications entirely in Python. PyKokkos provides Kokkos-like abstractions that are easier to use and more concise than the C++ interface. We implemented PyKokkos by building a translator from a subset of Python to C++ Kokkos and bridging necessary function calls via automatically generated Python bindings. I will also cover our recent work on automatic kernel fusion with the goal to optimize PyKokkos applications. The talk will also cover our experience on developing PyKokkos, its current limitations, and future plans.
Speakers

Kevin Huck

Senior Research Associate, University of Oregon
Kevin Huck is a Senior Research Associate in the Oregon Advanced Computing Institute for Science and Society (OACISS) at the University of Oregon. He is interested in the unique problems of performance analysis of large HPC applications as well as automated methods for diagnosing...

Cedric Chevalier

Research Scientist, CEA
Cédric Chevalier is a research scientist at CEA in France. He is interested in developing libraries for HPC simulation codes, particularly in Linear Algebra and Mesh/Graph partitioning. His work at CEA is led by providing practical ways to exploit newer hardware, use new programming...

Gabriel Dos Santos

PhD Student, CEA
PhD student on the management of data structures representations in heterogeneous architecture for exascale-class HPC workloads, with a strong background in performance optimization, CPU microarchitectures and vectorization.

Matthew Norman

Climate Scientist, Oak Ridge National Laboratory
Matt Norman leads the Advanced Computing for Life Sciences and Engineering group in the Oak Ridge Leadership Computing Facility (OLCF). He works with weather and climate simulation, urban and wind turbine simulation, PDE discretizations for the Navier-Stokes Equations, GPU acceleration...

Mikael Simberg

HPC Application Engineer, Swiss National Supercomputing Centre
Mikael Simberg holds a master's degree in operations research and computer science from Aalto University in Finland. He joined the Swiss National Supercomputing Centre in 2017 where he works as a software developer helping scientific projects make the best use of modern hardware through...

Milos Gligoric

Associate Professor, The University of Texas at Austin
Milos Gligoric is an Associate Professor in Electrical and Computer Engineering at The University of Texas at Austin where he holds the Archie W. Straiton Endowed Faculty Fellowship in Engineering. His research interests are in software engineering, especially in designing techniques...

Yuuichi Asahi

Research Scientist, CEA
His recent interests are HPC and AI with NVIDIA, AMD and Intel GPUs. He has a rich experience in GPU programming models including CUDA, HIP, SYCL, Kokkos, OpenMP, OpenACC, thrust, stdpar, and senders/receivers. For exascale computing, he is highly interested in improving performance...
Salon A-C

10:45am CDT

Tuning and Performance
Thursday May 8, 2025 10:45am - 12:05pm CDT
1. Leveraging the C Configuration Space and Tuning Library (CCS) in Kokkos Tools - Brice Videau, Argonne National Laboratory (20 minutes)
Online autotuning of runtime and applications presents untapped opportunities to increase HPC application performance and efficiency. During ECP, in order to exploit this potential, the autotuning working group at Argonne National Laboratory and the Kokkos team co-designed the Kokkos Tools tuning API and the C Configuration Space and Tuning Library (CCS). The Kokkos Tools tuning API would create a framework to plug tuners inside Kokkos and expose tuning regions to them, while the CCS library would offer an API to both capture Kokkos configuration spaces and implement tuners to optimize them. This effort led to the creation of the CCS Kokkos connector, a Kokkos tool that leverages both APIs to offer a baseline tuner for Kokkos regions. In this presentation, we will present the result of this collaboration from the perspective of CCS, the abstractions it offers and how they map to Kokkos tuning model. We will describe the capabilities of the CCS library and how it fulfills the goal of offering a standard interface to bridge the gap between tuners and applications/runtimes. We will also discuss the perspectives and future works around the CCS Kokkos connector.

2. Bottlenecks in High-Dimensional Simulations - Nils Schild, Max Planck Institute for Plasma Physics (20 minutes)
The Vlasov-Maxwell system, which describes the motion for charged particles of matter in a plasma state using a particle distribution function, is based on a 6-D phase space defined through configuration and velocity coordinates.
Considering an Eulerian grid for this system with only 32^6 degrees of freedom, the distribution function already requires 8.5 GB of memory. This implies that high-resolution simulations can only be executed on large compute clusters.
In this talk, we focus on two aspects of the open-source code BSL6D to solve a reduced version of the Vlasov-Maxwell system. The shared memory parallelization based on Kokkos applies a stencil algorithm to data, which is non-contiguous in memory, to reduce memory requirements. The inter-node communication bottleneck poses a challenge due to the large halo domain to compute domain ratio. Finally, we discuss the advantages of RAII-managed MPI communicators for distributed domains, simplifying the implementation of parallel algorithms with distributed memory concepts.
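
The memory figure quoted above can be checked with a short calculation. The Python snippet below assumes one double-precision value (8 bytes) per grid point, which the abstract does not state explicitly; the function name is illustrative.

```python
def distribution_function_bytes(points_per_dim, dims=6, bytes_per_value=8):
    """Memory needed to store one value per point of a dims-dimensional grid."""
    return points_per_dim ** dims * bytes_per_value


total = distribution_function_bytes(32)
print(total)          # 8589934592 bytes
print(total / 1e9)    # about 8.59 (decimal GB)
print(total / 2**30)  # 8.0 (GiB)
```

Under the same assumption, doubling the resolution to 64 points per dimension multiplies the footprint by 2^6 = 64, to roughly 550 GB, which is consistent with the need for large clusters.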

3. Accelerating SPECFEM++ with Explicit SIMD and Cache-Optimized Layouts - Rohit Kakodkar, Princeton University (20 minutes)
SPECFEM++ is a suite of computational tools based on the spectral element method used to simulate wave propagation through heterogeneous media. The project aims to unify the legacy SPECFEM codes - three separate Fortran packages (SPECFEM2D, SPECFEM3D, and SPECFEM3D_globe) - into a single C++ package. This new package aims to deliver optimal performance across different architectures by leveraging the Kokkos library. In this presentation, I will outline our efforts to enhance CPU performance using explicit SIMD types (Kokkos::Experimental::simd). High vectorization throughput can be challenging, particularly because the data involved in spectral element assembly is not always organized in a cache-friendly way. To address this, we have implemented a strategy that prefetches the data into cache-optimized scratch views of SIMD types before executing the SIMD operations. Additionally, we have optimized data layouts using custom-defined tiled layouts that improve cache locality. As a result of these optimizations, we have achieved approximately a 2.5x speed-up compared to auto-vectorized implementations.

4. Managing Kokkos Callbacks for Benchmarking, Profiling, and Unit Testing - Maarten Arnst & Romin Tomasetti, University of Liège (20 minutes)
Many Kokkos functions have instrumentation hooks defined within the framework of Kokkos::Tools. These instrumentation hooks allow Kokkos::Tools as well as third-party tracing, profiling and testing tools to register callbacks to monitor and interact with the runtime behavior of the program. In this presentation, we will describe several utilities that we have designed to help manage such callbacks. We have implemented a manager class that registers function objects that listen to such callbacks. We have also implemented several such function objects, including an event recorder, an event counter, and a kernel timer that uses event stream synchronization markers on device backends. We will illustrate these utilities through their use in benchmarking, profiling, and unit testing of a Kokkos-based finite-element code.
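
The manager-plus-listeners pattern described above can be sketched in a few lines. This Python example does not use the Kokkos::Tools interface and does not reproduce the authors' utilities; `CallbackManager`, `EventCounter`, `EventRecorder`, and the event names used with them are hypothetical stand-ins chosen for illustration.

```python
from collections import Counter


class CallbackManager:
    """Dispatches named events to every registered listener (hypothetical)."""

    def __init__(self):
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def unregister(self, listener):
        self._listeners.remove(listener)

    def dispatch(self, event, name):
        # A real tool hook would be invoked by the runtime; here it is called by hand.
        for listener in self._listeners:
            listener(event, name)


class EventCounter:
    """Counts how many times each event type was seen."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, event, name):
        self.counts[event] += 1


class EventRecorder:
    """Keeps the ordered list of (event, kernel name) pairs."""

    def __init__(self):
        self.events = []

    def __call__(self, event, name):
        self.events.append((event, name))
```

In a unit test, one could register an `EventRecorder`, run the code under test, and assert that the expected kernels were launched in the expected order, then unregister the recorder so later tests start clean.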
Speakers

Brice Videau

Computer Scientist, Argonne National Laboratory
Brice is a computer scientist, co-leading the performance engineering team at Argonne Leadership Computing Facility. Brice's research topics include heterogeneous programming models, system software, auto-tuning, code generation, and code transformation.

Maarten Arnst

Associate professor, University of Liège
Associate Professor at University of Liège.

Nils Schild

PhD student, Max Planck Institute for Plasma Physics
After studying physics and working on solvers for sparse eigenvalue problems in quantum mechanics at the University of Bayreuth, he moved to the Max Planck Institute for Plasma Physics in Garching (Germany). During his Ph.D., he started implementing the software BSL6D, a solver for...

Rohit Kakodkar

Research Software Engineer II, Princeton University
Rohit is a Research Software Engineer in Princeton University's Research Computing department. He is focused on rewriting SPECFEM, a spectral element solver designed to simulate wave propagation through heterogeneous media. SPECFEM is extensively used within the computational seismology...

Romin Tomasetti

PhD student, University of Liège
PhD student at University of Liège.
Salon A-C

1:35pm CDT

Algorithms
Thursday May 8, 2025 1:35pm - 3:15pm CDT
1. Gyselalib++: A Portable, Kokkos-Based Library for Exascale Gyrokinetic Simulations - Etienne Malaboeuf, CINES (10 minutes)
The development of fusion energy in magnetic confinement devices relies heavily on simulations of plasma behavior. Gyselalib++ is a new open-source C++ library under active development by a European distributed and multidisciplinary team of physicists, mathematicians, and computer scientists at EPFL, CEA/IRFM, Maison de la Simulation, IPP Garching, and CINES. Gyselalib++ is itself built on top of PDI, DDC and Kokkos and provides mathematical tools for gyrokinetic semi-Lagrangian codes for tokamak plasma simulations. This presentation will introduce the library, its design and the rationale behind its development, and will highlight its key features. It will showcase how the choice of Kokkos made it possible to achieve high performance on modern hardware with performance portability over a wide range of hardware, and will explain the need to introduce DDC to improve development safety. We will discuss feedback from this experience, analyze our successes and the limitations of the approach, especially when it comes to performance, performance portability, and programmability of the code by a highly diverse team in terms of background.

2. Expression Templates with Kokkos for Lattice QCD - Travis Whyte, Jülich Supercomputing Centre (10 minutes)
Lattice quantum chromodynamics (QCD) is a first principles approach to studying the interaction of quarks and gluons. The calculation of observables in lattice QCD requires many different operations between multidimensional arrays of various ranks. In this talk, I will describe an implementation of expression templates using Kokkos that allows for lattice QCD practitioners to simply implement linear algebra operations while avoiding temporaries for views of arbitrary rank. This abstraction has the potential to promote high productivity in the development process. The performance of various benchmarks on different architectures will also be discussed.

3. Bridging Parallel Communication and On-Node Computation with Kokkos - Evan Suggs, Tennessee Technological University (20 minutes)
Although MPI and Kokkos have long been used together, there have been no well-defined methods for integrating them effectively. The only approach has been to pass the underlying Kokkos View buffers to MPI functions.
This causes several major pain points: handling non-contiguous Views, asynchronous operations in both models, and how MPI interacts with Kokkos Profiling. Kokkos Comm is an experimental MPI interface for the Kokkos C++ Performance Portability Programming ecosystem that aims to address these concerns and improve the productivity of Kokkos users.
Currently, Kokkos Comm integrates point-to-point collectives, handling of non-contiguous Views, and Kokkos Tools Profiling. Kokkos Comm also aims to be a springboard for new and improved features that go beyond MPI and Kokkos, allowing Kokkos to work with MPI, stream-triggered MPIs, and other non-MPI communication libraries (e.g., NCCL and RCCL). This presentation will cover the Kokkos Comm API, conversion of existing code, best practices, how Kokkos Comm can help address common issues in Kokkos/MPI, and upcoming additions to Kokkos Comm, such as persistent communication and device-initiated communication.

4. Integration of PETSc, Kokkos Core, and Kernels for Performance Portability in the Age of Accelerators - Junchao Zhang, Argonne National Laboratory (20 minutes)
PETSc, the Portable, Extensible Toolkit for Scientific Computation, provides an extensive suite of scalable parallel solvers for linear and nonlinear equations, ordinary differential equation (ODE) integrators, and optimization algorithms. Widely adopted in both industry and academia, PETSc historically achieved performance portability through the C programming language and the Message Passing Interface (MPI) programming model. It used single-threaded MPI processes for both shared and distributed memory systems. This strategy had served us very well in the microprocessor age. However, the recent proliferation of accelerator-based architectures, particularly graphics processing units (GPUs), has posed new challenges to this performance portability. To address these challenges, we have integrated PETSc with the Kokkos ecosystem, specifically Kokkos-Core and Kokkos-Kernels. In this presentation, we describe our integration approach, highlight our experiences—both effective strategies and encountered challenges—and outline future developments aimed at further enhancing performance portability across evolving computational architectures.

5. Parallel Sweep Algorithms for Cartesian and Honeycomb Grids - Ansar Calloo, CEA (20 minutes)
The linear Boltzmann transport equation (BTE) is the governing equation for expressing the behaviour of neutral particles in a system such as a nuclear reactor. BTE can be solved for the flux of particles using deterministic methods whereby the equation is discretised in the phase space of its fundamental variables. This discrete equation is then usually solved using source iteration. In this talk, we will present how the sweep algorithm, which is based upon a wavefront pattern, has been optimised in the context of SMP for CPU, along with some preliminary results on GPU. The goal is to show how to adapt the sweep algorithm to be efficient on new supercomputer architectures. We will briefly introduce DONUT (Discrete Ordinates NeUtron Transport), a modern C++ miniapp for solving BTE based on the discrete ordinates and discontinuous Galerkin discretisations for Cartesian and honeycomb grids.
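
The wavefront pattern behind a sweep can be illustrated on a small 2-D Cartesian grid. In the Python sketch below, which is not taken from DONUT, a cell (i, j) is assumed to depend on its upstream neighbours (i - 1, j) and (i, j - 1) for a direction pointing toward increasing i and j. All cells with the same i + j then form a wavefront whose members are mutually independent and could be processed in parallel. The function name is illustrative.

```python
def wavefronts(nx, ny):
    """Group the cells of an nx-by-ny grid into wavefronts of constant i + j.

    Assumes each cell depends only on (i - 1, j) and (i, j - 1), so every
    dependency of a cell lies in the previous wavefront.
    """
    fronts = [[] for _ in range(nx + ny - 1)] if nx > 0 and ny > 0 else []
    for i in range(nx):
        for j in range(ny):
            fronts[i + j].append((i, j))
    return fronts


for step, cells in enumerate(wavefronts(3, 3)):
    print(step, cells)
```

On a 3 x 3 grid this yields five wavefronts of sizes 1, 2, 3, 2, and 1, which shows why the available parallelism is limited at the start and end of a sweep.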
Speakers

Ansar Calloo

Research engineer, CEA
Ansar obtained his PhD in deterministic neutron transport at CEA. For the past fifteen years, he has been working on improving simulations for reactor physics applications first at EDF R&D, then CEA. His research interests involve nuclear reactor model, numerical methods to solve...

Etienne Malaboeuf

HPC Engineer, CINES/CEA
I focus on improving the performance of projects related to real-time and high-performance computing, while providing various forms of support to researchers using French supercomputers. I have worked on numerical simulation software in an HPC context, on supercomputers and on game...

Evan Suggs

Staff Researcher, Tennessee Technological University
Evan Drake Suggs is a Research Scientist at Tennessee Technological University in Cookeville, Tennessee. In 2023, Suggs graduated with a Master's degree in Data Science from the University of Tennessee at Chattanooga and presented his thesis work on MPI+Kokkos using the ExaMPI implementation...

Junchao Zhang

Principal Specialist, Research Software Engineering, Argonne National Laboratory
Junchao Zhang is a software developer at Argonne. He currently works on the Portable, Extensible Toolkit for Scientific Computation (PETSc) project. Before joining PETSc, he was an MPICH developer at Argonne and developed the MPI Fortran 2008 binding and MPI tool interface of MPI-3.0...

Travis Whyte

Postdoc, Jülich Supercomputing Centre
I graduated from Baylor University with a Ph.D. in Physics, focusing on algorithmic improvements for lattice QCD simulations. Since then, I have continued to work in the field, focusing on improving iterative solvers, scattering simulations and HPC software development.
Salon A-C

3:40pm CDT

Panel Discussion to be Announced
Thursday May 8, 2025 3:40pm - 5:00pm CDT
Salon A-C
 