May 5-8, 2025
Chicago, IL

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for the event to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to find out more information.

This schedule is automatically displayed in Central Daylight Time (CDT, UTC/GMT -5 hours). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Thursday May 8, 2025 10:45am - 12:05pm CDT
1. Leveraging the C Configuration Space and Tuning Library (CCS) in Kokkos Tools - Brice Videau, Argonne National Laboratory (20 minutes)
Online autotuning of runtimes and applications presents untapped opportunities to increase HPC application performance and efficiency. To exploit this potential, during the Exascale Computing Project (ECP) the autotuning working group at Argonne National Laboratory and the Kokkos team co-designed the Kokkos Tools tuning API and the C Configuration Space and Tuning Library (CCS). The Kokkos Tools tuning API provides a framework to plug tuners into Kokkos and expose tuning regions to them, while the CCS library offers an API both to capture Kokkos configuration spaces and to implement tuners that optimize them. This effort led to the creation of the CCS Kokkos connector, a Kokkos tool that leverages both APIs to offer a baseline tuner for Kokkos regions. In this presentation, we will present the results of this collaboration from the perspective of CCS: the abstractions it offers and how they map to the Kokkos tuning model. We will describe the capabilities of the CCS library and how it fulfills the goal of offering a standard interface that bridges the gap between tuners and applications/runtimes. We will also discuss perspectives and future work around the CCS Kokkos connector.

2. Bottlenecks in High-Dimensional Simulations - Nils Schild, Max Planck Institute for Plasma Physics (20 minutes)
The Vlasov-Maxwell system, which describes the motion of charged particles in a plasma using a particle distribution function, is defined on a 6-D phase space spanned by configuration and velocity coordinates.
On an Eulerian grid with only 32^6 degrees of freedom, the distribution function already requires 8.5 GB of memory. High-resolution simulations can therefore only be executed on large compute clusters.
In this talk, we focus on two aspects of the open-source code BSL6D, which solves a reduced version of the Vlasov-Maxwell system. First, the shared-memory parallelization based on Kokkos applies a stencil algorithm to data that is non-contiguous in memory in order to reduce memory requirements. Second, inter-node communication is a bottleneck because of the large ratio of halo domain to compute domain. Finally, we discuss the advantages of RAII-managed MPI communicators for distributed domains, which simplify the implementation of parallel algorithms with distributed-memory concepts.
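The memory figure and the halo-to-compute ratio mentioned in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not taken from BSL6D: it assumes one double-precision value per grid point and a halo of equal width on both sides of every dimension, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Number of grid points for n points per dimension in d dimensions.
inline std::uint64_t grid_points(std::uint64_t n, unsigned d) {
    std::uint64_t total = 1;
    for (unsigned i = 0; i < d; ++i) total *= n;
    return total;
}

// Bytes needed to store the distribution function,
// assuming one double per grid point (an assumption, not stated in the abstract).
inline std::uint64_t footprint_bytes(std::uint64_t n, unsigned d) {
    return grid_points(n, d) * sizeof(double);
}

// Ratio of halo points to interior points for a local block with n points
// per dimension and a halo of width h on both sides of all d dimensions:
// ((n + 2h)^d - n^d) / n^d.
inline double halo_ratio(std::uint64_t n, std::uint64_t h, unsigned d) {
    assert(n > 0);
    const double scale = static_cast<double>(n + 2 * h) / static_cast<double>(n);
    return std::pow(scale, static_cast<double>(d)) - 1.0;
}
```

Under these assumptions, `footprint_bytes(32, 6)` evaluates to 8,589,934,592 bytes, in line with the figure quoted above, and `halo_ratio(32, 1, 6)` is about 0.44. Even a one-cell halo therefore adds a large fraction of the local data volume in six dimensions, which is why the ratio grows so quickly compared with 2-D or 3-D stencils.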

3. Accelerating SPECFEM++ with Explicit SIMD and Cache-Optimized Layouts - Rohit Kakodkar, Princeton University (20 minutes)
SPECFEM++ is a suite of computational tools based on the spectral element method, used to simulate wave propagation through heterogeneous media. The project aims to unify the legacy SPECFEM codes, three separate Fortran packages (SPECFEM2D, SPECFEM3D, and SPECFEM3D_globe), into a single C++ package. This new package aims to deliver optimal performance across different architectures by leveraging the Kokkos library. In this presentation, I will outline our efforts to enhance CPU performance using explicit SIMD types (Kokkos::Experimental::simd). Achieving high vectorization throughput can be challenging, particularly because the data involved in spectral element assembly is not always organized in a cache-friendly way. To address this, we have implemented a strategy that prefetches the data into cache-optimized scratch views of SIMD types before executing the SIMD operations. Additionally, we have optimized data layouts using custom-defined tiled layouts that improve cache locality. As a result of these optimizations, we have achieved approximately a 2.5x speed-up compared to auto-vectorized implementations.
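To give a sense of what a tiled layout does, here is a minimal, generic index-mapping sketch. It is not the SPECFEM++ layout and does not use Kokkos: the function name `tiled_offset`, the square tile shape, and the row-major ordering of tiles and of elements within a tile are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Linear offset of element (i, j) in a 2-D array stored as square tiles.
// Tiles are stored one after another in row-major order, and the
// tile x tile elements inside each tile are also stored row-major.
// Assumes the number of columns is a multiple of the tile size.
inline std::size_t tiled_offset(std::size_t i, std::size_t j,
                                std::size_t cols, std::size_t tile) {
    assert(tile > 0 && cols % tile == 0);
    const std::size_t tiles_per_row = cols / tile;
    const std::size_t tile_index = (i / tile) * tiles_per_row + (j / tile);
    const std::size_t in_tile = (i % tile) * tile + (j % tile);
    return tile_index * tile * tile + in_tile;
}
```

With 8 columns and 4x4 tiles, element (1, 0) lands at offset 4 rather than 8, so neighbouring rows inside a tile sit close together in memory. That proximity is the kind of cache locality a tiled layout is meant to provide.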

4. Managing Kokkos Callbacks for Benchmarking, Profiling, and Unit Testing - Maarten Arnst & Romin Tomasetti, University of Liège (20 minutes)
Many Kokkos functions have instrumentation hooks defined within the framework of Kokkos::Tools. These instrumentation hooks allow Kokkos::Tools as well as third-party tracing, profiling, and testing tools to register callbacks that monitor and interact with the runtime behavior of the program. In this presentation, we will describe several utilities that we have designed to help manage such callbacks. We have implemented a manager class that registers function objects listening to such callbacks. We have also implemented several such function objects, including an event recorder, an event counter, and a kernel timer that uses event stream synchronization markers on device backends. We will illustrate these utilities through their use in benchmarking, profiling, and unit testing of a Kokkos-based finite-element code.
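The manager-plus-listeners pattern described in this abstract might look roughly like the sketch below. It is a standalone illustration that does not use the Kokkos::Tools API: `Event`, `CallbackManager`, and `EventCounter` are hypothetical names, and the speakers' actual classes may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record; real tool callbacks carry more information.
struct Event {
    std::string kind;   // e.g. "begin_parallel_for"
    std::string label;  // user-provided kernel label
};

// Hypothetical manager: stores listeners and forwards every event to each one.
class CallbackManager {
public:
    using Listener = std::function<void(const Event&)>;

    void register_listener(Listener listener) {
        assert(listener && "listener must be callable");
        listeners_.push_back(std::move(listener));
    }

    void dispatch(const Event& event) const {
        for (const auto& listener : listeners_) listener(event);
    }

private:
    std::vector<Listener> listeners_;
};

// Function object that counts the events of one kind.
class EventCounter {
public:
    explicit EventCounter(std::string kind) : kind_(std::move(kind)) {}

    void operator()(const Event& event) {
        if (event.kind == kind_) ++count_;
    }

    std::size_t count() const { return count_; }

private:
    std::string kind_;
    std::size_t count_ = 0;
};
```

A unit test could register a counter with `manager.register_listener(std::ref(counter))`, run the code under test, and then check `counter.count()` against the expected number of kernel launches. Passing `std::ref` matters here because `std::function` would otherwise store a copy of the counter, and the caller's object would never be updated.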
Speakers
Brice Videau

Computer Scientist, Argonne National Laboratory
Brice is a computer scientist who co-leads the performance engineering team at the Argonne Leadership Computing Facility. His research topics include heterogeneous programming models, system software, auto-tuning, code generation, and code transformation.
Maarten Arnst

Associate Professor, University of Liège
Associate Professor at the University of Liège.
Nils Schild

PhD student, Max Planck Institute for Plasma Physics
After studying physics and working on solvers for sparse eigenvalue problems in quantum mechanics at the University of Bayreuth, he moved to the Max Planck Institute for Plasma Physics in Garching (Germany). During his Ph.D., he started implementing the software BSL6D, a solver for...
Rohit Kakodkar

Research Software Engineer II, Princeton University
Rohit is a Research Software Engineer in Princeton University's Research Computing department. He is focused on rewriting SPECFEM, a spectral element solver designed to simulate wave propagation through heterogeneous media. SPECFEM is extensively used within the computational seismology...
Romin Tomasetti

PhD student, University of Liège
PhD student at the University of Liège.
Salon A-C