HPSF Conference 2025: Full Schedule

May 5-8, 2025
Chicago, IL

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for the event to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to find out more information.

All times in this schedule are shown in Central Daylight Time (CDT, UTC-5).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

9:00am CDT

Welcome and Overview - Damien Lebrun-Grandie, Oak Ridge National Laboratory

Wednesday May 7, 2025 9:00am - 9:10am CDT

Salon A-C

Speakers

Damien Lebrun-Grandie

Senior Computational Scientist, ORNL

Damien Lebrun-Grandié is a Computational Scientist at Oak Ridge National Laboratory and co-leads the Kokkos performance portability project. He also serves on the ISO C++ Standards Committee, leveraging his expertise in high-performance computing to advance scientific applicatio...

Kokkos User Group Meeting, Updates from the Kokkos Team

9:10am CDT

Update on the Ecosystem and Community - Christian Trott, Sandia National Laboratories

Wednesday May 7, 2025 9:10am - 9:30am CDT

Salon A-C

Speakers

Christian Trott

Distinguished Member of Technical Staff, Sandia National Laboratories

Christian Trott is a High Performance Computing expert at Sandia National Laboratories, where he co-leads the Kokkos core team, developing performance portability solutions for engineering and science applications. He heads Sandia's delegation to the ISO C++ committee and is a principal...

Kokkos User Group Meeting, Updates from the Kokkos Team

9:30am CDT

Kokkos Core Update - Damien Lebrun-Grandie, Oak Ridge National Laboratory

Wednesday May 7, 2025 9:30am - 10:00am CDT

Salon A-C

Speakers

Damien Lebrun-Grandie

Senior Computational Scientist, ORNL

Kokkos User Group Meeting, Updates from the Kokkos Team

10:00am CDT

Kokkos-Kernels Update - Luc Berger-Vergiat, Sandia National Laboratories

Wednesday May 7, 2025 10:00am - 10:20am CDT

Salon A-C

Speakers

Luc Berger-Vergiat

Research Scientist, Sandia National Laboratories

Kokkos User Group Meeting, Updates from the Kokkos Team

10:45am CDT

FleCSI Applications - Ben Bergen & Hyun Lim, Los Alamos National Laboratory

Wednesday May 7, 2025 10:45am - 10:55am CDT

Salon A-C

The Flexible Computational Science Infrastructure (FleCSI) programming system provides a clutter-free environment that allows developers to focus on the arithmetic operations of their methods without the distraction of computer science details that are often visible in legacy simulation codes. To this end, FleCSI provides light-weight wrappers over the raw Kokkos interface that resemble native C++ keywords, e.g., forall. Using this design philosophy, we have been able to evolve our support to cover various Kokkos policies and execution spaces. HARD is a FleCSI-based application for radiation hydrodynamics that is performance portable across a variety of systems, e.g., El Capitan, Venado, and Crossroads, and inherits FleCSI’s support for multiple distributed-memory and tasking backends, e.g., Legion, HPX, and MPI. In this talk, we will demonstrate the basic data-parallel interface with implementation and usage examples. We will also present results for several test problems in inertial confinement fusion with comparisons between different backends and performance assessments in different heterogeneous computing environments.

Speakers

Ben Bergen

Scientist, Los Alamos National Laboratory

Ben Bergen is a computational scientist working on runtime systems, data structures, and applications development.

Hyun Lim

Scientist, Los Alamos National Laboratory

Hyun Lim is a staff scientist in CCS-7. Hyun has a background in theoretical and computational astrophysics, gravitational physics, and numerical methods.

Kokkos User Group Meeting, Kokkos in Applications

10:55am CDT

DDC: A Performance Portable Library Abstracting Computation on Discrete Domains - Thomas Padioleau, CEA Paris-Saclay

Wednesday May 7, 2025 10:55am - 11:05am CDT

Salon A-C

The Discrete Domain Computation (DDC) library is a modern C++ library that aims to offer the C++ world an equivalent of Python's xarray.DataArray. The Xarray library introduces labeled multidimensional arrays, enabling more intuitive data manipulation by associating dimensions with user-provided names rather than relying on positional indexing. This approach simplifies indexing, slicing, and broadcasting while reducing common indexing errors. Inspired by these ideas, DDC extends the Kokkos library, providing zero-overhead dimension labeling for multidimensional arrays along with performance-portable multidimensional algorithms. This labeling mechanism enables compile-time detection of indexing and slicing errors, ensuring safer and more expressive array operations in C++. In this presentation, we will introduce the core concepts of DDC and demonstrate its usage through a simple example that highlights its key features.
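
To make the labeled-dimension idea concrete, here is a minimal Python sketch of indexing by dimension name instead of by position. It is not the DDC or Xarray API: the `LabeledArray` class and its `get`/`set` methods are hypothetical names invented for illustration, and the checks run at run time, whereas the abstract describes DDC performing them at compile time in C++.

```python
from math import prod


class LabeledArray:
    """Dense array whose elements are addressed by dimension name (illustrative only)."""

    def __init__(self, dims, shape, fill=0.0):
        if len(dims) != len(shape) or len(set(dims)) != len(dims):
            raise ValueError("dims must be unique and match shape in length")
        self.dims = tuple(dims)
        self.shape = tuple(shape)
        self.data = [fill] * prod(self.shape)  # row-major flat storage

    def _offset(self, index):
        # Reject missing, extra, or misspelled dimension names up front.
        if set(index) != set(self.dims):
            raise KeyError(f"expected dimensions {self.dims}, got {tuple(index)}")
        offset = 0
        for name, extent in zip(self.dims, self.shape):
            i = index[name]
            if not 0 <= i < extent:
                raise IndexError(f"{name}={i} is outside [0, {extent})")
            offset = offset * extent + i
        return offset

    def get(self, **index):
        return self.data[self._offset(index)]

    def set(self, value, **index):
        self.data[self._offset(index)] = value


temperature = LabeledArray(dims=("x", "y"), shape=(2, 3))
temperature.set(5.0, y=2, x=1)  # argument order does not matter
```

Because every access names its dimensions, swapping two indices by accident is impossible, and a misspelled or missing dimension raises an error instead of silently reading the wrong element.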

Speakers

Thomas Padioleau

HPC research engineer, CEA Paris-Saclay

Dr. Thomas Padioleau is a CEA Engineer-Researcher at Maison de la Simulation. He leads the DDC project and also works on Voice++.

Kokkos User Group Meeting, Kokkos in Applications

11:05am CDT

TChem-atm - A Performance Portable Chemistry Solver for Atmospheric Chemistry - Oscar Diaz-Ibarra, Sandia National Laboratories

Wednesday May 7, 2025 11:05am - 11:25am CDT

Salon A-C

TChem-atm (https://github.com/PCLAeroParams/TChem-atm) is a performance-portable software library designed to support atmospheric chemistry applications, specifically computing source term Jacobian matrices. The software utilizes Kokkos as its portability layer, preparing it for next-generation computing architectures. The software interface employs a hierarchical parallelism design to leverage the massive parallelism available on modern computing platforms, including model parallelism, batch parallelism, and nested parallelism for each problem instance. Additionally, TChem-atm is designed to be coupled with third-party libraries that may be used to advance the state of gas and particle species over time, notably interfacing with the Tines, Kokkos-kernels, and Sundials libraries. We have tested TChem-atm in two scenarios: using a typical reaction mechanism in atmospheric science and an example involving multiple aerosol particles. This testing framework allows us to evaluate our code by varying the number of evaluations and the size of the source term (right-hand side). Finally, we report performance measurements using the CUDA, HIP, and OpenMP back ends.

Speakers

Oscar Diaz-Ibarra

Senior Member of Technical Staff, Sandia National Laboratories

Oscar is a senior member of the technical staff at Sandia National Laboratories, specializing in high-performance applications for atmospheric chemistry using Kokkos and modern C++. He holds a Ph.D. in chemical engineering from the University of Utah and has over 7 years of experience...

Kokkos User Group Meeting, Kokkos in Applications

11:25am CDT

GPU Porting of the TRUST CFD Platform with Kokkos - Rémi Bourgeois, French Atomic Energy Commission (CEA)

Wednesday May 7, 2025 11:25am - 11:45am CDT

Salon A-C

TRUST is a High Performance Computing thermohydraulic platform for Computational Fluid Dynamics developed at the French Atomic Energy Commission (CEA). This software is designed for massively parallel (MPI) simulations of conduction, incompressible single-phase, and Low Mach Number (LMN) flows with a Weakly-Compressible multi-species solver and compressible multi-phase flows. It is used as the basis for many specialised applications in the nuclear and new energy fields across CEA. The code is being progressively ported to support GPU acceleration (Nvidia/AMD/Intel) thanks to the Kokkos library, as it is one of the demonstrators of the CExA project. In this talk we will go over our experience using Kokkos to progressively port our large code base. We will cover the GPU features enabled so far and their performance. We will also mention some of the difficulties we encountered, as well as the strategies we had to adopt that sometimes differ from standard good practices due to the specificity of our application.

Speakers

Rémi Bourgeois

Researcher / Engineer, CEA Saclay

Rémi Bourgeois is a French researcher/engineer at CEA Saclay, specializing in HPC and numerical analysis for the TRUST platform, a massively parallel thermo-hydraulic simulation tool. He earned his PhD at CEA, focusing on MHD convection, developing finite-volume methods and GPU-based...

Kokkos User Group Meeting, Kokkos in Applications

11:45am CDT

Omega: Towards a Performance-portable Ocean Model using Kokkos - Maciej Waruszewski, Sandia National Laboratories

Wednesday May 7, 2025 11:45am - 12:05pm CDT

Salon A-C

High-resolution simulations of the Earth system require resources available only on the world's largest supercomputers, which are increasingly based on GPUs. However, CPU-based systems are still frequently used to conduct simulations at coarse resolutions. To be able to take advantage of all compute platforms, we are developing Omega: the Ocean Model for E3SM Global Applications, a new ocean model written in C++ using Kokkos for performance portability. Omega will replace MPAS-Ocean to become the new ocean component of the DOE’s Energy Exascale Earth System Model (E3SM). Omega is an unstructured mesh ocean model based on the same finite-volume scheme as the current ocean component. Work on Omega began in 2023. Currently, Omega is a layered shallow water model with passive tracers. While still simple, this initial version can run on realistic size meshes and contains computational kernels representative of the full model horizontal numerics. After briefly describing Omega, this talk will go into our experiences with Kokkos and present initial performance results from a variety of compute platforms.

Speakers

Maciej Waruszewski

Computer Scientist, Sandia National Laboratories

Maciej is a computer scientist at Sandia National Laboratories. He is one of the developers of the DOE’s Energy Exascale Earth System Model (E3SM). He holds a PhD in atmospheric physics from the University of Warsaw.

Kokkos User Group Meeting, Kokkos in Applications

1:35pm CDT

A Brief Overview of LANL's use of Kokkos - Daniel Holladay, Los Alamos National Laboratory

Wednesday May 7, 2025 1:35pm - 1:55pm CDT

Salon A-C

Since the commissioning of the first petascale machine, Roadrunner, in 2009 at the Los Alamos National Laboratory (LANL), the ability for physics codes at LANL to take advantage of accelerators has provided utility and productivity improvements for code users. The ability to take advantage of an accelerator, and more specifically general-purpose graphics processing units (GPGPUs), will quickly move from a productivity enhancement to an absolute necessity, as more than 90% of the compute capability of the El Capitan supercomputer at Lawrence Livermore National Laboratory (LLNL) will only be accessed through effective use of its GPGPUs, a task which has traditionally been accomplished with vendor-specific software extensions such as CUDA or HIP. Many projects with code bases ranging from large and established FORTRAN codes to new C++-based projects have made the decision to use Kokkos as the tool that will enable effective use of LLNL's El Capitan compute resources as well as future machines which could likely benefit from Kokkos's capabilities. In this talk I will give an overview of several physics code projects at LANL and their usage of Kokkos.

Speakers

Daniel Holladay

Computational Physicist, Los Alamos National Laboratory

Daniel Holladay is the deputy project leader for computer science for the project that maintains the FLAG Lagrangian multi-physics code at the Los Alamos National Laboratory (LANL). He received a Ph.D. in Nuclear Engineering from Texas A&M University in 2018 while working as a LANL...

Kokkos User Group Meeting, Adopting Kokkos

1:55pm CDT

Enhancing Fortran Code for Operational Weather Forecasting with Kokkos: Results and Lessons Learned - Timothy Sliwinski, Cooperative Institute for Research in the Atmosphere (CIRA)

Wednesday May 7, 2025 1:55pm - 2:15pm CDT

Salon A-C

At NOAA, much of the code for numerical weather prediction (NWP) and operational weather forecasting is built upon Fortran, into which decades of scientific research knowledge and expertise have been invested. Therefore, moving away from Fortran and potentially breaking what has been a highly reliable system for many years is a significant challenge.
To demonstrate new methods to modernize NOAA’s NWP models, Kokkos was selected due to its ability to work across multiple GPUs and CPUs with a single source code and the presence of the Fortran Language Compatibility Layer (FLCL), easing development of the interface between Fortran and C++ Kokkos kernels. As a first step, the YSU Planetary Boundary Layer (PBL) scheme was chosen as the target and a prototype with Kokkos was developed, tested, and performance benchmarked. In this presentation, we report the performance of this new Kokkos-enhanced Fortran code on CPU and an Nvidia GPU, the challenges of the C/Fortran interface, potential future prospects for the use of Kokkos at NOAA, and overall lessons learned from this project for anyone else interested in using Kokkos with existing Fortran source codes.

Speakers

Timothy Sliwinski

HPC Software Developer, Cooperative Institute for Research in the Atmosphere (CIRA)

Dr. Timothy Sliwinski is an atmospheric scientist with the Cooperative Institute for Research in the Atmosphere at Colorado State University. Working directly with NOAA Global System Laboratory federal scientists in the Scientific Computing Branch, Dr. Sliwinski has worked on multiple...

Kokkos User Group Meeting, Adopting Kokkos

2:15pm CDT

Using Umpire's Memory Management Capabilities with Kokkos - Kristi Belcher, LLNL

Wednesday May 7, 2025 2:15pm - 2:35pm CDT

Salon A-C

Umpire is an open-source data and memory management library created at Lawrence Livermore National Laboratory (LLNL). Although Umpire is part of the RAJA Portability Suite, it was made to be modular and can therefore be used with Kokkos and other performance portability abstractions. Umpire provides memory pools, which avoid expensive calls to the underlying device-specific API and make allocations, large or small, performant in HPC environments. Umpire also provides numerous types of memory resources and allocators (e.g., Device, Host, Unified Memory, and IPC Shared Memory). In this talk, I will discuss key Umpire features and capabilities and showcase a Kokkos example with Umpire.
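
The benefit of pooling can be shown with a small sketch. The Python below is not Umpire's API: `CountingBackend` and `SimplePool` are hypothetical stand-ins, with a `bytearray` playing the role of a device allocation and a call counter standing in for the cost of the device-specific API.

```python
class CountingBackend:
    """Stand-in for an expensive device allocator; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def allocate(self, nbytes):
        self.calls += 1
        return bytearray(nbytes)


class SimplePool:
    """Caches released buffers by size so repeat requests skip the backend."""

    def __init__(self, backend):
        self.backend = backend
        self.free_blocks = {}  # size in bytes -> list of idle buffers

    def allocate(self, nbytes):
        idle = self.free_blocks.get(nbytes)
        if idle:
            return idle.pop()  # reuse an idle buffer: no backend call
        return self.backend.allocate(nbytes)

    def release(self, buffer):
        self.free_blocks.setdefault(len(buffer), []).append(buffer)


backend = CountingBackend()
pool = SimplePool(backend)
for _ in range(100):
    scratch = pool.allocate(1024)  # e.g. a temporary inside a time-step loop
    pool.release(scratch)
```

After the loop the backend has been called only once, because the other 99 requests are served from the cached buffer. A production pool would also split and coalesce blocks; this version only reuses exact-size matches.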

Speakers

Kristi Belcher

Software Developer, Lawrence Livermore National Laboratory

Kristi is a Software Developer at Lawrence Livermore National Laboratory working primarily on Umpire, an open source library that supports parallel data and memory management on HPC platforms, and MARBL, a large multi-physics simulation code. Kristi also works on the RADIUSS project...

Kokkos User Group Meeting, Adopting Kokkos

2:35pm CDT

Early Experiences Using Kokkos for Multi-Resolution Analysis - Joseph Schuchart, Stony Brook University

Wednesday May 7, 2025 2:35pm - 2:55pm CDT

Salon A-C

MADNESS is a framework for multi-resolution analysis with application in quantum chemistry. In this talk, we will present some early experiences in using Kokkos in a port of MADNESS to the TTG data-flow programming model, which includes both a restructuring of the existing program flow and a port to accelerators.

Speakers

Joseph Schuchart

Senior Research Scientist, Stony Brook University

Joseph Schuchart is a Senior Research Scientist at the Institute for Advanced Computational Science at Stony Brook University. He has been working on distributed data flow programming models and communication models, currently working at the intersection with computational chemistry...

Kokkos User Group Meeting, Adopting Kokkos

3:40pm CDT

Experience Porting a Scientific Code from YAKL to Kokkos - James Foucar, Sandia National Labs

Wednesday May 7, 2025 3:40pm - 3:50pm CDT

Salon A-C

The DOE climate code E3SM recently ported a medium-sized scientific code, RRTMGP (which computes radiative fluxes in planetary atmospheres), from a kernel launcher called YAKL to Kokkos. We'd like to share tips and pain points from this effort, particularly the struggle to reach performance parity with YAKL. We found that a 1:1 port (the YAKL API is very similar to Kokkos) was not nearly sufficient to achieve good performance. The main issues were how to allocate temporary views and how to deal with MDRangePolicy.

Speakers

James Foucar

R&D S&E, Computer Science, Sandia National Labs

I've been a software developer for Sandia for nearly 20 years. For the last 10 years, I've been doing software-focussed tasks for E3SM (DOE climate model).

Kokkos User Group Meeting, Kokkos Lightning Talks

3:50pm CDT

Benchmarking Lattice QCD Staggered Fermion Kernel Written in Kokkos - Simon Schlepphorst, Forschungszentrum Juelich GmbH

Wednesday May 7, 2025 3:50pm - 4:00pm CDT

Salon A-C

Lattice quantum chromodynamics (QCD) is a numerical approach to studying the interactions of quarks and gluons, where the fundamental equations governing their interactions are discretized on a four-dimensional spacetime lattice. One of the most costly computations is the inversion of the lattice Dirac operator, a large sparse matrix. Calculating this inversion with iterative solvers leads to many applications of that operator. This study builds on previous work where we implemented the staggered fermion Dirac operator as a benchmark in Kokkos. We investigate the effects of the tiling size in combination with the use of a 4D MDRangePolicy and 7D Views.

Speakers

Simon Schlepphorst

Research Software Engineer, Jülich Supercomputing Centre

After graduating with a Master's degree in physics from the University of Bonn, Simon became a Research Software Engineer at the Juelich Supercomputing Centre developing Lattice QCD codes for current and upcoming accelerators.

Kokkos User Group Meeting, Kokkos Lightning Talks

4:00pm CDT

Leveraging Liaisons in Your Network for Software Sustainability - Elaine M. Raybourn, Sandia National Laboratories

Wednesday May 7, 2025 4:00pm - 4:10pm CDT

Salon A-C

Open source software project sustainability is a sociotechnical endeavor that often extends beyond the efforts of individual projects. HPSF and the Linux Foundation offer rich resources of expertise across communities in industry, academia, and agencies. Leveraging this collective knowledge and experience is vital to enhance project practices, especially in early identification of challenges and potential issues. This lightning talk explores the value of leveraging liaisons, key individuals who actively participate in cross-team networks, to accelerate project sustainability. Liaisons can bridge gaps, share tacit knowledge, and incentivize collaborative efforts across communities to assist in breaking down silos. The value of leveraging liaisons was identified during the DOE Exascale Computing Project to foster strategic project alignment and outreach. Whether in a small team or a larger network of teams of teams, identifying liaisons early on can foster trust and transparency both within and across teams.

Speakers

Elaine M. Raybourn

Principal Member of the Technical Staff, Sandia National Laboratories

Elaine M. Raybourn is a social scientist at Sandia National Laboratories. She has worked in the UK (British Telecom), Germany (Fraunhofer FIT), and France (INRIA) as a Fellow of the European Research Consortium in Informatics and Mathematics (ERCIM). She supports the DOE Office of...

Kokkos User Group Meeting, Kokkos Lightning Talks

4:10pm CDT

Vertex-CFD: A Multi-Physics Solver for Fusion Applications - Marc Olivier Delchini & Daniel Arndt, Oak Ridge National Laboratory

Wednesday May 7, 2025 4:10pm - 4:20pm CDT

Salon A-C

In this talk we will introduce Vertex-CFD, a multiphysics solver that is being developed in response to needs by Oak Ridge National Laboratory (ORNL) to have accurate simulation software for use in modeling of a fusion blanket problem. Vertex-CFD is built upon Trilinos and Kokkos libraries for compatibility with CPU and GPU platforms. It is designed to generate high-fidelity solutions of multiphysics problems in complex geometries by leveraging state-of-the-art computing methods and technologies. We will describe how we leverage Kokkos and Trilinos to solve the governing equations by employing a finite element method and high-order implicit temporal integrators.

Speakers

Daniel Arndt

Large-Scale Computational Scientist, Oak Ridge National Laboratory

Daniel Arndt is a computational scientist at Oak Ridge National Laboratory. He is also a mathematician by training, specializing in finite element simulations. His research focuses on supporting new backends in Kokkos.

Marco Delchini

CFD developer and analyst, Oak Ridge National Laboratory

CFD analyst and developer at Oak Ridge National Laboratory for 10 years. Obtained his PhD in nuclear engineering from Texas A&M University.

Kokkos User Group Meeting, Kokkos Lightning Talks

4:20pm CDT

Writing Better Kokkos Code with an AI Assistant - Chris Siefert, Sandia National Laboratories

Wednesday May 7, 2025 4:20pm - 4:30pm CDT

Salon A-C

This talk will highlight recent results on benchmarking the ability of open-weight large language models (LLMs) to produce Kokkos code, look briefly at how Sandia is deploying our Kokkos AI assistant, and finally identify areas for community collaboration.

Speakers

Chris Siefert

R&D Staff, Sandia National Laboratories

Kokkos User Group Meeting, Kokkos Lightning Talks

4:30pm CDT

Performance-Portable Spectral Ewald Summation with PyKokkos - Gabriel K Kosmacher, Oden Institute, The University of Texas at Austin

Wednesday May 7, 2025 4:30pm - 4:40pm CDT

Salon A-C

We present a performance portable implementation of the Spectral Ewald method, employing shared memory and streaming parallelism to rapidly evaluate periodic two-body potentials in Stokes flow. The method splits dense particle evaluation into near-field and far-field components, where the near-field is singular and the far-field decays rapidly in Fourier space. Far-field interactions resemble a Nonuniform Fast Fourier Transform: source potentials are interpolated onto a uniform grid (p2g), an ndFFT is applied, Fourier potentials are scaled, an ndIFFT is applied, and the potentials are interpolated back (g2p). The p2g, g2p, and near-field (p2p) interactions use Kokkos hierarchical parallelism with scratch-pad memory and thread-vector range reductions.

Speakers

Gabriel K Kosmacher

Graduate Student, Oden Institute, The University of Texas at Austin

Gabriel is a PhD student at the Oden Institute for Computational Engineering & Sciences, where he is advised by George Biros. His research interests lie at the intersection of numerical analysis and scientific computing, and he is particularly interested in fast numerical methods for...

Kokkos User Group Meeting, Kokkos Lightning Talks

4:40pm CDT

Empowering NSM Supercomputers with Kokkos for Scalable HPC - Harsha Ugave & Samir Shaikh, Centre for Development of Advanced Computing (C-DAC)

Wednesday May 7, 2025 4:40pm - 4:50pm CDT

Salon A-C

Kokkos is transforming how high-performance applications run on National Supercomputing Mission (NSM) systems. With NSM deploying a mix of CPUs, GPUs, and other accelerators, ensuring software runs efficiently across all these platforms can be challenging. Kokkos simplifies this by providing a single, flexible programming model that adapts to different hardware without requiring major code changes. It supports multiple backends like CUDA, HIP, SYCL, and OpenMP, making it easier for developers to write performance-portable applications. For NSM’s large-scale supercomputers, Kokkos ensures better performance and scalability, allowing applications to make full use of processors, GPUs, and memory hierarchies. It also optimizes energy efficiency by improving memory access and reducing unnecessary data movement, helping to make supercomputing more sustainable. Since Kokkos is open-source and backed by an active community, it keeps up with emerging technologies, ensuring seamless adoption of next-generation NSM systems and preparing them for the future of exascale computing.

Speakers

Samir Shaikh

Scientist, Centre for Development of Advanced Computing (C-DAC), Pune, India

Samir Shaikh is an HPC specialist at C-DAC, Pune, optimizing large-scale workloads, parallel computing, and system architecture. As a Scientist C, he enhances HPC performance for AI/ML, scientific computing, and NSM supercomputers. An IIT Guwahati M.Tech graduate, he has contributed...

Harsha Ugave

HPC Project Engineer, Centre for Development of Advanced Computing

Harsha Ugave is an HPC Engineer at C-DAC Pune, specializing in performance portability, parallel computing, and system optimization. She plays a key role in deploying and tuning HPC applications under the National Supercomputing Mission (NSM). Her work ensures efficient execution...

Kokkos User Group Meeting, Kokkos Lightning Talks

4:50pm CDT

Real-Time Performance Characterization of the ADIOS2 Library When Kokkos Is Enabled - Ana Gainaru, Oak Ridge National Laboratory

Wednesday May 7, 2025 4:50pm - 5:00pm CDT

Salon A-C

Modern performance analysis tools are increasingly capable of capturing a high volume of metrics at ever-finer granularity. This abundance of information presents an opportunity to move beyond post-mortem analysis and leverage data streaming for real-time performance monitoring and decision-making. By streaming performance data, applications can provide immediate feedback, enabling dynamic adjustments and optimizations during execution. Furthermore, this streamed data can be directed to individual scientist workstations, facilitating on-the-fly health checks and user-driven interventions to steer the application's behavior. We will demonstrate the practical application of these concepts within the ADIOS2 library, showcasing how data streaming enables detailed monitoring and analysis of an HPC application during large-scale runs.

Speakers

Ana Gainaru

Computer Scientist, Oak Ridge National Laboratory

Ana Gainaru is a computer scientist in the CSM division at Oak Ridge National Laboratory, working on performance optimization for large scale scientific applications and on profiling, managing, and analyzing large-scale data. She received her PhD from the University of Illinois at...

Kokkos User Group Meeting, Kokkos Lightning Talks

5:00pm CDT

Cabana: Particles, Structured Grids, and Extensions to Unstructured with Kokkos - Sam Reeve, ORNL

Wednesday May 7, 2025 5:00pm - 5:10pm CDT

Salon A-C

We discuss updates to Cabana, a Kokkos+MPI library for building particle applications. Cabana was created through the U.S. Department of Energy Exascale Computing Project to enable particle simulation across methods on current and future exascale supercomputers. Cabana includes particle and structured grid parallelism, data structures, algorithms, communication, and interfaces to additional libraries, all extending and working alongside Kokkos. We focus in particular on recent efforts to integrate Cabana particles within Trilinos unstructured grids for broader support of scientific applications. We will highlight further recent Cabana development, performance and portability, and application-level demonstrations.

Speakers

Sam Reeve

Staff Scientist, Oak Ridge National Laboratory

Sam Reeve is a staff scientist at ORNL, working at the intersection of materials and computational science. Current focuses include performance portability and software development for physics applications and simulation of mesoscale material phenomena. He leads the development of...

Kokkos User Group Meeting, Kokkos Lightning Talks

9:00am CDT

kokkos-fft Updates - Yuuichi Asahi, CEA

Thursday May 8, 2025 9:00am - 9:10am CDT

Salon A-C

kokkos-fft implements local interfaces between Kokkos and de facto standard FFT libraries, including fftw, cufft, hipfft (rocfft), and oneMKL. We aim to provide numpy.fft-like interfaces adapted for Kokkos. The guiding principle is "As easy as numpy, as fast as vendor libraries". In the talk, we will introduce the basic APIs and typical use cases. We will also present future development plans.

Speakers

Yuuichi Asahi

Research Scientist, CEA

His recent interests are HPC and AI with NVIDIA, AMD and Intel GPUs. He has a rich experience in GPU programming models including CUDA, HIP, SYCL, Kokkos, OpenMP, OpenACC, thrust, stdpar, and senders/receivers. For exascale computing, he is highly interested in improving performance...

Kokkos User Group Meeting, Broader Kokkos Ecosystem

9:10am CDT

Fortran Porting Wish List for Kokkos - Matthew Norman, Oak Ridge National Laboratory

Thursday May 8, 2025 9:10am - 9:20am CDT

Salon A-C

This presentation covers the beginnings of the Yet Another Kernel Launcher (YAKL) C++ portability library, its evolution alongside Kokkos, the use of Kokkos in its current form, and the remaining issues before it can be retired in favor of Kokkos. The primary outstanding issues are the inclusion of arbitrary lower bounds for Fortran-like View behavior and the ability to use an underlying pool allocator for Views for cheap, frequent device allocation and deallocation, so that Views can be locally created and destroyed only where needed rather than existing for the global lifetime of simulations. This may improve readability and reduce the memory high water mark in simulations. A few performance-related issues will be covered as well, mainly limited to MDRangePolicy and parallel_for register usage.
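
As an illustration of the first wish-list item, the sketch below shows what Fortran-like arbitrary lower bounds mean for indexing. It is plain Python, not Kokkos or YAKL code: `OffsetArray` is a hypothetical name, and the example covers only one dimension.

```python
class OffsetArray:
    """1-D array indexed over [lower, upper] inclusive, like Fortran's a(lower:upper)."""

    def __init__(self, lower, upper, fill=0.0):
        if upper < lower:
            raise ValueError("upper bound must not be below lower bound")
        self.lower = lower
        self.upper = upper
        self.data = [fill] * (upper - lower + 1)

    def _position(self, i):
        if not self.lower <= i <= self.upper:
            raise IndexError(f"index {i} is outside [{self.lower}, {self.upper}]")
        return i - self.lower  # shift to zero-based storage

    def __getitem__(self, i):
        return self.data[self._position(i)]

    def __setitem__(self, i, value):
        self.data[self._position(i)] = value


# A field with two halo cells on each side of an 8-cell interior: indices -2..9.
field = OffsetArray(-2, 9)
field[-2] = 1.0  # leftmost halo cell
field[9] = 2.0   # rightmost halo cell
```

With the bounds carried by the array, ported loops can keep their original index ranges (here interior cells 0 to 7 and halos on either side) instead of rewriting every index expression with a manual offset.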

Speakers

Matthew Norman

Climate Scientist, Oak Ridge National Laboratory

Matt Norman leads the Advanced Computing for Life Sciences and Engineering group in the Oak Ridge Leadership Computing Facility (OLCF). He works with weather and climate simulation, urban and wind turbine simulation, PDE discretizations for the Navier-Stokes Equations, GPU acceleration...

Kokkos User Group Meeting, Broader Kokkos Ecosystem

9:20am CDT

Custom Layout and Tiling for Multi-Dimensional Data – Cedric Chevalier, CEA

Thursday May 8, 2025 9:20am - 9:30am CDT

Salon A-C

Performance optimizations for exascale HPC applications primarily rely on fine-tuning implementations, requiring comprehensive knowledge of heterogeneous hardware architectures that domain experts often lack. One of Kokkos' biggest successes is tying the memory layout of multi-dimensional arrays to the execution backend. It allows the exploitation of coalescence or cache, depending on the hardware. Here, we propose to go further and design custom tiled layouts that are generic for C++23's std::mdspan. Instead of running tile algorithms on flat data, like Kokkos' mdrange, we want to explore how running flat algorithms on tiled data performs. On CPU, the first experimental results with std::mdspan on a naive dense matrix multiplication demonstrate that, by replacing standard layouts with our proposed solution, we achieve an average speedup of over 2.2x, with peak performance improvements of up to 7.8x. Then, we will discuss how external indexing can improve efficiency. We will present how to exploit it with Kokkos' mdrange algorithm, and how it can behave on GPU.
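To make the idea of a tiled layout concrete, here is a minimal sketch of the index arithmetic such a layout could use: elements of one tile are stored contiguously, and tiles follow each other in row-major order. It is plain Python, not the std::mdspan layout proposed in the talk, and the function name, parameter names, and the requirement that the extents divide evenly into tiles are all assumptions made for this example.

```python
def tiled_offset(i, j, n_rows, n_cols, tile_rows, tile_cols):
    """Map element (i, j) of an n_rows x n_cols matrix to a flat offset.

    Assumes (for simplicity) that the extents are multiples of the tile shape.
    """
    if n_rows % tile_rows or n_cols % tile_cols:
        raise ValueError("extents must be multiples of the tile shape")
    if not (0 <= i < n_rows and 0 <= j < n_cols):
        raise IndexError("index out of range")

    tiles_per_row = n_cols // tile_cols
    # Which tile the element falls in, counting tiles in row-major order.
    tile_index = (i // tile_rows) * tiles_per_row + (j // tile_cols)
    # Position of the element inside its tile, also row-major.
    within_tile = (i % tile_rows) * tile_cols + (j % tile_cols)
    return tile_index * tile_rows * tile_cols + within_tile


# A 4 x 6 matrix stored as 2 x 3 tiles: every element gets a unique offset.
offsets = [tiled_offset(i, j, 4, 6, 2, 3) for i in range(4) for j in range(6)]
```

In this sketch, element (0, 3) lands at offset 6, the start of the second tile, whereas a plain row-major layout would place it at offset 3. A flat algorithm that walks the matrix through this mapping touches one tile's worth of contiguous memory at a time.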

Speakers

Cedric Chevalier

Research Scientist, CEA

Cédric Chevalier is a research scientist at CEA in France. He is interested in developing libraries for HPC simulation codes, particularly in Linear Algebra and Mesh/Graph partitioning. His work at CEA is led by providing practical ways to exploit newer hardware, use new programming... Read More →

Thursday May 8, 2025 9:20am - 9:30am CDT
Salon A-C

Kokkos User Group Meeting, Broader Kokkos Ecosystem

9:30am CDT

Runtime Auto-Tuning for Kokkos Applications with APEX – Vivek Kale, Sandia National Laboratories

Thursday May 8, 2025 9:30am - 9:40am CDT

Salon A-C

Traditional GPU programming with libraries like CUDA or HIP requires tuning parameters exposed to the user, for example block sizes or number of teams. Kokkos also exposes portable parameters to the Kokkos user. How can Kokkos application programmers easily tune these Kokkos parameters for their application’s deployment when using any given Kokkos backend, without incurring large overheads? In particular, how do we ensure the tuning itself is portable across platforms? We propose using online, i.e., runtime, autotuning, utilizing the APEX Kokkos Tools connector to tune exposed parameters. Specifically, we discuss the Kokkos Tools Tuning Interface, tuning contexts, variable definition, the APEX runtime auto-tuning library utilizing Kokkos Tools, and distributed Kokkos auto-tuning. Applying our auto-tuning approaches to Kokkos sample kernels on Perlmutter and Frontier, we have obtained promising performance results. These results suggest that Kokkos online auto-tuning is beneficial for production applications. We invite Kokkos users to try these features and Kokkos developers to contribute.
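The general shape of online tuning can be sketched in a few lines: candidate values are tried during normal execution, their measured costs are accumulated, and the cheapest one is kept. The sketch below is plain Python and does not use APEX or the Kokkos Tools Tuning Interface; `tune_online`, `fake_kernel`, and the cost model are hypothetical stand-ins, and an exhaustive sweep is used here only because it is the simplest search strategy to show.

```python
def tune_online(candidates, run_kernel, rounds_per_candidate=2):
    """Try each candidate during normal execution, then return the cheapest.

    `run_kernel(value)` is expected to do the real work with the given
    parameter value and return its measured cost (for example, seconds).
    """
    totals = {value: 0.0 for value in candidates}
    for _ in range(rounds_per_candidate):
        for value in candidates:
            totals[value] += run_kernel(value)
    return min(candidates, key=lambda value: totals[value])


def fake_kernel(team_size):
    # Hypothetical cost model standing in for a timed kernel launch:
    # cost is lowest at a team size of 128.
    return abs(team_size - 128) + 10.0


best = tune_online([32, 64, 128, 256], fake_kernel)
```

Because the search runs inside the application, the same code can settle on different values on different machines, which is the portability property the abstract asks about.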

Speakers

Vivek Kale

Principal Member of Technical Staff, Sandia National Laboratories

Thursday May 8, 2025 9:30am - 9:40am CDT
Salon A-C

Kokkos User Group Meeting, Broader Kokkos Ecosystem

9:40am CDT

Unifying the HPC Ecosystem with std::execution – Mikael Simberg, Swiss National Supercomputing Centre

Thursday May 8, 2025 9:40am - 10:00am CDT

Salon A-C

Asynchronous programming models are becoming increasingly essential for fully leveraging modern hardware. In the C++ ecosystem, projects typically provide ad-hoc and varying interfaces, making interoperability difficult. Recently approved for C++26, the std::execution library promises to unify the ecosystem by providing a standard, composable interface for asynchronous operations. This talk briefly introduces the motivation and design principles of std::execution, and shares our experiences using it prior to standardization at CSCS in various projects, including Kokkos, HPX, and more. We'll discuss challenges, successes, and opportunities encountered while adopting std::execution.

Speakers

Mikael Simberg

HPC Application Engineer, Swiss National Supercomputing Centre

Mikael Simberg holds a master's degree in operations research and computer science from Aalto University in Finland. He joined the Swiss National Supercomputing Centre in 2017 where he works as a software developer helping scientific projects make the best use of modern hardware through... Read More →

Thursday May 8, 2025 9:40am - 10:00am CDT
Salon A-C

Kokkos User Group Meeting, Broader Kokkos Ecosystem

10:00am CDT

PyKokkos: Performance Portability for Python Developers – Milos Gligoric, The University of Texas at Austin

Thursday May 8, 2025 10:00am - 10:20am CDT

Salon A-C

Kokkos is a programming model for writing performance portable applications for all major high performance computing platforms. It provides abstractions for data management and common parallel operations, allowing developers to write portable high performance code with minimal knowledge of architecture-specific details. Kokkos is implemented as a heavily-templated C++ library. However, C++ is not ideal for rapid prototyping and quick algorithmic exploration. An increasing number of developers use Python for scientific computing, machine learning, and data analytics. In this talk, I will present a new Python framework, PyKokkos, for writing performance portable applications entirely in Python. PyKokkos provides Kokkos-like abstractions that are easier to use and more concise than the C++ interface. We implemented PyKokkos by building a translator from a subset of Python to C++ Kokkos and bridging necessary function calls via automatically generated Python bindings. I will also cover our recent work on automatic kernel fusion, which aims to optimize PyKokkos applications, as well as our experience developing PyKokkos, its current limitations, and future plans.

Speakers

Milos Gligoric

Associate Professor, The University of Texas at Austin

Milos Gligoric is an Associate Professor in Electrical and Computer Engineering at The University of Texas at Austin where he holds the Archie W. Straiton Endowed Faculty Fellowship in Engineering. His research interests are in software engineering, especially in designing techniques... Read More →

Thursday May 8, 2025 10:00am - 10:20am CDT
Salon A-C

Kokkos User Group Meeting, Broader Kokkos Ecosystem

10:45am CDT

Leveraging the C Configuration Space and Tuning Library (CCS) in Kokkos Tools - Brice Videau, Argonne National Laboratory

Thursday May 8, 2025 10:45am - 11:05am CDT

Salon A-C

Online autotuning of runtimes and applications presents untapped opportunities to increase HPC application performance and efficiency. During ECP, in order to exploit this potential, the autotuning working group at Argonne National Laboratory and the Kokkos team co-designed the Kokkos Tools tuning API and the C Configuration Space and Tuning Library (CCS). The Kokkos Tools tuning API was designed as a framework to plug tuners into Kokkos and expose tuning regions to them, while the CCS library was designed to offer an API to both capture Kokkos configuration spaces and implement tuners to optimize them. This effort led to the creation of the CCS Kokkos connector, a Kokkos tool that leverages both APIs to offer a baseline tuner for Kokkos regions. In this presentation, we will present the result of this collaboration from the perspective of CCS, the abstractions it offers, and how they map to the Kokkos tuning model. We will describe the capabilities of the CCS library and how it fulfills the goal of offering a standard interface to bridge the gap between tuners and applications/runtimes. We will also discuss the outlook and future work around the CCS Kokkos connector.

Speakers

Brice Videau

Computer Scientist, Argonne National Laboratory

Brice is a computer scientist, co-leading the performance engineering team at Argonne Leadership Computing Facility. Brice's research topics include heterogeneous programming models, system software, auto-tuning, code generation, and code transformation.

Thursday May 8, 2025 10:45am - 11:05am CDT
Salon A-C

Kokkos User Group Meeting, Tuning and Performance

11:05am CDT

Bottlenecks in High-Dimensional Simulations - Nils Schild, Max Planck Institute for Plasma Physics

Thursday May 8, 2025 11:05am - 11:25am CDT

Salon A-C

The Vlasov-Maxwell system, which describes the motion of charged particles in a plasma using a particle distribution function, is based on a 6-D phase space defined through configuration and velocity coordinates.
Even on an Eulerian grid with only 32^6 degrees of freedom, the distribution function already requires 8.5 GB of memory. This implies that high-resolution simulations can only be executed on large compute clusters.
In this talk, we focus on two aspects of the open-source code BSL6D, which solves a reduced version of the Vlasov-Maxwell system. The shared memory parallelization based on Kokkos applies a stencil algorithm to data that is non-contiguous in memory, to reduce memory requirements. The inter-node communication bottleneck poses a challenge due to the large ratio of halo domain to compute domain. Finally, we discuss the advantages of RAII-managed MPI communicators for distributed domains, which simplify the implementation of parallel algorithms with distributed memory concepts.
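The memory figure and the halo problem both follow from simple counting, sketched below in plain Python. Two assumptions are made that the abstract does not state: each grid value is an 8-byte double, and the halo surrounds the local block on both sides in every dimension. The function names and the example block size are hypothetical.

```python
def grid_memory_bytes(points_per_dim, dims, bytes_per_value=8):
    """Memory needed for one value per grid point (8-byte doubles assumed)."""
    return points_per_dim ** dims * bytes_per_value


def halo_to_compute_ratio(points_per_dim, halo_width, dims):
    """Halo cells divided by interior cells for a block with halos on all sides."""
    interior = points_per_dim ** dims
    with_halo = (points_per_dim + 2 * halo_width) ** dims
    return (with_halo - interior) / interior


# 32 points in each of 6 dimensions: 32**6 = 2**30 values.
total_bytes = grid_memory_bytes(32, 6)

# Hypothetical local block: 16 points per dimension with a halo of width 2.
ratio_6d = halo_to_compute_ratio(16, 2, 6)
ratio_3d = halo_to_compute_ratio(16, 2, 3)
```

Under these assumptions the 32^6 grid needs 8,589,934,592 bytes, consistent with the figure quoted in the abstract. For the hypothetical 16-point block, the halo holds about 2.8 times as many cells as the interior in six dimensions, compared with just under 1.0 times in three dimensions.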

Speakers

Nils Schild

Ph.D. Student, Max Planck Institute for Plasma Physics

After studying physics and working on solvers for sparse eigenvalue problems in quantum mechanics at the University of Bayreuth, he moved to the Max Planck Institute for Plasma Physics in Garching (Germany). During his Ph.D., he started implementing the software BSL6D, a solver for... Read More →

Thursday May 8, 2025 11:05am - 11:25am CDT
Salon A-C

Kokkos User Group Meeting, Tuning and Performance

11:25am CDT

Accelerating SPECFEM++ with Explicit SIMD and Cache-Optimized Layouts - Rohit Kakodkar, Princeton University

Thursday May 8, 2025 11:25am - 11:45am CDT

Salon A-C

SPECFEM++ is a suite of computational tools based on the spectral element method used to simulate wave propagation through heterogeneous media. The project aims to unify the legacy SPECFEM codes - three separate Fortran packages (SPECFEM2D, SPECFEM3D, and SPECFEM3D_globe) - into a single C++ package. This new package aims to deliver optimal performance across different architectures by leveraging the Kokkos library. In this presentation, I will outline our efforts to enhance CPU performance using explicit SIMD types (Kokkos::Experimental::simd). Achieving high vectorization throughput can be challenging, particularly because the data involved in spectral element assembly is not always organized in a cache-friendly way. To address this, we have implemented a strategy that prefetches the data into cache-optimized scratch views of SIMD types before executing the SIMD operations. Additionally, we have optimized data layouts using custom-defined tiled layouts that improve cache locality. As a result of these optimizations, we have achieved approximately a 2.5x speed-up compared to auto-vectorized implementations.

Speakers

Rohit Kakodkar

Research Software Engineer, Princeton University

Rohit is a Research Software Engineer in Princeton University's Research Computing department. He is focused on rewriting SPECFEM, a spectral element solver designed to simulate wave propagation through heterogeneous media. SPECFEM is extensively used within the computational seismology... Read More →

Thursday May 8, 2025 11:25am - 11:45am CDT
Salon A-C

Kokkos User Group Meeting, Tuning and Performance

11:45am CDT

Managing Kokkos Callbacks for Benchmarking, Profiling, and Unit Testing - Maarten Arnst, University of Liège

Thursday May 8, 2025 11:45am - 12:05pm CDT

Salon A-C

Many Kokkos functions have instrumentation hooks defined within the framework of Kokkos::Tools. These instrumentation hooks allow Kokkos::Tools as well as third-party tracing, profiling and testing tools to register callbacks to monitor and interact with the runtime behavior of the program. In this presentation, we will describe several utilities that we have designed to help manage such callbacks. We have implemented a manager class that can register function objects that listen to such callbacks. We have also implemented several such function objects, including an event recorder, an event counter, and a kernel timer that uses event stream synchronization markers on device backends. We will illustrate these utilities through their use in benchmarking, profiling, and unit testing of a Kokkos-based finite-element code.
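The manager-plus-listeners pattern described above can be sketched as follows. This is plain Python, not the speakers' implementation and not the Kokkos::Tools interface: the class names, the `dispatch` method, and the event names are hypothetical, and the kernel timer is left out because it depends on device synchronization.

```python
from collections import Counter


class CallbackManager:
    """Forwards every event to the function objects registered with it."""

    def __init__(self):
        self._listeners = []

    def register(self, listener):
        self._listeners.append(listener)

    def unregister(self, listener):
        self._listeners.remove(listener)

    def dispatch(self, event, name):
        # Iterate over a copy so listeners may unregister themselves.
        for listener in list(self._listeners):
            listener(event, name)


class EventRecorder:
    """Keeps the full ordered list of events, e.g. for unit-test assertions."""

    def __init__(self):
        self.events = []

    def __call__(self, event, name):
        self.events.append((event, name))


class EventCounter:
    """Counts how many times each kind of event fired."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, event, name):
        self.counts[event] += 1


manager = CallbackManager()
recorder = EventRecorder()
counter = EventCounter()
manager.register(recorder)
manager.register(counter)

# Stand-in for the hooks that would fire around a kernel named "axpy".
manager.dispatch("begin_parallel_for", "axpy")
manager.dispatch("end_parallel_for", "axpy")
```

In a unit test, the recorder's event list can then be compared against the sequence of kernels the code under test is expected to launch.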

Speakers

Maarten Arnst

Associate Professor, University of Liège

Associate Professor at University of Liege.

Thursday May 8, 2025 11:45am - 12:05pm CDT
Salon A-C

Kokkos User Group Meeting, Tuning and Performance

1:35pm CDT

Gyselalib++: A Portable, Kokkos-Based Library for Exascale Gyrokinetic Simulations - Etienne Malaboeuf, CINES

Thursday May 8, 2025 1:35pm - 1:45pm CDT

Salon A-C

The development of fusion energy in magnetic confinement devices relies heavily on simulations of plasma behavior. Gyselalib++ is a new open-source C++ library under active development by a distributed, multidisciplinary European team of physicists, mathematicians, and computer scientists at EPFL, CEA/IRFM, Maison de la Simulation, IPP Garching, and CINES. Gyselalib++ is itself built on top of PDI, DDC and Kokkos and provides mathematical tools for gyrokinetic semi-Lagrangian codes for tokamak plasma simulations. This presentation will introduce the library, its design and the rationale behind its development, and will highlight its key features. It will showcase how the choice of Kokkos made it possible to achieve high performance and performance portability across a wide range of modern hardware, and will explain the need to introduce DDC to improve development safety. We will discuss feedback from this experience and analyze the successes and limitations of the approach, especially with regard to performance, performance portability, and programmability of the code by a team with highly diverse backgrounds.

Speakers

Etienne Malaboeuf

HPC Engineer, CINES

I focus on improving the performance of projects related to real-time and high-performance computing, while providing various forms of support to researchers using French supercomputers. I have worked on numerical simulation software in an HPC context, on supercomputers and on game... Read More →

Thursday May 8, 2025 1:35pm - 1:45pm CDT
Salon A-C

Kokkos User Group Meeting, Algorithms

1:45pm CDT

Expression Templates with Kokkos for Lattice QCD - Travis Whyte, Jülich Supercomputing Centre

Thursday May 8, 2025 1:45pm - 1:55pm CDT

Salon A-C

Lattice quantum chromodynamics (QCD) is a first principles approach to studying the interaction of quarks and gluons. The calculation of observables in lattice QCD requires many different operations between multidimensional arrays of various ranks. In this talk, I will describe an implementation of expression templates using Kokkos that allows lattice QCD practitioners to implement linear algebra operations simply, while avoiding temporaries, for views of arbitrary rank. This abstraction has the potential to promote high productivity in the development process. The performance of various benchmarks on different architectures will also be discussed.
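The core idea behind expression templates, building a description of an arithmetic expression and evaluating it element by element in one loop with no intermediate arrays, can be imitated with lazy expression objects. The sketch below is a plain Python analogue using operator overloading; Python has no templates, so the expression tree is built at run time rather than at compile time, and none of the class or method names come from the talk.

```python
import operator


class Expr:
    """Base class: arithmetic operators build a tree instead of computing."""

    def __add__(self, other):
        return BinaryExpr(self, other, operator.add)

    def __mul__(self, other):
        return BinaryExpr(self, other, operator.mul)


class BinaryExpr(Expr):
    """Node that combines two sub-expressions lazily, one element at a time."""

    def __init__(self, lhs, rhs, op):
        self.lhs = lhs
        self.rhs = rhs
        self.op = op

    def at(self, i):
        return self.op(self.lhs.at(i), self.rhs.at(i))


class Vector(Expr):
    """Leaf node that owns storage."""

    def __init__(self, values):
        self.values = list(values)

    def at(self, i):
        return self.values[i]

    def assign(self, expr):
        # Single fused loop: no temporary vector is created for b * c.
        for i in range(len(self.values)):
            self.values[i] = expr.at(i)
        return self


a = Vector([1, 2, 3])
b = Vector([4, 5, 6])
c = Vector([7, 8, 9])
out = Vector([0, 0, 0])
out.assign(a + b * c)
```

After the assignment, `out.values` is `[29, 42, 57]`. The expression `a + b * c` only constructs two small tree nodes; all arithmetic happens inside the loop in `assign`.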

Speakers

Travis Whyte

Postdoc, Jülich Supercomputing Centre

I graduated from Baylor University with a Ph.D. in Physics, focusing on algorithmic improvements for lattice QCD simulations. Since then, I have continued to work in the field, focusing on improving iterative solvers, scattering simulations and HPC software development.

Thursday May 8, 2025 1:45pm - 1:55pm CDT
Salon A-C

Kokkos User Group Meeting, Algorithms

1:55pm CDT

Bridging Parallel Communication and On-Node Computation with Kokkos - Evan Suggs, Tennessee Technological University

Thursday May 8, 2025 1:55pm - 2:15pm CDT

Salon A-C

Although MPI and Kokkos have long been used together, there have been no well-defined methods for integrating them effectively. The only approach is to pass the buffers underlying Kokkos Views to MPI functions.
This causes several major pain points: handling non-contiguous Views, coordinating asynchronous operations in both models, and managing how MPI interacts with Kokkos Profiling. Kokkos Comm is an experimental MPI interface for the Kokkos C++ Performance Portability Programming ecosystem that aims to address these concerns and improve the productivity of Kokkos users.

Currently, Kokkos Comm integrates point-to-point collectives, handling of non-contiguous Views, and Kokkos Tools Profiling. Kokkos Comm also aims to be a springboard for new and improved features that go beyond MPI and Kokkos, allowing Kokkos to work with MPI, stream-triggered MPIs, and other non-MPI communication libraries (e.g., NCCL and RCCL). This presentation will cover the Kokkos Comm API, conversion of existing code, best practices, how Kokkos Comm can help address common issues in Kokkos/MPI, and upcoming additions to Kokkos Comm, such as persistent communication and device-initiated communication.

Speakers

Evan Suggs

Staff Researcher, Tennessee Technological University

Evan Drake Suggs is a Research Scientist at Tennessee Technological University in Cookeville, Tennessee. In 2023, Suggs graduated with a Master's degree in Data Science from the University of Tennessee at Chattanooga and presented his thesis work on MPI+Kokkos using the ExaMPI implementation... Read More →

Thursday May 8, 2025 1:55pm - 2:15pm CDT
Salon A-C

Kokkos User Group Meeting, Algorithms

2:15pm CDT

Integration of PETSc, Kokkos Core, and Kernels for Performance Portability in the Age of Accelerators - Junchao Zhang, Argonne National Laboratory

Thursday May 8, 2025 2:15pm - 2:35pm CDT

Salon A-C

PETSc, the Portable, Extensible Toolkit for Scientific Computation, provides an extensive suite of scalable parallel solvers for linear and nonlinear equations, ordinary differential equation (ODE) integrators, and optimization algorithms. Widely adopted in both industry and academia, PETSc historically achieved performance portability through the C programming language and the Message Passing Interface (MPI) programming model. It used single-threaded MPI processes for both shared and distributed memory systems. This strategy served us very well in the microprocessor age. However, the recent proliferation of accelerator-based architectures, particularly graphics processing units (GPUs), has posed new challenges to this performance portability. To address these challenges, we have integrated PETSc with the Kokkos ecosystem, specifically Kokkos-Core and Kokkos-Kernels. In this presentation, we describe our integration approach, highlight our experiences (both effective strategies and encountered challenges), and outline future developments aimed at further enhancing performance portability across evolving computational architectures.

Speakers

Junchao Zhang

Software Engineer, Argonne National Laboratory

Junchao Zhang is a software developer at Argonne. He currently works on the Portable, Extensible Toolkit for Scientific Computation (PETSc) project. Before joining PETSc, he was an MPICH developer at Argonne and developed the MPI Fortran 2008 binding and MPI tool interface of MPI-3.0... Read More →

Thursday May 8, 2025 2:15pm - 2:35pm CDT
Salon A-C

Kokkos User Group Meeting, Algorithms

2:35pm CDT

Parallel Sweep Algorithms for Cartesian and Honeycomb Grids - Ansar Calloo, CEA

Thursday May 8, 2025 2:35pm - 2:55pm CDT

Salon A-C

The linear Boltzmann transport equation (BTE) is the governing equation for expressing the behaviour of neutral particles in a system such as a nuclear reactor. BTE can be solved for the flux of particles using deterministic methods whereby the equation is discretised in the phase space of its fundamental variables. This discrete equation is then usually solved using source iteration. In this talk, we will present how the sweep algorithm, which is based on a wavefront pattern, has been optimised in the context of SMP for CPU, along with some preliminary results on GPU. The goal is to show how to adapt the sweep algorithm to be efficient on new supercomputer architectures. We will briefly introduce DONUT (Discrete Ordinates NeUtron Transport), a modern C++ miniapp for solving BTE based on the discrete ordinates and discontinuous Galerkin discretisations for Cartesian and honeycomb grids.

Speakers

Ansar Calloo

Research engineer, CEA

Ansar obtained his PhD in deterministic neutron transport at CEA. For the past fifteen years, he has been working on improving simulations for reactor physics applications first at EDF R&D, then CEA. His research interests involve nuclear reactor model, numerical methods to solve... Read More →

Thursday May 8, 2025 2:35pm - 2:55pm CDT
Salon A-C

Kokkos User Group Meeting, Algorithms

3:40pm CDT

Panel Discussion: Kokkos - Reflecting on Exascale and Charting the Future

Thursday May 8, 2025 3:40pm - 5:00pm CDT

Salon A-C

Join us for a dynamic panel session at the Kokkos User Group meeting, bringing together core Kokkos developers, library leads, ecosystem project members, application experts, numerical toolkit developers, and educators. We'll reflect on the journey since our last major release and the dawn of the exascale era, sharing diverse perspectives on successes, challenges, and lessons learned. This is your opportunity to engage with the community, ask questions, and help shape the future direction of Kokkos. Come be a part of the conversation that drives what's next!

Speakers

Kokkos User Group Meeting