Name: Broader Kokkos Ecosystem
Start: 2025-05-08T09:00:00-0500
End: 2025-05-08T10:20:00-0500

May 5-8, 2025
Chicago, IL

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Thursday May 8, 2025 9:00am - 10:20am CDT

Salon A-C

1. kokkos-fft Updates – Yuuichi Asahi, CEA (10 minutes)
kokkos-fft implements local interfaces between Kokkos and de facto standard FFT libraries, including FFTW, cuFFT, hipFFT (rocFFT), and oneMKL. We aim to provide numpy.fft-like interfaces adapted for Kokkos. The key concept is "as easy as numpy, as fast as vendor libraries". In this talk, we will introduce the basic APIs and typical use cases, and present future development plans.

2. Fortran Porting Wish List for Kokkos – Matthew Norman, Oak Ridge National Laboratory (10 minutes)
This presentation covers the beginnings of the Yet Another Kernel Launcher (YAKL) C++ portability library, its evolution alongside Kokkos, its current use of Kokkos, and the issues that remain before it can be retired in favor of Kokkos. There are two primary outstanding issues. The first is support for arbitrary lower bounds, to give Views Fortran-like indexing behavior. The second is the ability to back Views with an underlying pool allocator, which makes frequent device allocation and deallocation cheap, so that Views can be created and destroyed locally where they are needed rather than existing for the global lifetime of a simulation. This may improve readability and reduce the memory high-water mark of simulations. A few performance-related issues will be covered as well, mainly limited to MDRangePolicy and parallel_for register usage.
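The idea of arbitrary lower bounds can be illustrated with a minimal host-only sketch. `OffsetArray1D` is a hypothetical helper written for this illustration; it is not a Kokkos or YAKL type and says nothing about how either library implements or would implement the feature. It only shows the index shift that lets an array be addressed over an inclusive range such as -2..2, as in Fortran.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: a 1D array indexed over the inclusive
// range [lower, upper], mimicking a Fortran declaration such as a(-2:2).
// This is not a Kokkos or YAKL type.
class OffsetArray1D {
public:
    OffsetArray1D(long lower, long upper)
        : lower_(lower), data_(extent(lower, upper), 0.0) {}

    // Element access shifts the user-facing index by the lower bound.
    double& operator()(long i) {
        assert(i >= lower_ && i <= upper());
        return data_[static_cast<std::size_t>(i - lower_)];
    }

    long lower() const { return lower_; }
    long upper() const { return lower_ + static_cast<long>(data_.size()) - 1; }
    std::size_t size() const { return data_.size(); }

private:
    // Number of elements in the inclusive range [lower, upper].
    static std::size_t extent(long lower, long upper) {
        assert(upper >= lower);
        return static_cast<std::size_t>(upper - lower + 1);
    }

    long lower_;                 // first valid index
    std::vector<double> data_;   // zero-initialized storage
};
```

With this sketch, `OffsetArray1D a(-2, 2);` holds five elements, and `a(-2)` and `a(2)` address the first and last of them.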

3. Custom Layout and Tiling for Multi-Dimensional Data – Cedric Chevalier & Gabriel Dos Santos, CEA (10 minutes)
Performance optimizations for exascale HPC applications primarily rely on fine-tuning implementations, which requires comprehensive knowledge of heterogeneous hardware architectures that domain experts often lack. One of Kokkos' biggest successes is tying the memory layout of multi-dimensional arrays to the execution backend, which allows code to exploit memory coalescing or caches, depending on the hardware. Here, we propose to go further and design custom tiled layouts that are generic for C++23's std::mdspan. Instead of running tiled algorithms on flat data, as Kokkos' mdrange does, we want to explore how running flat algorithms on tiled data performs. On CPU, the first experimental results with std::mdspan on a naive dense matrix multiplication show that replacing the standard layouts with our proposed solution yields an average speedup of over 2.2x, with peak improvements of up to 7.8x. We will then discuss how external indexing can improve efficiency, how to exploit it with Kokkos' mdrange algorithm, and how it can behave on GPU.
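To make "flat algorithms on tiled data" concrete, here is a minimal sketch of the kind of index computation a 2D tiled layout performs. `TiledLayout2D` and its members are hypothetical names chosen for this illustration; the sketch is not the layout proposed in the talk and is not wired into std::mdspan. It assumes row-major ordering both across tiles and within a tile, and pads partial tiles to full size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only: maps a 2D index (i, j) to a linear offset
// so that every tile_rows x tile_cols tile is contiguous in memory.
// Tiles are ordered row-major, and elements inside a tile are row-major too.
struct TiledLayout2D {
    std::size_t rows;
    std::size_t cols;
    std::size_t tile_rows;
    std::size_t tile_cols;

    // Number of tiles along each dimension, rounding up for partial tiles.
    std::size_t tiles_per_row() const { return (cols + tile_cols - 1) / tile_cols; }
    std::size_t tiles_per_col() const { return (rows + tile_rows - 1) / tile_rows; }

    // Storage needed when partial tiles are padded to full size.
    std::size_t required_span_size() const {
        return tiles_per_row() * tiles_per_col() * tile_rows * tile_cols;
    }

    // Linear offset of element (i, j).
    std::size_t operator()(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        const std::size_t tile_i = i / tile_rows;  // which tile, vertically
        const std::size_t tile_j = j / tile_cols;  // which tile, horizontally
        const std::size_t in_i = i % tile_rows;    // row inside the tile
        const std::size_t in_j = j % tile_cols;    // column inside the tile
        const std::size_t tile_index = tile_i * tiles_per_row() + tile_j;
        return tile_index * (tile_rows * tile_cols) + in_i * tile_cols + in_j;
    }
};
```

A plain doubly nested loop over `(i, j)` that reads `data[layout(i, j)]` is then a flat algorithm running on tiled data: the loop structure stays simple while the layout decides which elements end up adjacent in memory.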

4. Runtime Auto-Tuning for Kokkos Applications with APEX – Kevin Huck, University of Oregon (10 minutes)
Traditional GPU programming with models like CUDA or HIP requires tuning parameters exposed to the user, for example block sizes or the number of teams. Kokkos also exposes portable parameters to the Kokkos user. How can Kokkos application programmers easily tune these parameters for their application’s deployment on any given Kokkos backend without incurring large overheads? In particular, how do we ensure the tuning itself is portable across platforms? We propose online (runtime) autotuning, using the APEX Kokkos Tools connector to tune the exposed parameters. Specifically, we discuss the Kokkos Tools Tuning Interface, tuning contexts, variable definition, the APEX runtime auto-tuning library utilizing Kokkos Tools, and distributed Kokkos auto-tuning. Applying our auto-tuning approaches to Kokkos sample kernels on Perlmutter and Frontier, we have obtained promising performance results. These results suggest that Kokkos online auto-tuning is beneficial for production applications, and we invite Kokkos users to try these features and Kokkos developers to contribute.

5. Unifying the HPC Ecosystem with std::execution – Mikael Simberg, Swiss National Supercomputing Centre (20 minutes)
Asynchronous programming models are becoming increasingly essential for fully leveraging modern hardware. In the C++ ecosystem, projects typically provide ad-hoc and varying interfaces, making interoperability difficult. Recently approved for C++26, the std::execution library promises to unify the ecosystem by providing a standard, composable interface for asynchronous operations. This talk briefly introduces the motivation and design principles of std::execution, and shares our experiences using it prior to standardization at CSCS in various projects, including Kokkos, HPX, and more. We'll discuss challenges, successes, and opportunities encountered while adopting std::execution.

6. PyKokkos: Performance Portability for Python Developers – Milos Gligoric, The University of Texas at Austin (20 minutes)
Kokkos is a programming model for writing performance portable applications for all major high performance computing platforms. It provides abstractions for data management and common parallel operations, allowing developers to write portable high performance code with minimal knowledge of architecture-specific details. Kokkos is implemented as a heavily templated C++ library. However, C++ is not ideal for rapid prototyping and quick algorithmic exploration, and an increasing number of developers use Python for scientific computing, machine learning, and data analytics. In this talk, I will present a new Python framework, PyKokkos, for writing performance portable applications entirely in Python. PyKokkos provides Kokkos-like abstractions that are easier to use and more concise than the C++ interface. We implemented PyKokkos by building a translator from a subset of Python to C++ Kokkos and bridging the necessary function calls via automatically generated Python bindings. I will also cover our recent work on automatic kernel fusion, which aims to optimize PyKokkos applications, as well as our experience in developing PyKokkos, its current limitations, and future plans.

Speakers

Kevin Huck

Senior Research Associate, University of Oregon

Kevin Huck is a Senior Research Associate in the Oregon Advanced Computing Institute for Science and Society (OACISS) at the University of Oregon. He is interested in the unique problems of performance analysis of large HPC applications as well as automated methods for diagnosing...

Cedric Chevalier

Research Scientist, CEA

Cédric Chevalier is a research scientist at CEA in France. He is interested in developing libraries for HPC simulation codes, particularly in Linear Algebra and Mesh/Graph partitioning. His work at CEA is driven by providing practical ways to exploit newer hardware, use new programming...

Gabriel Dos Santos

PhD Student, CEA

PhD student working on the management of data structure representations on heterogeneous architectures for exascale-class HPC workloads, with a strong background in performance optimization, CPU microarchitectures, and vectorization.

Matthew Norman

Climate Scientist, Oak Ridge National Laboratory

Matt Norman leads the Advanced Computing for Life Sciences and Engineering group in the Oak Ridge Leadership Computing Facility (OLCF). He works with weather and climate simulation, urban and wind turbine simulation, PDE discretizations for the Navier-Stokes Equations, GPU acceleration...

Mikael Simberg

HPC Application Engineer, Swiss National Supercomputing Centre

Mikael Simberg holds a master's degree in operations research and computer science from Aalto University in Finland. He joined the Swiss National Supercomputing Centre in 2017 where he works as a software developer helping scientific projects make the best use of modern hardware through...

Milos Gligoric

Associate Professor, The University of Texas at Austin

Milos Gligoric is an Associate Professor in Electrical and Computer Engineering at The University of Texas at Austin where he holds the Archie W. Straiton Endowed Faculty Fellowship in Engineering. His research interests are in software engineering, especially in designing techniques...

Yuuichi Asahi

Research Scientist, CEA

His recent interests are HPC and AI with NVIDIA, AMD, and Intel GPUs. He has rich experience in GPU programming models including CUDA, HIP, SYCL, Kokkos, OpenMP, OpenACC, Thrust, stdpar, and senders/receivers. For exascale computing, he is highly interested in improving performance...

Kokkos User Group Meeting
