May 5-8, 2025
Chicago, IL

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Wednesday May 7, 2025 2:55pm - 3:15pm CDT
An ongoing challenge for HPC containers is how to make host resources such as GPU devices and proprietary interconnects performantly available inside a container. In Charliecloud, the key requirement is placing shared libraries (.so files) into the container and then running ldconfig(8) to update its cache. (Other implementations also have to deal with device files and their associated permissions and symlinks, but because Charliecloud bind-mounts host /dev into the container, this is not needed.)

Charliecloud has done this for some time using ch-fromhost(1), which is a large, reverse-engineered shell script that copies the needed files into a writeable image. It is difficult to maintain, does not support SquashFS or other read-only images, and adds a workflow step.

Other implementations typically use “OCI hooks”, which are arbitrary vendor-provided or custom programs run at various phases during container setup. These also present maintainability / bit-rot problems, can be opaque, and because their interface is solely “do it for me”, any invalid assumptions that hooks make can be difficult or impossible to work around.

A different approach is the emerging Container Device Interface (CDI) standard, with contributors from NVIDIA, Intel, Los Alamos, and others. This gives prescriptive JSON/YAML descriptions of what is needed. Charliecloud has implemented CDI in its runtime ch-run(1), bind-mounting requested files and using an unprivileged tmpfs overlay (available since Linux 5.11 in February 2021) to avoid modifying the image. This is a considerably simpler and more maintainable way to make host resources available inside a container.
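
As a rough illustration of what such a description looks like and how a runtime might read it, here is a minimal Python sketch. It is not Charliecloud's C implementation. The top-level field names (cdiVersion, kind, devices, containerEdits, mounts, hostPath, containerPath, env) follow the CDI specification as we understand it; the vendor example.com, the device name gpu0, all file paths, and the helper collect_edits are hypothetical and exist only for this example.

```python
import json

# Hypothetical CDI spec for illustration only: the vendor, device name,
# library, and paths are made up; the field names follow the CDI spec.
EXAMPLE_SPEC = """
{
  "cdiVersion": "0.6.0",
  "kind": "example.com/gpu",
  "containerEdits": {
    "mounts": [
      {"hostPath": "/usr/lib64/libexample.so.1",
       "containerPath": "/usr/lib64/libexample.so.1",
       "options": ["ro", "bind"]}
    ]
  },
  "devices": [
    {"name": "gpu0",
     "containerEdits": {
       "env": ["EXAMPLE_VISIBLE_DEVICES=0"],
       "mounts": [
         {"hostPath": "/usr/bin/example-smi",
          "containerPath": "/usr/bin/example-smi"}
       ]
     }}
  ]
}
"""


def collect_edits(spec, device_name):
    """Return (mounts, env) requested for one named device.

    Spec-level containerEdits are collected first, followed by the
    edits listed under the matching device entry.
    """
    edit_sets = [spec.get("containerEdits", {})]
    for dev in spec.get("devices", []):
        if dev["name"] == device_name:
            edit_sets.append(dev.get("containerEdits", {}))
            break
    else:
        # Loop finished without finding the requested device.
        raise KeyError(f"no such device: {device_name}")

    mounts, env = [], []
    for edits in edit_sets:
        for m in edits.get("mounts", []):
            mounts.append((m["hostPath"], m["containerPath"]))
        env.extend(edits.get("env", []))
    return mounts, env


spec = json.loads(EXAMPLE_SPEC)
mounts, env = collect_edits(spec, "gpu0")
for host_path, container_path in mounts:
    # A real runtime would bind-mount here; this sketch only reports.
    print(f"bind {host_path} -> {container_path}")
```

The sketch only gathers the requested bind mounts and environment variables. Performing the mounts, setting up the overlay, and running ldconfig(8) are left to the runtime and are not shown.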

This talk will provide an overview of CDI, adaptations of the standard to our fully unprivileged workflow, and our C implementation. We will also demonstrate the functionality for NVIDIA GPUs and the HPE/Cray Slingshot interconnect.

LA-UR-25-22140
Speakers
Reid Priedhorsky

Scientist, Los Alamos National Laboratory
I am a staff scientist at Los Alamos National Laboratory. Prior to Los Alamos, I was a research staff member at IBM Research. I hold a Ph.D. in computer science from the University of Minnesota and a B.A., also in computer science, from Macalester College. My work focuses on large-scale...
Wednesday May 7, 2025 2:55pm - 3:15pm CDT
Illinois River