Communication Libraries
CSCS provides common communication libraries optimized for the Slingshot 11 network on Alps.
For most scientific applications that rely on MPI, Cray MPICH is recommended. MPICH and OpenMPI can also be used, with limitations. All three use libfabric to interact with the underlying network.
Most machine learning applications rely on NCCL or RCCL for high-performance implementations of collectives. To make full use of the Slingshot network, NCCL and RCCL must be configured with a plugin that uses libfabric.
See the individual page for each library for information on how to use and configure it.