Machine learning applications and frameworks
CSCS supports a wide range of machine learning (ML) applications and frameworks on its systems. Most ML workloads are containerized to ensure portability, reproducibility, and ease of use across systems.
Users can choose between running containers, using provided uenv software stacks, or building custom Python environments tailored to their needs.
First-time users are encouraged to work through the LLM tutorials, a series of hands-on examples that introduce the concepts of the Machine Learning Platform.
Running ML applications with containers (recommended)
Containerization is the recommended approach for ML workloads on Alps, as it simplifies software management and maximizes compatibility with other systems.
Users are encouraged to build their own containers, starting from popular sources such as the NVIDIA NGC Catalog, which offers a variety of pre-built images optimized for HPC and ML workloads. Examples include:
- PyTorch NGC container (Release Notes)
- JAX NGC container (Release Notes)
- TensorFlow NGC container (deprecated since 25.02, see Release Notes)
Documented best practices are available for:
Extending a container with a virtual environment
For Python dependencies that change frequently during development, consider creating a virtual environment (venv) on top of the packages in the container (see this example).
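The layering described above can be sketched with Python's standard-library `venv` module. This is a minimal sketch, assuming it is run with the container's own Python interpreter; the function name `create_layered_venv` and the directory `venv-dev` are hypothetical. The key setting is `system_site_packages=True`, which keeps the packages already installed in the image (for example PyTorch in an NGC container) visible inside the new environment, so only the additional dependencies need to be installed.

```python
import venv
from pathlib import Path


def create_layered_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a venv that can also import the base interpreter's packages.

    Hypothetical helper: run it with the container's Python so the
    optimized packages shipped in the image remain visible.
    """
    path = Path(env_dir)
    venv.create(
        path,
        system_site_packages=True,  # inherit packages installed in the base image
        with_pip=with_pip,          # bootstrap pip so extra packages can be added
    )
    return path
```

For example, `create_layered_venv("venv-dev")` followed by `source venv-dev/bin/activate` and `pip install <package>` adds packages on top of the image. The command-line equivalent is `python -m venv --system-site-packages venv-dev`.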
Helpful references:
- Introduction to concepts of the Machine Learning platform: LLM tutorials
- Running containers on Alps: Container Engine Guide
- Building custom container images: Container Build Guide
Using provided uenv software stacks
Alternatively, CSCS provides pre-configured software stacks (uenvs) that can serve as a starting point for machine learning projects. These environments provide optimized compilers, libraries, and selected ML frameworks.
Available ML-related uenvs:
Extending a uenv with a virtual environment
To extend these environments with additional Python packages, we recommend creating a Python virtual environment (venv) layered on top of the packages in the uenv. See this PyTorch venv example for details.
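One practical concern when layering a venv on a uenv is that `pip` may install a generic PyPI build that shadows an optimized package from the uenv while resolving dependencies. A common safeguard, sketched below under the assumption that it is run with the uenv's Python before anything else is installed, is to pin the versions already visible to the interpreter in a constraints file and pass it to `pip install -c`. The function names and the `constraints.txt` file name are illustrative and not part of any uenv tooling.

```python
from __future__ import annotations

import re
from importlib.metadata import distributions
from pathlib import Path


def pinned_constraints() -> list[str]:
    """Return sorted 'name==version' lines for every distribution visible to Python."""
    pins: dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip entries with missing or broken metadata
        # Normalize names (e.g. "Foo_Bar" -> "foo-bar") so duplicates collapse.
        normalized = re.sub(r"[-_.]+", "-", name).lower()
        # Keep the first match along sys.path, normally the copy that gets imported.
        pins.setdefault(normalized, dist.version)
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


def write_constraints(path: str = "constraints.txt") -> Path:
    """Write the pins to a file usable with `pip install -c`."""
    out = Path(path)
    out.write_text("\n".join(pinned_constraints()) + "\n")
    return out
```

With the file in place, `pip install -c constraints.txt <package>` fails instead of silently replacing a pinned package, which makes conflicts with the uenv's builds visible.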
Building custom Python environments
Users may also choose to build entirely custom software stacks using Python package managers such as uv or conda.
Most ML libraries are available via the Python Package Index (PyPI).
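Whether `pip` or `uv` can use a pre-built binary from PyPI depends largely on the platform tag at the end of the wheel's file name (`{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`). The helper below is a simplified sketch for checking whether a wheel targets the CPU architecture reported by `platform.machine()`; it deliberately ignores the Python and ABI tags and the glibc version that real installers also check. On Arm-based nodes such as NVIDIA Grace-Hopper the architecture is `aarch64`, so a package published only as `x86_64` wheels would have to be built from source there.

```python
from __future__ import annotations

import platform


def wheel_matches_arch(filename: str, machine: str | None = None) -> bool:
    """Simplified check: does the wheel's platform tag target this CPU architecture?

    Only the platform tag is inspected; Python/ABI tags and glibc versions are ignored.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    machine = machine or platform.machine()  # e.g. "aarch64" or "x86_64"
    platform_tags = filename[: -len(".whl")].split("-")[-1]
    # Compressed tag sets such as "manylinux_2_17_x86_64.manylinux2014_x86_64"
    # list several platform tags separated by dots.
    for tag in platform_tags.split("."):
        if tag == "any" or tag.endswith("_" + machine):
            return True
    return False
```

With illustrative file names, `wheel_matches_arch("torch-2.3.0-cp311-cp311-manylinux_2_28_aarch64.whl", "aarch64")` is `True`, while the same call with `"x86_64"` is `False`; a pure-Python wheel tagged `any` matches every architecture.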
Note
While many Python packages provide pre-built binaries for common architectures, some may require building from source.
To ensure optimal performance on CSCS systems, we recommend starting from an environment that already includes:
- CUDA, cuDNN
- MPI, NCCL
- C/C++ compilers
This can be achieved by either:
- building a custom container image based on a suitable ML-ready base image, or
- starting from a provided uenv (e.g., prgenv-gnu or the PyTorch uenv)
and extending it with a virtual environment.
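Before building packages from source, it can help to confirm that the starting environment actually exposes CUDA, cuDNN, MPI, NCCL and the compilers. The sketch below uses only the standard library: `shutil.which` looks for executables on `PATH`, and `ctypes.util.find_library` searches for shared libraries. The names it probes (`nvcc`, `mpicc`, `cc`, `c++`, and the `cudart`, `cudnn` and `nccl` libraries) are assumptions about a typical setup; a container image or uenv may provide them under different names or only after a view is loaded, and `find_library` follows platform-specific search rules, so a missing entry is a hint to investigate rather than proof of absence.

```python
from __future__ import annotations

import ctypes.util
import shutil

# Hypothetical probe lists: actual executable and library names vary between setups.
EXECUTABLES = {
    "CUDA compiler": "nvcc",
    "MPI C compiler wrapper": "mpicc",
    "C compiler": "cc",
    "C++ compiler": "c++",
}
LIBRARIES = {
    "CUDA runtime": "cudart",
    "cuDNN": "cudnn",
    "NCCL": "nccl",
}


def check_build_environment() -> dict[str, bool]:
    """Report which build prerequisites are discoverable from this process."""
    found = {label: shutil.which(exe) is not None for label, exe in EXECUTABLES.items()}
    found.update(
        {label: ctypes.util.find_library(lib) is not None for label, lib in LIBRARIES.items()}
    )
    return found


if __name__ == "__main__":
    for label, ok in check_build_environment().items():
        print(f"{label:<24} {'found' if ok else 'MISSING'}")
```

Running the script inside the activated environment prints one line per component, which makes it easy to spot, for instance, a missing compiler wrapper before a long source build fails.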