Gordon Bell runs 2026¶
Info
Gordon Bell runs will take place on Clariden from Tuesday, April 7, 2026 to Monday, April 13, 2026.
During this period, Clariden will be temporarily expanded to 2300 GH200 nodes, Daint will operate with reduced compute capacity, and Santis will be unavailable.
All times on this page are local CSCS time (Europe/Zurich, CEST).
Warning
During the daily reserved window, Clariden will be dedicated to Gordon Bell teams and unavailable for regular user jobs.
Regular user jobs can still be submitted at any time and will be scheduled automatically during the open-access windows.
Clariden¶
Connecting¶
Connecting to Clariden via SSH is the same as for Daint and Santis, see the SSH guide for more information.
Add the following to your SSH configuration to connect directly to Clariden with `ssh clariden`.
```
Host clariden
    HostName clariden.alps.cscs.ch
    ProxyJump ela
    # change cscsusername to your CSCS username
    User cscsusername
    IdentityFile ~/.ssh/cscs-key
    IdentitiesOnly yes
```
Reservations¶
The cluster reconfiguration starts on April 7 at 07:00 and is expected to complete by approximately 12:00. The special Gordon Bell configuration is currently planned to remain in place until April 13 at 07:00, when the cluster will be restored to its standard layout. The temporary Gordon Bell layout provides 2300 nodes on Clariden. In practice, around 2050 nodes are expected to be available for production runs.
Currently planned Gordon Bell reservation windows:
| Date | Time (CEST) | Status |
|---|---|---|
| April 7 | 07:00-approximately 12:00 | Cluster reconfiguration to the Gordon Bell layout |
| April 7 | 14:00-20:00 | 440 nodes reserved for Gordon Bell runs |
| April 7 | 14:00-18:00 | 600 nodes reserved for Apertus training runs |
| April 7/8 | 20:00-08:00 | 600 nodes reserved for Apertus training runs |
| April 8 | 08:00-12:00 | 1500 nodes reserved for Gordon Bell runs |
| April 8 | 12:00-21:00 | 2300 nodes reserved for Gordon Bell runs |
| April 8/9 | 21:00-11:00 | 600 nodes reserved for Apertus training runs |
| April 9 | 11:00-15:00 | 2300 nodes reserved for Gordon Bell runs |
| April 9/10 | 16:00-06:00 | 600 nodes reserved for Apertus training runs |
| April 9/10 | 16:00-06:00 | 1200 nodes reserved for Apertus training runs (MoE runs) |
These times may still change as the week progresses. Any updates, including possible weekend extensions, will be communicated as early as possible. When there are non-reserved nodes available, regular user jobs will be scheduled as usual, with the maximum job length temporarily reduced to 6 hours.
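With the temporary limit in effect, regular batch jobs must request at most 6 hours of walltime. A minimal sketch of a Slurm submission script (job name, node count, account, and executable are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=regular-job
#SBATCH --time=06:00:00       # must not exceed the temporary 6-hour limit
#SBATCH --nodes=4
#SBATCH --account=<project>   # placeholder: replace with your project account

srun ./my_application         # placeholder executable
```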
On April 13, the special Gordon Bell configuration ends at 07:00, when the cluster will be restored to its standard layout.
Software environment¶
Clariden and Santis use the same software image during this period:
- USS 1.3.1
- NVIDIA driver 590
- No CPE software stack
Daint uses a different image:
- USS 1.1.0
- NVIDIA driver 550
- CPE software stack available
Warning
Clariden does not provide the CPE software stack. Gordon Bell teams should therefore prepare and validate their software environment on Clariden (or Santis), rather than on Daint. For most users, this means using a uenv, a container-based workflow, or a self-managed software stack.
Storage¶
The same shared filesystems are available across Daint, Clariden, and Santis:
- capstor, iopsstor, and VAST are mounted on Clariden during the Gordon Bell runs
- Home is shared between Daint, Clariden, and Santis
- Scratch spaces are shared between Daint, Clariden, and Santis
- Store/Project filesystems are mounted
For most large-scale run data, staging, and scratch-like workloads, /capstor/scratch/cscs/${USER} is the recommended choice.
Lustre striping¶
Uenvs¶
uenv images can be striped across multiple OSTs on the Lustre filesystem, which can significantly improve I/O performance for large files.
Striping is applied automatically to all uenv images in repositories that were created in the last few months, or in older repositories that have since been updated.
If your uenv images were created before this change, you can update your repository to apply striping to your existing images.
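To verify whether a given image file is striped, the standard Lustre client tools can be used (a sketch; the path is illustrative and `lfs` must be run on a Lustre mount):

```
# Show the stripe layout of a file on Lustre;
# a stripe_count greater than 1 means the file is striped across OSTs
lfs getstripe /capstor/scratch/cscs/$USER/path/to/image.squashfs
```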
Disabling core-dumps¶
If a large job crashes and attempts to write core-dump files from thousands of processes, it can overwhelm the filesystem. We therefore strongly recommend disabling core dumps before large runs.
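In a bash job script, core dumps can be disabled with `ulimit` (a common approach; adjust if your job uses a different shell):

```shell
# Set the maximum core file size to zero, disabling core dumps
# for this shell and all processes it launches
ulimit -c 0
```

Place this near the top of the batch script so it applies to every process the job spawns.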
MPI¶
MPI jobs on Clariden must be launched through Slurm's Shasta MPI integration.
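Assuming Slurm's `cray_shasta` MPI plugin (the standard PMI integration on HPE Cray EX systems), this is selected at launch time; the node/task counts and executable below are illustrative:

```
# Launch an MPI job via Slurm's Cray Shasta PMI integration
srun --mpi=cray_shasta -N 2 -n 8 ./my_mpi_app   # placeholder executable
```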
At large scale, MPI may need longer than the default timeout to initialize. As a precaution, we recommend increasing the timeout from the default 180 seconds to 300 seconds.
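The exact mechanism depends on the PMI/MPI implementation in your environment; as an illustration only, with a hypothetical `PMI_MPI_INIT_TIMEOUT` variable (verify the correct variable name against your installed MPI documentation):

```shell
# Hypothetical timeout variable -- check your PMI implementation's docs.
# Raise the MPI initialization timeout from 180 s to 300 s.
export PMI_MPI_INIT_TIMEOUT=300
```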
NCCL¶
See the container engine documentation for information on using NCCL in containers. The NCCL documentation contains general information on configuring NCCL. This is especially important when using uenvs, as the relevant environment variables are not set automatically. Because Clariden and Santis do not provide CPE, Gordon Bell teams are strongly encouraged to validate their NCCL and MPI configuration in the exact runtime environment they plan to use for production.
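When validating a configuration, NCCL's own debug output is often the quickest check. `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` are standard NCCL environment variables; set them in the job environment before launching:

```shell
# Print NCCL version, transport selection, and topology setup at startup
export NCCL_DEBUG=INFO
# Limit the output to the initialization and network subsystems
export NCCL_DEBUG_SUBSYS=INIT,NET
```

The resulting log lines show which network provider NCCL selected, which is usually enough to confirm the environment is wired up as intended.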