Gordon Bell runs 2026

Info

Gordon Bell runs will take place on Clariden from Tuesday, April 7, 2026 to Monday, April 13, 2026.

During this period, Clariden will be temporarily expanded to 2300 GH200 nodes, Daint will operate with reduced compute capacity, and Santis will be unavailable.

All times on this page are local CSCS time (Europe/Zurich, CEST).

Warning

During the daily reserved window, Clariden will be dedicated to Gordon Bell teams and unavailable for regular user jobs.

Regular user jobs can still be submitted at any time and will be scheduled automatically during the open-access windows.

Clariden

Connecting

Connecting to Clariden via SSH is the same as for Daint and Santis, see the SSH guide for more information.

Add the following to your SSH configuration so that you can connect directly to Clariden using ssh clariden.

Host clariden
    HostName clariden.alps.cscs.ch
    ProxyJump ela
    # change cscsusername to your CSCS username
    User cscsusername
    IdentityFile ~/.ssh/cscs-key
    IdentitiesOnly yes

Reservations

During the Gordon Bell period, Clariden will operate with a different schedule than usual.

Date              Time                         Status
April 7, 2026     07:00-approximately 12:00    Cluster reconfiguration to the Gordon Bell layout
April 7-12, 2026  09:00-19:00 daily            Reserved for Gordon Bell teams
April 7-12, 2026  19:00-09:00 daily            Open access for regular users
April 13, 2026    until 07:00                  Final overnight open-access period
April 13, 2026    from 07:00                   Cluster restored to its standard configuration

On April 7, the cluster reconfiguration starts at 07:00 and is expected to complete by 12:00. The reservation starts once the reconfiguration is complete.

On April 13, the special Gordon Bell configuration ends at 07:00, when the cluster will be restored to its standard layout.

The temporary Gordon Bell layout provides 2300 nodes on Clariden. In practice, around 2050 nodes are expected to be available for production runs.

For regular users

Please plan your workloads with the following temporary changes in mind:

  • Clariden is unavailable to regular user jobs during the reserved window 09:00-19:00.
  • Jobs can still be submitted at any time and will be queued until an open-access window becomes available.
  • The maximum job length on Clariden is temporarily reduced from 12 hours to 6 hours during this period.
  • The Apertus training reservation (600 nodes) remains unchanged and continues during the overnight window.
  • Over the weekend of April 11-12, 2026, Clariden is expected to remain accessible. However, the reserved period may be extended depending on progress during the week; any change will be communicated as early as possible.
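As a sketch of what this means in practice (the script name and node count below are placeholders, not part of the official instructions), a regular job submitted during this period simply needs to respect the temporary 6-hour limit; Slurm will queue it until an open-access window is available:

```shell
# Submit a regular job during the Gordon Bell period.
# The job can be submitted at any time; it will be held in the queue
# and scheduled automatically in the next open-access window.
# --time must not exceed the temporary 6-hour maximum.
sbatch --time=06:00:00 --nodes=4 ./my_job.sh   # my_job.sh is a placeholder
```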

For Gordon Bell teams

  • The reserved period on Clariden is intended for large-scale Gordon Bell runs.
  • Reservation details, including any reservation names and operational instructions, will be communicated directly to the participating teams.
  • The maximum reservation window is 10 hours per day.
  • The CSCS Gordon Bell support team will be available to help teams prepare and execute successful runs.

Software environment

Clariden and Santis use the same software image during this period:

  • USS 1.3.1
  • NVIDIA driver 590
  • No CPE software stack

Daint uses a different image:

  • USS 1.1.0
  • NVIDIA driver 550
  • CPE software stack available

Warning

Clariden does not provide the CPE software stack. Gordon Bell teams should therefore prepare and validate their software environment on Clariden (or Santis), rather than on Daint. For most users, this means using a uenv, a container-based workflow, or a self-managed software stack.

Storage

The same shared filesystems are available across Daint, Clariden, and Santis:

  • capstor, iopsstor, and VAST are mounted on Clariden during the Gordon Bell runs
  • Home is shared between Daint, Clariden, and Santis
  • Scratch spaces are shared between Daint, Clariden, and Santis
  • Store/Project filesystems are mounted

For most large-scale run data, staging, and scratch-like workloads, /capstor/scratch/cscs/${USER} is the recommended choice.

Lustre striping

Uenvs

uenv images can be striped across multiple OSTs on the Lustre filesystem, which can significantly improve I/O performance for large files. Striping is applied automatically to all uenv images created in repositories that were either created in the last few months or have since been updated. If your uenv images were created before this change, update your repository to apply striping to the existing images.

check and update uenv repositories for striping
$ uenv repo status
$ uenv repo update

Disabling core-dumps

If a large job crashes and tries to write core-dump files from thousands of processes, it can overwhelm the filesystem. We therefore strongly recommend disabling core dumps with the following command:

disable writing of core-dump files
$ ulimit -S -c 0
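The limit applies to the current shell and everything it launches, so for batch jobs it should go in the job script before the srun call. A minimal sketch to verify it took effect:

```shell
# Disable core dumps for this shell and all child processes
# (put this in your batch script before srun)
ulimit -S -c 0

# Confirm the soft core-file size limit is now zero
ulimit -S -c
```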

MPI

MPI jobs on Clariden must be started with the Shasta MPI integration:

srun --mpi=cray_shasta ...

MPI may need longer than the default timeout to initialize in large-scale runs. As a precaution, we recommend increasing the timeout from the default of 180 seconds to 300 seconds:

export PMI_MMAP_SYNC_WAIT_TIME=300
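Putting the MPI-related recommendations on this page together, a job script might look like the following sketch (node count and application name are placeholders, not prescribed values):

```shell
#!/bin/bash
#SBATCH --time=06:00:00        # within the temporary 6-hour limit
#SBATCH --nodes=128            # placeholder node count

# Give PMI more time to synchronize at large scale (default: 180 s)
export PMI_MMAP_SYNC_WAIT_TIME=300

# Avoid flooding the filesystem with core dumps if the job crashes
ulimit -S -c 0

# Launch with the Shasta MPI integration
srun --mpi=cray_shasta ./my_app   # my_app is a placeholder
```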

NCCL

See the container engine documentation for information on using NCCL in containers. The NCCL documentation contains general information on configuring NCCL. This is especially important when using uenvs, as the relevant environment variables are not set automatically. Because Clariden and Santis do not provide CPE, Gordon Bell teams are strongly encouraged to validate their NCCL and MPI configuration in the exact runtime environment they plan to use for production.
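When validating a uenv- or container-based setup, one simple way to confirm NCCL picks up the intended network transport is to enable its standard debug logging during test runs (these are generic NCCL variables, not Clariden-specific settings; disable them for production runs, as the logging is verbose):

```shell
# Print NCCL initialization and network-selection details to stderr
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET
```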