
Gordon Bell runs 2026

Info

Gordon Bell runs will take place on Clariden from Tuesday, April 7, 2026 to Monday, April 13, 2026.

During this period, Clariden will be temporarily expanded to 2300 GH200 nodes, Daint will operate with reduced compute capacity, and Santis will be unavailable.

All times on this page are local CSCS time (Europe/Zurich, CEST).

Warning

During the daily reserved window, Clariden will be dedicated to Gordon Bell teams and unavailable for regular user jobs.

Regular user jobs can still be submitted at any time and will be scheduled automatically during the open-access windows.

Clariden

Connecting

Connecting to Clariden via SSH works the same way as for Daint and Santis; see the SSH guide for more information.

Add the following to your SSH configuration to connect directly with ssh clariden.

Host clariden
    HostName clariden.alps.cscs.ch
    ProxyJump ela
    # change cscsusername to your CSCS username
    User cscsusername
    IdentityFile ~/.ssh/cscs-key
    IdentitiesOnly yes

Reservations

The cluster reconfiguration starts on April 7 at 07:00 and is expected to complete by approximately 12:00. The special Gordon Bell configuration is currently planned to remain in place until April 13 at 07:00, when the cluster will be restored to its standard layout. The temporary Gordon Bell layout provides 2300 nodes on Clariden. In practice, around 2050 nodes are expected to be available for production runs.

Currently planned Gordon Bell reservation windows:

Date        Time (CEST)        Status
April 7     07:00 - ca. 12:00  Cluster reconfiguration to the Gordon Bell layout
April 7     14:00 - 20:00      440 nodes reserved for Gordon Bell runs
April 7     14:00 - 18:00      600 nodes reserved for Apertus training runs
April 7/8   20:00 - 08:00      600 nodes reserved for Apertus training runs
April 8     08:00 - 12:00      1500 nodes reserved for Gordon Bell runs
April 8     12:00 - 21:00      2300 nodes reserved for Gordon Bell runs
April 8/9   21:00 - 11:00      600 nodes reserved for Apertus training runs
April 9     11:00 - 15:00      2300 nodes reserved for Gordon Bell runs
April 9/10  16:00 - 06:00      600 nodes reserved for Apertus training runs
April 9/10  16:00 - 06:00      1200 nodes reserved for Apertus training runs (MoE runs)

These times may still change as the week progresses. Any updates, including possible weekend extensions, will be communicated as early as possible. When there are non-reserved nodes available, regular user jobs will be scheduled as usual, with the maximum job length temporarily reduced to 6 hours.
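For regular jobs during the open-access windows, this means requesting at most six hours of walltime. A minimal batch-script sketch (the project account and node count below are placeholders, not values from this page):

```shell
#!/bin/bash
#SBATCH --job-name=open-access-run
#SBATCH --time=06:00:00        # temporary maximum job length during Gordon Bell week
#SBATCH --nodes=4              # placeholder node count
#SBATCH --account=<project>    # replace with your CSCS project account

srun ./my_app
```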

On April 13, the special Gordon Bell configuration ends at 07:00, when the cluster will be restored to its standard layout.

Software environment

Clariden and Santis use the same software image during this period:

  • USS 1.3.1
  • NVIDIA driver 590
  • No CPE software stack

Daint uses a different image:

  • USS 1.1.0
  • NVIDIA driver 550
  • CPE software stack available

Warning

Clariden does not provide the CPE software stack. Gordon Bell teams should therefore prepare and validate their software environment on Clariden (or Santis), rather than on Daint. For most users, this means using a uenv, a container-based workflow, or a self-managed software stack.

Storage

The same shared filesystems are available across Daint, Clariden, and Santis:

  • capstor, iopsstor, and VAST are mounted on Clariden during the Gordon Bell runs
  • Home is shared between Daint, Clariden, and Santis
  • Scratch spaces are shared between Daint, Clariden, and Santis
  • Store/Project filesystems are mounted

For most large-scale run data, staging, and scratch-like workloads, /capstor/scratch/cscs/${USER} is the recommended choice.

Lustre striping

Uenvs

uenv images can be striped across multiple OSTs on the Lustre filesystem, which can significantly improve I/O performance for large files. Striping is applied automatically to all uenv images in repositories that were created in the last few months, or in older repositories that have since been updated. If your uenv images were created before this change, update your repository to apply striping to the existing images.

Check and update uenv repositories for striping
$ uenv repo status
$ uenv repo update

Disabling core-dumps

If a large job crashes and attempts to write core-dump files from thousands of processes, it can overwhelm the filesystem. We therefore strongly recommend disabling core dumps with the following command:

Disable writing of core-dump files
$ ulimit -S -c 0
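The effect can be checked in the same shell: `ulimit -c` without a value prints the current limit.

```shell
# Disable core dumps for this shell and all child processes
ulimit -S -c 0
# Print the current soft core-file size limit (should now be 0)
ulimit -S -c
```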

MPI

MPI jobs on Clariden must be started with the Shasta MPI integration:

srun --mpi=cray_shasta ...

MPI may need longer than the default timeout to initialize in large-scale runs. As a precaution, we recommend increasing the timeout from the default 180 seconds to 300 seconds:

export PMI_MMAP_SYNC_WAIT_TIME=300
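Putting the two settings together, a launch step might look as follows; the tasks-per-node value is illustrative, not a recommendation from this page:

```shell
# Extend the PMI initialization timeout before launching (see above)
export PMI_MMAP_SYNC_WAIT_TIME=300
# Launch with the Shasta MPI integration (4 ranks per node as an example)
srun --mpi=cray_shasta --ntasks-per-node=4 ./my_app
```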

NCCL

See the container engine documentation for information on using NCCL in containers. The NCCL documentation contains general information on configuring NCCL. This is especially important when using uenvs, as the relevant environment variables are not set automatically. Because Clariden and Santis do not provide CPE, Gordon Bell teams are strongly encouraged to validate their NCCL and MPI configuration in the exact runtime environment they plan to use for production.
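As a starting point for such validation, NCCL's standard debug variables can confirm which transport and network plugin NCCL selects at startup. These are generic NCCL settings, not Clariden-specific tuning; consult the NCCL documentation for tuning values.

```shell
# Generic NCCL diagnostics for a small validation run (disable for production):
export NCCL_DEBUG=INFO             # log NCCL's transport and plugin selection
export NCCL_DEBUG_SUBSYS=INIT,NET  # restrict logs to init and networking subsystems
```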