Scheduled Maintenance and System Unavailability Policy
To ensure the reliability and performance of the Alps production vClusters, CSCS continues to implement rolling updates that reduce downtime during routine maintenance. However, regular maintenance interventions that make the system unavailable are still necessary at this stage.
Advance notice
We strive to announce scheduled system unavailability at least one week in advance. In some cases, earlier notice may be possible, although this depends on external factors and internal approval processes.
Shared infrastructure
Alps is a shared research infrastructure supporting a diverse range of research communities, partners, and projects. Occasionally, the system may be temporarily dedicated to specific scientific projects to enable large-scale capability runs.
Maintenance and availability cadence
To help users plan their activities within each allocation quarter, we provide the following tentative cadence of system unavailability. This schedule is subject to change based on operational requirements:
Routine maintenance
- Cadence: Weekly, as needed
- Typical duration: Half a day; occasionally up to one full day
Extraordinary maintenance
- Cadence: At least once per quarter
- Typical duration: Two days; may be extended if necessary
Dedicated large-scale capability runs of scientific projects
- Cadence: At most once per quarter
- Typical duration: One week
Communication and feedback
CSCS values the constructive feedback provided by users. We will use this input to enhance our communication practices and to develop mitigation strategies for scheduled events that may significantly impact system usability.