Cluster Service – Introduction

The Cluster Service allows users to provision on-demand compute clusters composed of virtual machines or bare metal nodes. Designed for AI, ML, and general-purpose workloads, it provides a streamlined way to deploy and manage multiple instances through a unified UI and API. Networking, storage, and image management are abstracted and handled by the platform, so users can focus on running workloads without managing infrastructure complexity.

Benefits of the Cluster Service

  • No networking complexity: Instances are auto-connected within a shared project network.
  • Easy provisioning: Minimal configuration via a single pane of glass covering compute, networking, and storage.
  • Use only what you need: Fractional flavours reduce cost for inference, testing, and right-sized workloads.
  • Run GPU workloads out of the box: CUDA-enabled images are available and ready to use.
  • Full control over lifecycle: Start, stop, and delete instances without requiring orchestration tools.
  • OS-level access: SSH into any node with root access to configure and operate the system directly.

Architecture

This diagram illustrates two high-level deployment models for compute infrastructure:
image.png
  • Left side – Virtualised architecture: Compute resources are virtualised using a hypervisor running on top of a host OS. Virtual machines (VMs) are provisioned for CPU or GPU workloads. GPU VMs include a guest OS with CUDA support to enable accelerated computing.
  • Right side – Bare-metal architecture: Applications run directly on the host OS without a virtualisation layer. CUDA is installed at the host level, providing direct access to GPU resources for maximum performance.
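Once connected to a node (see Accessing the Cluster below), you can check which of the two models it runs on. This sketch assumes the image ships systemd, which provides the systemd-detect-virt utility:

```shell
# systemd-detect-virt prints the hypervisor type (e.g. "kvm") on a VM
# and "none" on bare metal; it exits non-zero when nothing is detected,
# so the fallback keeps the command from failing where the tool is absent.
systemd-detect-virt 2>/dev/null || echo "detection unavailable"
```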

Cluster Provisioning

To create a new cluster, navigate to the Clusters section under Compute in the left menu.
compute_cluster_selection.png
Select Create new resource.
cluster_new.png
Provide a name, select a project, and choose the deployment region. In the next step, define the workload pool configuration:
  • Set the pool name using alphanumeric characters and hyphens. cluster_set_up_1.png
  • Select the desired node type. cluster_set_up_2.png
  • Enable persistent storage and specify the desired volume size.
  • Choose the operating system version and define the number of replicas.
  • Enable or disable public IP exposure as needed.
  • Configure firewall rules to control inbound access (e.g., allow SSH on port 22). cluster_set_up_3.png
  • Optionally, you can provide a cloud-init configuration to run custom setup scripts during the first boot of each node. cluster_set_up_4-cloud_init.png
Once the configuration is complete, review the cluster and provision it. When provisioning is complete, the cluster status will be set to ‘Provisioned’, and connection details, including the SSH private key and workload pool IP addresses, will be available in the cluster overview.
cluster_set_up_5.png
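The cloud-init option above accepts standard cloud-config user data. A minimal illustrative example follows; the package name and file paths are arbitrary examples, not platform requirements:

```yaml
#cloud-config
# Runs once, on the first boot of each node in the pool.
packages:
  - htop                      # example package to preinstall
write_files:
  - path: /etc/motd
    content: |
      Provisioned by the Cluster Service workload pool.
runcmd:
  - echo "node initialised $(date -u)" >> /var/log/first-boot.log
```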

Accessing the Cluster

This section describes how to access your cluster using SSH. You will download the private key, identify the public IP address of the instance, and establish a secure connection using a supported terminal.

Prerequisites

The following information is required to establish a connection. Refer to the steps below to locate each item.
  • The public IP address of the instance. This is shown in the workload pool details after provisioning.
  • The SSH private key associated with the cluster. This key is available for download from the cluster UI.
  • The SSH username. For Ubuntu-based images, the default username is cloud-user.

Connecting to the cluster

1. Download the private key

Navigate to the cluster view and select Download SSH private key.
access_your_cluster_1.png
This key is used for authentication when initiating an SSH session. Store the file securely and set the correct permissions. Use the following command to create a temporary file containing the SSH private key. This example writes the key to /tmp/ssh-key-193.143.123.227 and sets secure permissions to restrict access to the current user.
cat > /tmp/ssh-key-193.143.123.227 <<'KEY'
-----BEGIN OPENSSH PRIVATE KEY-----
<REDACTED PRIVATE KEY CONTENT>
-----END OPENSSH PRIVATE KEY-----
KEY

chmod 600 /tmp/ssh-key-193.143.123.227
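If SSH later rejects the key with an "UNPROTECTED PRIVATE KEY FILE" warning, the file permissions are usually the cause. A quick way to confirm the mode, shown here against a placeholder file since the real key is private:

```shell
# Create a placeholder file with the same 600 mode the real key needs.
install -m 600 /dev/null /tmp/demo-ssh-key

# SSH refuses private keys that are readable by other users;
# the printed mode should be exactly 600.
stat -c '%a' /tmp/demo-ssh-key
```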

2. Locate the public IP address

From the cluster dashboard:
  • Choose All pools.
  • Select the workload pool to view its details.
  • Under Network, find the Public IP field.
access_your_cluster_2.png
In this example, the cluster pool is assigned the public IP address 193.143.123.227.

Remove the previous host key (optional)

If the VM was re-created or reassigned, the SSH host key may have changed. Use the following command to remove the previous host key entry from the known hosts file:
ssh-keygen -R 193.143.123.227
This step prevents host key verification errors during SSH connections.

3. Connect using SSH

Run the following command from your terminal:
ssh -i /tmp/ssh-key-193.143.123.227 [email protected]
If prompted to confirm the authenticity of the host, type yes to proceed. Upon successful login, the shell prompt appears:
cloud-user@user-guide-pool-<pool-id>:~$
This confirms that the connection is active and the key was accepted. The prompt includes:
  • cloud-user — the Linux user account used for the SSH session.
  • user-guide-pool-<pool-id> — the hostname of the instance, where <pool-id> is the unique identifier of the workload pool this VM belongs to.
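Typing the key path and IP address for every session gets tedious. One option is a per-cluster SSH configuration file; the alias name cluster-node below is an arbitrary example, and the file is kept outside ~/.ssh so it does not touch your global configuration:

```shell
# Write a standalone SSH config for this cluster.
cat > /tmp/cluster-ssh-config <<'EOF'
Host cluster-node
    HostName 193.143.123.227
    User cloud-user
    IdentityFile /tmp/ssh-key-193.143.123.227
EOF

# Connect with: ssh -F /tmp/cluster-ssh-config cluster-node
grep HostName /tmp/cluster-ssh-config
```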

Monitoring and observability

Users can monitor resource utilisation directly from the operating system using htop, the interactive process viewer:
htop
More information can be found in the htop GitHub repository.
htop.png
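For scripted or one-off checks, the standard non-interactive tools complement htop. On GPU nodes provisioned with a CUDA image, nvidia-smi (commented out below, since it requires an NVIDIA driver) reports GPU utilisation:

```shell
free -h        # memory and swap usage
df -h /        # root filesystem usage
uptime         # load averages and time since boot
# nvidia-smi   # GPU utilisation, on nodes with NVIDIA drivers installed
```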

Create a new pool on the existing cluster

Once inside the cluster, open the Workload Pools tab.
Screenshot from 2025-10-06 11-18-40.png
Use the “+ Add Workload Pool” button to start the new pool configuration.
Screenshot from 2025-10-06 11-19-59.png
Define the name, base image, instance type, replica count, and any required firewall rules, following the steps outlined in the Cluster Provisioning section. Finalise by selecting Create. The new pool will appear in the list as provisioned.
Screenshot from 2025-10-06 13-43-07.png

Evicting a VM from an existing pool

It is possible to evict a Virtual Machine from an existing pool. This process does not automatically migrate existing workloads from the evicted VM to other Virtual Machines in the pool; the user must relocate the workload manually before eviction. Select the VM to evict from the pool by clicking the three-dot menu on the right.
Screenshot from 2025-10-06 13-50-39.png
After a short while, the VM will be removed from the pool, its resources returned to the available resource pool, and any public IP associated with the VM released. This process will also delete any data stored on ephemeral storage; to retain data, persistent storage must be configured separately.
Screenshot from 2025-10-06 13-51-56.png
The pool should now show one fewer active Virtual Machine than it originally contained.
Screenshot from 2025-10-06 13-52-31.png

Deleting a pool from an existing cluster

This process operates at a higher level than deleting a single VM: the user does not need to evict the VMs individually. Once the pool is deleted, all VMs within it will be evicted automatically, and all associated resources, including ephemeral data and public IPs, will be released from the cluster. Ensure that any data you wish to retain is saved to persistent volumes beforehand.
Select the pool you want to delete, click the three-dot menu on the right, and choose Delete Pool.
Screenshot from 2025-10-06 13-49-11.png
To confirm the delete operation, enter the name of the pool in the confirmation panel.
Screenshot from 2025-10-06 13-53-35.png
The selected pool is no longer part of the cluster.
Screenshot from 2025-10-06 13-54-33.png

Delete a cluster

The cluster user-guide-test contains one or more workload pools with running Virtual Machines. Before initiating the deletion process, ensure that any data that needs to persist is stored in a persistent volume. Ephemeral data will be permanently deleted. In the Workload Pools tab, click the green Provisioned button in the top-right corner of the cluster view.
Select Delete cluster from the dropdown menu to begin the deletion process.
Screenshot from 2025-10-06 13-55-08.png
Deleting a cluster will permanently remove all resources associated with it, including:
  • All workload pools
  • All Virtual Machines
  • Any ephemeral storage
  • Any assigned public IPs
All resources will be evicted and released back to the shared resource pool.
A confirmation panel appears. To confirm the operation, enter the name of the cluster in the dialogue box. Once the deletion is initiated, the cluster status changes to Deprovisioning.
When the process completes, the cluster is removed from the Compute Clusters list and all associated resources are fully deallocated.
Screenshot from 2025-10-06 13-56-15.png