NOTE: This will be the last release supporting Deployment Manager. Please migrate your workflows to Terraform (found in the tf folder). All future features / functionality will be integrated with Terraform.
The following describes how to set up a Slurm cluster on Google Cloud Platform, how to burst out from an on-premise cluster to nodes in Google Cloud Platform, and how to set up a multi-cluster/federated configuration with a cluster that resides in Google Cloud Platform.
Also, check out the Slurm on GCP code lab.
The supplied scripts can be modified to work with your environment.
SchedMD provides professional services to help you get up and running in the cloud environment. SchedMD Commercial Support
Issues and/or enhancement requests can be submitted to SchedMD's Bugzilla.
Also, join community discussions on either the Slurm User mailing list or the Google Cloud & Slurm Community Discussion Group.
The supplied scripts can be used to create a stand-alone cluster in Google Cloud Platform. The scripts setup the following scenario:
The default image for the instances is CentOS 7.
On the controller node, slurm is installed in:
/apps/slurm/
The login nodes mount /apps and /home from the controller node.
To deploy, you must have a GCP account and either have the GCP Cloud SDK installed on your computer or use the GCP Cloud Shell.
Steps:
Edit the slurm-cluster.yaml file and specify the required values.
NOTE: For a complete list of available options and their definitions, check out the schema file.
Spin up the cluster.
Assuming that you have gcloud configured for your account, you can just run:
$ gcloud deployment-manager deployments [--project=<project id>] create slurm --config slurm-cluster.yaml
Check the cluster status.
You can see the status of the deployment at https://console.cloud.google.com/deployments and view the new instances at https://console.cloud.google.com/compute/instances
To verify the deployment, ssh to the login node and run sinfo to see how many nodes have registered and are in an idle state.
A message will be broadcast to the terminal when the installation is complete. If you log in before the installation is complete, you will either need to re-log in after the installation is complete or start a new shell (e.g. /bin/bash) to get the correct bash profile.
$ gcloud compute [--project=<project id>] ssh [--zone=<zone>] g1-login0
...
[bob@g1-login0 ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      8  idle~ g1-compute-0-[2-9]
debug*       up   infinite      2   idle g1-compute-0-[0-1]
NOTE: By default, Slurm will hide nodes that are in a power_save state -- "cloud" nodes. The GCP Slurm scripts configure PrivateData=cloud in the slurm.conf so that the "cloud" nodes are always shown. This is done so that nodes that get marked down can be easily seen.
Submit jobs on the cluster.
[bob@g1-login0 ~]$ sbatch -N2 --wrap="srun hostname"
Submitted batch job 2
[bob@g1-login0 ~]$ cat slurm-2.out
g1-compute-0-0
g1-compute-0-1
Tearing down the deployment.
$ gcloud deployment-manager [--project=<project id>] deployments delete slurm
NOTE: If additional resources (e.g. instances, networks) were created beyond those from the default deployment, they will need to be destroyed before the deployment can be removed.
To deploy, you must have a GCP account and either have the GCP Cloud SDK and Terraform installed on your computer or use the GCP Cloud Shell.
Steps:
Edit the basic.tfvars file and specify the required values.
$ terraform init
$ terraform apply -var-file=basic.tfvars
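As an illustration, a minimal basic.tfvars might look like the following. The variable names and values shown here are placeholders, not the authoritative set; check the tf folder's variables file for the real names and the full list of required values.

```
# Hypothetical sketch of basic.tfvars -- verify names against the
# tf module's variables file before use.
cluster_name = "g1"
project      = "<project id>"
zone         = "<zone>"
```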
Tearing down the cluster
$ terraform destroy -var-file=basic.tfvars
NOTE: If additional resources (e.g. instances, networks) were created beyond those from the default deployment, they will need to be destroyed before the cluster can be torn down.
The deployment will create a compute image that is used to create the compute node instances.
NOTE: When creating a compute image that has gpus attached, the process can take about 10 minutes.
If the compute image needs to be updated, it can be done with the following command:
$ gcloud compute images create <cluster_name>-compute-#-image-$(date '+%Y-%m-%d-%H-%M-%S') \
--source-disk <instance name> \
--source-disk-zone <zone> --force \
--family <cluster_name>-compute-#-image-family
Existing images can be viewed on the console's Images page.
There are two files, custom-controller-install and custom-compute-install, in the scripts directory that can be used to add custom installations for the given instance type. The files will be executed during startup of the instance types.
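For example, a minimal custom-compute-install sketch might look like the following. The log path and package choice are arbitrary illustrations, not part of the supplied scripts:

```shell
#!/bin/bash
# Hypothetical custom-compute-install sketch: runs once at instance startup.
# The log path and the htop package are illustrative choices, not defaults.
set -e

LOG=/tmp/custom-compute-install.log

echo "custom compute install started: $(date)" >> "$LOG"

# Default images are CentOS 7, so use yum when it is available; guard on
# yum and root so the script is a no-op on other bases.
if command -v yum >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    yum install -y -q htop >> "$LOG" 2>&1 || echo "yum install failed" >> "$LOG"
fi

echo "custom compute install finished: $(date)" >> "$LOG"
```

Because these scripts run during startup, keep them idempotent and fast; a long-running install delays node availability.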
There are multiple ways to connect to the compute nodes:
Use srun to get a shell on a compute node:
[g1-login0 ~]$ srun --pty $SHELL
[g1-compute-0-0 ~]$
By default, all instances are configured with OS Login.
OS Login lets you use Compute Engine IAM roles to manage SSH access to Linux instances and is an alternative to manually managing instance access by adding and removing SSH keys in metadata. https://cloud.google.com/compute/docs/instances/managing-instance-access
This keeps user UIDs and GIDs consistent across all instances.
When sharing a cluster with non-admin users, the following IAM rules are recommended:
To allow ssh to login nodes without external IPs, configure IAP for the group.
This allows users to access the cluster only through the login nodes.
With preemptible_bursting enabled, when a node is found preempted or stopped, the slurmsync script will mark the node "down" and attempt to restart it. Any batch jobs that were on the preempted node will be requeued; interactive jobs (e.g. srun, salloc) cannot be requeued.
Bursting out from an on-premise cluster is done by configuring ResumeProgram and SuspendProgram in slurm.conf to point to resume.py and suspend.py from the scripts directory. config.yaml should be configured so that the scripts can create and destroy compute instances in a GCP project. See the Cloud Scheduling Guide for more information.
Pre-reqs:
There are two options: 1) setup DNS between the on-premise network and the GCP network or 2) configure Slurm to use NodeAddr to communicate with cloud compute nodes. In the end, the slurmctld and any login nodes should be able to communicate with cloud compute nodes, and the cloud compute nodes should be able to communicate with the controller.
Configure DNS peering
Use IP addresses with NodeAddr
Set TreeWidth=65533 in slurm.conf and set update_node_addrs to true in config.yaml.
Create a base instance
Create a bare image and install and configure the packages (including Slurm) that you normally use for a Slurm compute node. Then create an image from it, creating an image family either in the form
"
Create a service account and service account key that will have access to create and delete instances in the remote project.
Install scripts
Install resume.py, suspend.py, slurmsync.py and config.yaml.example from the slurm-gcp repository's scripts directory to a location on the slurmctld. Rename config.yaml.example to config.yaml and modify the appropriate values.
Add the path of the service account key to google_app_cred_path in config.yaml.
Add the compute_image_family to each partition if it differs from the naming
schema, "
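Putting the two settings above together, a sketch of the relevant config.yaml fields might look like the following. Only google_app_cred_path and compute_image_family are named in this document; the surrounding layout is illustrative and should be checked against config.yaml.example:

```
# Sketch of config.yaml fields -- verify the layout against
# config.yaml.example from the scripts directory.
google_app_cred_path: /path/to/service-account-key.json
partitions:
  - name: partition1                                  # hypothetical entry
    compute_image_family: my-custom-compute-family    # if not using the default naming schema
```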
Modify slurm.conf:
PrivateData=cloud
SuspendProgram=/path/to/suspend.py
ResumeProgram=/path/to/resume.py
ResumeFailProgram=/path/to/suspend.py
SuspendTimeout=600
ResumeTimeout=600
ResumeRate=0
SuspendRate=0
SuspendTime=300
# Tell Slurm to not power off nodes. By default, it will want to power
# everything off. SuspendExcParts will probably be the easiest one to use.
#SuspendExcNodes=
#SuspendExcParts=
SchedulerParameters=salloc_wait_nodes
SlurmctldParameters=cloud_dns,idle_on_node_suspend
CommunicationParameters=NoAddrCache
LaunchParameters=enable_nss_slurm
SrunPortRange=60001-63000
Add a cronjob/crontab entry to run slurmsync.py as SlurmUser.
e.g.
*/1 * * * * /path/to/slurmsync.py
Test
Try creating and deleting instances in GCP by calling the commands directly as SlurmUser.
./resume.py g1-compute-0-0
./suspend.py g1-compute-0-0
The simplest way to handle user synchronization in a hybrid cluster is to use nss_slurm. This permits passwd and group resolution for a job on the compute node to be serviced by the local slurmstepd process rather than some other network-based service. User information is sent from the controller for each job and served by the slurm step daemon. nss_slurm needs to be installed on the compute node image, which it is when the image is created with Deployment Manager or Terraform. For details on how to configure nss_slurm, see https://slurm.schedmd.com/nss_slurm.html.
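Concretely, enabling nss_slurm on a node image amounts to adding the slurm source to the passwd and group lines of /etc/nsswitch.conf; the ordering shown here is one reasonable choice, and the nss_slurm page covers the details:

```
# /etc/nsswitch.conf on the compute node image:
passwd: slurm files
group:  slurm files
```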
Slurm allows the use of a central SlurmDBD for multiple clusters. By doing this, it also allows the clusters to be able to communicate with each other. This is done by the client commands first checking with the SlurmDBD for the requested cluster's IP address and port which the client then uses to communicate directly with the cluster.
Some possible scenarios:
The following considerations are needed for these scenarios:
For more information see:
Multi-Cluster Operation
Federated Scheduling Guide
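For example, pointing a cluster at the central SlurmDBD is a slurm.conf change on each member cluster; the hostname below is a placeholder:

```
# In each cluster's slurm.conf: use the shared slurmdbd for accounting.
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=central-dbd-host   # placeholder hostname
ClusterName=g1                           # must be unique per cluster
```

Once each cluster has registered with the SlurmDBD (e.g. via sacctmgr add cluster g1), client commands can target other clusters with the -M/--clusters option, such as squeue -M g1 or sbatch -M g1 job.sh.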
$ scontrol setdebugflags +powersave
...
$ scontrol setdebugflags -powersave
Cluster environment not fully coming up
For example: