Running Jobs

This page discusses setting up and running jobs with Slurm.

Generally speaking, to run a job on a cluster you will need the following:

  • to have an Account
  • to be associated with a Project
  • to have an active SSH connection to the cluster (see the example below)
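
For example, an SSH session to a cluster login node can be opened from a terminal. This is a minimal sketch with a placeholder hostname; use the login node address for the cluster you have access to:

ssh username@<cluster_login_node>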

Batch jobs run non-interactively under the control of a "batch script" (also called a SLURM job script). These are plain text files containing a number of job directives followed by Linux commands or utilities. Batch scripts are submitted to the batch scheduler, where they are queued until the requested resources become available.

All ARCC resources use batch scheduling software to monitor and run jobs, with the goal of making the best possible use of each system's resources, including processors, memory, and disk space.

A handy migration reference comparing Moab/Torque commands to SLURM commands can be found on the SLURM home site: Batch system Rosetta stone

How to Set Up a Job in Slurm
To run a job on a cluster, first use your favorite editor to prepare a shell script that invokes whatever commands you want to run, including your own programs.

SLURM Scripts

In SLURM scripts, lines beginning with "#SBATCH" are job directive lines. These directives feed the batch system parameters such as how many nodes to reserve for your job and how long to reserve them. Directives can also specify things like what to name the STDOUT file, what account to charge, whether to notify you by email when your job finishes, etc. Many of the useful switches are demonstrated in the example scripts below; you don't have to specify all of them when you submit your job. For example:

#!/bin/bash

### This is a general SLURM script. You'll need to make modifications for this to 
### work with the appropriate packages that you want. Remember that the .bashrc 
### file will get executed on each node upon login and any settings in this script
### will be in addition to, or will override, the .bashrc file settings. Users will
### find it advantageous to use only the specific modules they want or 
### specify a certain PATH environment variable, etc. If you have questions,
### please contact ARCC for help.

### Informational text is indicated by "###". These lines are comments only; leave them as they are.

### Lines beginning with "#SBATCH" are SLURM directives. They tell SLURM what to do.
### For example, #SBATCH --job-name=my_job tells SLURM that the name of the job is "my_job".
### Don't remove the "#SBATCH".

### Job Name
#SBATCH --job-name=my_job

### Declare an account for the job to run under
#SBATCH --account=account_name

### By default, SLURM merges the standard output and error streams and writes them
### to a single file in the submission directory named:
### slurm-<jobid>.out
### where <jobid> is the job number assigned when the job is submitted.
### Use the directives below to send the standard output and error streams
### to files of your choosing.
#SBATCH -o stdout_file
#SBATCH -e stderr_file

### To merge the standard output and error streams into a single file, simply omit
### the -e directive above (merging is SLURM's default behavior), or point -o and -e
### at the same file.

### Mailing options: send email when the job begins, ends, or fails
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=username@uwyo.edu

### Specify Resources
### 2 nodes, 16 processors (cores) each node
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

### Set max walltime (days-hours:minutes:seconds)
#SBATCH --time=0-01:00:00

### SLURM starts the job in the directory from which sbatch was run, so no
### explicit cd is needed; that path is available as $SLURM_SUBMIT_DIR.
echo "Working Directory:  $SLURM_SUBMIT_DIR"

### Start the job
### This is the command you would normally run on the command line
srun my_program < input.file
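
Most of these directives can also be given as sbatch command-line options when you submit the script; command-line options override the matching #SBATCH lines. A minimal sketch, using a placeholder script name:

sbatch --nodes=1 --ntasks-per-node=8 --time=0-00:30:00 my_slurm_script.sh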

Hello World script

#!/bin/bash

### If copying this script, don't forget to change the account
### specification below to an account you have access to.
### The interpreter is set by the #!/bin/bash line at the top of this script.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --account=ARCC
#SBATCH --time=0-00:01:00

### Print a hello message indicating which hosts the job was allocated
echo "Hello World from hosts: $SLURM_NODELIST"

Submitting a job
Now that you have created a script, all you have to do is submit it to the batch scheduler using the sbatch command:

[username@mmmlog4 ~]$ sbatch myJobScript

For example, submitting the Hello World script returns something like this:

[username@mmmlog4 ~]$ sbatch HelloWorld.sh
Submitted batch job 39

You can use the squeue command to display the status of all your jobs:

$ squeue -u yourUserName
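
The output typically looks something like this (the partition and node names below are placeholders and will differ on your cluster):

  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     39     batch HelloWor username  R       0:05      1 m001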

You can use scancel to delete a particular job from the queue:

$ scancel jobID

Viewing the results
Once your job has completed, look in the directory from which you submitted it. If your script used the -o and -e directives, standard output and standard error are written to the files named there. Otherwise, by default, both streams are written to a single file named slurm-XXXXX.out, where the X's are replaced by the job ID returned by sbatch.
In the Hello World example, which does not redirect its output, the hello message is written to the slurm-XXXXX.out file.
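
For instance, if the Hello World job above was assigned job ID 39, its output could be viewed with cat; the node name shown here is only illustrative:

[username@mmmlog4 ~]$ cat slurm-39.out
Hello World from hosts: m001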

The above information is generally applicable, but be aware that every cluster is different. You will be far more successful in your efforts if you read and understand the documentation concerning the specific cluster you are going to use. For information about the compute resources supported by ARCC, please proceed to the ARCC systems page.