Running Jobs on CSC Clusters using PBS
Use of the cluster compute nodes is managed using the Portable Batch System (PBS). This system essentially keeps users from stepping on each other’s toes by dedicating the requested number of processors to every job. To use the cluster, users submit jobs to an execution queue. PBS locates the requested number of nodes, locks them for the exclusive use of the job, and executes the job on the user’s behalf. To run jobs on the cluster, you need to write PBS scripts that instruct PBS how to run your job.
Writing and Submitting PBS Scripts
A PBS script is essentially a shell script prefixed by several lines that describe how PBS should schedule and run the job. You don’t execute a PBS script at the command line directly, though! PBS scripts should be submitted to the queue using the ‘qsub’ command as shown in the following example.
Here is a minimal PBS batch script:
#PBS -S /bin/bash
#PBS -l nodes=5:ppn=2,ncpus=10
#PBS -l walltime=12:00:00
#PBS -q friendlyq
mpirun -np 10 -machinefile $PBS_NODEFILE ./program
The lines at the top of the file, prefixed with #PBS, are interpreted by PBS when the job is submitted. Common PBS directives control the number of nodes requested, the desired walltime, the desired queue, and output delivery options. The remainder of the PBS script is executed on one of your assigned compute nodes when your job is run.
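As an illustration of the directive types listed above, here is a slightly fuller sketch that also names the job and sets output delivery options. The job name example_job and the log file example_job.log are hypothetical. The -N, -j oe, and -o options are standard PBS options, but the resource limits and queue name here are only placeholders and should be checked against your cluster's configuration.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -N example_job
#PBS -l nodes=1:ppn=1,ncpus=1
#PBS -l walltime=00:10:00
#PBS -q friendlyq
#PBS -j oe
#PBS -o example_job.log

# Everything below runs on one of the assigned compute nodes.
# PBS_O_WORKDIR is the directory qsub was run from; fall back to the
# current directory so the script can also be tried outside PBS.
cd "${PBS_O_WORKDIR:-$PWD}"

# PBS_JOBID is set by PBS; "interactive" is a placeholder for test runs.
JOB_LABEL="${PBS_JOBID:-interactive}"
echo "Job $JOB_LABEL starting in $PWD"
```

With -j oe, standard error is merged into standard output, so the job leaves a single log file to inspect.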
To submit a job to the queue, run the queue submit command, qsub, followed by the name of your PBS script. The queue manager responds with a job identification number, which qsub prints to standard output. The job identification number can be used to delete, hold, or release a PBS job. For example, to delete a job, pass its identification number to the qdel command.
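The submit-and-manage cycle might look like the following sketch. The script name myjob.pbs and the helper job_number are hypothetical. The sketch assumes the usual PBS identifier form of a sequence number followed by a server name, and the standard qhold, qrls, and qdel client commands.

```shell
# Return the numeric sequence part of a PBS job identifier,
# e.g. "12345.server.example.org" -> "12345".
job_number() {
    echo "${1%%.*}"
}

# Only attempt a submission where qsub and the (hypothetical) script exist.
if command -v qsub >/dev/null 2>&1 && [ -f myjob.pbs ]; then
    JOBID=$(qsub myjob.pbs)    # qsub prints the new job's identifier
    echo "Submitted job $(job_number "$JOBID")"

    qhold "$JOBID"             # hold: keep the job from starting
    qrls  "$JOBID"             # release: make it eligible to run again
    qdel  "$JOBID"             # delete: remove it from the queue
fi
```

In practice you would run only one of the three management commands at a time. They are listed together here to show that each takes the job identification number as its argument.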
To view the status of jobs waiting and running in the queues, use the queue status command, qstat. It returns a list of every job waiting and executing on the system:
hemisphere.cs.colorado.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
246247.hemisphe username friendly dist_82      2353   1   2     -- 120:0 R 10:46
246417.hemisphe username friendly dist_lar_4  19509   1   2     -- 120:0 R 109:0
246418.hemisphe username friendly dist_lar_5          1   2     -- 120:0 Q
246419.hemisphe username friendly dist_lar_6          1   2     -- 120:0 Q
Jobs can be in several states, as shown in the ‘S’ column: ‘Q’ means queued, ‘R’ means running, and ‘E’ means exiting. (Jobs shouldn’t remain in the exiting state for long; if a job takes more than 10 seconds to end, there’s probably a problem with the system.) For detailed information about any job, use qstat with the -f option followed by the job identification number.
This full job status report includes a textual comment that can be helpful in determining why a queued job is not running.
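The status queries described above can be sketched as follows. The job number is a placeholder taken from the sample listing, and the helper qstat_field is hypothetical. The listing's column layout resembles the output of qstat -a, which is an inference rather than something the text states. The sketch also assumes that the full report prints attributes as "name = value" lines.

```shell
# Print the value of one attribute from `qstat -f` output read on stdin.
# Attribute lines look like "    name = value"; wrapped continuation
# lines of long values are ignored by this simple version.
qstat_field() {
    awk -v name="$1" -F' = ' '
        { key = $1; gsub(/^[ \t]+/, "", key) }
        key == name { print $2 }
    '
}

JOBID=246418    # placeholder job number; substitute your own

if command -v qstat >/dev/null 2>&1; then
    qstat -a                                   # one line per job, all users
    qstat -u "$USER"                           # only your own jobs
    qstat -f "$JOBID" | qstat_field job_state  # state letter: Q, R, E
    qstat -f "$JOBID" | qstat_field comment    # why a queued job is waiting
fi
```

Filtering the full report down to the comment line is often the quickest way to see why a queued job has not started.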
Once a job terminates, it disappears from the queue and qstat can no longer retrieve information about it. To examine the status of recently completed jobs, use the queue history command:
Running Parallel Jobs using MPI
Because CSC systems frequently include multiple versions of MPI, PBS files used to run MPI programs require specific configuration information. First, the full path to mpirun must be specified. This is usually in the /opt/mpich-* directory and must match the version used to compile and link the program. Second, the mpirun command must include the “-machinefile $PBS_NODEFILE” argument as shown below:
#PBS lines at top of script
/opt/mpich-version/bin/mpirun -machinefile $PBS_NODEFILE -np 10 ./program
If you accidentally omit the -machinefile argument, then your jobs may be run on a random collection of nodes instead of the nodes that PBS allocated for you. In this case, your jobs will be flagged as queue violators and will be terminated when the cluster’s automatic queue compliance checker executes.
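One way to guard against this mistake is to derive the process count from the node file itself, so the -np value and the allocation cannot drift apart. The sketch below assumes the node file lists one line per allocated processor slot, which is the usual behavior but worth confirming on your system. The helper count_slots is hypothetical, and /opt/mpich-version is the placeholder path from the text above, not a real installation directory.

```shell
# Count the processor slots PBS allocated.
# Assumption: the node file holds one line per slot.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    # Substitute the MPICH version used to compile and link ./program.
    /opt/mpich-version/bin/mpirun -machinefile "$PBS_NODEFILE" -np "$NP" ./program
else
    echo "PBS_NODEFILE is not set; run this as a PBS job" >&2
fi
```

Because the script does not launch mpirun when PBS_NODEFILE is missing, it cannot start processes on nodes that were not allocated to the job.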
Running Single-Processor Jobs
PBS can also be used to run single-processor jobs, which makes running parameter studies easy. The only item of concern is the job’s memory requirement. If a job uses less than the amount of memory available to each CPU in a node, then it’s safe to let another job run on the other CPU. If the job requires more than that amount of memory, then all of the CPUs in the entire node should be reserved.
For example, a Hemisphere node contains 2 GB of memory. If your single processor job requires less than 1 GB of memory, then you can request one node and one processor. If your job requires more than 1 GB of memory, then you should request one node and reserve both processors.
To request one node and one CPU:
#other PBS lines at top of script
#PBS -l nodes=1:ppn=1,ncpus=1
To request one node and reserve both CPUs:
#other PBS lines at top of script
#PBS -l nodes=1:ppn=2,ncpus=2
Jobs that exceed the amount of physical memory available to a node cause swapping and will be terminated, so make sure to reserve the correct number of processors.
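The sizing rule above can be written as a small calculation. This is only a sketch. The helper ppn_for_memory is hypothetical, memory is expressed in MB with 2 GB taken as 2048 MB, and a job that needs exactly one CPU's share is treated conservatively as needing the whole node, a case the text does not specify.

```shell
# Print the ppn value a single-processor job should request.
# Arguments: job memory (MB), node memory (MB), CPUs per node.
ppn_for_memory() {
    job_mb=$1
    node_mb=$2
    cpus=$3
    per_cpu_mb=$(( node_mb / cpus ))
    if [ "$job_mb" -lt "$per_cpu_mb" ]; then
        echo 1          # fits in one CPU's share; the other CPU stays usable
    else
        echo "$cpus"    # needs more; reserve every CPU in the node
    fi
}

# Hemisphere node from the text: 2 GB (2048 MB) shared by 2 CPUs.
PPN=$(ppn_for_memory 1500 2048 2)
echo "#PBS -l nodes=1:ppn=$PPN,ncpus=$PPN"
```

For a 1500 MB job on a Hemisphere node this prints the directive that reserves both CPUs, while a job under 1024 MB would get ppn=1.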