The purpose of the queuing system is (1) to promote the efficient utilization of our computer facilities, and (2) to promote equity of access to those resources across our user community. This page is designed to help users get started with this system.
Users should add special directives for the queue system to their shell scripts. These directives begin with "#PBS" and are used to request resources, declare the required walltime, and redirect standard output and standard error. Users can then submit the job with the "qsub" command, as described in the next section.
A simple example of such a script can be found in the attached apoa.sh, which runs the ApoA1 benchmark. Other examples will be posted soon.
A job name is assigned with a "#PBS -N" statement, and the destination queue is specified with a "#PBS -q" statement:
#PBS -N solution_equilibration_273K
#PBS -q short
Standard output and standard error can be directed to files with the "#PBS -o" and "#PBS -e" directives, respectively. Alternatively, the two streams can be merged into one file with a "#PBS -j oe" directive.
#PBS -e solution_equil.err
#PBS -o solution_equil.log
Users can request resources with a "#PBS -l" statement. Resources include the walltime (in mm:ss or hh:mm:ss format), the number of nodes ("nodes"), and the number of processors per node ("ppn"). The example below gives several alternative node requests to illustrate the syntax; a real script would include only one of them.
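#PBS -l walltime=14:30:00
#PBS -l nodes=1:ppn=4
#PBS -l nodes=1:ppn=8
#PBS -l nodes=n024:ppn=8
#PBS -l nodes=1:ppn=8+1:ppn=4
#PBS -l nodes=n024:ppn=8+1:ppn=8
In order, the five node requests ask for one node with 4 processors; one node with 8 processors; the specific node n024 with 8 processors; one node with 8 processors plus one node with 4 processors; and node n024 with 8 processors plus one more node with 8 processors. Note that "nodes=" appears only once: additional node specifications are joined with "+".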
Some or all of these arguments can also be given at the command line. Command-line settings override any settings in the script.
[bashprompt]$ qsub -q short -l walltime=5:00:00 -l nodes=2:ppn=8 -e test_stderr.txt ./test_simulation.sh
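Walltimes such as 14:30:00 or 5:00:00 above are easier to compare against a queue's limit once they are converted to seconds. The following is a minimal sketch of that conversion. The helper name walltime_to_seconds is hypothetical. Besides the mm:ss and hh:mm:ss forms described above, it also accepts a plain number of seconds.

```shell
#!/bin/bash
# Sketch: convert a PBS walltime string (ss, mm:ss, or hh:mm:ss) to seconds,
# e.g. to compare a request with a queue's maximum walltime.
# walltime_to_seconds is a hypothetical helper name, not a PBS command.
walltime_to_seconds() {
    local IFS=:
    local -a parts
    read -r -a parts <<< "$1"
    # Accept one to three colon-separated fields.
    if (( ${#parts[@]} < 1 || ${#parts[@]} > 3 )); then
        echo "unsupported walltime: $1" >&2
        return 1
    fi
    local total=0 p
    for p in "${parts[@]}"; do
        if [[ ! $p =~ ^[0-9]+$ ]]; then
            echo "unsupported walltime: $1" >&2
            return 1
        fi
        # 10# forces base 10 so fields such as "08" are not read as octal.
        total=$(( total * 60 + 10#$p ))
    done
    echo "$total"
}
```

For example, "walltime_to_seconds 14:30:00" prints 52200, which is below the 86400 seconds (24 hr) allowed in the short queue.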
Some notes and suggestions for users:
The following commands can be used to submit and manage jobs:
| command | purpose |
|---|---|
| qsub jobscript | Submit the job in script jobscript. Accepts the other arguments discussed above. |
| qsub -I -l nodes=1:ppn=4 | Request an interactive job with the indicated resources. |
| qdel jobID | Delete job number jobID. This appears to kill the processes on the compute nodes cleanly. |
| qstat | List active jobs. |
| qstat -f jobID | List detailed information for job number jobID. |
| qnodes | List all nodes with their state and properties. |
| qnodes -l down | List nodes that are currently down. |
| qnodes -l active | List nodes currently in use by jobs. |
| qnodes -l free | List nodes that are currently free. |
| qmgr -c "print server" | Print server and queue configuration details. |
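The commands in the table above are often combined. A typical sequence is to submit a script, capture the job ID, and query or delete the job by that ID. The sketch below assumes the Torque behavior that qsub prints the new job's ID on standard output. The helper name submit_and_show is hypothetical.

```shell
#!/bin/bash
# Sketch: submit a job script and immediately show its detailed status.
# Assumes qsub prints the new job's ID on standard output.
# submit_and_show is a hypothetical helper name, not a PBS command.
submit_and_show() {
    local script="$1"
    shift
    local jobid
    # Any extra arguments (e.g. -q short -l walltime=5:00:00) go to qsub
    # before the script name, as in the command-line example above.
    jobid=$(qsub "$@" "$script") || return 1
    echo "Submitted job $jobid"
    qstat -f "$jobid"
}
```

For example, "submit_and_show ./test_simulation.sh -q short" submits the script to the short queue and prints its qstat -f record. The printed ID can later be passed to qdel.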
The PBS queue system allocates a set of nodes and processors to each job. The allocation lasts for the walltime requested by the job or, if none is requested, for the maximum walltime of the queue. PBS then sets several environment variables in the shell in which the script runs. One example is PBS_NODEFILE, which holds the path of a temporary node file listing the CPUs allocated to the job.
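A job script can read the node file to find out what it was given. The following is a minimal sketch that assumes the Torque convention: the file named by PBS_NODEFILE lists each allocated node once per allocated processor. The helper name summarize_nodefile is hypothetical.

```shell
#!/bin/bash
# Sketch: summarize the allocation recorded in the file named by $PBS_NODEFILE.
# Assumes the node file lists each allocated node once per processor slot.
# summarize_nodefile is a hypothetical helper name.

# Prints "<distinct nodes> <processor slots>" for the given node file.
summarize_nodefile() {
    local nodefile="$1"
    local nprocs nnodes
    # $(( )) strips any padding that some wc implementations add.
    nprocs=$(( $(wc -l < "$nodefile") ))
    nnodes=$(( $(sort -u "$nodefile" | wc -l) ))
    echo "$nnodes $nprocs"
}

# PBS_NODEFILE is set only inside a job, so this part is skipped elsewhere.
if [ -n "${PBS_NODEFILE:-}" ]; then
    summary=$(summarize_nodefile "$PBS_NODEFILE")
    echo "Allocated ${summary#* } processors on ${summary% *} node(s):"
    # One line per node: processor-slot count followed by the node name.
    sort "$PBS_NODEFILE" | uniq -c
fi
```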
When a job script runs OpenMPI's mpiexec, the processes appear to be launched on the allocated nodes even though the node file is not passed to mpiexec as an argument. It is not yet clear whether this behavior is a feature or a bug.
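Whether or not mpiexec picks up the allocation on its own, a job script can pass the node file explicitly. The template below combines the directives described above into one script. The job name, output file, resource sizes, and the program ./my_mpi_program are all hypothetical. The -np and --hostfile options are standard OpenMPI mpiexec options. The script is meant to be submitted with qsub, not run directly.

```shell
#!/bin/bash
# Sketch of a complete job script; job name, file names, resource sizes,
# and ./my_mpi_program are hypothetical placeholders.
#PBS -N example_job
#PBS -q short
#PBS -l walltime=14:30:00
#PBS -l nodes=2:ppn=8
#PBS -j oe
#PBS -o example_job.log

# By default the job starts in the home directory; PBS_O_WORKDIR is the
# directory from which qsub was run.
cd "$PBS_O_WORKDIR" || exit 1

# Assumes the node file lists each allocated node once per processor slot.
NPROCS=$(( $(wc -l < "$PBS_NODEFILE") ))

# Hand the allocation to OpenMPI's mpiexec explicitly.
mpiexec -np "$NPROCS" --hostfile "$PBS_NODEFILE" ./my_mpi_program
```

A request of 2 nodes for 14.5 hours fits the short queue on all three clusters listed below.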
The following tables are available in printer-friendly form in an attached file. Note that the settings can be adjusted to meet users' needs as those needs become clear.
Queue attributes on Cyrus1 and Quantum2
| attribute | debug | short | long |
|---|---|---|---|
| max walltime | 20 min | 24 hr | 6 days |
| max nodes per job | 1 | 2 | 1 |
| priority | 100 | 80 | 60 |
Queue attributes on Darius
| attribute | debug | short | long |
|---|---|---|---|
| max walltime | 20 min | 24 hr | 12 days |
| max nodes per job | 1 | 4 | 8 |
| priority | 100 | 80 | 60 |
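The two queue tables can be used to check in advance which queue will accept a given request. The following is a minimal sketch with the limits from those tables hard-coded in seconds: 20 min = 1200 s, 24 hr = 86400 s, 6 days = 518400 s, and 12 days = 1036800 s. It walks the queues in the order debug, short, long (decreasing priority) and prints the first one whose walltime and node limits both fit. That selection rule is an assumption, not a stated site policy. The helper name pick_queue is hypothetical. Because the settings may be adjusted, the current values should be checked with qmgr -c "print server".

```shell
#!/bin/bash
# Sketch: pick the first queue (debug, short, long) whose limits accept a
# request. Limits are copied from the queue-attribute tables and may go
# stale. pick_queue is a hypothetical helper name.

# Usage: pick_queue CLUSTER WALLTIME_SECONDS NODES
# Prints the queue name; prints nothing and returns 1 if no queue fits.
pick_queue() {
    local cluster="$1" secs="$2" nodes="$3" limits
    # One line per queue: name, max walltime (s), max nodes per job,
    # listed from highest to lowest priority.
    case "$cluster" in
        darius)
            limits=$'debug 1200 1\nshort 86400 4\nlong 1036800 8' ;;
        cyrus1|quantum2)
            limits=$'debug 1200 1\nshort 86400 2\nlong 518400 1' ;;
        *)
            echo "unknown cluster: $cluster" >&2
            return 2 ;;
    esac
    local queue max_secs max_nodes
    while read -r queue max_secs max_nodes; do
        if (( secs <= max_secs && nodes <= max_nodes )); then
            echo "$queue"
            return 0
        fi
    done <<< "$limits"
    return 1
}
```

For example, a 3-day request for 2 nodes fits no queue on Cyrus1. The short queue allows only 24 hr, and the long queue allows only 1 node. On Darius the same request goes to the long queue.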