User Manual
Torque/Maui Portable Batch System (PBS)
PBS Torque is a workload management system for GNU/Linux farms (clusters). It supplies tools to submit,
monitor, and delete jobs. It has the following components:
- pbs_server - The Job Server, which provides the basic batch services such as receiving/creating a batch job, modifying the job and running the job.
- pbs_mom - The Job Executor, a daemon that places the job into execution on the machine when it receives a copy of the job from the Job Server.
- maui - The Job Scheduler, which contains the site's policy controlling which job is run, and where and when it is run.
The steps needed to run a batch job are (a minimal example follows this list):
- Create a job script containing:
- PBS options for requesting the resources the job will need (e.g. walltime, memory, etc.)
- the variables and commands needed to run the task
- Submit the job with the qsub command, or create a run script to automate the process
- Monitor and manage the job
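As a quick orientation, the whole cycle can be sketched as follows (test.sh is an arbitrary file name, and the queue name short is taken from the examples later in this manual):
#PBS -N quickstart
#PBS -q short
#PBS -l walltime=00:05:00
echo "Running on $(hostname)"
Save these lines as test.sh, submit the job with qsub test.sh, and follow its status with qstat -a. Each of these steps is described in detail in the sections below.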
CREATE A JOB SCRIPT
To create the script we need to know some of the environment variables and options provided by Torque.
PBS Environment Variables
The following are some of the more useful environment variables available for use in your scripts:
VARIABLE | Description |
PBS_O_HOST | The host machine on which the qsub command was run |
PBS_O_LOGNAME | The login name on the machine on which the qsub command was run |
PBS_O_HOME | The home directory from which the qsub was run |
PBS_O_WORKDIR | The working directory from which the qsub was run |
The following variables relate to the environment where the job is executing:
VARIABLE | Description |
PBS_O_QUEUE | The original queue to which the job was submitted |
PBS_JOBID | The identifier that PBS assigns to the job |
PBS_JOBNAME | The name of the job |
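For example, a job script could use these variables as follows (my_task and the log file name are only placeholders):
cd $PBS_O_WORKDIR     # Start in the directory from which qsub was run
echo "Job $PBS_JOBID ($PBS_JOBNAME) was submitted from $PBS_O_HOST to queue $PBS_O_QUEUE"
./my_task > task_${PBS_JOBID}.log     # my_task is a placeholder for the real executable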
PBS Options
These are some of the commonly used PBS options that you may use in a job script. Each option starts with "#PBS".
OPTION | Description |
#PBS -N myJob | Assigns a job name. The default is the name of the PBS job script |
#PBS -q queuename | Assigns the queue your job will use |
#PBS -l walltime=01:00:00 | The maximum wall-clock time during which this job can run |
#PBS -l mem=200mb | The maximum amount of physical memory used by the job |
#PBS -o myPath | The path of the standard output file |
#PBS -e myPath | The path of the standard error file |
#PBS -j oe | Join option that merges the standard error stream with the standard output stream of the job |
#PBS -M user@ific.uv | Declares the user or list of users to whom mail is sent by the execution server |
#PBS -m b | Sends mail to the user when the job begins |
#PBS -m e | Sends mail to the user when the job ends |
#PBS -m a | Sends mail to the user when job aborts (with an error) |
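Note that the mail options can be combined; for example, #PBS -m bae sends mail when the job begins, aborts, and ends. This combined form is used in the templates below.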
To use your alias definitions in your PBS scripts you must include this line in your .bashrc:
shopt -s expand_aliases
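For example, with this option enabled, an alias defined in .bashrc can be used inside a job script (the alias name cdw is only illustrative; depending on how the job shell is started, you may also need to source ~/.bashrc explicitly at the top of the script):
# In ~/.bashrc
shopt -s expand_aliases
alias cdw='cd $PBS_O_WORKDIR'
# In the job script
source ~/.bashrc
cdw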
Job Script template
The following job script template should be modified for the needs of the job. A job script may consist of PBS directives, comments and executable statements.
A PBS directive provides a way of specifying job attributes in addition to the
command line options. For example:
## PBS OPTIONS
#PBS -N testing
#PBS -q short
#PBS -l nodes=1:ppn=1,walltime=00:30:00
#PBS -M anmaro@ific.uv.es
#PBS -m bae
for a in {1..5}
do
sleep $a
done
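Once saved to a file, this template can be submitted with qsub as described in the SUBMITTING A JOB section. Since no -o or -e options are given, Torque will by default write the standard output and error of the job to files named testing.o<jobid> and testing.e<jobid> in the directory from which qsub was run.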
Another example runs a process with ana. S1, S2 and S3 are variables that come from an external script: S1 is the first file to process, S2 is the last file to process and S3 is the run number.
This example executes one ana command per input file and creates one output .evt.root file per input.
#PBS -N testing
#PBS -q short
#PBS -l nodes=1:ppn=1,walltime=00:30:00
#PBS -M anmaro@ific.uv.es
#PBS -m bae
# PATHS DEFINITION
INPUT=/data4/NEXT/NEXT1IFIC/Run4/FMWK/FDATA/dataRaw2PMaps/${S3}
OUTPUT=/data4/NEXT/NEXT1IFIC/Run4/FMWK/FDATA/dataRaw2PMaps/test_output/${S3}
LOG_FILES=$PBS_O_WORKDIR/log_files
XML_FILE=$PBS_O_WORKDIR/xml
mkdir -p $OUTPUT $OUTPUT/histos # Create the output directories (histograms from the -g option go to the histos subdirectory)
for a in $(seq $S1 $S2) # This loop executes one ana command per input file
do
num=$(printf "%03d" $a) # File format 000, 001, ... (3 digits)
input_file=$(cd ${INPUT} && ls *_${num}.*) # Find the input data file for this number
ana -b 1 -x $XML_FILE/RecalibrateAnodeJOB.xml -n -1 -i $INPUT/${input_file} -o $OUTPUT/Run_${num}.evt.root -g $OUTPUT/histos/histo_$num.histo
done
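This script expects S1, S2 and S3 to be passed at submission time with the -v option of qsub, for example (the file name ana_job.sh and the values are only illustrative):
qsub -v S1=0,S2=50,S3=20 ana_job.sh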
A variant of the loop that runs ana with a list of input files is shown in the next example. Here the variable called file accumulates the list of input files, and ana is executed once after the loop with the complete list.
file=''
for a in $(seq $S1 $S2)
do
num=$(printf "%03d" $a)
file+=${INPUT}/'Raw2PMaps2_3350_file_'${num}'.evt.root ' # Append each input file to the list
done
# Run ana once over the complete list of input files
ana -b 1 -x $XML_FILE -n -1 -i ${file} -g $OUTPUT/${S3}-PMTER_${S1}_${S2}.root
Users who run Python scripts should use a bash wrapper script to call the Python script. Here is an example:
#PBS -N test_python
#PBS -q short
#PBS -l walltime=00:30:00
#PBS -M anmaro@ific.uv.es
#PBS -m bae
#PBS -o log_files
#PBS -e log_files
#PBS -j eo
python $PBS_O_WORKDIR/ReCal.py $S1 $S2
SUBMITTING A JOB
The command to submit a job is qsub. For example, you can submit a new job with:
qsub script.sh
Additionally, you can pass a list of variables to qsub with the -v option and then use these variables inside the script:
qsub -v S1=$ini,S2=$PATH,S3=20 script.sh
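Building on this, a simple run script to automate submissions could look like the following sketch (submit_runs.sh is a hypothetical name, and the run numbers and file range are only illustrative):
#!/bin/bash
# Hypothetical driver script (submit_runs.sh): submits one job per run number
for run in 20 21 22
do
qsub -v S1=0,S2=50,S3=$run -N run_${run} script.sh
done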
MONITOR AND MANAGE THE JOB
Torque/PBS provides several tools for monitoring and managing jobs (qstat, qdel, etc.):
Command | Description |
qstat -a | Check the status of jobs, queues, and the PBS server |
qstat -f | Get all the information about a job, i.e. resources requested, resource limits, owner, source, destination, queue, etc. |
qdel jobID | Delete a job from the queue |
qhold jobID | Hold a job if it is in the queue |
qrls jobID | Release a job from hold |
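For example (12345 is a hypothetical job ID):
qstat -a        # Overview of all jobs
qstat -f 12345  # Full information for job 12345
qhold 12345     # Put the job on hold
qrls 12345      # Release the hold
qdel 12345      # Remove the job from the queue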
Starting to use Neutrinos PBS
You must log in to neutrinos1 via SSH and follow the steps in the previous sections (see "The steps needed to run a batch job").
At the moment the following queues are available (this is not the final configuration; it is only for testing purposes); see the example after this list:
- short - for jobs that require less than 4 hours of walltime
- long - for jobs that require less than 24 hours of walltime
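For example, a job that needs more than 4 hours of walltime should request the long queue (the walltime value here is only illustrative):
#PBS -q long
#PBS -l walltime=20:00:00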
If you have any questions, please let me know:
anmaro@ific.uv.es
Torque Cheat Sheet
Frequently Used Commands
Command | Description |
qsub [script] | Submit a PBS job |
qstat [jobid] | Show the status of PBS batch jobs |
qdel [jobid] | Delete a PBS batch job |
qhold [jobid] | Hold a PBS batch job |
qrls [jobid] | Release the hold on a PBS batch job |
Check Queue and Job Status
Command | Description |
qstat -q | List all queues |
qstat -a | List all jobs |
qstat -au [userid] | List jobs for userid |
qstat -r | List running jobs |
qstat -f [jobid] | List full information about the job |
qstat -Qf [queue] | List full information about the queue |
qstat -B | List summary status of the job server |
References
http://www.clusterresources.com/torquedocs/commands/qsub.shtml
http://www.clusterresources.com/torquedocs/2.1jobsubmission.shtml
http://www.eresearchsa.edu.au/pbs_exitcodes
--
AntonioMari - 05 Mar 2013