Using the paralleljob command to submit jobs

On Big Red, a script named paralleljob provides a convenient method for submitting some parallel (multiple-processor) programs to the LoadLeveler batching and queuing system. Suitable programs must consist of just one executable file (in contrast to some master/worker programs in which the master and workers are different executable files).

The script is designed so that you can submit a program as a job by prefixing its command line with the word paralleljob. You can also specify the number of processes to start, how long the job is allowed to run, and the queue to which the job is submitted. By default, paralleljob launches 4 processes for up to 2 hours in the MED queue of Big Red. The general form of the command is

rr@BigRed:~> paralleljob program-name [ program-options ]  [ -CPUS np ] \
                                      [ -wallhours n ] [ -queue queue-name ]

... where program-name is the name of the program that you wish to be submitted as a job, program-options are command-line options that you would pass to the program, np is the number of processes to start, n is the number of hours that the job should be allowed to run, and queue-name is the name of the queue to which the job will be submitted.

Examples:

Suppose you've written a program called speedster that takes options specifying a speed and the name of the file to be processed. To run the program with the defaults (4 processes for up to 2 hours in the MED queue), you would enter the command

rr@BigRed:~> paralleljob speedster -speed super mydata.dat

To launch 16 processes and run for up to 48 hours, you would enter the command

rr@BigRed:~> paralleljob speedster -speed super mydata.dat -CPUS 16 \
                                   -wallhours 48

And, to launch 512 processes and run for 10 hours in the BIG queue of Big Red, you would enter the command

rr@BigRed:~> paralleljob speedster -speed super mydata.dat -CPUS 512 \
                                   -wallhours 10 -queue BIG

If the program that you wish to run is not on your default path, use the fully qualified path name of the program.
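The rule above can be sketched as a small shell helper that chooses between the bare program name and a fully qualified path. The function name resolve_program is hypothetical and is not part of paralleljob. It assumes that a program not found on your path lives in the current directory.

```shell
# Hypothetical helper: print the program name to hand to paralleljob.
# Assumption: if the program is not on PATH, it is in the current directory.
resolve_program() {
    prog=$1
    if command -v "$prog" >/dev/null 2>&1; then
        # Found on PATH: the bare name is enough.
        echo "$prog"
    elif [ -x "$PWD/$prog" ]; then
        # Not on PATH: use the fully qualified path name.
        echo "$PWD/$prog"
    else
        echo "cannot find $prog" >&2
        return 1
    fi
}
```

With the helper defined, a submission could look like: paralleljob "$(resolve_program speedster)" -speed super mydata.dat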

When your job runs, the current working directory of your program is the directory from which you ran the paralleljob command.

Processor and walltime limits

How many processes can be launched in each queue, and how long can they run? In the default queue (MED), you can request up to 128 processes for up to 336 hours (14 days). In the BIG queue, you can request up to 1024 processes for up to 120 hours (5 days). The FAST queue is available for debugging; it allows up to 16 processes for up to 2 hours. These limits were in effect at the time of writing. For the definitive limits, see the usage policies page.
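The limits above can be sketched as a pre-submission check. The function name check_request is hypothetical and is not part of paralleljob. The numbers are the ones quoted on this page and may have changed since, so treat this as an illustration rather than an authoritative validator.

```shell
# Hypothetical helper: check a request against the queue limits quoted
# on this page (they may be out of date; see the usage policies page).
# Usage: check_request QUEUE CPUS HOURS
check_request() {
    queue=$1
    cpus=$2
    hours=$3
    case $queue in
        MED)  max_cpus=128;  max_hours=336 ;;
        BIG)  max_cpus=1024; max_hours=120 ;;
        FAST) max_cpus=16;   max_hours=2 ;;
        *)    echo "unknown queue: $queue"; return 2 ;;
    esac
    if [ "$cpus" -gt "$max_cpus" ]; then
        echo "too many processes for $queue (limit $max_cpus)"
        return 1
    fi
    if [ "$hours" -gt "$max_hours" ]; then
        echo "too many hours for $queue (limit $max_hours)"
        return 1
    fi
    echo "ok"
}
```

For instance, check_request BIG 512 10 prints ok, while check_request MED 512 10 reports that the MED queue allows at most 128 processes.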

Accessing paralleljob

Paralleljob should be on your path by default, and its manual page should be on your MANPATH by default. The best source of information about paralleljob is its Unix manual page:

rr@BigRed:~> man paralleljob

Fine print

Paralleljob works only for "single program, multiple data" (SPMD) parallel applications, that is, applications that consist of a single binary. It does not work for "multiple program, multiple data" (MPMD) applications, which consist of more than one binary.

An appropriate version of mpirun must be on your path or specified by the MPIRUN environment variable. Each flavor of MPI has its own version of mpirun. If the parallel compiler (mpicc or another appropriate compiler) that you used to build your application is on your path, then the matching mpirun should already be on your path as well.
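Before submitting, you can check which mpirun would be picked up. The following is a minimal sketch, not part of paralleljob. The function name find_mpirun is hypothetical, and the order of precedence (MPIRUN first, then the path) is an assumption, since this page does not say which one paralleljob prefers.

```shell
# Hypothetical pre-flight check: print the mpirun that would be used.
# Assumption: MPIRUN, when set, takes precedence over the PATH lookup.
find_mpirun() {
    if [ -n "${MPIRUN:-}" ]; then
        # Explicitly specified through the environment variable.
        echo "$MPIRUN"
    elif command -v mpirun >/dev/null 2>&1; then
        # Otherwise report the first mpirun found on PATH.
        command -v mpirun
    else
        echo "no mpirun found: add one to PATH or set MPIRUN" >&2
        return 1
    fi
}
```

If the reported mpirun belongs to a different flavor of MPI than the compiler you used, adjust your path or set MPIRUN before running paralleljob.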

If you need to quote arguments, note that paralleljob handles only double quotes. It cannot provide the protection that single quotes usually afford, because the Bourne shell provides no mechanism for escaping characters within single-quoted strings. Paralleljob therefore treats single quotes as double quotes.