TREE-PUZZLE on Big Red

TREE-PUZZLE builds evolutionary trees from molecular sequence data using maximum-likelihood methods. On Big Red, TREE-PUZZLE is installed in /N/soft/`whatami`/tree-puzzle-5.2. Documentation is available in /N/soft/`whatami`/tree-puzzle-5.2/doc and online at http://www.tree-puzzle.de/manual.html. Both the single-process (serial) and the multi-process (parallel) versions of TREE-PUZZLE are installed. The single-process version is named puzzle; to place it on your path, use the command

	soft add +tree-puzzle

The multi-process version can be run using a script called ppuzzlejob, which should be available from the command line by default. The script submits a job to the batch queue. This page documents use of the multi-process version. Hereafter, all references to TREE-PUZZLE are to the parallel version.

General information about submitting jobs

The ppuzzlejob script submits a job to the Big Red job queue and should report that your job has been submitted. When your job has finished, you will receive email. To check the status of your job, run the command

	llq -u your_user_id

Using default options

If you are satisfied with using 4 processes for up to 2 hours and with TREE-PUZZLE's default options, you can run TREE-PUZZLE by changing into the directory that contains your data file, and running the command

	ppuzzlejob my_data_file

where my_data_file is the name of the file that contains your data.

Using more than 4 processes

Use the -p option to specify the number of processes that you would like to use.

	ppuzzlejob -p num_of_procs my_data_file

Replace num_of_procs with the number of processes that you want and my_data_file with the name of your data file. For example, to use 64 processes on a file named globin.a, run the command

	ppuzzlejob -p 64 globin.a

When specifying processes, use a multiple of 4. If you do not, your request will be rounded up to the next multiple of 4. In published results (Schmidt et al., Bioinformatics 18:502-504, 2002), doubling the number of processes halved execution time up to at least 12 processes. How well TREE-PUZZLE scales beyond 12 processes is unknown, although it probably scales quite well given the nature of the problem that it solves. In the queue to which ppuzzlejob submits jobs by default, you can request at most 128 processes. Another queue is available that supports more processes (see the Unix manual page for ppuzzlejob).
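The rounding rule can be illustrated with a little shell arithmetic. This is only a sketch of the behavior described above; the function name round_up_to_4 is hypothetical and is not part of ppuzzlejob.

```shell
# Hypothetical helper (not part of ppuzzlejob): round a requested
# process count up to the next multiple of 4 using integer arithmetic.
round_up_to_4() {
    echo $(( ($1 + 3) / 4 * 4 ))
}

round_up_to_4 10   # prints 12
round_up_to_4 64   # prints 64 (already a multiple of 4)
```

A request for 10 processes would therefore be treated as a request for 12.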

Running for more than 2 hours

Jobs are allowed to run for only 2 hours unless you request more time. You can request at most 336 hours (14 days) in the default queue to which ppuzzlejob submits jobs. (Other queues are available that allow less time; see the Unix manual page for ppuzzlejob for details.)

Use the -wallhours option to request more time, given as a whole number of hours. For example, to run the same job as above for up to 42 hours, you would run the command

	ppuzzlejob -p 64 globin.a -wallhours 42
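Before submitting, it may help to check a request against the default-queue limits mentioned on this page (128 processes, 336 hours). The following is a minimal sketch; the function name check_request is hypothetical, and the limits are assumptions that hold only for the default queue.

```shell
# Hypothetical pre-flight check (not part of ppuzzlejob).
# $1 = number of processes, $2 = wall-clock hours.
# Limits are those stated for the default queue; other queues differ.
check_request() {
    for value in "$1" "$2"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "process count and hours must be whole numbers" >&2
                return 1 ;;
        esac
    done
    if [ "$1" -gt 128 ]; then
        echo "at most 128 processes in the default queue" >&2
        return 1
    fi
    if [ "$2" -gt 336 ]; then
        echo "at most 336 hours in the default queue" >&2
        return 1
    fi
    return 0
}

# Validate the example request, then print (not run) the command.
if check_request 64 42; then
    echo "ppuzzlejob -p 64 globin.a -wallhours 42"
fi
```

The sketch only prints the command line, so it can be tried on any machine without submitting a job.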

Specifying options to TREE-PUZZLE

TREE-PUZZLE accepts options from a file that is separate from your data file. The options file contains two lines per option: the first gives the option name and the second gives its value. The file ends with a line containing "y" to signal the end of the options. For example, to set option "t" to 10, the file would contain

	t
	10
	y

with no leading spaces on any line (the indentation here is for display only).
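One way to create such a file from the shell is with a here-document. This is a minimal sketch; it assumes the file name globin.opts (borrowed from the example further down this page) and the option "t" with value 10 from the text.

```shell
# Write a three-line options file: option name, value, terminating "y".
# The quoted 'EOF' keeps the shell from expanding anything in the body,
# and the lines are written without leading whitespace.
cat > globin.opts <<'EOF'
t
10
y
EOF

# Show the result to confirm there are no leading spaces.
cat globin.opts
```

Any text editor works equally well, provided each line starts in the first column.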

When running ppuzzlejob, specify the option file using the -f option

	ppuzzlejob -f optfile datafile

For example, to run TREE-PUZZLE with an option file named globin.opts, a data file named globin.dat and 32 processes, you would run the command

	ppuzzlejob -p 32 -f globin.opts globin.dat

See the TREE-PUZZLE documentation for the available options and their meanings.