HPC @ IU Tutorial - Compiling and Running Parallel jobs using MPICH on AVIDD
Setting up your environment, using softenv, to use MPICH instead of LAM
Note: The current default AVIDD softenv configuration results in use of MPICH!
You do not need to change anything if you have left your ~/.soft file unchanged.
But if you modified your ~/.soft file per the instructions in this workshop (to use LAM), then edit the file to comment out the +lam-gm-intel line as well as the @remove mpich-gm-intel line, add a +mpiexec line, and change ita-lam to just ita:
## @remove +mpich-gm-intel
## +lam-gm-intel
## +ita-lam
+mpiexec
+ita
+totalview
## Possibly more lines before the default @avidd line
@avidd
Then run resoft to apply the changes.
There is a file named dot_soft.mpich within the MPI_Tutorial directory (assuming you copied
the example programs from hpc's account) which you could use to update/replace your ~/.soft file.
[agopu@bh2 agopu]$ cd ~/MPI_Tutorial/HelloWorld
[agopu@bh2 HelloWorld]$ make clean ; make
[agopu@bh2 HelloWorld]$ cd ~/MPI_Tutorial/RoundRobin; make clean ; make
[agopu@bh2 RoundRobin]$ cd ~/MPI_Tutorial/ParallelPi; make clean ; make
[agopu@bh2 ParallelPi]$ cd ~/MPI_Tutorial/ParallelPi_VT; make clean ; make
[agopu@bh2 ParallelPi_VT]$ cd ~/MPI_Tutorial/ParallelDiffusion; make clean ; make
[agopu@bh2 ParallelDiffusion]$ cd ~/MPI_Tutorial/MPI_IO; make clean ; make
Important Note:
Although you use the same Makefile to compile your code, note that different Intel Trace Collector includes/libraries as well as MPI includes/libraries
will have been (and should have been) used in the make process. The change is transparent because of
the use of softenv, but if you look at environment variables, for example $CPATH, you'll
notice that they have different values depending on whether you add MPICH or LAM in your ~/.soft
file.
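One way to check which MPI implementation softenv has placed in your environment is to split $CPATH on colons and look for an MPICH or LAM directory. The sketch below is illustrative only: the function name mpi_flavor is hypothetical, and it assumes the include directories contain "mpich" or "lam" in their names, which may not hold on every system.

```shell
# Hypothetical helper: guess the MPI flavor from a colon-separated path list.
# Assumption: directory names contain "mpich" or "lam"; adjust for your site.
mpi_flavor() {
    # Use the first argument if given, otherwise fall back to $CPATH.
    paths="${1:-${CPATH:-}}"
    # Split on ':' and do a case-insensitive match on each entry.
    if printf '%s\n' "$paths" | tr ':' '\n' | grep -qi 'mpich'; then
        echo "mpich"
    elif printf '%s\n' "$paths" | tr ':' '\n' | grep -qi 'lam'; then
        echo "lam"
    else
        echo "unknown"
    fi
}

# Example: report what the current environment points at.
mpi_flavor
```

Running it before and after editing ~/.soft (and running resoft) should show the value change if the assumption about directory names holds.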
Once you have your programs compiled, you can use mpiexec to run them. Note that you can
use mpirun as you had done in the case of LAM (without the C parameter)
but using mpiexec has its advantages.
First, grab a couple of interactive nodes from PBS. For example, to get 2 nodes, 2 processors per node, for 30 minutes, you could do:
[agopu@bh2 agopu]$ qsub -I -l nodes=2:ppn=2 -l walltime=30:00
Get, set, go!
[agopu@bc01 agopu]$ cd ~/MPI_Tutorial
[agopu@bc01 agopu]$ mpiexec HelloWorld/helloWorld
[agopu@bc01 agopu]$ mpiexec RoundRobin/roundRobin
[agopu@bc01 agopu]$ mpiexec ParallelPi/parallelPi 2000000
[agopu@bc01 agopu]$ mpiexec ParallelPi_VT/parallelPiVt 2000000
[agopu@bc01 agopu]$ mpiexec ParallelDiffusion/parallelDiffusion 8000 1000 100
Assuming /N/gpfs/${USER}/testfile exists...
[agopu@bc01 agopu]$ mpiexec MPI_IO/mpi_io /N/gpfs/${USER}/testfile /N/gpfs/${USER}/testfile2
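The MPI_IO example assumes its input file already exists. A small guard like the one sketched below can make that assumption explicit before launching the job; the function name run_if_input_exists is hypothetical and not part of the tutorial files.

```shell
# Hypothetical wrapper: run a command only if its input file exists.
# Usage: run_if_input_exists INPUT_FILE COMMAND [ARGS...]
run_if_input_exists() {
    input="$1"
    shift
    if [ -f "$input" ]; then
        # Input is present: run the remaining arguments as the command.
        "$@"
    else
        echo "Input file $input not found; skipping." >&2
        return 1
    fi
}

# Example (on a compute node, from ~/MPI_Tutorial):
#   run_if_input_exists /N/gpfs/${USER}/testfile \
#       mpiexec MPI_IO/mpi_io /N/gpfs/${USER}/testfile /N/gpfs/${USER}/testfile2
```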
There is a PBS script in the MPI_Tutorial directory,
named submit_helloWorld.mpich.sh, that you can use to submit jobs to PBS on a non-interactive basis.
Its content is shown below for your convenience.
#PBS -l nodes=2:ppn=2,walltime=5:00
#PBS -m ae
#PBS -N helloW
#PBS -j oe
#PBS -V
## Assumes you have mpiexec set up in your environment (i.e. +mpiexec within
## your ~/.soft file, or "soft add +mpiexec" at the command prompt)
## and that you've not removed MPICH from the environment
mpiexec $HOME/MPI_Tutorial/HelloWorld/helloWorld
Once you have a PBS script, you can submit a batch job using
qsub, the same way as was shown in
the Working with Portable Batch Scheduler (PBS) section.
Assuming you saved your script in a file named submit_helloWorld.sh, you can do:
[agopu@bh2 agopu]$ qsub ~/MPI_Tutorial/HelloWorld/submit_helloWorld.sh
As explained in various sections of this workshop,
you will need to change the program name and program parameters within the script to use it with
different programs. For example, to run the parallelPi code using MPICH, you would have something
like this within the script:
. . .
mpiexec $HOME/MPI_Tutorial/ParallelPi/parallelPi 200000
. . .
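Rather than editing the script by hand for each program, you could generate it. The following is a minimal sketch, not part of the tutorial files: the function name make_pbs_script and the job name parPi are hypothetical, and the PBS directives are simply copied from the helloWorld script shown earlier.

```shell
# Hypothetical helper: print a PBS script that runs one program under mpiexec.
# Usage: make_pbs_script JOB_NAME PROGRAM [ARGS...]
make_pbs_script() {
    job_name="$1"
    shift
    # Same PBS directives as the helloWorld script; only the job name
    # and the mpiexec line change.
    cat <<EOF
#PBS -l nodes=2:ppn=2,walltime=5:00
#PBS -m ae
#PBS -N $job_name
#PBS -j oe
#PBS -V
mpiexec $*
EOF
}

# Example: print a script for parallelPi. The single quotes keep $HOME
# literal so that it is expanded when the job runs, not when generating.
make_pbs_script parPi '$HOME/MPI_Tutorial/ParallelPi/parallelPi' 200000
```

Redirecting the output to a file of your choice and passing that file to qsub would then submit the job in the usual way.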
You can use ITA in exactly the same way irrespective of whether you use MPICH or LAM. See the ITA-related sections of this workshop for more information.
Note: It is assumed you have done the necessary tweaking to the firewall as well as your X settings as explained in the Tweaking Security Settings section of this workshop.
Append a -tv flag to mpiexec to invoke Totalview and debug your programs.
It is assumed you have
done the necessary tweaking to the firewall as well as your X settings as explained in the
Tweaking Security Settings section of this workshop.
For example, if you want to debug the roundRobin program, then you could do this on a compute node:
[agopu@bc81 agopu]$ cd ~/MPI_Tutorial/RoundRobin/
[agopu@bc81 RoundRobin]$ mpiexec -tv roundRobin
Apart from that difference in the style of invocation, everything else you could do with LAM code inside TotalView stays the same. See the TotalView-related sections of this workshop for more information.
Important note to MPICH users about the TotalView crash on AVIDD: There is a bug in TotalView which causes it to crash after a parallel program completes execution. This happens when TotalView is run on our existing AVIDD setup with mpiexec. We have contacted the developers of TotalView about this, and they are working on a fix.
You will still be able to use TotalView to debug your parallel programs, but you will need to repeat the above step every time your program completes, to restart it.