The Big Red Cluster
Outstanding Problems
"Signal 15" errors
MPICH-MX job processes may be terminated when one or more nodes allocated to the job have no Myrinet ports available. This shortage is caused by "orphaned" processes left over from previous MPICH-MX jobs. Because current scheduler policy allows a single user to run multiple jobs on one or more nodes, these orphaned processes are difficult to identify. The problem often manifests as job output files containing the string "Killed by signal 15". Administrators continue to work on a solution.
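To check whether your own jobs were affected, you can search their output files for the string quoted above. The following is only a minimal sketch in Python: the function name find_affected_outputs, the default file pattern "*", and the directory argument are illustrative assumptions, not part of any Big Red tooling, so adjust them to wherever your job output is actually written.

```python
from pathlib import Path
import sys

# The string this page reports seeing in affected job output files.
SIGNATURE = "Killed by signal 15"


def find_affected_outputs(directory, pattern="*"):
    """Return the files in `directory` matching `pattern` that contain SIGNATURE.

    Both arguments are assumptions for illustration; point them at the
    location and naming scheme your jobs actually use.
    """
    hits = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has stray bytes.
        text = path.read_text(errors="replace")
        if SIGNATURE in text:
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_signal15.py /path/to/job/output
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_affected_outputs(directory):
        print(path)
```

A match only shows that the job output contains the string; it does not by itself confirm that a lack of Myrinet ports was the cause.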




