| Date |
System(s) |
Problem |
| May 1 |
GPFS |
At 6:18pm, the GPFS file system became inaccessible due to a hardware failure. Access was restored at 9:36pm. Running jobs were affected by this outage. |
| Apr 5 |
Myrinet |
At 3:39pm, the Myrinet network went down due to a mapper issue. The Myrinet network was returned to service at 11:47am on April 6. |
| Apr 5 |
GPFS |
At 3:39pm, the GPFS file system became inaccessible due to a hardware failure. Access was restored at 10:35am on April 6. |
| Feb 4 |
BigRed |
At 11:00pm, BigRed was returned to service. Cooling towers were repaired, power maintenance was performed, and network upgrades were performed. Regularly scheduled maintenance for 2/5/2008 is canceled. |
| Feb 3 |
BigRed |
At 10:23am on February 3, the BigRed machine room experienced a loss of cooling. Admins shut down the machine at 10:30am. |
| Date |
System(s) |
Problem |
| Nov 28 |
GPFS |
GPFS was inaccessible from 12:34pm until 8:59pm. Running jobs were affected by this event. |
| Nov 16 |
GPFS |
GPFS was inaccessible from 4:59pm until Nov 17 at 2:03am. Running jobs were affected by this event. |
| Nov 1 |
GPFS |
GPFS was inaccessible from ~4pm until 6:20pm. Running jobs were affected by this event. |
| Oct 25 |
GPFS |
GPFS was inaccessible from ~midnight until 12:47am. Running jobs were not affected by this event. |
| Oct 23 |
GPFS |
GPFS was inaccessible from ~11:00pm until Oct 24 at 9:00am. Running jobs were not affected by this event. |
| Oct 20 |
GPFS |
GPFS was inaccessible from 8:47am to 12:25pm. Running jobs were affected by this event. |
| Oct 19 |
GPFS |
GPFS was inaccessible from 5:54pm until Oct 20 at 2:26am. Running jobs were affected by this event. |
| Oct 16 |
GPFS |
GPFS was inaccessible from ~8:00pm until Oct 17 at 12:07am. Running jobs were not affected by this event. |
| Oct 7 |
GPFS |
GPFS was inaccessible from ~4:26pm until 10:16pm. Running jobs were not affected by this event. |
| Sep 30 |
Big Red, GPFS |
Power outage, ~6:25am to 3:49pm EDT. All systems were down during
this event. |
| Sep 25 |
Big Red, GPFS |
Power outage, 4:14pm to 10:08pm EDT. All systems were down during
this event. |
| Jul 4 |
Big Red, GPFS |
Power outage, ~9:00am to 2:30pm EDT. All systems were down during
this event. |
| Jun 21 |
GPFS |
Failure of several blades during benchmarking of expansion
hardware resulted in GPFS instability, approximately 3:45pm until
5:50pm. GPFS recovered without a restart, though remounts did occur
on several nodes; some jobs were lost. |
| Apr 27 |
GPFS |
Communication failure during test of a firmware update process
resulted in GPFS instability, approximately noon until 5pm. GPFS
restarted; no data was lost. |
| Apr 17 |
Login nodes |
Network configuration change on campus resulted in a routing
error; access to login nodes was unavailable from approximately 2:30pm
until 3:00pm. |
| Apr 10-11 |
Racks 1-8 |
NFS server issues resulted in hanging NFS mounts (see Apr 6
outage). |
| Apr 6 |
Rack 9 |
10:45am - 3:00pm; NFS server issues resulted in hanging NFS mounts
on the compute blades. |
| Mar 7-10 |
GPFS |
Switch and disk issues resulted in multiple GPFS outages. |
| Jan 31 |
GPFS |
Failed disk controller resulted in a GPFS outage from 16:27 until
22:15 EST. No files appear to have been lost during the rebuild
process. |