Staff are working to restore service.
[2014-07-08 16:28:26 | Lars Rohrbach]
Service for cluster.eecs and inst-fs.eecs is back to normal.
The NetApp filer in SDH that manages backup snapshots of production storage failed over to its High Availability partner last Thursday afternoon, just before the long weekend, due to a failure of the in-room UPS system during a systems self-test. The monthly backup tape dumps (which take several days to finish) were restarted on the cluster partner, causing it to experience unusually high load for several days, which interfered with NFS automounts. After the tape backups finished, several other ongoing snapshot transfers prevented the systems from immediately reverting to normal mode. Both systems have now been switched back to their primary service mode.
We are in the process of repairing/refitting the UPS to prevent similar problems in the future.
Resolved as of 2014-07-08 13:30:00