At about 21:26 last evening, the main EECS server room in Soda Hall lost power. The UPS kept some systems running until about 22:00, when staff shut the room down completely. Many EECS services were disrupted as a result. The fault was traced to the UPS itself, which had fried its high-voltage output terminal block. The terminal block has been bypassed, and the UPS has been removed from service until further notice. Most EECS computing services were restored by 03:15 this morning. The exception was the IRIS web server, which was restored at around 10:15 Sunday.
Thanks to the staff who came in early Sunday morning to deal with this issue.
The departmental mail server (imap.eecs) and the LDAP servers should now be working mostly normally. However, a few technical problems remain unresolved. You may encounter SSL certificate validation errors from our LDAP servers (ldap.eecs and ldap.cs), and the LDAP service may not be completely reliable for the rest of today, and possibly through the weekend. We are actively working on the underlying problem and will post an update as soon as we can.
No incoming or outgoing email was lost.
Technical background: Earlier today, the SSL certificates on our LDAP servers expired. This caused the LDAP daemon on ldap.eecs to hang and stop accepting new connections. The failure cascaded into part of the IMAP system, and the IMAP daemon on imap.eecs hung as well. We’ve managed to restart the daemons on both servers. Unfortunately, a problem on the LDAP server is preventing us from installing new, unexpired SSL certificates. As a result, we cannot rule out that the LDAP servers will crash again and take the IMAP server down with them. We are continuing to work on a fix.