IMAP access is currently off while staff are trying to resolve the issue. IMAP service will be restored as soon as possible.
Several background IMAP processes have been crashing, resulting in the observed behavior. The IMAP server is currently being patched.
[2008-10-13 18:12:14 | Eric Fraser]
The EECS IMAP Server appears to be functioning properly once again.
Symptoms that were observed during the failure:
The store daemon (stored) on the IMAP server was unable to complete its indexing tasks before timing out. The watcher process therefore concluded that something was wrong with stored and restarted it. While stored was being continually restarted, the IMAP server could not move mail around; for example, sending a message to the Trash would hang and/or fail.
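The restart loop described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name, the timeout values, and the restart limit are hypothetical and are not taken from the messaging server's actual watcher configuration. It only shows why a task that needs longer than the timeout never completes if every restart discards its progress.

```python
from dataclasses import dataclass


@dataclass
class WatcherResult:
    restarts: int      # how many times the watcher restarted the daemon
    completed: bool    # whether the task ever finished


def simulate_watcher(task_seconds: float,
                     timeout_seconds: float,
                     max_restarts: int = 5) -> WatcherResult:
    """Simulate a watcher that restarts a daemon whose task outruns a timeout.

    Assumption (hypothetical model): each restart throws away the daemon's
    progress, so every attempt needs the full task_seconds again.
    """
    # Attempt 0 is the initial run; attempts 1..max_restarts are restarts.
    for attempt in range(max_restarts + 1):
        if task_seconds <= timeout_seconds:
            # The task fits inside the timeout, so the watcher stays quiet.
            return WatcherResult(restarts=attempt, completed=True)
        # Otherwise the watcher assumes the daemon is hung and restarts it.
    return WatcherResult(restarts=max_restarts, completed=False)


if __name__ == "__main__":
    print(simulate_watcher(task_seconds=30, timeout_seconds=60))
    print(simulate_watcher(task_seconds=90, timeout_seconds=60, max_restarts=3))
```

In this model, restarting never helps once the task is slower than the timeout, which matches the symptom of continual restarts with no forward progress.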
The following items were completed:
* Patched the Messaging server.
* Patched the OS (Recommended cluster) of both High-Availability nodes.
* Rebooted and restarted the systems and messaging server.
After the combination of these tasks, the IMAP server began behaving correctly. Currently the various mail queues are clearing out. Mail that was delivered today will trickle into mailboxes for the next hour or so.
[2008-10-14 09:22:39 | Vadim Kogan]
The current issues are a bit different: reconstruct dies, and watcher restarts stored and imapd. This appears to lead to slow access to messages and slowness when moving messages to other folders, including Trash.
Staff is working on resolving these issues.
[2008-10-20 14:49:44 | Michael G Bordua]
The problems with the server daemons stopping and restarting have been resolved.
There is still an issue with temporary slowness moving mail.
Staff is working to resolve this issue.
[2008-10-24 11:51:28 | Helpdesk]
On Friday, Oct 24, at around 11:30am, IMAP access (via both webmail and personal clients) became unavailable. This is being looked into.
[2008-10-24 12:14:17 | Helpdesk]
As of about 12pm, MUA access to IMAP has resumed. The new web interface, https://webmail.eecs.berkeley.edu/, is still not working, but the old web interface, https://imap.eecs.berkeley.edu, is working.
[2008-10-25 10:38:48 | Michael G Bordua]
https://webmail.eecs.berkeley.edu is back up.
The problem was that some configuration files had become corrupted.
The webmail instance was reinstalled as a result.
[2008-10-30 11:55:35 | Helpdesk]
There remains an issue with temporary slowness moving mail. For example, messages take a while to be moved to the Trash folder or deleted.
Staff is still working to resolve this issue.
[2008-10-31 13:04:30 | Fred A Archibald]
Since about 11:00-11:30 AM today, no new mail has been available from IMAP. IDSG staff are currently working on the problem.
[2008-10-31 18:11:46 | Vadim Kogan]
IMAP should be better now. We will continue to monitor it, and may restart it a number of times over the weekend.
[2008-11-03 11:09:26 | Jon Kuroda]
The IMAP service continues to be unreliable with intermittent disconnections and delays during login, retrieval of messages, and deleting messages.
IMAP service is currently off, and staff are working to restore access to e-mail service.
[2008-11-03 13:50:57 | Helpdesk]
IMAP users are continuing to experience intermittent delays. Staff are currently investigating the source of the problem, and looking to repair the service.
[2008-11-03 15:58:13 | Vadim Kogan]
The cause appears to be Solaris kernel software issues. Some have been fixed in the release that just came out, but others are not fixed yet. Staff are currently working on ways to work around or at least reduce the impact of the problem.
[2008-11-04 14:41:40 | Eric Fraser]
As you are aware, there have been recurring problems with the EECS IMAP server for the past three weeks. This has resulted in slow performance of routine tasks such as moving email to folders, placing copies of outgoing mail in the Sent folder, and downloading messages from the IMAP server to the mail client. There have also been times, such as last Friday, when one could not contact the IMAP server at all for the greater part of the day.
Information about the identity of the problem has been scarce on the IRIS website because it has been an unusually difficult bug to trace. The setup and configuration that has caused so much trouble has been in production in the same state for almost a year. Additionally, this is a vendor-supported platform with proven production service at sites much larger than our own. Finally, few if any clues have been found in any of the logs.
Nevertheless, we believe the problem is linked to a fragmented ZFS filesystem, which is slowing the background IMAP processes that handle folder operations. The problem is unusual because we are nowhere near the capacity of the filesystem, which is when you would expect to see these types of problems.
Below are the steps being taken now to alleviate the problem:
* Tuning the OS and IMAP software to lessen the impact of the current problems.
* Installing and configuring a new IMAP server on more powerful hardware with the latest version of Solaris and ZFS. As soon as this is running, we will be migrating users to the new hardware.
* Prior to migration, we will be upgrading the version of the IMAP software on the current system. This will allow the migration to be done in a less intrusive manner.
The software upgrade process will take place tomorrow, November 5, at 8:00pm and will last one hour.
Resolved as of 2008-11-10 15:05:00