[2015-05-01 09:40:35 | Rob McNicholas]
The jabber server hung again this morning around 9am. Some operating system patches were outstanding for the Linux host server, so those were installed and the server was rebooted. As of around 9:16am, jabber.eecs is up again and allowing connections.
Staff will continue to monitor the server closely and restart it as needed.
[2015-05-01 15:17:32 | Rob McNicholas]
We believe we have identified the problem causing the jabber server to hang. The latest Openfire update included an updated component (Apache MINA) which seems to have a new preference for IPv6. Our server had an autoconfigured IPv6 address but it was not routing IPv6, so Openfire’s IPv6 connection attempts were hanging and the server would eventually run out of resources.
IPv6 has been disabled at the operating system level, and the server has now remained stable for 90 minutes. Some threads started before the change are still hanging, so one more server restart will happen tonight at 6pm to clear those.
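For anyone who wants to confirm a similar change on their own Linux host, the following is a minimal sketch of one way to check it. It assumes IPv6 was turned off through the standard `net.ipv6.conf.all.disable_ipv6` sysctl; the log above does not say which method was actually used on jabber.eecs, and the function name `ipv6_disabled` is purely illustrative.

```python
from pathlib import Path

# Assumption: a Linux host exposing the standard sysctl tree under /proc/sys.
DEFAULT_SYSCTL = "/proc/sys/net/ipv6/conf/all/disable_ipv6"


def ipv6_disabled(sysctl_path=DEFAULT_SYSCTL):
    """Return True if IPv6 appears to be disabled for all interfaces."""
    path = Path(sysctl_path)
    if not path.exists():
        # A missing file usually means the kernel has no IPv6 support loaded.
        return True
    # The kernel reports "1" when IPv6 is disabled and "0" when it is enabled.
    return path.read_text().strip() == "1"


if __name__ == "__main__":
    print("IPv6 disabled:", ipv6_disabled())
```

This only reads the kernel setting. It does not show whether an individual application such as Openfire still attempts IPv6 connections.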
[2015-05-04 16:19:54 | Rob McNicholas]
The change we made last week increased the server’s “time-to-hang” but did not resolve the problem; the server has hung repeatedly today. Numerous reports on the Openfire forums indicate that this problem is widespread, but it does not seem to be on the developers’ radar, so it is not clear when it will be fixed. We are therefore reverting to the previous version of Openfire (3.9.3). We will, however, retain the updated SSL certificate that was installed during the upgrade process.
[2015-05-04 16:45:09 | Rob McNicholas]
The EECS jabber server has been downgraded to the previous version (Openfire 3.9.3) due to stability issues with the version we installed last week (3.10.0).
After restoring the files and database from before the upgrade, we reinstalled the updated SSL certificate mentioned in a previous message. We hope all clients are now reporting secure connections with no SSL warnings. If this is not the case, please send a note to help@eecs letting us know which client you are using and what message you are getting.
Resolved as of 2015-05-04 16:35:00