CE Goes Through Longest Unplanned Downtime - Here's What Happened
Yesterday, CrazyEngineers experienced the longest unplanned downtime in the history of CE. To make matters worse, it also blocked our official email IDs the whole time. Though no emails were lost, we couldn't send or receive emails, so we had no way of informing our members about the downtime. The aim of this post is to explain exactly what happened.
You might be aware that CrazyEngineers is hosted in the cloud. Our web host wanted to move the instance that hosts CE to a new 'parent' infrastructure. This move usually takes about 5 hours and was therefore scheduled to commence at 1:00 AM IST on Monday. The idea was to get this done when CE experiences its lowest traffic of the week.
The server admins began the move at the planned time. Once the automatic process starts, there's nothing the admins can do until it is over. The monitoring systems indicated that the move was progressing smoothly, but in reality it was much slower than it should have been, and at some point it got stuck without the indicators catching it. All along, the admins were under the impression that the move would be over 'very soon'. This went on throughout the day yesterday, and finally they had to cancel everything. It was later discovered that a nasty issue had prevented the process from completing.
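For the technically curious, here is a rough sketch of the kind of check that can catch a stuck job like this one: instead of trusting an "in progress" status, it watches whether the reported progress actually advances. This is purely illustrative and is not our host's monitoring code. The function name `first_stall`, the `(timestamp, progress)` sample format and the thresholds are all assumptions made up for this example.

```python
from typing import Iterable, Optional, Tuple


def first_stall(
    samples: Iterable[Tuple[float, float]],
    max_idle_seconds: float,
    min_delta: float = 0.0,
) -> Optional[float]:
    """Return the timestamp at which a stall first becomes detectable.

    samples: hypothetical (timestamp_seconds, progress_percent) readings,
             in chronological order.
    max_idle_seconds: how long progress may stay flat before we call it a stall.
    min_delta: progress must grow by more than this to count as an advance.
    """
    last_progress = None
    last_advance_time = None
    for timestamp, progress in samples:
        if last_progress is None or progress - last_progress > min_delta:
            # Real forward movement: remember where and when it happened.
            last_progress = progress
            last_advance_time = timestamp
        elif timestamp - last_advance_time >= max_idle_seconds:
            # Status may still say "running", but nothing has moved for too long.
            return timestamp
    return None


# Made-up readings taken every 10 minutes; progress freezes at 20%.
readings = [(0, 0), (600, 10), (1200, 20), (1800, 20), (2400, 20), (3000, 20)]
stalled_at = first_stall(readings, max_idle_seconds=1200)
```

Setting `min_delta` above zero also flags a job that is technically moving but far too slowly, which is closer to what happened here before the move got stuck completely.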
The whole thing resulted in CE being inaccessible to everyone for more than 24 hours. The fixes are being worked out, and hopefully we'll be able to execute the operation this week without any downtime (or with only a short downtime during off hours). Thank you for your patience.
We usually make sure that CE stays up all the time, and over the past years we've maintained 99.99% uptime, with this recent exception. I hope we'll be able to manage this far better in the future.
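To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes uptime is measured over a 365-day year, which is my assumption for this example and not necessarily how our host measures it, and it uses 24 hours as a lower bound for this outage. The function names are just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumed measurement window: one non-leap year


def allowed_downtime_minutes(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Minutes of downtime that fit inside a given uptime percentage."""
    return period_hours * 60 * (1 - uptime_percent / 100)


def uptime_percent(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Uptime percentage for a period containing the given downtime."""
    return 100 * (1 - downtime_hours / period_hours)


budget = allowed_downtime_minutes(99.99)  # roughly 52.6 minutes per year
after_outage = uptime_percent(24)         # roughly 99.73% for that year
print(f"99.99% uptime allows about {budget:.1f} minutes of downtime a year")
print(f"A 24-hour outage alone brings the year down to about {after_outage:.2f}%")
```

In other words, under these assumptions a single outage of this length uses up many years' worth of the downtime that 99.99% allows.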
Thanks to all of those who sent emails and expressed concern about CE's unavailability. We had planned to make a few announcements yesterday; we will make them today instead.
Let the action resume! 👍