CE Goes Through Longest Unplanned Downtime - Here's What Happened
CEans,
Yesterday, CrazyEngineers experienced the longest unplanned downtime in the history of CE. To make matters worse, it also blocked our official email IDs the whole time. Though no emails were lost, we couldn't send or receive emails, so we couldn't inform our members about the downtime. The aim of this post is to explain exactly what happened.
You might be aware that CrazyEngineers is hosted in the cloud. Our web host wanted to move the instance that hosts CE to a new 'parent' infrastructure. This move usually takes about 5 hours and was therefore scheduled to commence at 1:00 AM IST on Monday. The idea was to get this done when CE experiences its lowest traffic of the week.
The server admins began the move at the planned time. Once the automatic process starts, there's nothing the admins can do until the process is over. The monitoring systems indicated that the move was progressing smoothly, but in reality it was far slower than it should have been, and at some point it got stuck without the indicators catching it. The admins were under the impression all along that the move would be over 'very soon'. This went on throughout the day yesterday, and finally the admins had to cancel everything. It was discovered that a nasty issue had prevented the process from completing.
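To illustrate the kind of check that could have flagged the stuck move, here is a rough sketch of a progress watchdog. It is not what our host's monitoring actually runs; the `ProgressSample` type, the `is_stalled` function, and the 30-minute window and 0.5% threshold are all assumptions made up for this example. The idea is simply to compare the latest reported progress with the progress reported a while earlier, instead of trusting an "in progress" status.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgressSample:
    """One reading from a (hypothetical) migration progress indicator."""
    minute: int          # minutes elapsed since the move started
    percent_done: float  # progress reported at that moment, 0 to 100


def is_stalled(samples: List[ProgressSample],
               window_minutes: int = 30,
               min_gain: float = 0.5) -> bool:
    """Return True if progress grew by less than min_gain percent
    over the last window_minutes. Thresholds are illustrative only."""
    if not samples:
        return False
    latest = samples[-1]
    cutoff = latest.minute - window_minutes
    # Pick the most recent sample taken at or before the cutoff.
    baseline = None
    for sample in samples:
        if sample.minute <= cutoff:
            baseline = sample
    if baseline is None:
        return False  # not enough history yet to judge
    return (latest.percent_done - baseline.percent_done) < min_gain


healthy = [ProgressSample(0, 0.0), ProgressSample(30, 20.0), ProgressSample(60, 40.0)]
stuck = [ProgressSample(0, 0.0), ProgressSample(30, 20.0), ProgressSample(60, 20.1)]
print(is_stalled(healthy))  # False: progress gained 20 points in the last 30 minutes
print(is_stalled(stuck))    # True: progress gained only 0.1 points in the last 30 minutes
```

A check like this only helps if someone can act on the alert, which was not the case here once the automatic process had started.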
The whole thing resulted in CE being inaccessible to everyone for more than 24 hours. The fixes are being worked out, and hopefully we'll be able to execute the operation this week without any downtime (or with a short downtime during off hours). Thank you for your patience.
We usually make sure that CE stays up all the time, and over the past years we've maintained 99.99% uptime, with this recent exception. I hope we'll be able to manage this far better in the future.
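For perspective, here is a small back-of-the-envelope calculation of what those numbers mean. It assumes a 365-day year and treats the outage as exactly 24 hours (the real one was a little longer); the function names are made up for this example.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent: float, period_days: int = 365) -> float:
    """Downtime budget, in minutes, implied by an uptime percentage."""
    total_minutes = period_days * MINUTES_PER_DAY
    return total_minutes * (1 - uptime_percent / 100)


def uptime_percent(downtime_minutes: float, period_days: int = 365) -> float:
    """Uptime percentage over the period, given total downtime in minutes."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100 * (1 - downtime_minutes / total_minutes)


# 99.99% uptime leaves about 52.56 minutes of downtime per year.
print(round(allowed_downtime_minutes(99.99), 2))

# A single 24-hour outage brings the yearly figure down to about 99.73%.
print(round(uptime_percent(24 * 60), 2))
```

In other words, one day of downtime uses up roughly 27 years' worth of a 99.99% budget, which is why this incident stands out so sharply against the usual record.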
Thanks to everyone who sent emails and expressed concern about CE's unavailability. We had a few announcements planned for yesterday; we will make them today.
Let the action resume! 👍
Replies
-
Anoop Kumar
Finally .. welcome aboard 👍
-
lal
Ai ai captain! Replenished the supplies. Now let the sail resume!
-
Ankita Katdare
It just felt so empty to not see this page the entire day. 😔
It is GOOD to be back! 😁 Let's roll!
-
Kaustubh Katdare
It looks like the troubles aren't over yet. We just had another downtime and had to reboot the server.
We'll work on resolving the issues during off hours today.
-
Rehana Thakur
I am always waiting for updates on this page.
-
Saandeep Sreerambatla
Plan an upgrade on Saturday night or Sunday night IST, that's the best time 😀
-
Rehana Thakur
Thanks for the quick reply.
-
Kaustubh Katdare
Quoting English-Scared: "Plan an upgrade on Saturday night or Sunday night IST, thats the best time 😀"
We won't hold on that long. We're planning a maintenance update tonight, around 1-2 AM.
-
Abhishek Rawal
Quoting Kaustubh Katdare: "We wont' be hold on for that longer. We're planning for a maintenance update tonight - around 1-2 AM."
How long will it take? 😨
-
Kaustubh Katdare
Quoting Abhishek Rawal: "How long will it take ? 😨"
Less than one hour.
-
Abhishek Rawal
Quoting Kaustubh Katdare: "Less than one hour."
That's good news for me.
-
Sanyam Khurana
Oh, finally CE is back..!!
I was worried about where CE had gone yesterday, and now it's back....
Yeah..😁
-
CE Designer
Was much damage done as a result of this downtime, besides all the broken hearts? Haha, it's funny, I realised yesterday that CE has become such a part of my daily routine. It threw me off a bit yesterday.
-
Anoop Mathew
Thanking all admins of CE for their combined efforts to stabilize the issue.
Thanks #-Link-Snipped-# for the quick response on Facebook.
-
Abhishek Rawal
Oh gawd!
The maintenance took from 2:12 AM to 7:40 AM today. I was waiting the entire night.
Never mind, CE is live!
Crazyengineers, let's hustle
-
Kaustubh Katdare
Yes, it again took longer than expected. While the server is up, the error-fixing process is still running in the background.
I expect another scheduled maintenance sometime around the weekend, during late-night hours. I will post an update if downtime is required.
-
Kaustubh Katdare
Update: We had another ~4 hours of downtime this morning (IST). The load on the server spiked and services went down. Once we got the server up, repairing the databases took extra time.
I thank all CEans for their patience. We're now working to find the causes of the server failures and fix them. I'll keep everyone posted through this thread.