Recent Downtime
At approximately 7:30AM PST, on June 22nd, our primary SAN controller experienced a catastrophic failure, bringing it offline. This controller was the primary controller for most of the database nodes, and most of the web nodes. Redundant systems failed to come online, even though they had reported as 'ready' before the primary systems failed.
After we replaced the failed controller, the new one began booting and copying its configuration from its peer server. Unfortunately, as soon as the configuration was copied, the secondary controller also died.
After replacing the second failed controller, we began powering on the servers that relied on the SAN for their data - all of the database servers and the network-attached storage (NAS) file servers that store all the media, static content, and most of our web files as well.
This process only took a few hours. The main delay was an extended period (24 hours) of checking the continuity of the data on the disks. This was the majority of the downtime experienced.
As we started bringing our servers online after the check was complete, we noticed an issue with the firmware on the new SAN controllers. The newer firmware versions were conflicting with the storage array, and thus, the controllers couldn't talk to the disks.
The manufacturer told us that this was a known issue, and provided us with a method of repairing it. We then backed up all the data again, and proceeded to apply the firmware patch.
After the patch, we were able to restore the drives, and start booting up the critical systems, followed by the non-critical systems.
We can assure you that at no time during the hardware failure was any of your personal information compromised. We take the sacred trust you put in us with your information VERY seriously.
Thank You
We realize that you rely on the forums as part of your Minecraft experience. Once again, we sincerely apologize for the downtime and hope you'll continue to enjoy the Minecraft Forums.
The Minecraft Forums Team
Stuff happens. It is called real life and if you had bothered to actually read the OP you would realize that there were multiple failures including stuff that was completely beyond their control. More to the point, once the whole network of servers went down, they didn't dare simply flip the switch to put everything back on right away.
I, for one, am very glad that they took their time to do it right. Had they not done so, posts could have been lost, and there would have been much more bitching about parts of the forums not working, pages missing from the wiki, and all sorts of other potential problems.
The only part of the summary that I find a little bit offensive is the claim it only took 24 hours to fix the problems. The site has been going up and down for much longer than just one day. Furthermore, I don't really understand why other parts of the "CURSE Network" came up but the Minecraft stuff didn't. I'm sure some of that is due to the size of the Minecraft community and because so much of the community is "officially" using these servers. Just be honest with us, and don't try to sugar coat what must have been a network technician's worst nightmare come true. Don't try to downplay the extent of the problems.
Also for the kiddies to note, this was also a financial disaster in terms of not just having to replace equipment (which costs $$$) and a whole bunch of technician overtime (even more $$$), but it also cost advertising revenue (still more $$$) not to mention a loss of goodwill (where many in the community may go elsewhere). I'm sure these guys are completely aware of that, but give them a break and don't go rubbing it into their face.
For those who think that CURSE "owes" you something because you paid for Minecraft, that is pure and utter BS on your part. You paid for the game, not forum access. Did anybody ever have problems actually playing the game during this time period? I thought not. Your "pre-paying owner of Minecraft" has nothing to do with this whole thing so crawl back under your rock and ditch that holier than thou attitude. Mojang did not promise you anything in terms of forum access when you purchased this game.
You can "Ditch CURSE" if you care, because there are other forum communities out there. Look them up, and I for one would be glad if somebody who is so petty like this would leave the "community" here on these forums on CURSE. Don't let the door slam you in the behind as you leave. I'm not going to miss people with an attitude like this.
Hey, I just downloaded Minecraft and want to get better textures. How do I look up which version of Minecraft I'm on, 1.6 or 1.5? Where can I find this info?
~StoneIronPIG! AKA,P0rtal
After doing all of that we were finally able to get our Linux servers access to their data and have them running again.
It wasn't as simple as just having a backup, it wasn't as simple as just copying files. I know from the outside it may seem that way but if it was we would've had the site up long ago.
The 24 hour figure was the 24 hours of Windows running check disk on a multi-terabyte RAID 50 array. (This was one portion of the disks. See Xtek's post above for the full time.)
After about 48 hours we were able to bring the Windows .NET sites online. They were able to mount their volumes from the SAN. The bug listed was only affecting our Linux based sites which includes MCF.
There are three of our top technicians taking a very long sleep out in Atlanta right now after what certainly was a technician's worst nightmare. Plus the five or so hardware company representatives that were there or on the phone.
I have been awake for nearly 24 hours now.
No, they had nothing to do with this. This was entirely a hardware failure.
It's just a messy, annoying, and ironic failure. There's nothing anyone could have done to predict it, and they acted as fast as they possibly could to get the site back.
Also, realize this could have ended a lot worse. Say the HDD array failed so badly that we lost data. That would be bad, and chances are, we'd be down significantly longer than we were.
Also, you could still play Minecraft while the forums were down. You paid for the game, not unrestricted access to the forums and website.
1. Unlikely. Hardware is, though, prone to malfunctioning as well as firmware and other software.
2. Because, if you read the post, the SAN server malfunctioned, backup launched and malfunctioned, then so did the support NAS servers. There was no way they could've hosted it.
3. Depends on the issue, as this time it had much to do with the firmware provided by the provider of said hardware.
4. Most probably not.
5. Curse is, after all, not a company with millions of dollars and hundreds of employees. It's simply financially impossible to do such a thing.
6. Unless they reroute all of their structure, no.
I don't have anything to say to this. Mojang has no affiliation with the MinecraftForums. They were originally set up by Quatroking, as well as the wiki. Curse provides quite a relief for the hosting, and it can handle a LOT more traffic now. If you were here in the early stages of launching you would've known how much better the handling is now. I suggest you actually try to host a site with thousands of visitors daily, and then for 24 sites. Think before you put down **** about a "company" that is willing to provide hosting services for a forum for a game, and incorporating it into their network. This is the first time such a thing has happened.
No, lulzsec has other things on their hands as far as I'm concerned.
Actually play minecraft/APB/whatever (Play APB.)
Thank god you guys are super computer geniuses.