Downtime: The perfect storm



Vercetti
2013-05-06, 22:19
Hi all,

Over the past few days, we have witnessed a perfect storm.
The website, TS, and the game servers (BF3/Minecraft) were all down. There were several causes:

Website down:
The website hoster moved the websites to new hardware and switched to new software in order to speed up the sites on their boxes. This caused some instability (don't ask me how or why). Then something went wrong last night, and they had to perform an update and reload all the operating systems and god knows what else. After that, something went wrong at the datacenter, and it took the whole night plus today to fix it. This could all be true, but I think it's a pile of bullshit. Anyway, the website is up again and all services are being restored.

TS / Minecraft server:
- Because DNS is configured at the web hoster, all servers reached through xx.oldguys.eu hostnames (such as TS and Minecraft) were unreachable. TS therefore seemed down, although a direct IP connection still worked.
- Recently we set up a backup Teamspeak that mirrors the main Teamspeak. This pushed us over the 512-slot limit, which shut down the backup Teamspeak and affected the main one: it was shut down every hour and no longer started automatically. The automatic startup script did start the TS3 server program, but it did not start the virtual TS server afterwards. This has been changed.
- The VPS hosting the TS was a Linux box. This box recently became very unstable for no apparent reason. We contacted the hoster, after which they ‘fixed’ it...
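
To illustrate the DNS point above (hostnames failing while a direct IP connection still worked), here is a minimal Python sketch of a client-side fallback. It is an assumption-laden illustration, not the actual setup: the function name `pick_address`, the hostname, and the IP addresses are hypothetical placeholders.

```python
import socket
from typing import Callable


def pick_address(
    hostname: str,
    fallback_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Return the IP that hostname resolves to, or fallback_ip if DNS fails.

    hostname and fallback_ip are placeholders; the resolver can be swapped
    out, which keeps the function testable without network access.
    """
    try:
        return resolve(hostname)
    except OSError:
        # socket.gaierror (failed lookup) is a subclass of OSError.
        return fallback_ip


if __name__ == "__main__":
    # Hypothetical values: a reserved .invalid hostname and a documentation IP.
    print(pick_address("ts.example.invalid", "192.0.2.10"))
```

With something like this, a DNS outage at the hoster only breaks the friendly name, while a known direct IP keeps the connection usable.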

Battlefield 3 servers:
- EA had a major issue that caused all servers above 30/32 slots to crash regularly, so these servers are very unstable.
- The dedibox at Clanhost, which hosts all the servers, required a restart (all servers were invisible). The hoster first blamed this on the EA issue, but it was a hardware/software-related issue. Today's restart fixed it.

Summary:
We had a perfect storm for which we were not adequately equipped.

Actions taken to prevent this:

General:
We are loyal to our suppliers and want to solve issues together with them when they occur. We have a long-standing relationship with our current webhost (3 years) and over a year now with our gameserver and Teamspeak hoster. However, they are not performing, which is why we are reviewing the services we hold at those companies. We are going to completely overhaul our infrastructure, which will be a huge operation over the coming months:

Website:
- A new website engine (vBulletin) will be implemented, which will also be optimized for mobile access. The theme was already being designed, and today we obtained the license. The website will go live when it is finished, which we aim to be before our Gamescom visit. This is not certain, but we are working hard to make it happen. We will try to move all user accounts; however, all passwords and posts will probably be lost. This engine will provide better stability and more possibilities.
- We will switch hosters in the near future, to one where we have more control over the services and get better support. This will happen after the new website is completed.

Teamspeak:
- We are moving the TS to a stable Windows Server 2008 VPS box, and we will have a backup running on a Windows Server 2008 box as well. We already had this one up, but the rights still need to be sorted. This server will have the same address (ts.oldguys.eu), but will not require a port number. We will update you all when this move is effective (probably tomorrow morning).
- The main and backup Teamspeak will each be reachable through 2 easy-to-remember domain names: ts.oldguys.eu and ts.oldguys.org for the main TS, ts2.oldguys.eu and ts.oldguys.org for the backup Teamspeak. This is already effective.

Gameservers (BF3):
The hoster (Clanhost) is not performing, and the support is not what they promised. We have been with them for over a year now, and we renewed our contract for 6 months just 2 weeks ago. After the recent downtime, however, we will terminate the services at the end of that period if nothing changes.

We have failed to provide you with the quality we stand for, but we will now put everything in motion to fix this.

Shadowfox
2013-05-06, 22:31
http://hikingartist.files.wordpress.com/2012/01/it-slave.jpg

Eindbaas
2013-05-07, 14:05
Thanks for this detailed post, good to see you guys are doing everything to prevent this from ever happening again.

fergushk
2013-05-07, 23:10
Thanks for the update. I would bill the supplier for the downtime, you are probably covered.

PsychoEMT
2013-05-09, 22:51
Greedy assed non-support giving gamehost mofo's. If they can't provide quality service, then I agree with you, leave their collective asses.

Tabernac
2013-05-12, 12:40
had my fingers burnt with them before... before i moved to G-Portal.. never looked back their servers are reliable and the customer service is great..

Also noticed that the hit detection on their servers is great.. might help with Janbos heart rate on sunday mornings :)