|
FQuest Notice: 4th Generation servers
<<Copied from the NINE down thread, as it may have been overlooked... It was in response to the server stability issues we have been plagued with on SIX, and now NINE>>
<<2 minor edits>>
We have new servers (CS/DS/HC/DAS/SAS) on the way...
I am also beginning work on clustered server cells (3 servers per group: 1 Front-End with HA crossover and 2 Back-End), which should almost guarantee 99.99999999% server uptime (not counting outages outside of our control)...
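For a sense of scale, here is a rough sketch of the yearly downtime that uptime figure would allow. The function name and the 365-day year are assumptions made for illustration, not part of the actual design.

```python
from decimal import Decimal

# Assumption: a plain 365-day year (no leap days).
SECONDS_PER_YEAR = Decimal(365 * 24 * 60 * 60)


def allowed_downtime_seconds(uptime_percent: str) -> Decimal:
    """Return the yearly downtime budget, in seconds, for an uptime percentage.

    The percentage is passed as a string so Decimal keeps every digit exactly;
    a float would lose precision this close to 100.
    """
    downtime_fraction = (Decimal(100) - Decimal(uptime_percent)) / Decimal(100)
    return downtime_fraction * SECONDS_PER_YEAR


# The figure quoted above works out to roughly 3 milliseconds per year.
print(allowed_downtime_seconds("99.99999999"))
```

By comparison, 99.9% uptime allows 31,536 seconds (about 8.8 hours) of downtime per year.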
We were asked a few days ago about server redundancy. Well, the '... not yet' is getting closer...
I cannot discuss the design details at the moment, but can say that this has been months in the works and the conceptual phase is complete...
We do know where the problems with SIX and NINE are, but cannot do anything about them right now... The 'halts' are buried deep in the Linux 2.2.x SMP core, and there is no rhyme or reason to why or when the servers lock up... Most of the time, the servers are not even loaded when this happens... I've seen it halt when load average was at 0.60 *and less*... Grrrr!
SIX and NINE (3rd Generation) are highly dependent on the 2.2.x kernel series, and I only wish I could backpedal and bring the 2.0.x series online again... Nevertheless, we must move forward to gain the Cluster designs, which also require 2.2.x...
Over the next two months, we are going to be reorganizing the entire server structure to begin the switchover to the 4th Generation design... I fired up a new DS server tonight that will be handling the DNS... In two weeks, the first of our DAS/SAS (Dedicated/Shared Application Servers) will go online handling services like MySQL, QMail, DNS, Real Audio, etc... This will free up resources on the community servers so that they are Apache engines only, and all CGI will be rerouted to Dedicated CGI engines for processing...
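The split of duties described above can be pictured with a small sketch. This is only an illustration: the role labels, the request kinds, and the function name are hypothetical, and the real design details are not public.

```python
# Hypothetical labels; the notice names only the server types and services.
APPLICATION_SERVICES = {"mysql", "qmail", "dns", "realaudio"}  # move to DAS/SAS

ROLE_FOR_WEB_REQUEST = {
    "static": "community-server",   # community servers become Apache-only
    "cgi": "dedicated-cgi-engine",  # all CGI is rerouted off the community servers
}


def target_role(kind: str) -> str:
    """Return the server role that would handle a given service or request kind."""
    kind = kind.lower()
    if kind in APPLICATION_SERVICES:
        return "application-server"
    if kind in ROLE_FOR_WEB_REQUEST:
        return ROLE_FOR_WEB_REQUEST[kind]
    raise ValueError(f"unknown service or request kind: {kind!r}")


print(target_role("cgi"))    # dedicated-cgi-engine
print(target_role("MySQL"))  # application-server
```

The point of the split is that a heavy CGI script or database query no longer competes with Apache for resources on the same machine.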
As you can see, all of this cannot be afforded or built overnight, but we are committing *everything* we have to bringing a new era of stability to our name...
In Conclusion: The problems with SIX and NINE should be solved very soon, as I now have history with the NINE server and have observed it following the same pattern that SIX did... What it takes to solve it is more servers and breaking each server down to a single CPU (until Linux gets its act together and bulletproofs the SMP core)... Tonight was the first step: bringing online a new single-CPU server with Linux 2.2.11, which I will be observing *very* closely...
Bottom Line: Linux 2.2.x + SMP + Dual CPUs == BIG Mistake!
--
Terra
--Is also very weary of the Catch-22 situation!--
FutureQuest
|