[FQuest Alert] HC01 Crash


Terra
02-01-2004, 03:48 AM
At 2:32am this morning, the HC01 server crashed due to a failed hard drive... Normally the server would keep running after a drive failure, but it appears that this time the loss of the swap partition on the failed drive was enough to take the server down... 9.9 times out of 10, Linux will cope with this gracefully, but this time it caused several critical system programs to crash to the point where the system could not recover...

Repairs are underway and we should have the server back online within the hour...

Our sincerest apologies for the inconvenience this has caused...

--
Terra
sysAdmin
FutureQuest

Terra
02-01-2004, 04:59 AM
HC01 is now back in full production and all seems to be running well...

--
Terra
sysAdmin
FutureQuest

Randall
02-01-2004, 07:05 PM
Originally posted by Terra:
At 2:32am this morning
And within 16 minutes you'd discovered the problem, started fixing it and even taken a moment to tell us about it?

Good grief. Any other host I've known would be hearing about it from their customers on Monday. ("Uh, is there a problem with our web site?")

:QTthumb:

Randall

MPaul
02-02-2004, 03:41 PM
Originally posted by Terra:
the HC01 server crashed due to a failed hard drive
When a hard drive fails, doesn't data become corrupted, or even get lost?

Kevin
02-02-2004, 03:46 PM
Originally posted by MPaul:
When a hard drive fails, doesn't data become corrupted, or even get lost?
Not if the hard drive is mirrored by a second hard drive, which ours are.

The reason that the server itself failed is that it was the primary drive that failed instead of the secondary drive. The swap space on the primary drive is used before the swap space on the secondary drive, and once that swap space became unreadable, the system crashed. Because the primary drive is also the one that the BIOS is configured to boot from, the system could not come back up on its own.
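
For anyone curious about the swap ordering described above: on Linux, the file /proc/swaps lists each swap area with a priority, and the kernel fills the higher-priority area first. The following is only a minimal sketch in Python of reading that listing. The device names and numbers in SAMPLE are made up for illustration and are not taken from HC01, and parse_swaps is a hypothetical helper name.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text; return swap areas, highest priority first."""
    areas = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore malformed or empty rows
        areas.append({
            "device": fields[0],
            "type": fields[1],
            "size_kb": int(fields[2]),
            "used_kb": int(fields[3]),
            "priority": int(fields[4]),
        })
    # The kernel uses the area with the highest priority value first.
    areas.sort(key=lambda a: a["priority"], reverse=True)
    return areas


# Illustrative data only (assumed device names, not the real HC01 layout).
SAMPLE = """\
Filename                                Type            Size    Used    Priority
/dev/hdc2                               partition       1048568 0       -2
/dev/hda2                               partition       1048568 20480   -1
"""

if __name__ == "__main__":
    # On a real Linux system, pass open("/proc/swaps").read() instead of SAMPLE.
    for area in parse_swaps(SAMPLE):
        print(area["device"], area["priority"], area["used_kb"])
```

In the sample, the area with priority -1 sorts ahead of the one with priority -2, so it is the one that holds swapped-out pages first, which matches the situation where losing the first-used swap area hurts the most.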

--
Kevin