[FQuest Alert] HC02 Server


sheila
03-24-2007, 02:10 PM
A few minutes ago the HC02 server went down and then rebooted itself.

Unfortunately, our logs show no indication at this time of the cause for this incident.

Downtime was extremely brief.

Event start: 1:56 PM ET
Event end: 1:58 PM ET
Duration: 2 minutes

Our apologies for any inconvenience,

sheila
03-24-2007, 03:50 PM
A few minutes ago there was a repeat event. HC02 was again down for just 2 minutes, and rebooted itself.

Event start: 3:58 PM ET
Event end: 4:00 PM ET
Duration: 2 minutes

We are actively monitoring the console at this time to try to find any clues as to what may be causing these brief outages.

Our apologies for any inconvenience,

Buck
03-24-2007, 04:13 PM
Maybe it's Spring allergies? ;)

sheila
03-24-2007, 04:20 PM
It certainly seems that way. It just did it again. Oh, and just again now. Bah.

We have just replaced the kernel with the previous one, so this latest reboot should swap in the old kernel and we will see if that resolves the matter. We are not getting any clues from the console.

Previous event:
Event start: 4:02 PM ET
Event end: 4:04 PM ET

Current event:
Event start: 4:16 PM ET
Event end: 4:18 PM ET

Total downtime for both events: 4 minutes


Another possibility we are considering is that this might be a hardware issue.

The old kernel is back in place. We will look to see if this resolves the matter...

Our sincere apologies for these repeated events,

sheila
03-24-2007, 04:34 PM
All right, we just had another event. Last straw.

We are dispatching a tech to the data center now for a full hardware swap. This will start in about 30 minutes.

Event Start: 4:28 PM ET
Event End: 4:31 PM ET
Duration: 3 minutes

Hardware swap to begin in about 30 minutes...
We will post more information as it becomes available.
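
For anyone tallying the outages reported so far, here is a minimal Python sketch of the arithmetic. The event times are copied from the posts above; the names `EVENTS` and `minutes_down` are made up for this illustration, and the sketch assumes every event starts and ends on the same day.

```python
from datetime import datetime

# (start, end) pairs in ET, as reported in the posts above.
EVENTS = [
    ("1:56 PM", "1:58 PM"),
    ("3:58 PM", "4:00 PM"),
    ("4:02 PM", "4:04 PM"),
    ("4:16 PM", "4:18 PM"),
    ("4:28 PM", "4:31 PM"),
]

def minutes_down(start, end):
    """Return the whole minutes between two same-day clock times."""
    fmt = "%I:%M %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

durations = [minutes_down(s, e) for s, e in EVENTS]
total = sum(durations)
print(durations, total)  # [2, 2, 2, 2, 3] 11
```

That works out to 11 minutes of unplanned downtime across the five events, not counting the hardware swap itself.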

sheila
03-24-2007, 05:08 PM
Our tech is currently at the data center and HC02 has just been taken down for the hardware swap.

Additional updates will follow...

sheila
03-24-2007, 05:29 PM
The hardware swap is now complete and the HC02 server has been returned to full service.

We will be continuing to monitor the server, of course, but we hope that this will resolve the sudden restart issues the server was exhibiting earlier today.

Kevin
03-24-2007, 06:58 PM
If anyone is curious, it appears the system had a bad power supply. It is possible that it was the motherboard, but I doubt it.

The system was rebooting for no apparent reason and without logging anything. We believed it was a kernel issue or an overload of some kind, so I started adding more debugging. After it happened a few times, I realized the system was actually powering off and on, which of course doesn't log anything. After swapping out the entire system except for the hard drives, I played with the old box again. It is still powering on and off at random, but it is now mostly off instead of mostly on.

I have never actually seen a box do that. Usually they are either working or dead when the PSU or mobo goes.
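
One way to spot this kind of failure sooner is a heartbeat file plus a clean-shutdown marker, checked at boot. The Python sketch below is only an illustration of that idea and not something that was running on HC02. The function names and file paths are hypothetical, and it assumes something calls `write_heartbeat` every few seconds and that the normal shutdown path calls `mark_clean_shutdown`.

```python
import os
import time

def _write_synced(path, text):
    """Write text and force it to disk so it survives a sudden power cut."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic swap, never leaves a half-written file

def write_heartbeat(path, now=None):
    """Record the current time; meant to be called every few seconds."""
    now = time.time() if now is None else now
    _write_synced(path, f"{now:.0f}\n")

def mark_clean_shutdown(marker_path):
    """Meant to be called only from the normal shutdown/reboot path."""
    _write_synced(marker_path, "clean\n")

def classify_previous_shutdown(heartbeat_path, marker_path):
    """Run once at boot, before the heartbeat loop starts again."""
    if os.path.exists(marker_path):
        os.remove(marker_path)  # consume the marker for the next boot
        return ("clean", None)
    if os.path.exists(heartbeat_path):
        with open(heartbeat_path) as f:
            return ("unclean", float(f.read().strip()))
    return ("unknown", None)
```

An "unclean" result only says the box went down without passing through the shutdown path. It cannot tell a power cut from a hard kernel lockup on its own, but together with completely empty logs it points toward power or hardware rather than software.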

Jeff
03-24-2007, 08:43 PM
Not to put you on the spot, but what brand of power supply and motherboard?

Kevin
03-25-2007, 10:26 AM
In that particular case it was an Asus A8V motherboard and an Antec SmartPower 450.

Bradley
03-26-2007, 02:27 AM
Kevin wrote: "In that particular case it was an Asus A8V motherboard and an Antec SmartPower 450."

I've got the A8V Deluxe in one (http://valid.x86-secret.com/show_oc?id=181234) of my systems, and it's worked great since I got it back in 2004. Same with the Antec PS that powers it. :rasberry:

Sorry to hear about the HC allergies lately though.

Kevin
03-26-2007, 09:48 AM
Actually, I have one at home as well. We bought a few of them for servers until Asus started making versions with fewer PCI slots in exchange for integrated video.