[FQuest Alert] UNITY hard drive replacement


Terra
06-17-2003, 03:07 AM
Within the hour, UNITY will be taken offline to replace a failed hard drive in the RAID array...

Downtime should be minimal to perform the procedure...

Our apologies for any inconvenience this may cause...

--
Terra
sysAdmin
FutureQuest

Randall
06-17-2003, 03:18 AM
What's the average lifetime of a FutureQuest hard drive? They seem to blow up on a regular basis -- or is that just because you have lots and lots of them?

:safegrin: Randall

# Thinking that maybe this is a good time to back up the ol' Maxtor

Bradley
06-17-2003, 03:30 AM
You know Terra, he's like a ferret... he orders a bunch of them "just in case" and drags them off under the raised floor. Then, when needed, he scurries off, gets one and replaces it.

Actually, with the pounding those drives take, failures should be expected. I guess that's why he uses RAID for moments like these, so data isn't lost that often. Users' PC drives are much less heavily used than those in servers, which is why we see them last a bit longer. During my time at a local computer shop they seemed to have drives stacked up waiting to replace a dead one on the server, because of the stress all the databases and such caused.

*I believe that was a somewhat decent answer. It didn't explain the lifetime, as lifetimes vary quite a lot. Terra does use some of the best drives available (expensive too!), so it doesn't leave much to be expected that one of them will eventually blow*

--
Brad
*beep beep beep* *grumbles* ..not another one :P

Terra
06-17-2003, 03:30 AM
That is not easy to quantify, as we've had some drives running for years... The life of a FutureQuest hard drive is a very hard life indeed, therefore drive failures are not really out of the ordinary and are to be expected... It is one of the prices you pay for ultra high speed drives that are _constantly_ reading and writing...

We keep plenty of spare drives on tap, and respond to any drive failures swiftly even if that means unscheduled downtime...

--
Terra
sysAdmin
FutureQuest

Jeff
06-17-2003, 03:30 AM
I've said this before too. It only makes me worry since I have the exact same drives in my main non raid system at home and so many seem to only last a year or two :eek:

edit: I thought I was a slow poster earlier in the day, but two posts since I typed this :)

Jeff
06-17-2003, 03:36 AM
1.) Do the MySQL or mail servers use IDE drives? It seems like we see lots of the 10K SCSI drives being replaced with new ones, but I haven't seen many reports of replacing the IDE drives. I guess to a certain extent that would be logical since the SCSI drives are used where the load is higher, but it still seems like a very high failure rate on the SCSI drives, which I'm always telling people they should get.

2.) It also does make me think about running programs at home which 'ping' the hard drive every so often writing to a log, etc... I never thought of that as shortening the life of my drive...

3.) And finally it helps me make a decision about a server I have which doesn't have a raid system - get a raid system by the end of the year :)

Terra
06-17-2003, 03:45 AM
Jeff:
Believe it or not, the MySQL data access patterns are nowhere near the intensity of the Community Servers...

Up until MYSQL06, the drives are IDE, which have had their fair share of problems as well (slow-drive syndrome)...

If I wasn't happy with the performance of our Maxtor SCSI drives, then I might be a bit more perturbed at the failures... However, they are solid performers that can take a serious pounding day in and day out...

In the end, it is just another day in the life of a busy production server...

--
Terra
sysAdmin
FutureQuest

Jeff
06-17-2003, 03:50 AM
Thanks Terra,

It's interesting, the maintenance announcements here only make me worry about systems elsewhere that don't have the same degree of hardware and backups in place... Sorry to always drive the threads a little off topic. There isn't any worry at all for those concerned only with their FutureQuest accounts, since they are in very very very good hands. But I always like to get a few tidbits of information from "Terra's Hardware" - the best real world advice out there ;)

Terra
06-17-2003, 03:55 AM
Well, if you beat on your drives like we do - then:

1) RAID them
2) Back them up
3) When a failure happens, know beforehand how you are going to fix it...

That is the best real-world advice I can give... :)

--
Terra
sysAdmin
FutureQuest

Terra
06-17-2003, 04:10 AM
The drive replacement on UNITY has just been completed...

Total downtime was less than 5 minutes...

--
Terra

doraevon
06-17-2003, 10:45 AM
Just out of curiosity, why was any downtime required? Are these RAID arrays non-hot-swap arrays? If so, why did you choose this route?

I'm curious because, like other posters, I'm always looking for insight into hardware/software decisions from people who have to deal with large, high-availability situations.

Doug

Arthur
06-17-2003, 10:56 AM
"Just out of curiosity, why was any downtime required? Are these RAID arrays non-hot-swap arrays? If so, why did you choose this route?"

Doug, that question has come up before. See for example this post (http://www.aota.net/forums/showthread.php?postid=86967#post86967) and this post (http://www.aota.net/forums/showthread.php?postid=75020#post75020).

It's a policy that over the years has proven to be the safest and most reliable. Yes, it results in a 'down time' of 4 to 5 minutes, but it guarantees a sane state of the RAID system afterwards.

Arthur

doraevon
06-17-2003, 11:34 AM
Thanks for the links Arthur, they answered my question(s). Now if I had just been less lazy and taken the time to use the search option of the forum, I could have saved you the time of replying... sorry...

Doug