[FQuest Alert] Router Crash
Terra
02-23-2007, 05:12 PM
We suffered a major core router crash that took down our route announcements to the outside world...
I have placed the routers into an emergency dogleg configuration that has temporarily fixed the problem...
I am currently working on a permanent solution to this condition, and there may be intermittent hiccups while our routes converge outside of our network...
Our sincerest apologies for any inconvenience this network outage has caused...
I will follow up once the work has been completed....
--
Terra
sysAdmin
FutureQuest
JRepici
02-23-2007, 05:34 PM
"You had a bad day"?
Paula, Simon, and Randy have NO idea :)
Terra
02-23-2007, 05:49 PM
Eeesh, bad day in the networking realm is right...
First, I found out early this morning that Level3 nuked our advertisements due to a problem with the RADB (radb.net) records... This caused just a minor hiccup, as the other two upstreams took over gracefully...
This one was not so nice... For a moment, I was thinking that it was another RADB problem, and all upstreams filtered out our advertisements...
The problem was (quickly) found: the BGP session on our core HUGGIN router crashed and created an unrecoverable ripple to the other routers, causing our announcements to be dropped at the next refresh... This would not have been so bad if the route preferences weren't weighted in such a way that all outbound traffic was heading out the Internap uplink... This is where the emergency dogleg router config comes into play, as it resets all the route prefs to equalize the pipes when only two routers are working... Normally HUGGIN handles this route path equalization, but when HUGGIN is down, the two other routers must fend for themselves (much, much more complicated and problematic)... Unfortunately, this took longer than anticipated, as my own link into the router was saturated, slowing down the command input...
I will be studying this particular condition to see if I can break the two frontside routers' preference interdependence and shift it back deeper into our routing core... This way, if HUGGIN crashes, the two frontside routers will be much simpler (and faster) to throw into the dogleg configuration...
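The preference problem described above can be sketched with a toy model. This is only an illustration, not the actual router configuration: the uplink name `other_uplink`, the preference numbers, and the helpers `best_uplinks` and `spread_flows` are all hypothetical, and real BGP path selection weighs far more than a single preference value.

```python
from collections import Counter


def best_uplinks(prefs):
    """Return the uplinks tied for the highest preference, sorted by name."""
    top = max(prefs.values())
    return sorted(name for name, pref in prefs.items() if pref == top)


def spread_flows(prefs, flow_count):
    """Assign flows round-robin across the most-preferred uplinks only."""
    best = best_uplinks(prefs)
    return Counter(best[i % len(best)] for i in range(flow_count))


# Hypothetical skewed preferences: one uplink wins every outbound decision.
skewed = {"internap": 200, "other_uplink": 100}

# Hypothetical "dogleg" state: preferences reset to equal values,
# so outbound traffic can be shared between the two working routers.
equalized = {"internap": 100, "other_uplink": 100}

print(spread_flows(skewed, 10))
print(spread_flows(equalized, 10))
```

With the skewed values every flow lands on a single uplink, and with the equalized values the flows split between the two, which is roughly the effect the dogleg config is meant to have.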
--
Terra
sysAdmin
FutureQuest
Andilinks
02-23-2007, 10:23 PM
Thank you for your efforts Terra and thank you for the hundreds and hundreds of good days that we've had. :)