FQuest Alert: "too many open files"


Terra
12-25-1998, 04:47 AM
Problem: Too many Apache LogFiles open...
Solution: Finalize coding on 'Stage3' methods...

This error may pop up occasionally in your log files... "too many open files"

It is a situation that I am aware of, and I am taking the necessary corrective steps to resolve it...

Over a month ago - I introduced 'Stage2' logging techniques, which split and rotate the logfiles nightly to facilitate running STATS on them... We have quickly outgrown this method, and I'm having to redesign this sub-system... 'Stage2' was a quick fix for several problems, and it solved them well - but now 'Stage2' has become a problem in itself...

In order to preserve system stability - I am considering pulling this project temporarily and going back to pre-Stage2 until Stage3 is complete...

Stage3 is a centralized, forking background daemon that Apache will pipe the 'entire' logs to - and it in effect will spawn children - as needed - to handle the sheer bulk of the logs (over 400megs a day)... This will be a daunting task and utmost caution is required to ensure 100% integrity...
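As a rough illustration only (this is not the Stage3 code, and it leaves out the forking of children entirely), here is a minimal single-process Python sketch of the idea behind piping the whole log stream to one program: the program fans lines out to per-domain files but never holds more than a fixed number of files open at once. The class name `HandleCache`, the function `demux`, the `.log` file naming, and the assumption that every line begins with the domain name followed by a space are all hypothetical.

```python
import os
from collections import OrderedDict


class HandleCache:
    """Append to per-domain log files while keeping at most max_open of them open."""

    def __init__(self, log_dir, max_open=64):
        self.log_dir = log_dir
        self.max_open = max(1, max_open)
        self._open = OrderedDict()  # domain -> file object, least recently used first

    def write(self, domain, text):
        fh = self._open.pop(domain, None)
        if fh is None:
            if len(self._open) >= self.max_open:
                # Evict the least recently used handle to stay under the cap.
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            fh = open(os.path.join(self.log_dir, domain + ".log"), "a")
        self._open[domain] = fh  # re-insert as most recently used
        fh.write(text)

    def open_count(self):
        return len(self._open)

    def close(self):
        for fh in self._open.values():
            fh.close()
        self._open.clear()


def demux(stream, cache):
    """Route each 'domain rest-of-line' record to that domain's file."""
    for line in stream:
        domain, sep, rest = line.partition(" ")
        # Skip malformed lines and names that could escape log_dir.
        if not sep or not domain or "/" in domain or domain.startswith("."):
            continue
        cache.write(domain, rest)
```

With Apache's piped-log feature, a program like this would receive the log on standard input, so a driver could call `demux(sys.stdin, HandleCache(some_dir))`. The point of the cap is that the number of open files no longer grows with the number of domains.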

The task ahead of me is not easy to explain, other than this needs to be done 'VERY' soon - or my life is going to become very difficult...

I have already made custom adjustments to the Linux kernel to increase the per-process open filehandles - and also the system-wide open filehandles... But now I have to focus on not relying on this band-aid (it is not 100% foolproof) and get down to business and attack the problem at its core...

I hope the above somehow made sense - the task involved runs very deep to the OS core operation, and much effort will go into improving our overall stability and capability...

--
Andrew Gillespie
Systems Administrator
FutureQuest.net

------------------
www.FutureQuest.net (http://www.FutureQuest.net)
--The best way to predict the future is by inventing it--

Terra
12-25-1998, 06:15 AM
Update:

12/25 5:24am: Increased the Linux kernel limit to 4095 filehandles...
12/25 6:08am: Recompiled the Apache server, increasing its limit to 900 filehandles from the previous 256...

(On a positive note, the new Apache is *heavily* Pentium-II optimized, so the response from the Apache pool should increase by 6% - 10% overall)

The above 2 steps should provide some breathing room...
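For readers who want to see where a process stands against its own limit, here is a small Python sketch using the standard `resource` module on a Unix-like system. It is not the kernel or Apache recompile described above, just a way to inspect and (within the hard limit) raise the per-process open-file limit at runtime. The function names are made up for this example.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward target, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

For example, `raise_soft_limit(900)` would ask for the 900 figure mentioned above; whether that succeeds depends on the hard limit the system grants the process.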

I will be working today on finding domains that are no longer eligible for Stage2 logging... The criterion is that a domain must generate at least 2meg in logfiles a day... To facilitate this - I am having to rewrite core resolving routines and my scripts that drive the STATs resolving to handle this new 3rd split...
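The eligibility rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the input format (a mapping of domain to bytes logged per day), and reading "2meg" as 2 * 1024 * 1024 bytes are assumptions, not details from the actual scripts.

```python
# Assumption: "2meg" is taken to mean 2 MiB of logfiles per day.
STAGE2_MIN_BYTES = 2 * 1024 * 1024


def split_by_volume(daily_bytes, threshold=STAGE2_MIN_BYTES):
    """Split domains into (eligible, ineligible) for Stage2 by daily log volume."""
    eligible = sorted(d for d, n in daily_bytes.items() if n >= threshold)
    ineligible = sorted(d for d, n in daily_bytes.items() if n < threshold)
    return eligible, ineligible
```

A domain logging exactly the threshold counts as eligible, matching "at least 2meg".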

In the end Stage3 should scale *much* more easily than Stage2 did...

SideNote: The ultimate solution to all this should happen around mid-Feb / early-Mar, as we are preparing to upgrade our network... With that upgrade come 2 C-Blocks of IPs... That is going to give me an extreme amount of flexibility in building the backend server farm... At this point - I have to work within very strict limitations, but the overall result of that is a leaner/meaner, highly tweaked/customized server that has been tuned to run optimally with the least amount of resources available... Sometimes having to do things within the most difficult constraints results in a well thought-out underlying design that will grow and scale over the years to come...

--
Terra
sysAdmin
FutureQuest

[This message has been edited by ccTech (edited 12-25-98).]

hearts
12-25-1998, 02:34 PM
huh?

Terra
12-25-1998, 04:13 PM
In other words, I'm fixing something that has the potential of breaking...

The logging subsystem is being redesigned as it is starting to become overloaded...

Some have been affected by it - especially heavy CGI users...

Just a heads up that I'm aware of the problem and have been working all day on correcting the situation...

--
Terra
sysAdmin
FutureQuest

Del
12-25-1998, 10:26 PM
What he's saying is that you've finally come across a hosting company that starts working on fixing stuff before it actually breaks :) Yet another reason I love it here!



------------------
Del
http://www.downinit.com/

queticon
12-25-1998, 11:45 PM
Well I guess I did make the right choice when I came to futurequest. Thank you guys

------------------
Dan Garwood
Trustmark Associates, Inc.
Senior Leasing Officer
www.trustmarkleasing.com
dgarwood@trustmarkleasing.com

Terra
12-26-1998, 12:28 AM
Due to the reconfiguration of the logging system - there will be no STATS run tonight...

I am still rewriting the nightly maintenance scripts to handle this new situation and they are not stable yet...

I apologize for any inconvenience due to this, and I am working diligently to get the STATS run back online by tomorrow evening (12/26)... Another monkey wrench is that I have year-end processing coming up in a few days, so I'm incorporating the year-end routines as well...

--
Terra
sysAdmin
FutureQuest

zeegraf
12-26-1998, 06:13 PM
Still another reason FQuest has made me a loyal customer after only a week... potential problems are diagnosed and repaired BEFORE they can actually become a problem, and it's explained to everyone in a clear, concise, not-too-technical manner. :)


------------------
Don Z.
www.zeegraf.com (http://www.zeegraf.com)

"To poldly mow air moebius
gumby four" --Kirk on Novacaine

Terra
12-28-1998, 09:24 AM
I have gotten the 'Stage2' customers caught up on the nightly stats...

I am still reworking the code to handle the 'Stage1' customers... I hope to have everybody caught up by tonight's run...

You can tell if you are 'Stage1' by viewing your 'logs_web' directory and seeing if this file 'access' is the latest file and/or looking for a '.stage1' file...

--
Terra
sysAdmin
FutureQuest

------------------
www.FutureQuest.net (http://www.FutureQuest.net)
--FutureQuest goal: (10x+8)/(x+1)=9.99--
--The best way to predict the future is by inventing it--

Terra
12-29-1998, 03:54 PM
Update: Stage1 Conversions

I have started finalizing the conversion for the 'Stage1' customers... You can tell if you have been updated by looking in your 'logs_web' directory for this file: '.stage2-1'...
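If you would rather check from a script than by eye, the marker files mentioned in this thread can be tested for directly. This is a minimal Python sketch; the function name and the returned labels are invented for illustration, and only the file names '.stage1' and '.stage2-1' come from the posts above.

```python
import os


def conversion_status(logs_web):
    """Report which marker file, if any, is present in a logs_web directory."""
    # '.stage2-1' marks a domain whose conversion has been finalized.
    if os.path.exists(os.path.join(logs_web, ".stage2-1")):
        return "converted"
    # '.stage1' marks a domain still waiting on the conversion.
    if os.path.exists(os.path.join(logs_web, ".stage1")):
        return "stage1"
    return "unknown"
```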

This conversion has turned out to be a much larger project than anticipated, due to these logfiles being -live-... I have to restart the Apache server for every domain that I convert, so I do this early in the morning (2am - 4am), as the service gets spotty due to the frequent restarts...

I have been getting approximately 10 domains converted per session, going in alphabetical order...

I wish there was a way to just flip a switch to make this happen, but it's a looooong tedious process that must be done by hand... :(

I make every effort to do my upgrades/conversions transparently - but this does not always work, and you may find that your STATS have not been updated for the last few days...

If you have a desperate need for STATs to be manually updated - drop me an email at 'stats@FutureQuest.net' and I will give it priority... Otherwise it will follow the alphabetical conversion process...

Sorry for the inconvenience, in the end this will help increase our stability...

--
Terra
sysAdmin
FutureQuest


Terra
01-02-1999, 08:42 PM
I have *finally* gotten everyone's STATS caught up...

This project was 7 non-stop days in the making - with a ton of recoding to handle the increased requirements to process the nightly Log/STATS runs...

The next level will be Stage3, which is tentatively set to start right before February...

The Log/STATS sub-system has been reworked and hooked for Stage3 takeover... This should cut down the processing time required, which grows as each day goes by...

e.g. Time required to crunch stats:
Dec 1st (1.0 hrs)
Dec 31st (9.0 hrs)

Estimated improvements:
Jan 1st (1.15 hrs) *I lose a little here due to increased volume*
Jan 31st (6.0 hrs) *but gain here*

Stage3:
Feb 1st (1.5 hrs)
Feb 28th (4.5 hrs)

You start to see the trend... ;)
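To put a number on that trend, the figures above can be reduced to a growth factor for each month (end-of-month hours divided by start-of-month hours). This is just arithmetic on the estimates quoted in this post; the labels and variable names are made up for the example.

```python
# (start-of-month hours, end-of-month hours), taken from the figures above.
estimates = {
    "Dec (before rework)": (1.0, 9.0),
    "Jan (reworked)": (1.15, 6.0),
    "Feb (Stage3)": (1.5, 4.5),
}


def growth(first, last):
    """How many times longer the run takes at month end than at month start."""
    return last / first


ratios = {label: round(growth(*hours), 2) for label, hours in estimates.items()}
```

Here `ratios` works out to 9.0 for December, 5.22 for January, and 3.0 for February, so each stage starts the month slightly slower but grows far less by month end.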

I would never have believed all the work that goes into generating the STATS, until I started designing the sub-system myself... This has been the second most difficult project that I've done for FutureQuest, with over 11 separate programs that work in unison to walk/grok/slurp/rarp/hash/split/compress/crunch the log files... **sorry, just couldn't resist that outburst of techno-babble**

Many take the STATS for granted, but after being on this side of the fence and digging down deep in the trenches -- I will never take them for granted, *ever* again... ;)

--
Terra
Bean Counter
FutureQuest.net

------------------
www.FutureQuest.net (http://www.FutureQuest.net)
--FutureQuest goal: (10x+8y)/(x+y)=9.99--
--The best way to predict the future is by inventing it--


[This message has been edited by ccTech (edited 01-02-99).]