Quote:
|
once the script is open the damage is done. limiting the number of concurrent instances must be handled before the script is ran. a web server mod could do this. i just don't see how this could be handled from within the script itself.
|
*sigh* - figures they would take that (easy-way-out) stance...
I am now pretty much concerned about the aptitude/skill level of the program's author - and whether such a program should even be publicly available to begin with... Seems that the author is following the stick-your-head-in-the-sand avoidance plan and praying it just goes away...
No - the damage has not been done yet - only the cost of spawning the CGI script has been paid...
If there is rate limiting built into the script, it will execute these functions *first*... If the spider has exceeded the limits imposed, then kick back a '403' status code... If the spider gets caught in a loop, then redirect them to microsoft.com or something and let them spin their wheels over there...
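A rough sketch of the "check the limits *first*" idea, in Python. This is only an illustration, not the code of the script being discussed: the names `allow_request` and `handle`, the limit of 10 hits per 60 seconds, and the `hits` mapping are all assumptions. In a real CGI script `hits` would have to persist between invocations (a small on-disk store, for example), since each request is a fresh process.

```python
import time

MAX_HITS = 10        # assumed limit: requests allowed per window
WINDOW_SECONDS = 60  # assumed window length in seconds

def allow_request(hits, client_ip, now=None,
                  max_hits=MAX_HITS, window=WINDOW_SECONDS):
    """Return True if client_ip is under the limit, recording this hit."""
    now = time.time() if now is None else now
    # Keep only the timestamps that still fall inside the window.
    recent = [t for t in hits.get(client_ip, []) if now - t < window]
    allowed = len(recent) < max_hits
    if allowed:
        recent.append(now)
    hits[client_ip] = recent
    return allowed

def handle(hits, client_ip, do_expensive_work, now=None):
    """Run the cheap check first; do the slow work only for allowed clients."""
    if not allow_request(hits, client_ip, now):
        # Bail out before any of the ~3 seconds of real work is spent.
        return ("Status: 403 Forbidden\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Rate limit exceeded\n")
    return ("Status: 200 OK\r\n"
            "Content-Type: text/html\r\n\r\n" + do_expensive_work())
```

The point of the layout is that `do_expensive_work` is never called for a client that is over the limit, so a runaway spider only costs the process spawn plus a quick lookup.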
One of my favorite quips is:
"No Bond, I expect you to die!"
In short, if your script has ~3 seconds (sometimes more) of runtime, then running the rate limit code first brings the total runtime for a rejected request to well under 1 second, instead of a guaranteed 3 seconds for each invocation... It is not so much the memory that is hurting, but rather the runtime that is causing the log jam...
If the spider is really brain damaged, then when it reaches a certain threshold - kick out an email to yourself... By this time, it has most likely already hit my radar screen and will earn a slot in my firewall...
Quote:
|
the chances of caching the same product as someone else wants a bit later is quite small.
|
Caching is not going to help in an appreciable way, since the behavior of spiders is to walk through a site consecutively... The only way caching would help here is if two spiders walked the same path back-to-back, where spider #2 would catch the items cached by spider #1's romp...
Overall though, it is a good idea to pull from a local cache, if Amazon provides a way to check to see if your cached item is current or stale... If stale, toss the item and re-retrieve from Amazon, else avoid the network interaction and provide the cached item...
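Something along these lines, as a minimal Python sketch. Whether Amazon actually offers a freshness check is exactly the open question above, so `is_current` and `fetch_remote` here are hypothetical callbacks standing in for whatever the service provides, and `cache` is just any dict-like local store.

```python
def get_item(cache, item_id, fetch_remote, is_current):
    """Return the item, using the local cache when the cached copy is current."""
    cached = cache.get(item_id)
    if cached is not None and is_current(item_id, cached):
        return cached             # fresh: skip retrieving the full item
    cache.pop(item_id, None)      # stale or missing: toss any old copy
    item = fetch_remote(item_id)  # re-retrieve from the remote service
    cache[item_id] = item
    return item
```

This only pays off if the freshness check is noticeably cheaper than retrieving the whole item again; if it is not, the cache buys nothing.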
--
Terra
--silicone - breakfast for the timid--
FutureQuest