FutureQuest, Inc.
Old 12-07-2003, 01:07 PM   Postid: 101974
esllou
Site Owner
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
can you put my cgi back please?

would appreciate it if you could re-activate my cgi-bin. Have e-mailed support for last three hours without luck.

I have deleted the script that you objected to. We can discuss that script afterwards but I would just like my cgi-bin back up and alive as at the moment, I have no:

- guestbook
- forum
- site search
- CNC and therefore newsletter sending ability
- newsletter subscription
- working amazon links, since that script has been pulled

so I now have a site which doesn't work to all intents and purposes and even when it is back up and working, no amazon means it can't pay its way.

I do understand where you guys are coming from on this one. It is just that I did all that was possible to prevent my cgi being hit by spiders, both through robots.txt and .htaccess. I even put a NOINDEX for seven days meta tag in the html pages produced by the script!

I hope we can work things out....just give me back my cgi please!!
__________________
Neil
www.esl-lounge.com
Old 12-07-2003, 01:13 PM   Postid: 101976
 Terra
CTO FutureQuest, Inc.
 
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,672
What timing...

I was working up a followup to the FAN sent last night, with a more formal explanation of what happened... I am still sifting through the rubble though and will take a bit of time...

One of the first things I had done when arriving at the office was to re-enable your CGI ability (completed at 12:51 pm EST) since you had agreed to remove the offending script...

--
Terra
--sleeping is evil--
FutureQuest
Old 12-07-2003, 01:23 PM   Postid: 101977
esllou
Site Owner
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
ok, great.

now we can start a civilised discussion about the future of THAT script.

I wanted to add a mod_rewrite to it which I then realised, due mainly to your intervention, would have caused mayhem on the server once I was hit with Google spiders.

So I never went down that road. I also proactively updated both my .htaccess and robots.txt to try to keep Google and other spiders out of my cgi-bin in general and that script in particular.
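As a rough illustration of what a robots.txt block on the cgi-bin does, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules and URLs below are assumptions made up for the example; the actual robots.txt is not shown in this thread. Note that robots.txt is only advisory: it tells well-behaved crawlers what to skip, and nothing forces a crawler to obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the real robots.txt is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real site.
    print(is_allowed("Googlebot", "http://www.example.com/cgi-bin/amazon_products_feed.cgi"))
    print(is_allowed("Googlebot", "http://www.example.com/index.html"))
```

A crawler that ignores robots.txt never runs a check like this at all, so the rule cannot protect the script from it.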

Is there any way we can proceed from here?? What was the particular problem last night? What spider was it?

My site without amazon is a non-site. Simple as that. I can't picture a situation beyond having to shut the site down. Would I be entitled to a refund from FQ for unused credit? Pains me to be in this situation really.

The script in question is not one of these scripts you can find around which generates pages on the fly....it is an Amazon web services tool which I use in conjunction with .shtml pages which were generated automatically. I am sure you are well aware of how the script has been used. Would it be possible to put the script back on site if I delete the .shtml pages it has been used together with??

Hoping for a solution....thanks for the cgi reactivation
__________________
Neil
www.esl-lounge.com
Old 12-07-2003, 01:49 PM   Postid: 101982
 Terra
CTO FutureQuest, Inc.
 
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,672
The script has no rate-limiting controls built into it, and is highly susceptible to vast amounts of abuse...

Its runtime averages 3 seconds, with a memory footprint averaging 6.7MB...
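To put those two averages in perspective, here is a small back-of-the-envelope sketch in Python. It applies Little's law (instances in flight = arrival rate x runtime) to the 3 second and 6.7MB figures; the request rate is purely an assumed input, since no actual request rates are given in this thread.

```python
from typing import Tuple


def steady_state_footprint(requests_per_second: float,
                           runtime_s: float = 3.0,
                           memory_mb: float = 6.7) -> Tuple[float, float]:
    """Estimate (concurrent instances, total memory in MB) at a steady request rate.

    Little's law: average instances in flight = arrival rate * average runtime.
    """
    concurrent = requests_per_second * runtime_s
    return concurrent, concurrent * memory_mb


if __name__ == "__main__":
    # 5 requests/second is an assumed figure, not a measurement from the incident.
    instances, memory = steady_state_footprint(5.0)
    print(f"{instances:.0f} instances in flight, about {memory:.1f} MB")
```

At an assumed 5 requests per second that works out to 15 copies of the script in memory at once, roughly 100MB, before counting the CPU each copy uses.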

It does not take much to hose that script... External blocking, as you have already done, is simply not enough... The rate limiting controls must go into the script itself, since our SRC (Spider Rate Control) is too liberal to keep this script well behaved...

Your script is rare in that it falls just outside of our SRC zone and if it was configured to trap your script - SRC would unfairly penalize 95% of all other scripts executed on our servers...

We had issues from 3 different IPs last night:
203.76.197.150
203.76.199.235
203.76.200.34

We would firewall the IP, then the person would drop the IP and pick up a new one... Each time, our Server Guardian had to step in and shut down *everyone's* CGI when the server load hit 40.00+, all due to one script, 'amazon_products_feed.cgi'... It became a whack-a-mole situation...

Overall, we have been having off-and-on problems with the 'amazon' style scripts and this is not the first time this particular script has ended up in the line of fire...

Even if you were to move, this problem is only going to cause issues elsewhere - creating a vicious cycle... The ultimate solution is to add the rate limiting controls to your script, since that is the correct and responsible thing to do... Consider it an evolutionary change...

--
Terra
--it is like adding a safety to an uzi--
FutureQuest
Old 12-07-2003, 02:26 PM   Postid: 101983
esllou
Site Owner
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
what numbers are we talking about as far as making it "safer"...the script, that is.

if I can make it friendly only to humans, I will. Even if it means blocking the great Google in the sky.

if you can give me some hard numbers, I can go to the script support forum and begin to get it sorted this evening.

One other thing....was it not possible to disable just THAT script last night?? Or is that an ignorant question?
__________________
Neil
www.esl-lounge.com
Old 12-07-2003, 02:33 PM   Postid: 101984
mromero
Site Owner
 
Join Date: May 1999
Location: Belmopan, Belize
Posts: 484
I am curious as to which script is being referred to. I recall reading on Webmaster World about someone who had used a script to corner (temporarily) the market on a particular Google search word. That is until he was found out and banned by Google.

Regards
Old 12-07-2003, 02:41 PM   Postid: 101985
esllou
Site Owner
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
this is the amazon_product_feed.cgi which is very very VERY widespread on the net as a tool to implement Amazon Web Services.

It is apparently a tad too fast though and needs to be slowed down...

It is NOT the hack of the selfsame script which causes Google to index hundreds of thousands of mod_rewrite versions of the standard apf pages, i.e. versions without the parameters that make the URLs so SE-unfriendly.
__________________
Neil
www.esl-lounge.com
Old 12-07-2003, 03:05 PM   Postid: 101988
 Terra
CTO FutureQuest, Inc.
 
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,672
Quote:
what numbers are we talking about as far as making it "safer"...the script, that is.

Two posts to reference first:
SRC Nutshell
http://www.aota.net/forums/showthrea...3458#post93458

Rate Limiting Metrics:
http://www.aota.net/forums/showthrea...9212#post99212

General Goal for Rate Limiting:
No more than X 'search' instances per IP
No more than Y 'search' instances in operation total (all IPs inclusive)
Deny 'search' request if (1 min OR 5 min) server load is above 3.0 (grokked via /proc/loadavg)
**Using either 1min or 5min, both have their pros and cons due to differences in rise and decay rates
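
The load check in the last rule could be sketched roughly as follows. This is a Python illustration only (the script under discussion is a CGI and would need the equivalent in its own language), and the function names are made up for the example. On Linux, the first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages.

```python
from typing import Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the (1 min, 5 min, 15 min) load averages from /proc/loadavg contents."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def load_too_high(text: str, limit: float = 3.0, window: str = "1min") -> bool:
    """Return True if the chosen load average is above the limit (deny the request)."""
    one_min, five_min, _ = parse_loadavg(text)
    load = one_min if window == "1min" else five_min
    return load > limit


def read_loadavg(path: str = "/proc/loadavg") -> str:
    """Read the raw load average line (Linux only)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    sample = "3.50 1.20 0.80 2/150 12345"  # made-up sample line
    print(load_too_high(sample, window="1min"))
    print(load_too_high(sample, window="5min"))
```

The sample line shows the trade-off between the two windows: the 1 minute figure reacts to a spike quickly, while the 5 minute figure is slower both to rise and to fall.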

Quote:
if I can make it friendly only to humans, I will. Even if it means blocking the great Google in the sky.

Extremely difficult to do, I can assure you... Most of the problems now are caused by 'ill willed' spider operators that try to cloak themselves as 'humans'... This forces server/script operators to look at the heuristics of their behavior instead of relying on specific items or pattern recognition...

e.g.:
"If it walks like a duck and talks like a duck, then it must be a duck"
no longer applies...

The main problem with Googlebot now is that it comes in via spider (wolf) packs... Each spider alone is server & script friendly, but have about 10 Googlebots (nicely) spidering your site, and you have a resource intensive situation happening...

I am not faulting Google over this, as the vast quantity of information they have to spider and index is astronomical... Therefore they do have to scale the task by issuing multiple bots to keep up with the web... I for one do not see an easy solution for Google, as what they are doing for the Internet is a GoodThing™...

Quote:
if you can give me some hard numbers, I can go to the script support forum and begin to get it sorted this evening.

X = 3
Y = 10
SL <= 3.0

The above values will offer your script a bit of latitude to operate, without thrashing the server...

'Y' is high enough to cover bursting; however, this does not mean it can be a 'sustained' load with 10 of your scripts in memory all the time... The 'SL' limit should help ensure that 'Y' is for bursts only.
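
Put together, the three limits could be sketched as an admission check like the one below. This is a Python illustration with made-up names (`SearchGate`, `try_enter`, `leave`), not code from the script itself. One loud assumption: the counters here live in memory, whereas a CGI script starts a fresh process per request, so a real version would have to keep the counts in shared storage such as lock files or a database.

```python
from collections import Counter


class SearchGate:
    """Admit a 'search' only if the per-IP, total and server-load limits all allow it."""

    def __init__(self, per_ip_limit: int = 3, total_limit: int = 10,
                 load_limit: float = 3.0) -> None:
        self.per_ip_limit = per_ip_limit  # X
        self.total_limit = total_limit    # Y
        self.load_limit = load_limit      # SL
        self.active = Counter()           # running instances per IP (in memory only)

    def try_enter(self, ip: str, server_load: float) -> bool:
        """Return True and count the instance if the request may run."""
        if server_load > self.load_limit:
            return False
        if sum(self.active.values()) >= self.total_limit:
            return False
        if self.active[ip] >= self.per_ip_limit:
            return False
        self.active[ip] += 1
        return True

    def leave(self, ip: str) -> None:
        """Release the slot when an instance finishes."""
        if self.active[ip] > 0:
            self.active[ip] -= 1
            if self.active[ip] == 0:
                del self.active[ip]


if __name__ == "__main__":
    gate = SearchGate()
    # 192.0.2.x is a documentation-only address range.
    print([gate.try_enter("192.0.2.1", 1.0) for _ in range(4)])
```

Every admitted request must call `leave` when it finishes (including on errors), otherwise the counters only ever grow and the gate ends up denying everyone.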

Quote:
One other thing....was it not possible to disable just THAT script last night?? Or is that an ignorant question?

Disabling a single script is extremely dangerous, therefore it is now our policy to completely disable any/all CGI activity...

The main problem is that another one of your scripts may be dependent upon the script we just deactivated... If it cannot execute that deactivated script, then it may not handle that condition gracefully (lost $$$ Orders)... The risk of data corruption (or lost data) is much higher when individual scripts are deactivated... By freezing the CGI, we remove pretty much all risk (and liability) of a BadThing™ happening...

Think of a shopping cart where the main cart scripts are active, however we have just deactivated the 'credit card processor' script... This is just one simple scenario, of millions, where things can just plain go wrong when a single script has been deactivated...

In short, we play it safe for all parties involved...

--
Terra
--But your honor, FQ got their Peanut Butter in our Chocolate - we'd like 1 million please--
FutureQuest

<EDIT: libel != liable>

Last edited by Terra : 12-07-2003 at 08:59 PM.
Old 12-07-2003, 06:01 PM   Postid: 101995
esllou
Site Owner
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
is there anything external I can add to slow the script down and put a limit on it? Any general Unix command I can put in my .htaccess or at the top of the script itself????
__________________
Neil
www.esl-lounge.com
Old 12-07-2003, 06:30 PM   Postid: 101996
dank
Registered User

Join Date: Mar 2000
Location: MWV
Posts: 3,986
If this is one of the typical Amazon XML scripts, I've got a few thoughts on the matter. First, even if the server allowed it, I wouldn't recommend running it as is for the simple reason that it can be incredibly tedious for the visitor. Sometimes it's snappy, but other times there are very lengthy delays that can keep an entire page from loading. Not good.

My solution was to set up a customized version of one of the scripts to run nightly via a cron job, grab the XML feed, copy the product images locally and update the database with product info (in case prices or anything changed), and serve up that product info when the pages are visited. The odds of a title or price changing more than once in a day are pretty slim, so it ought to be as good as a live feed as far as the visitor is concerned.

Now, I'm only doing this for half a dozen books, so the nightly update is pretty minimal. If you're trying to pull the feed for a large directory, that could be ugly. Maybe pull the most oft-visited products nightly, and the less popular ones on demand? That would lessen the server load due to regular visitors, but it still wouldn't address spider control...
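
A rough sketch of that hybrid idea in Python, under some loud assumptions: `fetch` is a hypothetical stand-in for whatever actually pulls one product from the Amazon XML feed, and a plain in-memory dict stands in for the database mentioned above. Popular products get refreshed by the nightly job, and anything else is fetched on demand and then kept until it is a day old.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


class ProductCache:
    """Serve stored product info; only hit the feed when the stored copy is too old."""

    def __init__(self, fetch: Callable[[str], Any],
                 max_age_s: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch          # hypothetical feed call, supplied by the caller
        self.max_age_s = max_age_s  # how long a stored copy counts as fresh
        self.clock = clock
        self.entries: Dict[str, Tuple[float, Any]] = {}

    def refresh(self, product_ids: Iterable[str]) -> None:
        """Nightly job: re-fetch the given (most visited) products."""
        for product_id in product_ids:
            self.entries[product_id] = (self.clock(), self.fetch(product_id))

    def get(self, product_id: str) -> Any:
        """Page view: return the stored copy if fresh, else fetch on demand."""
        entry = self.entries.get(product_id)
        if entry is not None and self.clock() - entry[0] < self.max_age_s:
            return entry[1]
        info = self.fetch(product_id)
        self.entries[product_id] = (self.clock(), info)
        return info


if __name__ == "__main__":
    cache = ProductCache(fetch=lambda pid: {"id": pid, "title": "placeholder"})
    cache.refresh(["BOOK-1"])    # what the cron job would do
    print(cache.get("BOOK-1"))   # served from the stored copy, no feed call
```

This cuts the feed calls caused by regular visitors, but a spider walking thousands of unpopular product pages would still trigger an on-demand fetch for each one, so it does not replace rate limiting.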

Dan
Images & content copyright © 1998-2013 FutureQuest, Inc.