FutureQuest, Inc.

Old 12-08-2003, 02:52 PM   Postid: 102062
esllou
Site Owner
 
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
dank, stephen...thanks for those suggestions. dank, I hadn't even seen your post. There must have been two posts I hadn't seen and when I clicked the time of the last post on the forum index, it took me straight past yours. Sorry for that!

I will get back to the author, although I am not too hopeful. There is already some sort of caching device in the script, but it clearly isn't powerful enough. I don't know how the caching works, but when you consider the hundreds of thousands of items that Amazon has, the chances of caching the same product that someone else wants a bit later are quite small.
__________________
Neil
www.esl-lounge.com
Old 12-08-2003, 03:22 PM   Postid: 102068
 Terra
CTO FutureQuest, Inc.
 
 
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,678
Quote:
once the script is open the damage is done. limiting the number of concurrent instances must be handled before the script is run. a web server mod could do this. i just don't see how this could be handled from within the script itself.

*sigh* - figures they would take that (easy-way-out) stance...

I am now rather concerned about the aptitude/skill level of the program's author - and whether such a program should even be publicly available to begin with... Seems that the author is following the stick-your-head-in-the-sand avoidance plan and praying it just goes away...

No - the damage has not been done yet - only the cost of spawning the CGI script has been incurred...

If there is rate limiting built into the script, it will execute these functions *first*... If the spider has exceeded the limits imposed, then kick back a '403' status code... If the spider gets caught in a loop, then redirect them to microsoft.com or something and let them spin their wheels over there...

One of my favorite quips is:
"No Bond, I expect you to die!"

In short, if your script has ~3 seconds (sometimes more) of runtime, then by running the rate limit code first, the total runtime for a rejected request would be well under 1 second, instead of a guaranteed 3 seconds for every invocation... It is not so much the memory that is hurting; rather, the runtime is what is causing the log jam...
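A minimal sketch of the check-first idea, written in Python rather than in the script's own language. It assumes a small SQLite file is available as shared storage, since every CGI invocation is a separate process. The names `open_store`, `allow_request`, `handle`, `WINDOW_SECONDS` and `MAX_HITS` are illustrative only and are not part of amazon_products_feed.cgi; the limits are guesses that would need tuning.

```python
import sqlite3
import time

WINDOW_SECONDS = 60   # assumed length of the counting window
MAX_HITS = 20         # assumed per-client limit within one window


def open_store(path):
    """Open (and create if needed) the hit log shared by all invocations."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS hits (client TEXT, ts REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS hits_client_ts ON hits (client, ts)")
    return conn


def allow_request(conn, client, now=None):
    """Record a hit for `client`; return False once it is over the limit."""
    now = time.time() if now is None else now
    cutoff = now - WINDOW_SECONDS
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM hits WHERE ts < ?", (cutoff,))
        conn.execute("INSERT INTO hits (client, ts) VALUES (?, ?)", (client, now))
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM hits WHERE client = ?", (client,)
        ).fetchone()
    return count <= MAX_HITS


def handle(conn, client, expensive_work, now=None):
    """Run the cheap limit check first; do the slow work only if allowed."""
    if not allow_request(conn, client, now):
        return ("Status: 403 Forbidden\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "Rate limit exceeded.\n")
    return "Status: 200 OK\r\nContent-Type: text/html\r\n\r\n" + expensive_work()
```

Rejected hits are still recorded here, so a spider that keeps hammering stays blocked until it backs off for a full window. The client key would typically be the remote address the web server passes to the script.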

If the spider is really brain damaged, then when it reaches a point - kick out an email to yourself... Most likely by this time, it has already hit my radar screen and will most likely earn a slot in my firewall...

Quote:
the chances of caching the same product that someone else wants a bit later are quite small.

Caching is not going to help in an appreciable way, since the behavior of spiders is to walk through a site consecutively... The only way caching would help here is if two spiders walked the same path back-to-back, where spider #2 would catch the items cached during spider #1's romp...

Overall though, it is a good idea to pull from a local cache, if Amazon provides a way to check to see if your cached item is current or stale... If stale, toss the item and re-retrieve from Amazon, else avoid the network interaction and provide the cached item...
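The current-or-stale logic described above can be sketched roughly as follows, in Python. Both `is_stale` and `fetch` are hypothetical callables supplied by the caller; nothing here assumes that Amazon actually offers such a freshness check, and `get_item` is an illustrative name.

```python
def get_item(item_id, cache, is_stale, fetch):
    """Return the item for `item_id`, preferring a current cached copy.

    cache    -- any dict-like store mapping item ids to cached items
    is_stale -- hypothetical check: is_stale(item_id, cached_item) -> bool
    fetch    -- hypothetical full retrieval: fetch(item_id) -> item
    """
    entry = cache.get(item_id)
    if entry is not None and not is_stale(item_id, entry):
        return entry              # cached copy is current: skip the retrieval

    cache.pop(item_id, None)      # stale or missing: toss whatever we had
    entry = fetch(item_id)        # re-retrieve from the remote side
    cache[item_id] = entry
    return entry
```

If the freshness check itself needs a round trip to the remote server, it only pays off when that check is much cheaper than a full retrieval.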

--
Terra
--silicone - breakfast for the timid--
FutureQuest
Old 12-08-2003, 03:28 PM   Postid: 102071
dank
Registered User

 
Join Date: Mar 2000
Location: MWV
Posts: 3,986
Quote:
if Amazon provides a way to check to see if your cached item is current or stale...
I'm not sure even that would be a big enough improvement. My experience with the Amazon XML feeds is that there can be very lengthy delays communicating with the Amazon server from time to time. That would likely tie up the cache check, unless the script were written intelligently enough to give up after a set period of time (similar to what I've been trying to find a way to do in the PHP forum).
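One way to "give up after a set period of time" is to put a timeout on the remote call and fall back to whatever is already cached. The sketch below is in Python, not PHP; `fetch_feed`, `fetch_or_fallback` and `TIMEOUT_SECONDS` are illustrative names and the 2-second budget is an assumption.

```python
import urllib.request

TIMEOUT_SECONDS = 2.0  # assumed budget for talking to the remote server


def fetch_feed(url, timeout=TIMEOUT_SECONDS):
    """Fetch the raw feed bytes, giving up if the server stalls."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_or_fallback(url, cached=None, fetch=fetch_feed):
    """Try the remote feed; on a timeout or network error return `cached`."""
    try:
        return fetch(url)
    except OSError:
        # Timeouts, refused connections and urllib's URLError/HTTPError
        # are all OSError subclasses, so every one of them falls back.
        return cached
```

Note that the `timeout` passed to `urlopen` applies to each blocking socket operation rather than to the request as a whole, so a server that trickles data slowly can still exceed the budget in total.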

Dan
Old 12-08-2003, 03:50 PM   Postid: 102077
Stephen
Site Owner
 
 
Join Date: Feb 1999
Location: L.A.
Posts: 671
true enough, you can't cache the pages of millions of items. if you can't predict approximately (i.e. mostly) what the user is retrieving then caching loses its value. on the other hand, if you are specializing in a genre, such as "music of the sixties" the set of possible pages to cache goes down dramatically and the likelihood that two users are interested in the same item goes up.

personally, i think that sites that mirror most of what Amazon.com has to offer are somewhat pointless. a well-focused interest site that adds content that Amazon cannot, on top of the offered items, makes much more sense, and is more likely to be useful/successful in the long run. these sites would probably focus on less than a few thousand items, where caching makes sense. and, of course, you can't cache everything. the results of a "search" are usually entirely unpredictable, though detail pages for the items listed might well be cached.
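for a site limited to a few thousand items, a small bounded cache of detail pages is enough. a rough sketch in Python, with search results deliberately left uncached; `DetailPageCache` and the default `max_items` of 2000 are illustrative assumptions, not anything taken from the script under discussion.

```python
from collections import OrderedDict


class DetailPageCache:
    """Keep at most `max_items` detail pages, evicting the least recently used."""

    def __init__(self, max_items=2000):
        self.max_items = max_items
        self._pages = OrderedDict()

    def get(self, item_id):
        """Return the cached page, or None if it is not cached."""
        page = self._pages.get(item_id)
        if page is not None:
            self._pages.move_to_end(item_id)   # mark as recently used
        return page

    def put(self, item_id, page):
        """Store a page, dropping the oldest entry once over the limit."""
        self._pages[item_id] = page
        self._pages.move_to_end(item_id)
        if len(self._pages) > self.max_items:
            self._pages.popitem(last=False)    # evict least recently used
```

in a CGI setting the pages would have to live somewhere that outlasts a single request (files or a database), since this in-memory version disappears when the process exits.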
Old 12-10-2003, 10:53 AM   Postid: 102198
esllou
Site Owner
 
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
If anyone knows this script, amazon_products_feed.cgi, and would like to earn a few bucks making it "safe" to avoid the problems discussed above, get in touch with me through my site, esl-lounge.com, or through this forum. This script is vital for the financial survival of the site.

Thanks
__________________
Neil
www.esl-lounge.com
Old 12-10-2003, 04:45 PM   Postid: 102212
Wassercrats
Site Owner
 
 
Join Date: Nov 2001
Posts: 7,089
Here's something in PHP. Maybe you could get someone to integrate it with Amazon's script. There's a post in that thread that gives a tip on how to convert part of it to Perl. I haven't read most of the thread though.

You'll have to click a Google link to get it. http://www.google.com/search?sourcei...le+accesses%22

Even better--takes you to the first page of the thread: http://www.google.com/search?hl=en&l...%27t+honour%22
Old 12-10-2003, 09:03 PM   Postid: 102226
esllou
Site Owner
 
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
Thanks Wassercrats. That looks like the sort of thing I am after.

Terra, is the script mentioned in these WebMasterWorld threads that Wassercrats linked to sufficient to safeguard the server?
__________________
Neil
www.esl-lounge.com
Old 12-12-2003, 04:01 AM   Postid: 102308
esc
Site Owner
 
 
Join Date: May 2000
Location: Vienna, Austria
Posts: 333
You could perhaps embed the link to your script in a small Flash movie, which will prevent bots from triggering the script while posing no problem for most (> 98%) human users. I doubt that bots are so sophisticated that they can parse SWF files.

You might even secure the cgi-bin with a password and call your script with ‘user:password@mysite.com/cgi-bin/myscript’ so that it cannot be used directly.

Another, more elaborate option would be to recode the whole thing in Flash ActionScript, which is quite good at XML parsing. That way the load would be shifted from the server to the client side. Perhaps some people are already working on this, as you can find free Flash newsfeed readers that parse RSS, which is a similar technology.

Erich
Old 12-12-2003, 07:41 AM   Postid: 102311
esllou
Site Owner
 
 
Join Date: Sep 2002
Location: Buenos Aires
Posts: 320
thanks for that esc.

will be a looooooong weekend thinking about all of this.

have a good one yourself....
__________________
Neil
www.esl-lounge.com
Old 02-15-2004, 02:44 PM   Postid: 106644
mromero
Site Owner
 
 
Join Date: May 1999
Location: Belmopan, Belize
Posts: 484
Following up on this post: is anyone using the Anaconda or the Cusimano scripts for Amazon? As these are commercial scripts as opposed to free, presumably they are more server friendly?

Regards
All times are GMT -4.


Running on vBulletin®
Copyright © 2000 - 2013, Jelsoft Enterprises Ltd.
Hosted & Administrated by FutureQuest, Inc.
Images & content copyright © 1998-2013 FutureQuest, Inc.