Yahoo spider crashing MySQL on non-FQ site


phppete
05-13-2007, 05:50 AM
We seem to be experiencing Yahoo's spider triggering a MySQL crash on a non-FQ server. It is always one particular site, and always the same user agent: (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

We don't have the luxury of separate MySQL servers or SRC on this server, so I was thinking of detecting Yahoo! in the user agent and calling usleep() for half a second. Would half a second be too long or too short to slow Yahoo's spider? Would the spider just give up and not index the site?
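For illustration, the user-agent check plus usleep() idea could look roughly like the sketch below. The helper name slurp_delay_us() and its default value are hypothetical, not something from the site in question, and the snippet assumes PHP 7 or later. Note that usleep() takes microseconds, so half a second is 500000. The pause only reduces MySQL load if it runs before the page issues its queries, and each sleeping request still occupies a web server process.

```php
<?php
// Hypothetical helper: returns how long to pause (in microseconds)
// for a given User-Agent string. 0 means "do not pause".
function slurp_delay_us(string $userAgent, int $delayUs = 500000): int
{
    // stripos() is a case-insensitive substring search; "Yahoo! Slurp"
    // is the token that appears in the user agent quoted above.
    return stripos($userAgent, 'Yahoo! Slurp') !== false ? $delayUs : 0;
}

// Run this before any database work so the crawler's requests are spaced out.
$delay = slurp_delay_us($_SERVER['HTTP_USER_AGENT'] ?? '');
if ($delay > 0) {
    usleep($delay); // 500000 microseconds = 0.5 second
}
```

Whether a half-second pause is enough to change Slurp's crawl rate is not something this sketch can answer; it only shows the mechanics.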

Or would it be better to do as instructed here :

http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html

Ideally we don't want to mess up SE rankings.

kennylucius
05-13-2007, 11:14 AM
It never occurred to me that you could slow a spider that way. I assumed from G's spastic crawling that they sent for pages on an inhuman schedule regardless of the response they receive (at least in the short term).

Yahoo has always been well-behaved, and is supposed to honor the "crawl-delay" directive. Have they been ignoring that?
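For reference, a crawl-delay rule in robots.txt generally looks like the fragment below. The value 5 is an arbitrary placeholder, not a recommendation; Crawl-delay is a non-standard extension, so the units and accepted values should be checked against Yahoo's own Slurp help page.

```
User-agent: Slurp
Crawl-delay: 5
```

The file has to sit at the root of the affected site (for example /robots.txt), and crawlers may take some time to pick up a change.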

SneakyDave
05-14-2007, 12:36 PM
Yahoo's slurp has been going crazy on my sites lately too. 50 or 60 instances of it crawling around at any given time. I just wish there was a way to limit the number of visits from it, rather than block it completely.

phppete
05-14-2007, 12:41 PM
Originally posted by SneakyDave: "Yahoo's slurp has been going crazy on my sites lately too. 50 or 60 instances of it crawling around at any given time. I just wish there was a way to limit the number of visits from it, rather than block it completely."

There is http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html

My original question, though, is which is better: slowing it down with usleep() or doing as instructed on that page. FQ already has SRC (Spider Rate Control), but the site in my original post is not on FQ.

If FQ didn't have SRC, we would probably see a lot more downtime, crashes, and high server loads.