Searchbots - tagging and listing


garyamort
08-21-2007, 08:17 AM
In the vein of informing the customer: since FQ is in the position of dealing with malicious or badly designed searchbots, it would be nice if they could post a list of the search engine identifications they have discovered, each linked back to the company or service the engine is crawling for, where appropriate.

That way, site owners who want to restrict search engines would have a list we could use and periodically update from, in addition to scouring our own logfiles and the internet.

Since the point of detecting a search engine is generally to deny it access to the site, slow it down, or give it specialised access, it would be a win/win for everyone: the more site owners limit search engines, the lower the impact of those search engines on the community servers.
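
To make the "deny, slow down, or specialised access" idea concrete, here is a minimal sketch of how a shared list might be used. The policy table, the `action_for` name, and the actions assigned to each bot are assumptions for illustration only; `examplebadbot` is a made-up name. Matching on the User-Agent header only catches bots that identify themselves honestly.

```python
# Hypothetical policy table: lowercase substrings of the User-Agent header
# mapped to what the site owner wants to do with that visitor.
BOT_POLICY = {
    "googlebot": "throttle",
    "msnbot": "throttle",
    "yahoo! slurp": "throttle",
    "examplebadbot": "deny",  # made-up bot name, for illustration only
}


def action_for(user_agent: str) -> str:
    """Return 'deny', 'throttle', or 'allow' for a User-Agent string."""
    ua = user_agent.lower()
    for token, action in BOT_POLICY.items():
        if token in ua:
            return action
    # Anything not on the list is treated as an ordinary visitor.
    return "allow"
```

A periodically updated list would simply replace the contents of `BOT_POLICY`; the lookup itself stays the same.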

Arthur
08-21-2007, 11:13 AM
Bots that have caused problems:
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

Malicious bots:
"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.5"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; en) Opera 8.54"

But seriously, the bots that cause the most problems and thus are most visible to us are either the well-known SE bots flocking together on one or more heavy sites, or bots that don't want to be identified as bots.

Bots that are up to no good generally try not to attract attention and stay under the radar as much as possible.

-Arthur

Andilinks
08-21-2007, 03:34 PM
But seriously... Whew, I needed that, Arthur.

Using Mach5 (http://www.mach5.com/) I check visiting IPs regularly, and it is easy to identify bots that are not behaving like humans and are wasting my bandwidth. Most of them show up only once from a given IP and are never seen again.

I only block the IPs of those that appear repeatedly. Blocking all the IPs would create an unwieldy .htaccess file.
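
As a rough sketch of the "only block repeat visitors" rule: the helper names, the `min_days` threshold, and the input format (pairs of IP and day for visits already judged to be bot-like) are assumptions, not anything Mach5 produces. The output uses the Apache `Deny from` directive as it would appear in an .htaccess file.

```python
from collections import defaultdict


def repeat_offenders(sightings, min_days=2):
    """Return IPs seen on at least `min_days` distinct days.

    `sightings` is an iterable of (ip, day) pairs for visits that were
    already judged to be bot-like.
    """
    days_seen = defaultdict(set)
    for ip, day in sightings:
        days_seen[ip].add(day)
    return sorted(ip for ip, days in days_seen.items() if len(days) >= min_days)


def htaccess_deny_lines(ips):
    """Format each IP as an Apache 'Deny from' line."""
    return [f"Deny from {ip}" for ip in ips]


# Example with documentation-range addresses (not real offenders).
sightings = [
    ("203.0.113.5", "2007-08-19"),
    ("203.0.113.5", "2007-08-21"),
    ("198.51.100.7", "2007-08-21"),  # seen once only, so left alone
]
deny_block = htaccess_deny_lines(repeat_offenders(sightings))
```

One-off visitors never make it into the deny list, which keeps the file short.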

Mach5 lets me filter for 403s, so every few months I screen for IPs that continue to arrive and trim the deny list of those that have disappeared, to make room for new bot IPs.
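
For anyone without Mach5, the same trimming step could be sketched against a raw access log. This assumes Apache Common or Combined Log Format; the function names and the regular expression are illustrative, and a request line containing escaped quotes would not be matched.

```python
import re

# Client IP at the start of the line, status code right after the quoted request.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def ips_with_403(log_lines):
    """Collect client IPs that received at least one 403 response."""
    hits = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(2) == "403":
            hits.add(m.group(1))
    return hits


def prune_deny_list(denied_ips, log_lines):
    """Split the deny list into IPs still being refused and IPs that vanished."""
    still_active = ips_with_403(log_lines)
    keep = [ip for ip in denied_ips if ip in still_active]
    drop = [ip for ip in denied_ips if ip not in still_active]
    return keep, drop


# Example with documentation-range addresses.
log = [
    '203.0.113.5 - - [21/Aug/2007:08:17:00 -0400] "GET / HTTP/1.1" 403 199',
    '192.0.2.44 - - [21/Aug/2007:08:18:00 -0400] "GET / HTTP/1.1" 200 5120',
]
keep, drop = prune_deny_list(["203.0.113.5", "198.51.100.7"], log)
```

Here `keep` holds the blocked IPs that are still knocking, and `drop` holds the ones that can be removed to make room.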

This has been working for me; I hope it helps.