Block Bot


Stecyk
07-03-2007, 11:57 PM
Hi,

Will the following code block the following bot?

BrowserMatchNoCase Mozilla/4.0 stupid_bot3
Deny from env=stupid_bot3

64.28.xxx.xxx - - [03/Jul/2007:03:39:43 -0400] "GET /beblog/archives/2005/09/blahblahblah.php HTTP/1.1" 200 33352 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

Any guidance appreciated.

Best regards,
Kevin

kitchin
07-04-2007, 02:01 AM
Looks like "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" is a really common string; it's (pretending to be) IE 6 on WinXP.
http://www.user-agents.org/index.shtml?m

And... your pattern "Mozilla/4.0" would match pretty much every IE6, IE7, IE5, IE4, etc. Strictly it should be "Mozilla/4\.0", since an unescaped dot matches any character, but that doesn't matter here. :(
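
To see how broad that pattern is, here is a quick sketch in Python. It assumes BrowserMatchNoCase behaves like an unanchored, case-insensitive regex search against the User-Agent header, which re.search with re.IGNORECASE only approximates. The first user agent is the one from the log line; the IE 7 and Firefox strings are illustrative samples, and the helper name blocked is made up for this example.

```python
import re

# The pattern from the .htaccess rule, as written and with the dot escaped.
# Assumption: BrowserMatchNoCase does an unanchored, case-insensitive
# regex search, which re.search with re.IGNORECASE approximates.
UNESCAPED = re.compile(r"Mozilla/4.0", re.IGNORECASE)
ESCAPED = re.compile(r"Mozilla/4\.0", re.IGNORECASE)

# First string is from the log line; the other two are illustrative samples.
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4",
]


def blocked(pattern, user_agent):
    """Return True if the pattern would flag this User-Agent (hypothetical helper)."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for ua in USER_AGENTS:
        print(blocked(UNESCAPED, ua), blocked(ESCAPED, ua), ua)
```

Both IE strings are flagged by either form of the pattern, so the rule would shut out ordinary IE visitors along with the bot. Escaping the dot only changes the result for odd strings such as "Mozilla/4x0".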

Stecyk
07-04-2007, 05:24 PM
So what you are saying is that I can't lock in on this bot and close the door. He's cloaked himself to be like all the other user agents. Is that the correct interpretation?

kitchin
07-04-2007, 05:57 PM
Yeah, so what's left? The IP address 64.28.xxx.xxx, and the bot's behavior. That's it. To distinguish bots from people, or at least the bots you don't want, could you use some kind of token or captcha at the front door? Or do your own rate limiting per IP?
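
Rate limiting per IP means counting recent requests from each address and refusing service once an address goes over a threshold. A minimal sketch in Python, assuming the check runs inside whatever script serves the page; the class name PerIpRateLimiter and the numbers (30 requests per 60 seconds) are hypothetical, picked only for illustration.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Allow at most max_requests per window_seconds for each IP (illustrative)."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests      # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self.clock = clock                    # injectable so it can be tested
        self.hits = defaultdict(deque)        # ip -> timestamps of recent requests

    def allow(self, ip):
        """Record a request from ip and return True if it is within the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: refuse instead of serving the page
        recent.append(now)
        return True
```

A page script would call allow(ip) first and send an error response when it returns False. This version keeps everything in memory and never drops idle addresses, so a real deployment would need shared storage and some cleanup.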

Did you already do the robots.txt thing?

Stecyk
07-04-2007, 07:00 PM
Hi,

Robots.txt, yeah, I do that. But this bot is from Russia someplace. I don't think it is going to be a well-behaved bot, especially since it doesn't identify itself.

As far as using captchas, I think the cure is worse than the disease. I don't want to discourage legitimate people from looking at my weblog.

"Or do your own rate limiting per IP?"

I don't understand the question, so I am sure I'm not doing that.

I have blocked the entire range of IP addresses from that bot's ISP. That said, I analyze my logs from time to time, and I note that occasionally a specific article gets hammered all day from all over the globe. They try to insert comments and trackbacks. My weblog automatically rejects comments and trackbacks if another was entered within the last two minutes. Moreover, the weblog checks two spam services to see if the IP address is a known spam address. A trackback must also have originated from the source it is tracking back to. Given those checks, only three or four spam trackbacks have made it through in the last year. As far as comments are concerned, they are all moderated (held for review before they are published). So nothing gets through.
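
The two-minute rule can be sketched roughly as follows. This is an illustration in Python, not the weblog software's actual code. The class name SubmissionCooldown is hypothetical, and it assumes that only accepted submissions restart the two-minute timer, which may not match what the weblog really does.

```python
import time


class SubmissionCooldown:
    """Reject a submission if another was accepted too recently (illustrative)."""

    def __init__(self, cooldown_seconds=120.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds  # two minutes, as described above
        self.clock = clock                        # injectable so it can be tested
        self.last_accepted = None                 # time of the last accepted submission

    def try_accept(self):
        """Return True and restart the timer if the cooldown has elapsed."""
        now = self.clock()
        if (self.last_accepted is not None
                and now - self.last_accepted < self.cooldown_seconds):
            return False  # too soon after the previous comment or trackback
        # Assumption: rejected attempts do not extend the cooldown.
        self.last_accepted = now
        return True
```

Because the timer is global rather than per address, a burst of spam from many addresses still gets at most one submission through every two minutes, and that one still has to pass the other checks.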

All that said, my visitor traffic from a year ago has doubled, though it is still quite low. But my bandwidth has gone up six or seven times. I am still okay and not getting too close to my limits yet. Periodically I go through my logs and try to eliminate those that are wasting bandwidth and server resources.

Another trick I employ periodically is renaming my trackback and comment scripts. It takes many of the bots a few months to catch on.

So I am just trying to see what I can do reasonably easily to reduce my bandwidth (and server load).

Best regards,
Kevin

Monty
07-04-2007, 07:03 PM
This is just an observation, but today seems to have been overrun with various bots indexing things. I have one site that has all of maybe 10 people on it, but today I was the only human on there for quite a while. Google and Yahoo seemed to be out in force. I can only guess they are attempting a new search index while things are slow.