Twisted PageGetter
Stecyk
08-15-2005, 12:49 PM
Hi,
I am not sure if this is the correct place to ask this question. If not, moderator please move the question to a more appropriate spot.
I have some bot that is coming by every minute. I have got 360 hits from this bot this morning. I just noticed it has been taking a break during the last ten minutes. It is probably catching its breath and gathering up its strength for another run.
6X.1XX.75.XX - - [15/Aug/2005:11:31:22 -0400] "GET /blog/index.rdf HTTP/1.0" 200 36832 "-" "Twisted PageGetter"
Same IP address every time. Is there a way to slow this bot down or block it?
Best regards,
Kevin
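A quick way to confirm that the hits really do come from one address and one user agent is to tally the access log. The following is only a minimal sketch: it assumes the log uses Apache's combined format, as the line quoted above appears to, and the names `LOG_LINE` and `count_hits` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (Apache combined log format):
# ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_hits(lines):
    """Count requests per (ip, user agent) pair, skipping lines that do not parse."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits


# The log line from the post above, with the IP address still masked.
sample = [
    '6X.1XX.75.XX - - [15/Aug/2005:11:31:22 -0400] '
    '"GET /blog/index.rdf HTTP/1.0" 200 36832 "-" "Twisted PageGetter"',
]

# Print the busiest (ip, agent) pairs first.
for (ip, agent), n in count_hits(sample).most_common(5):
    print(n, ip, agent)
```

To run it against a real log, pass an open file object to `count_hits` in place of `sample`; the log's path depends on the host.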
sheila
08-15-2005, 01:11 PM
The Twisted PageGetter is just a component of a free, open source Python framework called Twisted (http://twistedmatrix.com/). There's nothing inherently bad about that user agent. However, someone bad/stupid may be using it.
You might try putting that IP address into a Deny command in your .htaccess file. If that doesn't work, you could try redirecting requests from that IP address elsewhere...although I don't know where...
Here are some previous, related discussions:
http://www.aota.net/forums/showthread.php?t=19209&highlight=deny+%2Ahtaccess
http://www.aota.net/forums/showthread.php?t=19777&highlight=deny+%2Ahtaccess
Stecyk
08-15-2005, 01:21 PM
You might try putting that IP address into a Deny command in your .htaccess file.
Can you please give me instructions on how to do that?
I didn't get much out of those referenced posts as they seem to be beyond my level of comprehension.
Best regards,
Kevin
Hello Kevin,
To create a .htaccess file please view this Knowledgebase Article:
http://Service.FutureQuest.net/index.php?_a=knowledgebase&_j=questiondetails&_i=121
Information regarding Denying access to IP addresses is contained here:
http://www.aota.net/forums/showthread.php?postid=92325#post92325
Hope this helps,
Bob
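For reference, the deny rules themselves are only a few lines of text. The sketch below uses a small Python helper (the name `deny_rules` is made up for illustration) to print them. It assumes the older `Order`/`Allow`/`Deny` directives used by Apache 2.2 and earlier, with the `Require not ip` form of Apache 2.4 as an option. The address `192.0.2.10` is a documentation-only placeholder, not the bot's real IP, and whether a given host permits these directives in .htaccess is also an assumption.

```python
def deny_rules(ips, apache_24=False):
    """Return .htaccess lines that block the given IP addresses."""
    if apache_24:
        # Apache 2.4 style: grant everyone, then carve out the blocked IPs.
        lines = ["<RequireAll>", "    Require all granted"]
        lines += [f"    Require not ip {ip}" for ip in ips]
        lines.append("</RequireAll>")
    else:
        # Apache 2.2 and earlier: with Order Allow,Deny a request that
        # matches both an Allow and a Deny line is denied.
        lines = ["Order Allow,Deny", "Allow from all"]
        lines += [f"Deny from {ip}" for ip in ips]
    return "\n".join(lines)


# 192.0.2.10 is a placeholder; substitute the address from your own log.
print(deny_rules(["192.0.2.10"]))
```

The printed lines go into the .htaccess file in the directory to be protected. The helper only saves retyping when several addresses need blocking.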
Here's your culprit:
http://xblabs.com
--
Don
Stecyk
08-15-2005, 02:04 PM
Thank you.
I already had a .htaccess file. I just didn't know how to deny an IP address. Stopped the bot dead in its tracks. Annoying.
I will remove that deny access in a few days and see if that person/individual/entity has gotten the hint.
I wouldn't have a problem, once an hour, or maybe even twice an hour, but every minute is excessive.
Again, thank you for your help.
Best regards,
Kevin
Stecyk
08-15-2005, 02:33 PM
Hi Don,
From the link that Don provided:
XB Labs
We're too busy to put up a proper website. For now, check out our first public project:
YouReadMe
That's a strong hint, no?
Don, how did you manage to track this "entity" down? Given that they are too busy to put up a proper website, I don't think that they are legitimate.
I am going to keep them blocked for a lot longer. I don't feel the need to be included in their project.
Thank you Don!
Best regards,
Kevin
I did a reverse DNS lookup on their IP address at this site:
http://www.dnsstuff.com
Of course, that was before you disguised it. :smile:
--
Don
Stecyk
08-15-2005, 03:14 PM
Don,
Great, thank you!
I appreciate your efforts. :bow:
Best regards,
Kevin
Randall
08-15-2005, 09:39 PM
Given that they are too busy to put up a proper website, I don't think that they are legitimate.
It looks pretty benign to me:
So we created YouReadMe (http://www.youreadme.com/) and let the computer do the reading for us. It reads tens of thousands of blogs multiple times throughout the day and finds the really important posts: the ones linked to the most by other bloggers.
But perhaps the "multiple times throughout the day" part could use some fine-tuning. :wink:
Randall
Stecyk
08-15-2005, 11:11 PM
It looks pretty benign to me: But perhaps the "multiple times throughout the day" part could use some fine-tuning. :wink:
My guess is that this site will simply scrape content from various blogs and then rebroadcast it on its site with some advertising. I haven't seen the advertising yet, but I suspect it is coming. The problem is, bloggers' content is usually copyrighted. All stuff is copyrighted the moment it is created, unless copyright is waived. So by taking bloggers' stuff and rebroadcasting it, especially with advertising, that site is stealing.
That aside, I think their model is doomed. Why would anyone want to go to a site that collects tens of thousands of blogs that have nothing in common with each other? Wouldn't it be simpler to create your own list using Bloglines, Yahoo, FeedDemon, NewsGator, or whatever? Then at least you have content that interests you. So I can't see very many people having much interest in their site.
And yes, I agree, their multiple times throughout the day could use some tuning. If they simply scanned ten thousand sites three times a day, that would provide plenty of fresh content. And I might not have noticed or cared. But as it stands now, they're blocked and will remain blocked. :rasberry:
Thank you for the comments Randall! :smile:
Best regards,
Kevin
Randall
08-15-2005, 11:52 PM
That aside, I think their model is doomed. Why would anyone want to go to a site that collects tens of thousands of blogs that have nothing in common with each other?
A very good point -- when I first got to their site I wondered what the heck I was looking at. There seemed to be no logical (or thematic, or political, or even geographic) relationship between one post and another.
Then I read the About page and it made more sense. I think.
Randall :umm:
Wassercrats
08-16-2005, 12:06 AM
They have a good page rank and lots of content. You don't need a good website to make money advertising.
Stecyk
08-16-2005, 12:45 AM
They have a good page rank and lots of content. You don't need a good website to make money advertising.
That doesn't sound good, does it? :wah:
Wassercrats
08-16-2005, 01:02 AM
No, it doesn't. I've put off creating a cron job to fetch someone's newsfeed, which I was already entitled to republish according to the creative commons license, to give the author a few days to complain about it. He hasn't, so I guess I'll do it now. Just one fetch a day and I was reluctant.
I'll be running it in the middle of the night, and I'm still going to use skolnick's script (http://www.aota.net/forums/showthread.php?t=12265) to check the server load first because I'm afraid of Terra too.
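The idea of checking the server load before fetching can be sketched roughly as follows. This is not skolnick's script, only an illustration of the same idea under stated assumptions: the name `fetch_if_idle`, the threshold of 1.0, and the feed URL are all made up, and `os.getloadavg` is only available on Unix-like systems.

```python
import os
import urllib.request


def fetch_if_idle(url, max_load=1.0, get_load=None, fetch=None):
    """Fetch url only when the 1-minute load average is at or below max_load.

    Returns the response body as bytes, or None when the fetch was skipped.
    get_load and fetch can be replaced with stand-ins for testing.
    """
    if get_load is None:
        get_load = os.getloadavg  # Unix only
    one_minute_load = get_load()[0]
    if one_minute_load > max_load:
        return None  # server is busy, leave it for the next scheduled run
    if fetch is None:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    return fetch(url)


# Demo with stand-in functions, so nothing is actually downloaded here.
body = fetch_if_idle(
    "http://example.com/feed.rdf",          # hypothetical feed URL
    get_load=lambda: (0.2, 0.3, 0.4),       # pretend the server is idle
    fetch=lambda url: b"<rdf/>",            # pretend download
)
print(body)
```

Run once a day from cron with the stand-ins removed, this skips the fetch whenever the server is already busy and simply tries again at the next scheduled run.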
Stecyk
08-16-2005, 01:21 AM
Just one fetch a day and I was reluctant.
I'll be running it in the middle of the night, and I'm still going to use skolnick's script (http://www.aota.net/forums/showthread.php?t=12265) to check the server load first because I'm afraid of Terra too.
I think one fetch per day is acceptable. A lot of people might even like the additional publicity. But, when it is literally every minute, that is a bit much.
Good luck!
Randall
08-16-2005, 02:45 PM
I'm still going to use skolnick's script (http://www.aota.net/forums/showthread.php?t=12265) to check the server load first because I'm afraid of Terra too.
Aren't we all? :rasberry:
Randall
# Oh no, not the turtles! Anything but that...