08-03-2001, 09:17 PM
Postid: 50991
Visitor
Join Date: Apr 2001
Location: Manchester UK
Posts: 44
Major Problem
Help
I have a major problem and I'm hoping somebody can find an answer ASAP. Something is requesting the home page of my site over and over, and it's using heaps of my bandwidth.
I have just logged some of the stats for the home page, and 99.9% of the hits are coming from various IP addresses, all with no referring URL and no user agent. In the first 2 days of this month my home page was loaded 214,083 times; in the whole of last month it was loaded 136,564 times. The same thing is happening with bandwidth: my home page has consumed 2,798,145 KB in the first 2 days of this month, against 944,956 KB for the whole of July. That is a very big difference.
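As a quick check on just how big the difference is, the figures quoted above can be put on a per-day basis. This sketch uses only the numbers in this post and assumes the 2-day August window and a 31-day July:

```python
# Figures quoted in the post
aug_hits, aug_days = 214_083, 2
jul_hits, jul_days = 136_564, 31
aug_kb = 2_798_145
jul_kb = 944_956

# Compare average per-day rates for the two periods
hits_ratio = (aug_hits / aug_days) / (jul_hits / jul_days)
kb_ratio = (aug_kb / aug_days) / (jul_kb / jul_days)

print(f"home page hits/day: {hits_ratio:.1f}x July's average")
print(f"home page KB/day:   {kb_ratio:.1f}x July's average")
```

With these numbers the home page is being hit roughly 24 times as often per day as in July, and using about 46 times as much bandwidth per day.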
I would like to think that my site has just become incredibly popular, but the rest of the site is running as normal, with normal numbers of hits etc.
I have now replaced my home page with a very small gateway page to conserve bandwidth until either a solution is found or I have to take the site offline.
Could the new NT-based worm be causing this?
Or is somebody attempting a DoS attack on my website?
I really don't know what to think. I do know that if this keeps up I will need to take out a very big loan to pay the bandwidth costs.
Help
Pete Kelly
08-03-2001, 09:20 PM
Postid: 50992
CTO FutureQuest, Inc.
Join Date: Jun 1998
Location: Z'ha'dum
Posts: 7,683
And the domain you are referring to is?
--
Terra
sysAdmin
FutureQuest
08-03-2001, 09:34 PM
Postid: 50993
Visitor
Sorry
http://www.trafficg.com
Thanks
Pete Kelly
08-03-2001, 09:52 PM
Postid: 50994
CTO FutureQuest, Inc.
OK, now it makes sense: your account has become a magnet for personal spiders and offline downloaders...
I have to deal with this account from time to time, as the spiders come in and unleash a rapid traversal of the site... Since your site is mostly dynamic content, this places a strain on the servers...
9 out of 10 times, I deal with the spider transparently by blocking it and moving on to the task it interrupted, but it looks like these ones are using methods of circumvention that I cannot easily stop...
In conclusion, yes - I know your site well... It has become a spider heaven, and is currently on my radar screen... If the site starts ramping up and adversely affecting the server, then I will let you know your domain is in jeopardy and for you to do whatever it takes to get it under control...
We cannot do this for you, as it is well beyond our realm of responsibility; FutureQuest's focus is ensuring that the servers are online and well maintained... You would need to contract the services of a professional to assist with your web site's development and/or seek out expert advice... It's all part of operating one's internet presence...
--
Terra
sysAdmin
FutureQuest
08-03-2001, 10:09 PM
Postid: 50995
Visitor
Hi Terra
Quote:
"In conclusion, yes - I know your site well... It has become a spider heaven, and is currently on my radar screen..."
Is there anything I can do to stop these spiders?
Quote:
"If the site starts ramping up and adversely affecting the server, then I will let you know your domain is in jeopardy and for you to do whatever it takes to get it under control..."
I would prefer to act now, before it becomes a server problem; I am constantly optimising my site to make it as server-friendly as possible. If banning all spiders from the site is the answer then so be it, I can live without spiders.
Pete Kelly
08-03-2001, 10:45 PM
Postid: 50996
CTO FutureQuest, Inc.
Quote:
"Is there anything I can do to stop these spiders?"
Nope, not really... for the reasons given in my second answer...
Quote:
"If banning all spiders from the site is the answer then so be it, I can live without spiders."
The problem is that defensive measures can only go so far... What you are seeing now is that they are either:
a) Emulating or masquerading as common browsers, with no identifying marks that they are a spider...
b) (As you have seen) Not sending any browser/agent type information at all...
If I had the bulletproof answer to blocking 'hostile' spiders/bots, I would have done it long ago... However, it is not an exact science and you can only work with what the client sends...
The other defense is wrapped up in 'heuristics' and 'behavior' patterns; unfortunately, the overhead of trying to determine this (per request) would only turn our servers into a vast tar pit of slowness... Short of giving each request a Turing test, there is not much you can do...
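For illustration only, the simplest form of per-request 'behavior' tracking might look like the sketch below: keep a short history of request times for each client IP and flag any address that goes over a threshold within a time window. The class name, window and threshold are hypothetical, not anything FutureQuest actually runs. It also shows where the overhead comes from: the server has to do this bookkeeping on every single request, and a client that rotates IP addresses slips past it anyway.

```python
from collections import defaultdict, deque


class RequestRateTracker:
    """Hypothetical sliding-window request counter keyed by client IP."""

    def __init__(self, max_requests=60, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, now):
        """Record a request at time `now`; return True if the IP is over the limit."""
        times = self.history[ip]
        times.append(now)
        # Discard timestamps that have fallen outside the window
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_requests


tracker = RequestRateTracker(max_requests=3, window_seconds=10.0)
results = [tracker.record("65.7.80.144", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)  # [False, False, False, True]
later = tracker.record("65.7.80.144", 20.0)
print(later)  # False: the earlier requests have aged out of the window
```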
--
Terra
--It's like trying to use a sledgehammer to drive a thumb tack--
FutureQuest
08-03-2001, 10:59 PM
Postid: 50998
Site Owner
Join Date: Aug 1999
Location: Metro Los Angeles Area
Posts: 7,398
As mentioned in this thread
http://www.aota.net/forums/showthrea...&threadid=3959
I had an agent, MSIE Crawler, coming from a single IP address, that was banging away on my site for hours at a time, getting a 403 Forbidden and continuing to request the site anyway. (It was an IRM, so in that case there is some sort of weird redirect-looping problem with a 403 error.) I finally had to take matters into my own hands to get rid of this creep.
The key is that I knew the IP address and could block that IP.
Here is what I did:
Along with other commands, I put this in my .htaccess file:
Code Sample:
ErrorDocument 403 /cgi-bin/k12groups/403.py
deny from 128.158.104.168
deny from 65.27.182.165
Then, in my /cgi-bin I put this script:
Code:
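#!/big/dom/xthinkspot/bin/python2.0
# ErrorDocument 403 handler: denied IPs get a "Go away" page,
# everyone else gets the normal 403.html page.
import os

# Keep this list in step with the "deny from" lines in .htaccess
denyList = ['128.158.104.168', '65.27.182.165']

if os.environ["REMOTE_ADDR"] in denyList:
    print "Content-type: text/html"
    print
    print "<HTML><HEAD><TITLE>403 - Go away</TITLE></HEAD><BODY>"
    print '<a href = "http://www.microsoft.com">Go away. </a>'
    print "</body></html>"
else:
    # 403.html lives in the same directory as this script
    f = open("403.html", "r")
    output = f.read()
    f.close()
    print "Content-type: text/html"
    print
    print output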
What this code does is this: if someone in the forbidden IP list requests a page from my site, it prints a page with a link to microsoft.com that says "Go away." For everyone else, it opens the 403.html file in the same directory and displays that content instead.
I see that while I was trying to post this, Terra has already told you that there is really nothing you can do. I guess he says that because, if your site is attracting spiders, they will probably just keep coming, and it will be like playing whack-a-mole to keep adding IP addresses to the deny list. Anyhow, this worked for me, since it was only one person doing it and I knew the IP address in question.
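Some of that whack-a-mole can be handed to a script that reads the access log. The sketch below is an illustration, not something anyone in this thread ran: it assumes Apache's combined log format, treats any IP at or over a fixed request count as a candidate for blocking, and prints `deny from` lines in the same form as the .htaccess entries above. The function name, threshold and sample lines are hypothetical.

```python
from collections import Counter


def deny_lines(log_lines, threshold):
    """Return .htaccess 'deny from' lines for every client IP that made
    at least `threshold` requests in the given log lines."""
    # In the combined log format the client IP is the first field
    counts = Counter(line.split(" ", 1)[0] for line in log_lines if line.strip())
    return [f"deny from {ip}" for ip, n in sorted(counts.items()) if n >= threshold]


# Made-up sample lines using documentation-only IP addresses
sample = [
    '192.0.2.10 - - [03/Aug/2001:10:00:00 -0400] "GET / HTTP/1.0" 403 0 "-" "-"',
    '192.0.2.10 - - [03/Aug/2001:10:00:01 -0400] "GET / HTTP/1.0" 403 0 "-" "-"',
    '198.51.100.7 - - [03/Aug/2001:10:00:02 -0400] "GET /index.html HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
]

for rule in deny_lines(sample, threshold=2):
    print(rule)  # deny from 192.0.2.10
```

A fixed threshold will not help when requests arrive from constantly changing addresses, and it can catch legitimate heavy users, so the output is best reviewed by hand before it goes into .htaccess.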
Last edited by sheila : 08-04-2001 at 12:12 AM.
08-03-2001, 11:55 PM
Postid: 51001
Visitor
Personally, I don't think this is a spider: it is not attempting to access any other pages, just my home page. I have looked at my log file for August 2nd, and there are 102,000 lines that look just like the one below, though the IP addresses are constantly changing.
65.7.80.144 - - [02/Aug/2001:00:00:03 -0400] "GET /?did=150&ver=1.51&duid=elybdrqmehqdaiyxoqdppsimaeuqw HTTP/1.1" 200 0 "-" "-"
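That line is in Apache's "combined" log format: client IP, identity, user, timestamp, request line, status code, bytes sent, referrer and user agent, with "-" standing for a missing value. As a rough sketch (the regular expression is an illustrative approximation, not an official grammar, and the function names are hypothetical), a few lines of Python can pull those fields apart and count, per IP, the requests that arrived with neither a referrer nor an agent:

```python
import re
from collections import Counter

# Illustrative pattern for Apache "combined" log lines
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Return a dict of the line's fields, or None if it does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def count_anonymous(lines):
    """Count requests per IP that sent no referrer and no user agent ('-')."""
    counts = Counter()
    for line in lines:
        rec = parse(line)
        if rec and rec["referrer"] == "-" and rec["agent"] == "-":
            counts[rec["ip"]] += 1
    return counts


line = ('65.7.80.144 - - [02/Aug/2001:00:00:03 -0400] '
        '"GET /?did=150&ver=1.51&duid=elybdrqmehqdaiyxoqdppsimaeuqw HTTP/1.1" '
        '200 0 "-" "-"')
rec = parse(line)
print(rec["request"])  # GET /?did=150&ver=1.51&duid=elybdrqmehqdaiyxoqdppsimaeuqw HTTP/1.1
print(count_anonymous([line]))  # Counter({'65.7.80.144': 1})
```

Run over a whole day's log, the same counter shows whether the blank-agent requests are concentrated on a few addresses or spread thinly across many.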
I have now set up a script on my home page that checks the user agent; if the user agent is empty, it just exits and serves no content to the program.
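A check along those lines could look something like the sketch below, written as a Python 3 CGI script. It assumes the web server exposes the request's User-Agent header in the standard CGI variable HTTP_USER_AGENT, which is missing or empty when the client sends none. Rather than exiting with no output at all (under CGI, a script that prints no headers typically makes the server log an error and return a 500 page), it sends an explicit 403 through the CGI `Status:` header with an empty body. The function names and the placeholder page are hypothetical.

```python
#!/usr/bin/env python3
import os
import sys


def has_user_agent(environ):
    """True if the request carried a non-blank User-Agent header."""
    return bool(environ.get("HTTP_USER_AGENT", "").strip())


def build_response(environ):
    """Return the complete CGI response (headers plus body) as a string."""
    if not has_user_agent(environ):
        # Blank agent: a bare 403 with no body keeps bandwidth to a minimum
        return "Status: 403 Forbidden\nContent-Type: text/plain\n\n"
    # Placeholder for the real home page content
    return "Content-Type: text/html\n\n<html><body>Home page</body></html>\n"


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Keep in mind Terra's caution further down the thread: some genuine visitors strip their agent string with privacy filters, and this check turns them away too.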
Pete Kelly
08-04-2001, 12:50 AM
Postid: 51004
CTO FutureQuest, Inc.
Then that is a 'brain-damaged' spider that has gotten itself stuck and keeps re-requesting the same page...
I've seen this happen many times, where someone sets a 'personal download', fires it up, walks off and goes to bed...
So yes, in your case, block the IP...
I would caution on wanton blockage of "-" referrer or agent string as many people are 'scrubbing' their web browsing with privacy filters that strip this information... However, whether people should do this at all is easily another Holy War... Personally, I think 'washing' the referrer is -ok-, but 'washing' the agent is not...
--
Terra
--I'd like to slip them a Brillo pad--
FutureQuest
08-04-2001, 08:52 PM
Postid: 51038
Visitor
Hi Terra,
Quote:
"I've seen this happen many times, where someone sets a 'personal download', fires it up, walks off and goes to bed..."
It wouldn't be too bad if it were just overnight, but when it has gone on for 3 days it's a bit much.
Quote:
"I would caution on wanton blockage of "-" referrer or agent string as many people are 'scrubbing' their web browsing with privacy filters"
I have only blocked the blank agent, not the blank referrer. I would have blocked the IP addresses, but they seemed to be constantly changing and coming from many different ISPs. It seemed more like a Trojan/robot DoS attack than a search engine spider.
Quote:
"I know your site well... It has become a spider heaven"
Would it help if I modified my robots.txt and placed noindex robots meta tags in all the HTML pages?
The last thing I want to do is cause you any hassle or put the server at risk in any way. I only get a very small proportion of my traffic from search engines, so if you want me to ban as many spiders as possible, that's what I will set out to do. I can live without the 300 visitors a month from search engines.
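For reference, a robots.txt that asks every crawler to stay out of the whole site is just two lines, `User-agent: *` and `Disallow: /`; the per-page equivalent is a `<meta name="robots" content="noindex,nofollow">` tag in each page's head. Both are requests that well-behaved crawlers honor, not enforcement. The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would read such a file; the crawler names are just examples. A client that sends no user agent at all, like the one in the log line above, may well not be reading robots.txt in the first place.

```python
from urllib.robotparser import RobotFileParser

# robots.txt asking all compliant crawlers to stay out of the whole site
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "MSIE Crawler"):
    allowed = parser.can_fetch(agent, "http://www.trafficg.com/")
    print(agent, "may fetch home page:", allowed)  # False for both
```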
Thanks
Pete Kelly
All times are GMT -4.