Stats show LOTS of hits on two days
sheila
02-20-2001, 11:48 PM
OK, now this is very strange.
My website doesn't get all that much traffic. Which is fine. It's a hobby, mostly. No revenue. Just expenses.
Anyhow, I look at my stats a few days ago, and the graph is REALLY skewed to this month. Really high, all of a sudden, just for Feb.
So I look at the stats by day, and there were more hits on Feb. 14th than I normally get in a whole month. When I looked at the stats for User Agents, there is one that is used more than any other:
428887  96.20%       0   0.00%      127787608  59.74%  | Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90; MSIECrawler)
OK, that day it wasn't as high as 96%. But today, it happened again.
Basically, it seems like someone is downloading my whole website. Is this anything to be concerned about? Should I worry about blocking this person from my site???
I don't mind too much, but I don't want to have any problems with bandwidth (which is a joke, given my current stats. I could share with several people), and it bothers me a bit that someone would download all my content. Why would anyone do that?
bellgamin
02-21-2001, 03:09 AM
Heavy hits of Feb 14. Valentines Day. Hmmm, very interesting.
I have nothing to offer on your question, but I will be watching to see if anyone offers possible answers. It's a fascinating situation.
Aloha
Terra
02-21-2001, 03:18 AM
Yes - strange indeed because I have 'No Trespassing' signs up for that bot...
I decided to dig deeper and have found the problem... If you look at your logfile for the 14th, you will see that it went into an *infinite* loop, pulling the same file over-and-over-and-over:
GET /403.html
Each GET cost you 299 bytes...
To journey forward, I do in fact deny this particular Bot... So in fact, there was nothing wrong with my Apache Bad Bot mechanism...
Then I looked at your .htaccess file:
ErrorDocument 403 /403.php
In there, you make a few tests and force a 'Location:' redirect right back into your domain...
Remember the old toy fire engines that run into the wall and are supposed to turn themselves around??? Microsoft's MSIECrawler is incredibly stupid and just sat there and kept pounding its head against the wall...
MSIEC --> thinkspot.net --403--> thinkspot.net/403.php --> thinkspot.net/???/403.html --403--> thinkspot.net/403.php --> thinkspot.net/???/403.html --403--> ..... ..... .... Feel free to draw the infinite loop here .... ... .....
Now imagine this idiotic MSIECrawler in the hands of someone with a T-3, ripping through a large Message Forum (like the UBB) and getting caught in an endless loop... It is *NOT* pretty, as the server can quickly be overwhelmed before it's had a chance to react...
Overall, the statistics are as follows:
Dim Wit: 65.27.220.54
start: 2/14/01 03:33:01
__end: 2/14/01 09:31:42
Requests per Second: 12 (avg)
Bytes per Request: 299
I never saw it, as this was just a lot of ErrorDocument redirections and a small bit of PHP 4... Now if this had been CGI, or heavier PHP, then your domain would have set off a 4-alarm fire...
From here, you need to pull out the Infinite Loop for any potential denials going to your domain... Do *not* redirect out of an ErrorDocument handling routine, as strange things will happen (detailed above)...
In short: you created a self-referential Möbius strip...
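The loop described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Apache's actual behaviour in detail: the function names `respond` and `crawl` are hypothetical, and it assumes that every request from the denied bot ends up in the 403 handler, which answers with a `Location:` header pointing back at /403.html.

```python
from typing import List, Optional, Tuple

# Substring of the user agent reported in this thread.
DENIED_AGENT = "MSIECrawler"


def respond(path: str, user_agent: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) roughly as the misconfigured site would.

    Assumption: a denied agent is denied for *every* path, including the
    error page itself, and the 403 handler replies with a redirect.
    """
    if DENIED_AGENT in user_agent:
        # The 403 handler sends a Location header, so the client sees a 302
        # pointing at a page it is also not allowed to fetch.
        return 302, "/403.html"
    return 200, None


def crawl(start_path: str, user_agent: str, max_requests: int) -> List[Tuple[str, int]]:
    """Follow redirects naively, as a client with no loop detection would."""
    log = []
    path = start_path
    for _ in range(max_requests):
        status, location = respond(path, user_agent)
        log.append((path, status))
        if location is None:
            break  # a normal page was served, nothing more to follow
        path = location
    return log


if __name__ == "__main__":
    agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90; MSIECrawler)"
    for path, status in crawl("/", agent, 5):
        print(status, path)
```

Without the `max_requests` cap the denied agent would never leave the loop, which is the point: the only exit is for the error handler to serve its content directly instead of redirecting.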
--
Terra
--The definition of insanity is doing the same thing over
and over and over and over again, but expecting a different result.--
FutureQuest
<EDIT: trying to explain things after a 34 hour shift doesn't always produce the best grammatical results>
[This message has been edited by ccTech (edited 02-21-01@03:53 am)]
sheila
02-21-2001, 09:23 AM
Wow. Thanks for the personal attention.
I guess I will have to go back to that 403 document and write in some testing code for when to simply .. um .. exit the loop .. somehow?
Hm. It was a long time ago I put that one up, and I haven't written PHP in a long time. I'm open to suggestions on what to test for.
Perhaps, test for useragent and redirect it somewhere else? (Like Yahoo?)
Please advise, anybody.
Here is the code in question. My .htaccess file redirects the 403 call to this page:
<?
    if(eregi("^/materdei.*", $REQUEST_URI))
    {
        header("Location: http://www.thinkspot.net/materdei/403.html");
        exit;
    }
    elseif(eregi("^/sheila.*", $REQUEST_URI))
    {
        header("Location: http://www.thinkspot.net/sheila/403.html");
        exit;
    }
    else
    {
        header("Location: http://www.thinkspot.net/403.html");
    }
?>
</body>
</html>
Basically, I have three different sites, and it redirects to a different 403 page depending on which one is calling the error document.
Hm. I could put separate .htaccess files in each subdirectory, instead of using PHP to handle this. But would it really prevent such a thing from happening again?
I mean, is this a stupid way I have of handling the 403 errors, and is there something *I* can do to prevent it from happening again?
sheila
02-21-2001, 09:45 AM
OK, I shouldn't post right when I get out of bed at 6 am.
I understand (now) what Terra was saying:
"Do not redirect out of an error document."
I should take out the php page and handle all error docs through my .htaccess files.
That page that I have up for 403 errors comes out of a discussion here on this UBB from way back when I was fairly new here.
Here are the discussions in question. Perhaps someone, like an FQ representative, should post messages on those threads, that that is a BAD way to handle things?
http://www.aota.net/ubb/Forum15/HTML/000161-1.html
http://www.aota.net/ubb/Forum15/HTML/000113.html
I will clean up this mess later. I think I know what to do to handle it through .htaccess. That will avoid re-directs, since I will tell the server exactly what document to serve, rather than tell it a document that then sends to another document.
<edit: It's not 7am, yet --in CA-- and I still haven't had my coffee>
[This message has been edited by sheila (edited 02-21-01@09:46 am)]
tedloh
02-22-2001, 12:41 AM
I had a *major* spike on the 12th - and yet the bandwidth went LOWER than normal. Could it be that something similar happened to me or did Cupid fire a premature and wayward arrow? :)
Must've been an omen, because as of today my traffic is about to match that spike for real reasons - but I would really like to know what blindsided me on that day...
------------------
Ted (Chief Do-It-All)
Got2Bet.com - The Net's Winner's Circle
http://www.got2bet.com
ted@tygresystems.com
sheila
03-11-2001, 03:44 PM
It happened again.
And I have NO redirects on any of my error documents. I have removed them all.
My stats for 3/10/01, which I downloaded, unzipped, and analyzed myself, show the following:
Starting with this first 403 request yesterday:
24.88.156.153 - - [10/Mar/2001:15:47:11 -0500] "GET /403.html HTTP/1.1" 302 299 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90; MSIECrawler)"
until this last one:
24.88.156.153 - - [10/Mar/2001:23:27:04 -0500] "GET /403.html HTTP/1.1" 302 299 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90; MSIECrawler)"
There were, during that interim, a total of 408817 requests for the document 403.html, which has NO dynamic content, NO redirects of any kind.
So, in approximately 7 hours and 40 minutes, there were 408,817 requests. Apparently all came from the same IP number, which is an end-user with Road Runner (cable? I assume?).
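As a sanity check on these figures, the interval between the two log lines and the average request rate can be computed from the timestamps. A minimal sketch, assuming the usual Apache access-log timestamp layout; the helper name `requests_per_second` is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the log lines quoted above.
# Note: %b matches English month abbreviations only under a C/English locale.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def requests_per_second(first: str, last: str, total_requests: int):
    """Return (elapsed_seconds, average requests per second)."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    elapsed = (end - start).total_seconds()
    return elapsed, total_requests / elapsed


if __name__ == "__main__":
    elapsed, rate = requests_per_second(
        "10/Mar/2001:15:47:11 -0500",
        "10/Mar/2001:23:27:04 -0500",
        408817,
    )
    print(f"{elapsed:.0f} seconds, {rate:.1f} requests/second")
```

The two timestamps are 27,593 seconds apart (about 7 hours 40 minutes), which puts the average a little under 15 requests per second, in the same range as the 12 per second reported for the February incident.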
I sure hate how it screws up my graphs on the stats package. :(
This time we can't blame it on my re-directing error documents. So, now what do we blame it on?
sheila
03-11-2001, 03:52 PM
OK, on further viewing my stats pages, I find the following, under the URL analysis page:
409972  98.01%       0   0.00%      122174640  74.02%          0  | Code 302 Moved Temporarily
OK, so these are not showing up under 403 requests, but rather, they are showing up under 302 requests.
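To see which status code each request actually received without relying on the stats package, the raw log can be tallied directly. A rough sketch, assuming the log uses the combined format shown in the lines quoted earlier; the pattern and the function name `tally` are illustrative, not taken from any stats tool.

```python
import re
from collections import Counter

# Matches the start of a combined-format access log line:
# ip, two ignored fields, [timestamp], "METHOD path protocol", status, size
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def tally(lines):
    """Count requests per (path, status) pair, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[(match.group("path"), match.group("status"))] += 1
    return counts


# The two log lines quoted in this thread, used as sample input.
SAMPLE = [
    '24.88.156.153 - - [10/Mar/2001:15:47:11 -0500] "GET /403.html HTTP/1.1" 302 299 "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90; MSIECrawler)"',
    '24.88.156.153 - - [10/Mar/2001:23:27:04 -0500] "GET /403.html HTTP/1.1" 302 299 "-" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90; MSIECrawler)"',
]

if __name__ == "__main__":
    for (path, status), count in tally(SAMPLE).most_common():
        print(count, status, path)
```

Both sample lines land under ("/403.html", "302"), which matches the stats page: the requests for 403.html are being answered with a 302, not a 403.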
Does this have something to do with having an IRM?
Maynard
03-11-2001, 09:22 PM
a new MicroSoft 'feature' ?
sheila
03-16-2001, 08:01 AM
Could I get some sort of response on this, please? It happened again on the 14th. Please read my last two posts from the 11th.
I'd like to know if there is something that can be done to prevent massive numbers of hits occurring due to 403 errors.
We are talking on the order of 40,000 hits on one day due to 403 errors for a site that normally gets less than 2000 per day. My graphs on my stats are all screwed up.